Project 2 Image Captioning

In this tutorial, you’ll build a Python script that takes an image URL and generates a descriptive caption using a vision-capable GPT-4 model. Rather than generating images with DALL·E, we’ll call the Chat Completions endpoint, which accepts image URLs alongside text in a message and returns a description of what the model “sees.”

Table of Contents

  1. Prerequisites
  2. Installation
  3. Initialize the OpenAI Client
  4. Define the Image URL
  5. Generate Captions Function
  6. Run the Script
  7. Sample Output
  8. References

Prerequisites

  • Python 3.8+ (recent releases of the openai package no longer support Python 3.7)
  • pip package manager
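  • An OpenAI API key with access to a vision-capable GPT-4 model
  • Internet connectivity to reach the OpenAI API, plus an image URL that is publicly reachable so the API can fetch it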

Warning

Never hard-code your API key in a public repository. Use environment variables or a secure vault.


Installation

Install the official OpenAI Python client:

pip install openai
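
The `from openai import OpenAI` import used in the next step only exists in version 1.0 and later of the package, so it can be worth confirming what pip actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `major_of` are illustrative and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_of(version_string: str) -> int:
    """Extract the leading major number from a version string such as '1.35.0'."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    found = installed_version("openai")
    if found is None:
        print("openai is not installed; run: pip install openai")
    elif major_of(found) < 1:
        # Releases before 1.0 do not provide the OpenAI client class
        print(f"openai {found} is too old; run: pip install --upgrade openai")
    else:
        print(f"openai {found} is installed")
```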

Initialize the OpenAI Client

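Import the client and initialize it with an API key read from the OPENAI_API_KEY environment variable, so the key never appears in your source code:

import os

from openai import OpenAI

# Read the API key from the environment instead of hard-coding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])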

Note

You can find the latest OpenAI Python SDK and examples in the openai-python GitHub repo.
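
In line with the warning about hard-coded keys, one option is to load the key from an environment variable and fail with a readable message when it is missing. This is a sketch under assumptions: `load_api_key` is a hypothetical helper, and `OPENAI_API_KEY` is the variable name the SDK itself looks for by default.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment mapping.

    Raises RuntimeError with a readable message when the variable is unset or blank.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the script.")
    return key
```

The result can then be passed to the client, for example `client = OpenAI(api_key=load_api_key())`. Accepting the mapping as a parameter keeps the helper easy to test without touching the real environment.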


Define the Image URL

Specify the publicly accessible image URL you want to caption:

# Image URL to caption
image_url = "https://assets-prd.ignimgs.com/2022/06/10/netflix-one-piece-1654901410673.jpg"
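
Because the API has to fetch the image itself, a relative path or a `file://` URL will not work. A quick sanity check before sending the request might look like the sketch below; `is_http_url` is a hypothetical helper, and it only inspects the shape of the string without checking that the image is actually reachable.

```python
from urllib.parse import urlparse


def is_http_url(candidate: str) -> bool:
    """Return True when the string is an absolute http(s) URL with a host."""
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com/cat.jpg", "cat.jpg", "file:///tmp/cat.jpg"):
        print(candidate, is_http_url(candidate))
```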

Generate Captions Function

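Create a helper function that sends a chat completion request to GPT-4, with both a text prompt and the image URL in a single user message. We cap the response at 125 tokens with max_tokens to keep captions short; this is a hard limit, so a longer answer is cut off rather than condensed.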

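def generate_captions(image_url: str) -> str:
    """Ask a vision-capable chat model to describe the image at image_url."""
    response = client.chat.completions.create(
        # The base "gpt-4" model is text-only; image input needs a vision-capable model
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    # A text part (the prompt) followed by an image part (the URL)
                    {"type": "text", "text": "What is this image?"},
                    {"type": "image_url", "image_url": {"url": image_url}}
                ]
            }
        ],
        max_tokens=125  # Upper bound on the length of the reply
    )
    # Extract the generated caption from the first choice
    return response.choices[0].message.content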
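
The message structure in the function above can also be built by a small pure function, which makes it easier to vary the prompt and to inspect the payload without calling the API. The sketch below is one way to do that; `build_vision_messages` is a hypothetical name, and `detail` is an optional field of the `image_url` object ("low", "high", or "auto") that the original function does not set.

```python
from typing import Any, Dict, List

ALLOWED_DETAIL = ("low", "high", "auto")


def build_vision_messages(prompt: str, image_url: str, detail: str = "auto") -> List[Dict[str, Any]]:
    """Build a single user message with one text part and one image part."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
            ],
        }
    ]
```

The returned list could be passed as the `messages` argument of `client.chat.completions.create`. A more specific prompt, such as "Describe this image in one sentence.", tends to suit a captioning task better than an open question, though that is a judgment call.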

Run the Script

Use the function and print the returned caption:

if __name__ == "__main__":
    caption = generate_captions(image_url)
    print(caption)
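
When running the script, it can help to know whether the 125-token cap cut the caption short. The API reports this through the `finish_reason` field of each choice, which is "length" when the token limit was reached. The sketch below assumes a response object shaped like the one returned by `client.chat.completions.create`; `caption_with_truncation_flag` is a hypothetical helper, and the demo uses a stand-in object rather than a real API response.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def caption_with_truncation_flag(response: Any) -> Tuple[str, bool]:
    """Return the caption text and whether it was cut off by the token limit."""
    choice = response.choices[0]
    caption = choice.message.content or ""
    truncated = choice.finish_reason == "length"
    return caption, truncated


if __name__ == "__main__":
    # Stand-in object mimicking the attributes read above (not a real API response)
    fake_response = SimpleNamespace(
        choices=[
            SimpleNamespace(
                message=SimpleNamespace(content="Three characters stand on a ship"),
                finish_reason="length",
            )
        ]
    )
    caption, truncated = caption_with_truncation_flag(fake_response)
    print(caption + (" [truncated]" if truncated else ""))
```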

Sample Output

This image features characters from the anime and manga "One Piece." In the center is Monkey D. Luffy wearing his trademark straw hat. To his left stands Sanji with blond hair, and to his right is Nami, recognizable by her orange hair. They are members of the Straw Hat Pirates.

References

  • Watch Video
  • Practice Lab
