KodeKloud Notes

In this tutorial, you’ll build a Python script that takes an image URL and generates a descriptive caption using GPT-4’s vision capabilities. Instead of DALL·E, which generates images, we’ll use the Chat Completions endpoint with a vision-capable GPT-4 model, which accepts image URLs directly and describes what it “sees.”

Prerequisites
Installation
Initialize the OpenAI Client
Define the Image URL
Generate Captions Function
Run the Script
Sample Output
References

Prerequisites

Python 3.7+
pip package manager
An OpenAI API key with GPT-4 access
Internet connectivity to fetch the image

Warning

Never hard-code your API key in a public repository. Use environment variables or a secure vault.
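
One way to follow this advice is to read the key from an environment variable and stop early when it is missing. The sketch below assumes the key is stored in OPENAI_API_KEY (the variable the OpenAI SDK looks for by default); the helper name load_api_key is hypothetical and not part of the SDK.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError with a readable message when the variable is
    unset or empty, so the script fails before any request is made.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this script."
        )
    return key
```

The returned value can then be passed to the client constructor, for example OpenAI(api_key=load_api_key()).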

Installation

Install the official OpenAI Python client (version 1.0 or later, which provides the OpenAI client class used in this tutorial):

pip install openai

Initialize the OpenAI Client

Import and initialize the client with your API key:

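import os

from openai import OpenAI

# Initialize the client with the key from the OPENAI_API_KEY environment variable
# (avoids hard-coding the secret in source code)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])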

Note

You can find the latest OpenAI Python SDK and examples in the openai-python GitHub repo.

Define the Image URL

Specify the publicly accessible image URL you want to caption:

# Image URL to caption
image_url = "https://assets-prd.ignimgs.com/2022/06/10/netflix-one-piece-1654901410673.jpg"
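
Because the model fetches the image itself, a local file path or a malformed URL will not work. A minimal sanity check, sketched here with the standard library, can catch such values before a request is sent. The function name looks_like_http_url is hypothetical, and the check is purely syntactic: it does not fetch the image or detect hosts that are unreachable from the public internet.

```python
from urllib.parse import urlparse


def looks_like_http_url(url: str) -> bool:
    """Return True when the URL uses http(s) and names a host.

    This is only a syntactic check: it does not download the image or
    confirm that the server will actually serve it.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Calling looks_like_http_url(image_url) before the request lets the script report a bad URL locally instead of waiting for an API error.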

Generate Captions Function

Create a helper function that sends a chat completion request to GPT-4, including both a text prompt and the image URL. We’ll cap the response at 125 tokens to keep captions concise.

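def generate_captions(image_url: str) -> str:
    """Ask a vision-capable model to describe the image at image_url."""
    response = client.chat.completions.create(
        # Image input needs a vision-capable model such as gpt-4o; base gpt-4 is text-only
        model="gpt-4o",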
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is this image?"},
                    {"type": "image_url", "image_url": {"url": image_url}}
                ]
            }
        ],
        max_tokens=125
    )
    # Extract the generated caption
    return response.choices[0].message.content
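
Since the response is capped with max_tokens, a caption can stop mid-sentence. The Chat Completions response reports this through finish_reason, which is "length" when generation ended at the token limit. The sketch below shows one way to surface that. The helper name extract_caption and the "[truncated]" marker are assumptions for illustration, and fake_response is a stand-in object shaped like a real response so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_caption(response) -> str:
    """Return the caption text, flagging it when the token cap cut it short."""
    choice = response.choices[0]
    caption = (choice.message.content or "").strip()
    # finish_reason is "length" when generation stopped at max_tokens
    if choice.finish_reason == "length":
        caption += " [truncated]"
    return caption


# Stand-in object shaped like a chat completion response, for illustration only
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="A group of pirates on a ship"),
            finish_reason="length",
        )
    ]
)
print(extract_caption(fake_response))
```

If truncated captions show up often, raising max_tokens or asking for a one-sentence description in the prompt are the two simplest adjustments.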

Run the Script

Use the function and print the returned caption:

if __name__ == "__main__":
    caption = generate_captions(image_url)
    print(caption)

Sample Output

This image features characters from the anime and manga "One Piece." In the center is Monkey D. Luffy wearing his trademark straw hat. To his left stands Sanji with blond hair, and to his right is Nami, recognizable by her orange hair. They are members of the Straw Hat Pirates.
