KodeKloud Notes

In this tutorial, you’ll build a small speech-to-text script using OpenAI’s Whisper model. You’ll learn how to supply a local audio file, send it to the OpenAI API, and print the transcription that comes back. This guide is aimed at Python developers who want to add speech recognition to their projects.

Prerequisites

Before you begin, make sure you have the following:

Requirement	Description	Example
Python	Version 3.7 or newer	`python --version`
OpenAI API Key	Access for Whisper transcriptions	`export OPENAI_API_KEY="sk-..."`
Audio File	Local MP3, WAV, or OGG file (10–20 seconds)	`~/recordings/voice_sample.mp3`

Note

Whisper supports most common audio formats, including mp3, wav, ogg, and m4a. For best results, use a clear recording with minimal background noise.
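If you want to catch an unsupported or oversized file before uploading it, a small pre-flight check can help. The sketch below is only an illustration: the `check_audio_file` helper is hypothetical, the extension set mirrors the formats named in the note above, and the 25 MB ceiling is an assumption about the API's upload limit that you should confirm against the current OpenAI documentation.

```python
from pathlib import Path

# Assumption: extensions taken from the note above, not an exhaustive list.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a"}

# Assumption: upload size limit of 25 MB; verify against the current API docs.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return the resolved Path if the file looks uploadable, else raise."""
    audio_path = Path(path).expanduser()
    if not audio_path.is_file():
        raise FileNotFoundError(f"No such audio file: {audio_path}")
    if audio_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {audio_path.suffix!r}")
    size = audio_path.stat().st_size
    if size == 0:
        raise ValueError("Audio file is empty")
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"Audio file is too large: {size} bytes")
    return audio_path
```

Calling `check_audio_file("~/recordings/voice_sample.mp3")` before opening the file gives you a clear local error instead of a failed API request.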

1. Install the OpenAI Python Client

First, install the OpenAI SDK using pip:

pip install openai

If you’re working in a virtual environment, activate it before running the command.
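To confirm that the package landed in the environment you expect, you can query the installed version from Python. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is part of the standard library); the `installed_version` helper is a hypothetical name. Note that the `from openai import OpenAI` client style used in this tutorial needs version 1.0 or later of the package.

```python
from importlib import metadata


def installed_version(package="openai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("openai is not installed in this environment")
else:
    print(f"openai {version} is installed")
```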

2. Prepare Your Python Script

Create a file named speech_to_text.py and add the following code. It initializes the OpenAI client, reads your local audio file in binary mode, and sends it to Whisper for transcription.

import os
from openai import OpenAI

# Best practice: Store your API key in an environment variable
api_key = os.getenv("OPENAI_API_KEY")  # e.g., export OPENAI_API_KEY="sk-..."

client = OpenAI(api_key=api_key)

# Open the audio file in binary mode
file_path = "/full/path/to/your/audio.mp3"
with open(file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print("Transcribed Text:")
print(transcription.text)

Be sure to:

Replace "/full/path/to/your/audio.mp3" with your file’s actual path.
Set the OPENAI_API_KEY environment variable instead of hardcoding sensitive credentials.
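If you plan to transcribe more than one file, it may be convenient to wrap the call in a function rather than editing the path each time. The sketch below assumes `client` is an `openai.OpenAI` instance like the one created above; `transcribe_file` is a hypothetical helper name. The optional `language` argument maps to the endpoint's `language` parameter (an ISO-639-1 code such as "en") and is left out of the request when not given.

```python
from pathlib import Path


def transcribe_file(client, file_path, model="whisper-1", language=None):
    """Send one audio file for transcription and return the text.

    `client` is assumed to expose `audio.transcriptions.create`, as the
    OpenAI client in this tutorial does.
    """
    request_options = {"model": model}
    if language is not None:
        # Optional hint about the spoken language, e.g. "en".
        request_options["language"] = language

    # Open in binary mode, exactly as in the script above.
    with Path(file_path).expanduser().open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file, **request_options
        )
    return result.text
```

With the client from the script, usage would look like `print(transcribe_file(client, "/full/path/to/your/audio.mp3"))`.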

Warning

Never commit your API key to version control. Use environment variables or a secrets manager to keep your key secure.
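One way to make a missing key obvious is to check for it before creating the client, so the script stops with a readable message instead of failing later. This is a minimal sketch; `load_api_key` is a hypothetical helper, not part of the OpenAI SDK.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from the environment and fail fast if it is missing."""
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the script"
        )
    return key
```

You could then pass the result to the client, for example `OpenAI(api_key=load_api_key())`.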

3. Run the Script

Execute the script in your terminal:

python speech_to_text.py

You should see output similar to:

Transcribed Text:
Here is an example of me talking.

Congratulations! You’ve successfully converted speech from an audio file into text using OpenAI’s Whisper model.
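As a possible next step, you could read the audio path from the command line instead of hardcoding it in the script. The sketch below uses the standard library's `argparse`; the `parse_args` function and the `--model` option are illustrative choices, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the audio path (and an optional model name) from the command line."""
    parser = argparse.ArgumentParser(
        description="Transcribe a local audio file with Whisper."
    )
    parser.add_argument("audio_path", help="path to a local audio file")
    parser.add_argument(
        "--model", default="whisper-1", help="transcription model name"
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Inside speech_to_text.py you would call `args = parse_args()` and open `args.audio_path` instead of the fixed `file_path`, then run the script as `python speech_to_text.py ~/recordings/voice_sample.mp3`.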
