Speech to Text
In this tutorial, you’ll build a speech-to-text application using OpenAI’s Whisper model. You’ll learn how to supply a local audio file, send it to the OpenAI API, and print the transcription it returns. This guide is aimed at Python developers who want to add speech recognition to their projects.
Prerequisites
Before you begin, make sure you have the following:
Requirement | Description | Example |
---|---|---|
Python | Version 3.8 or newer | python --version |
OpenAI API Key | Authenticates your Whisper transcription requests | export OPENAI_API_KEY="sk-..." |
Audio File | Local MP3, WAV, or OGG file (10–20 seconds) | ~/recordings/voice_sample.mp3 |
Note
Whisper supports most common audio formats, including mp3, wav, ogg, and m4a. Ensure your file is clear and has minimal background noise for best results.
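Before uploading, it can help to check the file locally so that obvious problems surface before any API call. The sketch below is an illustration, not part of the OpenAI SDK: check_audio_file is a hypothetical helper, the extension set covers only the formats named in the note (the API may accept others), and the size cap reflects OpenAI's documented 25 MB upload limit at the time of writing, treated conservatively as 25,000,000 bytes.

```python
from pathlib import Path

# Formats named in the note above; the API may accept additional ones.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a"}
# Documented 25 MB upload limit, interpreted conservatively as decimal megabytes.
MAX_UPLOAD_BYTES = 25_000_000


def check_audio_file(file_path: str) -> Path:
    """Return the path if it looks uploadable, otherwise raise an error."""
    path = Path(file_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No audio file found at {path}")
    # Compare extensions case-insensitively so "clip.MP3" is accepted.
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported extension {path.suffix!r}; "
            f"expected one of {sorted(SUPPORTED_EXTENSIONS)}"
        )
    size = path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"File is {size} bytes; the limit is {MAX_UPLOAD_BYTES} bytes")
    return path
```

A 10 to 20 second recording, as suggested in the prerequisites, is normally far below the size limit, so in practice the extension check is the one most likely to catch a mistake.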
1. Install the OpenAI Python Client
First, install the OpenAI SDK using pip:
pip install openai
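If you’re working in a virtual environment, activate it before running the command. The code in this guide uses the OpenAI client class introduced in version 1.0 of the SDK; if an older version is installed, upgrade it with pip install --upgrade openai.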
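To confirm that the installed package is new enough for the from openai import OpenAI import used in the script, you could check its version from Python. This is a minimal sketch assuming Python 3.8 or newer (for importlib.metadata); is_v1_or_newer and installed_openai_version are hypothetical helper names, not SDK functions.

```python
from importlib.metadata import PackageNotFoundError, version


def is_v1_or_newer(version_string: str) -> bool:
    """Return True if a version string such as "1.30.1" has major version >= 1."""
    major = version_string.split(".", 1)[0]
    return int(major) >= 1


def installed_openai_version():
    """Return the installed openai package version, or None if it is missing."""
    try:
        return version("openai")
    except PackageNotFoundError:
        return None
```

For example, if installed_openai_version() returns None or a 0.x version string, run pip install --upgrade openai before continuing.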
2. Prepare Your Python Script
Create a file named speech_to_text.py and add the following code. It initializes the OpenAI client, opens your local audio file in binary mode, and sends it to Whisper for transcription.
import os
from openai import OpenAI
# Best practice: store your API key in an environment variable
api_key = os.getenv("OPENAI_API_KEY")  # e.g., export OPENAI_API_KEY="sk-..."
client = OpenAI(api_key=api_key)
# Open the audio file in binary mode and send it to Whisper
file_path = "/full/path/to/your/audio.mp3"
with open(file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print("Transcribed Text:")
print(transcription.text)
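If you plan to transcribe more than one file, the call in the script above can be wrapped in a function that receives the client as a parameter. This is a sketch rather than an official SDK helper: transcribe_file is a hypothetical name, and passing the client in (instead of creating it inside the function) is a design choice that lets you substitute a stand-in object when testing without network access.

```python
from pathlib import Path
from typing import Any


def transcribe_file(client: Any, file_path: str, model: str = "whisper-1") -> str:
    """Send one local audio file to the transcription endpoint and return its text.

    `client` is expected to be an openai.OpenAI instance, or any object that
    exposes client.audio.transcriptions.create(model=..., file=...).
    """
    path = Path(file_path).expanduser()
    # The file must remain open while the request is being sent.
    with path.open("rb") as audio_file:
        transcription = client.audio.transcriptions.create(model=model, file=audio_file)
    return transcription.text
```

In your own script you would call it as print(transcribe_file(OpenAI(), "/full/path/to/your/audio.mp3")); when no api_key argument is given, OpenAI() reads OPENAI_API_KEY from the environment.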
Be sure to:
- Replace "/full/path/to/your/audio.mp3" with your file’s actual path.
- Set the OPENAI_API_KEY environment variable instead of hardcoding sensitive credentials.
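Rather than editing file_path for every recording, you could accept the path on the command line. The sketch below uses the standard-library argparse module and a hypothetical parse_args helper. It also calls expanduser(), because Python's open() does not expand ~ in a path such as ~/recordings/voice_sample.mp3.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the audio path from the command line (sys.argv[1:] when argv is None)."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file with Whisper.")
    parser.add_argument(
        "audio_path",
        type=Path,
        help="Path to an audio file, e.g. ~/recordings/voice_sample.mp3",
    )
    args = parser.parse_args(argv)
    # open() treats "~" literally, so expand it before the file is opened.
    args.audio_path = args.audio_path.expanduser()
    return args
```

You would then run python speech_to_text.py ~/recordings/voice_sample.mp3 and pass args.audio_path to open(). The explicit expanduser() matters mainly when the path is quoted, since the shell does not expand ~ inside quotes.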
Warning
Never commit your API key to version control. Use environment variables or a secrets manager to keep your key secure.
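Following this advice, a script can read the key from the environment, stop immediately if it is missing, and avoid ever printing the full value. The sketch below uses only the standard library; load_api_key and mask_key are hypothetical helpers, and the "first three characters plus last four" masking format is an arbitrary choice.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail early with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell or load it from a secrets manager."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form of a secret, e.g. 'sk-...3456'."""
    # Very short values are hidden entirely rather than partially revealed.
    if len(key) <= visible + 3:
        return "*" * len(key)
    return f"{key[:3]}...{key[-visible:]}"
```

For example, logging mask_key(load_api_key()) shows enough to tell which key is in use without writing the secret itself to your logs.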
3. Run the Script
Execute the script in your terminal:
python speech_to_text.py
You should see output similar to:
Transcribed Text:
Here is an example of me talking.
Congratulations! You’ve successfully converted speech from an audio file into text using OpenAI’s Whisper model.