In this tutorial, you’ll build a speech-to-text application using OpenAI’s Whisper model. You’ll learn how to record or supply an audio file, send it to the OpenAI API, and receive accurate transcriptions in seconds. This guide is ideal for Python developers looking to integrate speech recognition into their projects.
## Prerequisites
Before you begin, make sure you have the following:

| Requirement | Description | Example |
|---|---|---|
| Python | Version 3.7 or newer | `python --version` |
| OpenAI API Key | Access for Whisper transcriptions | Get your key |
| Audio File | Local MP3, WAV, or OGG file (10–20 seconds) | `~/recordings/voice_sample.mp3` |
Whisper supports most common audio formats, including mp3, wav, ogg, and m4a. Ensure your file is clear and has minimal background noise for best results.

## 1. Install the OpenAI Python Client
First, install the OpenAI SDK using pip: `pip install openai`

## 2. Prepare Your Python Script
Create a file named `speech_to_text.py` and add the following code. It initializes the OpenAI client, reads your local audio file in binary mode, and sends it to Whisper for transcription.
- Replace `"/full/path/to/your/audio.mp3"` with your file’s actual path.
- Set the `OPENAI_API_KEY` environment variable instead of hardcoding sensitive credentials.
Never commit your API key to version control. Use environment variables or a secrets manager to keep your key secure.
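On macOS or Linux, setting the variable for the current shell session might look like this (the key value shown is a placeholder, not a real key):

```shell
# Export the key for this shell session only; nothing is written to disk.
# "sk-your-key-here" is a placeholder -- substitute your actual API key.
export OPENAI_API_KEY="sk-your-key-here"
```

On Windows PowerShell, the equivalent is `$Env:OPENAI_API_KEY = "sk-your-key-here"`. For anything beyond local experimentation, prefer a secrets manager or a `.env` file excluded from version control.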