KodeKloud Notes

In this tutorial, we’ll demonstrate how to translate a short Spanish audio clip into English text using OpenAI’s Whisper API. We'll process a 20-second MP3 segment (up to 25 MB) extracted from an Easy Spanish YouTube video and send it to the API in one request.

Prerequisites

Python 3.7+
openai Python SDK
An OpenAI API key

Install the SDK with:

pip install --upgrade openai

Note

Ensure your MP3 file is under 25 MB. Whisper supports formats like MP3, WAV, and FLAC.

Translation Code Example

import os
import openai
import IPython.display as ipd

# 1. Configure API key
openai.api_key = os.getenv("OPENAI_API_KEY")

# 2. Load and play the Spanish audio clip
file_name = "data/Spanish.mp3"
audio_file = open(file_name, "rb")
ipd.display(ipd.Audio(file_name))

# 3. Call Whisper for translation
result = openai.Audio.translate("whisper-1", audio_file)

# 4. Output the English translation
print(result.text)

Step-by-Step Breakdown

Step	Action	Code Snippet
1	Configure the OpenAI API key	`openai.api_key = os.getenv("OPENAI_API_KEY")`
2	Load and display the MP3 clip inline	`ipd.display(ipd.Audio(file_name))`
3	Translate audio using `whisper-1`	`openai.Audio.translate("whisper-1", audio_file)`
4	Print the translated English text	`print(result.text)`

Warning

Keep your API key secure. Do not hard-code it in public repositories.

Next Steps

Once you have the translated text, you can pass it to GPT-4 (or any other LLM) for further processing—such as summarization, sentiment analysis, or content moderation.

Demo Audio Translation

Prerequisites

Translation Code Example

Step-by-Step Breakdown

Next Steps

Links and References

Watch Video

Practice Lab