Mastering Generative AI with OpenAI

Audio Transcription Translation

Demo Audio Translation

In this tutorial, we’ll demonstrate how to translate a short Spanish audio clip into English text using OpenAI’s Whisper API. We'll process a 20-second MP3 segment (up to 25 MB) extracted from an Easy Spanish YouTube video and send it to the API in one request.

Prerequisites

  • Python 3.7+
  • openai Python SDK
  • An OpenAI API key

Install the SDK with:

pip install --upgrade openai

Note

Ensure your MP3 file is under 25 MB. Whisper supports formats like MP3, WAV, and FLAC.

Translation Code Example

import os
import openai
import IPython.display as ipd

# 1. Configure API key
openai.api_key = os.getenv("OPENAI_API_KEY")

# 2. Load and play the Spanish audio clip
file_name = "data/Spanish.mp3"
audio_file = open(file_name, "rb")
ipd.display(ipd.Audio(file_name))

# 3. Call Whisper for translation
result = openai.Audio.translate("whisper-1", audio_file)

# 4. Output the English translation
print(result.text)

Step-by-Step Breakdown

StepActionCode Snippet
1Configure the OpenAI API keyopenai.api_key = os.getenv("OPENAI_API_KEY")
2Load and display the MP3 clip inlineipd.display(ipd.Audio(file_name))
3Translate audio using whisper-1openai.Audio.translate("whisper-1", audio_file)
4Print the translated English textprint(result.text)

Warning

Keep your API key secure. Do not hard-code it in public repositories.

Next Steps

Once you have the translated text, you can pass it to GPT-4 (or any other LLM) for further processing—such as summarization, sentiment analysis, or content moderation.

Watch Video

Watch video content

Practice Lab

Practice lab

Previous
Demo Audio Transcription