Demo Automatic Language Recognition and Translation

Welcome back. In this lesson we’ll build an asynchronous audio-to-insight pipeline that:

Accepts a WAV audio file
Transcribes the audio
Detects the spoken language and a prevailing emotion/tone
Translates the transcription into English
Generates a concise summary and a suggested title

This design separates each step into small, composable async functions, so you can reuse or replace individual components (for example, by swapping in a different model or a custom agent).

Store your OpenAI API key in a .env file (for example OPENAI_API_KEY=<your_key>). This lesson will load environment variables via python-dotenv.

Quick overview of the pipeline steps and their corresponding functions:

Step	Purpose	Function
1	Transcribe WAV audio to text	`transcribe_audio`
2	Detect language and one-word emotional tone	`analyze_language_and_emotion`
3	Translate text to English	`translate_text`
4	Produce suggested title and short summary	`generate_title_and_summary`

Setup and imports

Load environment variables, initialize the OpenAI client, and import utilities. This example uses the modern OpenAI Python client (OpenAI()), plus an assumed agents package providing Agent and Runner as used in the original material.

from dotenv import load_dotenv
import os
from pathlib import Path
import asyncio
import re

# OpenAI Python client
from openai import OpenAI

# Optional display in notebooks
from IPython.display import Image, display

# Agent & Runner (kept as in the original content)
from agents import Agent, Runner

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Be mindful of API usage and costs when using large models like gpt-4 and uploading audio files. Use lower-cost models for development and testing if desired.
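
A missing API key usually only surfaces when the first request fails, so it can help to check for it up front. The following is a minimal sketch that is not part of the original lesson: require_env is a hypothetical helper name, and it relies only on the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

With this in place, the client could be created as OpenAI(api_key=require_env("OPENAI_API_KEY")) after load_dotenv() has run.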

Transcription (Whisper)

We create an async helper to upload a WAV file to the Whisper transcription model and return the transcription text. The function validates the path and handles common return shapes from the client.

async def transcribe_audio(file_path: str) -> str:
    """
    Transcribe a WAV file using the Whisper model and return the transcription text.
    """
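    # Ensure the file exists before calling the API
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    def _transcribe():
        # Blocking SDK call: upload the file and wait for the transcript
        with path.open("rb") as audio_file:
            return client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
            )

    # Run the blocking call in a worker thread so the event loop stays free
    transcript = await asyncio.to_thread(_transcribe)

    # Handle common return shapes: plain string, dict, or object with a .text attribute
    if isinstance(transcript, str):
        return transcript
    if isinstance(transcript, dict):
        return transcript.get("text", "")
    return getattr(transcript, "text", "")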

Reference: Whisper docs — https://platform.openai.com/docs/models/whisper-1
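
Checking that a path exists does not confirm that the file is actually a readable WAV. As a hedged sketch, the standard-library wave module can open the file and report a few basic properties before anything is uploaded. describe_wav is a hypothetical helper name, and this check only covers PCM WAV files that the wave module understands.

```python
import wave
from pathlib import Path


def describe_wav(file_path: str) -> dict:
    """Open a WAV file with the standard library and report basic properties.

    Raises FileNotFoundError if the path is missing and wave.Error if the
    file cannot be read as a PCM WAV.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
            "size_bytes": path.stat().st_size,
        }
```

Calling describe_wav(file_path) before transcribe_audio(file_path) gives an early, local failure for corrupt or mislabeled files, and the reported duration can help estimate cost.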

Language and Emotion Analysis

Use a chat model to detect the language and provide a one-word emotional descriptor. The function uses a low temperature (0.3) to keep replies consistent, and extracts values using tolerant regular expressions to handle slightly varied replies.

async def analyze_language_and_emotion(text: str) -> dict:
    """
    Ask a chat model to detect the language and a one-word emotional tone for the given text.
    Returns: {"language": "<language>", "emotion": "<emotion>"}
    """
    system_msg = (
        "You're an AI that analyzes messages. Detect the language (e.g., English, French) "
        "and describe the emotional tone in one word (e.g., joyful, sad, angry, professional, excited, persuasive). "
        "Respond in the format:\nLanguage: <language>\nEmotion: <emotion>"
    )

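    # The SDK call is blocking, so run it in a worker thread to keep the event loop free
    response = await asyncio.to_thread(
        client.chat.completions.create,
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": f"Here is the message:\n{text}"},
        ],
        temperature=0.3,
    )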

    content = response.choices[0].message.content.strip()

    # Tolerant regex to capture "Language: ..." and "Emotion: ..." (allow multi-word and punctuation)
    language_match = re.search(r"(?i)^\s*Language[:\-\s]*([^\r\n]+)", content, re.MULTILINE)
    emotion_match = re.search(r"(?i)^\s*Emotion[:\-\s]*([^\r\n]+)", content, re.MULTILINE)

    return {
        "language": language_match.group(1).strip() if language_match else "Unknown",
        "emotion": emotion_match.group(1).strip() if emotion_match else "Unknown"
    }

Note: temperature is set to 0.3 to favor more deterministic outputs, which helps reliable parsing of the model response.
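
To see how the tolerant parsing behaves without calling the API, the same two patterns can be exercised on hand-written replies. This is a small sketch: parse_analysis and the sample strings are illustrative and are not real model output.

```python
import re

# Same patterns as in analyze_language_and_emotion, compiled once for reuse.
LANGUAGE_RE = re.compile(r"^\s*Language[:\-\s]*([^\r\n]+)", re.IGNORECASE | re.MULTILINE)
EMOTION_RE = re.compile(r"^\s*Emotion[:\-\s]*([^\r\n]+)", re.IGNORECASE | re.MULTILINE)


def parse_analysis(content: str) -> dict:
    """Extract language and emotion, falling back to "Unknown" when a label is absent."""
    language = LANGUAGE_RE.search(content)
    emotion = EMOTION_RE.search(content)
    return {
        "language": language.group(1).strip() if language else "Unknown",
        "emotion": emotion.group(1).strip() if emotion else "Unknown",
    }


# Hand-written sample replies (not real model output).
samples = [
    "Language: French\nEmotion: Encouraging",
    "language - Spanish\nEMOTION: joyful",
    "I could not determine the language.",
]
for reply in samples:
    print(parse_analysis(reply))
```

The third sample has no label at the start of a line, so both fields fall back to "Unknown" instead of raising an error.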

Translator Agent and translate_text

This example uses a simple Agent to translate text into English and a Runner to execute it. The Agent and Runner classes are assumed to come from the agents package used in the original material.

translator_agent = Agent(
    name="Translator",
    instructions="Translate the input text into English. Only return the translated result."
)

async def translate_text(text: str) -> str:
    """
    Use the Agent Runner to translate text into English.
    Returns the final translated string returned by the agent.
    """
    result = await Runner.run(translator_agent, input=text)
    # Handle common return shapes: string, dict, or object with a final_output attribute
    if isinstance(result, str):
        return result
    if isinstance(result, dict):
        return result.get("final_output")
    return getattr(result, "final_output", None)

If your agents/runner implementation differs, adapt the return extraction accordingly.
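
One way to adapt the extraction is to move it into a small helper that fails loudly instead of returning None. The sketch below assumes nothing about the real agents package: extract_final_output is a hypothetical helper and FakeRunResult is a stand-in used only to exercise it.

```python
from dataclasses import dataclass
from typing import Any


def extract_final_output(result: Any) -> str:
    """Pull the translated text out of a runner result of unknown shape."""
    if isinstance(result, str):
        return result
    if isinstance(result, dict):
        value = result.get("final_output")
    else:
        value = getattr(result, "final_output", None)
    if value is None:
        raise ValueError(
            f"No final_output found on result of type {type(result).__name__}"
        )
    return str(value)


@dataclass
class FakeRunResult:
    """Hypothetical stand-in for whatever Runner.run() returns."""
    final_output: str


print(extract_final_output("Hello"))
print(extract_final_output({"final_output": "Hello"}))
print(extract_final_output(FakeRunResult(final_output="Hello")))
```

Raising ValueError here makes a shape mismatch visible at the translation step, rather than later when a None value reaches the summary prompt.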

Title and Summary Generation

Ask a chat model to provide a concise summary and a suggested title. Temperature is slightly higher for creativity (0.5).

async def generate_title_and_summary(text: str) -> str:
    """
    Generate a concise summary and a suggested title for the given text.
    Returns a string containing both title and summary.
    """
    system_msg = "You are a helpful AI assistant. Summarize the user's message and suggest a title for it."

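    # The SDK call is blocking, so run it in a worker thread to keep the event loop free
    response = await asyncio.to_thread(
        client.chat.completions.create,
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": f"Here's the text:\n\n{text}"},
        ],
        temperature=0.5,
    )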

    return response.choices[0].message.content.strip()

Tip: If you prefer structured outputs (e.g., JSON with title and summary), ask the model to respond in JSON and parse the result. For simple display, the free-text response above often suffices.
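
As a rough sketch of the JSON approach, the helper below parses a reply that is expected to contain "title" and "summary" keys and falls back to treating the whole reply as the summary when it is not valid JSON. parse_title_and_summary and the key names are assumptions for illustration. The system message would also need to request this format, for example by asking the model to respond only with a JSON object containing those two keys.

```python
import json

FENCE = "`" * 3  # Markdown code fence that some models wrap around JSON


def parse_title_and_summary(raw: str) -> dict:
    """Parse a JSON reply with "title" and "summary" keys.

    Falls back to using the whole reply as the summary if it is not a JSON object.
    """
    text = raw.strip()
    # Strip an optional Markdown fence and a leading "json" language tag.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"title": "", "summary": raw.strip()}
    if not isinstance(data, dict):
        return {"title": "", "summary": raw.strip()}
    return {
        "title": str(data.get("title", "")).strip(),
        "summary": str(data.get("summary", "")).strip(),
    }
```

The fallback keeps the pipeline usable when the model ignores the format instruction, at the cost of an empty title that the caller should check for.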

Full pipeline: process_audio_translation

This orchestrator composes the previous functions into a complete asynchronous flow. Each step’s output is printed; you can replace prints with logging, storage, or event emissions for production usage.

async def process_audio_translation(file_path: str):
    """
    Full pipeline for an audio file:
      1. Transcribe audio
      2. Analyze language and emotion
      3. Translate into English
      4. Generate title and summary
    Prints each result step to the console.
    """
    # 1) Transcribe
    transcript = await transcribe_audio(file_path)
    print(f"Transcript:\n{transcript}\n")

    # 2) Language & emotion analysis
    analysis = await analyze_language_and_emotion(transcript)
    print(f"Detected language: {analysis['language']}")
    print(f"Detected emotion: {analysis['emotion']}\n")

    # 3) Translate to English
    translation = await translate_text(transcript)
    print(f"Translation:\n{translation}\n")

    # 4) Title & summary
    extras = await generate_title_and_summary(translation)
    print(f"Title and Summary:\n{extras}\n")
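
Steps 2 and 3 both depend only on the transcript, so one possible variation is to run them concurrently with asyncio.gather. The sketch below uses hypothetical stand-in coroutines (fake_analyze, fake_translate) in place of the real API calls, so it runs without network access.

```python
import asyncio


# Hypothetical stand-ins that simulate the two network-bound steps.
async def fake_analyze(text: str) -> dict:
    await asyncio.sleep(0.1)
    return {"language": "French", "emotion": "Encouraging"}


async def fake_translate(text: str) -> str:
    await asyncio.sleep(0.1)
    return f"[EN] {text}"


async def analyze_and_translate(transcript: str) -> tuple:
    # Both steps need only the transcript, so their waits can overlap.
    analysis, translation = await asyncio.gather(
        fake_analyze(transcript),
        fake_translate(transcript),
    )
    return analysis, translation


if __name__ == "__main__":
    analysis, translation = asyncio.run(analyze_and_translate("Bonjour"))
    print(analysis)
    print(translation)
```

The overlap only helps when each step awaits non-blocking work, such as an async client or asyncio.to_thread. A coroutine that makes a blocking call still runs to completion before the next one starts.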

Run the pipeline

Pass in the full path to your WAV file. In Jupyter or other async-capable REPLs you can await the function directly.

# Replace with the path to your WAV file
audio_path = "/Users/gavinridgeway/Documents/Anaconda/AiAgent/final_fixed.wav"

await process_audio_translation(audio_path)

If running from a standard Python script, wrap the call in asyncio.run():

if __name__ == "__main__":
    audio_path = "/path/to/your/file.wav"
    asyncio.run(process_audio_translation(audio_path))

Troubleshooting common issues

File not found: ensure the file_path is correct and accessible by your process.
UnboundLocalError, NameError, or AttributeError: double-check variable names and confirm that the attributes you read exist on the returned object (for example result.final_output).
API key errors: confirm OPENAI_API_KEY is set and loaded via load_dotenv() or environment variables.
Agent/Runner differences: the agents package usage (Agent, Runner) is retained from the original content — adapt Runner.run() and result access if your agents library returns different shapes.
Unexpected model output format: prefer instructing the model to respond in a strict format (for example Language: <language>\nEmotion: <emotion> or JSON), then validate with regex or a JSON parser.

Example output (expected)

After running on a French sample, the pipeline prints something like:

Transcript: “Apprendre à programmer, c’est comme avoir un super-pouvoir…”
Detected language: French
Detected emotion: Encouraging
Translation: “Learning to program is like having a superpower…”
Title and Summary: (a short summary and a suggested title)

You now have a working asynchronous pipeline that transcribes audio, detects language and emotion, translates into English, and generates a title plus a short summary.

Links and References

OpenAI API docs
Whisper model: https://platform.openai.com/docs/models/whisper-1
GPT models: https://platform.openai.com/docs/models
python-dotenv
Jupyter: https://jupyter.org

If you want to extend this pipeline: consider adding speaker diarization, punctuation normalization, or persisting outputs to a database for downstream search and analytics.
