KodeKloud Notes

Build a text-to-speech pipeline in Python by combining OpenAI’s Chat Completions API with gTTS, a Python library that interfaces with Google Translate’s text-to-speech API. The pipeline generates a natural-language response from an LLM and speaks it aloud automatically.

Prerequisites

1. Install Dependencies

Package	Purpose	Install Command
gTTS	Python client for Google Translate's text-to-speech API	`pip install gTTS`
OpenAI Python client	Official OpenAI API SDK	`pip install openai`
Audio playback utility	Plays MP3 files (`afplay` ships with macOS; on Linux use `mpg123` or `mpg321`)	macOS: none needed for `afplay` (optional: `brew install mpg123`); Debian/Ubuntu: `sudo apt install mpg123`

Note

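This example was tested on Python 3.7+. The `OpenAI` client class used below requires version 1.0 or later of the `openai` package. If your setup differs, adjust the commands as needed (for example, `pip3` instead of `pip`).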

2. Set Your OpenAI API Key

export OPENAI_API_KEY="your_openai_api_key"

Warning

Never commit your API key to public repositories. Use a secure vault or environment manager in production.
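A missing or empty key is easier to diagnose when it is checked up front. The following is a minimal sketch of such a check; the helper name `require_api_key` is hypothetical and not part of the OpenAI SDK.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message.

    Hypothetical helper for this tutorial; not part of the OpenAI SDK.
    """
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f'{var_name} is not set. Run: export {var_name}="your_openai_api_key"'
        )
    return key
```

If you adopt this helper, the client can be created with `OpenAI(api_key=require_api_key())`.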

Imports and Client Initialization

Begin by importing the standard-library `os` module and gTTS, then initialize the OpenAI client:

import os
from gtts import gTTS
from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

1. Define the Prompt

Decide what you want the model to say. For example:

prompt = "Tell me a story about a brave knight who loves basketball."

2. Text-to-Speech Function

Convert text to speech and play the resulting MP3:

def text_to_speech(text: str, lang: str = "en", slow: bool = False) -> None:
    """
    Generate speech from text using gTTS, save as MP3, and play it.
    """
    tts = gTTS(text=text, lang=lang, slow=slow)
    filename = "tts_output.mp3"
    tts.save(filename)
    # macOS uses 'afplay'; Linux users can install 'mpg123' or 'mpg321'
    os.system(f"afplay {filename}")

Warning

Adjust the playback command (`afplay`, `mpg123`, or `mpg321`) based on your operating system.
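Rather than editing the command by hand, the player can be chosen at runtime. This is a rough sketch under the assumption that only the three players named in this tutorial are candidates. The helper names `pick_player` and `play_mp3` are hypothetical, and Windows is not covered.

```python
import platform
import shutil
import subprocess
from typing import List, Optional


def pick_player(system: Optional[str] = None) -> Optional[List[str]]:
    """Return the command for an installed MP3 player, or None if none is found.

    Hypothetical helper: only checks the players named in this tutorial.
    """
    system = system or platform.system()
    # macOS reports itself as "Darwin"; everything else is treated like Linux.
    candidates = ["afplay"] if system == "Darwin" else ["mpg123", "mpg321"]
    for name in candidates:
        if shutil.which(name):
            return [name]
    return None


def play_mp3(filename: str) -> None:
    """Play an MP3 with the first player found; raise if none is installed."""
    command = pick_player()
    if command is None:
        raise RuntimeError("No MP3 player found (tried afplay, mpg123, mpg321).")
    # Passing a list avoids shell quoting problems with unusual filenames.
    subprocess.run(command + [filename], check=True)
```

Inside `text_to_speech`, the `os.system(...)` line could then be replaced with `play_mp3(filename)`.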

3. Generate Text from OpenAI

Send the prompt to the Chat API and retrieve the response:

def generate_text(
    prompt: str,
    model: str = "gpt-3.5-turbo",
    temperature: float = 0.8,
    max_tokens: int = 150
) -> str:
    """
    Generate a chat completion for the given prompt.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens
    )
    return response.choices[0].message.content
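With `max_tokens=150`, a story can be cut off mid-sentence, which is noticeable when read aloud. In the Chat Completions response, a `finish_reason` of `"length"` on the choice indicates that the token limit was reached. One option, offered here as an assumption and not as part of the original pipeline, is to trim the text to its last complete sentence before speaking it. The helper name `trim_to_last_sentence` is hypothetical.

```python
def trim_to_last_sentence(text: str) -> str:
    """Drop a trailing partial sentence.

    Returns the text unchanged (apart from stripping whitespace) when it
    contains no sentence-ending punctuation at all.
    """
    text = text.strip()
    # Position of the last '.', '!' or '?'; -1 if none of them occurs.
    last_end = max(text.rfind(mark) for mark in (".", "!", "?"))
    if last_end == -1:
        return text
    return text[: last_end + 1]
```

This simple version ignores closing quotation marks that follow the final punctuation mark, so quoted dialogue at the end of the text may lose its closing quote.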

4. Combine Generation and Speech

Create a helper that prints the generated text, then speaks it:

def gen_and_speak(prompt: str) -> None:
    """
    Generate text from the prompt, display it, and play the speech.
    """
    text = generate_text(prompt)
    print("Generated Text:\n")
    print(text, "\n")
    text_to_speech(text)

5. Entry Point

Run the full pipeline with the prompt defined earlier:

if __name__ == "__main__":
    gen_and_speak(prompt)
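To try other prompts without editing the script, the prompt could be read from the command line. This is a minimal sketch using the standard-library `argparse` module; the names `parse_args` and `DEFAULT_PROMPT` are hypothetical additions, not part of the original script.

```python
import argparse
from typing import List, Optional

# Fallback used when no prompt is given on the command line.
DEFAULT_PROMPT = "Tell me a story about a brave knight who loves basketball."


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse an optional positional prompt; argv=None reads from sys.argv."""
    parser = argparse.ArgumentParser(description="Generate text and speak it.")
    parser.add_argument(
        "prompt",
        nargs="?",
        default=DEFAULT_PROMPT,
        help="Prompt to send to the model (defaults to the knight story).",
    )
    return parser.parse_args(argv)
```

The entry point would then call `gen_and_speak(parse_args().prompt)`, and the script could be run as `python3 text_to_speech_pipeline.py "Tell me a joke."`.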

Example Console Output

$ python3 text_to_speech_pipeline.py
Generated Text:

Once upon a time in the kingdom of Eldoria, there lived a brave knight named Sir Cedric...
# (Audio will start playing automatically)

Next Steps & Extensions

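Support multiple languages by changing `lang` in `text_to_speech()`.
Experiment with regional accents through gTTS's `tld` parameter and with slower speech through `slow=True`; gTTS itself does not offer a choice of voices.
Integrate other audio libraries such as pydub or playsound for more control over playback.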
