# Module Introduction

> An overview of speech recognition, translation, and synthesis, covering service setup, SSML customization, and practical integration for building voice-enabled, multilingual, and accessible applications.

Welcome to the Speech Recognition, Translation, and Synthesis module. In this module you'll learn how applications capture spoken language and convert it to text (speech-to-text), translate that text between languages, and generate natural-sounding audio from text (text-to-speech). These building blocks power voice assistants, real-time translation tools, accessibility features, and conversational AI services.

This module focuses on the practical setup and integration of Speech Services, techniques for accurate speech recognition, and ways to customize synthetic voices with Speech Synthesis Markup Language (SSML). You'll gain hands-on knowledge for building voice-enabled apps, multilingual experiences, and accessible interfaces.

## Learning objectives

By the end of this module you will be able to:

1. Provision and configure the Speech Service required for recognition and synthesis.
2. Implement speech recognition pipelines to reliably convert spoken language to text.
3. Enable speech synthesis to convert text back into natural-sounding audio.
4. Customize audio output by selecting a voice and adjusting its style, pitch, and speaking rate.
5. Use Speech Synthesis Markup Language (SSML) to fine-tune prosody, pronunciation, and audio effects.

<Frame>
  <img src="https://mintcdn.com/kodekloud-c4ac6d9a/MVK09m96KxI8SuM5/images/AI-102-Microsoft-Certified-Azure-AI-Engineer-Associate/Speech-Recognition-Translation-and-Synthesis/Module-Introduction/learning-objectives-speech-recognition-synthesis-ssml.jpg?fit=max&auto=format&n=MVK09m96KxI8SuM5&q=85&s=d717c6880c644766bfd89e3175fcbf3d" alt="A presentation slide titled &#x22;Learning Objectives&#x22; showing five numbered items about speech technology: setting up a speech service, implementing speech recognition, enabling speech synthesis, customizing audio output, and leveraging Speech Synthesis Markup Language (SSML)." width="1920" height="1080" data-path="images/AI-102-Microsoft-Certified-Azure-AI-Engineer-Associate/Speech-Recognition-Translation-and-Synthesis/Module-Introduction/learning-objectives-speech-recognition-synthesis-ssml.jpg" />
</Frame>

<Callout icon="lightbulb" color="#1CB2FE">
  Before you begin: make sure you have access to a Speech Service (or equivalent provider), an API key and endpoint, and sample audio or a microphone for testing. Familiarity with basic REST or SDK usage in your preferred language (Python, C#, or JavaScript) will help you follow the hands-on examples in later lessons.
</Callout>
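
As a rough illustration of the prerequisites above, the sketch below loads a key and region from environment variables and derives a text-to-speech REST endpoint from them. The variable names `SPEECH_KEY` and `SPEECH_REGION`, the `SpeechSettings` class, and the `load_speech_settings` helper are assumptions made for this example, and the endpoint pattern assumes an Azure regional Speech resource. Check your own resource's overview page for the exact values.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical container for the credentials a Speech resource needs."""

    key: str
    region: str

    @property
    def tts_endpoint(self) -> str:
        # Assumed URL pattern for a regional Azure text-to-speech REST endpoint.
        return f"https://{self.region}.tts.speech.microsoft.com/cognitiveservices/v1"


def load_speech_settings(env=None) -> SpeechSettings:
    """Read the key and region from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("SPEECH_KEY", "SPEECH_REGION")  # assumed variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return SpeechSettings(key=env["SPEECH_KEY"], region=env["SPEECH_REGION"])
```

Keeping the key in the environment rather than in source code makes it easier to rotate credentials and to avoid committing them by accident.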

## Core capabilities overview

|                  Capability | Primary use cases                                              | Quick reference                                                                                                              |
| --------------------------: | -------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
|    Speech Recognition (STT) | Voice commands, meeting transcriptions, accessibility captions | [Speech-to-Text docs](https://learn.microsoft.com/azure/cognitive-services/speech-service/overview)                          |
| Translation (Text & Speech) | Real-time multilingual chat, interpreter apps                  | [Speech Translation docs](https://learn.microsoft.com/azure/cognitive-services/speech-service/how-to-use-speech-translation) |
|      Speech Synthesis (TTS) | Voice assistants, narrated content, accessibility              | [Text-to-Speech docs](https://learn.microsoft.com/azure/cognitive-services/speech-service/overview-text-to-speech)           |
|                        SSML | Control prosody, pronunciation, and audio events in TTS        | [SSML reference](https://learn.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup)                |

## Recommended next steps

* Review the Speech Service quickstarts for your language of choice.
* Gather sample audio (or prepare a microphone) and target languages for translation tests.
* Skim the SSML reference to understand tags for voice, rate, pitch, and breaks.
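
To make the voice, rate, pitch, and break tags more concrete, here is a minimal sketch that assembles an SSML document using only Python's standard library. The `build_ssml` helper and its defaults are hypothetical and exist only for this example. The voice name `en-US-JennyNeural` is just a sample value, and voice availability depends on your resource and region.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(sentences, voice="en-US-JennyNeural", rate="+10%",
               pitch="-2st", pause_ms=300, lang="en-US"):
    """Return an SSML string with one voice, one prosody block, and a
    <break> between consecutive sentences. Hypothetical helper."""
    if not sentences:
        raise ValueError("at least one sentence is required")

    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    prosody = ET.SubElement(voice_el, "prosody", {"rate": rate, "pitch": pitch})

    # Text before the first child goes in .text; text after each <break>
    # goes in that element's .tail.
    prosody.text = sentences[0]
    for sentence in sentences[1:]:
        pause = ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
        pause.tail = sentence

    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Welcome to the module.", "Let's get started."]))
```

Building the markup with an XML library instead of string concatenation keeps the document well-formed even when the input text contains reserved characters.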

## References

* [Azure Speech Service documentation](https://learn.microsoft.com/azure/cognitive-services/speech-service/)
* [SSML for Speech Synthesis](https://learn.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup)

<CardGroup>
  <Card title="Watch Video" icon="video" cta="Learn more" href="https://learn.kodekloud.com/user/courses/ai-102-microsoft-certified-azure-ai-engineer-associate/module/188c2a25-9d63-45b4-b934-33ab2d412470/lesson/a70a4577-f2be-4808-aa40-899e95032a3a" />
</CardGroup>
