Key Capabilities
| Capability | Description |
|---|---|
| Transcription | Convert spoken content from supported formats (e.g., MP3, WAV, FLAC) into written text. |
| Translation | Transcribe audio spoken in a non-English language and translate the resulting text into English. |

Whisper supports multiple audio formats and many source languages out of the box; translation always targets English. For optimal accuracy, use audio files encoded at 16 kHz.

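As a quick way to act on the 16 kHz recommendation, the sketch below reads the sample rate from a WAV header using only Python's standard `wave` module. The helper names (`wav_sample_rate`, `is_recommended_rate`) are hypothetical and introduced here for illustration. The check only applies to uncompressed WAV input; MP3 or FLAC files would need a separate decoding library.

```python
import wave

# Sample rate recommended in the text above (16 kHz).
TARGET_RATE_HZ = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate in Hz stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_recommended_rate(path: str, target_hz: int = TARGET_RATE_HZ) -> bool:
    """Report whether the WAV file already uses the recommended rate."""
    return wav_sample_rate(path) == target_hz
```

A file for which `is_recommended_rate` returns `False` can be resampled with an audio tool of your choice before upload.
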
- Uploading an audio file to the Whisper API
- Invoking the transcription endpoint
- Processing and analyzing the transcription result
- Translating non-English audio into English
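
The four steps listed above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes that "the Whisper API" refers to OpenAI's hosted audio endpoints accessed through the official `openai` Python SDK (version 1 or later), that the model identifier is `whisper-1`, and that an API key is available in the `OPENAI_API_KEY` environment variable. The helper names (`make_client`, `transcribe`, `translate_to_english`, `summarize`) and the file names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Any

# Assumption: identifier of the hosted Whisper model.
MODEL_NAME = "whisper-1"


def make_client() -> Any:
    """Create an API client; the SDK reads OPENAI_API_KEY from the environment."""
    from openai import OpenAI  # imported lazily so the helpers below load without the SDK
    return OpenAI()


def transcribe(client: Any, audio_path: str) -> str:
    """Upload an audio file and return its text in the spoken language."""
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=MODEL_NAME, file=audio_file)
    return result.text


def translate_to_english(client: Any, audio_path: str) -> str:
    """Upload non-English audio and return an English translation."""
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.translations.create(model=MODEL_NAME, file=audio_file)
    return result.text


def summarize(text: str) -> dict:
    """Tiny example of post-processing: word and character counts."""
    return {"words": len(text.split()), "characters": len(text)}


# Example usage (requires network access and a valid API key):
#   client = make_client()
#   text = transcribe(client, "meeting.mp3")
#   print(summarize(text))
#   print(translate_to_english(client, "interview_fr.mp3"))
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub object, so the processing logic can be tested without making real API calls.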