Mastering Generative AI with OpenAI

Getting Started with OpenAI

OpenAI Foundation Models

OpenAI’s foundation models form the backbone of modern generative AI, powering text, image, and audio applications. In this lesson, we explore the three flagship models available via the OpenAI API:

  • GPT: Large language model for text tasks
  • DALL·E: Text-to-image synthesis
  • Whisper: Automatic speech recognition and translation

Generative Pre-trained Transformer (GPT)

The Generative Pre-trained Transformer (GPT) is a transformer-based large language model trained on massive public-domain text corpora. It excels at generating contextually relevant text and supports a wide range of natural-language tasks.

Note

GPT-3.5 is the current public release with 175 billion parameters. GPT-4 offers enhanced capabilities but is accessible only to approved users via the OpenAI waitlist.

Key Capabilities of GPT-3.5

TaskDescriptionExample API Call
Text GenerationContinue or complete textopenai.chat.completions.create({ model: "gpt-3.5-turbo", ... })
Code GenerationProduce code snippets in multiple languagesopenai.chat.completions.create({ prompt: "Write Python sort...", ... })
SummarizationCondense long documentsopenai.chat.completions.create({ prompt: "Summarize research...", ... })
Sentiment AnalysisClassify emotional toneopenai.chat.completions.create({ prompt: "Analyze sentiment of...", ... })
Chat & Q&AInteractive conversationsopenai.chat.completions.create({ messages: [...], ... })
EmbeddingsVector representations for search & clusteringopenai.embeddings.create({ model: "text-embedding-ada-002", ... })

The image is a diagram illustrating the process of a Generative Pretrained Transformer (GPT) taking a prompt and producing various outputs such as classification, sentiment analysis, summarization, word completion, chat, code generation, and word embeddings.

DALL·E: Text-to-Image Generation

DALL·E 2 is a diffusion-based model that transforms textual descriptions into high-fidelity images. It uses a two-step process: first generating a low-resolution image, then upscaling it.

Generating an Image with DALL·E 2

openai images.generate \
  --model "dall-e-2" \
  --prompt "A futuristic city skyline at sunset" \
  --n 1 \
  --size "1024x1024"

Note

DALL·E 2 is currently in beta. Image credits and usage rights vary—review the OpenAI Image Policy before deploying.

The image is a diagram explaining DALL-E 2, showing the process from a text prompt to image generation, with notes on its similarity to GPT and its diffusion model architecture.

Whisper: Automatic Speech Recognition

Whisper is a versatile ASR model trained on diverse languages, accents, and audio qualities. It provides both transcription and translation features out of the box.

Transcribing Audio with Whisper

openai audio.transcriptions.create \
  --model "whisper-1" \
  --file "meeting_recording.mp3" \
  --language "en"
openai audio.translations.create \
  --model "whisper-1" \
  --file "entrevista.mp3"

Warning

Audio files with background noise or overlapping speakers may reduce transcription accuracy. Preprocess audio with noise reduction when possible.

The image is a diagram explaining Whisper, an automatic speech-recognition model that supports transcription and translation, trained on diverse languages and accents. It highlights Whisper 1 as the latest version in beta.

Summary of OpenAI Foundation Models

ModelPrimary FunctionRelease StatusExample Use Cases
GPT-3.5Text generation & comprehensionProductionChatbots, content creation, code gen
DALL·E 2Text-to-image synthesisBetaMarketing art, concept design
WhisperSpeech transcription & translationBetaMeeting transcriptions, subtitles

Explore these foundation models to unlock cutting-edge generative AI in your applications.

Watch Video

Watch video content

Previous
What is OpenAI