Embeddings Demo

Welcome back! In this lesson, we’ll dive into embeddings—a fundamental building block for modern NLP applications. You’ll learn what embeddings are, how to create them with the OpenAI Python library, inspect the resulting vectors, and explore next steps for integrating embeddings into your projects.

What Is an Embedding?

An embedding is a high-dimensional vector representation of text (or other data) that captures semantic meaning. Machine learning models use embeddings to measure relatedness between inputs for tasks like:

| Use Case | Description |
|---|---|
| Semantic Search | Retrieve documents most relevant to a query |
| Clustering | Group similar texts together |
| Recommendations | Suggest related items based on content similarity |
| Anomaly Detection | Identify outliers in text datasets |
| Diversity Measurement | Quantify variation within a corpus |
| Classification | Improve text classifiers with richer feature vectors |

Embeddings map text strings into a continuous vector space, allowing algorithms to compute distances (e.g., cosine similarity) and uncover relationships.
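Cosine similarity is the most common of these metrics. As a minimal sketch (the three-dimensional vectors below are toy values, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (real embeddings have hundreds or thousands of dims)
v1 = [0.1, 0.3, 0.5]
v2 = [0.2, 0.6, 1.0]   # same direction as v1, scaled by 2
v3 = [0.5, -0.3, 0.1]

print(cosine_similarity(v1, v2))  # ≈ 1.0: identical direction
print(cosine_similarity(v1, v3))  # much lower: different direction
```

Because cosine similarity ignores vector length, `v1` and `v2` score as identical even though their magnitudes differ.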

Creating an Embedding

First, install the OpenAI Python library and export your API key:

pip install openai
export OPENAI_API_KEY="YOUR_API_KEY"

Warning

Never commit your OPENAI_API_KEY to public repositories. Use environment variables or secret managers to keep your key secure.

Next, generate an embedding:

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter was very attentive."
)

print(response)

The printed response looks like this (abbreviated; the full response also includes model and usage fields):

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.00230642455,
        0.0032792702,
        ...
        // 1536 floats total for ada-002
      ]
    }
  ]
}

For details on models, parameters, and rate limits, see the OpenAI embeddings documentation.
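In practice it helps to wrap the call in a small helper that returns just the vector. A sketch (the `get_embedding` name is our own convention, not part of the library):

```python
def get_embedding(client, text, model="text-embedding-ada-002"):
    """Return the embedding vector (a list of floats) for `text`."""
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding
```

With a configured client, `vector = get_embedding(client, "Hello world")` gives you the raw list of floats directly.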

Embedding Model Comparison

| Model | Dimensions | Context Window | Typical Use Case |
|---|---|---|---|
| text-embedding-ada-002 | 1,536 | 8,192 tokens | Cost-effective, general purpose |
| text-embedding-3-large | 3,072 | 8,192 tokens | High-fidelity semantic tasks |

Note

Larger models often yield richer representations but come with higher compute costs and latency.

Inspecting Embeddings in Python

Once you have embeddings, you can examine the raw vectors:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    input="Michael Jordan is better than LeBron James",
    model="text-embedding-3-large"
)

vector = response.data[0].embedding
print(vector)
[ 0.016919609921797, 0.008237271796965, 0.015445727936748, 
  0.011458127799397, 0.015966911630216, 0.01077441280857632, 
  0.001197237995964063, ... ]

These floating-point values position your text in a semantic space. Use similarity metrics (e.g., cosine similarity) to compare vectors.

Next Steps

With embeddings at your disposal, you can:

  1. Store them in a vector database like Pinecone or Weaviate.
  2. Perform similarity searches to retrieve related documents or answers.
  3. Cluster content by semantic similarity for topic modeling.
  4. Build recommendation systems based on text or user-profile embeddings.

Embeddings power a wide range of NLP pipelines—experiment with different models, inputs, and downstream algorithms to unlock new insights!
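As a concrete example of steps 1 and 2, a brute-force semantic search over a handful of stored vectors can be sketched as follows (the corpus vectors here are illustrative stand-ins; a real system would store API-generated embeddings in a vector database):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus: document -> pretend embedding vector
corpus = {
    "pizza recipe":      [0.9, 0.1, 0.0],
    "pasta recipe":      [0.8, 0.2, 0.1],
    "basketball scores": [0.0, 0.1, 0.9],
}

def search(query_vector, corpus, top_k=2):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

query = [0.85, 0.15, 0.05]  # pretend embedding of an Italian-food query
print(search(query, corpus))  # both recipes rank above the basketball document
```

Vector databases apply the same idea at scale, using approximate nearest-neighbor indexes instead of this exhaustive scan.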

