Demo Building Dynamic Context with Custom Data Part 2

This guide picks up where Part 1 left off, with a text column ready to serve as context. Next, we'll generate an embedding for each row's sentence, compute its similarity to a query, and dynamically build context for OpenAI API calls, all using a pandas DataFrame.

1. Generate Embeddings for the text Column

Use your embedding function to vectorize each row in the DataFrame:

# Generate an embedding for each row in the 'text' column
df = df.assign(embedding=df["text"].apply(text_embedding))

Note

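Generating embeddings for a large dataset may take some time. Because apply calls text_embedding once per row, the 121 rows here mean 121 separate calls. If each call makes one API request, expect a noticeable wait.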
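One way to reduce the number of calls is to embed each distinct string only once. The sketch below assumes only that the embedding function takes a string and returns a vector. `fake_text_embedding` and `embed_column` are hypothetical names introduced here so the example runs offline, and in practice you would pass your own `text_embedding` instead.

```python
import pandas as pd

def fake_text_embedding(text):
    """Hypothetical offline stand-in for text_embedding (not a real model)."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def embed_column(frame, embed_fn, column="text"):
    """Return (frame with an 'embedding' column, number of embed_fn calls made)."""
    cache = {}

    def cached(text):
        # Call embed_fn only the first time a given string is seen.
        if text not in cache:
            cache[text] = embed_fn(text)
        return cache[text]

    out = frame.assign(embedding=frame[column].apply(cached))
    return out, len(cache)

# Toy data with one repeated sentence.
demo = pd.DataFrame({"text": ["a won", "b lost", "a won"]})
demo_embedded, n_calls = embed_column(demo, fake_text_embedding)
print(n_calls)  # number of distinct strings that were embedded
```

If every row is unique, as is likely for the nominations data, the cache saves nothing. It pays off mainly when you rerun cells or re-embed the same queries.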

Inspect the new embedding column:

df.head()

Example output:

       year_film  year_ceremony  ceremony                 category           name                       film  winner                                               text              embedding
10639       2022           2023        95  actor in a leading role  Austin Butler                      Elvis   False  Austin Butler got nominated under the category...  [-0.0378, -0.0199, …]
10640       2022           2023        95  actor in a leading role  Colin Farrell  The Banshees of Inisherin   False  Colin Farrell got nominated under the category...  [-0.0082, -0.0100, …]
    …          …              …         …                        …              …                          …       …                                                  …                      …

To view a specific example:

df["text"].iloc[100]
# Output:
# 'Viktor Prášil, Frank Kruse, Markus Stembler, Lars Ginzel and Stefan Korte got nominated under the category, sound, for the film All Quiet on the Western Front but did not win'
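Before computing similarities, it can help to confirm that every embedding has the same length, since a dot product between vectors of different sizes will fail. The following is a minimal sketch on a hypothetical toy frame (`toy` and its three-dimensional vectors are made up for illustration, and real embeddings have far more dimensions).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df.
toy = pd.DataFrame({
    "text": ["first row", "second row"],
    "embedding": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
})

# Every embedding should have the same number of dimensions.
lengths = toy["embedding"].apply(len)
assert lengths.nunique() == 1, "embeddings have inconsistent lengths"

# Stack the per-row lists into one 2-D array: (rows, embedding dimension).
matrix = np.array(toy["embedding"].tolist())
print(matrix.shape)
```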

2. Define a Similarity Function

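We'll use the dot product to measure vector similarity. OpenAI documents its embeddings as normalized to length 1, in which case the dot product equals cosine similarity, so a higher score means the two texts are closer in meaning: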

import numpy as np

def vector_similarity(vec1, vec2):
    return np.dot(np.squeeze(np.array(vec1)), np.squeeze(np.array(vec2)))
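To see why vector length matters when using a dot product, the sketch below compares the raw dot product with cosine similarity on two small hand-picked vectors. `cosine_similarity` is a helper written for this illustration and is not part of the original code.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Dot product divided by the product of the two vector lengths."""
    a = np.asarray(vec1, dtype=float)
    b = np.asarray(vec2, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])          # length 5
b = np.array([4.0, 3.0])          # length 5
a_unit = a / np.linalg.norm(a)    # rescaled to length 1
b_unit = b / np.linalg.norm(b)    # rescaled to length 1

raw_dot = float(np.dot(a, b))             # 24.0, depends on vector length
unit_dot = float(np.dot(a_unit, b_unit))  # about 0.96
cos = cosine_similarity(a, b)             # about 0.96, same as unit_dot

print(raw_dot, unit_dot, cos)
```

If your embedding function does not return unit-length vectors, ranking by raw dot product can favor longer vectors, and cosine similarity is the safer choice.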

3. Embed the Query and Rank the Rows

To answer a question (e.g., “Who won the Best Picture award?”), follow these steps:

  1. Embed the query
  2. Compute similarity against each row
  3. Select top candidates

# 1. Embed the query
query = "Who won the Best Picture award?"
query_embedding = text_embedding(query)

# 2. Compute similarity for each row
df["similarity"] = df["embedding"].apply(lambda x: vector_similarity(x, query_embedding))

# 3. Retrieve top 20 matches
top_res = df.nlargest(20, "similarity")
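The row-by-row apply above is easy to read. On larger frames, the same scores can be computed in one step by stacking the embeddings into a matrix and multiplying by the query vector. This is a sketch on hypothetical toy data (`toy` and `toy_query_embedding` are invented, two-dimensional stand-ins for real embeddings).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; real embeddings have many more dimensions.
toy = pd.DataFrame({
    "text": ["row A", "row B", "row C"],
    "embedding": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
})
toy_query_embedding = [0.0, 1.0]

matrix = np.array(toy["embedding"].tolist())        # shape: (rows, dims)
scores = matrix @ np.asarray(toy_query_embedding)   # one dot product per row
ranked = toy.assign(similarity=scores).nlargest(2, "similarity")
print(ranked["text"].tolist())
```

Here the scores are 0.0, 1.0, and 0.8, so the two best matches are "row B" followed by "row C".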

4. Build the Context Block

Concatenate the top candidates into one string:

context = "\n".join(top_res["text"])
print(context)
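Joining all 20 rows works for this small dataset, but longer rows could make the prompt larger than the model accepts. A possible safeguard, sketched below, is to stop adding rows once a size budget is reached. `build_context` and `max_chars` are hypothetical names, and a character count is only a rough proxy for the model's token limit.

```python
def build_context(texts, max_chars):
    """Join texts with newlines, stopping before the total exceeds max_chars."""
    selected = []
    used = 0
    for text in texts:
        # Count the newline separator for every entry after the first.
        cost = len(text) + (1 if selected else 0)
        if used + cost > max_chars:
            break
        selected.append(text)
        used += cost
    return "\n".join(selected)

rows = ["alpha won", "beta lost", "gamma won"]
short_context = build_context(rows, max_chars=20)
print(short_context)
```

Because the rows are already sorted by similarity, truncating from the end drops the least relevant candidates first.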

Now craft the prompt and call the API:

prompt = f"""
From the data provided in three backticks, respond to the question: {query}
```{context}```
"""

result = get_word_completion(prompt)
print(result)

Expected output:

The film "Everything Everywhere All at Once" won the Best Picture award at the 95th Oscar awards.

5. Try a Different Question

For example, “Who is the lyricist for RRR?”:

query = "Who is the lyricist for RRR?"
query_embedding = text_embedding(query)

# Recompute similarities and get top matches
df["similarity"] = df["embedding"].apply(lambda x: vector_similarity(x, query_embedding))
top_res = df.nlargest(20, "similarity")
context = "\n".join(top_res["text"])

# Craft prompt and call API
prompt = f"""
From the data provided in three backticks, respond to the question: {query}
```{context}```
"""

result = get_word_completion(prompt)
print(result)

Expected response:

The lyricist for RRR is Chandra Bose.
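Since each new question repeats the same embed, rank, join, and prompt steps, they can be wrapped in one function. The sketch below is one way to do that, not the course's own implementation. `answer_question`, `fake_embed`, and `fake_complete` are hypothetical names, and the two fakes exist only so the example runs without an API key.

```python
import numpy as np
import pandas as pd

def answer_question(frame, query, embed_fn, complete_fn, top_n=20):
    """Rank rows by dot-product similarity to the query, then ask the model."""
    query_embedding = np.asarray(embed_fn(query), dtype=float)
    scores = frame["embedding"].apply(
        lambda emb: float(np.dot(np.asarray(emb, dtype=float), query_embedding))
    )
    # assign() returns a copy, so the caller's frame is left unchanged.
    top = frame.assign(similarity=scores).nlargest(top_n, "similarity")
    context = "\n".join(top["text"])
    fence = "`" * 3  # three backticks, matching the delimiter in the prompt above
    prompt = (
        f"From the data provided in three backticks, respond to the question: {query}\n"
        f"{fence}{context}{fence}\n"
    )
    return complete_fn(prompt)

# Hypothetical stand-ins so the sketch runs offline.
def fake_embed(text):
    return [1.0, 0.0] if "song" in text.lower() else [0.0, 1.0]

def fake_complete(prompt):
    return prompt  # echo the prompt instead of calling a model

toy = pd.DataFrame({
    "text": ["A wrote the song", "B directed the film"],
    "embedding": [[1.0, 0.0], [0.0, 1.0]],
})
reply = answer_question(toy, "Who wrote the song?", fake_embed, fake_complete, top_n=1)
print(reply)
```

With the real helpers from this guide, the call would look like `answer_question(df, query, text_embedding, get_word_completion)`.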

6. Resources and References

| Resource | Description | Link |
|---|---|---|
| pandas DataFrame | Data manipulation and analysis | https://pandas.pydata.org/ |
| OpenAI Embeddings API | Generate vector embeddings | https://openai.com/embeddings |
| NumPy | Numerical computing in Python | https://numpy.org/ |
| Best Practices for Prompt Design | Tips for crafting effective AI prompts | https://platform.openai.com/docs/guides/chat/completions-prompt-design |

This workflow builds dynamic context from a DataFrame using embeddings and then crafts prompts on the fly, so OpenAI models can answer questions grounded in your own data.
