# Demo: Building Dynamic Context with Custom Data, Part 2
This workflow builds dynamic context from a pandas DataFrame using embeddings and crafts prompts for OpenAI API calls on the fly.

In this guide, we continue from Part 1, where we prepared a text column to serve as context. Next, we'll generate embeddings for each sentence, compute similarities against a query, and dynamically build context for OpenAI API calls, all within a pandas DataFrame.
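The code below relies on two helpers from Part 1, `text_embedding` and `vector_similarity`. If you don't have them handy, here is a minimal sketch; the model name `text-embedding-3-small` and the lazy OpenAI import are assumptions, and Part 1 may make different choices:

```python
import numpy as np

def text_embedding(text: str) -> list[float]:
    # Lazy import so vector_similarity works even without the OpenAI SDK installed.
    # Calling this requires OPENAI_API_KEY to be set in the environment.
    from openai import OpenAI
    client = OpenAI()
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def vector_similarity(a, b) -> float:
    # Cosine similarity: dot product of the two L2-normalized vectors
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```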
Use your embedding function to vectorize each row in the DataFrame:
```python
# Generate embeddings for each row in the 'text' column
df = df.assign(embedding=df["text"].apply(lambda x: text_embedding(x)))
```
Generating embeddings for a large dataset can take a while: the code above makes one API call per row, so 121 rows means 121 calls.
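If the per-row calls are too slow, the embeddings endpoint also accepts a list of inputs, so the work can be batched. A sketch of that approach (the function names and batch size of 100 are illustrative, not from the original):

```python
def chunked(seq, size):
    """Yield successive slices of seq with at most size items each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def embed_texts(texts, model="text-embedding-3-small", batch_size=100):
    # Lazy import: the OpenAI SDK is only needed when actually calling the API
    from openai import OpenAI
    client = OpenAI()
    embeddings = []
    for batch in chunked(texts, batch_size):
        resp = client.embeddings.create(model=model, input=batch)
        embeddings.extend(d.embedding for d in resp.data)
    return embeddings

# Usage: one API call per 100 rows instead of one per row
# df = df.assign(embedding=embed_texts(df["text"].tolist()))
```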
Inspect the new embedding column:
```python
df.head()
```
Example output:
```text
       year_film  year_ceremony  ceremony                 category           name                       film  winner                                               text                embedding
10639       2022           2023        95  actor in a leading role  Austin Butler                      Elvis   False  Austin Butler got nominated under the category...  [-0.0378, -0.0199, ...]
10640       2022           2023        95  actor in a leading role  Colin Farrell  The Banshees of Inisherin   False  Colin Farrell got nominated under the category...  [-0.0082, -0.0100, ...]
...          ...            ...       ...                      ...            ...                        ...     ...                                                ...                      ...
```
To view a specific example:
```python
df["text"].iloc[100]
# Output:
# 'Viktor Prášil, Frank Kruse, Markus Stembler, Lars Ginzel and Stefan Korte got nominated under the category, sound, for the film All Quiet on the Western Front but did not win'
```
When asking a question (e.g., “Who won the Best Picture award?”), follow these steps:
1. Embed the query
2. Compute similarity against each row
3. Select top candidates
```python
# 1. Embed the query
query = "Who won the Best Picture award?"
query_embedding = text_embedding(query)

# 2. Compute similarity for each row
df["similarity"] = df["embedding"].apply(lambda x: vector_similarity(x, query_embedding))

# 3. Retrieve top 20 matches
top_res = df.nlargest(20, "similarity")
```
Join the top matches into a context string, then craft the prompt and call the API:

````python
# Build the context from the top matches
context = "\n".join(top_res["text"])

prompt = f"""From the data provided in three backticks, respond to the question: {query}
```{context}```"""
result = get_word_completion(prompt)
print(result)
````

Expected output:

```text
The film "Everything Everywhere All at Once" won the Best Picture award at the 95th Oscar awards.
```
<Callout icon="triangle-alert" color="#FF6B6B">
Keep an eye on token limits when concatenating many text entries into `context`.
</Callout>

## 5. Try a Different Question

For example, "Who is the lyricist for RRR?":

````python
query = "Who is the lyricist for RRR?"
query_embedding = text_embedding(query)

# Recompute similarities and get top matches
df["similarity"] = df["embedding"].apply(lambda x: vector_similarity(x, query_embedding))
top_res = df.nlargest(20, "similarity")
context = "\n".join(top_res["text"])

# Craft prompt and call API
prompt = f"""From the data provided in three backticks, respond to the question: {query}
```{context}```"""
result = get_word_completion(prompt)
print(result)
````
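One simple way to respect the token limit is to cap the context by an approximate token budget before building the prompt. A rough sketch; the `build_context` name and the 4-characters-per-token heuristic are assumptions, and a real tokenizer (such as tiktoken) would give exact counts:

```python
def build_context(texts, max_tokens=3000):
    """Concatenate texts until a rough token budget is reached."""
    lines, used = [], 0
    for t in texts:
        est = len(t) // 4 + 1  # crude ~4-characters-per-token estimate
        if used + est > max_tokens:
            break
        lines.append(t)
        used += est
    return "\n".join(lines)

# Usage: context = build_context(top_res["text"], max_tokens=3000)
```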
This workflow—building dynamic context from a DataFrame using embeddings, then crafting prompts on the fly—unlocks powerful generative AI applications with OpenAI.