text-embedding-ada-002 model and perform similarity searches with NumPy. This technique is essential for building semantic search, recommendation engines, and context-aware chatbots.
1. Setup
1.1 Install Dependencies
Make sure you have the OpenAI SDK and NumPy installed.
1.2 Import Libraries and Define Helper
Each embedding from text-embedding-ada-002 has a fixed dimension of 1536, regardless of the input length.
2. Sample Phrases
We’ll use four phrases that share keywords but differ in meaning:

| Phrase | Context |
|---|---|
| “Most of the websites provide the users with the choice of accepting or denying cookies” | Web cookies |
| “Olivia went to the bank to open a savings account” | Financial bank |
| “Sam sat under a tree that was on the bank of a river” | River bank |
| “John’s cookies were only half-baked but he still carries them for Mary” | Edible cookies |
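In code, the table above is just a list of strings; the generation step in section 3 then maps each one through the embedding helper (`get_embedding` is the assumed name of the helper from section 1.2):

```python
phrases = [
    "Most of the websites provide the users with the choice of accepting or denying cookies",
    "Olivia went to the bank to open a savings account",
    "Sam sat under a tree that was on the bank of a river",
    "John's cookies were only half-baked but he still carries them for Mary",
]

# Section 3 in one line (network call, requires an API key):
# phrase_vecs = [get_embedding(p) for p in phrases]
```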
3. Generating Embeddings
Convert each phrase to its embedding vector using the helper defined in section 1.2.
4. Defining Cosine Similarity
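In NumPy, cosine similarity is simply the normalized dot product (a sketch; zero-vector guards omitted for brevity):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between vectors a and b (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```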
Cosine similarity measures the angle between two vectors in the semantic space. Identical vectors yield a score of 1.0.
5. Running Similarity Searches
Define a function that returns the most similar stored phrase for a given query.
5.1 Example Queries
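Putting the pieces together, the search can be sketched as follows. This version is self-contained: cosine similarity is inlined, and toy two-dimensional vectors stand in for real 1536-dimensional embeddings:

```python
import numpy as np

def most_similar(query_vec, phrases, phrase_vecs):
    """Return (phrase, score) for the stored phrase closest to query_vec."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [cos(query_vec, v) for v in phrase_vecs]
    best = int(np.argmax(scores))
    return phrases[best], scores[best]

# Toy demonstration with 2-D stand-ins for real embeddings:
toy_phrases = ["river bank", "savings account"]
toy_vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
phrase, score = most_similar(np.array([0.9, 0.1]), toy_phrases, toy_vecs)
# phrase == "river bank"
```

With real embeddings, `phrase_vecs` would hold the 1536-dimensional vectors generated in section 3, and `query_vec` the embedding of the user's query.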
- Exact riverbank match → similarity ≈ 1.00
- Biscuits (edible cookies) → ≈ 0.92
- Financial advice → ≈ 0.84
- Nature reference → ≈ 0.82
- GDPR cookies → ≈ 0.83
6. Discussion
- Embeddings capture semantic context, not just surface-level keywords.
- All vectors have the same dimensionality (1536) to sit in a common embedding space.
- Cosine similarity retrieves items by meaning, not by exact word overlap.
This approach powers many AI-driven features such as semantic search, recommendation engines, and dynamic context for chatbots.