In this tutorial, you’ll learn how to generate embeddings using OpenAI, store them in a Chroma vector database, and execute simple semantic searches. By the end, you’ll understand how to retrieve conceptually related documents, even when they share no exact keywords.
Table of Contents
- Prerequisites
- Step 1: Install and Import Dependencies
- Step 2: Prepare Your Documents
- Step 3: Create a Chroma Vector Store
- Step 4: Execute Semantic Queries
- How It Works
- Further Reading
Prerequisites
- Python 3.7+
- An OpenAI API key
- Install the required libraries:
Ensure your OPENAI_API_KEY environment variable is set.
Step 1: Install and Import Dependencies
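Before importing anything, a short preflight check can confirm that the prerequisites are in place. The sketch below is not part of the original walkthrough: the helper name preflight is hypothetical, and the module names langchain_openai and langchain_chroma are assumptions about which packages were installed, so adjust them to match your environment.

```python
import importlib.util
import os

# Assumed module names; change these to match the packages you installed.
REQUIRED_MODULES = ["langchain_openai", "langchain_chroma"]


def preflight(env=None, modules=REQUIRED_MODULES):
    """Return a list of human-readable setup problems (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not installed")
    return problems


for problem in preflight():
    print("Setup issue:", problem)
```

If nothing is printed, the key is visible to Python and both modules can be located.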
Begin by importing LangChain’s embedding model and the Chroma vector store.
Step 2: Prepare Your Documents
Here, we create a small collection of sports headlines. In real applications, you might load text from files, PDFs, or a database.
Step 3: Create a Chroma Vector Store
Chroma.from_texts runs each document through the embedding model and stores the resulting vectors in a local vector database. The two components involved are:

| Component | Description | Example |
|---|---|---|
| Embeddings | Converts text into high-dimensional vectors | OpenAIEmbeddings(model="text-embedding-ada-002") |
| Vector Database | Stores and indexes embedding vectors for similarity search | Chroma.from_texts(texts=docs, embedding=embeddings) |
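Steps 1 to 3 can be sketched together as follows. The OpenAIEmbeddings and Chroma.from_texts calls mirror the Example column above. Everything else is an assumption: the import paths (langchain_openai, langchain_chroma) differ across LangChain versions, the helper name build_vector_store is hypothetical, and the headlines are invented placeholders, not the tutorial's original data.

```python
# Placeholder sports headlines, invented for illustration.
docs = [
    "Thrilling finish as the home side wins the test match on the final day",
    "Star batsman scores a double century in the series opener",
    "Late penalty seals a dramatic win in the league derby",
    "Goalkeeper's heroics send the underdogs into the cup final",
    "Marathon record falls on a cool morning in the capital",
]


def build_vector_store(texts):
    """Embed each text with OpenAI and index the vectors in Chroma."""
    # Imported here so this file still loads when the packages are missing.
    # Assumed import paths; older LangChain releases expose these classes
    # under langchain.embeddings and langchain.vectorstores instead.
    from langchain_openai import OpenAIEmbeddings
    from langchain_chroma import Chroma

    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
    return Chroma.from_texts(texts=texts, embedding=embeddings)


# Needs OPENAI_API_KEY and network access, so the call is left commented out:
# vector_store = build_vector_store(docs)
```

Each call to build_vector_store sends the texts to the OpenAI API, so on larger collections it is worth building the store once and reusing it.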
Step 4: Execute Semantic Queries
With your documents indexed, you can now query the vector store. Semantic search returns contextually related headlines, even without shared keywords.
Cricket Query
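A cricket query might look like the sketch below. It assumes vector_store is the Chroma store created in Step 3; the helper name search_headlines and the query text are hypothetical. The similarity_search method returns LangChain Document objects, whose text lives in the page_content attribute.

```python
def search_headlines(store, query, k=2):
    """Return the text of the k stored headlines closest to the query."""
    results = store.similarity_search(query, k=k)
    return [doc.page_content for doc in results]


# Example call (requires the vector store from Step 3):
# for headline in search_headlines(vector_store, "Who is leading the cricket series?"):
#     print(headline)
```

Because the helper only relies on similarity_search, it should work with any LangChain vector store, not just Chroma.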
Football Query
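For the football query, one option is to also surface how close each match is. This sketch uses similarity_search_with_score, which returns (Document, score) pairs; the helper name and query text are hypothetical, and vector_store is assumed to be the Chroma store from Step 3. A lower score generally indicates a closer match here, though the exact scale depends on the distance metric configured for the collection.

```python
def search_with_scores(store, query, k=2):
    """Return (headline, score) pairs for the k closest stored headlines."""
    pairs = store.similarity_search_with_score(query, k=k)
    return [(doc.page_content, score) for doc, score in pairs]


# Example call (requires the vector store from Step 3):
# for headline, score in search_with_scores(vector_store, "Latest football results", k=1):
#     print(f"{score:.3f}  {headline}")
```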
Adjust k to change the number of returned results. For instance, k=1 returns only the single most similar document.
How It Works
- Embedding Generation
  Both documents and queries are transformed into vector embeddings by the same model.
- Vector Similarity
  Chroma computes distances between the query vector and stored document vectors, retrieving the top-k closest matches.
- Semantic Matching
  Unlike keyword-based search, semantic search finds conceptually related content, even if the exact terms differ.
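The three ideas above can be illustrated without any external service. The following is a minimal sketch, not Chroma's actual implementation: it ranks made-up 3-dimensional vectors by cosine distance, whereas real embeddings have far more dimensions and Chroma uses an approximate index with a configurable metric. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors closest to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_distance(query_vec, doc_vecs[i]),
    )
    return ranked[:k]


# Toy vectors standing in for embeddings (values are made up).
doc_vecs = [
    [0.9, 0.1, 0.0],  # points roughly the same way as the query
    [0.8, 0.2, 0.1],  # slightly further from the query
    [0.0, 0.1, 0.9],  # points in a very different direction
]
query_vec = [1.0, 0.0, 0.0]

print(top_k(query_vec, doc_vecs, k=2))  # → [0, 1]
```

The query and the documents never need to share words: only the direction of their vectors matters, which is why the same model must embed both.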