LangChain
Performing Retrieval
Welcome back! In this lesson, we’ll dive into retrieval techniques and explore how retrieval-augmented generation (RAG) enhances your LLM applications with external context.
Why Retrieval Matters
Large language models rely on context—external data and background—to generate accurate, factual responses. Without relevant context, models often “hallucinate,” producing plausible but incorrect information.
| Data Source | Example Systems | Use Case |
|---|---|---|
| Relational Databases | MySQL, PostgreSQL, SQLite | Structured queries via SQL |
| NoSQL Databases | MongoDB | Flexible document storage |
| Full-Text Search | Elasticsearch | Keyword-based document retrieval |
| Vector Databases | Pinecone, Chroma, Milvus, Weaviate | Semantic search with embeddings |
| Document Stores | PDF, Word, HTML, CSV (e.g., Amazon S3) | Unstructured file retrieval |
| APIs & Web Search | REST endpoints, real-time web scraping | Live data fetching |
Note
Retrieving only the most relevant passages keeps your prompt within the LLM’s context window (e.g., 4,096 tokens) and reduces hallucinations.
What Is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a two-step framework:
- Retrieve relevant context from external data sources.
- Augment the LLM prompt with those facts before generation.

Why Not Send the Entire Document?
- Context window limits: LLMs can’t process an entire 100-page PDF in one go.
- Efficiency: Fetching only necessary chunks saves on compute and token usage.
- Explainability: You can cite the exact source of each fact.
- Privacy & Control: Only selected passages are exposed to the model.
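The efficiency point is easy to see with back-of-the-envelope arithmetic. Here is a rough sketch, assuming the common ~4-characters-per-token rule of thumb (the exact ratio varies by tokenizer):

```python
# Rough token estimate: ~4 characters per token (a common rule of thumb).
def approx_tokens(text: str) -> int:
    return len(text) // 4

full_doc = "x" * 400_000   # a ~100-page PDF is on the order of 400k characters
chunk = "x" * 2_000        # one retrieved ~500-token chunk

print(approx_tokens(full_doc))  # ~100,000 tokens: far beyond a 4,096-token window
print(approx_tokens(chunk))     # ~500 tokens: fits easily, with room for the question
```

Retrieving a handful of such chunks instead of the whole document keeps the prompt small and the per-call cost low.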
RAG Workflow Overview
A typical RAG pipeline consists of five steps:
- User Query: The user asks a question.
- Search: The system queries databases or document stores.
- Context Injection: Retrieved passages are inserted into the prompt.
- LLM Call: The augmented prompt is sent to the LLM.
- Response: The model generates an answer, which is returned to the user.
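The five steps above can be sketched as a single function. Note that `search_documents` and `call_llm` below are hypothetical stand-ins for a real retriever and LLM client, not LangChain APIs:

```python
# Minimal sketch of the five-step RAG loop.
# `search_documents` and `call_llm` are hypothetical placeholders.

def search_documents(query: str, k: int = 3) -> list[str]:
    # Placeholder: in practice this queries a vector store or search index.
    corpus = {
        "RAG": "RAG retrieves external context before generation.",
        "LLM": "LLMs generate text from a prompt.",
        "chunking": "Documents are split into chunks before indexing.",
    }
    return [text for key, text in corpus.items() if key.lower() in query.lower()][:k]

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"Answer based on: {prompt!r}"

def answer(query: str) -> str:                       # 1. User query
    passages = search_documents(query)               # 2. Search
    context = "\n".join(passages)                    # 3. Context injection
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)                          # 4-5. LLM call and response

print(answer("What is RAG?"))
```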

Two Phases of RAG
Phase 1: Indexing
- Load unstructured documents (PDFs, HTML, JSON, images).
- Split each document into manageable chunks (sentences, paragraphs, or token windows).
- Embed each chunk via an embeddings model, producing numerical vectors.
- Store these vectors in a vector database (e.g., Chroma, Milvus, Weaviate).
Tip
Choosing the right chunk size (e.g., 500 tokens) balances retrieval precision and recall.
```python
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# 1. Load documents
loader = UnstructuredPDFLoader("path/to/doc.pdf")
docs = loader.load()

# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 3. Create embeddings and store
embeddings = OpenAIEmbeddings()
vector_db = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")
vector_db.persist()
```

Phase 2: Retrieval
- Embed the user’s query using the same embeddings model.
- Perform a semantic search against stored vectors in the vector database.
- Retrieve the top-k matching chunks.
- Augment the original prompt with those chunks.
- Generate the final answer via the LLM.
```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Create a retriever over the vector store built during indexing
retriever = vector_db.as_retriever(search_kwargs={"k": 5})

# Build a RAG chain ("stuff" packs all retrieved chunks into one prompt)
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4"),  # gpt-4 is a chat model, so use ChatOpenAI
    chain_type="stuff",
    retriever=retriever,
)

# Ask a question
response = qa_chain.run("What are the main benefits of RAG?")
print(response)
```
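Under the hood, the retriever ranks chunks by vector similarity to the query embedding. A toy sketch with hand-made three-dimensional vectors makes the mechanics visible (real embeddings come from a model and have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "embeddings": in practice these come from an embeddings model.
chunks = {
    "RAG reduces hallucinations": [0.9, 0.1, 0.0],
    "Chunking splits documents": [0.1, 0.9, 0.0],
    "Vector stores index embeddings": [0.2, 0.3, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend-embedding of "Why use RAG?"

top = max(chunks, key=lambda c: cosine(query_vec, chunks[c]))
print(top)  # the chunk whose vector points closest to the query's
```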

Building a RAG Pipeline with LangChain
LangChain offers modular components to assemble a complete RAG workflow:
| Component | Description |
|---|---|
| Document Loaders | Fetch external data (PDFs, web pages, APIs) |
| Text Splitters | Break documents into searchable chunks |
| Embeddings Models | Convert text chunks into semantic vectors |
| Vector Stores | Store and index vectors for efficient similarity search |
| Retrievers | Query vector stores to retrieve the most relevant document chunks |
When these modules are connected, you get a robust RAG pipeline. Next, we’ll build a Q&A application that ingests both a PDF and a web page.

Let’s jump into the demo!