LangChain
Implementing Chains
Overview of Chains
In this lesson, we’ll explore two high-level chains that streamline common workflows in LangChain:
- Summarization: Batch summarization using the Create Stuff Documents Chain
- Retrieval: Retrieval-Augmented Generation (RAG) via the Create Retrieval Chain
These constructs let you process multiple documents without reinventing the wheel, whether you need to summarize everything at once or selectively retrieve relevant passages.
1. Summarization with Create Stuff Documents Chain
The Create Stuff Documents Chain merges a list of document chunks into a single prompt and sends it to your LLM. This is ideal when your combined content stays within the model’s context window.
Use cases:
- Summarize multiple documents in one pass
- Extract specific insights across all inputs
Example: Batch Summarization
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
# Initialize your LLM (GPT-4 is a chat model, so use ChatOpenAI)
llm = ChatOpenAI(model="gpt-4")
# Prepare document chunks
docs = [
Document(page_content="Document text chunk 1..."),
Document(page_content="Document text chunk 2..."),
# ...
]
# Prompt template; the stuffed documents are injected via the "context" variable
prompt = ChatPromptTemplate.from_template("Summarize the following documents:\n\n{context}")
# Build the chain
chain = create_stuff_documents_chain(llm, prompt)
# Run the summarization
summary = chain.invoke({"context": docs})
print("Summary:", summary)
Note
Ensure the total token count of your document chunks doesn’t exceed your model’s context limit. You can use the tiktoken
library to estimate tokens in advance.
2. Retrieval with Create Retrieval Chain
For larger corpora that exceed a single prompt window, the Create Retrieval Chain combines a retriever with the Stuff Documents logic. This effectively implements a simple RAG workflow:
- The retriever fetches the chunks most relevant to the query.
- The document chain formats those chunks into a prompt.
- The LLM generates the final answer.
Example: Simple RAG Pipeline
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Set up embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["Doc text A", "Doc text B"], embeddings)
# Document chain that answers from the retrieved context
llm = ChatOpenAI(model="gpt-4")
prompt = ChatPromptTemplate.from_template(
"Answer the question using only this context:\n\n{context}\n\nQuestion: {input}"
)
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
# Build the retrieval chain from the retriever and the document chain
retrieval_chain = create_retrieval_chain(
retriever=vectorstore.as_retriever(),
combine_docs_chain=combine_docs_chain,
)
# Run RAG; the chain returns a dict with "input", "context", and "answer" keys
query = "What are the key takeaways from these documents?"
result = retrieval_chain.invoke({"input": query})
print("Answer:", result["answer"])
Warning
Always index your documents using the same embedding model you use at query time to ensure consistency in vector representations.
Comparison of Built-In Chains
Chain Type | Use Case | Components | Key Benefit
---|---|---|---
Create Stuff Documents Chain | Batch summarization, multi-doc info extraction | LLM + Prompt | Simple prompt stitching
Create Retrieval Chain | RAG over large corpora | Retriever + LLM + Document Chain | Scales beyond context window
Next Steps
Now that you understand both the Summarization and Retrieval chains, let’s dive into hands-on demos to see them in action.