Using Document Chain
In this tutorial, you’ll learn how to use LangChain’s create_stuff_documents_chain
to load two TechCrunch articles, merge their content into a single context, and query an LLM for answers that span both sources. This approach works best when the full set of documents is known up front and fits within the model’s context window.
1. Install & Import Dependencies
First, ensure you have LangChain and any required community packages installed:
pip install langchain langchain-community langchain-openai beautifulsoup4
Then import the necessary modules:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import WebBaseLoader
from langchain.chains.combine_documents import create_stuff_documents_chain
2. Load Documents from URLs
Define the two TechCrunch article URLs and use WebBaseLoader to fetch them:
URL1 = "https://techcrunch.com/2024/02/27/microsoft-made-a-16-million-investment-in-mistral-ai/"
URL2 = "https://techcrunch.com/2024/03/28/ai21-labs-new-text-generating-ai-model-is-more-efficient-than-most/"
loader = WebBaseLoader([URL1, URL2])
data = loader.load()
# Confirm two documents were loaded
print(len(data)) # 2
# Preview the start of the first article
print(data[0].page_content[:200])
3. Build the Prompt Template
Create a chat-style prompt that asks the LLM to identify models launched by Mistral AI and AI21 Labs:
prompt = ChatPromptTemplate.from_messages([
    ("system", "Identify the models launched by Mistral AI and AI21 Labs:\n\n{context}")
])
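Before wiring this prompt into a chain, it helps to know what the stuff chain will do with it: each document’s page_content is concatenated into the {context} placeholder. The following is a minimal, dependency-free sketch of that merging step; the Doc class and stuff_context helper are illustrative stand-ins, not LangChain APIs:

```python
# Sketch of how "stuffing" merges documents into a single context string.
# Doc is a stand-in for LangChain's Document class.

class Doc:
    def __init__(self, page_content: str):
        self.page_content = page_content

def stuff_context(docs, separator="\n\n"):
    """Concatenate every document's content into one string,
    ready to be substituted into the {context} slot."""
    return separator.join(d.page_content for d in docs)

docs = [Doc("Mistral AI article text..."), Doc("AI21 Labs article text...")]
print(stuff_context(docs))
```

The real chain also applies any document prompt you configure, but the core idea is this simple join.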
4. Initialize the LLM and Create the Chain
Instantiate your LLM client and assemble the stuff chain:
llm = ChatOpenAI(model="gpt-3.5-turbo")
chain = create_stuff_documents_chain(llm, prompt)
Note
The Stuff chain concatenates all loaded documents into one context chunk. Use this when your combined input remains within the model’s context window.
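One way to act on this note is to estimate the token budget before invoking the chain. The sketch below uses a rough four-characters-per-token heuristic (an assumption — for exact counts use a real tokenizer such as tiktoken) and a hypothetical fits_in_context helper:

```python
# Rough pre-flight check: will the concatenated documents fit in the
# model's context window? Uses an approximate 4-chars-per-token heuristic.

def rough_token_count(text: str) -> int:
    """Approximate token count (~4 characters per token)."""
    return max(1, len(text) // 4)

def fits_in_context(texts, limit_tokens=16000, reserve=1000):
    """Return True if the combined texts likely fit, leaving `reserve`
    tokens for the prompt instructions and the model's answer."""
    total = sum(rough_token_count(t) for t in texts)
    return total + reserve <= limit_tokens

# Example: check the loaded documents before calling chain.invoke
# texts = [doc.page_content for doc in data]
# print(fits_in_context(texts))
```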
5. Invoke the Chain
Run the chain by passing the document list as the context:
result = chain.invoke({"context": data})
print(result)
Example output:
The models launched by Mistral AI are Mistral 7B (a 7-billion-parameter open-source language model) and Mixtral (a sparse mixture-of-experts model). AI21 Labs released a text-generating and analysis model called Jamba.
6. When to Use the Stuff Chain
| Use Case | Approach | Description |
|---|---|---|
| Static or Known Prompts | Stuff chain | Simplest method when you can merge all documents at once. |
| Large or Dynamic Collections | Retrieval chain | Performs semantic search over embeddings and sends only relevant chunks to the LLM. |
Warning
If the total size of concatenated documents exceeds the model’s context window, you will hit limitations or errors. Switch to a retrieval-based chain for larger datasets.
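To preview the retrieval idea without an embedding model or vector store, here is a toy, dependency-free sketch that ranks chunks by word overlap with the question and keeps only the best matches. A real retrieval chain would use embeddings and a vector store such as FAISS; this top_chunks helper is purely illustrative:

```python
# Toy retriever: instead of stuffing everything into the prompt,
# score each chunk against the question and keep only the top-k.

def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "Mistral AI released the Mistral 7B open-source model.",
    "The weather in Paris was sunny last week.",
    "AI21 Labs announced a new text-generating model.",
]
# Keeps the two model-related chunks, drops the unrelated one
print(top_chunks("Which models did Mistral AI and AI21 Labs launch?", chunks))
```

Real retrieval replaces the word-overlap score with semantic similarity over embeddings, which is what the next lesson covers.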
In the next lesson, we’ll explore LangChain’s retrieval chains for handling high-volume or dynamically changing document collections.