LangChain
Implementing Chains
Using Retrieval Chain
In this tutorial, you’ll create a Retrieval-Augmented Generation (RAG) workflow using two TechCrunch articles—one on Anthropic’s latest AI chatbot and another on AI21 Labs’ efficient text model. By the end, you’ll have a chain that searches indexed article chunks and answers questions with grounded context.
1. Load and Chunk Documents
Begin by loading both article URLs into LangChain’s web loader and splitting the text into manageable, overlapping chunks.
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# URLs of the two TechCrunch articles to index
URL1 = "https://techcrunch.com/2024/03/04/anthropic-claims-its-new-models-beat-gpt-4/"
URL2 = "https://techcrunch.com/2024/03/28/ai21-labs-new-text-generating-ai-model-is-more-efficient-than-most/"

# Download both pages and parse them into Document objects
loader = WebBaseLoader([URL1, URL2])
documents = loader.load()

# Split the articles into 200-character chunks with 50 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
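Before indexing, it can help to sanity-check the split. A minimal sketch (the chunk index 0 is arbitrary):

```python
# Quick check: how many chunks were produced and what one looks like
print(len(chunks))
print(chunks[0].page_content)
print(chunks[0].metadata)  # includes the source URL added by WebBaseLoader
```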
Note
Make sure you have the latest versions of langchain and langchain-community installed:
pip install langchain langchain-community
2. Create Embeddings and Vector Store
Transform each text chunk into embeddings and index them with FAISS for fast similarity search.
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OpenAIEmbeddings

# Embed each chunk with OpenAI's text-embedding-3-large model
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Index the embedded chunks in an in-memory FAISS vector store
vector_store = FAISS.from_documents(chunks, embeddings)

# Expose the index as a retriever the chain can call
retriever = vector_store.as_retriever()
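By default, as_retriever() returns the top-scoring chunks for each query. If you want explicit control over how many chunks reach the prompt, you can pass search_kwargs; a small sketch (k=4 is an arbitrary choice):

```python
# Retrieve the 4 most similar chunks per query instead of the default
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
```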
Vector Store Options
| Vector Store | Description | Install Command |
|---|---|---|
| FAISS | In-memory similarity search library from Meta | pip install faiss-cpu |
| Chroma | Lightweight persistent vector database | pip install chromadb |
| Weaviate | Managed cloud vector service | pip install weaviate-client |
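For example, if you want a store that persists between runs, the FAISS lines above can be swapped for Chroma with essentially the same interface. A sketch, assuming chromadb is installed (the persist_directory path is illustrative):

```python
from langchain_community.vectorstores import Chroma

# Build a persistent Chroma index from the same chunks and embeddings
vector_store = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_index")
retriever = vector_store.as_retriever()
```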
3. Define Prompt and Initialize LLM
Craft a prompt template that restricts answers to the retrieved context, and set the temperature to zero to reduce hallucinations.
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOpenAI
prompt_template = """
Answer the question "{input}" based solely on the context below:
<context>
{context}
</context>
If you can't find an answer, say "I don't know."
"""
prompt = PromptTemplate.from_template(prompt_template)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)
Warning
Setting temperature=0.0 helps ensure the model sticks to the retrieved information and minimizes made-up content.
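To see exactly what text the model will receive, you can format the template by hand with placeholder values (the strings below are illustrative only):

```python
# Preview the final prompt text with dummy values for the two variables
print(prompt.format(input="What is Claude 3?", context="...retrieved chunks go here..."))
```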
4. Build the Retrieval Chain
Link the retriever with a “stuff documents” chain, which inserts the retrieved chunks into the prompt’s {context} slot so the LLM only sees relevant material.
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# Chain that "stuffs" the retrieved documents into the prompt's {context} variable
combine_docs_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)

# Full RAG chain: retrieve relevant chunks, then answer from them
chain = create_retrieval_chain(retriever, combine_docs_chain)
5. Query the Chain and Inspect Results
Submit a question about token windows and get an answer grounded in the articles.
# Ask a question grounded in the indexed articles
query = "List the models and their token size of models only from Anthropic and AI21 Labs"
result = chain.invoke({"input": query})

# The generated answer
print(result["answer"])
Expected output:
- Anthropic's Claude 3: 200,000-token context window
- AI21 Labs' new AI model: 65,000-token context window
To examine which text chunks were retrieved:
print(result["context"])
Note
Viewing result["context"] helps you debug which segments were used to generate the answer.
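Because result["context"] is a list of Document objects, you can also loop over it to see where each retrieved chunk came from (a small sketch; the 120-character preview length is arbitrary):

```python
# Print the source URL and a short preview of each retrieved chunk
for doc in result["context"]:
    print(doc.metadata.get("source"), "->", doc.page_content[:120])
```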
References
- LangChain Documentation: https://python.langchain.com/
- OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings
- FAISS Vector Store (GitHub): https://github.com/facebookresearch/faiss
- TechCrunch Article 1: https://techcrunch.com/2024/03/04/anthropic-claims-its-new-models-beat-gpt-4/
- TechCrunch Article 2: https://techcrunch.com/2024/03/28/ai21-labs-new-text-generating-ai-model-is-more-efficient-than-most/