In this tutorial, you’ll learn how to implement a complete RAG pipeline using web pages instead of PDFs. We’ll cover:
Loader selection
Content splitting
Embedding generation
Vector storage
Retrieval & formatting
QA over the retrieved context
By the end, you’ll have a reusable recipe for answering questions grounded in any web article.
1. Setup: Imports & Loader Selection
Begin by installing and importing the required packages. We’ll switch out the PDF loader for a web-based loader.
pip install langchain langchain-community langchain-openai langchain-chroma chromadb openai
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# Option 1: PDF loader
# loader = PyPDFLoader("path/to/manual.pdf")

# Option 2: Web page loader
URL = "https://www.theverge.com/2024/4/18/24133808/meta-ai-assistant-llama-3-chatgpt-openai-rival"
loader = WebBaseLoader(URL)
| Loader | Source Type | Use Case | Example |
|---|---|---|---|
| PyPDFLoader | PDF file | Static document ingestion | `PyPDFLoader("path/to/manual.pdf")` |
| WebBaseLoader | Web page URL | Dynamic HTML content | `WebBaseLoader("https://example.com/article")` |
WebBaseLoader fetches the page's static HTML (it does not execute JavaScript), so it's best suited to news articles, blog posts, and static documentation.
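Under the hood, a loader like this fetches the page and strips markup down to plain text. A minimal sketch of that idea, using only the standard library on an in-memory HTML string (the sample HTML is made up; this is an illustration, not WebBaseLoader's actual implementation):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

html = "<html><body><h1>Llama 3</h1><script>var x=1;</script><p>Meta's new model.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
text = " ".join(parser.parts)
print(text)  # → Llama 3 Meta's new model.
```

Note how the script contents never reach the output; real loaders apply the same kind of filtering so embeddings aren't polluted with JavaScript.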
2. Load & Split the Document
Fetch the page and break it into manageable chunks for embedding.
# 1. Load the web page
docs = loader.load()
# 2. Split content into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50
)
chunks = text_splitter.split_documents(docs)
Smaller chunk sizes increase retrieval precision but raise API call counts. Adjust chunk_size and chunk_overlap based on your quota and use case.
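To see why overlap matters, here is a toy fixed-window splitter (not the actual RecursiveCharacterTextSplitter, which prefers to split on separators like paragraphs and sentences; this only illustrates the size/overlap tradeoff):

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a fixed-size window over the text; consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 4  # 40 characters
chunks = split_text(text, chunk_size=20, chunk_overlap=5)
print(len(chunks))                       # → 3
print(chunks[0][-5:] == chunks[1][:5])   # → True: adjacent chunks overlap
```

Halving chunk_size roughly doubles the number of chunks (and thus embedding calls), which is the cost side of the precision tradeoff noted above.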
3. Embed & Store in a Vector Database
Generate embeddings for each text chunk and store them in Chroma.
# Initialize the embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
# Create a Chroma vector store from the chunks
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings
)
# Prepare a retriever interface
retriever = vectorstore.as_retriever()
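Conceptually, the retriever embeds the query and returns the chunks whose vectors are most similar. A minimal sketch with hand-made 2-D vectors and cosine similarity (real embeddings have thousands of dimensions; the chunks and vectors here are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical (chunk_text, embedding) pairs, as a stand-in for the vector store
store = [
    ("Llama 3 has a 400B variant", (0.9, 0.1)),
    ("The weather is sunny",       (0.1, 0.9)),
]
query_vec = (0.8, 0.2)  # pretend embedding of "How big is Llama 3?"

# Return the chunk with the highest similarity to the query
top = max(store, key=lambda item: cosine(query_vec, item[1]))
print(top[0])  # → Llama 3 has a 400B variant
```

Chroma performs this nearest-neighbor lookup at scale; `as_retriever()` simply wraps it in a common interface the chain can call.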
4. Format Retrieved Documents
Define a helper that merges multiple document chunks into a single context string for the LLM.
def format_docs(docs):
    """
    Concatenates page_content from each retrieved document.
    """
    return "\n\n".join(doc.page_content for doc in docs)
5. Build the QA Chain
Create a prompt template and chain together retrieval, formatting, and the LLM.
# System prompt template
template = """
SYSTEM: You are a question-answering assistant.
Use only the provided context to answer; if unknown, reply "I don't know."
Question: {question}
Context:
{context}
"""
prompt = PromptTemplate.from_template(template)
# Initialize the chat model
llm = ChatOpenAI()
# Assemble the chain:
# retrieval → formatting → prompt → LLM → string output
chain = (
    {
        "context": retriever | format_docs,
        "question": RunnassPassthrough() if False else RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
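The pipe syntax above just threads data left to right. The same flow, sketched in plain Python with a stubbed retriever and LLM (the stub behavior and answer text are made up for illustration; this is not how LangChain executes chains internally):

```python
def fake_retriever(question):
    # Stand-in for vectorstore.as_retriever(): returns matching chunks
    return ["The largest Llama 3 model will have over 400 billion parameters."]

def fake_llm(prompt):
    # Stand-in for ChatOpenAI: echoes the first context line as the answer
    return prompt.split("Context:\n")[1].strip().splitlines()[0]

def run_chain(question):
    context = "\n\n".join(fake_retriever(question))        # retrieval → formatting
    prompt = f"Question: {question}\nContext:\n{context}"  # prompt template
    return fake_llm(prompt)                                # LLM → string output

answer = run_chain("What's the size of the largest Llama 3 model?")
print(answer)  # → The largest Llama 3 model will have over 400 billion parameters.
```

Each pipe stage receives the previous stage's output, which is why the chain starts with a dict: it fans the question out to both the retriever (for context) and the passthrough (for the prompt).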
6. Run a Query
Invoke the chain with any factual question related to the web page.
result = chain.invoke("What's the size of the largest Llama 3 model?")
print(result)
# Expected output: "The largest Llama 3 model will have over 400 billion parameters."
You can adapt the question to explore any section of the article or domain-specific content.
7. Next Steps & References
Experiment with other web pages, blogs, or API-based content sources.
Explore pre-built RAG chains for summarization, code generation, or translation.
Fine-tune chunk_size, embedding models, and LLM parameters for performance and cost.