RAG with Webpages

In this tutorial, you’ll learn how to implement a complete RAG pipeline using web pages instead of PDFs. We’ll cover:

  1. Loader selection
  2. Content splitting
  3. Embedding generation
  4. Vector storage
  5. Retrieval & formatting
  6. QA over the retrieved context

By the end, you’ll have a reusable recipe for answering questions grounded in any web article.


1. Setup: Imports & Loader Selection

Begin by installing and importing the required packages. We’ll switch out the PDF loader for a web-based loader.

pip install langchain langchain-community langchain-openai langchain-chroma beautifulsoup4

from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Option 1: PDF loader
# loader = PyPDFLoader("path/to/manual.pdf")

# Option 2: Web page loader
URL = "https://www.theverge.com/2024/4/18/24133808/meta-ai-assistant-llama-3-chatgpt-openai-rival"
loader = WebBaseLoader(URL)

Loader         Source Type    Use Case                     Example
PyPDFLoader    PDF file       Static document ingestion    PyPDFLoader("path/to/manual.pdf")
WebBaseLoader  Web page URL   Dynamic HTML content         WebBaseLoader("https://example.com/article")

Note

WebBaseLoader fetches a page's static HTML and parses it with BeautifulSoup; it does not execute JavaScript, so it's best suited to news articles, blog posts, and static documentation.
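
If the raw page includes navigation menus, ads, or footers, you can narrow what gets parsed. Here is a minimal sketch using BeautifulSoup's SoupStrainer; the "article-body" class name is a placeholder, so inspect the target page's markup to find the real one:

import bs4
from langchain_community.document_loaders import WebBaseLoader

# Keep only elements whose CSS class matches the article body
# ("article-body" is a placeholder, not the actual class on this page)
loader = WebBaseLoader(
    web_paths=(URL,),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=("article-body",))),
)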


2. Load & Split the Document

Fetch the page and break it into manageable chunks for embedding.

# 1. Load the web page
docs = loader.load()

# 2. Split content into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200, 
    chunk_overlap=50
)
chunks = text_splitter.split_documents(docs)
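
As a quick sanity check (the exact numbers will vary by page), you can inspect how many chunks were produced and what one looks like:

# Inspect the split output
print(f"Created {len(chunks)} chunks")
print(chunks[0].page_content[:200])   # preview of the first chunk
print(chunks[0].metadata)             # source URL and other metadata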

Warning

Smaller chunk sizes make retrieval more precise, but they produce more chunks and therefore more embedding API calls. Adjust chunk_size and chunk_overlap to balance answer quality against your quota and use case.
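
To see the tradeoff concretely, you can compare chunk counts at two sizes (the sizes below are arbitrary examples):

# Illustration: the same page split at two different chunk sizes
for size in (200, 1000):
    splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=50)
    print(f"chunk_size={size}: {len(splitter.split_documents(docs))} chunks")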


3. Embed & Store in a Vector Database

Generate embeddings for each text chunk and store them in Chroma.

# Initialize the embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Create a Chroma vector store from the chunks
vectorstore = Chroma.from_documents(
    documents=chunks, 
    embedding=embeddings
)

# Prepare a retriever interface
retriever = vectorstore.as_retriever()
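
Optionally, you can control how many chunks the retriever returns and sanity-check the store with a direct similarity search; the query string and k values below are illustrative:

# Return the top 4 chunks per query instead of the default
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Sanity check: query the vector store directly
for doc in vectorstore.similarity_search("Llama 3 parameters", k=2):
    print(doc.page_content[:100])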

4. Format Retrieved Chunks

Define a helper that merges multiple document chunks into a single context string for the LLM.

def format_docs(docs):
    """
    Concatenates page_content from each retrieved document.
    """
    return "\n\n".join(doc.page_content for doc in docs)
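
To see the helper in action before wiring up the full chain, you can run it on the retriever's output (the question here is just an example):

# Retrieve chunks for a sample question and merge them into one context string
sample_docs = retriever.invoke("What is Llama 3?")
print(format_docs(sample_docs)[:300])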

5. Build the QA Chain

Create a prompt template and chain together retrieval, formatting, and the LLM.

# System prompt template
template = """
SYSTEM: You are a question-answering assistant.
Use only the provided context to answer; if unknown, reply "I don't know."

Question: {question}
Context:
{context}
"""
prompt = PromptTemplate.from_template(template)

# Initialize the chat model
llm = ChatOpenAI()

# Assemble the chain:
#   retrieval → formatting → prompt → LLM → string output
chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
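
If you also want to see which chunks the answer was grounded in, one possible variant (a sketch, not part of the recipe above) wraps the same pieces in a RunnableParallel and attaches the answer with .assign:

from langchain_core.runnables import RunnableParallel

# Variant: return the retrieved context alongside the answer
chain_with_sources = RunnableParallel(
    context=retriever | format_docs,
    question=RunnablePassthrough(),
).assign(answer=prompt | llm | StrOutputParser())

# chain_with_sources.invoke("...") returns a dict with
# "context", "question", and "answer" keys.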

6. Run a Query

Invoke the chain with any factual question related to the web page.

result = chain.invoke("What's the size of the largest Llama 3 model?")
print(result)
# Expected output: "The largest Llama 3 model will have over 400 billion parameters."

You can adapt the question to explore any section of the article or domain-specific content.
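
Because the chain is a standard LCEL runnable, you can also stream the answer token by token or batch several questions in one call; the questions below are illustrative:

# Stream the answer as it is generated
for token in chain.stream("Which company released Llama 3?"):
    print(token, end="", flush=True)

# Answer several questions at once
answers = chain.batch([
    "Which company released Llama 3?",
    "What does the article say about Meta AI?",
])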


7. Next Steps & References

  • Experiment with other web pages, blogs, or API-based content sources.
  • Explore pre-built RAG chains for summarization, code generation, or translation.
  • Fine-tune chunk_size, embedding models, and LLM parameters for performance and cost.
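
For repeated experiments, you may also want to persist the Chroma index to disk so the page isn't re-fetched and re-embedded on every run. A minimal sketch (the directory name is an arbitrary choice):

# Build the store once and persist it to disk
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_web_rag",
)

# Later runs can reload the persisted index directly
vectorstore = Chroma(
    persist_directory="./chroma_web_rag",
    embedding_function=embeddings,
)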
