- load web documents,
- split them into chunks,
- create embeddings,
- populate an in-memory FAISS vector store,
- build a retriever,
- create a prompt and a document-combining chain, and
- wire everything into a retrieval chain.


- Initialize URLs and load the pages
- Split the loaded documents into chunks
- Create embeddings and populate FAISS (in-memory vector store)
FAISS (Facebook AI Similarity Search) is an in-memory vector index. It supports fast nearest-neighbor search but does not persist to disk by default. If you need persistence across runs, consider a persistent vector store (e.g., Chroma with persistence enabled) or serialize the FAISS index explicitly.
- Create a prompt template that forces the LLM to answer only from the provided context
- Initialize the LLM, build the combine-documents chain, and compose the retrieval chain
- Invoke the chain with a query and inspect results
- Loader: converts web pages into Document objects with metadata (source URL, title).
- Splitter: creates manageable chunks so embeddings and retrieval operate on smaller units (200-character chunks with a 50-character overlap in this example).
- Embeddings + Vector Store: indexes document chunks for fast semantic retrieval.
- Retrieval Chain: retrieves relevant chunks, the combine-documents chain assembles context, and the LLM answers constrained to that context — reducing hallucinations and enabling source-backed answers.
- Question-answering constrained to a fixed set of documents.
- Summarization across a document collection.
- Any use case requiring explicit source attribution or limiting the LLM to a defined corpus.
| Component | Purpose | Example / Notes |
|---|---|---|
| WebBaseLoader | Fetch web pages and produce Document objects | WebBaseLoader([URL1, URL2]).load() |
| RecursiveCharacterTextSplitter | Break long text into overlapping chunks for embedding | chunk_size=200, chunk_overlap=50 |
| OpenAIEmbeddings | Create vector embeddings for text | OpenAIEmbeddings(model="text-embedding-3-large") |
| FAISS | In-memory vector index for nearest-neighbor search | Use persistent store if you need state across runs |
| create_stuff_documents_chain | Combine retrieved documents using a PromptTemplate | Ensures the LLM receives the composed context |
| create_retrieval_chain | Wire the retriever and combine-documents chain into an end-to-end RAG pipeline | chain.invoke({"input": query}) |
- Use a low temperature (e.g., temperature=0.0) for more deterministic answers and fewer hallucinations.
- Tune chunk size and overlap to balance retrieval granularity and context coherence.
- Persist your vector store (by serializing the FAISS index, or by using a persistent store such as Chroma) if you need retrieval across runs.
- Include source metadata in returned results to enable traceability and validation.
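As a sketch of the last point: create_retrieval_chain returns the retrieved Document objects under the result's "context" key, and WebBaseLoader records each page's URL in the Document metadata under "source", so the cited sources can be surfaced alongside the answer (the helper name below is illustrative):

```python
def cited_sources(result: dict) -> list:
    """Collect unique source URLs, in retrieval order, from a
    retrieval-chain result dict."""
    sources = []
    for doc in result.get("context", []):
        # Each retrieved Document carries its originating URL in metadata.
        src = doc.metadata.get("source", "unknown")
        if src not in sources:
            sources.append(src)
    return sources

# Usage:
# result = chain.invoke({"input": query})
# print(result["answer"], "\nSources:", cited_sources(result))
```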
- LangChain documentation: https://python.langchain.com/
- FAISS: https://github.com/facebookresearch/faiss
- Chroma (persistent vector store option): https://www.trychroma.com/
- Retrieval-Augmented Generation (RAG): https://en.wikipedia.org/wiki/Retrieval-augmented_generation