Demo BM25 vs Semantic Search

In this lesson we compare classic BM25 retrieval with semantic search using sentence-transformers, then build a simple hybrid ranker that blends both signals. The example is compact and runs in a Jupyter notebook.

BM25 is a token-statistics method. It excels at matching exact terms and weighting informative tokens, but it can miss synonyms, paraphrases, and deeper meaning. A bi-encoder (a sentence-transformers model) maps queries and documents to vectors and retrieves semantically similar items by cosine similarity. A hybrid approach combines both strengths and often yields more robust retrieval in real applications.

Quick links:

SentenceTransformers: https://www.sbert.net/
rank_bm25: https://github.com/dorianbrown/rank_bm25
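To make "token statistics" concrete, here is a minimal from-scratch sketch of Okapi BM25 scoring. It is not rank_bm25's code: the function name `okapi_bm25` is hypothetical, k1=1.5 and b=0.75 are common default values, and the IDF uses a smoothed, always-positive variant. rank_bm25 computes IDF differently, so absolute scores will not match `BM25Okapi`.

```python
import math
from collections import Counter

def okapi_bm25(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query tokens with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # a query term absent from this document adds nothing
            # Smoothed IDF: rarer terms weigh more; the "1 +" keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency (k1) with document-length normalization (b).
            freq = tf[term]
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores

toy_corpus = [
    "enable two-factor authentication in account settings".split(),
    "configure mfa with authenticator apps".split(),
    "paid time off covers sick leave".split(),
]
toy_scores = okapi_bm25("sick leave policy".split(), toy_corpus)
print(toy_scores)  # only the third document scores above 0
```

Only exact tokens count: "policy" matches nothing here, and a query containing "2fa" would get no credit from "two-factor".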

Installation

Install the required Python packages (Jupyter-friendly):

!pip install -q sentence-transformers rank-bm25 numpy

Overview: how the demo works

- Prepare a small document corpus and example queries.
- Tokenize documents for BM25 and initialize the BM25 index.
- Encode documents with a sentence-transformer to produce normalized embeddings.
- For each query:
  - Get BM25 scores (token overlap and term importance).
  - Get semantic scores (cosine similarity between embeddings).
  - Min-max normalize both score vectors to [0, 1] and combine them with a weighted linear blend: hybrid = alpha * semantic + (1 - alpha) * bm25.
- Compare the top-k results for BM25, semantic, and hybrid.
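The normalize-and-blend step can be sketched in plain NumPy, independent of any index or model. The helper names `min_max` and `hybrid_rank` are hypothetical, and the scores are made-up numbers chosen to show how the blend reorders documents.

```python
import numpy as np

def min_max(scores, eps=1e-9):
    """Rescale scores to [0, 1]; eps avoids division by zero when all scores are equal."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min() + eps)

def hybrid_rank(bm25_scores, semantic_scores, alpha=0.8, k=3):
    """Return (top-k indices, hybrid scores) for alpha*semantic + (1-alpha)*bm25."""
    hybrid = alpha * min_max(semantic_scores) + (1.0 - alpha) * min_max(bm25_scores)
    return np.argsort(-hybrid)[:k], hybrid

# Made-up scores for four documents (not produced by a real index or model).
toy_bm25 = [2.0, 0.0, 0.0, 1.0]
toy_semantic = [0.10, 0.70, 0.50, 0.30]

top, hybrid = hybrid_rank(toy_bm25, toy_semantic, alpha=0.8, k=3)
print(top)  # [1 2 3]
print(np.round(hybrid, 3))
```

With alpha=0.8, document 1 wins with a hybrid score of 0.8 (top semantic score, zero BM25) even though document 0 has the highest BM25 score; with alpha=0.0 the ranking falls back to pure BM25 order.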

Corpus, queries, and imports

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
import numpy as np

docs = [
    "Enable two-factor authentication (2FA) in your account settings to add an extra security step.",
    "HbA1c measures long-term glucose; talk to your physician about tests for glycated hemoglobin.",
    "Our PTO policy covers paid time off for vacations and sick leave.",
    "How to fix engine misfires caused by bad spark plugs.",
    "Kubernetes Ingress configuration for path-based routing.",
    "Configure MFA with authenticator apps.",
    "Doctor appointment scheduling policy."
]

queries = ["How do I set up 2FA?", "What does HbA1c mean?", "sick leave policy?"]

Prepare BM25 and SentenceTransformer embeddings

# Prepare BM25
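# Simple tokenizer: lowercase, then split on whitespace (punctuation stays attached, so "2fa?" != "(2fa)")
tokenized_corpus = [d.lower().split() for d in docs]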
bm25 = BM25Okapi(tokenized_corpus)

# Choose a model (fast general-purpose)
model = SentenceTransformer('all-MiniLM-L6-v2')
doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)
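Because the embeddings are L2-normalized (`normalize_embeddings=True` returns unit-length vectors), cosine similarity reduces to a plain dot product, so `util.cos_sim` and `util.dot_score` should return the same values here. The sketch below checks this with random NumPy vectors standing in for embeddings (384 dimensions, the output size of all-MiniLM-L6-v2).

```python
import numpy as np

# Random vectors stand in for two 384-dimensional sentence embeddings.
rng = np.random.default_rng(0)
u, v = rng.normal(size=384), rng.normal(size=384)

# Cosine similarity on the raw vectors.
cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# L2-normalize first (what normalize_embeddings=True does), then take a plain dot product.
u_unit = u / np.linalg.norm(u)
v_unit = v / np.linalg.norm(v)
dot = u_unit @ v_unit

print(cosine, dot)  # equal up to floating-point rounding
```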

Helper: show results for a query (BM25, semantic, and weighted hybrid)

def show_results(query, k=3, alpha=0.8):
    """
    Show BM25 top-k, semantic top-k, and hybrid top-k for `query`.
    alpha: semantic weight in [0,1] for hybrid. hybrid = alpha*semantic + (1-alpha)*bm25
    """
    print(f"\nQUERY: {query}\n" + "="*60)
    # BM25 scores
    bm25_scores = np.array(bm25.get_scores(query.lower().split()))
    top_bm25 = np.argsort(-bm25_scores)[:k]
    print("BM25 top-k:")
    for i in top_bm25:
        print(f"  [{i}] {bm25_scores[i]:.3f}  {docs[i]}")

    # Semantic (SentenceTransformer) scores (cosine similarity)
    q_emb = model.encode([query], convert_to_tensor=True, normalize_embeddings=True)
    cos = util.cos_sim(q_emb, doc_emb)[0].cpu().numpy()
    top_st = np.argsort(-cos)[:k]
    print("\nSemantic (SentenceTransformer) top-k:")
    for i in top_st:
        print(f"  [{i}] {cos[i]:.3f}  {docs[i]}")

    # Normalize both scores into [0,1] to combine them
    bm25_min, bm25_max = bm25_scores.min(), bm25_scores.max()
    bm25_norm = (bm25_scores - bm25_min) / (bm25_max - bm25_min + 1e-9)

    st_min, st_max = cos.min(), cos.max()
    st_norm = (cos - st_min) / (st_max - st_min + 1e-9)

    # Hybrid: semantic-weighted combination
    hybrid = alpha * st_norm + (1.0 - alpha) * bm25_norm
    top_h = np.argsort(-hybrid)[:k]
    print(f"\nHybrid (BM25 + Semantic) top-k (alpha={alpha}):")
    for i in top_h:
        print(f"  [{i}] {hybrid[i]:.3f}  {docs[i]}")

Run the demo for all queries (default semantic weight alpha=0.8)

for q in queries:
    show_results(q, k=3, alpha=0.8)

Discussion and example behavior

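- BM25 relies on token overlap and term importance, so it can favor documents that share surface words with the query even when the meaning differs. In the sample output below, it ranks the spark-plug document first for "How do I set up 2FA?" only because both contain the token "how"; the whitespace tokenizer also keeps "2fa?" and "(2fa)" as different tokens, so the 2FA document gets no BM25 credit.
- SentenceTransformer embeddings capture meaning, so semantic search handles synonyms and paraphrases better (e.g., matching “2FA” to “two-factor authentication” and to “MFA”).
- The hybrid approach blends both signals with alpha:
  - alpha > 0.5 favors semantic matching.
  - alpha < 0.5 favors BM25.

Min-max normalizing both score arrays to [0, 1] per query puts them on a common scale, so a simple weighted linear blend is meaningful even though raw BM25 scores and cosine similarities live on different ranges. Min-max normalization is sensitive to outliers, though: a single very high score compresses all the others toward 0. The small epsilon (1e-9) prevents division by zero when all scores are equal (for example, when no document shares a token with the query).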
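The 2FA miss comes from tokenization rather than from BM25 itself. This sketch contrasts the demo's whitespace tokenizer with a regex tokenizer; `regex_tokenize` is a hypothetical helper, and `\w+` also splits hyphenated words such as "two-factor" into "two" and "factor". To try it in the demo, you would use it in place of `.lower().split()` both when building `tokenized_corpus` and inside `show_results`; BM25 scores will then differ from the sample output.

```python
import re

def whitespace_tokenize(text):
    # The demo's tokenizer: lowercase, then split on whitespace (punctuation stays attached).
    return text.lower().split()

def regex_tokenize(text):
    # Alternative: keep only runs of word characters, dropping punctuation.
    return re.findall(r"\w+", text.lower())

query = "How do I set up 2FA?"
doc = "Enable two-factor authentication (2FA) in your account settings to add an extra security step."

# Tokens shared by the query and the 2FA document under each tokenizer.
shared_ws = set(whitespace_tokenize(query)) & set(whitespace_tokenize(doc))
shared_re = set(regex_tokenize(query)) & set(regex_tokenize(doc))
print(shared_ws)  # set()
print(shared_re)  # {'2fa'}
```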

[Figure: Jupyter notebook output comparing BM25, semantic (SentenceTransformer), and hybrid top-k results with scores for the HbA1c and sick leave policy queries.]

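Sample (cleaned) output for illustration

Exact scores depend on the model and library versions. The hybrid scores below are also not what `show_results` produces: document [0] has the highest cosine similarity and a BM25 score of 0, so after min-max normalization its hybrid score is 0.8 * 1 + 0.2 * 0 = 0.800, not 0.720.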

QUERY: How do I set up 2FA?
============================================================
BM25 top-k:
  [3] 1.487  How to fix engine misfires caused by bad spark plugs.
  [0] 0.000  Enable two-factor authentication (2FA) in your account settings to add an extra security step.
  [5] 0.000  Configure MFA with authenticator apps.

Semantic (SentenceTransformer) top-k:
  [0] 0.689  Enable two-factor authentication (2FA) in your account settings to add an extra security step.
  [5] 0.583  Configure MFA with authenticator apps.
  [3] 0.064  How to fix engine misfires caused by bad spark plugs.

Hybrid (BM25 + Semantic) top-k (alpha=0.8):
  [0] 0.720  Enable two-factor authentication (2FA) in your account settings to add an extra security step.
  [5] 0.670  Configure MFA with authenticator apps.
  [3] 0.574  How to fix engine misfires caused by bad spark plugs.

Tuning guidance

Tune alpha and experiment with different sentence-transformer models (for example, all-MiniLM-L6-v2 for speed, or multi-qa-MiniLM-L6-cos-v1 for QA-style retrieval). Use a labeled validation set (queries with known correct documents) to measure precision and recall, and choose alpha and the model for your dataset. If BM25 returns many tied scores (common in small corpora, where most documents share no tokens with the query), the semantic signal typically breaks the ties; if semantic matching over-generalizes in your domain, lower alpha to give BM25 more weight.
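A minimal sketch of choosing alpha on a labeled validation set, assuming you have already collected BM25 and cosine scores for each validation query (for example, with the same calls used in `show_results`). The helper names and data are hypothetical, and top-1 accuracy stands in for whatever metric you actually care about.

```python
import numpy as np

def min_max(scores, eps=1e-9):
    """Rescale scores to [0, 1], as in show_results."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min() + eps)

def top1_accuracy(bm25_rows, semantic_rows, relevant, alpha):
    """Fraction of queries whose top hybrid-ranked document is the labeled relevant one."""
    hits = 0
    for bm25_row, sem_row, rel in zip(bm25_rows, semantic_rows, relevant):
        hybrid = alpha * min_max(sem_row) + (1.0 - alpha) * min_max(bm25_row)
        hits += int(np.argmax(hybrid) == rel)
    return hits / len(relevant)

# Hypothetical validation data: per-query scores over 3 documents and the relevant doc index.
val_bm25 = [[2.0, 0.5, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.2]]
val_semantic = [[0.0, 0.8, 0.4], [0.2, 0.6, 0.3], [0.3, 0.1, 0.2]]
val_relevant = [1, 1, 2]

# Sweep alpha from 0.0 to 1.0 in steps of 0.1; max() keeps the smallest alpha among ties.
alphas = [i / 10 for i in range(11)]
accuracy = {a: top1_accuracy(val_bm25, val_semantic, val_relevant, a) for a in alphas}
best_alpha = max(alphas, key=accuracy.get)
print(best_alpha, accuracy[best_alpha])  # 0.5 1.0
```

In this toy data, low alpha misses the first query (BM25 prefers a lexically similar but wrong document) and high alpha misses the third (semantic matching over-generalizes), so the sweep settles in between.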

Switching the embedding model (example)

# Example: use a QA-tuned model
model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)

# show_results reads the global model and doc_emb, so re-running it uses the new embeddings:
for q in queries:
    show_results(q, k=3, alpha=0.8)

Quick comparison

| Method | Strengths | When to use |
|---|---|---|
| BM25 | Fast, interpretable, strong on exact matches and term importance | Short queries; domains with consistent terminology |
| Semantic (bi-encoder) | Handles synonyms, paraphrases, and deeper meaning | Natural-language queries; varied vocabulary |
| Hybrid (weighted) | Combines both signals; tunable with `alpha` | Real-world retrieval where both surface form and meaning matter |

Final notes

This demo uses a very small toy corpus for illustration. On larger corpora you’ll get more stable BM25 distributions and richer semantic matches.
Keep the retrieval pipeline configurable: alpha, k, and model selection should be part of your evaluation loop.
Validate hybrid weighting with representative queries and metrics (e.g., recall@k, MRR) before deploying to production.
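The metrics mentioned above are straightforward to compute once you have ranked document indices per query. A minimal sketch with hypothetical helper names: the first ranking mirrors the BM25 order for the 2FA query in the sample output, with documents 0 and 5 treated as relevant, and the other rankings and labels are made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear among the top-k results."""
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result (1-based), or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)) / len(rankings)

# Hypothetical top-3 rankings for three queries and hand-labeled relevant documents.
rankings = [[3, 0, 5], [1, 6, 2], [4, 6, 2]]
relevant_sets = [{0, 5}, {1}, {2}]

recalls = [recall_at_k(r, rel, k=3) for r, rel in zip(rankings, relevant_sets)]
mrr = mean_reciprocal_rank(rankings, relevant_sets)
print(recalls, round(mrr, 3))  # [1.0, 1.0, 1.0] 0.611
```

Recall@3 is perfect for all three queries, yet MRR is only 0.611 because two of them do not rank a relevant document first; reporting both catches ordering problems that recall@k alone hides.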
