Explains how embeddings and cosine similarity measure semantic relevance for semantic search and Retrieval-Augmented Generation (RAG), improving document retrieval and grounding model outputs.
In this lesson we explain how similarity calculations let us find the most semantically relevant passages, the “needles” in a large digital haystack, without relying on exact keyword matches. This is essential for semantic search and Retrieval-Augmented Generation (RAG), where the quality of retrieved context directly affects the reliability of model output.
A traditional keyword search matches literal characters and phrases. For example, a keyword search about how plants communicate will only surface passages that contain your exact words. It can miss passages about root signaling, nutrient-sharing networks, or fungal connections beneath the soil, which are semantically relevant but do not include your search terms.
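To make the limitation concrete, here is a minimal sketch of literal keyword matching. The function name `keyword_search` and the three sample passages are made up for illustration; real keyword engines add stemming and ranking, but the core behavior of matching characters rather than meaning is the same.

```python
def keyword_search(query: str, passages: list[str]) -> list[str]:
    """Return passages that contain every query word as a literal substring."""
    terms = query.lower().split()
    return [p for p in passages if all(t in p.lower() for t in terms)]


# Hypothetical passages, all relevant to the question "how do plants communicate?"
passages = [
    "Plants communicate through airborne chemical signals.",
    "Fungal networks beneath the soil let trees share nutrients.",
    "Root signaling warns neighboring seedlings of drought.",
]

hits = keyword_search("plants communicate", passages)
print(hits)  # only the first passage contains both literal words
```

The second and third passages are on topic, but they never use the words "plants" or "communicate", so a literal match cannot find them.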
Similarity-based methods avoid this problem by measuring how close two pieces of text are in meaning rather than comparing characters. This is done by mapping words, sentences, or documents into high-dimensional vectors called embeddings.
Conceptually, embeddings are like GPS coordinates in a much higher-dimensional space. A single embedding might look like:
    # Example embedding vector (truncated)
    [0.2, 0.8, -0.3, 0.5, ...]
Each dimension can be thought of as capturing a latent semantic feature (for example “animalness”, “size”, “maturity”, “domestication”), although in learned embeddings individual dimensions are rarely this interpretable. Related terms cluster near each other in this vector space.
A simple analogy: imagine two darts thrown at a board. If they land in the same place and point the same way, the throwers were thinking similarly. In vector terms, similarity is judged by the direction of the vectors (the angle between them) rather than by their raw magnitude.
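The clustering idea can be illustrated with a toy sketch. The three vectors below are hand-made, not produced by any real embedding model, and the dimension labels (animalness, size, maturity, domestication) are assumptions borrowed from the example features above. The helper `angle_degrees` is a hypothetical name.

```python
import numpy as np

# Hypothetical, hand-made 4-dimensional "embeddings".
# Assumed dimension order: animalness, size, maturity, domestication.
toy_embeddings = {
    "cat":    np.array([0.9, 0.3, 0.8, 0.9]),
    "kitten": np.array([0.9, 0.1, 0.1, 0.9]),
    "car":    np.array([0.0, 0.8, 0.5, 0.0]),
}


def angle_degrees(a: np.ndarray, b: np.ndarray) -> float:
    """Angle between two vectors in degrees; smaller means more similar direction."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


angle_kitten_cat = angle_degrees(toy_embeddings["kitten"], toy_embeddings["cat"])
angle_kitten_car = angle_degrees(toy_embeddings["kitten"], toy_embeddings["car"])

print(f"kitten vs cat: {angle_kitten_cat:.1f} degrees")  # the smaller angle
print(f"kitten vs car: {angle_kitten_car:.1f} degrees")  # the larger angle
```

In this toy space "kitten" points in nearly the same direction as "cat" and almost at a right angle to "car", which is the geometric picture behind "related terms cluster together".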
The standard numerical measure for directional similarity is cosine similarity:
cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)
Here, A · B is the dot product, and ||A|| and ||B|| are the vector norms (magnitudes). Values close to 1 indicate vectors pointing in the same direction (high semantic similarity), values near 0 indicate orthogonality (little relation), and values near -1 indicate opposite directions. In practice, text embeddings seldom produce strongly negative scores, so a value near -1 should not be read as a reliable sign of opposite meaning.
You can compute cosine similarity easily in Python:
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Example (small synthetic vectors)
    vec_cat = np.array([0.9, 0.1, 0.0])
    vec_kitten = np.array([0.88, 0.12, -0.01])
    vec_dog = np.array([0.7, 0.2, 0.1])

    print(cosine_similarity(vec_cat, vec_kitten))  # close to 1.0
    print(cosine_similarity(vec_cat, vec_dog))     # lower than cat/kitten
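As a quick check on how to read the score, the following sketch evaluates the three reference cases (same direction, orthogonal, opposite) on small 2-D vectors chosen purely for illustration. It also shows that scaling a vector changes its length but not the result.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0])
same_direction = np.array([2.0, 4.0])   # a scaled by 2: longer, same direction
orthogonal = np.array([-2.0, 1.0])      # dot product with a is exactly 0
opposite = np.array([-1.0, -2.0])       # a flipped

sim_same = cosine_similarity(a, same_direction)
sim_orthogonal = cosine_similarity(a, orthogonal)
sim_opposite = cosine_similarity(a, opposite)

print(sim_same)        # approximately 1.0
print(sim_orthogonal)  # 0.0
print(sim_opposite)    # approximately -1.0
```

Because the norms in the denominator cancel any scaling, `a` and `2 * a` score as identical in direction even though their magnitudes differ.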
Why this matters: similarity calculations are the core of retrieval. If retrieved documents are irrelevant or only weakly related, the language model receives poor context and may generate incorrect or confidently wrong answers (hallucinations). Accurate similarity scoring improves the relevance of retrieved context, which in turn strengthens factual grounding for RAG systems.
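The retrieval step itself can be sketched as "score every stored vector against the query and keep the best few". The function name `top_k`, the document vectors, and the query vector below are all hypothetical. This brute-force version compares the query with every document, whereas production systems typically use a vector index instead.

```python
import numpy as np


def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (document index, cosine score) pairs for the k best-scoring rows."""
    doc_norms = np.linalg.norm(doc_vecs, axis=1)
    query_norm = np.linalg.norm(query_vec)
    # One cosine score per document row.
    scores = (doc_vecs @ query_vec) / (doc_norms * query_norm)
    # argsort is ascending, so reverse it to get the highest scores first.
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]


# Hypothetical 3-dimensional document embeddings (one row per document).
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.7, 0.2, 0.1],
    [0.0, 0.2, 0.9],
])
query_vec = np.array([0.88, 0.12, -0.01])

results = top_k(query_vec, doc_vecs, k=2)
for index, score in results:
    print(index, round(score, 3))  # documents 0 and 2 rank highest
```

Only the passages behind the returned indices would be passed to the language model as context, which is why the quality of these scores matters so much.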
High-quality similarity transforms retrieval into reliable context for generation. Poor similarity produces weak or misleading context and degrades model outputs.
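One common way to keep weak matches out of the model's context is to drop passages below a minimum score before building the prompt. The sketch below assumes passages have already been scored. The names `build_context` and `build_prompt`, the 0.75 threshold, and the prompt layout are illustrative choices, not recommended values.

```python
def build_context(scored_passages: list[tuple[float, str]],
                  min_score: float = 0.75,
                  max_passages: int = 3) -> list[str]:
    """Keep the highest-scoring passages whose score meets the threshold."""
    kept = [(score, text) for score, text in scored_passages if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:max_passages]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a simple prompt; say so explicitly when nothing relevant was found."""
    if not context:
        return f"Question: {question}\nNo sufficiently relevant context was retrieved."
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


# Hypothetical similarity scores paired with passages.
scored = [
    (0.91, "Fungal networks beneath the soil let trees share nutrients."),
    (0.42, "The stock market closed higher on Tuesday."),
    (0.83, "Root signaling warns neighboring seedlings of drought."),
]

context = build_context(scored)
prompt = build_prompt("How do plants communicate?", context)
print(prompt)
```

The low-scoring passage is excluded, so the model is not handed unrelated text to reason from. A suitable threshold depends on the embedding model and the data and would need to be tuned.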
Similarity calculations measure semantic closeness, not string equality.
Embeddings map text into vectors that capture meaning; similar meanings occupy similar directions in vector space.
Cosine similarity compares vector directions and ignores magnitude, which makes it less sensitive to differences such as document length.
Retrieval quality determines RAG quality: good similarity → better context → more accurate model answers.
Focus on semantic meaning rather than exact keywords. Proper embeddings plus cosine-based retrieval are fundamental to building reliable, well-grounded RAG systems.