In modern natural language processing (NLP), embeddings provide a dense, numerical representation of text. Whether the input is an individual word, a phrase, a sentence, or an entire document, an embedding model maps semantically similar inputs to vectors that lie close together in a high-dimensional space. Embeddings are generated by specialized models that convert text into one-dimensional arrays of numbers (vectors). When you feed related words or sentences into these models, the resulting vectors cluster by meaning.

How Embeddings Power Language Models
Large language models (LLMs) use embeddings as a foundational building block. Before processing text, they tokenize the input and transform those tokens into dense vectors, capturing semantic and syntactic features.
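The tokenize-then-embed step can be illustrated with a toy sketch. It assumes whitespace tokenization and a randomly initialized lookup table; production LLMs use subword tokenizers and learned embedding tables, and the helper names `tokenize`, `build_embedding_table`, and `embed_tokens` are hypothetical.

```python
import random


def tokenize(text):
    """Split text into lowercase tokens on whitespace (a simplification)."""
    return text.lower().split()


def build_embedding_table(vocab, dim, seed=0):
    """Assign each vocabulary token a random vector of length `dim`.

    A real model learns these values during training; random numbers
    are used here only to show the shape of the data.
    """
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def embed_tokens(tokens, table, dim):
    """Look up one vector per token; unknown tokens map to a zero vector."""
    unknown = [0.0] * dim
    return [table.get(tok, unknown) for tok in tokens]


tokens = tokenize("Embeddings capture meaning")
table = build_embedding_table(["embeddings", "capture", "meaning"], dim=4)
vectors = embed_tokens(tokens, table, dim=4)  # three vectors, four floats each
```

The point is only the data flow: text becomes a token sequence, and each token becomes a fixed-length vector of floats that later layers can process.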
Embedding Workflow
- Prepare any text (a single word up to a large document).
- Send it to an embedding model API.
- Receive a numerical vector encoding its meaning.

Embeddings typically have hundreds to thousands of dimensions. Despite their size, they enable efficient similarity computations and downstream tasks such as clustering, recommendation, and semantic search.
Measuring Similarity: Cosine Similarity
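To compare two embedding vectors, we often use cosine similarity, which measures the cosine of the angle between them:
cos(θ) = (A · B) / (||A|| ||B||)
The result ranges from -1 to 1. The ranges below are rough rules of thumb; useful thresholds vary by embedding model.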
| Similarity Range | Interpretation |
|---|---|
| 0.9 – 1.0 | Nearly identical meaning |
| 0.5 – 0.9 | Related concepts |
| 0.0 – 0.5 | Weak or no relation |
| Negative values | Opposing or dissimilar topics |
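The formula above translates directly into a few lines of code. The following is a minimal sketch in plain Python (no third-party libraries); the function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return (A · B) / (||A|| ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # 1.0: same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0: unrelated directions
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])   # -1.0: opposite directions
```

Because the score depends only on the angle between the vectors, scaling either vector by a positive constant leaves it unchanged.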
Semantic Search with Embeddings
Traditional keyword-based search can miss relevant results when phrasing differs. Semantic search converts both queries and documents into embeddings and retrieves items whose vectors lie closest to the query vector. Instead of exact string matches, semantic search finds content by meaning, powering applications like:
- Document retrieval
- FAQ matching
- Contextual recommendation
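As a rough illustration of the retrieval step, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document IDs are made up for the example (real embeddings come from a model and have far more dimensions), and `semantic_search` is a hypothetical helper, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for precomputed document embeddings.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-cycle": [0.1, 0.9, 0.2],
    "shipping-times": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded user query

results = semantic_search(query, docs, top_k=2)
# "reset-password" ranks first, followed by "billing-cycle"
```

This brute-force scan compares the query against every document, which is fine for small collections; larger ones typically rely on a vector index or vector database instead.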
Generating Embeddings with OpenAI
Embeddings can be generated with the OpenAI Python SDK.
Keep your API keys secure. Never expose them in client-side code or public repositories.
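A minimal sketch of an embedding request with the OpenAI Python SDK follows. It assumes the `openai` package (version 1 or later) is installed and that the API key is supplied through the `OPENAI_API_KEY` environment variable rather than hard-coded. The model name `text-embedding-3-small` is an assumption (the original does not name a model), and `embed_texts` is a hypothetical wrapper.

```python
def embed_texts(texts, model="text-embedding-3-small", client=None):
    """Return one embedding vector (a list of floats) per input string.

    `model` is an assumed default; substitute whichever embedding
    model your account uses.
    """
    if client is None:
        # Imported lazily so this module loads even without the SDK installed.
        from openai import OpenAI

        # OpenAI() reads the key from the OPENAI_API_KEY environment variable.
        client = OpenAI()
    response = client.embeddings.create(model=model, input=list(texts))
    return [item.embedding for item in response.data]
```

With the environment variable set, `embed_texts(["How do I reset my password?"])` returns a list containing a single vector. The optional `client` parameter lets you pass a preconfigured client, or a stub when testing without network access.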