Concepts · Updated 2026-06-21 · Version 1.0

What are Embeddings & Vector Search?

An embedding is a numeric vector that represents the meaning of text (or images, audio, code) so that semantically similar items sit close together in vector space. Vector search finds the nearest embeddings to a query, enabling search by meaning rather than keywords. Embeddings are the backbone of retrieval-augmented generation, semantic search, clustering and recommendation.

Evidence: Benchmark · Confidence: High · Sources: Benchmark, Paper


Definition

An embedding is a dense numeric vector that encodes the meaning of a piece of data, positioned so that similar items are close in vector space; vector search retrieves the nearest embeddings to a query.

Key takeaways

Embeddings turn meaning into vectors; similar things sit close together.
Vector search retrieves by semantic similarity, not exact keywords.
They power RAG, semantic search, clustering and recommendation.
Hybrid search (vector + keyword) usually beats either alone.
Chunking and the embedding model choice drive retrieval quality.

Context

Computers compare numbers, not meaning. Embeddings bridge that gap: an embedding model maps text to a vector such that 'cancel my subscription' and 'how do I unsubscribe' land near each other, even with no shared words.
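To make "near each other" concrete, here is a minimal sketch of cosine similarity, a metric commonly used to compare embeddings. The three-dimensional vectors are hand-picked, hypothetical values for illustration only; they are not the output of any real embedding model, which would typically produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, hand-picked for illustration only.
cancel_subscription = [0.9, 0.1, 0.2]  # stands in for "cancel my subscription"
unsubscribe = [0.8, 0.2, 0.3]          # stands in for "how do I unsubscribe"
pizza_recipe = [0.1, 0.9, 0.1]         # stands in for an unrelated text

sim_related = cosine_similarity(cancel_subscription, unsubscribe)
sim_unrelated = cosine_similarity(cancel_subscription, pizza_recipe)
```

With these made-up values, the two subscription phrases score far higher against each other than either does against the unrelated text, which is the property a good embedding model is trained to produce.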

This is what makes semantic retrieval possible. Instead of matching keywords, a system embeds the query and finds the closest stored vectors — the foundation of how RAG and modern search retrieve relevant content.

Architecture

Indexing: content is split into chunks, each passed through an embedding model to produce a vector, then stored in a vector index. Querying: the query is embedded and the index returns the nearest vectors by a similarity metric (e.g. cosine).
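The indexing and querying flow above can be sketched as a brute-force in-memory index. Everything here is an assumption made for illustration: `toy_embed` is a hypothetical stand-in that only counts words from a tiny fixed vocabulary (so it captures word overlap, not meaning), `split_into_chunks` uses the simplest possible one-chunk-per-line rule, and `VectorIndex` is not the API of any real vector database.

```python
import math

# Hypothetical stand-in for an embedding model: counts words from a tiny
# fixed vocabulary. It captures word overlap only, not meaning; a real
# system would call an actual embedding model here.
VOCAB = ["refund", "invoice", "password", "reset", "login", "billing"]

def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def split_into_chunks(text):
    # Simplest possible chunking: one chunk per non-empty line.
    return [line.strip() for line in text.splitlines() if line.strip()]

class VectorIndex:
    """Brute-force in-memory index that stores (vector, chunk) pairs."""

    def __init__(self, embed):
        self.embed = embed
        self.entries = []

    def add_document(self, text):
        # Indexing: chunk the content, embed each chunk, store the vector.
        for chunk in split_into_chunks(text):
            self.entries.append((self.embed(chunk), chunk))

    def search(self, query, k=2):
        # Querying: embed the query, score every stored vector, keep the top k.
        q = self.embed(query)
        scored = [(cosine(q, vec), chunk) for vec, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

index = VectorIndex(toy_embed)
index.add_document(
    "To reset your password open the login page\n"
    "Billing questions and invoice copies are handled by support\n"
    "A refund is issued to the original payment method\n"
)
results = index.search("password reset", k=1)
```

The linear scan in `search` is only practical for small collections; vector databases typically replace it with an approximate nearest-neighbor index.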

Production systems add a re-ranker to refine the top results, combine vector search with keyword search (hybrid), and filter by metadata and permissions. Quality depends heavily on chunking and the embedding model.
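One way to combine the two result lists is reciprocal rank fusion (RRF). The text above does not say which fusion method is intended, so RRF is an assumption here, as are the document ids, the `doc_group` metadata and the `hybrid_search` helper. The sketch fuses first and filters by permission afterwards for brevity; a re-ranker, if used, would then reorder the surviving results.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical ranked results from the two retrievers (best first).
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_d", "doc_a", "doc_b"]

# Hypothetical metadata: which user group may read each document.
doc_group = {
    "doc_a": "public",
    "doc_b": "public",
    "doc_c": "public",
    "doc_d": "staff",
}

def hybrid_search(user_groups, top_n=3):
    fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
    # Permission filter: drop documents the caller is not allowed to read.
    allowed = [d for d in fused if doc_group[d] in user_groups]
    return allowed[:top_n]

public_results = hybrid_search({"public"})
staff_results = hybrid_search({"public", "staff"})
```

Documents that rank well in both lists (`doc_a`, `doc_b`) rise to the top, while `doc_d`, found only by keyword search, still surfaces for callers allowed to see it. Filtering after fusion can leave fewer than `top_n` results, so real systems often apply the filter inside each retriever instead.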

Components

Embedding model
Chunking
Vector index / database
Similarity metric (cosine)
Re-ranker
Hybrid (keyword) search

Benefits

Search by meaning, robust to wording.
Cross-lingual and multimodal matching, when the embedding model supports it.
The backbone of RAG and semantic search.
Cheap to query at scale once indexed.

Risks

Poor chunking degrades every downstream result.
Embedding model mismatch hurts relevance: queries and documents must be embedded with the same model, and that model should suit the domain.
Vectors can leak sensitive information from the source text; secure the vector store as carefully as the source data.
Pure vector search can miss exact-match needs (use hybrid).
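On the chunking risk: a common mitigation is to let consecutive chunks overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. The sketch below is one simple word-based approach under assumed parameters; `chunk_words`, `chunk_size` and `overlap` are hypothetical names, and suitable sizes depend on the content and the embedding model.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to chunk_size words, each sharing
    `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

# Twelve placeholder words, chunked five at a time with two words of overlap.
sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
```

Splitting on sentence or section boundaries instead of raw word counts is often a better fit for prose; the fixed-size version is shown only because it is the easiest to reason about.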

Tools & technologies

Embedding models (OpenAI, Cohere, open-source)
Vector databases (pgvector, Pinecone, Vertex AI Vector Search)
Re-rankers
Hybrid search engines

Examples

Semantic search over a help center that matches intent, not keywords.
Retrieving relevant passages to ground a RAG answer.
Clustering support tickets by topic using their embeddings.
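The ticket-clustering example can be sketched with a greedy, single-pass grouping by cosine similarity. This is a deliberately simple stand-in, not a standard algorithm such as k-means; the two-dimensional vectors are hand-picked hypothetical values, and `cluster_by_threshold` and its `threshold` are names assumed for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_by_threshold(items, threshold=0.8):
    """Greedy single-pass clustering: each item joins the first cluster whose
    seed vector is at least `threshold` similar, else it starts a new cluster."""
    clusters = []  # list of (seed_vector, [labels])
    for label, vec in items:
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(label)
                break
        else:
            # No existing cluster was similar enough: this item seeds a new one.
            clusters.append((vec, [label]))
    return [members for _, members in clusters]

# Hypothetical ticket embeddings, hand-picked for illustration only.
tickets = [
    ("cannot log in", [0.9, 0.1]),
    ("charged twice", [0.1, 0.9]),
    ("forgot password", [0.8, 0.2]),
    ("refund request", [0.2, 0.8]),
]
groups = cluster_by_threshold(tickets)
```

With these made-up vectors the login-related tickets end up in one group and the billing-related tickets in another. The result of a greedy pass depends on input order and on the chosen threshold, which is one reason production systems tend to use an established clustering algorithm.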

FAQs

How are embeddings different from keywords?

Keyword search matches exact words; embeddings match meaning, so paraphrases and synonyms still retrieve the right content.

What is vector search?

Finding the stored embeddings nearest to a query embedding under a similarity metric such as cosine similarity; in other words, search by semantic closeness.

Why combine vector and keyword search?

Vector search excels at meaning but can miss exact terms such as codes and names. Hybrid search blends both to improve recall and precision.

Do embeddings power RAG?

Yes. RAG embeds documents and queries, retrieves the nearest chunks by vector search, and grounds the model's answer in them.