Semantic Search vs Keyword Search: When NLP Changes Everything

khaled · July 28, 2023 · 4 min read

A user types "how do I cancel my subscription" into your support search. Your keyword search returns an article titled "How to cancel your subscription." Great. But it also returns irrelevant articles containing the words "cancel" and "subscription" in unrelated contexts, and misses the article titled "Managing your account and billing settings" — which is actually the most helpful result. Semantic search would have found that article. This is the core distinction, and it matters enormously in practice.

How Keyword Search Works (and Why It Fails)

Traditional keyword search — whether BM25, TF-IDF, or standard Elasticsearch — represents both the query and documents as vectors of term frequencies. Relevance is computed as a statistical measure of term overlap. It is fast, interpretable, and remarkably effective for exact-match retrieval.
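The scoring at the heart of this is compact. Below is a minimal, self-contained sketch of Okapi BM25; the whitespace tokenization and the function name are illustrative simplifications, not any library's API:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Relevance is a sum over query terms of IDF (rarer terms weigh more)
    times a saturating term-frequency factor normalized by document length.
    """
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in tokenized:
        for term in set(d):
            df[term] += 1

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document sharing no terms with the query scores exactly zero, which is precisely the failure mode described next.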

It fails when:

  • The vocabulary is synonymous but not identical: "car" vs. "automobile," "MI" vs. "myocardial infarction"
  • The query paraphrases the document: "cancel subscription" vs. "stop recurring charge"
  • The query is conceptual: "what should I eat after a workout?" — no document contains those exact words in that form
  • Spelling variants and informal language intervene: user-generated queries rarely match the formal register of technical documentation

How Semantic Search Works

Semantic search encodes both queries and documents as dense vectors in a high-dimensional embedding space, using a bi-encoder (e.g., Sentence-BERT, E5, or OpenAI's text-embedding-ada-002). Similarity is computed as cosine similarity or dot product between the query vector and all document vectors.

Documents that are semantically similar to the query end up near it in vector space, regardless of whether they share exact terms. A document about "canceling a recurring payment" will score highly for the query "stop my subscription" because both map to nearby regions of the embedding space.
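The geometry can be shown with hand-made 3-dimensional vectors. Real embeddings have hundreds of dimensions; these toy values are invented purely to illustrate the arithmetic and are not the output of any model:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "embeddings", invented for illustration:
query          = [0.9, 0.1, 0.0]  # "stop my subscription"
doc_paraphrase = [0.8, 0.2, 0.1]  # "canceling a recurring payment"
doc_unrelated  = [0.0, 0.1, 0.9]  # "office opening hours"
```

The paraphrase scores far higher than the unrelated document despite sharing no surface terms with the query, which is the behavior BM25 cannot produce.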

Modern deployments use Approximate Nearest Neighbor (ANN) search (via FAISS, hnswlib, Weaviate, Pinecone, etc.) to make this scalable to millions of documents with sub-10 ms latency.
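The operation these indexes approximate is exact top-k retrieval, which is easy to state directly. A brute-force sketch (`top_k` is an illustrative name, not a library call; ANN structures return roughly this result without scanning every vector):

```python
import heapq
import math

def top_k(query_vec, doc_vecs, k=2):
    """Exact nearest neighbors by cosine similarity: scan all N vectors.

    ANN indexes (FAISS, HNSW, etc.) trade a little recall for sublinear
    query time, but the target result is exactly this ranking.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    # Indices of the k documents most similar to the query.
    return heapq.nlargest(k, range(len(doc_vecs)),
                          key=lambda i: cos(query_vec, doc_vecs[i]))
```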

When Semantic Search Wins

Semantic search clearly outperforms keyword search when:

  • Queries are natural language questions rather than keyword strings
  • Documents use technical or formal language while users write informally
  • Vocabulary mismatch between the query and relevant documents is high
  • Retrieval is entity-centric: "companies similar to Stripe" benefits from semantic similarity between company descriptions
  • Retrieval is cross-lingual: multilingual embedding models can match an English query to a Spanish document

When Keyword Search Still Wins

Keyword search retains advantages in:

  • Exact identifier lookup: searching for a specific product code (SKU-10429-B), a person's name, or a precise technical term is better handled by exact matching
  • Long-tail queries: rare, precise queries that depend on specific terminology perform better with BM25 because embedding models may not have enough training signal for highly specialized vocabulary
  • Interpretability requirements: BM25 can explain why a result ranked where it did ("this term matched X times"); dense retrievers cannot
  • Low-latency environments: BM25 over an inverted index is still 5-10x faster than ANN search at scale

The Hybrid Approach: Best of Both Worlds

The current best practice for most production search systems is hybrid retrieval: run both BM25 and semantic search in parallel, then combine scores using Reciprocal Rank Fusion (RRF) or a learned re-ranking model.

Hybrid retrieval:

  1. Retrieves top-100 candidates from both BM25 and ANN search
  2. Merges and deduplicates results
  3. Re-ranks the combined list using a cross-encoder (a more expensive but higher-quality model that jointly encodes the query and each candidate)

This pipeline typically outperforms either method alone by a significant margin, particularly on heterogeneous query types.
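The RRF fusion step in the pipeline above can be sketched in a few lines. The constant k=60 is the conventional default from the original RRF formulation; the function name is illustrative, not a library API:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document IDs.

    Each document's fused score is the sum of 1 / (k + rank) across every
    list it appears in, so documents ranked well by multiple retrievers
    rise to the top. Merging and deduplication fall out of the dict.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears near the top of both the BM25 list and the dense list beats one that tops only a single list, which is exactly the behavior hybrid retrieval wants before the cross-encoder re-rank.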

Building a Semantic Search System

Key implementation considerations:

  • Embedding model choice and fine-tuning: domain-specific fine-tuning of the embedding model (e.g., on your corpus of support articles) dramatically improves retrieval quality over an off-the-shelf model
  • Chunking strategy: documents need to be split into chunks small enough to embed meaningfully but large enough to contain useful context (typically 200-500 words)
  • Index freshness: updating dense indexes when documents change is more complex than updating an inverted index
  • Evaluation: use NDCG@k and MRR over a query set with human-labeled relevance judgments, not just anecdotal examples
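Both metrics named above are compact enough to sketch directly. These are illustrative helpers, assuming a set of relevant IDs for reciprocal rank and graded relevance labels for NDCG:

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result; 0 if none retrieved.

    MRR is the mean of this value over a query set.
    """
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k with graded labels (dict: doc_id -> relevance gain).

    DCG discounts each gain by log2(position + 1); dividing by the DCG
    of the ideal ordering normalizes the score into [0, 1].
    """
    def dcg(ids):
        return sum(relevance.get(d, 0) / math.log2(i + 2)
                   for i, d in enumerate(ids[:k]))
    ideal = sorted(relevance, key=relevance.get, reverse=True)
    idcg = dcg(ideal)
    return dcg(ranked_ids) / idcg if idcg > 0 else 0.0
```

Averaging these over a few hundred labeled queries gives a far more trustworthy signal than spot-checking individual searches.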

Conclusion

The choice between semantic and keyword search is not binary. Modern production search uses both, with semantic search handling conceptual and natural language queries and keyword search handling exact-match and rare-term cases. The shift to semantic search represents one of the most impactful applications of NLP to real-world information retrieval.

Keywords: semantic search, keyword search, BM25, dense retrieval, Sentence-BERT, embeddings, NLP search, hybrid search, vector search, information retrieval