Nick Hoff

RAG is not just vector similarity

April 10, 2025

Retrieval-Augmented Generation (RAG) is a technique that connects a Large Language Model (LLM) to an external knowledge base. This lets the LLM ground its responses in specific, verifiable information instead of relying solely on its generalized training data, which is prone to hallucination. The central challenge in RAG is the "Retrieval" step, a classic search problem: find the most relevant content and provide it to the LLM so it can answer the user's question.

RAG is not just semantic search. Many developers today equate the "Retrieval" step with vector similarity search, likely because it's a new and powerful tool. Relying only on semantic similarity, though, can lead to significant errors. Consider the query "Find the voucher for my rental car reservation in Berlin." A pure vector search might retrieve documents about hotel vouchers in Berlin, ski rentals, or articles about the automotive history of Berlin. It finds documents that are semantically related to "rental," "car," or "Berlin" and ends up returning a lot of information that has nothing to do with the actual request.
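To make the failure mode concrete, here is a minimal sketch of pure vector retrieval. It assumes the sentence-transformers library purely for illustration; any embedding model behaves the same way, because the only signal available is text-to-text similarity.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def pure_vector_search(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Rank documents by cosine similarity to the query embedding alone.
    # Nothing here knows what a reservation, a sender, or a date is.
    doc_vecs = model.encode(documents, normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec
    best = np.argsort(-scores)[:top_k]
    return [documents[i] for i in best]

docs = [
    "Your rental car booking in Berlin is confirmed. Voucher attached.",
    "Hotel voucher for your upcoming stay in Berlin.",
    "Ski rental pricing and seasonal discounts.",
    "A short history of automobile manufacturing in Berlin.",
]
print(pure_vector_search("Find the voucher for my rental car reservation in Berlin.", docs))
```

The hotel voucher and ski-rental documents score nearly as well as the real booking confirmation, because the function only ever compares text against text.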

True relevance is far more complex than semantic similarity. A production-grade retrieval system must weigh other signals as well: dates, senders and authors, source types, document authority, and exact keyword matches.

So you don't want to just chunk and embed your email history and stick it in a vector database. The problem is that much of the most important information isn't in the content of a document but in its metadata. Consider the query "Find last week's email about the state of the prototype." A pure vector search on the email body will fail because the date ("last week") is a metadata field, not text within the body. The same applies to document authority, author, and source type: this information is rarely written out in the text itself. Without it, even the most advanced semantic search is flying blind.
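One way to make that information usable at retrieval time is to index it as structured fields alongside the embedded body, rather than hoping it appears in the text. Here is a rough sketch of what an indexed email record could look like; the field names are illustrative, not a fixed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class IndexedEmail:
    # The embedded body only captures what the text literally says.
    body: str
    body_embedding: list[float]
    # Everything below lives outside the body text, yet often decides relevance:
    # "last week", "from my boss", "has an attachment" are all metadata queries.
    sent_at: datetime
    sender: str
    subject: str
    has_attachments: bool
    attachment_names: list[str] = field(default_factory=list)
    source: str = "email"  # e.g. email, calendar, notes
```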

The solution is not to abandon vector search, but to augment it with structured search over this extracted metadata. This is a classic information retrieval problem, and we don't need to reinvent the solution. Your email client already does this: it combines keyword search with filters and sorting for date, sender, and attachments. A production-grade RAG system needs a similar hybrid search capability, allowing it to query for semantic meaning and filter on concrete metadata fields.
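As a sketch of that hybrid step, reusing the illustrative IndexedEmail record above and assuming the query has already been embedded: apply the structured filters first, then rank only the surviving candidates by semantic similarity.

```python
import numpy as np
from datetime import datetime, timedelta

def hybrid_search(
    query_embedding: np.ndarray,
    emails: list[IndexedEmail],
    newer_than: datetime | None = None,
    sender_contains: str | None = None,
    top_k: int = 5,
) -> list[IndexedEmail]:
    # 1. Structured filtering on metadata, just like an email client's filters.
    candidates = [
        e for e in emails
        if (newer_than is None or e.sent_at >= newer_than)
        and (sender_contains is None or sender_contains in e.sender)
    ]
    # 2. Semantic ranking over the filtered candidates only.
    def similarity(e: IndexedEmail) -> float:
        v = np.asarray(e.body_embedding)
        return float(v @ query_embedding) / (np.linalg.norm(v) * np.linalg.norm(query_embedding))
    return sorted(candidates, key=similarity, reverse=True)[:top_k]

# "Find last week's email about the state of the prototype":
# the date constraint becomes a filter, the topic becomes the embedding query
# (embed() stands in for whatever embedding model you use).
# hits = hybrid_search(embed("state of the prototype"), all_emails,
#                      newer_than=datetime.now() - timedelta(days=7))
```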

This approach is especially critical for building personalized AI systems like Second Brain. Take the rental car voucher example again. The correct document might be a PDF attachment named booking_confirmation.pdf on an email with the subject "Your trip," sent from a generic noreply@booking-agent.com address. A successful retrieval requires a sequence of operations: first, a structured search to find emails from travel companies sent in the last month, then filtering for those with attachments, followed by OCR on those attachments, and finally a keyword or semantic search on the extracted text.
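In code, that looks less like one query than a small pipeline. The sketch below reuses the illustrative IndexedEmail record from earlier; the ocr callable and the keyword check are stand-ins for whatever OCR engine and text index you actually use.

```python
from datetime import datetime, timedelta
from typing import Callable

def keyword_match(text: str, terms: list[str]) -> bool:
    # Trivial keyword check; a real system might use BM25 or a proper text index.
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)

def find_rental_voucher(emails: list[IndexedEmail], ocr: Callable[[str], str]) -> list[str]:
    # 1. Structured search: recent mail from travel/booking-style senders.
    recent_travel = [
        e for e in emails
        if e.sent_at >= datetime.now() - timedelta(days=30)
        and any(hint in e.sender for hint in ("booking", "travel", "reservations"))
    ]
    # 2. Keep only the emails that actually carry attachments.
    with_attachments = [e for e in recent_travel if e.has_attachments]
    # 3. OCR each attachment; `ocr` is whatever engine you use (e.g. Tesseract).
    extracted = {
        name: ocr(name)
        for e in with_attachments
        for name in e.attachment_names
    }
    # 4. Keyword (or semantic) search over the extracted text.
    return [
        name for name, text in extracted.items()
        if keyword_match(text, ["voucher", "rental", "Berlin"])
    ]
```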

The "R" is the hardest part of RAG. The objective shouldn't be to find one perfect retrieval algorithm, but to orchestrate several - structured queries, keyword matching, and vector search - to assemble the relevant context the LLM needs.