For un-labeled image and video similarity, dense vector DBs may be your only option, but for text, I’m not convinced that traditional FTS, a.k.a. sparse vector search via an inverted index, isn’t the way to go. Judging by web indicators of database popularity, the top “vector” DB was ranked 60th as of September 2025. Several hybrid solutions with dense vector support rank higher, but I can’t really disentangle whether they are being used for their non-“dense” qualities. For the purposes of this article, I’m calling a vector DB a database that focuses on dense vector retrieval.
Transparently, my only experiences with the modern incarnation of dense vector DBs have been hobby projects and dev-only traffic in production systems. Their inner workings are pretty familiar to me though, as I’ve been in the semantic similarity game for a while. Circa 2018, at Weight Watchers, my team used a massive food-tracking corpus (~1B tracked meals) to build multi-lingual food embeddings, treating a meal as a document and foods as tokens. This allowed users to substitute healthier alternatives in recipes or find recommendations that pair well with their current kitchen contents. To go to production, these embeddings were dimensionally reduced with UMAP, and a lookup index was built with an ANN library like Annoy. We put this index behind an API and horizontally scaled it across a few large instances; voilà, a nascent vector DB.
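As a rough sketch of that serving path (assuming the embeddings are already trained; umap-learn and Annoy are stand-ins here, and the dimensions and file names are made up):

```python
import numpy as np
import umap                      # umap-learn
from annoy import AnnoyIndex

# Hypothetical input: pre-trained meal/food embeddings, one row per item.
food_vectors = np.load("food_embeddings.npy")          # e.g. shape (n_items, 300)

# Reduce dimensionality so the ANN index stays small and fast to query.
reducer = umap.UMAP(n_components=32, metric="cosine")
reduced = reducer.fit_transform(food_vectors)           # shape (n_items, 32)

# Build an approximate nearest-neighbor index over the reduced vectors.
index = AnnoyIndex(reduced.shape[1], "angular")
for i, vec in enumerate(reduced):
    index.add_item(i, vec)
index.build(50)                  # 50 trees; more trees = better recall, bigger index
index.save("food_index.ann")

# Serving: look up the nearest neighbors of any item (or a reduced query vector).
neighbor_ids = index.get_nns_by_vector(reduced[0], 10)
```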
Around that same time period, I tried (and failed) to start an art search engine company which crawled the web to find art matching a user’s visual preferences, kind of like Pinterest I guess. The pipeline ran crawled art sources through a chop-shop version of ResNet v2, concatenated some TF-IDF tag-based embeddings, then dimensionally reduced and indexed the resulting art vector. The served index was cached and rebuilt nightly.
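The featurization looked roughly like the sketch below, a hypothetical reconstruction: a torchvision ResNet-50 stands in for the chopped-up ResNet v2, and the placeholder images and tag strings are invented for illustration.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms as T
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder artwork and scraped tags/captions, purely for illustration.
images = [Image.new("RGB", (256, 256)) for _ in range(4)]
scraped_tag_strings = ["abstract oil painting", "portrait", "landscape watercolor", "sculpture"]

# ResNet-50 with the classification head removed, used as a fixed feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_vector(pil_image):
    with torch.no_grad():
        feats = feature_extractor(preprocess(pil_image).unsqueeze(0))
    return feats.squeeze().numpy()                     # 2048-d dense image features

# TF-IDF embeddings over whatever tags/captions were scraped alongside each image.
tag_matrix = TfidfVectorizer(max_features=512).fit_transform(scraped_tag_strings).toarray()

# Concatenate image and tag features into one "art vector", ready for
# dimensionality reduction and ANN indexing as in the previous sketch.
art_vectors = np.hstack([np.stack([image_vector(img) for img in images]), tag_matrix])
```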
I only mention these projects because if good vector DB solutions existed at that time, I think they would have been perfect for my applications, both being based on dense vector search. But that was in 2018 — Nvidia has since blessed us with giant multi-modal LLMs. I think these LLMs can zero-shot the problems I spent years working on. If you ask them for recipe substitutions, they are great. If you give some artwork to an LLM, prompting for descriptive search tags, they are great.
Because LLMs provide a universal serialization layer between a sparse vector encoding (i.e. text) and data, do we need more than sparse search for LLM applications?
Looking at RAG, a prototypical use case for vector DBs, models need some way to get relevant contexts. DeepMind’s recent paper, “On the Theoretical Limitations of Embedding-Based Retrieval,” lays out how dense embeddings have theoretical retrieval limitations, even with basic questions. Their results show that BM25-based retrieval, i.e. sparse retrieval over a full-text index, is able to overcome this limitation, the implication being that hybrid sparse/dense embeddings are necessary for high-performance retrieval.
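To make the sparse side of that comparison concrete, here is a minimal BM25 retrieval sketch using the rank_bm25 package; the whitespace tokenization and toy corpus are just for illustration.

```python
from rank_bm25 import BM25Okapi

# Toy corpus: each document is a chunk of text destined for RAG context.
corpus = [
    "CRISPR-Cas9 enables targeted gene editing in living cells",
    "BM25 is a ranking function used by full-text search engines",
    "HNSW graphs support approximate nearest neighbor search over dense vectors",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)

query = "gene editing with crispr".split()
scores = bm25.get_scores(query)              # one BM25 score per document
top_docs = bm25.get_top_n(query, corpus, n=2)
```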
But the HNSW indexes in modern vector DBs are optimized for dense vector retrieval, not sparse retrieval. Though many support hybrid retrieval, the inverted indices for sparse vectors are usually built as a wholly separate index, which basically means they are a “traditional” FTS engine bolted onto a dense similarity engine.
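To illustrate why hybrid support often amounts to two engines side by side, here is a minimal sketch (not any particular vendor’s implementation) that keeps a dense HNSW index via hnswlib and a sparse BM25 index completely separate, then fuses their rankings with reciprocal rank fusion:

```python
import numpy as np
import hnswlib
from rank_bm25 import BM25Okapi

dim, n_docs = 384, 10_000

# Dense side: an HNSW graph over document embeddings (random vectors stand in here).
doc_vectors = np.random.rand(n_docs, dim).astype(np.float32)
dense_index = hnswlib.Index(space="cosine", dim=dim)
dense_index.init_index(max_elements=n_docs, ef_construction=200, M=16)
dense_index.add_items(doc_vectors, np.arange(n_docs))

# Sparse side: a wholly separate inverted-index-style structure over the raw text.
doc_texts = [f"document {i} about topic {i % 100}" for i in range(n_docs)]
bm25 = BM25Okapi([t.split() for t in doc_texts])

def hybrid_search(query_text, query_vector, k=10, rrf_k=60):
    # Query each engine independently...
    dense_ids, _ = dense_index.knn_query(query_vector, k=k)
    sparse_ids = np.argsort(bm25.get_scores(query_text.split()))[::-1][:k]
    # ...then bolt the two result lists together with reciprocal rank fusion.
    fused = {}
    for rank, doc_id in enumerate(dense_ids[0]):
        fused[int(doc_id)] = fused.get(int(doc_id), 0.0) + 1.0 / (rrf_k + rank + 1)
    for rank, doc_id in enumerate(sparse_ids):
        fused[int(doc_id)] = fused.get(int(doc_id), 0.0) + 1.0 / (rrf_k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:k]

results = hybrid_search("topic 7", np.random.rand(dim).astype(np.float32))
```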
Dense vector representations have drawbacks that sparse representations do not. Their generalizability is not great: visit the MTEB leaderboard and change the filter to a specific domain or task, and the rankings shift significantly from the overall multi-language results. This means you should really be training a dense embedding model for your application, and you will probably need to change your dense embeddings from time to time. Every time you change your model, you’ll then need to re-index everything. Inverted text indexes don’t require the same sort of “global reset” for changes.
Dense vector representations also take a lot more compute than sparse embeddings, and the indexes cost more to maintain. To do some ballpark pricing: I have a corpus of about 100M scientific papers. Each paper averages 7.1 chunks for an embedding model and about 73 kB of text data. Given the public, non-enterprise pricing for both Elastic and Milvus, Milvus pricing for ~700M vectors would run ~$45k per month, while an Elastic cluster of similar scale would run ~$8.2k per month. There are many potential optimizations which make this price comparison more complex, but let’s say the dense vector system lands somewhere between 2x and 4x the price.
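The back-of-envelope arithmetic, with my assumptions spelled out (the 768-dimension float32 embedding size is my guess, not a vendor number):

```python
papers = 100_000_000
chunks_per_paper = 7.1
text_kb_per_paper = 73

vectors = papers * chunks_per_paper                  # ~7.1e8 embedding vectors
raw_text_tb = papers * text_kb_per_paper / 1e9       # ~7.3 TB of raw text

# Assuming 768-dim float32 embeddings (my assumption), before HNSW graph overhead:
vector_storage_tb = vectors * 768 * 4 / 1e12         # ~2.2 TB of raw vectors

milvus_monthly, elastic_monthly = 45_000, 8_200      # public list-price ballparks
raw_price_ratio = milvus_monthly / elastic_monthly   # ~5.5x before any optimizations
```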
Now that I’ve complained about the disadvantages, the advantage of a dense vector representation, and transitively the advantage of the HNSW indices in a vector database, is semantic search: “CRISPR” results will be returned if you search “gene editing.” This is cool, but in a RAG application, why not prompt your LLM to do query expansion for you? Also, with Elastic or OpenSearch, you can use algorithms to automatically create synonyms, or have an LLM do that for you. Even if we look at reference implementations of tools, like OpenAI’s OSS for search or Claude Code, both use full-text search (via grep or regexp) to find what they need, no dense vectors necessary.
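A sketch of that query-expansion flow, using the OpenAI and Elasticsearch Python clients; the model name, index name, and field name below are placeholders I made up:

```python
from openai import OpenAI
from elasticsearch import Elasticsearch

llm = OpenAI()
es = Elasticsearch("http://localhost:9200")      # hypothetical local cluster

def expand_query(user_query: str) -> str:
    # Ask the LLM for synonyms/related terms before hitting the sparse index.
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",                     # placeholder model name
        messages=[{
            "role": "user",
            "content": "Expand this search query with synonyms and closely "
                       f"related technical terms, space separated: {user_query}",
        }],
    )
    return resp.choices[0].message.content

expanded = expand_query("gene editing")          # might come back with "CRISPR Cas9 ..."
hits = es.search(index="papers", query={"match": {"text": {"query": expanded}}})
```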
It seems like in some benchmarks, dense/hybrid DBs like Pinecone or Infinity are actually faster than FTS-native DBs like Elastic. I think this is great, but that is really just a better traditional FTS solution, right? The docs also suggest these DBs are less configurable than OpenSearch’s or Elastic’s search DSL. That DSL can also be written by an LLM, adding further flexibility in RAG applications.
Granted, we still need dense vector DB look-ups for multi-modal content or for recommenders that embed user preferences, but to me those are specialized use cases; otherwise, I don’t yet see the vision for dense vector DBs.