🔥 The DB Grill 🔥

Where database blog posts get flame-broiled to perfection

Lower-Cost Vector Retrieval with Voyage AI’s Model Options
Originally from mongodb.com
August 6, 2025 • Roasted by Jamie "Vendetta" Mitchell

Alright, settle down, settle down. I just read the latest dispatch from the MongoDB marketing—sorry, engineering—blog, and I have to say, it’s a masterpiece. A true revelation. They’ve discovered that using less data… is cheaper. Truly groundbreaking stuff. I’m just shocked they didn’t file a patent for the concept of division. This is apparently “the future of AI-powered search,” folks. And I thought the future involved flying cars, not just making our existing stuff slightly less expensive by making it slightly worse.

They’re talking about the “cost of dimensionality.” It’s a cute way of saying, “Turns out those high-fidelity OpenAI embeddings cost a fortune to store and query, and our architecture is starting to creak under the load.” I remember those roadmap meetings. The ones where "scale" was a magic word you sprinkled on a slide to get it approved, with zero thought for the underlying infrastructure. Now, reality has sent the bill. And that bill is 500GB for 41M documents. Oops.

So, what’s the big solution? The revolutionary technique to save us all? Matryoshka Representation Learning. Oh, it sounds so sophisticated, doesn't it? So scientific. They even have a little diagram of a stacking doll. It’s perfect, because it’s exactly what this is: a gimmick hiding a much smaller, less impressive gimmick.

They call it “structuring the embedding vector like a stacking doll.” I call it what we used to call it in the engineering trenches: truncating a vector. They’re literally just chopping the end off and hoping for the best. This isn’t some elegant new data structure; it’s taking a high-resolution photo and saving it as a blurry JPEG. But “Matryoshka” sounds so much better on a press release than “Lossy Vector Compression for Dummies.”
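
Here, let me sketch the entire doll collection in Python. To be scrupulously fair, an MRL model is trained so the leading dimensions carry most of the signal, which is the one thing separating this from sawing a random vector in half. It’s still a haircut. The function name and shapes below are mine, not anybody’s actual API:

import numpy as np

# A minimal sketch of the "Matryoshka" trick, under my own naming: keep the
# first k dimensions of the full embedding and re-normalize so cosine
# similarity still behaves. Nothing here is MongoDB's or Voyage's actual API.
def truncate_embedding(full_vec, k=512):
    prefix = np.asarray(full_vec, dtype=np.float32)[:k]
    return prefix / np.linalg.norm(prefix)

full = np.random.rand(2048).astype(np.float32)  # stand-in for a 2048-d embedding
short = truncate_embedding(full)                # the inner doll: 512 dimensions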

And the technical deep-dive? Oh, honey, this is my favorite part.

def cosine_similarity(v1,v2): ...

Let’s all just take a moment to admire this Python function. A for loop to calculate cosine similarity. In a blog post about performance. In the year of our lord 2025. This is the code they’re proud to show the public. This tells you everything you need to know. It’s like a Michelin-starred chef publishing a recipe for boiling water. You just know the shortcuts they’re taking behind the scenes in the actual product code if this is what they put on the front page. I bet the original version of this feature was just vector[:512], and a product manager said, "Can we give it a cool Russian name?"
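
For posterity, here’s roughly what that function expands to. The loop body is my reconstruction, since they trimmed it out of the excerpt, followed by the one-line NumPy version you might expect from a company that sells a vector database:

import math
import numpy as np

# Reconstruction of the pure-Python loop (the original body was elided).
def cosine_similarity(v1, v2):
    dot = norm1 = norm2 = 0.0
    for a, b in zip(v1, v2):
        dot += a * b
        norm1 += a * a
        norm2 += b * b
    return dot / (math.sqrt(norm1) * math.sqrt(norm2))

# The vectorized version you'd actually want for 2048-dimensional vectors.
def cosine_similarity_np(v1, v2):
    v1, v2 = np.asarray(v1), np.asarray(v2)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))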

Then we get to the results. The grand validation of this bold new strategy. Look at this table:

Dimensions    Relative Performance    Storage for 100M Vectors
512           0.987                   205GB
2048          1.000                   820GB

They proudly declare that you get ~99% relative performance for a quarter of the cost! Wow! What a deal!
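
And look, the storage column is honest arithmetic, I’ll give them that. It’s multiplication (assuming plain float32 vectors and no index overhead, which is my assumption, not their disclosure):

# Back-of-envelope: 100M vectors, 4 bytes per float32 dimension.
# Index structures, metadata, and replication not included.
vectors = 100_000_000
for dims in (512, 2048):
    print(dims, round(vectors * dims * 4 / 1e9), "GB")  # 205 GB and 819 GB; their 820 is rounding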

Let me translate that from marketing-speak into reality-speak for you:

That 1.3% drop in performance from 2048d to 512d sounds tiny, right? But what is that 1.3%? Is it the one query from your biggest customer that now returns garbage? Is it the crucial document in a legal discovery case that now gets missed? Is it the difference between a user finding a product and bouncing from your site? They don't know. But hey, the storage bill is lower! The Ops team can finally afford that second espresso machine. Mission accomplished.
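
Here’s the back-of-envelope they don’t print. Both numbers below are mine, purely illustrative, and an averaged benchmark drop doesn’t map neatly onto individual queries, which is exactly the point:

# Illustrative only: if that 1.3% relative drop showed up as 1.3% of queries
# losing a relevant result, this is the daily damage at a made-up traffic volume.
queries_per_day = 10_000_000      # assumption, not from the post
relative_drop = 1.0 - 0.987
print(int(queries_per_day * relative_drop))  # 130,000 degraded queries a day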

This whole post is a masterclass in corporate judo. They’re turning a weakness—"our system is expensive and slow at high dimensions"—into a feature: "choice." They’re not selling a compromise; they're selling “tunability.” It’s genius, in a deeply cynical way.

So, what’s next? I’ll tell you what’s next. Mark my words. In six months, there will be another blog post. It’ll announce the next revolutionary cost-saving feature. It’ll probably be “Binary Quantization as a Service,” where they turn all your vectors into just 1s and 0s. They’ll call it something cool, like “Heisenberg Representation Fields,” and they’ll show you a chart where you can get 80% of the accuracy for 1% of the storage cost.
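
And when that post drops, here’s the entirety of the “Heisenberg” technology, sketched with my own function names, because whatever they end up calling it, it will be a variation on this:

import numpy as np

# Binary quantization: keep only the sign of each dimension, packed 8 per byte.
# One bit per dimension instead of 32, so roughly a 32x storage cut; accuracy sold separately.
def binarize(vec):
    return np.packbits(np.asarray(vec) > 0)

# Ranking then runs on Hamming distance instead of cosine similarity.
def hamming_distance(b1, b2):
    return int(np.unpackbits(np.bitwise_xor(b1, b2)).sum())

v = np.random.randn(2048)
print(binarize(v).nbytes, "bytes vs", v.astype(np.float32).nbytes)  # 256 vs 8192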

And everyone will applaud. Because as long as you use a fancy enough name, people will buy anything. Even a smaller doll.