Daily Database Roasts

Towards a Standard for JSON Document Databases

Originally from muratbuffalo.blogspot.com/feeds/posts/default

February 9, 2026 • Roasted by Jamie "Vendetta" Mitchell

Ah, yes. A formal mathematical framework. It’s truly heartwarming to see them finally get around to this. It’s like finding the original blueprints for a skyscraper after the tenants have been complaining for a decade about the load-bearing columns being made of papier-mâché. “We’ve done the math, and it turns out, this thing we built might actually stand up! Mostly. On a good day.”

Of course, the whole motivation section is a masterpiece of corporate revisionist history. They call the document database world a “Wild West” full of “immense opportunity.” I remember it differently. We called it the “Wild West” because there were no laws, the sheriff was drunk on VC funding, and you built things by nailing together whatever driftwood washed ashore, hoping it looked vaguely like a saloon. The “opportunity” was shipping a feature before the competition did, even if it meant queries would occasionally return the wrong documents or, my personal favorite, just a cryptic error message and a shrug.

And this gem right here:

In MongoDB, the query origin: "UK" matches a document where origin is the string "UK". However, it also matches a document where origin is the array ["UK", "Japan"]. While this loose equality is convenient for developers, it is bad for mathematical logic...

“Convenient for developers.” That’s the most beautiful piece of spin I’ve seen since the last roadmap meeting where we were told a six-month delay was actually a strategic timeline recalibration. That wasn't a "convenience," it was a shortcut. It was a half-baked solution cooked up at 2 AM to make some demo work, and it got hard-coded into the core logic because fixing it would have required admitting the initial design was flawed. I can still hear the meeting: “It’s not a bug that violates the basic principles of logic, it’s a developer-friendly feature that enhances flexibility!” Just don't ask what happens when you have an array of arrays. We never got around to defining the behavior for that “edge case.”
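For anyone who hasn't been bitten by this yet, here is a minimal sketch of the loose equality described in the quote. It is a toy model in plain Python, not MongoDB's implementation: `loose_eq`, `find`, and the sample documents are hypothetical names invented for illustration, and the model deliberately looks only one level into an array, saying nothing about arrays of arrays.

```python
def loose_eq(field_value, query_value):
    """Toy model of the loose equality quoted above (hypothetical, not MongoDB code)."""
    # Reading 1: the field itself equals the query value.
    if field_value == query_value:
        return True
    # Reading 2: the field is an array and some top-level element equals the query value.
    if isinstance(field_value, list):
        return any(elem == query_value for elem in field_value)
    return False


def find(docs, field, query_value):
    """Return the documents whose `field` loosely equals `query_value`."""
    return [d for d in docs if field in d and loose_eq(d[field], query_value)]


# Hypothetical sample collection.
docs = [
    {"_id": 1, "origin": "UK"},
    {"_id": 2, "origin": ["UK", "Japan"]},
    {"_id": 3, "origin": "Japan"},
]

matched_ids = [d["_id"] for d in find(docs, "origin", "UK")]
print(matched_ids)  # [1, 2]
```

One query, two different meanings of "equals", and which one you get depends on the document rather than on anything you wrote.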

Then there's “path polysemy.” What a wonderfully academic way of saying, “We never decided what a dot in a key path should actually mean, so good luck!” This wasn't some deep philosophical choice; it was the direct result of a dozen different teams implementing pathing logic over five years without ever talking to each other. The result? A query’s behavior is entirely dependent on the shape of the data that happens to be in the collection at that exact moment. It’s not a database; it’s a game of Russian Roulette with your application’s runtime.
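To make the roulette concrete, here is a rough sketch of how one dotted path can mean different things depending on the data. It assumes a simplified reading in which a path segment can be a field name, an array index, or a field name applied to every object inside an array. The function `resolve_path` and the sample documents are hypothetical, and the real rules may differ in detail.

```python
def _resolve(value, segments):
    """Collect every value the remaining path segments could denote (toy model)."""
    if not segments:
        return [value]
    head, rest = segments[0], segments[1:]
    results = []
    if isinstance(value, dict) and head in value:
        # Reading A: `head` is a field name on an embedded document.
        results += _resolve(value[head], rest)
    elif isinstance(value, list):
        # Reading B: `head` is a position in the array.
        if head.isdecimal() and int(head) < len(value):
            results += _resolve(value[int(head)], rest)
        # Reading C: `head` is a field name applied to each object in the array.
        for elem in value:
            if isinstance(elem, dict) and head in elem:
                results += _resolve(elem[head], rest)
    return results


def resolve_path(doc, path):
    """Hypothetical helper: resolve a dotted path such as "a.0" against one document."""
    return _resolve(doc, path.split("."))


# The same path against three document shapes.
print(resolve_path({"a": {"0": "field"}}, "a.0"))          # field named "0"
print(resolve_path({"a": ["index", "other"]}, "a.0"))      # array position 0
print(resolve_path({"a": [{"0": "elem-field"}, "x"]}, "a.0"))  # both readings at once
```

The third document is the fun one: the path resolves to the first array element and to a field inside that element at the same time.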

And now, to solve all this, they’ve proposed MQuery. Or as the article so helpfully points out, McQuery. It’s a fitting name. It’s fast, it’s cheap, it looks like food, but five years from now we’re all going to be dealing with the health consequences. They proudly declare that after years of rigorous academic work, they’ve proven that their aggregation framework is “at least as expressive as full relational algebra.”

Let me get this straight. After more than a decade of telling everyone that relational algebra was old-fashioned and that joins were the devil, you’ve finally published a paper that triumphantly declares you’ve… reinvented the join. Congratulations. You've spent a billion dollars in R&D to prove your shiny new rocket ship can do what Codd's Ford Model T was doing in 1970. What an achievement.

The payoff, they claim, is algebraic optimization. They’ve discovered you can reorder pipeline stages to make queries faster!

Moving a $match earlier to filter data? Groundbreaking. Relational databases have only been doing filter pushdown since, oh, the Nixon administration.
Moving $unwind later to save memory? Astounding. It’s almost like you shouldn’t generate a billion intermediate documents if you don’t have to. Who knew?
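For the record, the "discovery" fits in a few lines. The sketch below is a toy pipeline in plain Python, not the paper's algebra or MongoDB's optimizer: `unwind`, `match`, and the sample documents are hypothetical stand-ins. It assumes the filter uses strict equality and touches a field other than the one being unwound, which is the condition that makes the swap safe.

```python
def unwind(docs, field):
    """Toy $unwind: emit one copy of each document per element of its array field."""
    return [{**d, field: elem} for d in docs for elem in d[field]]


def match(docs, field, value):
    """Toy $match with strict equality on a single field."""
    return [d for d in docs if d[field] == value]


# Hypothetical sample collection.
docs = [
    {"_id": 1, "origin": "UK", "tags": ["a", "b", "c"]},
    {"_id": 2, "origin": "Japan", "tags": ["d", "e", "f", "g"]},
    {"_id": 3, "origin": "UK", "tags": ["h"]},
]

# Plan 1: unwind everything, then filter.
unwound_first = unwind(docs, "tags")
late_filter = match(unwound_first, "origin", "UK")

# Plan 2: filter first, then unwind only the survivors.
matched_first = match(docs, "origin", "UK")
early_filter = unwind(matched_first, "tags")

print(len(unwound_first), len(matched_first))  # 8 2
print(late_filter == early_filter)             # True
```

Both plans produce the same four documents, but the first materializes eight intermediate documents before filtering and the second only ever carries two. If the filter were on `tags` itself, the two stages would no longer commute, so the reordering is only valid under the assumption stated above.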

This paper isn’t a theoretical breakthrough. It’s an apology letter written in LaTeX. It’s a retroactive attempt to bolt a coherent design onto a product that grew like a fungus in a dark, damp server room. They’re not building a foundation; they’re frantically trying to pour concrete under a house that’s already tilting.

I can just see the next all-hands meeting now. The VPs will be on stage, beaming, presenting this paper as proof of their commitment to engineering excellence. What they won’t mention is that this entire exercise was only necessary because the original design philosophy was “whatever works by Friday.”

Can’t wait for the 2030 paper that provides a formal mathematical model for why our clustered index still randomly drops writes under load. Truly revolutionary.

🔥 The DB Grill 🔥