Where database blog posts get flame-broiled to perfection
Ah, yes. A formal mathematical framework. It's truly heartwarming to see them finally get around to this. It's like finding the original blueprints for a skyscraper after the tenants have been complaining for a decade about the load-bearing columns being made of papier-mâché. "We've done the math, and it turns out, this thing we built might actually stand up! Mostly. On a good day."
Of course, the whole motivation section is a masterpiece of corporate revisionist history. They call the document database world a "Wild West" full of "immense opportunity." I remember it differently. We called it the "Wild West" because there were no laws, the sheriff was drunk on VC funding, and you built things by nailing together whatever driftwood washed ashore, hoping it looked vaguely like a saloon. The "opportunity" was shipping a feature before the competition did, even if it meant queries would occasionally return the wrong documents or, my personal favorite, just a cryptic error message and a shrug.
And this gem right here:
In MongoDB, the query origin: "UK" matches a document where origin is the string "UK". However, it also matches a document where origin is the array ["UK", "Japan"]. While this loose equality is convenient for developers, it is bad for mathematical logic...
"Convenient for developers." That's the most beautiful piece of spin I've seen since the last roadmap meeting where we were told a six-month delay was actually a strategic timeline recalibration. That wasn't a "convenience," it was a shortcut. It was a half-baked solution cooked up at 2 AM to make some demo work, and it got hard-coded into the core logic because fixing it would have required admitting the initial design was flawed. I can still hear the meeting: "It's not a bug that violates the basic principles of logic, it's a developer-friendly feature that enhances flexibility!" Just don't ask what happens when you have an array of arrays. We never got around to defining the behavior for that "edge case."
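For anyone who hasn't had the pleasure, here is a minimal sketch of the rule the quote describes, written in plain Python for illustration. It is a simplified model and an assumption on my part, not the server's actual matching code; `loose_eq` and `find` are hypothetical names.

```python
def loose_eq(field_value, query_value):
    """Simplified model of the quoted 'loose equality' (assumption, not MongoDB's code)."""
    # Reading 1: the field is exactly the queried value.
    if field_value == query_value:
        return True
    # Reading 2: the field is an array and the queried value is one of its elements.
    if isinstance(field_value, list):
        return query_value in field_value
    return False


def find(collection, field, query_value):
    """Return the documents whose field loosely equals query_value."""
    return [
        doc for doc in collection
        if field in doc and loose_eq(doc[field], query_value)
    ]


docs = [
    {"_id": 1, "origin": "UK"},
    {"_id": 2, "origin": ["UK", "Japan"]},
    {"_id": 3, "origin": "Japan"},
]

uk_ids = [d["_id"] for d in find(docs, "origin", "UK")]
japan_ids = [d["_id"] for d in find(docs, "origin", "Japan")]
print(uk_ids)     # [1, 2]
print(japan_ids)  # [2, 3]
```

Note that document 2 "equals" both "UK" and "Japan" under this rule, which is exactly the sort of thing that makes logicians reach for the antacids.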
Then there's "path polysemy." What a wonderfully academic way of saying, "We never decided what a dot in a key path should actually mean, so good luck!" This wasn't some deep philosophical choice; it was the direct result of a dozen different teams implementing pathing logic over five years without ever talking to each other. The result? A query's behavior is entirely dependent on the shape of the data that happens to be in the collection at that exact moment. It's not a database; it's a game of Russian Roulette with your application's runtime.
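To make the complaint concrete, here is a toy resolver in Python. It is a sketch of how one dotted path can carry several readings depending on the data's shape, under my own simplifying assumptions; it is not the real path logic, and `resolve_path` and `_resolve` are hypothetical names.

```python
def _resolve(value, parts):
    """Collect every value the remaining path components could refer to."""
    if not parts:
        return [value]
    head, rest = parts[0], parts[1:]
    found = []
    if isinstance(value, dict):
        # Plain reading: the component names a field of an embedded document.
        if head in value:
            found += _resolve(value[head], rest)
    elif isinstance(value, list):
        # Reading A: a numeric component is an array index.
        if head.isdigit() and int(head) < len(value):
            found += _resolve(value[int(head)], rest)
        # Reading B: the path is implicitly mapped over embedded documents.
        for item in value:
            if isinstance(item, dict):
                found += _resolve(item, parts)
    return found


def resolve_path(doc, path):
    """Resolve a dotted path such as 'a.b' against a document (toy model)."""
    return _resolve(doc, path.split("."))


nested = resolve_path({"a": {"b": 1}}, "a.b")
mapped = resolve_path({"a": [{"b": 1}, {"b": 2}]}, "a.b")
ambiguous = resolve_path({"a": [{"0": "x"}, "y"]}, "a.0")
print(nested)     # [1]
print(mapped)     # [1, 2]
print(ambiguous)  # [{'0': 'x'}, 'x']
```

Same path syntax, three different meanings, and the last one returns both the array element and a field that happens to be named "0". Which reading you get depends entirely on what is sitting in the collection.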
And now, to solve all this, they've proposed MQuery. Or as the article so helpfully points out, McQuery. It's a fitting name. It's fast, it's cheap, it looks like food, but five years from now we're all going to be dealing with the health consequences. They proudly declare that after years of rigorous academic work, they've proven that their aggregation framework is "at least as expressive as full relational algebra."
Let me get this straight. After more than a decade of telling everyone that relational algebra was old-fashioned and that joins were the devil, you've finally published a paper that triumphantly declares you've... reinvented the join. Congratulations. You've spent a billion dollars in R&D to prove your shiny new rocket ship can do what Codd's Ford Model T was doing in 1970. What an achievement.
The payoff, they claim, is algebraic optimization. They've discovered you can reorder pipeline stages to make queries faster!
$match earlier to filter data? Groundbreaking. Relational databases have only been doing filter pushdown since, oh, the Nixon administration.
$unwind later to save memory? Astounding. It's almost like you shouldn't generate a billion intermediate documents if you don't have to. Who knew?
This paper isn't a theoretical breakthrough. It's an apology letter written in LaTeX. It's a retroactive attempt to bolt a coherent design onto a product that grew like a fungus in a dark, damp server room. They're not building a foundation; they're frantically trying to pour concrete under a house that's already tilting.
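In fairness to the reordering trick itself, here is what the $match-before-$unwind swap amounts to, simulated on plain Python lists. This is a toy model under my own assumptions, not the actual optimizer; `match`, `unwind`, and the sample `orders` data are hypothetical. The swap is only safe here because the filter never looks at the unwound field.

```python
def match(docs, predicate):
    """Toy $match: keep the documents that satisfy the predicate."""
    return [d for d in docs if predicate(d)]


def unwind(docs, field):
    """Toy $unwind: emit one copy of each document per array element."""
    out = []
    for d in docs:
        for item in d.get(field, []):
            out.append({**d, field: item})
    return out


orders = [
    {"region": "EU", "items": ["a", "b", "c"]},
    {"region": "US", "items": ["d", "e"]},
    {"region": "EU", "items": ["f"]},
]


def is_eu(doc):
    # The predicate reads only "region", never the unwound "items" field.
    return doc["region"] == "EU"


# Naive order: unwind everything, then filter.
unwound_all = unwind(orders, "items")
late_result = match(unwound_all, is_eu)

# Reordered: filter first, then unwind only the survivors.
filtered = match(orders, is_eu)
early_result = unwind(filtered, "items")

print(len(unwound_all))             # 6 intermediate documents
print(len(early_result))            # 4 intermediate documents
print(late_result == early_result)  # True
```

Same answer, fewer intermediate documents. If the predicate did inspect items, the two orders could disagree, which is presumably the part that needed the formal model.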
I can just see the next all-hands meeting now. The VPs will be on stage, beaming, presenting this paper as proof of their commitment to engineering excellence. What they won't mention is that this entire exercise was only necessary because the original design philosophy was "whatever works by Friday."
Can't wait for the 2030 paper that provides a formal mathematical model for why our clustered index still randomly drops writes under load. Truly revolutionary.