Where database blog posts get flame-broiled to perfection
Ah, yes. Another blog post explaining why a database's "surprising" and "flexible" behavior is actually a brilliant, index-friendly design choice and not, you know, a bug with a PhD. Reading the phrase "querying them can be confusing because a field might be a scalar value in one document and an array in another" is already triggering my fight-or-flight response. It’s the same soothing tone my VP of Engineering used before explaining why our "infinitely scalable" key-value store couldn't handle a simple COUNT(*) without falling over, and that our new weekend project was to re-implement analytics from scratch. Fun times.
I love the premise here. We start with a little jab at good old Oracle and SQL for having, god forbid, different settings for sorting and comparison. How quaint. How… configurable. But don’t worry, MongoDB is here to be consistent. Except, you know, when it’s not. And when it’s not, it’s not a bug, it’s a feature of its advanced, multi-key indexing strategy. Of course it is.
Let's dive into the fruit salad of an example, because nothing screams "enterprise-ready" like sorting an array of single characters. The core of this masterpiece is the admission that sorting and comparing arrays are two completely different operations with different results.
Comparisons evaluate array elements from left to right until a difference is found, while sorting uses only a single representative value from the array.
My soul just left my body. So, if I ask the database for everything > ['p', 'i', 'n', 'e'] and then ask it to sort by that same field, the logic used for the filter is completely abandoned for the sort. This isn't a "different semantic approach"; it's a landmine. I can already picture the bug report: "Ticket #8675309: Pagination is broken and showing duplicate/missing results on page 2." And I'll spend six hours debugging it on a Saturday, fueled by lukewarm coffee and pure spite, only to find this blog post and realize the database is just gleefully schizophrenic by design.
And then we get this absolute gem:
⚠️ Ascending and descending sorts of arrays differ beyond direction. One isn't the reverse of the other.
I... what? I have to stop. This is a work of art. This sentence should be framed and hung in every startup office. It’s the database equivalent of "the exit is not an emergency exit." You’re telling me that ORDER BY foo ASC and ORDER BY foo DESC aren't just mirror images? That the fundamental expectation of sorting built up over 50 years of computer science is just a suggestion here? My PTSD from that "simple" Cassandra migration is kicking in. I remember them saying things like, "eventual consistency is intuitive once you embrace it." It's the same energy.
But don't worry! If you want predictable, sane behavior, you can just write this tiny, simple, perfectly readable aggregation pipeline:
db.fruits.aggregate([
{ $match: { "arr": { $gt: ["p","i","n","e"] } } },
{ $addFields: {
mySort: { $reduce: {
input: "$arr",
initialValue: "",
in: { $concat: ["$$value", "$$this"] }
}}
} },
{ $sort: { mySort: 1 } },
{ $project: { _id: 1, txt: 1, mySort: 1 } }
]);
Oh, perfect. Just casually calculate a new field at query time for every matching document to do what ORDER BY does in every other database on the planet. I’m sure that will be incredibly performant when we're not sorting 16 fruits, but 16 million user event logs. This isn't a solution; it's a cry for help spelled out in JSON.
The best part is the triumphant conclusion about indexing. Look at all these stats! totalKeysExamined: 93, dupsDropped: 77, nReturned: 16. We’re so proud that our index is so inefficient that we have to scan six times more keys than we return, all for the privilege of a sort order that makes no logical sense. This is a feature. This is why we have synergy and are disrupting the paradigm. We've optimized for the index, not for the user, and certainly not for the poor soul like me who gets the PagerDuty alert when the SORT stage runs out of memory and crashes the node.
So, thank you for this clarification. I’ll be saving it for my post-mortem in six months. The title will be: "How a 'Minor' Sort Inconsistency Led to Cascading Failures and Data Corruption." But hey, at least the query that brought down the entire system was, technically, very index-friendly.