Where database blog posts get flame-broiled to perfection
Alright, pull up a chair. Let me get my emergency-caffeine mug for this.
Ah, another blog post about how MongoDB "simplifies" things. That's fantastic. It simplifies mapping your application object directly to a data structure that will eventually become so unwieldy and deeply nested it develops its own gravitational pull. I love this. It’s my favorite genre of technical fiction, right after "five-minute zero-downtime migration."
The author starts with this adorable little two-document collection in a MongoDB Playground. A playground. That's cute. It’s a safe, contained space where your queries run in milliseconds and memory usage is a theoretical concept. My production cluster, which is currently sweating under the load of documents with 2,000-element arrays that some genius decided were a "rich document model," doesn't live in a playground. It lives in a perpetual state of fear.
The best part is where they "discover" the problem. You can't just group by team.memberId. Oh no! It tries to group by the entire array. Who could have possibly foreseen this? It's almost as if you've abandoned a decades-old, battle-tested relational model for a structure that requires you to perform complex pipeline gymnastics to answer a simple question: "Who worked on what?"
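For anyone who has not stepped on this rake yet, here is a minimal sketch of the surprise in plain JavaScript, no database required. The two project documents and the groupCount helper are made up for illustration (the original post's data is not reproduced here), and the helper only imitates how a $group stage treats a key that resolves to an array. It is not MongoDB's implementation.

```javascript
// Hypothetical sample data: two projects, each with an embedded team array.
const projects = [
  { _id: 1, name: "Atlas", team: [{ memberId: "ann" }, { memberId: "bob" }] },
  { _id: 2, name: "Borealis", team: [{ memberId: "bob" }, { memberId: "cy" }] },
];

// Imitates a $group-style count: documents with the same key share a bucket.
// Keys are serialized with JSON.stringify so arrays compare by value.
function groupCount(docs, keyFn) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(keyFn(doc));
    buckets.set(key, (buckets.get(key) || 0) + 1);
  }
  return buckets;
}

// "team.memberId" on an array of subdocuments resolves to an array of ids,
// so the grouping key is the whole array, not each individual member.
const byWholeArray = groupCount(projects, (p) => p.team.map((m) => m.memberId));

console.log([...byWholeArray.keys()]);
```

Two documents in, two buckets out, one per distinct team roster. Bob is on both projects and still never gets a bucket of his own, which is exactly the question we were trying to answer.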
And the grand solution? The silver bullet? $unwind.
Let me tell you about $unwind. It’s presented here as a handy little tool, a "bridge" to make things feel like SQL again. In reality, $unwind is a hand grenade you toss into your aggregation pipeline. On your little two-document example, it’s charming. It creates, what, six or seven documents in the pipeline? Adorable.
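To make the fan-out concrete, here is a rough sketch of what the stage does, again in plain JavaScript. The unwind function and the sample collection are my own stand-ins, assumed for illustration. The function covers only the basic array case with $unwind's default handling of missing or empty arrays, and it says nothing about how mongod actually executes the stage.

```javascript
// Imitates the basic behaviour of { $unwind: "$<field>" }: one output document
// per array element, with the array replaced by that single element.
// A missing or empty array produces no output, matching $unwind's default.
// This sketch only handles array fields.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const items = Array.isArray(doc[field]) ? doc[field] : [];
    for (const item of items) {
      // Every other field of the document is carried along into each copy.
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

// Hypothetical sample data, not the original post's collection.
const sampleProjects = [
  { _id: 1, name: "Atlas", team: [{ memberId: "ann" }, { memberId: "bob" }, { memberId: "cy" }] },
  { _id: 2, name: "Borealis", team: [{ memberId: "bob" }, { memberId: "dee" }, { memberId: "eli" }, { memberId: "fay" }] },
];

const unwound = unwind(sampleProjects, "team");

// With one member per document, counting projects per member becomes trivial.
const projectsPerMember = {};
for (const doc of unwound) {
  const id = doc.team.memberId;
  projectsPerMember[id] = (projectsPerMember[id] || 0) + 1;
}

console.log(unwound.length, projectsPerMember);
```

Two documents become seven, and each of the seven drags along every non-array field of its parent. The output count is the sum of the array lengths, so the arithmetic stops being cute the moment the arrays do.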
Now, let's play a game. Let's imagine this isn't a toy project. Let's imagine it's our actual user data. One of our power users, let's call her "Enterprise Brenda," is a member of 4,000 projects. Her document isn't a neat 15 lines of JSON; it's a 14-megabyte monster. Now, a junior dev, fresh off reading this very blog post, writes an analytics query for the new C-level dashboard. It contains a single, innocent-looking stage: { $unwind: "$team" }.
I can see it now. It’ll be 3:15 AM on the Saturday of a long holiday weekend.
The dashboard query will $unwind Enterprise Brenda's 14MB document with its 4,000-element projects array. Memory will climb until something shoots the mongod process in the head.
And how will I know this is happening? I won't. Because the monitoring tools to see inside an aggregation pipeline to spot a toxic $unwind are always the last thing we get budget for. We have a million graphs for CPU and disk I/O, but "memory usage per-query" is a feature request on a vendor's Jira board with 300 upvotes and a status of "Under Consideration."
"In practice, $lookup in MongoDB is often compared to JOINs in SQL, but if your fields live inside arrays, a join operation is really $unwind followed by $lookup."
This sentence should be printed on a warning label and slapped on the side of every server running Mongo. This isn't a "tip," it's a confession. You’re telling me that to replicate the most basic function of a relational database, I have to first detonate my document into thousands of copies of itself in memory? Revolutionary. I'll add that to my collection of vendor stickers for databases that don't exist anymore. It'll go right between my one for RethinkDB ("Realtime, scalable, and now defunct") and my prized Couchbase sticker ("It's like Memcached and MongoDB had a baby, and abandoned it").
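For the record, the pattern being confessed to looks roughly like the pipeline below. This is a sketch under assumptions, not the article's actual query: the members collection, the member output field, and the idea that team.memberId matches members._id are all hypothetical names I picked to have something to point at. The stage operators themselves ($unwind, $lookup, $group, $addToSet, $first) are standard aggregation operators.

```javascript
// Hypothetical names throughout: the "members" collection, the "member" output
// field, and the assumption that team.memberId matches members._id.
const whoWorkedOnWhat = [
  // 1. Fan out: one pipeline document per element of the team array.
  { $unwind: "$team" },
  // 2. Join each fanned-out document against the (assumed) members collection.
  //    The matches land in an array field named "member".
  {
    $lookup: {
      from: "members",
      localField: "team.memberId",
      foreignField: "_id",
      as: "member",
    },
  },
  // 3. Collapse back down: one result per member with their project names.
  {
    $group: {
      _id: "$team.memberId",
      member: { $first: "$member" },
      projects: { $addToSet: "$name" },
    },
  },
];

// List the stage operators in order.
const stageOrder = whoWorkedOnWhat.map((stage) => Object.keys(stage)[0]);
console.log(stageOrder.join(" -> "));
```

In mongosh this would presumably be handed to something like db.projects.aggregate(whoWorkedOnWhat), where projects is also an assumed collection name. Note the order: the fan-out happens first, so the join runs once per array element rather than once per document.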
So, thank you for this article. It's a perfect blueprint for my next incident post-mortem. You've done a great job showing how to solve a simple problem in a way that is guaranteed to fail spectacularly at scale. Keep up the good work. I'll just be over here, pre-caffeinating for that inevitable holiday page. You developers write the code, but I'm the one who has to live with it.