🔥 The DB Grill 🔥

Where database blog posts get flame-broiled to perfection

Why MongoDB skips indexes when flattening or renaming sub-document fields in $project before $match aggregation pipeline
Originally from dev.to/feed/franckpachot
August 4, 2025 • Roasted by Patricia "Penny Pincher" Goldman

Alright, so I just finished reading this article about MongoDB being a "general-purpose database" with its flexible schemas and efficient indexing. Efficient indexing, they say! My eyes nearly rolled right out of my head and bounced off the conference room table. Because what I see here isn't efficiency; it's a meticulously crafted financial black hole designed to suck every last penny out of your budget under the guise of "innovation" and "agility."

Let's dissect this, shall we? They start by telling us their query planner optimizes things, but then, in the very next breath, they're explaining how their "optimizer transformations" don't work like they do in those quaint, old-fashioned SQL databases. And why? Because of this glorious flexible schema! You know, the one that lets you shove any old garbage into your database without a moment's thought about structure, performance, or, you know, basic data integrity. It's like a hoarder's attic, but for your critical business data.

The real gem, though, is when they calmly explain that if you dare to rename a JSON dotted path in a $project stage before you filter, your precious index is magically ignored, and you get a delightful COLLSCAN. A full collection scan! On a large dataset, that's not just slow; that's the sound of our cloud bill screaming like a banshee and our customers abandoning ship! They build up this beautiful index, then tell you that if you try to make your data look halfway presentable for a query, you've just kicked the tires off your supercar and are now pushing it uphill. And their solution? "Oh, just remember to $match first, then $project later!" Because who needs intuitive query design when you can have a secret handshake for basic performance? This isn't flexibility; it's a semantic minefield laid specifically to trap your developers, drive up their frustration, and ultimately, drive up your operational costs.
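To make the secret handshake concrete, here's a toy sketch of the two pipeline orderings (collection, field names, and the e-mail address are all assumed for illustration — and the `can_use_index` helper is my caricature of the planner, not the real MongoDB optimizer):

```python
# Toy illustration, NOT the real MongoDB query planner: an index on a dotted
# path can only be used if a $match on the *original* path appears before any
# $project that renames it away.

slow = [
    {"$project": {"email": "$contact.email"}},   # renames contact.email -> email
    {"$match": {"email": "alice@example.com"}},  # matches the new name: COLLSCAN
]

fast = [
    {"$match": {"contact.email": "alice@example.com"}},  # hits the index first
    {"$project": {"email": "$contact.email"}},           # reshape afterwards
]

def can_use_index(pipeline, indexed_path):
    """Caricature of the planner: does a $match on the indexed dotted path
    appear before any $project stage?"""
    for stage in pipeline:
        if "$project" in stage:
            return False  # field names may have changed; planner gives up
        if "$match" in stage:
            return indexed_path in stage["$match"]
    return False

print(can_use_index(slow, "contact.email"))  # False -> full collection scan
print(can_use_index(fast, "contact.email"))  # True  -> index scan
```

Same data, same filter, same result set — the only difference is stage order, and one of them sets your cloud bill on fire.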

They wax poetic about how you "do not need to decide between One-to-One or One-to-Many relationships once for all future insertions" and how it "avoids significant refactoring when business rules change." Translation: You avoid upfront design by deferring all the complexity into an inscrutable spaghetti-ball data model that will require a team of their highly-paid consultants to untangle when you inevitably need to query it efficiently. And did you see the example with the arrays of arrays? Customer C003 has emails that are arrays within arrays! Trying to query that becomes a logic puzzle worthy of a Mensa convention. This isn't "accommodating changing business requirements"; it's accommodating chaos.
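In case the Mensa puzzle sounds exaggerated, here's a minimal sketch of what a document like C003 might look like (the exact shape and addresses are my assumption, reconstructed from the article's description) and why a naive membership check whiffs:

```python
# Hypothetical shape of the article's customer C003: emails as arrays
# within arrays.
c003 = {"_id": "C003", "emails": [["c3@a.com", "c3@b.com"], ["c3@c.com"]]}

# A naive membership test on the top-level array misses nested values,
# because the members are lists, not strings:
print("c3@c.com" in c003["emails"])  # False

# You have to flatten one level first (which is what an $unwind stage
# does on the server side):
flat = [email for group in c003["emails"] for email in group]
print("c3@c.com" in flat)  # True
```

That's the "flexibility" tax: every query against that field has to know, forever, that somebody once nested an array inside an array.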

So, let's talk about the true cost of embracing this kind of "flexibility." Because they'll trot out some dazzling ROI calculation, promising the moon and stars for your initial license fee or cloud consumption. But let's get real.

First, your initial investment. Let's be generous and say it's a cool $500,000 for licenses or cloud credits for a mid-sized operation. Peanuts, right?

Then, the migration costs. You think you're just moving data? Oh no, you're refactoring every single piece of code that interacts with the database. You're learning their unique syntax, their peculiar aggregation pipeline stages, and, crucially, all the ways to avoid getting a COLLSCAN. We're talking developers tearing their hair out for six months, easily. That's $250,000 in lost productivity and developer salaries, minimum.

Next, training. Every single developer, every single data analyst, needs to be retrained on this "intuitive" new way of thinking. They'll need to understand why $match before $project is a religious rite. That's another $100,000 in courses, certifications, and bewildered team leads trying to explain array-of-array semantics.

And then, the pièce de résistance: the inevitable consultants. Because when your queries are grinding to a halt, and your team can't figure out why their "intuitive" projections are blowing up the CPU, who do you call? Their Professional Services team, of course! They'll show up, charge you $500 an hour (because they're the only ones who truly understand their own undocumented quirks), and spend three months explaining that you just needed to reshape your data with a $unwind stage you've never heard of. That's another $300,000 right there, just to make their "flexible" database perform basic operations.

And the ongoing operational cost? With all those COLLSCANs happening because someone forgot the secret handshake, your cloud compute costs will skyrocket. You'll scale horizontally, throw more hardware at it, and watch your margins evaporate faster than an ice cube in July. That's easily $150,000 more per year, just to run the thing inefficiently.

So, let's tally it up, shall we? $500,000 upfront for licenses or cloud credits, $250,000 in migration and refactoring, $100,000 in retraining, $300,000 for the consultants, and $150,000 a year in compute burned on COLLSCANs.

That's a grand total of $1,300,000 in the first year alone, for a solution that promises "flexibility" but delivers only hidden complexity and a license to print money for the vendor. They promise ROI, but all I see is R.O.I.P. for our budget. This isn't a database; it's a monument to technical debt wrapped in pretty JSON.

My prediction? We'll be explaining to the board why our "revolutionary" new database requires a dedicated team of alchemists and a monthly offering of first-borns to the cloud gods just to find a customer's email address. Mark my words, by next quarter, we'll be serving ramen noodles from the server room while they're off counting their Monopoly cash.