Where database blog posts get flame-broiled to perfection
Alright, let's pour one out for my on-call rotation, because I've just read the future, and it pages at 3 AM on Labor Day weekend.
"A simple example, easy to reproduce," it says. Fantastic. I love these kinds of articles. They’re like architectural blueprints drawn by a kid with a crayon. The lines are all there, but there’s no plumbing, no electrical, and the whole thing is structurally unsound. This isn’t a db<>fiddle, buddy; this is my Tuesday.
Let’s start with the premise, which is already a five-alarm fire. "I have two tables. One is stored on one server, and the other on another." Oh, wonderful! So we're starting with a distributed monolith. Let me guess: they're in different VPCs, one is three patch versions behind the other, and the network connection between them is held together with duct tape and a prayer to the SRE gods. The developer who set this up definitely called it "synergistic data virtualization" and got promoted, leaving me to deal with the inevitable network partition.
And then we get to the proposed solutions. The author, with thirty years of experience, finds MongoDB "more intuitive." That’s the first red flag. "Intuitive" is corporate jargon for "I didn't have to read the documentation on ACID compliance."
He presents this beautiful, multi-stage aggregation pipeline. It’s so... elegant. So... declarative. He says it’s "easier to code, read, and debug." Let's break down this masterpiece of future outages, shall we?
$unionWith: Ah yes, let's just casually merge two collections over a network connection that's probably flapping. What's the timeout on that? Who knows! Is it logged anywhere? Nope! Can I put a circuit breaker on it? Hah! It's the database equivalent of yelling into the void and hoping a coherent sentence comes back.

$unwind: My absolute favorite. Let's take a nice, compact document and explode it into a million tiny pieces in memory. What could possibly go wrong? It's fine with four rows of sample data. Now, let's try it with that one user who has 50,000 items in their cart because of a front-end bug. The OOM killer sends its regards.

$group and $push... twice: So we explode the data, do some math, and then painstakingly rebuild the JSON object from scratch. It's like demolishing a house to change a lightbulb. This isn't a pipeline; it's a Rube Goldberg machine for CPU cycles.

I can see it now. The query runs fine for three weeks. Then, at the end of the quarter, marketing runs a huge campaign. The data volume triples. This "intuitive" pipeline starts timing out. It consumes all the available memory on the primary. The replica set fails to elect a new primary because they're all choking on the same garbage query. My phone buzzes. The alert just says "High CPU." No context. No query ID. Just pain.
And don't think I'm letting PostgreSQL off the hook. This SQL monstrosity is just as bad, but in a different font. We've got CROSS JOIN LATERAL on a jsonb_array_elements call. It's a resume-driven-development special. It's the kind of query that looks impressive on a whiteboard but makes the query planner want to curl up into a fetal position and cry. You think the MongoDB query was a black box? Wait until you try to debug the performance of this thing. The EXPLAIN plan will be longer than the article itself and will basically just be a shrug emoji rendered in ASCII art.
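For anyone who hasn't had the pleasure, the Postgres shape under discussion looks roughly like this. The table and column names (orders, doc) are hypothetical; the point is that jsonb_array_elements expands each array element into its own row, so the row count multiplies exactly the way $unwind does.

```python
# Hypothetical sketch of the CROSS JOIN LATERAL + jsonb_array_elements
# pattern. Table/column names are invented; held as a string since there's
# no Postgres handy.
QUERY = """
SELECT o.id,
       item->>'sku'        AS sku,
       (item->>'qty')::int AS qty
FROM orders o
CROSS JOIN LATERAL jsonb_array_elements(o.doc->'items') AS item
"""

# Debugging starts with the planner's side of the story:
#   EXPLAIN (ANALYZE, BUFFERS) <query>
# which is where the shrug emoji in ASCII art usually lives.
```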
And now we have the "new and improved" SQL/JSON standard. Great. Another way to do the exact same memory-hogging, CPU-destroying operation, but now it's "ANSI standard." That'll be a huge comfort to me while I'm trying to restore from a backup because the write-ahead log filled the entire disk.
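The standard spelling, for completeness, is JSON_TABLE from SQL:2016 (it landed in PostgreSQL 17). Same hypothetical orders/doc schema as above, same row explosion, different syntax:

```python
# Hypothetical JSON_TABLE equivalent of the lateral-join query.
# SQL:2016 syntax; supported in PostgreSQL 17+. Names are invented.
QUERY = """
SELECT o.id, jt.sku, jt.qty
FROM orders o,
     JSON_TABLE(
         o.doc, '$.items[*]'
         COLUMNS (
             sku TEXT    PATH '$.sku',
             qty INTEGER PATH '$.qty'
         )
     ) AS jt
"""
```

Standard or not, the execution cost is the same unnest-everything pattern.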
But you know what's missing from this entire academic exercise? The parts that actually matter.
Where's the section on monitoring the performance of this pipeline? Where are the custom metrics I need to export to know if $unwind is about to send my cluster to the shadow realm? Where's the chapter on what happens when the source JSON has a malformed field because a different team changed the schema without telling anyone?
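That chapter doesn't exist, so here's the two-minute version I end up writing every time. A minimal sketch, stdlib only: the callable, the query_id, and the threshold are all hypothetical stand-ins for however you actually invoke the query. The whole point is putting a query ID and a duration in the same log line, so a "High CPU" alert has something to correlate with.

```python
import logging
import time

log = logging.getLogger("aggregation")

def timed_aggregate(run_pipeline, *, slow_ms=1000.0, query_id="orders-rollup"):
    """Run an aggregation callable and log the context the pager never has.

    run_pipeline and query_id are hypothetical: stand-ins for whatever
    actually executes the query in your stack.
    """
    start = time.monotonic()
    try:
        return run_pipeline()
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        # Warn on slow runs so the alert carries a query ID, not just "High CPU".
        level = logging.WARNING if elapsed_ms >= slow_ms else logging.INFO
        log.log(level, "query_id=%s duration_ms=%.1f", query_id, elapsed_ms)
```

It's crude, but it's the difference between an alert that says "High CPU" and one that says which query to kill.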
It's always an afterthought. They build the rocket ship, but they forget the life support. They promise a "general-purpose database" that can solve any problem, but they hand you a box of parts with no instructions and the support line goes to a guy who just reads the same marketing copy back to you.
This whole blog post is a perfect example of the problem. It's a neat, tidy solution to a neat, tidy problem that does not exist in the real world. In the real world, data is messy, networks are unreliable, and every "simple" solution is a future incident report waiting to be written.
I'll take this article and file it away in my collection. It’ll go right next to my laptop sticker for RethinkDB. And my mug from Compose.io. And my t-shirt from Parse. They all made beautiful promises, too. This isn't a solution; it's just another sticker for the graveyard.