Where database blog posts get flame-broiled to perfection
Alright, settle down, kids. Let me put down my coffee (the kind that's brewed strong enough to dissolve a floppy disk) and read this... manifesto. I swear, I've seen more complex logic on a punch card.
So, let me get this straight. You've discovered that there are different ways to join data. And that, get this, one way might be faster than another depending on the situation. Groundbreaking. Truly. I haven't been this shocked since they told me we could store more than 80 characters on a single line. This whole article is like watching a toddler discover his own feet and calling it a breakthrough in bipedal locomotion.
The author starts with a treatise on join algorithms like he's cracking the Enigma code. Nested loop joins, hash joins... Son, we were debating the finer points of hash bucket overflow in DB2 on a System/370 mainframe while your parents were still trying to figure out how to program a VCR. You're talking about cardinality estimates? Back in my day, we estimated cardinality by weighing the boxes of punch cards. It was more accurate than half the query planners I see today.
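For the kids in the back, here is roughly what a plain nested loop join does, as a minimal JavaScript sketch. It assumes both inputs are small in-memory arrays; `nestedLoopJoin` and the sample rows are hypothetical illustrations, not code from the article. Only the `profileID`/`ID` field names are borrowed from its example.

```javascript
// Naive nested loop join over two in-memory arrays (hypothetical data shapes).
// Every outer row is compared against every inner row, so the work is
// outer.length * inner.length comparisons no matter how many rows match.
function nestedLoopJoin(outer, inner, outerKey, innerKey) {
  const result = [];
  let comparisons = 0;
  for (const o of outer) {
    for (const i of inner) {
      comparisons++;
      if (o[outerKey] === i[innerKey]) {
        result.push({ ...o, ...i }); // merge the matching pair into one row
      }
    }
  }
  return { result, comparisons };
}

// Hypothetical sample data; field names borrowed from the article's $lookup example.
const users = [
  { name: "ann", profileID: 1 },
  { name: "bob", profileID: 2 },
  { name: "cat", profileID: 1 },
];
const profiles = [
  { ID: 1, role: "admin" },
  { ID: 2, role: "user" },
];

const { result, comparisons } = nestedLoopJoin(users, profiles, "profileID", "ID");
console.log(result.length, comparisons); // 3 6
```

The comparison count is simply the product of the two input sizes, which is why a planner that badly underestimates cardinality can wander into this plan on inputs far bigger than it expected.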
And this... this $lookup syntax. My god. It looks like a cat walked across a keyboard full of special characters.
{
  $lookup: {
    from: "profiles",
    localField: "profileID",
    foreignField: "ID",
    as: "profile"
  }
}
You call that a query? That's a cry for help. I've seen cleaner COBOL code written during a power surge. We had outer joins back in the 80s, and SQL-92 gave them a keyword that was elegant, simple, powerful. It was called LEFT JOIN. Maybe you've heard of it.
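MongoDB's own documentation describes `$lookup` as a left outer join that adds an array field to each input document. Here is a minimal JavaScript sketch of that documented behavior for the simple equality form, assuming in-memory arrays. The `lookup` helper is hypothetical: it takes the foreign collection as an array rather than a collection name, and it ignores MongoDB's rules for null, missing, and array-valued fields.

```javascript
// Emulates the documented shape of an equality $lookup in plain JavaScript:
// every input document is kept (left outer semantics), and its matches are
// attached as an array under the `as` field, empty when nothing matches.
// Simplification (assumption): `from` is an in-memory array, and MongoDB's
// null/missing-field and array-field matching rules are not modeled.
function lookup(input, { from, localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc,
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical sample documents.
const usersColl = [
  { name: "ann", profileID: 1 },
  { name: "dan", profileID: 9 }, // no matching profile
];
const profilesColl = [{ ID: 1, role: "admin" }];

const joined = lookup(usersColl, {
  from: profilesColl,
  localField: "profileID",
  foreignField: "ID",
  as: "profile",
});
console.log(JSON.stringify(joined));
// [{"name":"ann","profileID":1,"profile":[{"ID":1,"role":"admin"}]},{"name":"dan","profileID":9,"profile":[]}]
```

In SQL terms this is close to `SELECT ... FROM users LEFT JOIN profiles ON users.profileID = profiles.ID`, except that SQL returns one flat row per match (and a NULL-padded row for dan) while `$lookup` nests the matches in an array. Following it with `$unwind` and `preserveNullAndEmptyArrays: true` gets you back to the flat LEFT JOIN shape.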
The author then runs a test on a dataset so small I could probably fit it on a single reel of magnetic tape. Twenty-six users and four profiles. He then "scales it up" by cloning the same records 10,000 times. That's not scaling, that's just hitting CTRL+C/CTRL+V until your finger gets tired. Every key still shows up in exactly the same proportion, so it tells you nothing about skew or real-world data distribution. It's like testing a battleship by seeing if it floats in a bathtub.
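Here's why the copy-paste benchmark proves so little: duplicating rows multiplies the row count but leaves every key's share of the data untouched. A small JavaScript sketch, assuming a hypothetical round-robin spread of the 26 users over 4 profile IDs (the article's actual assignment isn't reproduced here); `cloneRows` and `keyShares` are illustrative helpers.

```javascript
// Copy the whole dataset `times` times, the "scale it up" approach.
function cloneRows(rows, times) {
  const out = [];
  for (let t = 0; t < times; t++) {
    for (const r of rows) out.push({ ...r });
  }
  return out;
}

// Fraction of rows carrying each distinct value of `key`.
function keyShares(rows, key) {
  const counts = new Map();
  for (const r of rows) counts.set(r[key], (counts.get(r[key]) || 0) + 1);
  const shares = {};
  for (const [k, c] of counts) shares[k] = c / rows.length;
  return shares;
}

// Hypothetical sample: 26 users spread round-robin over profile IDs 0..3.
const small = Array.from({ length: 26 }, (_, n) => ({ userID: n, profileID: n % 4 }));
const big = cloneRows(small, 10000);

console.log(big.length); // 260000
console.log(Object.keys(keyShares(big, "profileID")).length); // 4
console.log(JSON.stringify(keyShares(small, "profileID")) === JSON.stringify(keyShares(big, "profileID"))); // true
```

Ten thousand times the rows, the same four keys, the same proportions. A real workload would bring new keys, skewed keys, and cold pages; none of that shows up here.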
And the big reveal!
Discovery #1: The Indexed Loop Join. You're telling me that if you create an index, the database... uses it? And that looking up a key in an index is faster than scanning the whole damn table for every single row? Hold the phone! Someone alert the press! I remember waiting six hours for an index to build on a multi-gigabyte table, listening to the DASD platters scream, just so the nightly batch job wouldn't take until next Thursday. And you're presenting "use an index" as some kind of advanced optimization technique.
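An indexed loop join, stripped to the bone: build an index on the inner side's join key once, then seek into it for each outer row instead of rescanning the table. In this hypothetical JavaScript sketch a sorted array with binary search stands in for a real B-tree index; `buildIndex`, `lowerBound`, `indexedLoopJoin`, and the sample rows are all made up for illustration.

```javascript
// Build a sorted index of [keyValue, rowPosition] pairs on the inner input.
// A sorted array stands in for a B-tree here (assumption, not a real engine).
function buildIndex(rows, key) {
  return rows
    .map((r, pos) => [r[key], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search: position of the first index entry whose key is >= value.
function lowerBound(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function indexedLoopJoin(outer, inner, outerKey, innerKey) {
  const index = buildIndex(inner, innerKey); // built once, reused for every outer row
  const result = [];
  for (const o of outer) {
    // Seek to the first candidate, then walk forward while the keys still match.
    for (let p = lowerBound(index, o[outerKey]); p < index.length && index[p][0] === o[outerKey]; p++) {
      result.push({ ...o, ...inner[index[p][1]] });
    }
  }
  return result;
}

// Hypothetical sample rows.
const outerRows = [
  { name: "ann", profileID: 2 },
  { name: "bob", profileID: 7 },
  { name: "cat", profileID: 5 },
];
const innerRows = [
  { ID: 5, role: "guest" },
  { ID: 2, role: "admin" },
  { ID: 7, role: "user" },
];

const joinedRows = indexedLoopJoin(outerRows, innerRows, "profileID", "ID");
console.log(joinedRows.map((r) => `${r.name}:${r.role}`).join(",")); // ann:admin,bob:user,cat:guest
```

Each outer row now costs a logarithmic seek plus one step per match, rather than a full pass over the inner input. That is the entire trick.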
Discovery #2: The Hash Join. You found that if the lookup table is small, it's faster to load it into memory and build a hash table than to repeatedly scan the disk. Welcome to 1985, kid. We called that a good idea then, and it's still a good idea now. It's not a revolutionary HashJoin strategy, it's just... common sense. The only difference is our "in-memory hash table" was limited to 640K of RAM and we had to pray it didn't spill over into the space reserved for the operating system.
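And the 1985 special, the in-memory hash join: build a hash table on the smaller input, then stream the larger one past it, probing once per row. A minimal JavaScript sketch using a `Map` as the hash table; `hashJoin` and the 26-users-over-4-profiles sample data are hypothetical.

```javascript
// In-memory hash join: build on the smaller input, probe with the larger one.
function hashJoin(left, right, leftKey, rightKey) {
  // Pick the smaller side as the build input so the hash table stays small.
  const leftIsBuild = left.length <= right.length;
  const [build, probe] = leftIsBuild ? [left, right] : [right, left];
  const [buildKey, probeKey] = leftIsBuild ? [leftKey, rightKey] : [rightKey, leftKey];

  // Build phase: one bucket (array of rows) per distinct key.
  const table = new Map();
  for (const b of build) {
    const bucket = table.get(b[buildKey]);
    if (bucket) bucket.push(b);
    else table.set(b[buildKey], [b]);
  }

  // Probe phase: a single hash lookup per probe row, no rescans.
  const result = [];
  for (const p of probe) {
    for (const b of table.get(p[probeKey]) || []) {
      result.push({ ...p, ...b });
    }
  }
  return { result, tableSize: table.size };
}

// Hypothetical sample: 26 users over 4 profiles.
const people = Array.from({ length: 26 }, (_, n) => ({ userID: n, profileID: n % 4 }));
const roles = [0, 1, 2, 3].map((id) => ({ ID: id, role: `role${id}` }));

const { result: hashed, tableSize } = hashJoin(people, roles, "profileID", "ID");
console.log(hashed.length, tableSize); // 26 4
```

This assumes the build side fits in memory. When it doesn't, real engines partition both inputs and spill to disk (the Grace and hybrid hash join family) instead of praying.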
And my absolute favorite part:
Unlike SQL databases, where the optimizer makes all decisions but can also lead to surprises, MongoDB shifts responsibility to developers.
Let me translate that for you from corporate-speak into English: "Our query optimizer is dumber than a bag of hammers, so it's your problem now. We're calling this developer empowerment."
This isn't a feature. This is you doing the job the database is supposed to be doing for you. You have to "design your schema with join performance in mind," "understand your data," "test different strategies," and "measure performance." You've just perfectly described the job of a Database Administrator. A job that these NoSQL systems were supposed to make obsolete. Congratulations, you've reinvented my career, only you've made it more tedious and given it a worse title.
So when you hear someone say "joins are slow," maybe the real problem isn't the join. Maybe it's that you're using a glorified document shredder that makes you manually reassemble the pieces, and then you write a blog post bragging about the predictable performance of using staples instead of glue.
You haven't found some new paradigm. You've just taken a forty-year-old concept, slapped a JSON wrapper on it, and sold it back to a generation that thinks a database schema is a form of oppression. Now if you'll excuse me, I have some tapes to rotate. They aren't "web-scale," but at least they work.