Where database blog posts get flame-broiled to perfection
Alright team, gather 'round the virtual water cooler. I just read this little love letter to the query planner, and my pager-induced twitch is acting up again. It’s a beautiful, academic exploration of a feature that sounds great on a slide deck but is an absolute grenade in practice. Let me break down this masterpiece of “theoretical performance” for you.
First, we have the Profoundly Perplexing Planner. This blog post spends half its word count reverse-engineering a query planner that gives out "bonuses" like a game show host. An EOF bonus? Are we optimizing a database or handing out participation trophies? The planner sees three nearly identical ways to solve a problem, picks one essentially at random because it happened to finish a microsecond faster in a sterile trial run, and declares it the winner. This isn't intelligent design; it's a coin flip with extra steps, and my on-call schedule is the one that pays the price when it inevitably guesses wrong on real, skewed production data.
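To make the absurdity concrete, here's a toy sketch of trial-run plan ranking. Everything here is invented for illustration: the function names, the weights, the flat `eofBonus`, and the sample candidates are my caricature of how a planner might score plans, not MongoDB's actual scoring code.

```javascript
// Toy plan ranker: score candidates from a short trial run, highest wins.
// All names and numbers are made up for illustration.
function scoreCandidate(stats) {
  // Productivity: results produced per unit of work during the trial.
  const productivity = stats.advanced / Math.max(stats.works, 1);
  // "EOF bonus": a flat reward for any plan that happened to exhaust
  // its entire result set within the trial window -- the participation trophy.
  const eofBonus = stats.reachedEOF ? 1 : 0;
  return 1 + productivity + eofBonus;
}

function pickWinner(candidates) {
  // Highest score wins; ties resolve by enumeration order,
  // i.e. the "coin flip with extra steps".
  return candidates.reduce((best, c) =>
    scoreCandidate(c) > scoreCandidate(best) ? c : best);
}

const plans = [
  { name: "IXSCAN {a:1}", advanced: 101, works: 120, reachedEOF: false },
  { name: "IXSCAN {b:1}", advanced: 101, works: 118, reachedEOF: false },
  { name: "AND_SORTED",   advanced: 90,  works: 150, reachedEOF: true  },
];
console.log(pickWinner(plans).name); // → AND_SORTED
```

Note what happens: the intersection plan did *less* useful work per unit of effort, but it finished its tiny trial set, collected the bonus, and won anyway. Now imagine the trial data is nothing like production data.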
Then there's the showstopper: the internalQueryForceIntersectionPlans parameter. Let me translate that for you from dev-speak to ops-reality. The word "internal" is vendor code for “if you touch this, you are on your own, and your support contract is now a decorative piece of paper.” The author casually enables it for a "test," but I see the future: a well-meaning developer will discover this post, think they’ve found a secret performance weapon, and deploy it. I can't wait to explain that one during the root cause analysis. “So, you’re telling me you enabled a hidden, undocumented flag named ‘force’ in our production environment?”
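For the record, this is roughly what pulling the pin looks like (mongosh syntax; `internalQueryForceIntersectionPlans` is the parameter named in the post, but its exact behavior is version-dependent and unsupported, which is the whole point):

```javascript
// mongosh -- requires a running server; shown for illustration only.
// "internal" parameters are unsupported and can change or vanish
// between versions. Do not do this in production.
db.adminCommand({
  setParameter: 1,
  internalQueryForceIntersectionPlans: true
});
```

One line. That's the entire barrier between a curious developer and an RCA meeting.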
I have to admire the casual mention of AND_HASH and its little memUsage metric. Oh, look, it only used 59KB of memory in this tiny, pristine sample dataset where every document is {a: random(), b: random()}. That's adorable. Now, let's extrapolate that to our production cluster with its sprawling, messy documents and a query that returns a few million keys from the first scan. That memUsage won't be a quaint footnote; it’ll be the OOM killer’s last will and testament, scrawled across my terminal at 3 AM on New Year's Day.
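Back-of-the-envelope, the scaling worry looks like this. Only the 59KB figure comes from the post; the key counts, per-entry overhead, and concurrency are my invented assumptions, since an AND_HASH stage has to buffer every key from its first child scan in memory:

```javascript
// Rough extrapolation of AND_HASH memory use. The ~59KB sample figure is
// from the post; everything else (key counts, per-entry overhead,
// concurrency) is an assumed back-of-the-envelope number.
const sampleKeys = 1000;                 // assumed rows in the toy first scan
const sampleMemBytes = 59 * 1024;        // ~59KB memUsage reported in the post
const bytesPerEntry = sampleMemBytes / sampleKeys;  // ~60 bytes per buffered key

// Production-shaped query: the first scan returns a few million keys,
// and it's never the only query running on the box.
const prodKeys = 5_000_000;
const perQueryBytes = prodKeys * bytesPerEntry;     // ~288 MiB per query
const concurrentQueries = 20;
const totalBytes = perQueryBytes * concurrentQueries;
console.log((totalBytes / 1024 ** 3).toFixed(1) + " GiB of hash tables");
```

Under these assumptions you're staring at multiple GiB of hash-table state for one bad query shape under modest concurrency. The OOM killer doesn't read blog posts.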
My favorite part is the grand conclusion, the dramatic reveal after this entire journey into the database's esoteric internals: just use a compound index. Groundbreaking. They’ve written a thousand-word technical odyssey to arrive at the solution from page one of "Indexing for Dummies." This is the database equivalent of a salesman spending an hour pitching you on a car’s experimental anti-gravity mode, only to conclude with, “But for driving, you should really stick to the wheels.” It reminds me of the sticker on my laptop for "RethinkDB"—they also had some really cool ideas that were fantastic in theory.
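And for anyone who skipped to the end, here is the entire boring fix the odyssey arrives at (mongosh; the collection name is my placeholder, and the `a`/`b` field names come from the post's toy documents):

```javascript
// mongosh -- one compound index makes the whole intersection debate moot.
// The planner gets one obvious plan instead of three coin-flip candidates,
// and there is no hash table to feed.
db.coll.createIndex({ a: 1, b: 1 });
// A query filtering on both a and b can now be answered by a single
// IXSCAN on the compound index -- no AND_HASH, no memUsage footnote.
```

That's it. That's the thousand words.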
So, here’s my prediction. Some hotshot developer, armed with this article, is going to deploy a new "ad-hoc analytics feature" without the right compound index. They'll justify it by saying, "the database is smart enough to use index intersection!" For a few weeks, it'll seem fine. Then, on the first day of a long weekend, a user will run a query with just the right (or wrong) parameters. The planner, in its infinite wisdom, will forgo a simple scan, opt for a "clever" AND_HASH plan, consume every last byte of RAM on the primary node, trigger a failover cascade, and bring the entire application to its knees.
And I'll be there, staring at the Grafana dashboard that looks like a Jackson Pollock painting, adding another vendor sticker to my laptop's graveyard. Back to work.