Where database blog posts get flame-broiled to perfection
Ah, another dispatch from the digital salt mines. One occasionally peers out from the ivory tower to observe the practitioners, and it is always... illuminating. This latest benchmark of MySQL, bless their hearts, is a veritable case study in the industry's profound, almost willful, ignorance of first principles. One would almost be moved to tears, were one capable of such an unquantifiable emotional response.
Let's dissect this little piece of folk art, shall we?
One is immediately struck by the breathless obsession with Queries Per Second. It’s the metric of choice for those who view a database not as a pristine, logical system for the preservation of truth, but as a short-order cook slinging hash browns at maximum velocity. It’s as if Edgar Codd's Twelve Rules were merely gentle suggestions, easily discarded in the frantic race to... what, exactly? Shave a few microseconds off a query that was likely ill-formed to begin with? Such a laudable goal. The post documents a litany of "regressions," where newer, more feature-laden versions are slower. This is presented as a shocking discovery, rather than the perfectly predictable outcome of bolting ever more chrome onto a chassis that was never theoretically sound.
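For those who require the arithmetic spelled out, a back-of-the-envelope sketch follows. The counters are invented, since the post's raw numbers are not reproduced here, but the distinction between throughput and cost per query is the entire point:

```python
# A back-of-the-envelope sketch of the metrics in question, using invented
# numbers (the post's raw counters are not reproduced here). QPS alone says
# nothing about how much CPU each query burns.

queries = 1_000_000      # hypothetical queries completed during the run
wall_seconds = 125.0     # hypothetical wall-clock duration of the run
cpu_seconds = 50.0       # hypothetical CPU time consumed by the server

qps = queries / wall_seconds                       # the headline number: 8,000 QPS
cpu_us_per_query = cpu_seconds / queries * 1e6     # the number that actually regresses

print(f"QPS:           {qps:,.0f}")
print(f"CPU per query: {cpu_us_per_query:,.1f} microseconds")

# A newer version can post the same QPS on an unsaturated machine while
# quietly spending more CPU per query; the regression hides until it doesn't.
```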
The author seems genuinely perplexed that adding complexity increases CPU overhead. It’s rather quaint. Clearly, they've never read Stonebraker's seminal work on the price of feature-rich systems. One doesn't simply bolt on a JSON data type—an affront to First Normal Form, I might add—and expect the carefully balanced machinery of a query optimizer to remain pristine. They are discovering, in the most painful and public way, that more code is not, in fact, "better." It's the inevitable heat death of a system designed by committee rather than by formal logic.
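For the curious, the affront looks roughly like this. A minimal sketch with an invented orders schema (none of these tables appear in the post), contrasting the JSON column with its First Normal Form decomposition:

```python
# Hypothetical DDL contrasting a JSON column with a 1NF decomposition.
# The schema is invented for illustration; neither table appears in the post.

# The fashionable version: a repeating group of line items stuffed into one
# JSON attribute, i.e. a non-atomic value in a single column.
json_flavoured = """
CREATE TABLE orders (
    order_id   BIGINT PRIMARY KEY,
    customer   VARCHAR(64) NOT NULL,
    line_items JSON NOT NULL            -- [{"sku": "A1", "qty": 3}, ...]
);
"""

# The relational version: the repeating group promoted to its own relation,
# every attribute atomic, every fact addressable by the optimizer.
first_normal_form = """
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer VARCHAR(64) NOT NULL
);

CREATE TABLE order_line_items (
    order_id BIGINT NOT NULL,
    sku      VARCHAR(32) NOT NULL,
    qty      INT NOT NULL,
    PRIMARY KEY (order_id, sku),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
"""

if __name__ == "__main__":
    print(json_flavoured)
    print(first_normal_form)
```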
This frantic measurement of "context switches per operation" and "CPU per query" is a classic example of what I call implementation-first thinking. They are so deeply mired in the grubby details of the machine that they can no longer see the Platonic ideal of the relational model. They celebrate a 30% drop in CPU per insert in one version, only to lament a 36% rise in the next, as if they were watching the fickle whims of a pagan god rather than the deterministic result of their own poor choices.
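Since percentages apparently read as omens, here is the deterministic arithmetic, with an arbitrary baseline of one hundred CPU units chosen purely for illustration:

```python
# The compounding arithmetic behind the post's percentages, with an arbitrary
# baseline (100 CPU units per insert) chosen purely for illustration.

baseline = 100.0                                    # hypothetical CPU per insert, old version
after_improvement = baseline * (1 - 0.30)           # the celebrated 30% drop  -> 70.0
after_regression = after_improvement * (1 + 0.36)   # the lamented 36% rise    -> 95.2

net_change = (after_regression - baseline) / baseline
print(f"CPU per insert: {baseline:.1f} -> {after_improvement:.1f} -> {after_regression:.1f}")
print(f"Net change from baseline: {net_change:+.1%}")   # roughly -4.8%

# Relative changes compound; a 36% rise on a reduced base does not undo
# a 30% drop. No pagan god required.
```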
The stated purpose is to hunt for regressions caused by new CPU overhead and mutex contention. They might as well be reading tea leaves. The real regression happened decades ago, when the industry decided that reading papers was optional.
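If one insists on reading tea leaves, one might at least automate the ritual. A sketch of what "searching for regressions" amounts to in practice; the version labels and per-query costs below are invented for illustration, not taken from the post:

```python
# A sketch of regression hunting: compare a per-query cost metric across
# versions and flag any increase beyond a threshold. All numbers are invented.

cpu_us_per_query = {
    "version A": 41.0,
    "version B": 39.5,
    "version C": 44.8,
    "version D": 47.1,
}

REGRESSION_THRESHOLD = 0.05   # flag anything more than 5% worse than its predecessor

versions = list(cpu_us_per_query)
for prev, curr in zip(versions, versions[1:]):
    change = (cpu_us_per_query[curr] - cpu_us_per_query[prev]) / cpu_us_per_query[prev]
    verdict = "REGRESSION" if change > REGRESSION_THRESHOLD else "ok"
    print(f"{prev} -> {curr}: {change:+.1%}  {verdict}")
```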
And of course, this entire exercise is predicated on the flimsy foundation of microbenchmarks. It's the equivalent of judging a symphony by testing the resonant frequency of a single violin string. They run isolated SELECT and INSERT statements in a sterile environment and declare victory or defeat. What of complex, multi-statement transactions? What of logical consistency under contention? What of the "I" in ACID—Isolation? I see no mention of phantom reads or dirty writes. But by all means, let's panic because the read-only-count test regressed by 15%. It's a cargo cult of performance metrics, utterly devoid of meaning.
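Should the practitioners ever tire of counting queries, an isolation check is hardly arcane. A minimal sketch, assuming mysql-connector-python, a local test server, and a placeholder accounts table; none of these names, credentials, or tables come from the post:

```python
# A sketch of the kind of isolation test the post never runs: does a concurrent
# insert appear as a phantom row inside an open REPEATABLE READ transaction?
# Assumes mysql-connector-python, a local server, and a hypothetical test table
# accounts(id INT PRIMARY KEY, balance INT); all names are placeholders.

import mysql.connector

def count_rich_accounts(cursor):
    cursor.execute("SELECT COUNT(*) FROM accounts WHERE balance > 1000")
    return cursor.fetchone()[0]

def phantom_read_check(dsn):
    reader = mysql.connector.connect(**dsn)
    writer = mysql.connector.connect(**dsn)
    try:
        reader.start_transaction(isolation_level="REPEATABLE READ")
        r_cur = reader.cursor()
        first = count_rich_accounts(r_cur)      # establish the read set

        w_cur = writer.cursor()                 # concurrent writer sneaks in a row
        w_cur.execute("INSERT INTO accounts (id, balance) VALUES (999001, 5000)")
        writer.commit()

        second = count_rich_accounts(r_cur)     # re-run inside the same transaction
        reader.commit()

        # Under REPEATABLE READ the two counts should match; a difference is a phantom.
        return first == second
    finally:
        reader.close()
        writer.close()

if __name__ == "__main__":
    dsn = {"host": "127.0.0.1", "user": "bench", "password": "bench", "database": "test"}
    print("no phantoms observed" if phantom_read_check(dsn) else "phantom read detected")
```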
Still, one must commend the effort. It is... thorough. They've produced many charts. So many colourful lines going up and down. It's wonderful that the practitioners are keeping themselves busy. Keep tinkering, little ones. Perhaps one day you'll stumble backwards into a sound design. Or, you know, you could just read a paper.