Where database blog posts get flame-broiled to perfection
Ah, another dispatch from the trenches of "industry practice." One reads this sort of thing not with anger, but with a deep, weary sigh, the kind reserved for a promising student who has decided the study of formal grammars is best advanced by composing limericks. They are so very proud of their benchmarks, so meticulous in their compiler flag comparisons. It’s almost... cute.
But let us, for the sake of pedagogy, examine this artifact. It is a perfect specimen of the modern affliction: the relentless pursuit of "more," with nary a thought for "correct."
One notes with a certain weary amusement the myopic obsession with Queries Per Second. It's as if they've built a phenomenally fast automobile that, by design, occasionally forgets the destination or replaces the driver's family with a bag of wet leaves. 'But look how fast it gets there!' they cry, celebrating a 10% gain from a compiler flag. The 'C' and 'I' in ACID, one must assume, now stand for 'Compiler' and 'Inconsequential'. The purpose of a database, my dear boy, is not to be a firehose of questionable bits.
Then there is the choice of subject: RocksDB. An LSM-tree. Charming. It seems we've abandoned the mathematical elegance of the relational model for what amounts to a cleverly sorted log file. They have gleefully traded Codd's twelve rules for the singular, frantic goal of writing things to disk slightly faster. One imagines Edgar Codd weeping into his seminal 1970 paper, "A Relational Model of Data for Large Shared Data Banks." Clearly, it is no longer on the syllabus, if it ever was.
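For the uninitiated, the "cleverly sorted log file" they so admire can be rendered in a few dozen lines. This is my own toy sketch, not their code; names like `memtable` and `runs` are my invention, and the real machinery (write-ahead log, bloom filters, leveled compaction) is conspicuously absent:

```python
# A deliberately naive LSM-tree sketch: a mutable in-memory buffer that,
# when full, is flushed into immutable sorted runs ("SSTables").
class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}             # mutable in-memory buffer
        self.runs = []                 # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Dump the buffer as a sorted, immutable run.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs newest-first: later writes shadow earlier ones.
        for run in reversed(self.runs):
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge every run into one, keeping only the newest value per key.
        merged = {}
        for run in self.runs:
            merged.update(dict(run))
        self.runs = [sorted(merged.items())]

db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # buffer full: flushed to an immutable run
db.put("a", 3)      # newer value shadows the flushed one
print(db.get("a"))  # -> 3
```

Note what this buys them: writes are append-only and fast, while reads must rummage through an ever-growing pile of runs until compaction tidies up. Hence the fetish for write throughput, and hence the "intermittent" overhead they grumble about later.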
My favorite part is the hand-wringing over 'variance.' They list its sources—compilers, "intermittent" overhead, "noisy neighbors"—as if they were unavoidable forces of nature, like continental drift or the tides.
"The overhead from compaction is intermittent and the LSM tree layout can help or hurt CPU overhead during reads," they write, as though reciting a weather report. A system whose performance is a roll of the dice is not a system; it is a casino. They speak of the CAP theorem as if it were a license to build unpredictable contraptions, rather than a formal trade-off to be navigated with intellectual rigor. Clearly they've never read Stonebraker's seminal work on the matter; they're too busy blaming the cloud's "clever" frequency management.
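If one must gamble, one might at least count the dice. A small illustration of my own, with entirely synthetic numbers: two latency distributions with the *identical* mean, one steady and one punctuated by intermittent "compaction" stalls. The averages they so lovingly chart cannot tell the two apart; the tail can:

```python
import statistics

# Two synthetic latency samples (ms), contrived to share a mean of 1.0:
# one perfectly steady, one mostly fast but with occasional long stalls.
steady  = [1.0] * 1000
stalled = [0.5] * 980 + [25.5] * 20   # (0.5*980 + 25.5*20) / 1000 = 1.0

def p99(samples):
    # Crude nearest-rank 99th percentile; adequate for a sketch.
    s = sorted(samples)
    return s[int(0.99 * len(s)) - 1]

print(statistics.mean(steady), statistics.mean(stalled))  # 1.0 1.0
print(p99(steady), p99(stalled))                          # 1.0 25.5
```

Which is the point: a benchmark that reports only means and throughput has, by construction, averaged the casino out of the picture. The stalls are still there; they have merely been hidden in the divisor.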
And the methodology! An exhaustive treatise on compiler flags. clang+LTO versus gcc. It is a masterclass in rearranging deck chairs on a ship that has gleefully jettisoned its navigational charts in favor of a faster engine. They have produced pages of data to prove that one compiler can paint a flaking wall slightly faster than another, all while ignoring the fact that the building's foundation is made of sand and good intentions. 'But the paint job is 15% more efficient!' Yes, splendid.
All told, it is a valiant effort in the field of... empirical tinkering. Truly. One must commend the diligence required to produce so many charts about so little of consequence. Keep up the good work, children. Perhaps one day, when the thrill of measuring raw throughput wanes, you might stumble upon a library. There are some wonderful papers in there you might enjoy.