Where database blog posts get flame-broiled to perfection
Alright, settle down, kids. Another one of these blog posts landed in my inbox, forwarded from some DevOps intern who thinks he's discovered cold fusion because he ran fio for five minutes. He asked for my "veteran perspective." He's about to get it. I've seen more reliable storage on a reel-to-reel tape that's been through a flood.
Let's pour some stale coffee and dissect this "groundbreaking research."
Your central thesis, presented with all the fanfare of a moon landing, is that enterprise SSDs are better than consumer SSDs for database workloads. Stop the presses. You mean the expensive, purpose-built hardware with robust components and actual capacitors is more reliable than the flashy gizmo you bought on Amazon Prime Day? Back in my day, we called this "common sense," not a blog post. We didn't have "consumer grade" and "enterprise grade." We had hardware that worked, and hardware that was a boat anchor. You chose poorly, you updated your resume. Simple.
You're all tickled pink about tweaking innodb_flush_method and the "risks" of using O_DIRECT_NO_FSYNC. It's adorable. You're essentially debating how fast you can drive with the seatbelt unbuckled. This isn't a feature; it's a footgun for people who want to trade data integrity for a few extra points on a benchmark chart. We had knobs like this on the mainframe. We also had procedures, written in blood and COBOL, that forbade anyone from touching them unless they wanted to spend the weekend restoring the master customer file from an off-site tape library. Which, by the way, was an actual library.
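For anyone who skipped the manual, here is the knob in question as a my.cnf sketch. This is an illustration of the documented MySQL 8.0 values, not a recommendation, and certainly not the author's tuned config:

```ini
# my.cnf excerpt -- illustrative sketch only
[mysqld]
# The grown-up setting: O_DIRECT bypasses the OS page cache but still
# issues fsync() so the drive is actually asked to persist the write.
innodb_flush_method = O_DIRECT

# The seatbelt-unbuckled variant: skips the fsync() after data-file writes.
# Looks great on a benchmark chart; on a consumer drive with no power-loss
# protection, a power cut can eat writes the database already called durable.
# innodb_flush_method = O_DIRECT_NO_FSYNC
```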
The breathless discussion of "Power Loss Protection" is my favorite part. You call it PLP; I call it a capacitor and a prayer. You think a power loss is scary now? Try being in a data center when the city block goes dark and the backup generator fails to kick in. That's not a risk of losing a few writes in a buffer. That's the sound of a hundred spinning-platter disks simultaneously grinding to a halt, followed by the sound of your boss's footsteps. Your little microsecond sync latency doesn't mean squat when Stan has to drive the tapes over from the salt mine in Iron Mountain.
I have to chuckle at the "web-scale" comment. You ran these tests on a couple of mini-PCs at home and a cloud instance, and then you write:
"...those checksums made web-scale life much easier when using less than stellar hardware." Son, "web-scale" on "less than stellar hardware" is a recipe for disaster I've been cleaning up since before the web was a thing. Back then, we called it "under-provisioning," and it got you a one-way ticket to the unemployment line. We ran checksums on punch cards to make sure the reader wasn't having a bad day. This isn't a new concept; it's table stakes.
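And since "checksum" apparently sounds exotic now, here is the entire idea in a toy Python sketch. The page layout, the crc32 choice, and the file name are mine for illustration; no real database does it this crudely, but the principle is the same: stamp the data when you write it, verify the stamp when you read it, and scream when your bargain-bin drive flips a bit.

```python
import zlib

PAGE_SIZE = 4096  # toy page size; real engines pick their own

def write_page(f, page_no: int, payload: bytes) -> None:
    """Prefix each page with a CRC32 of its contents so corruption is detectable."""
    payload = payload.ljust(PAGE_SIZE - 4, b"\x00")[: PAGE_SIZE - 4]
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    f.seek(page_no * PAGE_SIZE)
    f.write(checksum + payload)

def read_page(f, page_no: int) -> bytes:
    """Recompute the CRC32 on read; a mismatch means the hardware lied to you."""
    f.seek(page_no * PAGE_SIZE)
    page = f.read(PAGE_SIZE)
    stored, payload = int.from_bytes(page[:4], "big"), page[4:]
    if zlib.crc32(payload) != stored:
        raise IOError(f"checksum mismatch on page {page_no}: go find your backups")
    return payload

with open("toy.db", "w+b") as f:
    write_page(f, 0, b"master customer record")
    print(read_page(f, 0).rstrip(b"\x00"))
```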
All these tables, all these microseconds, all this agonizing over fsync versus fdatasync. You've spent days proving that asking the hardware to actually save the data takes time. Congratulations, you've rediscovered the concept of latency. You know what we did in DB2 on MVS back in '85? We committed the transaction. The system guaranteed it was written to the Direct Access Storage Device. If it was slow, you bought a faster controller or more spindles. You didn't write a novel about it; you wrote a purchase order.
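To save the next intern a week, the entire fsync-versus-fdatasync melodrama fits in a handful of lines. A rough Python sketch, Linux-oriented (os.fdatasync isn't available everywhere), with the file name and iteration count made up for illustration:

```python
import os
import time

def mean_sync_latency_us(path: str, sync_fn, iterations: int = 200) -> float:
    """Append a small record and sync after every write; return mean latency in microseconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"x" * 512)   # roughly a redo-log record's worth
            sync_fn(fd)                # the part that actually costs you
        return (time.perf_counter() - start) / iterations * 1e6
    finally:
        os.close(fd)

# fsync() flushes file data *and* metadata; fdatasync() may skip the metadata
# update when it isn't needed for data integrity -- that gap is the handful of
# microseconds the blog post spends several tables agonizing over.
print(f"fsync:     {mean_sync_latency_us('sync_test.dat', os.fsync):8.1f} us per commit")
print(f"fdatasync: {mean_sync_latency_us('sync_test.dat', os.fdatasync):8.1f} us per commit")
```

Run it on the consumer drive and then on the enterprise drive, and you'll have reproduced the post's headline finding before your coffee goes cold.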
There, there. You ran your little tests and learned a valuable lesson about hardware. It's cute. Keep tinkering, kid. In another thirty years, you'll be just as cynical as I am. Now get off my lawn; I have to go defrag my hard drive. Manually.