Where database blog posts get flame-broiled to perfection
Ah, benchmark season. It’s that magical time of year when engineering has to justify the last six months of meetings by producing a wall of numbers that marketing can boil down to a single, glorious headline. Seeing this latest dispatch from my old stomping grounds really takes me back. The more things change, the more they stay the same.
Let's take a closer look at this victory lap, shall we?
It’s a bold strategy to lead with "Postgres 18 looks great" and then immediately follow up with "I continue to see small CPU regressions... I have yet to explain that." This is a masterclass in what we used to call "leading with the roadmap." The conclusion was clearly written before the tests were run. Don't worry about those pesky, unexplained performance drops in your core functionality; just focus on the big picture, which, as always, is "next version will be amazing, we promise."
My favorite part of any release candidate benchmark is the list of known, uninvestigated issues. It’s not just a bug, it’s a mystery! We’re treated to a delightful tour of regressions and variances the author freely admits they can't explain.
"I am not certain it is a regression as this might be from non-deterministic CPU overheads... I hope to look at CPU flamegraphs soon." Translation: "It's slower, we don't know why, and QA is just one guy with a laptop who promised to get back to us after his vacation." The promise of "flamegraphs soon" is the engineering equivalent of "the check is in the mail."
Ah, and there’s our old friend, the "variance from MVCC GC (vacuum here)" excuse. A classic. When the numbers are bad, blame vacuum. When the numbers are too good, also blame vacuum. It's the universal scapegoat. I remember meetings where we'd pin entire project failures on "unpredictable vacuum behavior." It’s a brilliant way to frame a fundamental architectural headache as a quirky, unpredictable variable in an otherwise perfect system. If your garbage collection is so noisy it throws off your benchmarks by 30-50%, maybe the problem isn't the benchmark.
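For the record, checking whether vacuum actually ran during a benchmark run is not archaeology. A minimal sketch, assuming psycopg2 and placeholder connection and table names (nothing here comes from the post): snapshot pg_stat_user_tables before and after the run and see whether the autovacuum counter moved before reaching for the excuse.

```python
# Minimal sketch: did (auto)vacuum actually fire during the benchmark?
# "dbname=bench" and "bench_table" are placeholders, not from the post.
import psycopg2

QUERY = """
    SELECT autovacuum_count, vacuum_count, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = %s
"""

def vacuum_snapshot(dsn, table):
    # Read the cumulative vacuum counters and dead-tuple estimate for one table.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(QUERY, (table,))
        row = cur.fetchone()
        return dict(zip(("autovacuum_count", "vacuum_count", "n_dead_tup"), row))

before = vacuum_snapshot("dbname=bench", "bench_table")
# ... run the microbenchmark here ...
after = vacuum_snapshot("dbname=bench", "bench_table")

if after["autovacuum_count"] > before["autovacuum_count"]:
    print("autovacuum fired mid-run; the vacuum excuse is at least plausible")
else:
    print("no vacuum during the run; find another scapegoat")
```

It is one query before and one after. If that counter never moves, "variance from vacuum" is not an explanation, it is a shrug with extra words.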
The results themselves are a thing of beauty. A 3% regression here, a 1% improvement there, and then—bam!—a 49% improvement on deletes and a 32% improvement on inserts on one machine, which the author themselves admits they've never seen before and assumes is just more "variance." Elsewhere, a full table scan gets a magical 36% speed boost on one box and a 9% slowdown on another. This isn't a performance report; it's a lottery drawing. It hints at a codebase so delicately balanced that a single commit can have wildly unpredictable consequences, a known side effect of bolting on features to meet conference deadlines.
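And telling a genuine 49% win apart from run-to-run noise takes about a dozen lines. A minimal sketch with made-up numbers, not data from the post: report the spread across repeated runs of the same test and compare it with the headline delta before printing the headline.

```python
# Minimal sketch (hypothetical throughput samples, ops/sec): compare the
# run-to-run spread of each microbenchmark against the claimed improvement.
from statistics import mean, stdev

runs = {
    "delete_microbench": [11200, 16800, 10900, 15400, 11600],
    "insert_microbench": [9800, 9750, 9900, 9810, 9770],
}

for name, samples in runs.items():
    avg = mean(samples)
    sd = stdev(samples)
    cv = sd / avg * 100  # coefficient of variation, as a percentage
    print(f"{name}: mean={avg:.0f} ops/s, stdev={sd:.0f}, cv={cv:.1f}%")
    # If the coefficient of variation is in the same ballpark as the
    # headline "improvement", you are reporting noise, not a result.
```

If the variation across runs is 30% and the improvement is 49%, congratulations on a coin flip.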
The best part is the frank admission of cherry-picking: "To save time I only run 32 of the 42 microbenchmarks." I see the spirit of the old "efficiency committee" lives on. When you can’t make the numbers look good, just use fewer numbers. It’s elegant, really. Just test the parts you know (or hope) are faster and call it a day. Who needs to test everything? That’s what customers are for.
All in all, a familiar and comforting read. Keep up the... work. It's good to see that even with a new version number, the institutional memory for shipping impressive-looking blog posts full of questionable data is alive and well. You'll get there one day.