Daily Database Roasts

Using sysbench to measure how Postgres performance changes over time, November 2025 edition

Originally from smalldatum.blogspot.com/feeds/posts/default

November 29, 2025 • Roasted by Alex "Downtime" Rodriguez

Alright, team, gather 'round the warm glow of the terminal. I just finished reading this… masterpiece of theoretical performance art. It’s a beautiful set of charts, really. They’ll look great in the PowerPoint presentation right before the slide where I have to explain the Q3 outage. They say Postgres is "boring" because they can't find regressions. That's adorable. In my world, "boring" means I get to sleep. Your kind of "boring" is the quiet hum of a server a few seconds before it spectacularly re-partitions the C-suite's sense of calm.

Let's break down this lab report, shall we?

First, the idea that a perfectly sterile benchmark on a freshly compiled binary has any bearing on my production environment is hilarious. You've got your database perfectly cached in memory, running a synthetic workload. That’s not a benchmark; that’s a database's senior prom photo. Let me know how that QPS holds up when the analytics team's intern runs a cross-join on two billion-row tables because they "thought it would be faster." Your cleanroom is my chaotic hellscape of long-running transactions, unexpected vacuum processes, and filesystem-level corruption from a SAN that decided to take an unscheduled holiday.

Ah, the "large improvements" starting in PG 17! I can already hear the pitch: "Alex, the data is clear! We just need to upgrade the main cluster. It's a minor version bump, a simple rolling restart, zero downtime!" I’ve heard that one before. These "large improvements" are always tied to some clever new optimization that has an undocumented edge case. I predict this one will involve a subtle memory leak in the new partitioned hash aggregate that only triggers on Tuesdays when the query is run by a user whose name contains the letter 'Q'. I'll see you all on Slack at 3 AM on Labor Day weekend when the primary fails over, and the replica, which has been silently accumulating replication lag because of a new WAL format incompatibility, comes up with data from last Thursday.

You’re very proud of your iostat and vmstat results. You measured CPU overhead and context switches. Cute. You know what metrics you didn't measure?
- time_to_google_obscure_error_code
- pages_of_documentation_scrolled_past_to_find_the_one_breaking_change
- configs_reverted_per_minute

You're measuring the hum of the engine in a soundproof room. I'm trying to listen for the rattling sound that tells me a wheel is about to fly off on the freeway. While you're optimizing for mutex contention, I'm just hoping the new query planner doesn't suddenly decide all my index scans should be sequential scans after a minor point release.

And you compiled from source. Of course you did. Nothing says "ready for enterprise production" like a bespoke binary cooked up on a developer's machine. We use hardened, managed, and exhaustively tested packages for a reason. This is the equivalent of a car manufacturer testing a concept car on a private track and then telling me it's ready to haul my kids to soccer practice through a snowstorm. Nice work on the custom build, but I'll stick with the version that has a clear support path that doesn't end at "well, it worked on my machine."

I love the enthusiasm, I really do. It reminds me of the folks from GridScaleDB and VaporCache. I still have their stickers on my old laptop, right next to the empty spot I'm saving for whatever this benchmark convinces my boss to buy next.

Go on, ship it. My pager and I will be waiting.

🔥 The DB Grill 🔥