Where database blog posts get flame-broiled to perfection
Alright, grab a cup of lukewarm coffee and listen up. Some fresh-faced DevOps evangelist just forwarded me this "deep dive" on CPU metrics. It's adorable. It's like watching a toddler discover their own feet, except the feet are basic system performance counters we've had for forty years. I've seen more revolutionary ideas on a roll of microfiche.
Here's my take on this groundbreaking piece of literature.
Congratulations on discovering "IO Wait". We had a term for this back in my day, too. It was called "waiting for the tape drive to spin up." The stunning revelation that a process stalled on I/O isn't actually burning CPU cycles is, and I say this with all the sincerity I can muster, a real game-changer for 2025. It's cute that you needed a fancy dashboard and a complex SELECT query to figure this out. We used to just look at the blinking lights on the disk array. If the "CPU busy" light was off and the "Disk Active" light was having a seizure, we drew the same earth-shattering conclusion. For free.
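If you're fresh out of blinking lights, two reads of /proc/stat reach the same earth-shattering conclusion. A minimal sketch, assuming a Linux box with the usual /proc/stat field layout:

```python
#!/usr/bin/env python3
"""Where did the last five seconds actually go? (Linux, /proc/stat)"""
import time

def cpu_ticks():
    # First line of /proc/stat: cpu user nice system idle iowait irq softirq steal ...
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:9]]

before = cpu_ticks()
time.sleep(5)
after = cpu_ticks()
delta = [b - a for b, a in zip(after, before)]
total = sum(delta) or 1

for name, idx in (("user", 0), ("system", 2), ("idle", 3), ("iowait", 4)):
    print(f"{name:>7}: {100.0 * delta[idx] / total:5.1f}%")
# A box stuck on a slow disk shows a fat iowait slice while user and system stay flat:
# the CPU isn't burning cycles, it's standing around waiting for the "tape drive".
```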
The breathless exposé on the "silly number" that is load average is my favorite part. You found the comment in the kernel source code! Gold star for you. We knew load average was a blended metric since we were arguing about it over Tab sodas while waiting for our COBOL programs to compile. It includes processes in an uninterruptible sleep state. This isn't a secret; it's the whole point. It tells you the pressure on the system, not just the raw computation. Treating this like you've uncovered a conspiracy is like being shocked that a car's speedometer doesn't tell you the engine temperature. They're... different gauges.
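If anyone doubts the forty-year-old definition, you can count the silly number's ingredients yourself. A rough sketch, assuming Linux and the standard /proc layout:

```python
#!/usr/bin/env python3
"""Show that load average counts more than raw computation (Linux)."""
import glob

with open("/proc/loadavg") as f:
    one, five, fifteen = f.read().split()[:3]

running, unint = 0, 0
for path in glob.glob("/proc/[0-9]*/stat"):
    try:
        with open(path) as f:
            # Field 3 of /proc/<pid>/stat is the state: R = runnable, D = uninterruptible sleep.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except OSError:
        continue  # the process exited between glob and open
    if state == "R":
        running += 1
    elif state == "D":
        unint += 1

print(f"load average: {one} {five} {fifteen}")
print(f"runnable tasks: {running}, uninterruptible (D state, usually stuck on I/O): {unint}")
# Both buckets feed the load average, which is why a box doing nothing but waiting
# on a dead NFS mount can show a "load" of 50. Pressure, not computation.
```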
I have to admire the scientific rigor of running fio with 32 jobs to prove that disk I/O... causes I/O wait. Brilliant. Back when we were provisioning our DB2 instances on MVS, we had tools that gave us a complete I/O subsystem breakdown: channel path utilization, control unit contention, head seek times. You kids have "cpuStealPercent," which is just a fancy way of saying you're paying for a CPU that some other tenant is using.
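Before we get to that invoice: for anyone who wants to repeat the experiment without a SaaS subscription, the whole test boils down to something like this. A sketch that assumes fio is installed and that /tmp/fio-scratch is a hypothetical scratch directory you don't mind abusing:

```python
#!/usr/bin/env python3
"""Roughly reproduce the "fio with 32 jobs" experiment. Assumes fio is installed."""
import os
import subprocess

scratch = "/tmp/fio-scratch"  # hypothetical scratch directory; pick your own
os.makedirs(scratch, exist_ok=True)

# 32 random-read jobs, bypassing the page cache with O_DIRECT: about the quickest
# way to make iowait climb on an ordinary disk.
subprocess.run([
    "fio",
    "--name=iowait-demo",
    f"--directory={scratch}",
    "--rw=randread",
    "--numjobs=32",
    "--size=256m",
    "--direct=1",
    "--runtime=60",
    "--time_based",
    "--group_reporting",
], check=True)
# Watch iowait in another terminal (top, mpstat, or the /proc/stat sketch above)
# and marvel: disk I/O causes I/O wait. Film at eleven.
```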
"...I've run that on an overprovisioned virtual machine where the hypervisor gives only 1/4th of the CPU cycles..." On the mainframe, when you paid for a MIPS, you got a MIPS. This isn't a metric; it's an invoice for time you didn't get. It's the cloud's version of a landlord charging you for the electricity your neighbor uses.
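If you want to read the invoice yourself, it's the same two reads of /proc/stat as the iowait sketch above, just one column over. A minimal sketch, assuming Linux:

```python
#!/usr/bin/env python3
"""How much CPU time the hypervisor handed to someone else (Linux)."""
import time

def cpu_ticks():
    # /proc/stat: cpu user nice system idle iowait irq softirq steal ...
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:9]]

before = cpu_ticks()
time.sleep(5)
after = cpu_ticks()
delta = [b - a for b, a in zip(after, before)]

steal_pct = 100.0 * delta[7] / (sum(delta) or 1)
print(f"steal: {steal_pct:.1f}% of the last 5 seconds went to somebody else's workload")
# On bare metal this stays at zero. On an oversubscribed VM, this is the rent.
```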
The grand recommendation to replace cpuPercent with cpuUserPercent and cpuSystemPercent is truly the stuff of legends. You've basically re-implemented the us and sy columns from the top command. A tool that has existed, in some form, since before most of these "cloud native" engineers were born. I'm half expecting your next blog post to reveal the hidden magic of the ls -l command and how it provides more detail than just ls.
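For completeness, here is the entire recommendation, reimplemented in a few lines. A sketch using the psutil library (assuming it's installed), which reads the same kernel counters top has been formatting for decades:

```python
#!/usr/bin/env python3
"""The us/sy split, as rediscovered. Requires: pip install psutil."""
import psutil

# Sample CPU time percentages over one second.
pct = psutil.cpu_times_percent(interval=1.0)
print(f"us (cpuUserPercent):   {pct.user:5.1f}%")    # time spent in application code
print(f"sy (cpuSystemPercent): {pct.system:5.1f}%")  # time spent in the kernel on its behalf
print(f"id:                    {pct.idle:5.1f}%")
```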
Look, I get it. You have a shiny new observability platform and you need to justify its existence by "demystifying" metrics we've understood for decades. It's all very exciting. You've successfully used a multi-billion dollar cloud infrastructure and a sophisticated SaaS platform to explain what we used to print out on green bar paper from a sar report. The core problem hasn't changed, just the number of PowerPoint slides it takes to explain it.
Thanks for the read. I'll be sure to file this away with my collection of Y2K survival guides. And no, I will not be subscribing.