Daily Database Roasts

Disaggregation: A New Architecture for Cloud Databases

Originally from muratbuffalo.blogspot.com/feeds/posts/default

September 8, 2025 • Roasted by Alex "Downtime" Rodriguez

Ah, another dispatch from the pristine, theoretical world of academia. This is just fantastic. It’s always a treat to read these profoundly predictable papers praising the latest architectural acrobatics. I can already hear the PowerPoint slides being written for the next vendor pitch.

It’s truly insightful how they’ve identified the elastic scalability of the cloud. Groundbreaking. And the solution, of course, is to break everything apart. This move to disaggregated designs is a masterstroke. Why have one thing to manage when you can have three? Or five? Or, as the paper hints, dozens of little database microservices? What could possibly go wrong?

I especially love the parallel to the microservices trend. I remember that world tour. We went from one monolith I barely understood to 50 microservices nobody understood, all held together by YAML and wishful thinking. Now we get to do it all over again with the most critical piece of our infrastructure. This proposed "unified middleware layer" that looks "suspiciously like Kubernetes" doesn't fill me with confidence. It fills me with the cold, creeping dread of debugging network policies and sidecar proxy failures when all I want to know is why the primary is flapping.

And the praise for Socrates, splitting storage into three distinct services—Logging, Caching, and Durable storage—is just delightful. Three services, three potential points of failure, three different monitoring dashboards to build after the first production outage. They promise each can be "tuned for its performance/cost tradeoffs." I can tell you what that means in practice:

The logging service will be on hyper-expensive, hyper-fast storage that fills up and crashes the cluster because no one set up log rotation.
The page cache will have some bizarre eviction policy that triggers a thundering herd problem under load.
The durable page store will be on the cheapest tier possible to save money, ensuring that any recovery scenario takes approximately one geological epoch.

But the real comedy is in the "Tradeoffs" section.

A 2019 study shows a 10x throughput hit compared to a tuned shared-nothing system.

You have to admire the casual way they drop that in. A minor 10x throughput hit. But don't you worry, "optimizations can help narrow the gap." I’m sure they can. Meanwhile, I'll be explaining to the VP of Engineering why our database, built on the revolutionary principles of disaggregation, is now performing on par with a SQLite database running on a Raspberry Pi. But look how elastic it is!

And the proposals for "rethinking core protocols" are a gift that will keep on giving—to my on-call schedule. Cornus 2PC, where active nodes can write to a failed node's log in a shared service? Fantastic. A brand-new, academically clever way to introduce subtle race conditions and split-brain scenarios that will only manifest during the Black Friday peak. My pager just started vibrating sympathetically.

I can't wait for Hermes. An entirely new service that "intercepts transactional logs and analytical reads, merging recent updates into queries on the fly." It sits between compute and storage, creating a brand-new single point of failure that can corrupt data in two directions at once. It’s not a bug, it’s a feature of our HTAP-enabled architecture!

But the final suggestion is the pièce de résistance. Take a monolithic, battle-hardened database like Postgres and "transform it to a disaggregated database." Yes! Let’s perform open-heart surgery on a system known for its stability and reliability, all for the sake of a research paper and some "efficiency tradeoffs." I'll save a spot on my laptop lid for your shiny new sticker, right next to the one from that "unforkable" database that forked, failed, and folded.

Mark my words. This dazzlingly disaggregated dream will become a full-blown operational nightmare. It’s going to fail spectacularly at 3 AM on the Sunday of a long holiday weekend. Not because of some grand, elegant design flaw, but because one of these twenty new "database microservices" will hit a single, esoteric S3 API rate limit. This will cause a cascading calamity of timeouts, retries, and corrupted state that brings the entire system to its knees. And I'll be the one awake, drinking lukewarm coffee, digging through terabytes of uncorrelated logs from seventeen different "observability platforms," trying to piece together why our infinitely scalable, zero-downtime, cloud-native future decided to take an unscheduled vacation.

🔥 The DB Grill 🔥