🔥 The DB Grill 🔥

Where database blog posts get flame-broiled to perfection

Why is RocksDB spending so much time handling page faults?
Originally from smalldatum.blogspot.com/feeds/posts/default
October 16, 2025 • Roasted by Marcus "Zero Trust" Williams

Ah, another heartwarming tale from the trenches of "performance engineering." A developer gets confused by a flamegraph, has a little "a-ha!" moment, and writes a blog post about it. The lesson? Just run your benchmark longer! It's so simple, so elegant. I'm sure the attackers targeting your production systems will be kind enough to wait for your block cache to warm up before launching their denial-of-service campaign. Please, Mr. Hacker, give us ten minutes; jemalloc is still asking the OS for another 36 gigs.

Let me translate this "discovery" for the adults in the room. You're telling me that for a completely indeterminate "warm-up" period, your database service spends 22.69% of its CPU time not serving queries, not compacting data, but just… faulting. This isn't a performance quirk; it's a documented, self-inflicted resource exhaustion vulnerability. You've built a system that, on startup or in a cold-cache scenario, is designed to immediately thrash and beg for memory. An adversary doesn't even need a sophisticated attack; they just need to restart the pod and watch it choke.
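
Not that anyone asked, but here is what a deliberate warm-up might look like. A minimal sketch, assuming the standard RocksDB C++ API; the cache size, DB path, and warm-up scan are invented for illustration and are not from the original post:

```cpp
// Sketch: make the warm-up deterministic instead of hoping nobody restarts the pod.
// Assumes the standard RocksDB C++ API; the sizes and path below are illustrative.
#include <memory>
#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = false;

  // Keep index and filter blocks resident in the block cache so a restart does
  // not immediately fall back to fault-driven reads for table metadata.
  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.block_cache = rocksdb::NewLRUCache(8ull << 30);  // 8 GiB, assumed
  table_opts.cache_index_and_filter_blocks = true;
  table_opts.pin_l0_filter_and_index_blocks_in_cache = true;
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/var/lib/mydb", &db);  // path is made up
  if (!s.ok()) return 1;

  // Deliberate warm-up pass before the service takes traffic: scan once with
  // fill_cache enabled so every block is pulled into the cache up front,
  // instead of paying the "indeterminate" warm-up tax under live load.
  rocksdb::ReadOptions ro;
  ro.fill_cache = true;
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // Touching each key drags its block through the table reader into the cache.
  }
  if (!it->status().ok()) return 1;

  delete db;
  return 0;
}
```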

And the underlying cause is just a cascade of beautiful, compliance-violating assumptions. Let's talk about this per-block allocation strategy. You call it a "stress test for a memory allocator." I call it an engraved invitation for every memory corruption exploit known to man. Instead of a single, clean allocation that can be monitored and protected, you've opted for a chaotic system of constant, tiny allocations and deallocations. Every single read operation is a little prayer to the allocation gods. What could possibly go wrong?
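
To spell out what I mean, here is a hypothetical sketch, in plain C++ and emphatically not RocksDB's actual source, of the difference between allocating on every block read and reusing one buffer. The block size and fake file are made up for illustration:

```cpp
// Hypothetical illustration (not RocksDB code): per-read heap allocation
// versus a reused scratch buffer on a read path.
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

constexpr size_t kBlockSize = 4096;  // assumed uncompressed block size

// The pattern the post describes: every block read allocates, copies, and later
// frees. The allocator (glibc malloc, jemalloc, tcmalloc) absorbs the churn.
std::string ReadBlockPerAllocation(const char* file_bytes, size_t offset) {
  std::string block(kBlockSize, '\0');               // fresh heap allocation per read
  std::memcpy(block.data(), file_bytes + offset, kBlockSize);
  return block;                                      // freed whenever the caller drops it
}

// The "single, clean allocation" alternative: one buffer per reader, reused
// across reads, so steady state makes no allocator calls at all.
void ReadBlockIntoScratch(const char* file_bytes, size_t offset,
                          std::vector<char>& scratch) {
  scratch.resize(kBlockSize);                        // allocates once, then no-ops
  std::memcpy(scratch.data(), file_bytes + offset, kBlockSize);
}

int main() {
  std::vector<char> fake_file(16 * kBlockSize, 'x');  // stand-in for a data file
  std::vector<char> scratch;
  for (size_t i = 0; i < 16; ++i) {
    ReadBlockPerAllocation(fake_file.data(), i * kBlockSize);        // 16 allocator round-trips
    ReadBlockIntoScratch(fake_file.data(), i * kBlockSize, scratch); // 1 allocation total
  }
  return 0;
}
```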

You casually mention that "jemalloc and tcmalloc work better than glibc malloc." Oh, delightful. So you've swapped out the default, universally audited system allocator for a third-party dependency because it's faster at papering over your fundamentally unstable allocation model. Did you perform a full security audit on your specific build of jemalloc? Are you subscribed to its CVE feed? Or are you just blindly trusting another layer of abstraction in your already teetering Jenga tower of dependencies?
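
If you insist on the swap, at least interrogate the thing you swapped in. A sketch, assuming a stock jemalloc build with unprefixed symbols linked into the process (build flags and symbol prefixes vary, so treat this as illustrative, not an audit):

```cpp
// Sketch: confirm which jemalloc is actually in the process and what it holds.
// Assumes unprefixed jemalloc symbols; link with -ljemalloc. Illustrative only.
#include <cstdint>
#include <cstdio>
#include <jemalloc/jemalloc.h>

int main() {
  // Which jemalloc is actually loaded? Pin this against the build you audited.
  const char* version = nullptr;
  size_t len = sizeof(version);
  if (mallctl("version", &version, &len, nullptr, 0) != 0) {
    std::fprintf(stderr, "mallctl failed: jemalloc is probably not the allocator here\n");
    return 1;
  }
  std::printf("jemalloc version: %s\n", version);

  // Refresh cached stats, then read how much memory the allocator is holding.
  uint64_t epoch = 1;
  size_t epoch_len = sizeof(epoch);
  mallctl("epoch", &epoch, &epoch_len, &epoch, epoch_len);

  size_t allocated = 0, resident = 0, sz = sizeof(size_t);
  mallctl("stats.allocated", &allocated, &sz, nullptr, 0);
  sz = sizeof(size_t);
  mallctl("stats.resident", &resident, &sz, nullptr, 0);
  std::printf("allocated=%zu resident=%zu\n", allocated, resident);
  return 0;
}
```

At minimum, compare the reported version against the build you actually reviewed, and watch the allocated/resident gap so "papering over" doesn't quietly become "hoarding."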

And my absolute favorite part: the workload is "read-only." It's so quaint, this idea that "read-only" means "safe." As if a carefully crafted series of point lookups couldn't trigger a pathological case in your LSM read path, or cause a buffer over-read, or exploit a flaw in the deserialization logic for the block you're pulling off disk. You're not just reading data; you're processing it. Every line of that parsing and processing code is attack surface.

I can just see the SOC 2 audit report now.

Finding C-144.1: Unpredictable System State. The system enters a prolonged state of high CPU utilization (20-25% overhead) for an indeterminate period following a service restart or cache invalidation event. The official remediation from the engineering team is to "wait for it to finish." This lack of deterministic behavior presents a significant availability risk and fails to meet control objectives for CC7.1 and CC7.2 regarding system performance and capacity management.

This isn't a "lesson" about benchmarks. It's a confession. A confession that you've prioritized marginal steady-state IOPS over baseline stability, predictability, and security. You've built a race car that explodes if you take a corner too fast right out of the pit lane.

Honestly, the more I read things like this, the more I think we should just go back to clay tablets. They had predictable latency, at least.