🔥 The DB Grill 🔥

Where database blog posts get flame-broiled to perfection

WiredTigerHS.wt: MongoDB MVCC Durable History Store
Originally from dev.to/feed/franckpachot
September 28, 2025 • Roasted by Rick "The Relic" Thompson

Alright, settle down, kids. Let me put down my coffee—the kind that's brewed strong enough to dissolve a spoon—and take a look at this... masterpiece of technical discovery. So, MongoDB has figured out how to keep old versions of data around using something they call a "durable history store."

How precious. It's like watching my grandson show me a vinyl record he found, thinking he's unearthed some lost magic.

Back in my day, we called this concept "logging" and "rollback segments," and we were doing it on DB2 on a System/370 mainframe while most of these developers' parents were still learning how to use a fork. But sure, slap a fancy name on it, call it MVCC, and act like you've just invented fire. It's adorable, really.

Let's break down this... 'architecture.'

They're very proud of their No-Force/No-Steal policy. "Uncommitted changes stay only in memory." Let me translate that from Silicon Valley jargon into English for you: "We pray the power doesn't go out." In memory. You mean in that volatile stuff that vanishes faster than a startup's funding when the power flickers? I've seen entire data centers go dark because a janitor tripped over the wrong plug. We had uninterruptible power supplies the size of a Buick and we still wrote every damned thing to disk, because that's where data lives. We didn't just cross our fingers and hope the write-ahead log could piece it all back together from memory dust.
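
For the youngsters who've never sat through a buffer-manager design review, here's roughly what no-force/no-steal means, sketched in a few lines of toy Python. The names and the class are mine, not WiredTiger's, and the real engine is a great deal hairier:

    # Toy sketch of a no-force / no-steal buffer policy with a write-ahead log.
    # Nothing here is WiredTiger code; it's the general idea, nothing more.
    class ToyEngine:
        def __init__(self):
            self.data_files = {}    # durable data files, updated only at checkpoint
            self.wal = []           # append-only log; pretend it lives on disk
            self.cache = {}         # committed pages not yet checkpointed
            self.uncommitted = {}   # no-steal: dirty, uncommitted changes stay here

        def write(self, txn_id, key, value):
            # No-steal: an uncommitted change never touches the data files.
            self.uncommitted.setdefault(txn_id, {})[key] = value

        def commit(self, txn_id):
            changes = self.uncommitted.pop(txn_id, {})
            # No-force: durability comes from the log record, not a page flush.
            self.wal.append(("commit", txn_id, changes))
            self.cache.update(changes)

        def checkpoint(self):
            # Eventually the committed pages make it into the data files.
            self.data_files.update(self.cache)

        def crash(self):
            # Lights out: memory evaporates. Committed work is replayed from the
            # log; uncommitted work is simply gone. That's the whole bet.
            self.uncommitted.clear()
            self.cache = {}
            for _, _, changes in self.wal:
                self.cache.update(changes)

The point being: when the plug comes out mid-transaction, that uncommitted pile in memory is simply gone, and the log only vouches for what already committed.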

And then I see this. This beautiful, unholy pipeline of commands: wt ... dump ... | grep ... | cut ... | xxd -r -p | bsondump.

My God, it’s like watching a chimp trying to open a can with a rock. You had to chain together four different utilities just to read your own data file? Back in '88, I had an ISPF panel on a 3270 terminal that could dump a VSAM file, format it in EBCDIC or HEX, and print it to a line printer down the hall before your artisanal coffee was even cool enough to sip. This command-line salad you've got here isn't "clever," it's a cry for help. It tells me you built a database engine but forgot to build a damn steering wheel for it.
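
If you'd rather not juggle four pipes, here's roughly the same trick in a few lines of Python. Fair warning: I'm inferring the dump format from the article's own pipeline (a hex-encoded payload at the end of each line), and I'm letting PyMongo's bson module stand in for bsondump:

    # Rough Python stand-in for the grep | cut | xxd -r -p | bsondump chain.
    # Assumes each interesting `wt dump` line ends in a hex-encoded BSON value;
    # that detail is inferred from the article's pipeline, not from any spec.
    import sys
    import bson  # the BSON codec that ships with PyMongo

    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        try:
            raw = bytes.fromhex(fields[-1])        # the cut + xxd -r -p steps
            docs = bson.decode_all(raw)            # the bsondump step
        except (ValueError, bson.errors.InvalidBSON):
            continue                               # not a BSON payload line; move on
        for doc in docs:
            print(doc)

Feed it the wt dump output on stdin and it prints readable documents. Still a steering wheel bolted on in the parking lot, but at least it's one file.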

And what does this grand exploration reveal?

Each entry contains MVCC metadata and the full previous BSON document, representing a full before-image of the collection's document, even if only a single field changed.

A full before-image. So, let me get this straight. You change one character in a 1MB "document," and to keep track of it, you write another full 1MB document to your little "history store"? Congratulations, you've invented the most inefficient transaction logging in the history of computing. We were using change vectors and delta encodings in COBOL programs writing to tape drives when a megabyte was the size of a refrigerator and cost more than a house. We had to care about space. You kids have so much cheap disk you just throw copies of everything around like confetti and call it "web scale."
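
Put a number on it if you like. Here's a back-of-the-envelope comparison using PyMongo's bson encoder; the document shape is invented for illustration, but the arithmetic is the arithmetic:

    # Full before-image vs. a field-level delta, roughly quantified.
    # Uses PyMongo's bson encoder; the document itself is made up for illustration.
    import bson

    doc = {"_id": 1, "payload": "x" * 1_000_000, "status": "pending"}

    before_image = bson.encode(doc)                       # the history-store approach
    delta = bson.encode({"_id": 1, "status": "pending"})  # old value of the one changed field

    print(len(before_image))   # about a megabyte, for a one-field change
    print(len(delta))          # a few dozen bytes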

The author then has the gall to compare this to Oracle and PostgreSQL.

And this is the part that made me spit out my coffee:

...the trade-off is that long-running transactions may abort if they cannot fit into memory.

There it is. The punchline. Your "modern, horizontally scalable" database just... gives up. It throws its hands in the air and says, "Sorry, this is too much work for me." I used to run batch jobs that updated millions of records and ran for 18 hours straight, processing stacks of punch cards fed into a reader. The job didn't "abort because it couldn't fit in memory." The job ran until it was done, or until the machine caught fire. Usually the former.
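
And that trade-off falls straight out of the no-steal policy from earlier: if uncommitted changes can only live in memory, a memory cap is a transaction-size cap. Here's the consequence in toy form; the threshold below is made up, not MongoDB's actual cache accounting:

    # Under no-steal, uncommitted work has to fit in memory, so a memory cap
    # becomes a transaction-size cap. The limit below is invented; the failure
    # mode is the one the article describes.
    class TransactionTooBig(Exception):
        pass

    class ToyTransaction:
        MAX_UNCOMMITTED_BYTES = 64 * 1024 * 1024   # made-up cap for illustration

        def __init__(self):
            self.pending = {}
            self.bytes_used = 0

        def write(self, key, value):
            self.pending[key] = value
            self.bytes_used += len(value)
            if self.bytes_used > self.MAX_UNCOMMITTED_BYTES:
                self.pending.clear()               # throws its hands in the air
                raise TransactionTooBig("aborted: can't fit in memory")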

So let me predict the future for you. Give 'em five years. They'll be writing breathless blog posts about their next revolutionary feature: a "persistent transactional memory buffer" that's written to disk before commit. They'll call it the "Pre-Commit Durability Layer" or some other nonsense. We called it a "redo log." Then they'll figure out that storing full BSON objects is wasteful, and they'll invent "delta-based historical snapshots."

They're not innovating. They're just speed-running the last 40 years of solved database problems and calling each mistake a feature. Now if you'll excuse me, I have to go check on my tape rotations. At least I know where that data will be tomorrow.