🔥 The DB Grill 🔥

Where database blog posts get flame-broiled to perfection

MongoDB Internals: How Collections and Indexes Are Stored in WiredTiger
Originally from dev.to/feed/franckpachot
September 14, 2025 • Roasted by Rick "The Relic" Thompson

Alright, settle down, kids. Let me put on my reading glasses. What fresh-faced bit of digital evangelism have we got today? A "deep dive" into WiredTiger? Oh, a deep dive! You mean you ran a few commands and looked at a hex dump? Back in my day, a "deep dive" meant spending a week in a sub-zero machine room with the schematics for the disk controller, trying to figure out why a head crash on platter three was causing ripples in the accounting department's batch reports. You kids and your "containers." Cute. It’s like a playpen for code so it doesn’t wander off and hurt itself.

So you installed a dozen packages, compiled the source code with a string of compiler flags longer than my first mortgage application, just to get a utility to... read a file? Son, in 1988, we had utilities that could read an entire mainframe DASD pack, format it in EBCDIC, and print it to green bar paper before your apt-get even resolved its dependencies. And we did it with three lines of JCL we copied off a punch card.

Let's see here. You've discovered that data is stored in B-Trees. Stop the presses! You're telling me that a data structure invented when I was still programming in FORTRAN IV is the "secret" behind your fancy new storage engine? We were using B-Trees in DB2 on MVS when the closest thing you had to a "document" was a memo typed on a Selectric typewriter. This isn't a deep dive, it's a history lesson you're giving yourself.

And this whole song and dance with piping wt through xxd and jq and some custom Python script... my God. It's a Rube Goldberg machine for reading a catalog file. We had a thing called a data dictionary. It was a binder. A physical binder. You opened it, you looked up the table name, and it told you the file location. Took ten seconds and it never needed a patch. This _mdb_catalog of yours, with its binary BSON gibberish you need three different interpreters to read, is just a less convenient binder.

"The 'key' here is the recordId — an internal, unsigned 64-bit integer MongoDB uses... to order documents in the collection table."

A record ID? You mean... a ROWID? A logical pointer? Groundbreaking. We called that a Relative Byte Address in VSAM circa 1979. It let us update records without the index needing to know where the physical block was. It's a good idea. So good, in fact, that it's been a fundamental concept in database design for half a century. Slapping a new name on it doesn't make it an invention. It just means you finally read chapter four of the textbook.
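For anyone who skipped chapter four, the indirection is easy to sketch. What follows is a toy model in Python, not WiredTiger's actual layout: the class ToyCollection and its methods are made up for illustration, and plain dictionaries stand in for the B-Trees. The index maps a field value to recordIds, the collection table maps a recordId to the document, and changing a non-indexed field never touches the index.

```python
class ToyCollection:
    """Hypothetical toy model of a collection table plus one secondary index."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.next_record_id = 1
        self.table = {}  # recordId -> document (stands in for the collection table)
        self.index = {}  # field value -> list of recordIds (stands in for the index)

    def insert(self, doc):
        # Assign the next recordId and register the document under it.
        record_id = self.next_record_id
        self.next_record_id += 1
        self.table[record_id] = doc
        self.index.setdefault(doc[self.indexed_field], []).append(record_id)
        return record_id

    def find(self, value):
        # Two hops: index gives recordIds, the table gives the documents.
        return [self.table[rid] for rid in self.index.get(value, [])]

    def update_other_fields(self, record_id, changes):
        # Only non-indexed fields may change here, so the index stays as it is.
        if self.indexed_field in changes:
            raise ValueError("this sketch does not re-index")
        self.table[record_id] = {**self.table[record_id], **changes}


accounts = ToyCollection(indexed_field="name")
rid = accounts.insert({"name": "ledger", "balance": 10})
accounts.update_other_fields(rid, {"balance": 25})
print(accounts.find("ledger"))
```

The index entry for "ledger" still points at the same recordId after the update, which is the whole point of a logical pointer.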

And this "multi-key" index... an index that has multiple entries for a single document when a field contains an array. You mean... an inverted index? The kind used for text search since the dawn of time? Congratulations on reinventing full-text indexing and acting like you've split the atom. The only thing you've split is a single record into a half-dozen index entries, creating more write amplification than a C-suite executive's LinkedIn post.
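A minimal sketch of what that fan-out looks like, assuming one index entry per distinct array element, each paired with the document's recordId. The function multikey_entries is hypothetical, and it ignores edge cases such as empty or nested arrays.

```python
def multikey_entries(doc, field, record_id):
    """Return the (key, recordId) index entries one document would produce."""
    value = doc.get(field)
    # A scalar field yields one entry; an array yields one per distinct element.
    values = value if isinstance(value, list) else [value]
    distinct = []
    for v in values:
        if v not in distinct:  # keep first occurrence, drop duplicates
            distinct.append(v)
    return [(v, record_id) for v in distinct]


memo = {"title": "Q3 batch report", "tags": ["accounting", "batch", "accounting"]}
for entry in multikey_entries(memo, "tags", record_id=7):
    print(entry)
```

One document in, several index entries out: every insert or update of that array pays for each of them.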

But this... this is the real kicker. This whole section at the end. The preening about "No-Steal / No-Force" cache management.

"In contrast, MongoDB was designed for short transactions on modern infrastructure, so it keeps transient information in memory and stores durable data on disk to optimize performance and avoid resource intensive background tasks."

Oh, you sweet summer children. You think keeping transaction logs in memory is a feature? We called that "playing with fire." You've built a database that basically crosses its fingers and hopes the power doesn't flicker. I've spent nights sleeping on a data center floor, babysitting a nine-track tape restore because some hotshot programmer thought writing to disk was "too slow." The only thing faster than your in-memory transactions is how quickly your company goes out of business after a city-wide blackout.
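For the record, the difference between holding changes in memory and logging them to disk first fits in a few lines. This is a toy sketch of the log-before-acknowledge rule, not a description of how MongoDB or WiredTiger actually journals; ToyStore is hypothetical, and a Python list stands in for a log file that has been flushed to disk.

```python
class ToyStore:
    """Hypothetical store: a durable log plus volatile in-memory data."""

    def __init__(self):
        self.log = []   # stands in for a log file already flushed to disk
        self.data = {}  # stands in for in-memory pages, lost on a crash

    def commit(self, changes):
        # Log first, then apply: the commit is acknowledged only after the append.
        self.log.append(dict(changes))
        self.data.update(changes)

    def crash(self):
        # Power flickers: everything that lived only in memory is gone.
        self.data = {}

    def recover(self):
        # Replay the log in order to rebuild the committed state.
        for record in self.log:
            self.data.update(record)


store = ToyStore()
store.commit({"balance": 10})
store.commit({"balance": 25, "owner": "accounting"})
store.crash()
store.recover()
print(store.data)
```

Skip the append in commit and the same crash leaves nothing to replay, which is when somebody ends up sleeping on the data center floor.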

"Eliminating the need for expensive tasks such as vacuuming..." You haven't eliminated the need. You've just ignored it and called the resulting mess "eventual consistency." You think a vacuum is expensive? Try restoring a billion-record collection from yesterday's backup because your "No-Steal" policy meant that last hour of committed transactions only existed in the dreams of a server that's now a paperweight. We had write-ahead logging and two-phase commit protocols that were more durable than the concrete they built the data center on. You have a philosophy that sounds like it was cooked up at a startup incubator by someone who's never had to explain data loss to an auditor.

So you've dug into your little .wt files and found B-Trees, logical pointers, and inverted indexes. You've marveled at a system that gambles with data durability for a marginal performance gain in a benchmark nobody cares about.

Let me sum up your "deep dive" for you: You've discovered that under the hip, schema-less, JSON-loving exterior of MongoDB beats the heart of a 1980s relational database, only with less integrity and a bigger gambling problem.

Call me when your web-scale toy has the uptime of a System/370. I've got COBOL jobs older than your entire stack, and guess what? They're still running.