đŸ”„ The DB Grill đŸ”„

Where database blog posts get flame-broiled to perfection

Data Locality vs. Independence: Which Should Your Database Prioritize?
Originally from dev.to/feed/franckpachot
November 23, 2025 ‱ Roasted by Marcus "Zero Trust" Williams

Alright, let's take a look at this... Manifesto. "Understand how the principle of 'store together what is accessed together' is a game-changer." Oh, it’s a game-changer, alright. It’s a game-changer for threat actors, compliance officers, and anyone who enjoys a good old-fashioned, catastrophic data breach. You’ve just written a love letter to exfiltration. Congratulations.

You’re celebrating the idea of bundling everything an application needs into one neat little package. You call it data locality. I call it a loot box for hackers. Instead of them having to painstakingly piece together user data with complex JOINs across five different tables, you’ve served it up on a silver platter. “Here you go, Mr. Attacker: the user’s PII, their last five orders, their payment token references, and their shipping addresses, all in one convenient, monolithic JSON blob. Would you like a single API call to go with that?” It’s not a performance enhancement; it’s a data breach speed-run kit.
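Let me sketch that loot box for you. None of these names come from the article; the collection, fields, and connection string are my own invention, but the shape is the point:

```typescript
import { MongoClient } from "mongodb";

// Hypothetical customer aggregate: everything "accessed together" stored together.
interface Customer {
  _id: string;
  name: string;
  email: string;                                   // PII
  ssnLast4: string;                                // more PII
  addresses: { street: string; city: string }[];   // shipping addresses
  paymentTokens: string[];                         // payment token references
  recentOrders: { orderId: string; totalCents: number }[];
}

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const customers = client.db("shop").collection<Customer>("customers");

// Data locality in action: one query, one document, the entire blast radius.
const lootBox = await customers.findOne({ _id: "cust_42" });
```

One findOne(), one leaked connection string, and the exfiltration is a single round trip.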

And the battle cry is, “give developers complete control rather than an abstraction.” My God, have you met developers? I love them, but I wouldn't trust them to water my plants, let alone architect the physical storage layout of a production database. You’re taking the guardrails off the highway and calling it "agile." The whole point of separating the logical and physical layers was to prevent a developer, hopped up on caffeine and chasing a deadline, from creating a schema that doubles as a denial-of-service vector. But no, you want to let the application dictate storage. The same application that probably has a dozen unpatched Log4j vulnerabilities and stores secrets in a public GitHub repo. What could possibly go wrong?

This whole “application-first approach,” where “the responsibility for maintaining integrity is pushed to the application,” is the most terrifying thing I’ve read all week. You’re telling me that instead of battle-hardened, database-level constraints, we’re now relying on some hastily written validation logic in a NodeJS microservice to enforce data integrity?
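Here’s roughly what that hastily written logic looks like, sketched with a hypothetical users/orders pair (my names, my sketch, the article’s philosophy):

```typescript
import { MongoClient } from "mongodb";

interface User  { _id: string }
interface Order { userId: string; totalCents: number }

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const users  = client.db("shop").collection<User>("users");
const orders = client.db("shop").collection<Order>("orders");

// Hand-rolled referential integrity: check, then write, and hope.
async function placeOrder(order: Order) {
  const user = await users.findOne({ _id: order.userId });
  if (!user) throw new Error("no such user"); // the "constraint"
  // ...nothing stops the user document from being deleted right here...
  await orders.insertOne(order);              // orphaned order, enforced by vibes
}
```

A foreign key constraint does that check atomically, inside the engine, on every write. This version does it across two network round trips and a prayer.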

You mock the relational model's goal to “serve online interactive use by non-programmers and casual users.” And then you turn around and hand the keys to the engine room to application developers who, you admit, are supposed to be shielded from these complexities! The irony is so thick you could use it for B-tree padding. Those abstractions you’re so eager to throw away—Codd’s rules, normalization, foreign key constraints—aren't legacy cruft. They're the seatbelts, the airbags, and the roll cage that stop a simple coding error from turning into a multi-million-dollar GDPR fine.
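And for the record, here’s roughly what those seatbelts cost to put on (illustrative DDL against a hypothetical Postgres database; the tables are mine, not the author’s):

```typescript
import { Client } from "pg";

const client = new Client({ connectionString: "postgres://localhost/shop" });
await client.connect();

// The "legacy cruft": declared once, enforced by the engine on every write,
// no matter which caffeinated microservice is doing the writing.
await client.query(`
  CREATE TABLE users (
    id    bigint PRIMARY KEY,
    email text   NOT NULL UNIQUE
  );
  CREATE TABLE orders (
    id          bigint  PRIMARY KEY,
    user_id     bigint  NOT NULL REFERENCES users(id),  -- no orphaned orders, ever
    total_cents integer NOT NULL CHECK (total_cents >= 0)
  );
`);
```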

And this section on MongoDB’s WiredTiger engine? It’s a masterpiece of catastrophic thinking. “Updates in MongoDB are applied in memory” and then written out in a new version. You call it copy-on-write. I see a race condition factory. You praise that a single document operation is handled by a single node. Wonderful. So when an attacker finds a NoSQL injection vulnerability—and they will, because your "flexible schema" is an open invitation—they only need to compromise one node to rewrite an entire customer aggregate. It’s efficient!
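And that injection is hypothetical only in the names. The mechanics are the classic operator injection, sketched here as an Express login handler I made up for illustration:

```typescript
import express from "express";
import { MongoClient } from "mongodb";

const app = express();
app.use(express.json());

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const users = client.db("shop").collection("users");

// "Flexible schema" meets unvalidated input.
app.post("/login", async (req, res) => {
  // An attacker POSTs: { "name": "admin", "password": { "$gt": "" } }
  // The operator object matches ANY password; the filter accepts it without complaint.
  const user = await users.findOne({
    name: req.body.name,
    password: req.body.password,
  });
  res.json({ authenticated: user !== null });
});

app.listen(3000);
```

No JOINs to reconstruct, no lateral movement required: the aggregate already did the attacker’s data modeling for them.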

“The same domain model in the application is used directly as the database schema. Developers can reason about access patterns without mapping to a separate model, making latency and plan stability predictable.”

Predictable, you say? I’ll tell you what’s predictable. The moment your domain model changes—and it will—every single document stored with the old model becomes a ticking time bomb of technical debt and data corruption. You haven’t simplified development; you’ve just tightly coupled your application logic to your physical storage, creating a brittle monolith that will be impossible to refactor. Every feature flag becomes a potential schema schism. This isn’t "domain-driven design"; it’s disaster-driven deployment.
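Here’s that time bomb with the fuse lit. Assume a v1 model that stored the address as a plain string and a v2 that “improved” it into an object; the example is mine, the failure mode is universal:

```typescript
import { MongoClient } from "mongodb";

// The v2 domain model, as the application now believes the world to be:
interface Customer {
  _id: string;
  address: { street: string; city: string; zip: string };
}

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const customers = client.db("shop").collection<Customer>("customers");

// But v1 documents are still sitting in the collection, written back when the
// model stored the address as a plain string:
//   { _id: "cust_7", address: "123 Main St, Springfield" }
// The type system says every document matches v2. The disk disagrees.
const customer = await customers.findOne({ _id: "cust_7" });

// Compiles clean, passes review, detonates on every v1 document:
const city = customer?.address.city.toUpperCase();
// TypeError: Cannot read properties of undefined (reading 'toUpperCase')
```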

You wave your hand at cross-document joins like $lookup as if they're some arcane evil to be avoided. But what happens when business requirements change and you do need to relate data that you didn't foresee? You'll end up with developers pulling massive documents into the application layer just to pick out one field, joining data in application memory, and inevitably introducing bugs, inconsistencies, and N+1 query nightmares that make an ORM look like a pinnacle of efficiency.
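And here is what that looks like in practice, with hypothetical collections: the join you wouldn’t let the database do, rewritten as a for-loop over the network:

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const db = client.db("shop");
const orders = db.collection("orders");
const products = db.collection("products");

// The $lookup you avoided, reinvented as N+1 round trips in application memory.
async function ordersWithProducts(userId: string) {
  const userOrders = await orders.find({ userId }).toArray(); // 1 query
  const result = [];
  for (const order of userOrders) {
    // +N queries, one per order, each hauling a full document across the wire
    const product = await products.findOne({ _id: order.productId });
    result.push({ ...order, product });
  }
  return result;
}
```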

Honestly, reading this feels like watching someone build a bank vault out of plywood because it’s faster and “gives the carpenters more control.” They brag about how quickly they can assemble it while ignoring the fact that it offers zero actual security. All this talk about shaving milliseconds off disk I/O, and you’ve completely ignored the years it will take to clean up the inevitable data cesspool you’ve created.

Just another day, another revolutionary paradigm that trades decades of hard-won database wisdom for a marginal performance gain on a benchmark. I need a coffee. And a much stronger firewall.