🔥 The DB Grill 🔥

Where database blog posts get flame-broiled to perfection

Parquet as our 'Ski Lift' to Migrate from ClickHouse to CedarDB
Originally from cedardb.com/blog/index.xml
February 11, 2026 • Roasted by Alex "Downtime" Rodriguez

Alright, let me just put down my lukewarm coffee that I've reheated for the third time this morning. I just finished reading this... masterpiece on CedarDB and its "streamlined data exchange." Streamlined. That's a good one. It's about as streamlined as trying to merge a feature branch that hasn't been rebased in six months.

Look, I get it. The skiing metaphor is cute. Black diamond runs for your complex SQL queries. Very creative. My black diamond runs involve a PagerDuty alert at 3 AM on New Year's Day telling me the "painless" data ingest pipeline has monumentally corrupted the primary user table because someone somewhere used a slightly different Parquet library. Your adventure is on the slopes; mine is in the dark, staring at a terminal, fueled by pure rage and whatever stale holiday candy is left in the kitchen.

Let's talk about this "chair lift" – this glorious Parquet migration.

-- ClickHouse: dump the table to a Parquet file
SELECT * FROM votes INTO OUTFILE 'votes.parquet' FORMAT Parquet;

-- CedarDB: load it straight back in with a single CTAS
CREATE TABLE votes AS SELECT * FROM '/var/lib/cedardb/data/ext/votes.parquet';

Oh, wow. It's that simple? Just a little SELECT * and you're there? Fantastic. I can't wait to explain this to the product team. Yes, we'll just pause all writes to our multi-terabyte production ClickHouse cluster for the, uh... let's see... 294 seconds it took you to dump the 'posts' table, plus the... 15 minutes it took to load 'posthistory' into CedarDB. All while the business is running. This isn't a migration; it's a weekend project for someone whose biggest production responsibility is keeping their Docker daemon from running out of disk space on their laptop.
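Since we're doing migration math anyway, here's the back-of-the-napkin version, using the post's own numbers (and charitably ignoring that the 294 seconds and the 15 minutes were measured on two different tables):

```python
# Naive write-freeze window for the "pause writes, dump, reload" plan,
# using the timings quoted above: 294 s to dump, 15 min to load.
dump_seconds = 294
load_seconds = 15 * 60
freeze = dump_seconds + load_seconds
print(f"{freeze} seconds ≈ {freeze / 60:.1f} minutes of frozen writes")
# → 1194 seconds ≈ 19.9 minutes of frozen writes
```

Twenty minutes of frozen writes on a multi-terabyte production cluster. That's not a migration plan; that's a resume-generating event.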

And the setup! A single m7a.8xlarge EC2 instance, running everything in Docker, one at a time. This isn't a benchmark; it's a science fair project. A very expensive one at $1.85 an hour, I'll grant you. What happens when you have, I don't know, users? You know, those pesky things that generate data and run queries concurrently? What happens when the OLAP queries you're so proud of start locking up the OLTP workload you also claim to support?

Speaking of which, let's get to my favorite part. The grand promise:

It’s also worth pointing out that, unlike ClickHouse, CedarDB is able to simultaneously handle both your OLAP and your OLTP workload, and it looks just like Postgres, so you get to run your analytical queries on fresh data, with zero replication lag and a simpler data architecture.

Chef's kiss. Perfection. The "one database to rule them all" pitch. I've got a whole section in my sticker collection for databases that promised this. It's right next to the ones that promised "infinite, linear scalability." This magical HTAP database will run our complex, multi-table join analytics right alongside our latency-sensitive transactional updates with no contention. And it "looks just like Postgres," which is tech-speak for 'it's compatible with psql until you try to use a slightly obscure function, an extension we haven't implemented yet, or a query plan that hits a C++ assertion failure, at which point the whole thing will segfault and you'll spend 72 hours on a support ticket with their one engineer who understands the storage engine.'

And "zero replication lag"? Of course there's no replication lag! You did a static, one-time file dump! That's like claiming my car has zero fuel consumption because it's turned off in the garage.

I see the results table. 11x faster! Wonderful. These numbers will look great on a slide deck. They'll look less great when I'm trying to figure out why the memory usage spikes to 99% and the OOM killer nukes the process every time the marketing team runs their "community-touched posts" report. You didn't measure stability, memory pressure, or resource contention because in your little snow globe of an experiment, those things don't exist.

So here's my prediction. The CTO will read this. They'll get excited by the "simpler data architecture" and the "geometric mean speedup." We'll spend six months planning a migration. We'll get to the "painless" Parquet part, and realize that half our data types don't map cleanly, the TIMESTAMPTZ parser has a weird off-by-one-hour bug for our specific timezone, and the hand-rolled CASE WHEN "ParentId" = '' THEN NULL cleanup behaves slightly differently from ClickHouse's empty-string defaults. The "zero-downtime" migration will turn into a 12-hour planned maintenance window on a Saturday night.
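For the skeptics: here's a minimal, hypothetical sketch of the class of off-by-one-hour bug I'm talking about. The same wall-clock string means different instants depending on whether a parser assumes UTC or a local zone. This is Python's stdlib standing in for the point, not anyone's actual parser:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The same wall-clock string pulled out of a Parquet dump...
raw = "2024-01-15 12:00:00"

# ...read by a parser that assumes UTC:
as_utc = datetime.fromisoformat(raw).replace(tzinfo=timezone.utc)

# ...versus one that assumes the server's local zone
# (Berlin here, which is UTC+1 in January):
as_berlin = datetime.fromisoformat(raw).replace(tzinfo=ZoneInfo("Europe/Berlin"))

# Same digits on disk, one hour apart as actual instants:
print(as_utc - as_berlin)  # → 1:00:00
```

Multiply that one hour by a few billion rows of 'posthistory' and enjoy explaining to the analysts why every event before 2010 apparently happened during lunch.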

And then, months later, on Memorial Day weekend, a minor version upgrade will introduce a query planner regression. Q3, the one that was 11.5x faster, will suddenly be 50x slower, consume all available memory, and bring the entire "hybrid" OLTP/OLAP system to its knees. And I'll be the one rolling it back, while the "Après-Ski" details are a distant, mocking memory.

Another day, another revolutionary database. I'll go ahead and clear a spot on my laptop lid for the CedarDB sticker. Right between CockroachDB and FoundationDB. They'll all have something to talk about in the great laptop-sticker-graveyard in the sky. Now if you'll excuse me, my coffee is cold again.