When Postgres takes a while to answer analytical questions like "what's the 75th percentile of response time for these 900-some billion request rows, grouped by device, network, and date for the past 30 days", that's when you might want to try out ClickHouse.
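For a concrete picture, that query would look roughly like this in ClickHouse (table and column names here are illustrative):

    SELECT
        device,
        network,
        toDate(timestamp) AS day,
        quantile(0.75)(response_time_ms) AS p75_response_time
    FROM requests
    WHERE timestamp >= now() - INTERVAL 30 DAY
    GROUP BY device, network, day
    ORDER BY day

Columnar storage plus the quantile() family of aggregate functions is exactly the sweet spot for this shape of query.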
Or literally any other OLAP database.
Is it a surprise that OLTP is not efficient at aggregation and analytics?
ClickHouse also has great compression, and it's easy to install and try since it's open-source. It's also typically much faster than even other OLAP databases, often by a _lot_.
That seems like the kind of problem that could easily be handled with Monte Carlo approximation? How hard is it to get 1M random rows out of a Postgres database?
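For what it's worth, Postgres has had TABLESAMPLE built in since 9.5; assuming a requests table like the one above, the main tradeoff is block-level vs. row-level sampling:

    -- block-level sample: fast, picks whole pages, slightly clustered
    SELECT * FROM requests TABLESAMPLE SYSTEM (0.0001);

    -- row-level Bernoulli sample: more uniform, but scans every page
    SELECT * FROM requests TABLESAMPLE BERNOULLI (0.0001);

    -- the argument is a percentage, so 0.0001% of ~900B rows is roughly 1M rows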
ClickHouse has native support for sampling: https://clickhouse.com/docs/sql-reference/statements/select/...
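The one catch is that the MergeTree table has to declare a sampling key up front, and that key has to be part of the ORDER BY. A sketch, with the schema assumed:

    -- e.g. ENGINE = MergeTree
    --      ORDER BY (device, network, cityHash64(request_id))
    --      SAMPLE BY cityHash64(request_id)
    SELECT
        device,
        quantile(0.75)(response_time_ms) AS p75_response_time
    FROM requests
    SAMPLE 0.01                      -- read roughly 1% of the rows
    GROUP BY device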
I'm not sure storing 900B or 900MM records for analytics benefits anyone other than AWS. Why not sample?
A use case where we reached for Clickhouse years ago at an old job was streaming music royalty reporting. Days of runtime on our beefy MySQL cluster, minutes of runtime on a very naively optimized Clickhouse server. And sampling wasn't an option, because rights holders like getting exactly the correct amount of money per stream rather than some approximation of it :)
There's nothing Clickhouse does that other OLAP DBs can't do, but the killer feature for us was just how trivially easy it was to replicate InnoDB data into Clickhouse and get great general performance out of the box. It was a very accessible option for a bunch of Rails developers who were moonlighting as DBAs in a small company.
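For anyone wondering what that looks like: one simple way (a sketch, not necessarily the exact setup above) is ClickHouse's mysql() table function for pulling rows straight out of InnoDB, with the experimental MaterializedMySQL database engine as the option for continuous replication, depending on version. Placeholder names throughout:

    CREATE TABLE plays
    (
        stream_id UInt64,
        track_id  UInt64,
        played_at DateTime
    )
    ENGINE = MergeTree
    ORDER BY (track_id, played_at);

    -- one-off (or periodic) copy from MySQL/InnoDB
    INSERT INTO plays
    SELECT stream_id, track_id, played_at
    FROM mysql('mysql-host:3306', 'royalties_db', 'plays', 'reader', 'password');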
Yes, payments is an N=all scenario. Analytics is not, however.
Use-case dependent. For some analytics, you really want to see the tail ends (e.g. rare events) which sampling can sometimes omit or under-represent.
I'm struggling with TimescaleDB performance right now and wondering if the grass is greener.
Started migrating away from TimescaleDB some time ago too. Initially we self-hosted ClickHouse to test it out. It very quickly became clear that it was a lot better for our use case, and we decided to go with ClickHouse Cloud so we don't have to worry about the ops. The pricing for the cloud offering is very good IMO. We use it for telemetry data from a fleet of IoT devices.
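For context, a ClickHouse table for that kind of device telemetry tends to be a plain MergeTree keyed on device and time; a minimal sketch with made-up names, not our actual schema:

    CREATE TABLE device_telemetry
    (
        device_id   UInt32,
        recorded_at DateTime,
        metric      LowCardinality(String),
        value       Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(recorded_at)
    ORDER BY (device_id, metric, recorded_at);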
Migrated from TimescaleDB to ClickHouse and it was like night and day. A naive reimplementation of the service performed wayyyy better than TimescaleDB. Self-hosted.
What is the workload or query that is causing issues?
We denormalized some data and then wanted to filter by it quickly. I managed to find a decent index to get us through, but now I'm stuck with another dimension in my data that I'd rather not have. I think I'll have to create a new table, migrate the data, then rename it (something like the sketch below).
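The swap itself is the standard pattern in Postgres; a minimal sketch with placeholder names (with TimescaleDB, the new table would also need create_hypertable() before the backfill):

    CREATE TABLE events_new (
        device_id   bigint,
        recorded_at timestamptz,
        value       double precision
        -- the unwanted dimension column is simply left out
    );

    -- with TimescaleDB: SELECT create_hypertable('events_new', 'recorded_at');

    INSERT INTO events_new (device_id, recorded_at, value)
    SELECT device_id, recorded_at, value
    FROM events;

    BEGIN;
    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_new RENAME TO events;
    COMMIT;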