When Postgres takes a while to answer analytical questions like "what's the 75th percentile of response time for these 900-some billion request rows, grouped by device, network, and date for the past 30 days", that's when you might want to try out ClickHouse.
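For a concrete picture, that query would look roughly like this in ClickHouse (table and column names here are illustrative):

    SELECT
        device,
        network,
        toDate(timestamp) AS day,
        quantile(0.75)(response_time_ms) AS p75_response_time
    FROM requests
    WHERE timestamp >= now() - INTERVAL 30 DAY
    GROUP BY device, network, day
    ORDER BY day

Columnar storage plus the quantile() family of aggregate functions is exactly the sweet spot for this shape of query.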
Or literally any other OLAP database.
Is it a surprise that OLTP is not efficient at aggregation and analytics?
ClickHouse also has great compression, and it's easy to install and try since it's open-source. It's also typically much faster than even other OLAP databases, often by a _lot_.
That seems like the kind of problem that could easily be handled with Monte Carlo approximation? How hard is it to get 1M random rows out of a Postgres database?
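For what it's worth, Postgres has had TABLESAMPLE built in since 9.5; assuming a requests table like the one above, the main tradeoff is block-level vs. row-level sampling:

    -- block-level sample: fast, picks whole pages, slightly clustered
    SELECT * FROM requests TABLESAMPLE SYSTEM (0.0001);

    -- row-level Bernoulli sample: more uniform, but scans every page
    SELECT * FROM requests TABLESAMPLE BERNOULLI (0.0001);

    -- the argument is a percentage, so 0.0001% of ~900B rows is roughly 1M rows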
ClickHouse has native support for sampling: https://clickhouse.com/docs/sql-reference/statements/select/...
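The one catch is that the MergeTree table has to declare a sampling key up front, and that key has to be part of the ORDER BY. A sketch, with the schema assumed:

    -- e.g. ENGINE = MergeTree
    --      ORDER BY (device, network, cityHash64(request_id))
    --      SAMPLE BY cityHash64(request_id)
    SELECT
        device,
        quantile(0.75)(response_time_ms) AS p75_response_time
    FROM requests
    SAMPLE 0.01                      -- read roughly 1% of the rows
    GROUP BY device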
I'm not sure storing 900B or 900MM records for analytics benefits anyone other than AWS. Why not sample?
A use case where we reached for Clickhouse years ago at an old job was streaming music royalty reporting. Days of runtime on our beefy MySQL cluster, minutes of runtime on a very naively optimized Clickhouse server. And sampling wasn't an option, because rights holders like getting exactly the correct amount of money per stream rather than some approximation of it :)
There's nothing Clickhouse does that other OLAP DBs can't do, but the killer feature for us was just how trivially easy it was to replicate InnoDB data into Clickhouse and get great general performance out of the box. It was a very accessible option for a bunch of Rails developers who were moonlighting as DBAs in a small company.
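For anyone wondering what that looks like: one simple way (a sketch, not necessarily the exact setup above) is ClickHouse's mysql() table function for pulling rows straight out of InnoDB, with the experimental MaterializedMySQL database engine as the option for continuous replication, depending on version. Placeholder names throughout:

    CREATE TABLE plays
    (
        stream_id UInt64,
        track_id  UInt64,
        played_at DateTime
    )
    ENGINE = MergeTree
    ORDER BY (track_id, played_at);

    -- one-off (or periodic) copy from MySQL/InnoDB
    INSERT INTO plays
    SELECT stream_id, track_id, played_at
    FROM mysql('mysql-host:3306', 'royalties_db', 'plays', 'reader', 'password');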
Yes, payments is an N=all scenario. Analytics is not, however.
Use-case dependent. For some analytics, you really want to see the tail ends (e.g. rare events) which sampling can sometimes omit or under-represent.
I'm struggling with TimescaleDB performance right now and wondering if the grass is greener.
Started migrating away from TimescaleDB some time ago too. Initially we self-hosted ClickHouse to test it out. It very quickly became clear that it was a lot better for our use case, and we decided to go with ClickHouse Cloud so we don't have to worry about the ops. The pricing for the cloud offering is very good IMO. We use it for telemetry data from a fleet of IoT devices.
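For context, a ClickHouse table for that kind of device telemetry tends to be a plain MergeTree keyed on device and time; a minimal sketch with made-up names, not our actual schema:

    CREATE TABLE device_telemetry
    (
        device_id   UInt32,
        recorded_at DateTime,
        metric      LowCardinality(String),
        value       Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(recorded_at)
    ORDER BY (device_id, metric, recorded_at);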
Migrated from TimescaleDB to ClickHouse and it was like night and day. A naive reimplementation of the service performed wayyyy better than TimescaleDB. Self-hosted.
What is the workload or query that is causing issues?
We denormalized some data and then wanted to filter by it quickly. I managed to find a decent index to get us through, but now I'm stuck with another dimension in my data that I'd rather not have. I think I'll have to create a new table, migrate the data, then rename it (something like the sketch below).
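The swap itself is the standard pattern in Postgres; a minimal sketch with placeholder names (with TimescaleDB, the new table would also need create_hypertable() before the backfill):

    CREATE TABLE events_new (
        device_id   bigint,
        recorded_at timestamptz,
        value       double precision
        -- the unwanted dimension column is simply left out
    );

    -- with TimescaleDB: SELECT create_hypertable('events_new', 'recorded_at');

    INSERT INTO events_new (device_id, recorded_at, value)
    SELECT device_id, recorded_at, value
    FROM events;

    BEGIN;
    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_new RENAME TO events;
    COMMIT;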