jgalt212 4 days ago

I'm not sure storing 900B or 900MM records for analytics benefits anyone other than AWS. Why not sample?

1
sethhochberg 4 days ago

A use case where we reached for Clickhouse years ago at an old job was for streaming music royalty reporting. Days of runtime on our beefy MySQL cluster, minutes of runtime in a very naively optimized Clickhouse server. And sampling wasn't an option because rightholders like the exactly correct amount of money per stream instead of some approximation of the right amount of money :)

There's nothing Clickhouse does that other OLAP DBs can't do, but the killer feature for us was just how trivially easy it was to replicate InnoDB data into Clickhouse and get great general performance out of the box. It was a very accessible option for a bunch of Rails developers who were moonlighting as DBAs in a small company.

jgalt212 3 days ago

Yes, payments is an N=all scenario. Analytics is not, however.

antisthenes 3 days ago

Use-case dependent. For some analytics, you really want to see the tail ends (e.g. rare events) which sampling can sometimes omit or under-represent.