fishtoaster 4 days ago

Yep. ClickHouse is absolutely great for tons of production use cases.

Unless you try to join tables in it, in which case it will immediately explode.

More seriously, it's a columnar data store, not a relational database. It'll definitely pretend to be "postgres but faster", but that's a very thin and very leaky facade. You want to run a massively complex set of selects and conditional sums over one table with 3B rows and terabytes of data? You'll get a result in tens of seconds without optimization. You want to join two tables that postgres could handle easily? You'll OOM a machine with terabytes of memory.

So: good for very specific use cases. If you have those use cases, it's great! If you don't, use something else. Many large companies have those use cases.
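To make the "conditional sums over one table" case concrete, here is a minimal Python sketch of why that shape of query is cheap. It is a toy model, not ClickHouse code: the rows, the `conditional_sums` helper, and the column names are all made up for illustration. The point is that a single pass only has to keep one small accumulator per group, so memory tracks the number of groups rather than the number of rows scanned.

```python
from collections import defaultdict


def conditional_sums(rows):
    """Single pass over (country, status, amount) rows.

    The only state kept is one small dict per country, so memory
    grows with the number of groups, not the number of rows.
    """
    totals = defaultdict(lambda: {"paid": 0, "refunded": 0})
    for country, status, amount in rows:
        # Conditional sum: only some statuses contribute to a total.
        if status in ("paid", "refunded"):
            totals[country][status] += amount
    return {country: dict(sums) for country, sums in totals.items()}


# Hypothetical sample data standing in for a large fact table.
rows = [
    ("DE", "paid", 10),
    ("DE", "refunded", 4),
    ("US", "paid", 7),
    ("DE", "paid", 5),
    ("US", "pending", 99),
]
result = conditional_sums(rows)
print(result)
```

Scaling the input from five rows to billions changes how long the scan takes, but the accumulator state stays the same size as long as the set of groups does.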

Boxxed 4 days ago

Yeah I think that's a good summary. For instance, ClickBench consists of more than 40 queries and there's not a single join in them: https://github.com/ClickHouse/ClickBench/blob/main/clickhous...

zX41ZdbW 1 day ago

There is the "versions benchmark," which includes a lot of queries with JOINs and compares ClickHouse performance on them: https://benchmark.clickhouse.com/versions/

adrian17 3 days ago

The majority of our queries have joins (plus our core logic often depends on fact table expansion with `arrayJoin()`s) before aggregations and we're doing fine. AFAIK whenever we hit memory issues, they are mostly due to high-cardinality aggregations (especially with uniqExact), not joins. But I'm sure it can depend on the specifics.
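A rough Python sketch of the pattern described here, under loud assumptions: `array_join` below only imitates the row-expanding behaviour of `arrayJoin()` (one output row per array element), and `exact_distinct_per_key` is a stand-in for what any exact distinct count has to do, not ClickHouse's actual implementation. The data and names are hypothetical. What it shows is that exact distinct counting must remember every distinct value per group, so its state grows with cardinality.

```python
def array_join(rows):
    """Expand (key, array) rows into one (key, element) row per element.

    Rows with an empty array produce no output rows.
    """
    for key, items in rows:
        for item in items:
            yield key, item


def exact_distinct_per_key(pairs):
    """Exact distinct count per key: keeps a set of every value seen."""
    state = {}
    for key, item in pairs:
        state.setdefault(key, set()).add(item)
    return state


# Hypothetical events: a page and the user IDs that touched it.
events = [("page_a", [1, 2, 3]), ("page_b", [2, 3]), ("page_a", [3, 4])]

expanded = list(array_join(events))
state = exact_distinct_per_key(expanded)
counts = {key: len(values) for key, values in state.items()}

# Total number of values the aggregation has to hold in memory.
state_size = sum(len(values) for values in state.values())
print(counts, state_size)
```

With a handful of user IDs the sets are tiny. With hundreds of millions of distinct IDs per group, those sets are the memory problem, regardless of whether a join ran first.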

legorobot 3 days ago

Definitely agree with this, I think ClickHouse can do a lot with joins if you don't implement them naively. Keeping the server up-to-date is a part of it too.

They've made strides in the last year or two to implement more join algorithms, and to re-order your joins automatically (including what's on the "left" and "right" of the join, which affects the performance of the algorithm).

Their release notes cover a lot of the highlights, and they have dedicated documentation regarding joins[1]. But we've gotten order-of-magnitude improvements before just by reordering our joins to align with how ClickHouse processes them.

[1]: https://clickhouse.com/docs/guides/joining-tables
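To illustrate why the side of the join can matter so much, here is a toy in-memory hash join in Python. It is a sketch of the general technique, not ClickHouse's implementation: `hash_join`, the `facts` and `dims` tables, and the `held` counter are all invented for this example. The assumption, which matches how a classic hash join works, is that one side (the build side) is loaded entirely into a hash table while the other side is streamed past it.

```python
def hash_join(probe_rows, build_rows):
    """Inner join on the first field of each row.

    build_rows is fully materialized in a hash table;
    probe_rows is streamed through one row at a time.
    Returns the joined rows and how many build rows were held in memory.
    """
    table = {}
    for key, payload in build_rows:
        table.setdefault(key, []).append(payload)
    held = sum(len(payloads) for payloads in table.values())

    joined = []
    for key, payload in probe_rows:
        for match in table.get(key, ()):
            joined.append((key, payload, match))
    return joined, held


# Hypothetical tables: a large fact table and a tiny dimension table.
facts = [(i % 3, f"event{i}") for i in range(1000)]
dims = [(0, "red"), (1, "green"), (2, "blue")]

# Build on the small table: 3 rows held in memory.
rows_small_build, held_small = hash_join(facts, dims)

# Build on the large table: 1000 rows held in memory, same logical result.
rows_big_build, held_big = hash_join(dims, facts)

print(held_small, held_big, len(rows_small_build), len(rows_big_build))
```

Both calls produce the same matches, only with the payload columns in swapped positions. The difference is entirely in how much has to sit in memory, which is the kind of gap that reordering a join by hand (or letting the planner do it) can close.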

hodgesrm 4 days ago

> More seriously, it's a columnar data store, not a relational database.

Could you explain why you don't think ClickHouse is relational? The storage is an implementation detail. It affects how fast queries run but not the query model. Joins have already improved substantially and will continue to do so in future.

fishtoaster 2 days ago

The storage is not just an implementation detail because it affects how fast things run, which affects which tasks it's better or worse for. There's a reason people reach for a columnar datastore for some tasks and something like postgres or mysql for other tasks, even though both are technically capable of nearly the same queries.