amazingamazing 4 days ago

How hard is it to self host clustered clickhouse? Is there parity with the hosted offering?

nasretdinov 4 days ago

It's quite easy to host your own instance. We did it ~7 years ago and ran a cluster of over 50 nodes without any major issues. What ClickHouse Cloud offers is shared storage (as opposed to the "shared nothing" architecture of the open-source version), via SharedMergeTree, which uses S3 as a backing store and allows scaling storage and compute separately. The implementation is closed source.

amazingamazing 4 days ago

Interesting - hardware is so cheap though, I guess most enterprises don’t want the hassle.

Personally I’d just go to a colo center, buy a rack of Supermicro servers, and call it a day. No way that’s more expensive after a year (per public pricing).

nasretdinov 4 days ago

Sharding in the open-source version isn't automatic, so you have to manage it yourself: there is no automatic resharding, and you need to insert data accordingly. IMO that's the biggest bottleneck to its adoption at larger scale. Previously you didn't have a choice about how to do sharding (and compute/storage separation, if you want it); now you have more options, including one from the ClickHouse authors themselves.
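
To make "insert data accordingly" concrete, here is a minimal Python sketch of client-side shard routing. It assumes a simple hash-modulo scheme; the function names (`shard_for`, `route_rows`, `moved_fraction`) and the `user_id` field are hypothetical and not part of any ClickHouse API. It is an illustration of the general idea, not how any particular deployment does it.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def route_rows(rows, key_field, num_shards):
    """Group rows into per-shard batches; each batch would be inserted into its own shard."""
    batches = {i: [] for i in range(num_shards)}
    for row in rows:
        batches[shard_for(str(row[key_field]), num_shards)].append(row)
    return batches


def moved_fraction(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


if __name__ == "__main__":
    rows = [{"user_id": i, "event": "click"} for i in range(10)]
    for shard, batch in route_rows(rows, "user_id", 3).items():
        print(shard, [r["user_id"] for r in batch])

    keys = [str(i) for i in range(1000)]
    print("moved when going from 3 to 4 shards:", moved_fraction(keys, 3, 4))
```

With plain modulo hashing, adding a shard changes the target shard for most keys, which is roughly why the lack of automatic resharding hurts: existing data has to be moved by hand or left unbalanced.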

nine_k 4 days ago

Apparently it's not a bottleneck, it's a sales funnel.

nasretdinov 4 days ago

I don't see a contradiction here tbh. There's nothing wrong with not providing some extra functionality for free (especially for features that users will pay for). If you have the engineering resources to manage sharding manually, you're welcome to do so. Since ClickHouse is a commercial company and not part of Yandex, they need to earn money one way or another to fund the database development.

marvinblum 4 days ago

It's not that hard, but there are a few pitfalls you can stumble into. I currently run three clusters for myself and have set some up for clients in the past.

Some of the default config options are weird and SSL is something that needs to be addressed. Overall, still one of the easier DBs to maintain.