the__alchemist 4 days ago

Is there an ELI5 for this company? I'm having a difficult time understanding it from their website. Is it an alternative to Postgres etc? Something that runs on top of it? And analyzes your DB automatically?

jameslk 4 days ago

When Postgres takes a while to answer analytical questions like "what's the 75th percentile of response time for some 900 billion request rows, grouped by device, network, and date for the past 30 days", that's when you might want to try out ClickHouse.
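
For a rough idea, that query might look something like this in ClickHouse (table and column names invented for illustration):

    SELECT
        device,
        network,
        toDate(timestamp) AS date,
        quantile(0.75)(response_time_ms) AS p75_response_time
    FROM requests
    WHERE timestamp >= now() - INTERVAL 30 DAY
    GROUP BY device, network, date
    ORDER BY date;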

cluckindan 4 days ago

Or literally any other OLAP database.

Is it a surprise that OLTP is not efficient at aggregation and analytics?

nasretdinov 4 days ago

ClickHouse also has great compression, and it's easy to install and try since it's open source. It's also typically much faster than even other OLAP databases, often by a _lot_.

swyx 4 days ago

maybe HTAP works for most people though

NunoSempere 4 days ago

That seems like the kind of problem that could easily be handled with Monte Carlo approximation? How hard is it to get 1M random rows out of a Postgres database?
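
For reference, stock Postgres can do a crude version of this with TABLESAMPLE; a sketch, with made-up table and column names:

    -- SYSTEM samples ~1% of pages (BERNOULLI samples individual rows, slower)
    SELECT percentile_cont(0.75) WITHIN GROUP (ORDER BY response_time_ms)
    FROM requests TABLESAMPLE SYSTEM (1);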

sylvinus 4 days ago

ClickHouse has native support for sampling https://clickhouse.com/docs/sql-reference/statements/select/...
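
Assuming the table declares a SAMPLE BY key, usage looks roughly like this (names invented):

    -- Reads ~10% of the data; requires a sampling key in the table definition
    SELECT quantile(0.75)(response_time_ms)
    FROM requests
    SAMPLE 0.1;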

jgalt212 4 days ago

I'm not sure storing 900B or 900MM records for analytics benefits anyone other than AWS. Why not sample?

sethhochberg 4 days ago

A use case where we reached for Clickhouse years ago at an old job was streaming music royalty reporting. Days of runtime on our beefy MySQL cluster, minutes of runtime on a very naively optimized Clickhouse server. And sampling wasn't an option, because rightsholders like getting exactly the correct amount of money per stream instead of some approximation of the right amount :)

There's nothing Clickhouse does that other OLAP DBs can't do, but the killer feature for us was just how trivially easy it was to replicate InnoDB data into Clickhouse and get great general performance out of the box. It was a very accessible option for a bunch of Rails developers who were moonlighting as DBAs in a small company.
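
For anyone trying this today, one low-friction option is ClickHouse's MySQL database engine, which exposes a MySQL database read-through (this is just a sketch, not exactly what we had; all connection details are placeholders):

    -- Makes MySQL tables queryable from inside ClickHouse, no copy step
    CREATE DATABASE mysql_mirror
    ENGINE = MySQL('mysql-host:3306', 'production_db', 'replica_user', 'secret');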

jgalt212 3 days ago

Yes, payments is an N=all scenario. Analytics is not, however.

antisthenes 3 days ago

Use-case dependent. For some analytics, you really want to see the tail ends (e.g. rare events) which sampling can sometimes omit or under-represent.

NewJazz 4 days ago

I'm struggling with TimescaleDB performance right now and wondering if the grass is greener.

andness 3 days ago

We started migrating away from TimescaleDB some time ago too. Initially we self-hosted ClickHouse to test it out. It very quickly became clear that it was a lot better for our use case, and we decided to go with Clickhouse Cloud so we wouldn't have to worry about the ops. The pricing for the cloud offering is very good IMO. We use it for telemetry data from a fleet of IoT devices.

whatevermom 3 days ago

Migrated from TimescaleDB to ClickHouse and it was like night and day. A naive reimplementation of the service performed wayyyy better than TimescaleDB. Self-hosted.

sukruh 4 days ago

It is.

applied_heat 4 days ago

What is the workload or query that is causing issues?

NewJazz 3 days ago

We denormalized some data and then wanted to filter by it quickly. I managed to find a decent index to get us through, but now I'm stuck with another dimension in my data that I'd rather not have. I think I'll have to create a new table, migrate the data, then rename it (sketched below).
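
Roughly this pattern in plain Postgres (names hypothetical; writes arriving during the copy would need handling, and a Timescale hypertable adds wrinkles):

    -- Build the replacement, copy, then swap names
    CREATE TABLE measurements_new (LIKE measurements INCLUDING ALL);
    INSERT INTO measurements_new SELECT * FROM measurements;
    BEGIN;
    ALTER TABLE measurements RENAME TO measurements_old;
    ALTER TABLE measurements_new RENAME TO measurements;
    COMMIT;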

bandoti 4 days ago

Or if you have to use it because you’re self-hosting PostHog :)

arecurrence 4 days ago

Clickhouse has a wide range of really interesting technologies that are not in Postgres; fundamentally, it's not an OLTP database like Postgres but is aimed more at OLAP workloads. I really appreciate Clickhouse's focus on performance: quite a bit of work goes into optimizing memory allocation and operations across different data types.

The heart of Clickhouse is its table engines (they don't exist in Postgres): https://clickhouse.com/docs/engines/table-engines . Data is ordered by a primary column (or columns), and adjacent values in memory come from the same column of the table. Index entries span wide ranges (e.g., by default there's only one key record in the primary index for every 8192 rows) because most operations in Clickhouse are aggregate in nature. Inserts are also expected to be in bulk (they initially form a new physical part that is later merged into the main table structure). A single DELETE is an ALTER TABLE operation in the MergeTree engine. :)
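
A minimal sketch of a MergeTree table to make that concrete (schema invented for illustration):

    CREATE TABLE requests
    (
        timestamp        DateTime,
        device           LowCardinality(String),
        network          LowCardinality(String),
        response_time_ms UInt32
    )
    ENGINE = MergeTree
    -- physical sort order; this is what the sparse primary index is built on
    ORDER BY (device, network, timestamp);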

This structure lets it literally crunch billions of values per second by brute force, not with pre-processing "tricks" (although there is plenty of support for those in Clickhouse as well). I've had tables with hundreds of columns and 100+ billion rows that are nearly as performant as a million-row table if I can structure the query to work with the table's physical ordering.

Clickhouse recommends against nullable fields because of the performance implications (each value requires storing an extra bit somewhere). That's how much they care about perf, and how closely their memory layout tracks the raw data types. :)
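
To make that concrete, a sketch with hypothetical tables:

    -- Nullable(UInt32) stores a separate null mask alongside the values
    CREATE TABLE events_a (d Date, v Nullable(UInt32)) ENGINE = MergeTree ORDER BY d;

    -- A plain column with a sentinel default avoids that per-value overhead
    CREATE TABLE events_b (d Date, v UInt32 DEFAULT 0) ENGINE = MergeTree ORDER BY d;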

porridgeraisin 4 days ago

> They are initially a new physical part that is later merged into the main table structure

> A single DELETE is an ALTER TABLE operation

Can you explain these two further?

arecurrence 4 days ago

The Clickhouse docs are so good that I'd point straight to them https://clickhouse.com/docs/sql-reference/statements/alter/d... .

The reason I mentioned it is because it's a huge surprise to some people that... from the docs: "The ALTER TABLE prefix makes this syntax different from most other systems supporting SQL. It is intended to signify that unlike similar queries in OLTP databases this is a heavy operation not designed for frequent use. ALTER TABLE is considered a heavyweight operation that requires the underlying data to be merged before it is deleted."

There's also a "lightweight delete" available in many circumstances https://clickhouse.com/docs/sql-reference/statements/delete. Something really nice about the ClickHouse docs is that they devote quite a bit of text to describing the design and performance implications of using an operation. It reiterates the focus on performance that is pervasive across the product.
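
Concretely, the two flavors look like this (table name hypothetical):

    -- Heavyweight mutation: asynchronously rewrites the affected parts
    ALTER TABLE requests DELETE WHERE device = 'legacy';

    -- Lightweight delete: marks rows as deleted; space is reclaimed on merge
    DELETE FROM requests WHERE device = 'legacy';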

Edit: Per the other part of your question, why inserts create new parts and how they are merged is best described here https://clickhouse.com/docs/engines/table-engines/mergetree-...

porridgeraisin 3 days ago

Thank you!

lbhdc 4 days ago

It's a DB company that offers an open-source database and cloud-managed services.

The database is OLAP, where Postgres is OLTP. Essentially, it's very fast at complex queries and is targeted at analytics workloads.

datavirtue 4 days ago

Postgres has been used as the basis for several OLAP systems. These guys are probably using a modified Greenplum.

__s 4 days ago

I got to see Citus at Microsoft fail to close against ClickHouse for an internal project

ClickHouse spun out of Yandex & is open source, https://github.com/ClickHouse/clickhouse

Disclosure: I started at Citus & ended up at ClickHouse

lbhdc 4 days ago

As far as I am aware it is not a derivative of another database.

https://dbdb.io/db/clickhouse

Silasdev 4 days ago

SQL, OLAP. Primary use case is fast aggregations on append-only data, like usage analytics.

It's fast, it's........ really fast!!

But you need to get comfortable with their extended SQL dialect, which forces you to think a little differently than usual SQL if you want to keep perf high.
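
For a taste (table and columns invented), aggregate combinators like countIf/avgIf and functions like argMax are the kind of thing you reach for instead of subqueries and self-joins:

    SELECT
        countIf(status >= 500)                AS error_count,     -- conditional count
        avgIf(response_time_ms, status < 500) AS avg_ok_latency,  -- conditional average
        argMax(url, response_time_ms)         AS slowest_url      -- url at max response time
    FROM requests;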

simantel 4 days ago

It's an alternative to Postgres in the sense that they're both databases. Read up on OLAP vs. OLTP to see the difference.

doix 4 days ago

I guess you could say it's an alternative to Postgres. It's a different, column-oriented database that makes different tradeoffs. I'd say DuckDB is a better comparison, if you're familiar with it.

pythonaut_16 4 days ago

Expanding for the original question:

Roughly speaking, Postgres is to SQLite what Clickhouse is to DuckDB.

OLTP -> Online Transaction Processing. Postgres and traditional RDBMS. Mainly focused on transactions and addressing specific rows. Queries like "show me all orders for customer X".

OLAP -> Online Analytical Processing. Clickhouse and other columnar-oriented databases. For analytical and calculation queries, like "show me the total value of all orders in March 2024". OLAP databases typically store data by column rather than by row, and usually have optimizations for storage space and query speed based on that. As a tradeoff they're typically slower for OLTP-type queries. Often you'd bring in an OLAP DB like Clickhouse when you have a huge volume of data and your OLTP database is struggling to keep up.
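
In SQL terms, the two query shapes above might look like this (schema invented):

    -- OLTP-style: fetch a handful of specific rows
    SELECT * FROM orders WHERE customer_id = 42;

    -- OLAP-style: scan and aggregate many rows
    SELECT sum(total_amount)
    FROM orders
    WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01';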

ksynwa 4 days ago

What's the significance of "online" in these acronyms?

IMTDb 3 days ago

Online means you expect the response to come quickly (seconds) after issuing the request. The opposite is "offline", where you expect the results to come a long time after making the request (hours/days).

ClickHouse is designed so you can build dashboards with it. Offline systems are designed so you can build reports that you send as PDFs over email.

edoceo 4 days ago

Live and real-time

stonemetal12 4 days ago

It is a rather old acronym. The alternative was batch processing: a "you will get your results in the mail" type of thing.

Here "Online" means results while connected to the system, not real time since there is no time requirement for results.

joshstrange 3 days ago

If you go into it with MySQL/Postgres knowledge you will probably hate it.

Source: me

I almost wish it didn't use SQL, so it'd be clear how different it is. Nothing works like you're used to, footguns galore, and I hate ZooKeeper.

I'd replace it with Postgres in a heartbeat if I thought I could get away with it; I don't think our data size really needs CH. Unfortunately, my options are "spin up a cluster on company resources to prove my point" or "spin it up on my own infra" (which is not possible, since that would require pulling company data onto my servers, which I would never do). So instead I'm stuck dealing with CH.

whobre 4 days ago

It's not like Postgres at all, except at a very superficial level. It's an analytical engine like BigQuery, Snowflake, Teradata, etc.