This looks awesome. One of my biggest gripes personally with Iceberg (less so Delta Lake, but similar) is how difficult it is to just try out on a laptop. Delta Lake has vanilla Python implementations, but those are fragmented and buggy IME. Iceberg has just never worked locally; you need a JVM cluster and a ton of setup. I went down a similar road of trying to use sqlite/postgres+duckdb+parquet files in blob storage, but it was a lot of work.
It seems like this will just work out of the box and scale up to very reasonable data sizes. And the work from the DuckDB folks is typically excellent. It's clear they understand this space. Excited to try it out!
Have you tried out PyIceberg yet? It's a pure-Python implementation and it works pretty well. It supports a SQL catalog, as well as an in-memory catalog backed by the baked-in SQLite support.
Here's a step-by-step setup. It uses S3 and RDS, but it wouldn't be hard to swap in a local sqlite instead.
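For example, a minimal sketch of the local version, assuming a recent PyIceberg (the paths and table name are just placeholders):

    import os
    import pyarrow as pa
    from pyiceberg.catalog.sql import SqlCatalog

    # Catalog metadata lives in a local SQLite file; table data and
    # metadata files go under the local warehouse directory.
    os.makedirs("/tmp/warehouse", exist_ok=True)
    catalog = SqlCatalog(
        "local",
        uri="sqlite:///catalog.db",
        warehouse="file:///tmp/warehouse",
    )

    catalog.create_namespace("default")
    rows = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    tbl = catalog.create_table("default.demo", schema=rows.schema)
    tbl.append(rows)
    print(tbl.scan().to_arrow())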
It's indeed very easy to try locally! For example in a marimo notebook, just a few lines of code: https://www.youtube.com/watch?v=x6YtqvGcDBY
(Disclosure, I am a developer of marimo.)
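If I'm reading the announcement right and this is the DuckLake extension, the local version is roughly the following (table and file names are made up, and the exact ATTACH options may differ from current docs):

    import duckdb

    con = duckdb.connect()
    con.install_extension("ducklake")
    con.load_extension("ducklake")

    # Catalog metadata goes in a local database file; table data is
    # written as Parquet files under the data path.
    con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_files/')")
    con.sql("CREATE TABLE lake.demo AS SELECT 42 AS answer")
    con.sql("SELECT * FROM lake.demo").show()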
I was thinking of putting something together for this. Like a helm chart that works with k3s.
datapains has some good stuff to get Trino running, and you can get a Hive Metastore running with some hacking. I dorked around with it, got the Iceberg connector working with Trino, and saw how it all fits together. I load data into a dumb Hive location with a Trino table pointed at it and then insert from select ... into Iceberg.
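Roughly, the insert-from-select step looks like this through the Trino Python client (the catalog/schema/table names and S3 path are placeholders, and it assumes the Hive and Iceberg connectors are already configured):

    import trino

    conn = trino.dbapi.connect(host="localhost", port=8080, user="demo")
    cur = conn.cursor()

    # "Dumb" external Hive table pointing at files that were already loaded.
    cur.execute("""
        CREATE TABLE hive.staging.events (id bigint, payload varchar)
        WITH (external_location = 's3a://my-bucket/events/', format = 'PARQUET')
    """)

    # Copy it into a managed Iceberg table via insert-from-select.
    cur.execute("""
        CREATE TABLE iceberg.lake.events (id bigint, payload varchar)
        WITH (format = 'PARQUET')
    """)
    cur.execute("INSERT INTO iceberg.lake.events SELECT id, payload FROM hive.staging.events")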
If the duck guys have some simple-to-get-running stuff, they could probably start to eat everyone else's lunch.
Delta-io (based on delta-rs) runs very, very easily locally. Just pip install, write, and you get the catalog and everything.
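For reference, a minimal local sketch with the deltalake Python package (the table path and data are made up):

    import pandas as pd
    from deltalake import DeltaTable, write_deltalake

    # Writing creates the Parquet files plus the _delta_log transaction log
    # inside that directory.
    df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
    write_deltalake("./my_delta_table", df)

    # Read it back (use mode="append" on write_deltalake to add more rows).
    print(DeltaTable("./my_delta_table").to_pandas())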
I tried using it but on more than one occasion hit showstopping bugs -- they're probably fixed by now though