jakozaur 6 days ago

Iceberg competitor, addressing some of its shortcomings, like blown-up metadata:

https://quesma.com/blog-detail/apache-iceberg-practical-limi...

Even Snowflake used FoundationDB for its metadata, whereas Iceberg tries to keep even the metadata layer on blob storage.

buremba 6 days ago

I had the same impression, but I wouldn't call it a competitor after watching their video: https://youtu.be/zeonmOO9jm4?t=4032

They support syncing to Iceberg by writing the manifest and metadata files on demand, and they already have read support for Iceberg. They essentially fixed Iceberg's core issues, but it's not a direct competitor, since you can use DuckLake alongside Iceberg in a nice, bidirectional way.
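To make that concrete, here's a minimal sketch of that setup from DuckDB's Python API. The paths, table names, and ATTACH options are placeholders, and the on-demand Iceberg export they demo in the video isn't shown here, only the read direction:

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL ducklake; LOAD ducklake;")
    con.sql("INSTALL iceberg; LOAD iceberg;")

    # DuckLake catalog: table metadata lives in a small DuckDB database,
    # data files go under DATA_PATH
    con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'data/')")

    # Read an existing Iceberg table straight off object storage and land it
    # in DuckLake; the reverse direction is the on-demand Iceberg export
    # mentioned above
    con.sql("""
        CREATE TABLE lake.events AS
        SELECT * FROM iceberg_scan('s3://bucket/warehouse/db/events')
    """)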

prpl 6 days ago

Metadata bloat can come from a few things, but it's manageable:

* number of snapshots

* frequent large schema changes

* lots of small files/row level updates

* lots of stats

The last one IIRC used to be pretty bad, especially with larger schemas.

Most engines have ways to help with this - compaction, snapshot expiration, etc. - though it can still be up to the user (see the sketch below). S3 Tables is supposed to do some of this for you.
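As a rough illustration (not any particular engine's automatic behavior), the usual maintenance knobs look like this through Spark's Iceberg procedures. The catalog name, table name, and cut-offs below are made up; the procedures themselves are standard Iceberg Spark procedures:

    from pyspark.sql import SparkSession

    # Assumes the iceberg-spark-runtime jar is on the classpath; the "demo"
    # catalog pointing at an S3 warehouse is a placeholder
    spark = (
        SparkSession.builder
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "s3://bucket/warehouse")
        .getOrCreate()
    )

    # Cap how many snapshots the metadata keeps around
    spark.sql("""
        CALL demo.system.expire_snapshots(
            table => 'db.events',
            older_than => TIMESTAMP '2024-01-01 00:00:00',
            retain_last => 10
        )
    """)

    # Compact the small files left behind by frequent / row-level writes
    spark.sql("CALL demo.system.rewrite_data_files(table => 'db.events')")

    # Drop files that no snapshot references anymore
    spark.sql("CALL demo.system.remove_orphan_files(table => 'db.events')")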

If the metadata is below 1-5 MB, it's really not an issue. Your commit rate is effectively limited by the size of your metadata and the number of writers you have.

I’ve written scripts to fix 1GB+ metadata files in production. Usually it was pruning snapshots without deleting files (relying on bucket policy to later clean things up) or removing old schema versions.
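Not that script, obviously, but the general shape of such a fix is roughly the following. The field names follow the Iceberg table-metadata spec, the file names are placeholders, and you'd still have to repoint the catalog at the new metadata file (and rely on a lifecycle policy to clean up data files) afterwards:

    import json

    # Last-resort hack: prune old snapshots/schemas from the metadata JSON
    # without touching data files. Prefer expire_snapshots when it works.
    with open("v123.metadata.json") as f:       # placeholder file name
        meta = json.load(f)

    current_snapshot = meta["current-snapshot-id"]
    current_schema = meta["current-schema-id"]

    # Keep only the current snapshot; orphaned data files are left for a
    # bucket lifecycle policy to clean up later
    meta["snapshots"] = [s for s in meta["snapshots"]
                         if s["snapshot-id"] == current_snapshot]
    meta["snapshot-log"] = [e for e in meta.get("snapshot-log", [])
                            if e["snapshot-id"] == current_snapshot]

    # Drop superseded schema versions
    meta["schemas"] = [s for s in meta["schemas"]
                       if s["schema-id"] == current_schema]

    with open("v124.metadata.json", "w") as f:  # new metadata version
        json.dump(meta, f)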