tishj 5 days ago

One thing to add to this: Snapshots can be retained (though rewritten) even through compaction

As a consequence of compaction, when deleting the build up of many small add/delete files, in a format like Iceberg, you would lose the ability to time travel to those earlier states.

With DuckLake's ability to refer to parts of parquet files, we can preserve the ability to time travel, even after deleting the old parquet files

1
ryanworl 5 days ago

Does this trick preclude the ability to sort your data within a partition? You wouldn’t be able to rely on the row IDs being sequential anymore to be able to just refer to a prefix of them within a newly created file.