jonstewart 5 days ago

I think maybe this is a peeve with Hive rather than with Parquet? Yes, it does require opening the Parquet file to read the min/max range for the column, but only that metadata is fetched, and if the value you’re filtering on isn’t in range there shouldn’t be any further requests.

That is the kind of metadata that is useful to push up, into something like DuckLake.
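
To make that concrete, here is a minimal sketch of what pruning looks like once the min/max ranges live in a catalog instead of in each file's footer. It is plain Python, not DuckLake's actual schema or API; `FileStats`, `files_to_scan`, and the file names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical catalog row: one file plus the min/max of one column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(catalog, lo, hi):
    """Return paths whose [min_value, max_value] overlaps the query range [lo, hi]."""
    return [f.path for f in catalog if f.max_value >= lo and f.min_value <= hi]


# Illustrative catalog; in practice these values would be copied
# from each Parquet file's column statistics when the file is written.
catalog = [
    FileStats("a.parquet", 0, 99),
    FileStats("b.parquet", 100, 199),
    FileStats("c.parquet", 200, 299),
]

# A filter like `WHERE col BETWEEN 150 AND 220` never has to open a.parquet.
selected = files_to_scan(catalog, 150, 220)
print(selected)
```

The point is only that the overlap check runs against the catalog, so a file that is out of range costs zero requests instead of one footer read.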

amluto 5 days ago

I guess my peeve could be restated as: Hive’s naming scheme doesn’t handle this, and Parquet per se can’t handle it because it’s out of scope, but all the awesome tools (e.g. DuckDB), when used on Parquet files without something with “lake” or “ice” in the name, use the Hive scheme.

Someone could buck the trend and extend Hive’s scheme to support ranges.
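
As a rough sketch of what such an extension might look like: suppose a path segment could be written `key=lo..hi` instead of `key=value`. That `..` syntax is purely hypothetical (no tool I know of reads it), and `parse_range_segments` and `may_contain` are names invented here, but the pruning logic would be about this small:

```python
import re

# Hypothetical range segment: key=lo..hi with integer bounds, inclusive.
RANGE_SEGMENT = re.compile(r"^(?P<key>\w+)=(?P<lo>\d+)\.\.(?P<hi>\d+)$")


def parse_range_segments(path):
    """Collect {key: (lo, hi)} from every range-style segment in a path."""
    ranges = {}
    for segment in path.split("/"):
        m = RANGE_SEGMENT.match(segment)
        if m:
            ranges[m["key"]] = (int(m["lo"]), int(m["hi"]))
    return ranges


def may_contain(path, key, lo, hi):
    """False only when the path's range for `key` cannot overlap [lo, hi]."""
    ranges = parse_range_segments(path)
    if key not in ranges:
        return True  # no range information in the path, so we cannot prune
    file_lo, file_hi = ranges[key]
    return file_hi >= lo and file_lo <= hi


paths = [
    "events/id=0..99/part-0.parquet",
    "events/id=100..199/part-0.parquet",
    "events/year=2024/part-0.parquet",  # ordinary Hive segment, no range
]

# Query: id BETWEEN 150 AND 160. Only the first path can be skipped.
kept = [p for p in paths if may_contain(p, "id", 150, 160)]
print(kept)
```

Paths without a range segment are kept rather than pruned, so ordinary Hive-style `key=value` directories would keep working unchanged.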