amluto 5 days ago

But I want a file per range! I’m already writing out an entire chunk of rows, and that chunk is a good size for a Parquet file, and that chunk doesn’t overlap the previous chunk.

Sure, metadata in the Parquet file handles this, but a query planner has to read that metadata, whereas a sensible way to stick the metadata in the file path would allow avoiding reading the file at all.

1
mrlongroots 5 days ago

I have the same gripe. You want a canonical standard that's like "hive partitioning" but defines the range [val1, val2) as column=val1_val2. It's a trivial addition on top of Parquet.

amluto 5 days ago

That would do the trick, as would any other spelling of the same thing.