Description
Is your feature request related to a problem or challenge?
As far as I can tell, there is no good way to load a subset of files from a partitioned table. Using ListingTable or another TableProvider such as DeltaTableProvider from deltalake, I can call read_table, but this loads the entire table. I can also load a list of Parquet files with read_parquet, but this doesn't work for partitioned tables when the partition columns are not "materialized" in the raw Parquet files. The only way I've found to load a subset of partitioned files is to iterate over a list of file paths, run the entire TableProvider/read_table process on each one individually, and union the results together.
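To illustrate why a plain list of Parquet files is not enough: in a Hive-style layout (an assumption here; the exact layout depends on the table format), the partition values exist only in directory names such as year=2023/month=01, so they have to be recovered from each file's path. A minimal standard-library sketch, where partition_values is a hypothetical helper and not a DataFusion API:

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of DataFusion): extracts Hive-style
/// `key=value` partition segments from a file path under `table_root`.
/// Returns `None` when the file does not live under the table root.
fn partition_values(table_root: &str, file_path: &str) -> Option<BTreeMap<String, String>> {
    let root = table_root.trim_end_matches('/');
    // Path of the file relative to the table root, without a leading slash.
    let relative = file_path.strip_prefix(root)?.strip_prefix('/')?;

    let mut segments: Vec<&str> = relative.split('/').collect();
    // The last segment is the file name, not a partition directory.
    segments.pop();

    let mut values = BTreeMap::new();
    for segment in segments {
        // Directories that are not of the form `key=value` are ignored.
        if let Some((key, value)) = segment.split_once('=') {
            values.insert(key.to_string(), value.to_string());
        }
    }
    Some(values)
}
```

Reading the same file through read_parquet alone would drop these columns, because they are never stored inside the file itself.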
Describe the solution you'd like
It would be nice to be able to create a TableProvider with a table path, then pass in some sort of file "whitelist". Maybe a read_table_files(TableProvider, impl IntoIterator<Item = String>).
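The intended semantics of the "whitelist" could be sketched roughly as follows, using only the standard library. Here select_files is a hypothetical name, and the assumption is that the provider already has a listing of the table's files and would scan only the selected ones in a single pass:

```rust
use std::collections::HashSet;

/// Hypothetical helper: keeps only the files from the table listing that
/// appear in the caller's allow-list, preserving the listing order.
fn select_files<I>(listed: &[String], allowed: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    // Collect the allow-list into a set for constant-time membership checks.
    let allowed: HashSet<String> = allowed.into_iter().collect();
    listed
        .iter()
        .filter(|path| allowed.contains(*path))
        .cloned()
        .collect()
}
```

Allow-list entries that are not part of the table listing are silently ignored in this sketch; a real implementation might prefer to return an error for them.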
Describe alternatives you've considered
As stated above, I've tried reading the files one by one and unioning the results, but it's shockingly inefficient compared to reading all files at once.
Additional context
No response