
Commit bc5d7f7

Enable reading string view by default from Parquet
1 parent ed2b222 commit bc5d7f7
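For context on what "string view" means here: the Arrow byte view layout used by `Utf8View` and `BinaryView` stores every value as a fixed 16-byte view. Per the Arrow columnar format, a view holds the length and then either up to 12 bytes of data inline, or a 4-byte prefix plus a buffer index and offset that point at the full value. The std-only sketch below models that layout; `make_view` and `read_view` are hypothetical helper names, not arrow-rs or DataFusion APIs.

```rust
/// Builds a 16-byte view following the Arrow byte view layout:
/// - bytes 0..4: value length as little-endian i32
/// - length <= 12: bytes 4..16 hold the value inline, zero-padded
/// - length > 12: bytes 4..8 hold the first 4 bytes (prefix),
///   bytes 8..12 the data buffer index, bytes 12..16 the offset into that buffer
fn make_view(data: &[u8], buffer_index: i32, offset: i32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(data.len() as i32).to_le_bytes());
    if data.len() <= 12 {
        view[4..4 + data.len()].copy_from_slice(data);
    } else {
        view[4..8].copy_from_slice(&data[0..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Resolves a view back to its bytes, given the out-of-line data buffers.
fn read_view<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = i32::from_le_bytes(view[0..4].try_into().unwrap()) as usize;
    if len <= 12 {
        &view[4..4 + len]
    } else {
        let idx = i32::from_le_bytes(view[8..12].try_into().unwrap()) as usize;
        let off = i32::from_le_bytes(view[12..16].try_into().unwrap()) as usize;
        &buffers[idx][off..off + len]
    }
}

fn main() {
    let long: &[u8] = b"a string longer than twelve bytes";
    let buffers = vec![long.to_vec()];
    let short_view = make_view(b"hello", 0, 0);
    let long_view = make_view(long, 0, 0);
    assert_eq!(read_view(&short_view, &buffers), b"hello");
    assert_eq!(read_view(&long_view, &buffers), long);
    // The inline prefix lets many comparisons finish without reading the data buffer.
    assert_eq!(&long_view[4..8], b"a st");
}
```

Short values never touch a separate data buffer, and long values keep a prefix next to the length, which is one reason engines can add specialized kernels for these types.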


2 files changed (+9, -5 lines)

datafusion/common/src/config.rs

Lines changed: 7 additions & 3 deletions
@@ -487,9 +487,13 @@ config_namespace! {
 /// data frame.
 pub maximum_buffered_record_batches_per_stream: usize, default = 2
 
-/// (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`,
-/// and `Binary/BinaryLarge` with `BinaryView`.
-pub schema_force_string_view: bool, default = false
+/// (reading) If true (the default), parquet reader will read text and
+/// binary columns using Arrow byte view types. DataFusion has
+/// specialized processing using the Arrow `Utf8View` type for columns
+/// that could also be read as `Utf8/Utf8Large` and using the Arrow
+/// `BinaryView` type for columns that could also be read as
+/// `Binary/BinaryLarge`.
+pub schema_force_string_view: bool, default = true
 }
 }
 
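The new doc comment describes a type substitution applied to the file schema when the option is enabled. Below is a minimal std-only sketch of that documented behavior, assuming a simplified, hypothetical `ArrowType` enum that mirrors the relevant Arrow `DataType` variant names (arrow-rs spells the large variants `LargeUtf8` and `LargeBinary`). It is not DataFusion's actual schema-coercion code.

```rust
/// Hypothetical stand-in for the few Arrow `DataType` variants
/// that `schema_force_string_view` affects.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowType {
    Int64,
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
    LargeBinary,
    BinaryView,
}

/// Models the documented rule: when `force_string_view` is true, text columns
/// become `Utf8View` and binary columns become `BinaryView`; other types are unchanged.
fn apply_string_view(t: ArrowType, force_string_view: bool) -> ArrowType {
    if !force_string_view {
        return t;
    }
    match t {
        ArrowType::Utf8 | ArrowType::LargeUtf8 => ArrowType::Utf8View,
        ArrowType::Binary | ArrowType::LargeBinary => ArrowType::BinaryView,
        other => other,
    }
}

fn main() {
    let file_schema = [ArrowType::Int64, ArrowType::LargeUtf8, ArrowType::Binary];

    // New default (true): byte-array columns are read as view types.
    let read: Vec<ArrowType> = file_schema.iter().map(|&t| apply_string_view(t, true)).collect();
    assert_eq!(read, vec![ArrowType::Int64, ArrowType::Utf8View, ArrowType::BinaryView]);

    // Opting out (false) keeps the types inferred from the file.
    let read_old: Vec<ArrowType> = file_schema.iter().map(|&t| apply_string_view(t, false)).collect();
    assert_eq!(read_old, file_schema.to_vec());
}
```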
datafusion/sqllogictest/test_files/information_schema.slt

Lines changed: 2 additions & 2 deletions
@@ -201,7 +201,7 @@ datafusion.execution.parquet.metadata_size_hint NULL
 datafusion.execution.parquet.pruning true
 datafusion.execution.parquet.pushdown_filters false
 datafusion.execution.parquet.reorder_filters false
-datafusion.execution.parquet.schema_force_string_view false
+datafusion.execution.parquet.schema_force_string_view true
 datafusion.execution.parquet.skip_metadata true
 datafusion.execution.parquet.statistics_enabled page
 datafusion.execution.parquet.write_batch_size 1024
@@ -291,7 +291,7 @@ datafusion.execution.parquet.metadata_size_hint NULL (reading) If specified, the
 datafusion.execution.parquet.pruning true (reading) If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file
 datafusion.execution.parquet.pushdown_filters false (reading) If true, filter expressions are be applied during the parquet decoding operation to reduce the number of rows decoded. This optimization is sometimes called "late materialization".
 datafusion.execution.parquet.reorder_filters false (reading) If true, filter expressions evaluated during the parquet decoding operation will be reordered heuristically to minimize the cost of evaluation. If false, the filters are applied in the same order as written in the query
-datafusion.execution.parquet.schema_force_string_view false (reading) If true, parquet reader will read columns of `Utf8/Utf8Large` with `Utf8View`, and `Binary/BinaryLarge` with `BinaryView`.
+datafusion.execution.parquet.schema_force_string_view true (reading) If true (the default), parquet reader will read text and binary columns using Arrow byte view types. DataFusion has specialized processing using the Arrow `Utf8View` type for columns that could also be read as `Utf8/Utf8Large` and using the Arrow `BinaryView` type for columns that could also be read as `Binary/BinaryLarge`.
 datafusion.execution.parquet.skip_metadata true (reading) If true, the parquet reader skip the optional embedded metadata that may be in the file Schema. This setting can help avoid schema conflicts when querying multiple parquet files with schemas containing compatible types but different metadata
 datafusion.execution.parquet.statistics_enabled page (writing) Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting
 datafusion.execution.parquet.write_batch_size 1024 (writing) Sets write_batch_size in bytes
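The `information_schema.slt` expectations change because the option's default value changes, not its meaning. The sketch below is a hypothetical model (`ParquetOptionsModel`, `set`, and `show` are illustrative names, not DataFusion's `ConfigOptions` API) of a boolean setting whose default is now `true` and which can still be overridden by key. In DataFusion SQL, opting out is typically done with `SET datafusion.execution.parquet.schema_force_string_view = false`.

```rust
/// Hypothetical model of one boolean config entry; not DataFusion's real config type.
struct ParquetOptionsModel {
    schema_force_string_view: bool,
}

impl Default for ParquetOptionsModel {
    fn default() -> Self {
        // This commit flips the default from `false` to `true`.
        Self { schema_force_string_view: true }
    }
}

const KEY: &str = "datafusion.execution.parquet.schema_force_string_view";

impl ParquetOptionsModel {
    /// Applies a key/value override, parsing the value with `str::parse::<bool>`.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        if key != KEY {
            return Err(format!("unknown option: {}", key));
        }
        self.schema_force_string_view = value
            .parse::<bool>()
            .map_err(|e| format!("invalid boolean {:?}: {}", value, e))?;
        Ok(())
    }

    /// Renders the setting as a `name value` row, like the rows in information_schema.slt.
    fn show(&self) -> String {
        format!("{} {}", KEY, self.schema_force_string_view)
    }
}

fn main() {
    let mut opts = ParquetOptionsModel::default();
    assert_eq!(opts.show(), "datafusion.execution.parquet.schema_force_string_view true");
    opts.set(KEY, "false").unwrap();
    assert_eq!(opts.show(), "datafusion.execution.parquet.schema_force_string_view false");
    assert!(opts.set(KEY, "maybe").is_err());
}
```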
