
Commit b12c912

Add smaller optimization

1 parent 9c9d779

4 files changed: +5 −11 lines

datafusion/common/src/config.rs (1 addition, 5 deletions)
```diff
@@ -343,11 +343,7 @@ config_namespace! {
         /// When sorting, below what size should data be concatenated
         /// and sorted in a single RecordBatch rather than sorted in
         /// batches and merged.
-        /// Note:
-        /// In theory we should always be able to sort in place, but some corner cases for merging testing
-        /// failed, so we set a large threshold to avoid that.
-        /// Future work: potential remove this option and always sort in place.
-        pub sort_in_place_threshold_bytes: usize, default = 1000 * 1024 * 1024
+        pub sort_in_place_threshold_bytes: usize, default = 1024 * 1024

         /// Number of files to read in parallel when inferring schema and statistics
         pub meta_fetch_concurrency: usize, default = 32
```
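This drops the default from roughly 1 GB to 1 MiB. A quick sanity check of the two byte values, which are exactly the numbers that change in the information_schema and docs diffs below:

```rust
// Sanity check: old vs. new default for
// datafusion.execution.sort_in_place_threshold_bytes, in bytes.
fn main() {
    let old_default: usize = 1000 * 1024 * 1024; // previous default, ~1 GB
    let new_default: usize = 1024 * 1024; // new default, 1 MiB

    // These literals match the values shown in the
    // information_schema.slt and configs.md diffs.
    assert_eq!(old_default, 1_048_576_000);
    assert_eq!(new_default, 1_048_576);

    println!("old = {old_default}, new = {new_default}");
}
```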

datafusion/core/tests/memory_limit/mod.rs (1 addition, 3 deletions)
```diff
@@ -459,8 +459,7 @@ async fn test_stringview_external_sort() {
     let runtime = builder.build_arc().unwrap();

     let config = SessionConfig::new()
-        .with_sort_spill_reservation_bytes(40 * 1024 * 1024)
-        .with_sort_in_place_threshold_bytes(1024 * 1024);
+        .with_sort_spill_reservation_bytes(40 * 1024 * 1024);

     let ctx = SessionContext::new_with_config_rt(config, runtime);
     ctx.register_table("t", Arc::new(table)).unwrap();
@@ -483,7 +482,6 @@
 async fn test_in_mem_buffer_almost_full() {
     let config = SessionConfig::new()
         .with_sort_spill_reservation_bytes(3000000)
-        .with_sort_in_place_threshold_bytes(1024 * 1024)
         .with_target_partitions(1);
     let runtime = RuntimeEnvBuilder::new()
         .with_memory_pool(Arc::new(FairSpillPool::new(10 * 1024 * 1024)))
```
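These tests previously had to opt into the 1 MiB threshold explicitly; now that it is the default, the setter calls are redundant and are removed. Users who prefer the old behavior can still raise the threshold themselves. A minimal sketch, assuming the datafusion crate is available (the setter name is taken from the removed test lines above):

```rust
use datafusion::prelude::*;

// Opt back into a large in-place sort threshold (the pre-change default).
// Sketch only: requires the datafusion crate as a dependency.
let config = SessionConfig::new()
    .with_sort_in_place_threshold_bytes(1000 * 1024 * 1024);
let ctx = SessionContext::new_with_config(config);
```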

datafusion/sqllogictest/test_files/information_schema.slt (2 additions, 2 deletions)
```diff
@@ -260,7 +260,7 @@ datafusion.execution.skip_partial_aggregation_probe_ratio_threshold 0.8
 datafusion.execution.skip_partial_aggregation_probe_rows_threshold 100000
 datafusion.execution.skip_physical_aggregate_schema_check false
 datafusion.execution.soft_max_rows_per_output_file 50000000
-datafusion.execution.sort_in_place_threshold_bytes 1048576000
+datafusion.execution.sort_in_place_threshold_bytes 1048576
 datafusion.execution.sort_spill_reservation_bytes 10485760
 datafusion.execution.split_file_groups_by_statistics false
 datafusion.execution.target_partitions 7
@@ -360,7 +360,7 @@
 datafusion.execution.skip_partial_aggregation_probe_rows_threshold 100000 Number of input rows partial aggregation partition should process, before aggregation ratio check and trying to switch to skipping aggregation mode
 datafusion.execution.skip_physical_aggregate_schema_check false When set to true, skips verifying that the schema produced by planning the input of `LogicalPlan::Aggregate` exactly matches the schema of the input plan. When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised. This is used to workaround bugs in the planner that are now caught by the new schema verification step.
 datafusion.execution.soft_max_rows_per_output_file 50000000 Target number of rows in output files when writing multiple. This is a soft max, so it can be exceeded slightly. There also will be one file smaller than the limit if the total number of rows written is not roughly divisible by the soft max
-datafusion.execution.sort_in_place_threshold_bytes 1048576000 When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged. Note: In theory we should always be able to sort in place, but some corner cases for merging testing failed, so we set a large threshold to avoid that. Future work: potential remove this option and always sort in place.
+datafusion.execution.sort_in_place_threshold_bytes 1048576 When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged.
 datafusion.execution.sort_spill_reservation_bytes 10485760 Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge. When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process. Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there's no `DiskManager` configured).
 datafusion.execution.split_file_groups_by_statistics false Attempt to eliminate sorts by packing & sorting files with non-overlapping statistics into the same file groups. Currently experimental
 datafusion.execution.target_partitions 7 Number of partitions for query execution. Increasing partitions can increase concurrency. Defaults to the number of CPU cores on the system
```

docs/source/user-guide/configs.md (1 addition, 1 deletion)
```diff
@@ -84,7 +84,7 @@
 | datafusion.execution.planning_concurrency | 0 | Fan-out during initial physical planning. This is mostly use to plan `UNION` children in parallel. Defaults to the number of CPU cores on the system |
 | datafusion.execution.skip_physical_aggregate_schema_check | false | When set to true, skips verifying that the schema produced by planning the input of `LogicalPlan::Aggregate` exactly matches the schema of the input plan. When set to false, if the schema does not match exactly (including nullability and metadata), a planning error will be raised. This is used to workaround bugs in the planner that are now caught by the new schema verification step. |
 | datafusion.execution.sort_spill_reservation_bytes | 10485760 | Specifies the reserved memory for each spillable sort operation to facilitate an in-memory merge. When a sort operation spills to disk, the in-memory data must be sorted and merged before being written to a file. This setting reserves a specific amount of memory for that in-memory sort/merge process. Note: This setting is irrelevant if the sort operation cannot spill (i.e., if there's no `DiskManager` configured). |
-| datafusion.execution.sort_in_place_threshold_bytes | 1048576000 | When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged. Note: In theory we should always be able to sort in place, but some corner cases for merging testing failed, so we set a large threshold to avoid that. Future work: potential remove this option and always sort in place. |
+| datafusion.execution.sort_in_place_threshold_bytes | 1048576 | When sorting, below what size should data be concatenated and sorted in a single RecordBatch rather than sorted in batches and merged. |
 | datafusion.execution.meta_fetch_concurrency | 32 | Number of files to read in parallel when inferring schema and statistics |
 | datafusion.execution.minimum_parallel_output_files | 4 | Guarantees a minimum level of output files running in parallel. RecordBatches will be distributed in round robin fashion to each parallel writer. Each writer is closed and a new file opened once soft_max_rows_per_output_file is reached. |
 | datafusion.execution.soft_max_rows_per_output_file | 50000000 | Target number of rows in output files when writing multiple. This is a soft max, so it can be exceeded slightly. There also will be one file smaller than the limit if the total number of rows written is not roughly divisible by the soft max |
```
