Closed
Description
This issue tracks a list of performance improvements:
- Use Arrow Row Format in SortExec to improve performance #5230
- Improve the performance of `Aggregator`, grouping, aggregation #4973
- Improve Sorting / Merge performance #2427
- Possible performance regressions #5061
- Make aggregate accumulators storage column-based #956
- Optimize hash_aggregate when there are no null group keys #850
- Improve grouping performance by special casing small / fixed size keys #846
- Improve performance of IN list function #145
- Improve like/nlike performance #88
- Improve the performance of COUNT DISTINCT queries for high cardinality groups #5547
- Add projection to `FilterExec` to avoid unnecessary output creation #5436
- Poor reported performance of DataFusion against DuckDB and Hyper #5942
- datafusion-cli scanning a single large parquet file uses only a single core #5995
- Fuse grouped aggregate and filter operators for improved performance #5944
- Row accumulator support update Scalar values #6002
- TPCH, Query 18 and 17 very slow #5646
- Add projection to `HashJoinExec` to avoid unnecessary output creation #6768
- Benchmarks for group by spilling to disk #7571
- Bad Join Order for TPCH Q18 results in slow performance #7950
- Bad Join Order for TPCH Q17 results in slow performance #7949
- Reported DataFusion performance problem #9148
- Improve performance of COUNT (distinct x) for dictionary columns #258
- Improved performance of RANGE preceding window functions #4904
- Slowdown in ClickBench Q36-Q37 between DataFusion 43.0.0 and 44.0.0 #14481
- Avoid extra copies in `CoalesceBatchesExec` to improve performance #7957
- Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) #7955
- Run DataFusion benchmarks regularly and track performance history over time #5504
- Materialize Dictionaries in Group Keys #7647
- [EPIC] (Even More) Grouping / Group By / Aggregation Performance #7000
- Speed up hash partitioning #6822