Reduce size of Expr struct #14366

Closed
wants to merge 4 commits into from

alamb
Contributor

@alamb alamb commented Jan 29, 2025

Which issue does this PR close?

Rationale for this change

@waynexia's comment on #14256 (comment) got me thinking that the build time regression might have something to do with the size of `Expr`.

So I poked around for ways to reduce the size, and I found that `Expr` is currently 272 bytes.

What changes are included in this PR?

  1. Add a test for the size of `Expr`
  2. Change `Expr::WindowFunction(WindowFunction)` --> `Expr::WindowFunction(Box<WindowFunction>)` -- which drops the size of `Expr` from 272 to 112 bytes
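
As an illustration of why boxing one variant shrinks the whole enum, here is a minimal sketch. The types `BigPayload`, `InlineExpr` and `BoxedExpr` are hypothetical stand-ins, not DataFusion's actual definitions. An enum is at least as large as its largest variant, so moving a large payload behind a `Box` leaves only a pointer inline.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a large payload such as `WindowFunction`.
#[allow(dead_code)]
struct BigPayload {
    fields: [u64; 32], // 256 bytes of inline data
}

/// Enum that stores the large payload inline.
#[allow(dead_code)]
enum InlineExpr {
    Literal(i64),
    Window(BigPayload),
}

/// Same enum, but the large payload lives on the heap behind a pointer.
#[allow(dead_code)]
enum BoxedExpr {
    Literal(i64),
    Window(Box<BigPayload>),
}

fn inline_size() -> usize {
    size_of::<InlineExpr>()
}

fn boxed_size() -> usize {
    size_of::<BoxedExpr>()
}

fn main() {
    println!("inline: {} bytes", inline_size());
    println!("boxed:  {} bytes", boxed_size());
    // A size test in this spirit guards against accidental growth.
    assert!(boxed_size() < inline_size());
}
```

The trade-off is one extra heap allocation and pointer indirection whenever the boxed variant is constructed or accessed, which is why the impact on planning benchmarks is worth measuring.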

Are these changes tested?

functionally by CI

TODO:

  1. test impact on build time
  2. test impact on planning performance benchmarks

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 29, 2025
Comment on lines 2437 to 2439
return Some(idx + input_len);
} else {
None
Member

Inconsistent: the first branch uses an explicit `return`, while the `else` branch yields a value expression.

@alamb alamb force-pushed the alamb/make_expr_smaller branch from b25ba57 to f5df767 Compare January 29, 2025 21:42
@github-actions github-actions bot added sql SQL Planner optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate proto Related to proto crate labels Jan 29, 2025
@alamb alamb mentioned this pull request Jan 29, 2025
@alamb
Contributor Author

alamb commented Feb 12, 2025

I ran some build benchmarks on a GCP machine, and I conclude that this change does not improve the build timings:

Building main....
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m28.125s
user    36m43.568s
sys     1m19.993s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m32.743s
user    36m47.328s
sys     1m19.786s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m33.336s
user    36m50.926s
sys     1m19.246s
+ git reset --hard
HEAD is now at cb5d42e135 Disable extended tests (#14604)
+ git checkout alamb/make_expr_smaller
Switched to branch 'alamb/make_expr_smaller'
Your branch is up to date with 'alamb/alamb/make_expr_smaller'.
+ echo 'Building make_expr_smaller....'
Building make_expr_smaller....
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m30.234s
user    37m15.027s
sys     1m19.688s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m30.580s
user    37m6.178s
sys     1m19.885s
+ rm -rf target
+ cargo build --release --timings --lib --quiet

real    4m32.293s
user    36m54.063s
sys     1m19.557s
alamb@aal-dev:~/arrow-datafusion$
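
To make the comparison above concrete, the following sketch averages the three `real` timings for each branch. The helper names `parse_real` and `mean_secs` are made up for this illustration; the timing strings are copied from the log. The means come out at roughly 271.4 s for main and 271.0 s for the branch, a gap smaller than the spread between runs on main alone (268.1 s to 273.3 s).

```rust
/// Convert a `real` time of the form "4m28.125s" into seconds.
fn parse_real(s: &str) -> Option<f64> {
    let (min, rest) = s.split_once('m')?;
    let secs = rest.strip_suffix('s')?;
    Some(min.parse::<f64>().ok()? * 60.0 + secs.parse::<f64>().ok()?)
}

/// Mean wall-clock time in seconds, or `None` if any entry is malformed.
fn mean_secs(runs: &[&str]) -> Option<f64> {
    if runs.is_empty() {
        return None;
    }
    let mut total = 0.0;
    for r in runs {
        total += parse_real(r)?;
    }
    Some(total / runs.len() as f64)
}

fn main() {
    // `real` values copied from the build log above.
    let main_runs = ["4m28.125s", "4m32.743s", "4m33.336s"];
    let branch_runs = ["4m30.234s", "4m30.580s", "4m32.293s"];

    let main_mean = mean_secs(&main_runs).expect("valid timings");
    let branch_mean = mean_secs(&branch_runs).expect("valid timings");

    println!("main:   {main_mean:.1}s");
    println!("branch: {branch_mean:.1}s");
    println!(
        "difference: {:.2}%",
        (main_mean - branch_mean) / main_mean * 100.0
    );
}
```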

@crepererum
Contributor

TBH I would be surprised if the build performance is affected by a few memcpy calls. I think the question here is rather if our runtime performance changes, potentially due to less stress on the memory allocator.

@alamb
Contributor Author

alamb commented Feb 19, 2025

I'll try to find time to run some SQL planning benchmarks

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Apr 22, 2025
@github-actions github-actions bot closed this May 1, 2025
@alamb
Contributor Author

alamb commented May 16, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/make_expr_smaller (279f074) to cb5d42e diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_make_expr_smaller
Results will be posted here when complete

@alamb
Contributor Author

alamb commented May 16, 2025

🤖: Benchmark completed

Details

group                                         alamb_make_expr_smaller                main
-----                                         -----------------------                ----
logical_aggregate_with_join                   1.00  1486.0±16.67µs        ? ?/sec    1.02  1519.2±14.28µs        ? ?/sec
logical_select_all_from_1000                  1.00      4.5±0.02ms        ? ?/sec    1.01      4.6±0.03ms        ? ?/sec
logical_select_one_from_700                   1.00  1182.9±12.15µs        ? ?/sec    1.03  1213.7±19.22µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1148.5±16.11µs        ? ?/sec    1.02  1171.8±12.08µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00  1140.3±19.70µs        ? ?/sec    1.02  1164.6±17.11µs        ? ?/sec
physical_intersection                         1.00      2.4±0.02ms        ? ?/sec    1.02      2.5±0.02ms        ? ?/sec
physical_join_consider_sort                   1.00      3.2±0.02ms        ? ?/sec    1.02      3.3±0.02ms        ? ?/sec
physical_join_distinct                        1.00  1148.5±15.20µs        ? ?/sec    1.00  1150.0±10.96µs        ? ?/sec
physical_many_self_joins                      1.01     17.5±0.12ms        ? ?/sec    1.00     17.3±0.18ms        ? ?/sec
physical_plan_clickbench_all                  1.00    285.4±4.98ms        ? ?/sec    1.14    324.2±6.52ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      4.6±0.12ms        ? ?/sec    1.10      5.0±0.13ms        ? ?/sec
physical_plan_clickbench_q10                  1.00      5.7±0.14ms        ? ?/sec    1.16      6.5±0.42ms        ? ?/sec
physical_plan_clickbench_q11                  1.00      5.8±0.22ms        ? ?/sec    1.13      6.6±0.21ms        ? ?/sec
physical_plan_clickbench_q12                  1.00      6.0±0.17ms        ? ?/sec    1.12      6.7±0.24ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      5.5±0.11ms        ? ?/sec    1.12      6.1±0.17ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      5.8±0.17ms        ? ?/sec    1.12      6.5±0.15ms        ? ?/sec
physical_plan_clickbench_q15                  1.00      5.6±0.14ms        ? ?/sec    1.13      6.3±0.18ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      5.1±0.18ms        ? ?/sec    1.09      5.5±0.19ms        ? ?/sec
physical_plan_clickbench_q17                  1.00      5.0±0.13ms        ? ?/sec    1.14      5.8±0.15ms        ? ?/sec
physical_plan_clickbench_q18                  1.00      4.9±0.13ms        ? ?/sec    1.12      5.4±0.15ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      5.7±0.16ms        ? ?/sec    1.12      6.4±0.14ms        ? ?/sec
physical_plan_clickbench_q2                   1.00      4.9±0.13ms        ? ?/sec    1.10      5.4±0.12ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      4.5±0.10ms        ? ?/sec    1.11      5.0±0.14ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      4.9±0.15ms        ? ?/sec    1.13      5.5±0.20ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      5.7±0.17ms        ? ?/sec    1.15      6.6±0.17ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      6.2±0.16ms        ? ?/sec    1.15      7.1±0.17ms        ? ?/sec
physical_plan_clickbench_q24                  1.00      6.9±0.18ms        ? ?/sec    1.12      7.8±0.18ms        ? ?/sec
physical_plan_clickbench_q25                  1.00      5.1±0.13ms        ? ?/sec    1.14      5.8±0.12ms        ? ?/sec
physical_plan_clickbench_q26                  1.00      4.8±0.13ms        ? ?/sec    1.12      5.4±0.13ms        ? ?/sec
physical_plan_clickbench_q27                  1.00      5.2±0.16ms        ? ?/sec    1.11      5.8±0.19ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      5.9±0.18ms        ? ?/sec    1.14      6.7±0.20ms        ? ?/sec
physical_plan_clickbench_q29                  1.00      6.9±0.15ms        ? ?/sec    1.17      8.1±0.27ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      4.9±0.11ms        ? ?/sec    1.10      5.3±0.12ms        ? ?/sec
physical_plan_clickbench_q30                  1.00     19.5±0.32ms        ? ?/sec    1.15     22.3±0.25ms        ? ?/sec
physical_plan_clickbench_q31                  1.00      6.1±0.17ms        ? ?/sec    1.15      7.0±0.20ms        ? ?/sec
physical_plan_clickbench_q32                  1.00      6.0±0.14ms        ? ?/sec    1.15      7.0±0.18ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      5.6±0.17ms        ? ?/sec    1.13      6.3±0.17ms        ? ?/sec
physical_plan_clickbench_q34                  1.00      5.1±0.15ms        ? ?/sec    1.15      5.9±0.16ms        ? ?/sec
physical_plan_clickbench_q35                  1.00      5.2±0.15ms        ? ?/sec    1.14      6.0±0.16ms        ? ?/sec
physical_plan_clickbench_q36                  1.00      6.4±0.15ms        ? ?/sec    1.15      7.3±0.16ms        ? ?/sec
physical_plan_clickbench_q37                  1.00      6.4±0.16ms        ? ?/sec    1.17      7.5±0.15ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      6.3±0.15ms        ? ?/sec    1.16      7.3±0.16ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      6.0±0.17ms        ? ?/sec    1.15      6.8±0.14ms        ? ?/sec
physical_plan_clickbench_q4                   1.00      4.6±0.12ms        ? ?/sec    1.09      5.0±0.13ms        ? ?/sec
physical_plan_clickbench_q40                  1.00      6.6±0.22ms        ? ?/sec    1.16      7.6±0.19ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      6.3±0.18ms        ? ?/sec    1.16      7.3±0.15ms        ? ?/sec
physical_plan_clickbench_q42                  1.00      6.2±0.17ms        ? ?/sec    1.14      7.0±0.21ms        ? ?/sec
physical_plan_clickbench_q43                  1.00      6.2±0.19ms        ? ?/sec    1.18      7.3±0.22ms        ? ?/sec
physical_plan_clickbench_q44                  1.00      4.7±0.13ms        ? ?/sec    1.13      5.3±0.17ms        ? ?/sec
physical_plan_clickbench_q45                  1.00      4.7±0.14ms        ? ?/sec    1.10      5.2±0.11ms        ? ?/sec
physical_plan_clickbench_q46                  1.00      5.4±0.22ms        ? ?/sec    1.10      5.9±0.16ms        ? ?/sec
physical_plan_clickbench_q47                  1.00      6.1±0.18ms        ? ?/sec    1.09      6.6±0.19ms        ? ?/sec
physical_plan_clickbench_q48                  1.00      6.6±0.19ms        ? ?/sec    1.12      7.4±0.20ms        ? ?/sec
physical_plan_clickbench_q49                  1.00      6.8±0.17ms        ? ?/sec    1.14      7.8±0.23ms        ? ?/sec
physical_plan_clickbench_q5                   1.00      4.8±0.12ms        ? ?/sec    1.11      5.4±0.12ms        ? ?/sec
physical_plan_clickbench_q6                   1.00      4.9±0.14ms        ? ?/sec    1.08      5.3±0.14ms        ? ?/sec
physical_plan_clickbench_q7                   1.00      5.3±0.11ms        ? ?/sec    1.13      6.0±0.19ms        ? ?/sec
physical_plan_clickbench_q8                   1.00      5.1±0.14ms        ? ?/sec    1.11      5.7±0.18ms        ? ?/sec
physical_plan_clickbench_q9                   1.00      5.4±0.14ms        ? ?/sec    1.13      6.1±0.23ms        ? ?/sec
physical_plan_tpcds_all                       1.00  1333.8±11.64ms        ? ?/sec    1.10  1466.0±19.74ms        ? ?/sec
physical_plan_tpch_all                        1.00     85.9±0.70ms        ? ?/sec    1.11     95.5±0.88ms        ? ?/sec
physical_plan_tpch_q1                         1.00      3.1±0.03ms        ? ?/sec    1.10      3.4±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.00      4.4±0.05ms        ? ?/sec    1.06      4.7±0.04ms        ? ?/sec
physical_plan_tpch_q11                        1.00      3.9±0.06ms        ? ?/sec    1.09      4.2±0.05ms        ? ?/sec
physical_plan_tpch_q12                        1.00      2.9±0.03ms        ? ?/sec    1.11      3.3±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.4±0.03ms        ? ?/sec    1.08      2.6±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.7±0.03ms        ? ?/sec    1.09      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.6±0.06ms        ? ?/sec    1.11      4.0±0.03ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.5±0.03ms        ? ?/sec    1.10      3.9±0.04ms        ? ?/sec
physical_plan_tpch_q18                        1.00      4.0±0.03ms        ? ?/sec    1.09      4.3±0.04ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.2±0.04ms        ? ?/sec    1.17      6.0±0.04ms        ? ?/sec
physical_plan_tpch_q2                         1.00      7.3±0.10ms        ? ?/sec    1.08      7.9±0.08ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.5±0.05ms        ? ?/sec    1.10      5.0±0.04ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.7±0.05ms        ? ?/sec    1.10      6.3±0.05ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.4±0.02ms        ? ?/sec    1.12      3.8±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.2±0.03ms        ? ?/sec    1.08      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.5±0.02ms        ? ?/sec    1.08      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.4±0.05ms        ? ?/sec    1.06      4.6±0.03ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1799.7±24.60µs        ? ?/sec    1.07  1931.1±14.84µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.5±0.05ms        ? ?/sec    1.09      6.0±0.07ms        ? ?/sec
physical_plan_tpch_q8                         1.00      6.6±0.06ms        ? ?/sec    1.09      7.2±0.09ms        ? ?/sec
physical_plan_tpch_q9                         1.00      5.2±0.05ms        ? ?/sec    1.07      5.6±0.04ms        ? ?/sec
physical_select_aggregates_from_200           1.00     30.7±0.19ms        ? ?/sec    1.08     33.1±0.21ms        ? ?/sec
physical_select_all_from_1000                 1.00     41.5±0.17ms        ? ?/sec    1.02     42.5±0.20ms        ? ?/sec
physical_select_one_from_700                  1.00      3.3±0.02ms        ? ?/sec    1.01      3.4±0.02ms        ? ?/sec
physical_sorted_union_orderby                 1.00    113.2±0.52ms        ? ?/sec    1.05    119.2±0.60ms        ? ?/sec
physical_theta_join_consider_sort             1.00      3.7±0.06ms        ? ?/sec    1.01      3.7±0.04ms        ? ?/sec
physical_unnest_to_join                       1.00      3.3±0.03ms        ? ?/sec    1.01      3.4±0.02ms        ? ?/sec
with_param_values_many_columns                1.00    123.3±0.94µs        ? ?/sec    1.28    157.8±1.29µs        ? ?/sec
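
For reading the table, the leading number in each column appears to be the time relative to the fastest entry in that row, so 1.00 marks the faster branch. This is an assumption based on the rows themselves, not something the bot output states. A small sketch that reproduces the ratios for three rows, using a hypothetical helper `relative_to_fastest`:

```rust
/// Ratio of each measurement to the fastest one in the same row.
fn relative_to_fastest(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}

fn main() {
    // (benchmark, branch time, main time), copied from the table above.
    // Units are the same within each row (ms, ms, µs respectively).
    let rows = [
        ("physical_plan_clickbench_all", 285.4, 324.2),
        ("physical_many_self_joins", 17.5, 17.3),
        ("with_param_values_many_columns", 123.3, 157.8),
    ];
    for (name, branch, main_time) in rows {
        let r = relative_to_fastest(&[branch, main_time]);
        println!("{name:<32} {:.2} {:.2}", r[0], r[1]);
    }
}
```

Read this way, main is about 14% slower on `physical_plan_clickbench_all` and about 28% slower on `with_param_values_many_columns`, while `physical_many_self_joins` is the one row where the branch is marginally slower.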
