Improve `compare.py` output to use `min` times and better column titles #6134

alamb · 2023-04-26T21:37:43Z

Which issue does this PR close?

Part of #6127

Rationale for this change

I want to be able to easily understand what is being compared and reported. Currently the reporting from compare.py is somewhat opaque:

It averages benchmark run timings (which increases the variance)
The header columns are non sensical

What changes are included in this PR?

Use min of the iteration times to report the benchmark results -- I think this minimizes run to run variance and is the fairest way to asses performance
Use the filename as the column header

Are these changes tested?

No, it is a tool only change

Are there any user-facing changes?

Not really

Example output Before this PR

(note the bad headers)

python3 compare.py results/alamb_bench/tpch.json results/alamb_bench/tpch_mem.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /Users/alam… ┃           -o ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │     794.64ms │     559.37ms │ +1.42x faster │
│ QQuery 2     │     239.12ms │     171.29ms │ +1.40x faster │
│ QQuery 3     │     254.99ms │      99.33ms │ +2.57x faster │
│ QQuery 4     │     147.85ms │      66.65ms │ +2.22x faster │
│ QQuery 5     │     322.94ms │     258.98ms │ +1.25x faster │
│ QQuery 6     │     130.89ms │      24.24ms │ +5.40x faster │
│ QQuery 7     │     629.42ms │     645.68ms │     no change │
│ QQuery 8     │     375.57ms │     140.26ms │ +2.68x faster │
│ QQuery 9     │     597.65ms │     334.59ms │ +1.79x faster │
│ QQuery 10    │     397.16ms │     197.59ms │ +2.01x faster │
│ QQuery 11    │     182.41ms │     167.33ms │ +1.09x faster │
│ QQuery 12    │     228.44ms │      99.10ms │ +2.31x faster │
│ QQuery 13    │     736.21ms │     438.47ms │ +1.68x faster │
│ QQuery 14    │     195.68ms │      31.99ms │ +6.12x faster │
│ QQuery 15    │     175.08ms │      57.62ms │ +3.04x faster │
│ QQuery 16    │     175.67ms │     144.90ms │ +1.21x faster │
│ QQuery 17    │    1953.59ms │    1925.39ms │     no change │
│ QQuery 18    │    2133.06ms │    1968.85ms │ +1.08x faster │
│ QQuery 19    │     371.82ms │     107.44ms │ +3.46x faster │
│ QQuery 20    │     680.02ms │     567.75ms │ +1.20x faster │
│ QQuery 21    │     987.36ms │     857.87ms │ +1.15x faster │
│ QQuery 22    │     105.31ms │      77.25ms │ +1.36x faster │
└──────────────┴──────────────┴──────────────┴───────────────┘

Output after this PR

python3 compare.py results/alamb_bench/tpch.json results/alamb_bench/tpch_mem.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      tpch ┃  tpch_mem ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  762.35ms │  548.16ms │ +1.39x faster │
│ QQuery 2     │  210.23ms │  164.26ms │ +1.28x faster │
│ QQuery 3     │  239.37ms │   96.47ms │ +2.48x faster │
│ QQuery 4     │  147.01ms │   59.81ms │ +2.46x faster │
│ QQuery 5     │  317.68ms │  251.01ms │ +1.27x faster │
│ QQuery 6     │  129.40ms │   22.86ms │ +5.66x faster │
│ QQuery 7     │  603.09ms │  629.24ms │     no change │
│ QQuery 8     │  359.94ms │  138.52ms │ +2.60x faster │
│ QQuery 9     │  578.71ms │  329.58ms │ +1.76x faster │
│ QQuery 10    │  378.17ms │  188.56ms │ +2.01x faster │
│ QQuery 11    │  180.20ms │  159.49ms │ +1.13x faster │
│ QQuery 12    │  219.37ms │   94.54ms │ +2.32x faster │
│ QQuery 13    │  732.80ms │  425.82ms │ +1.72x faster │
│ QQuery 14    │  190.55ms │   30.99ms │ +6.15x faster │
│ QQuery 15    │  173.51ms │   54.58ms │ +3.18x faster │
│ QQuery 16    │  163.33ms │  130.02ms │ +1.26x faster │
│ QQuery 17    │ 1866.25ms │ 1855.35ms │     no change │
│ QQuery 18    │ 2063.31ms │ 1916.26ms │ +1.08x faster │
│ QQuery 19    │  363.00ms │  102.86ms │ +3.53x faster │
│ QQuery 20    │  639.92ms │  530.65ms │ +1.21x faster │
│ QQuery 21    │  946.66ms │  815.83ms │ +1.16x faster │
│ QQuery 22    │  103.37ms │   73.07ms │ +1.41x faster │
└──────────────┴───────────┴───────────┴───────────────┘

comphead · 2023-04-28T15:48:13Z

benchmarks/compare.py

-        return method(iteration.elapsed for iteration in self.iterations)
+        # Use minimum execution time to account for variations / other
+        # things the system was doing
+        return min(iteration.elapsed for iteration in self.iterations)


lgtm, but imho, can min be deceptive comparing to avg/mean/median? 🤔

I agree it is misleading -- in terms of measuring a change between datafusion versions, I think min will give us the least variance between runs and represents best case performance.

However, it doesn't really give a sense for how much variation is across runs 🤔

alamb added 2 commits April 26, 2023 17:33

Minor: improve output of compare.py

167f2aa

cleanup

882aab7

alamb added the development-process Related to development process of DataFusion label Apr 26, 2023

github-actions bot removed the development-process Related to development process of DataFusion label Apr 26, 2023

alamb changed the title ~~Alamb/better compare~~ Improve compare.py output to use min times and better column titles Apr 26, 2023

alamb added the development-process Related to development process of DataFusion label Apr 26, 2023

alamb closed this Apr 27, 2023

alamb reopened this Apr 27, 2023

comphead reviewed Apr 28, 2023

View reviewed changes

comphead approved these changes Apr 28, 2023

View reviewed changes

r4ntix approved these changes Apr 28, 2023

View reviewed changes

alamb merged commit 0374b9a into apache:main Apr 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve `compare.py` output to use `min` times and better column titles #6134

Improve `compare.py` output to use `min` times and better column titles #6134

Uh oh!

alamb commented Apr 26, 2023 •

edited

Loading

Uh oh!

comphead Apr 28, 2023

Uh oh!

alamb Apr 28, 2023

Uh oh!

Uh oh!

Improve compare.py output to use min times and better column titles #6134

Improve compare.py output to use min times and better column titles #6134

Uh oh!

Conversation

alamb commented Apr 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Example output Before this PR

Output after this PR

Uh oh!

comphead Apr 28, 2023

Choose a reason for hiding this comment

Uh oh!

alamb Apr 28, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Improve `compare.py` output to use `min` times and better column titles #6134

Improve `compare.py` output to use `min` times and better column titles #6134

alamb commented Apr 26, 2023 •

edited

Loading