Skip to content

Improve compare.py output to use min times and better column titles #6134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 30, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Apr 26, 2023

Which issue does this PR close?

Part of #6127

Rationale for this change

I want to be able to easily understand what is being compared and reported. Currently the reporting from compare.py is somewhat opaque:

  1. It averages benchmark run timings (which increases the variance)
  2. The header columns are non sensical

What changes are included in this PR?

  1. Use min of the iteration times to report the benchmark results -- I think this minimizes run to run variance and is the fairest way to asses performance
  2. Use the filename as the column header

Are these changes tested?

No, it is a tool only change

Are there any user-facing changes?

Not really

Example output Before this PR

(note the bad headers)

python3 compare.py results/alamb_bench/tpch.json results/alamb_bench/tpch_mem.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /Users/alam… ┃           -o ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │     794.64ms │     559.37ms │ +1.42x faster │
│ QQuery 2     │     239.12ms │     171.29ms │ +1.40x faster │
│ QQuery 3     │     254.99ms │      99.33ms │ +2.57x faster │
│ QQuery 4     │     147.85ms │      66.65ms │ +2.22x faster │
│ QQuery 5     │     322.94ms │     258.98ms │ +1.25x faster │
│ QQuery 6     │     130.89ms │      24.24ms │ +5.40x faster │
│ QQuery 7     │     629.42ms │     645.68ms │     no change │
│ QQuery 8     │     375.57ms │     140.26ms │ +2.68x faster │
│ QQuery 9     │     597.65ms │     334.59ms │ +1.79x faster │
│ QQuery 10    │     397.16ms │     197.59ms │ +2.01x faster │
│ QQuery 11    │     182.41ms │     167.33ms │ +1.09x faster │
│ QQuery 12    │     228.44ms │      99.10ms │ +2.31x faster │
│ QQuery 13    │     736.21ms │     438.47ms │ +1.68x faster │
│ QQuery 14    │     195.68ms │      31.99ms │ +6.12x faster │
│ QQuery 15    │     175.08ms │      57.62ms │ +3.04x faster │
│ QQuery 16    │     175.67ms │     144.90ms │ +1.21x faster │
│ QQuery 17    │    1953.59ms │    1925.39ms │     no change │
│ QQuery 18    │    2133.06ms │    1968.85ms │ +1.08x faster │
│ QQuery 19    │     371.82ms │     107.44ms │ +3.46x faster │
│ QQuery 20    │     680.02ms │     567.75ms │ +1.20x faster │
│ QQuery 21    │     987.36ms │     857.87ms │ +1.15x faster │
│ QQuery 22    │     105.31ms │      77.25ms │ +1.36x faster │
└──────────────┴──────────────┴──────────────┴───────────────┘

Output after this PR

python3 compare.py results/alamb_bench/tpch.json results/alamb_bench/tpch_mem.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      tpch ┃  tpch_mem ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  762.35ms │  548.16ms │ +1.39x faster │
│ QQuery 2     │  210.23ms │  164.26ms │ +1.28x faster │
│ QQuery 3     │  239.37ms │   96.47ms │ +2.48x faster │
│ QQuery 4     │  147.01ms │   59.81ms │ +2.46x faster │
│ QQuery 5     │  317.68ms │  251.01ms │ +1.27x faster │
│ QQuery 6     │  129.40ms │   22.86ms │ +5.66x faster │
│ QQuery 7     │  603.09ms │  629.24ms │     no change │
│ QQuery 8     │  359.94ms │  138.52ms │ +2.60x faster │
│ QQuery 9     │  578.71ms │  329.58ms │ +1.76x faster │
│ QQuery 10    │  378.17ms │  188.56ms │ +2.01x faster │
│ QQuery 11    │  180.20ms │  159.49ms │ +1.13x faster │
│ QQuery 12    │  219.37ms │   94.54ms │ +2.32x faster │
│ QQuery 13    │  732.80ms │  425.82ms │ +1.72x faster │
│ QQuery 14    │  190.55ms │   30.99ms │ +6.15x faster │
│ QQuery 15    │  173.51ms │   54.58ms │ +3.18x faster │
│ QQuery 16    │  163.33ms │  130.02ms │ +1.26x faster │
│ QQuery 17    │ 1866.25ms │ 1855.35ms │     no change │
│ QQuery 18    │ 2063.31ms │ 1916.26ms │ +1.08x faster │
│ QQuery 19    │  363.00ms │  102.86ms │ +3.53x faster │
│ QQuery 20    │  639.92ms │  530.65ms │ +1.21x faster │
│ QQuery 21    │  946.66ms │  815.83ms │ +1.16x faster │
│ QQuery 22    │  103.37ms │   73.07ms │ +1.41x faster │
└──────────────┴───────────┴───────────┴───────────────┘

@alamb alamb added the development-process Related to development process of DataFusion label Apr 26, 2023
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Apr 26, 2023
@alamb alamb changed the title Alamb/better compare Improve compare.py output to use min times and better column titles Apr 26, 2023
@alamb alamb added the development-process Related to development process of DataFusion label Apr 26, 2023
@alamb alamb closed this Apr 27, 2023
@alamb alamb reopened this Apr 27, 2023
return method(iteration.elapsed for iteration in self.iterations)
# Use minimum execution time to account for variations / other
# things the system was doing
return min(iteration.elapsed for iteration in self.iterations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but imho, can min be deceptive comparing to avg/mean/median? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is misleading -- in terms of measuring a change between datafusion versions, I think min will give us the least variance between runs and represents best case performance.

However, it doesn't really give a sense for how much variation is across runs 🤔

@alamb alamb merged commit 0374b9a into apache:main Apr 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development-process Related to development process of DataFusion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants