Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: Aggregates with many duplicate components #5032

Open
benjaminwinger opened this issue Mar 11, 2025 · 0 comments
Open

Optimization: Aggregates with many duplicate components #5032

benjaminwinger opened this issue Mar 11, 2025 · 0 comments

Comments

@benjaminwinger
Copy link
Collaborator

Description

The following query from ClickBench performs poorly (original sql):

MATCH (h:hits) RETURN SUM(h.ResolutionWidth), SUM(h.ResolutionWidth + 1), SUM(h.ResolutionWidth + 2), SUM(h.ResolutionWidth + 3), SUM(h.ResolutionWidth + 4), SUM(h.ResolutionWidth + 5), SUM(h.ResolutionWidth + 6), SUM(h.ResolutionWidth + 7), SUM(h.ResolutionWidth + 8), SUM(h.ResolutionWidth + 9), SUM(h.ResolutionWidth + 10), SUM(h.ResolutionWidth + 11), SUM(h.ResolutionWidth + 12), SUM(h.ResolutionWidth + 13), SUM(h.ResolutionWidth + 14), SUM(h.ResolutionWidth + 15), SUM(h.ResolutionWidth + 16), SUM(h.ResolutionWidth + 17), SUM(h.ResolutionWidth + 18), SUM(h.ResolutionWidth + 19), SUM(h.ResolutionWidth + 20), SUM(h.ResolutionWidth + 21), SUM(h.ResolutionWidth + 22), SUM(h.ResolutionWidth + 23), SUM(h.ResolutionWidth + 24), SUM(h.ResolutionWidth + 25), SUM(h.ResolutionWidth + 26), SUM(h.ResolutionWidth + 27), SUM(h.ResolutionWidth + 28), SUM(h.ResolutionWidth + 29), SUM(h.ResolutionWidth + 30), SUM(h.ResolutionWidth + 31), SUM(h.ResolutionWidth + 32), SUM(h.ResolutionWidth + 33), SUM(h.ResolutionWidth + 34), SUM(h.ResolutionWidth + 35), SUM(h.ResolutionWidth + 36), SUM(h.ResolutionWidth + 37), SUM(h.ResolutionWidth + 38), SUM(h.ResolutionWidth + 39), SUM(h.ResolutionWidth + 40), SUM(h.ResolutionWidth + 41), SUM(h.ResolutionWidth + 42), SUM(h.ResolutionWidth + 43), SUM(h.ResolutionWidth + 44), SUM(h.ResolutionWidth + 45), SUM(h.ResolutionWidth + 46), SUM(h.ResolutionWidth + 47), SUM(h.ResolutionWidth + 48), SUM(h.ResolutionWidth + 49), SUM(h.ResolutionWidth + 50), SUM(h.ResolutionWidth + 51), SUM(h.ResolutionWidth + 52), SUM(h.ResolutionWidth + 53), SUM(h.ResolutionWidth + 54), SUM(h.ResolutionWidth + 55), SUM(h.ResolutionWidth + 56), SUM(h.ResolutionWidth + 57), SUM(h.ResolutionWidth + 58), SUM(h.ResolutionWidth + 59), SUM(h.ResolutionWidth + 60), SUM(h.ResolutionWidth + 61), SUM(h.ResolutionWidth + 62), SUM(h.ResolutionWidth + 63), SUM(h.ResolutionWidth + 64), SUM(h.ResolutionWidth + 65), SUM(h.ResolutionWidth + 66), SUM(h.ResolutionWidth + 67), SUM(h.ResolutionWidth + 68), SUM(h.ResolutionWidth + 69), SUM(h.ResolutionWidth + 70), SUM(h.ResolutionWidth + 71), SUM(h.ResolutionWidth + 72), SUM(h.ResolutionWidth + 73), SUM(h.ResolutionWidth + 74), SUM(h.ResolutionWidth + 75), SUM(h.ResolutionWidth + 76), SUM(h.ResolutionWidth + 77), SUM(h.ResolutionWidth + 78), SUM(h.ResolutionWidth + 79), SUM(h.ResolutionWidth + 80), SUM(h.ResolutionWidth + 81), SUM(h.ResolutionWidth + 82), SUM(h.ResolutionWidth + 83), SUM(h.ResolutionWidth + 84), SUM(h.ResolutionWidth + 85), SUM(h.ResolutionWidth + 86), SUM(h.ResolutionWidth + 87), SUM(h.ResolutionWidth + 88), SUM(h.ResolutionWidth + 89);

The performance isn't too terrible, it's no worse than most of the hash aggregate queries from clickbench, but this could be massively simplified by the following optimizations:

  1. Getting the optimizer to factorize SUM(h.ResolutionWidth + N) into SUM(h.ResolutionWidth) + COUNT(h) * N
  2. Only calculating SUM(h.ResolutionWidth) and COUNT(h) once and re-using the results.

On the other hand, this is a really contrived example and you could easily re-write it to something like ... WITH SUM(h.ResolutionWidth) as w, COUNT(h) as c RETURN w, w + c, w + c * 2, w + c * 3, .... But even that is only about 4x faster.

The performance at the moment (on our 2xEPYC 7551 machine) is roughly 3s for the original query, 0.8s for the simplified query I gave above, or just 0.1s for just MATCH (h:hits) RETURN SUM(h.ResolutionWidth), COUNT(h);.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant