Skip to content

Metrics Aggregation - Improve throughput by 10x #1833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
May 29, 2024

Conversation

cijothomas
Copy link
Member

@cijothomas cijothomas commented May 25, 2024

Part 1 of #1740

Modifies Sum aggregation (used by Counter/UpDownCounters), to have less contention by using RwLock instead of Mutex to access HashMap of values. Updates (the hot path) now only need read() lock as it leverages interior mutability to update the underlying value. This effectively makes the HashMap read-heavy, and only need read() locks in hot path, significantly reducing contention, and thereby boosting throughput.

Perf numbers from metrics stress test confirms the above: we jump from 3 M/sec to 35 M/sec, i.e 10X jump!
Criterion based benchmarks uses single thread and hence, won't show contention. They are not expected to change at all with the changes in this PR - and results show no change.

This PR focused on throughput only, and the next set of PRs (which require more refactoring), will boost the benchmarks as well significantly.

@cijothomas cijothomas requested a review from a team May 25, 2024 17:49
Copy link

codecov bot commented May 25, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 74.0%. Comparing base (6ee5579) to head (a33461c).

Files Patch % Lines
opentelemetry-sdk/src/metrics/internal/sum.rs 93.3% 2 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1833   +/-   ##
=====================================
  Coverage   74.0%   74.0%           
=====================================
  Files        122     122           
  Lines      19570   19577    +7     
=====================================
+ Hits       14493   14499    +6     
- Misses      5077    5078    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lalitb
Copy link
Member

lalitb commented May 25, 2024

Nice work! updating the value atomically under read lock seems to have boosted the throughout a lot :) Changes look good on quick scan, will review it thoroughly. and though we can get some more juice with #1564 , we are hopefully good for now with these changes.

@cijothomas
Copy link
Member Author

Nice work! updating the value atomically under read lock seems to have boosted the throughout a lot :) Changes look good on quick scan, will review it thoroughly. and though we can get some more juice with #1564 , we are hopefully good for now with these changes.

Once contention is avoided (this PR), we are unlikely to gain hot path perf by just sharding alone. But that'll surely help ease the spikes when collect() thread runs, as we can do locks on smaller section instead of whole. We can revisit the sharding logics from 1564.

@cijothomas
Copy link
Member Author

Nice work! updating the value atomically under read lock seems to have boosted the throughout a lot :) Changes look good on quick scan, will review it thoroughly. and though we can get some more juice with #1564 , we are hopefully good for now with these changes.

Once contention is avoided (this PR), we are unlikely to gain hot path perf by just sharding alone. But that'll surely help ease the spikes when collect() thread runs, as we can do locks on smaller section instead of whole. We can revisit the sharding logics from 1564.

Correction. Sharding still helps updates() compete less with other updates() that need to insert a new KVP combination.

@cijothomas cijothomas merged commit 0f6de5a into open-telemetry:main May 29, 2024
22 checks passed
@cijothomas cijothomas deleted the cijothomas/metrics branch May 29, 2024 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants