Metrics Aggregation - Improve throughput by 10x #1833
…tability for value
…entelemetry-rust into cijothomas/metrics
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

Coverage Diff

| | main | #1833 | +/- |
|---|---|---|---|
| Coverage | 74.0% | 74.0% | |
| Files | 122 | 122 | |
| Lines | 19570 | 19577 | +7 |
| Hits | 14493 | 14499 | +6 |
| Misses | 5077 | 5078 | +1 |
Nice work! Updating the value atomically under a read lock seems to have boosted the throughput a lot :) Changes look good on a quick scan; I will review them thoroughly. Though we could get some more juice with #1564, we are hopefully good for now with these changes.
Once contention is avoided (this PR), we are unlikely to gain hot-path perf from sharding alone. But sharding will surely help ease the spikes when the collect() thread runs, as we can lock a smaller section instead of the whole map. We can revisit the sharding logic from #1564.
Correction: sharding still helps updates compete less with other updates that need to insert a new KVP (key-value pair) combination.
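The sharding idea discussed in these comments could look roughly like the sketch below. It is not the code from #1564; `ShardedSums`, `SHARDS`, `shard_index`, and the use of `String` keys are hypothetical names and simplifications chosen for illustration. Each shard has its own lock, so an update that must insert a new key only takes a write lock on one shard, and a collect pass locks one shard at a time. The collect here is a read-only snapshot; a real collect may need a write lock to reset or drain values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical shard count, chosen only for illustration.
const SHARDS: usize = 8;

/// Hypothetical sharded store: one independently locked map per shard.
struct ShardedSums {
    shards: Vec<RwLock<HashMap<String, AtomicU64>>>,
}

impl ShardedSums {
    fn new() -> Self {
        ShardedSums {
            shards: (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Maps a key to a shard by hashing it.
    fn shard_index(key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % SHARDS
    }

    fn add(&self, key: &str, delta: u64) {
        let shard = &self.shards[Self::shard_index(key)];
        {
            // Fast path: existing key, shared lock on this shard only.
            let map = shard.read().unwrap();
            if let Some(value) = map.get(key) {
                value.fetch_add(delta, Ordering::Relaxed);
                return;
            }
        } // read guard dropped here, before the write lock is requested
        // Slow path: inserting a new key blocks only this shard.
        let mut map = shard.write().unwrap();
        map.entry(key.to_string())
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(delta, Ordering::Relaxed);
    }

    /// Snapshot of all values, locking one shard at a time.
    fn collect(&self) -> HashMap<String, u64> {
        let mut out = HashMap::new();
        for shard in &self.shards {
            let map = shard.read().unwrap();
            for (key, value) in map.iter() {
                out.insert(key.clone(), value.load(Ordering::Relaxed));
            }
        }
        out
    }
}
```

With this layout, two updates that both insert new keys contend only if their keys hash to the same shard.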
Part 1 of #1740
Modifies Sum aggregation (used by Counter/UpDownCounter) to reduce contention by using `RwLock` instead of `Mutex` to access the `HashMap` of values. Updates (the hot path) now need only a `read()` lock, as they leverage interior mutability to update the underlying value. This effectively makes the `HashMap` read-heavy, significantly reducing contention and thereby boosting throughput.

Perf numbers from the metrics stress test confirm the above: we jump from 3 M/sec to 35 M/sec, i.e. a 10x jump!
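The pattern described above can be illustrated with a minimal sketch. This is not the PR's actual code: `SumValues`, its `String` keys, the `AtomicU64` value type, and the helper `concurrent_total` are assumptions made for illustration. The point is that an atomic value can be updated through a shared reference, so an update to an existing key needs only the read lock, and the write lock is taken only to insert a new key.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the map of values behind a Sum aggregation.
struct SumValues {
    values: RwLock<HashMap<String, AtomicU64>>,
}

impl SumValues {
    fn new() -> Self {
        SumValues { values: RwLock::new(HashMap::new()) }
    }

    fn add(&self, key: &str, delta: u64) {
        {
            // Hot path: many threads can hold the read lock at once, and
            // AtomicU64 allows mutation through a shared reference.
            let map = self.values.read().unwrap();
            if let Some(value) = map.get(key) {
                value.fetch_add(delta, Ordering::Relaxed);
                return;
            }
        } // read guard dropped here, before the write lock is requested
        // Slow path: first time this key is seen, so take the write lock.
        // entry() also covers the case where another thread inserted the
        // key between the two lock acquisitions.
        let mut map = self.values.write().unwrap();
        map.entry(key.to_string())
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(delta, Ordering::Relaxed);
    }

    fn get(&self, key: &str) -> Option<u64> {
        let map = self.values.read().unwrap();
        map.get(key).map(|value| value.load(Ordering::Relaxed))
    }
}

/// Spawns `threads` writers that each add 1 to the same key `iters` times
/// and returns the final sum for that key.
fn concurrent_total(threads: usize, iters: u64) -> u64 {
    let sums = Arc::new(SumValues::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let sums = Arc::clone(&sums);
            thread::spawn(move || {
                for _ in 0..iters {
                    sums.add("http.requests", 1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    sums.get("http.requests").unwrap_or(0)
}
```

With a `Mutex` around the map, every `add` would serialize on the lock even when the key already exists; here only the first insert of each key is exclusive.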
Criterion-based benchmarks use a single thread and hence won't show contention. They are not expected to change at all with the changes in this PR, and the results show no change.
This PR focuses on throughput only; the next set of PRs (which require more refactoring) will significantly boost the benchmarks as well.
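To make the distinction between a single-threaded benchmark and a multi-threaded stress test concrete, here is a rough sketch of a throughput harness. It is not the repository's stress test; `stress` and `measure_counter` are hypothetical helpers. Running the same operation with one thread and with many threads is what exposes lock contention, which a single-threaded measurement cannot show.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `op` `iters` times on each of `threads` threads and returns the
/// total number of operations together with the elapsed wall-clock time.
fn stress<F>(threads: usize, iters: u64, op: F) -> (u64, Duration)
where
    F: Fn() + Send + Sync + 'static,
{
    let op = Arc::new(op);
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let op = Arc::clone(&op);
            thread::spawn(move || {
                for _ in 0..iters {
                    (*op)();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    (threads as u64 * iters, start.elapsed())
}

/// Example workload: every operation increments one shared atomic counter.
/// Returns (operations issued, counter value observed, elapsed time).
fn measure_counter(threads: usize, iters: u64) -> (u64, u64, Duration) {
    let counter = Arc::new(AtomicU64::new(0));
    let shared = Arc::clone(&counter);
    let (ops, elapsed) = stress(threads, iters, move || {
        shared.fetch_add(1, Ordering::Relaxed);
    });
    (ops, counter.load(Ordering::Relaxed), elapsed)
}
```

Dividing the operation count by the elapsed time gives operations per second; comparing that figure at one thread and at several threads, for a lock-protected workload, shows how much throughput is lost to contention.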