Metrics Aggregation - Improve throughput by 10x #1833

cijothomas · 2024-05-25T17:49:02Z

Part 1 of #1740

Modifies the Sum aggregation (used by Counter/UpDownCounter) to reduce contention by using an RwLock instead of a Mutex to guard the HashMap of values. Updates (the hot path) now need only a read() lock, because they rely on interior mutability to update the underlying value. This effectively makes the HashMap read-heavy: the hot path takes only read() locks, which significantly reduces contention and thereby boosts throughput.
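
For readers who want the idea in code, here is a minimal sketch of the pattern described above. It is not the code from this PR: `SumMap`, the `String` key, and the `AtomicU64` value are hypothetical stand-ins chosen for illustration, and the real aggregator in `sum.rs` is keyed by attribute sets and is generic over the number type.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for a Sum aggregator: one running total per key.
pub struct SumMap {
    values: RwLock<HashMap<String, AtomicU64>>,
}

impl SumMap {
    pub fn new() -> Self {
        SumMap { values: RwLock::new(HashMap::new()) }
    }

    /// Adds `delta` to the total for `key`.
    pub fn add(&self, key: &str, delta: u64) {
        {
            // Hot path: a shared read lock is enough, because the atomic
            // value can be updated through a shared reference.
            let map = self.values.read().unwrap();
            if let Some(v) = map.get(key) {
                v.fetch_add(delta, Ordering::Relaxed);
                return;
            }
        } // read guard is dropped here, before the write lock is requested

        // Slow path: the key is new, so take the exclusive write lock.
        // `entry` also covers the case where another thread inserted the
        // key between the two lock acquisitions.
        let mut map = self.values.write().unwrap();
        map.entry(key.to_string())
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(delta, Ordering::Relaxed);
    }

    /// Returns the current total for `key`, if it has been seen.
    pub fn get(&self, key: &str) -> Option<u64> {
        let map = self.values.read().unwrap();
        map.get(key).map(|v| v.load(Ordering::Relaxed))
    }
}
```

In this sketch, many threads updating already-known keys hold the read lock concurrently; only the first update for a new key takes the write lock.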

Perf numbers from the metrics stress test confirm this: throughput jumps from 3 M/sec to 35 M/sec, i.e. roughly a 10x jump!
Criterion-based benchmarks use a single thread and hence won't show contention. They are not expected to change with this PR, and the results show no change.
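
To illustrate why only a multi-threaded run exposes the contention, here is a rough sketch of a stress-style measurement against a Mutex-guarded map (the "before" shape). It is not the repository's stress test; `stress_mutex_map` and the `"requests"` key are hypothetical names, and the numbers it produces depend entirely on the machine.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;
use std::time::Instant;

/// Runs `threads` workers, each performing `iters` updates against one shared
/// Mutex-guarded map. Returns (total updates recorded, updates per second).
pub fn stress_mutex_map(threads: usize, iters: u64) -> (u64, f64) {
    let map: Mutex<HashMap<&'static str, u64>> = Mutex::new(HashMap::new());
    let start = Instant::now();

    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    // Every update takes the exclusive lock, so with more
                    // than one thread the workers serialize on this line.
                    *map.lock().unwrap().entry("requests").or_insert(0) += 1;
                }
            });
        }
    });

    let elapsed = start.elapsed().as_secs_f64();
    let total = map.lock().unwrap().get("requests").copied().unwrap_or(0);
    (total, total as f64 / elapsed)
}
```

With `threads = 1` there is nobody to contend with, which is the situation a single-threaded benchmark measures; the lock only becomes a bottleneck as `threads` grows.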

This PR focuses on throughput only; the next set of PRs (which require more refactoring) will significantly improve the benchmarks as well.

codecov · 2024-05-25T17:52:27Z

Codecov Report

Attention: Patch coverage is 93.33333%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.0%. Comparing base (6ee5579) to head (a33461c).

Files	Patch %	Lines
opentelemetry-sdk/src/metrics/internal/sum.rs	93.3%	2 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #1833   +/-   ##
=====================================
  Coverage   74.0%   74.0%           
=====================================
  Files        122     122           
  Lines      19570   19577    +7     
=====================================
+ Hits       14493   14499    +6     
- Misses      5077    5078    +1

lalitb · 2024-05-25T18:33:39Z

Nice work! Updating the value atomically under a read lock seems to have boosted the throughput a lot :) Changes look good on a quick scan; I will review them thoroughly. Though we can get some more juice with #1564, we are hopefully good for now with these changes.

cijothomas · 2024-05-26T14:16:03Z

> Nice work! Updating the value atomically under a read lock seems to have boosted the throughput a lot :) Changes look good on a quick scan; I will review them thoroughly. Though we can get some more juice with #1564, we are hopefully good for now with these changes.

Once contention is avoided (this PR), we are unlikely to gain hot-path perf from sharding alone. But it will surely help ease the spikes when the collect() thread runs, as we can lock smaller sections instead of the whole map. We can revisit the sharding logic from #1564.

opentelemetry-sdk/CHANGELOG.md

opentelemetry-sdk/src/metrics/internal/sum.rs

cijothomas · 2024-05-29T01:17:31Z

> Nice work! Updating the value atomically under a read lock seems to have boosted the throughput a lot :) Changes look good on a quick scan; I will review them thoroughly. Though we can get some more juice with #1564, we are hopefully good for now with these changes.

> Once contention is avoided (this PR), we are unlikely to gain hot-path perf from sharding alone. But it will surely help ease the spikes when the collect() thread runs, as we can lock smaller sections instead of the whole map. We can revisit the sharding logic from #1564.

Correction: sharding still helps updates() compete less with other updates() that need to insert a new KVP (key-value pair) combination.
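
As a rough illustration of the sharding point, the sketch below splits the map into several independently locked shards, so an insert of a new key takes the write lock on one shard only and leaves updates on other shards unaffected. This is not the design from #1564; `ShardedSum`, the `String` key, and the use of `DefaultHasher` to pick a shard are assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical sharded variant: each shard has its own RwLock.
pub struct ShardedSum {
    shards: Vec<RwLock<HashMap<String, AtomicU64>>>,
}

impl ShardedSum {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        ShardedSum { shards }
    }

    /// Picks the shard for `key` from its hash.
    fn shard_for(&self, key: &str) -> &RwLock<HashMap<String, AtomicU64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    pub fn add(&self, key: &str, delta: u64) {
        let shard = self.shard_for(key);
        {
            // Existing key: read lock on this shard only.
            let map = shard.read().unwrap();
            if let Some(v) = map.get(key) {
                v.fetch_add(delta, Ordering::Relaxed);
                return;
            }
        }
        // New key: the write lock blocks only updates that hash to this shard.
        let mut map = shard.write().unwrap();
        map.entry(key.to_string())
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(delta, Ordering::Relaxed);
    }

    /// Sums all values, locking one shard at a time (as a collect pass might).
    pub fn total(&self) -> u64 {
        let mut total = 0;
        for shard in &self.shards {
            let map = shard.read().unwrap();
            total += map.values().map(|v| v.load(Ordering::Relaxed)).sum::<u64>();
        }
        total
    }
}
```

The same structure is what lets a collect pass hold a lock on a smaller section at a time rather than on the whole map.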

cijothomas added 22 commits May 10, 2024 16:10

counter aggregation to use readwrite lock on hashmap with interior mutability for value

f385c8e

Merge branch 'main' into cijothomas/metrics

9133feb

Remove unused

5a24907

Merge branch 'main' of https://github.com/open-telemetry/opentelemetry-rust

76024e6

Merge branch 'main' into cijothomas/metrics

e45e0fe

Merge branch 'main' into cijothomas/metrics

fcc634d

Merge branch 'cijothomas/metrics' of https://github.com/cijothomas/opentelemetry-rust into cijothomas/metrics

653d846

Merge branch 'main' into cijothomas/metrics

de2aa23

Merge branch 'main' into cijothomas/metrics

e274b36

fmt

eb12f15

revert exampe

1c7e1b9

cleaner

0026c4c

Merge branch 'main' into cijothomas/metrics

82091fc

Merge branch 'main' into cijothomas/metrics

ba8e9c1

fix overflow

ba141fe

Merge branch 'main' into cijothomas/metrics

866d42f

fix cp

d8d4c3d

comment

3444181

Merge branch 'main' into cijothomas/metrics

fa98bb0

changelog

3d1249f

Merge branch 'main' into cijothomas/metrics

2915806

update stress test number

775eb66

cijothomas requested a review from a team May 25, 2024 17:49

fix fmt

0e389d0

Merge branch 'main' into cijothomas/metrics

ab086d2

TommyCpp approved these changes May 28, 2024

opentelemetry-sdk/CHANGELOG.md (outdated, resolved)

TommyCpp reviewed May 28, 2024

opentelemetry-sdk/src/metrics/internal/sum.rs (outdated, resolved)

hdost approved these changes May 28, 2024

lalitb approved these changes May 28, 2024

utpilla approved these changes May 28, 2024

cijothomas added 2 commits May 28, 2024 18:12

Merge branch 'main' into cijothomas/metrics

34f6b58

pr comments

a33461c

cijothomas merged commit 0f6de5a into open-telemetry:main May 29, 2024
22 checks passed

cijothomas deleted the cijothomas/metrics branch May 29, 2024 01:26

lalitb mentioned this pull request Jul 2, 2024

Adding two level hashing in metrics hashmap #1564

Closed

4 tasks

jaemk mentioned this pull request Jul 12, 2024

Pre-beta release with performance improvements? #1930

Closed

cijothomas mentioned this pull request Aug 6, 2024

Metric refactor - 2x perf and allocation free #1989

Merged

This was referenced Aug 13, 2024

Use ValueMap for Gauge - Throughput increased around 8x #2017

Merged

Use ValueMap for Histogram- Throughput increased by 5x #2033

Merged
