Skip to content

Use ValueMap for Histogram- Throughput increased by 5x #2033

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented Aug 16, 2024

Follow-up to #2012
Similar to #1833. #1989, #2017
Related to #1740

Changes

  • Update Histogram to use ValueMap
Machine information OS: Ubuntu 22.04.4 LTS (5.15.153.1-microsoft-standard-WSL2) Hardware: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz, 16vCPUs, RAM: 64.0 GB

Benchmarks

There is a ~60% increase in benchmark performance. This is coming from ValueMap's optimized lookup on the hot path.

Histogram_Record_With_Three_Random_Attributes Average time
main branch 509.21 n
PR 193.04 ns

Stress Test

Throughput has improved by 5 times. This is coming from ValueMap's reduced thread contention and optimized lookup on the hot path.

16 threads updating random trackers Throughput
main branch 1.8 M/sec
PR 9 M/sec

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@utpilla utpilla requested a review from a team August 16, 2024 20:37
@@ -48,40 +48,54 @@ impl Operation for Assign {
///
/// This structure is parametrized by an `Operation` that indicates how
/// updates to the underlying value trackers should be performed.
pub(crate) struct ValueMap<T: Number<T>, O> {
pub(crate) struct ValueMap<AU: AtomicallyUpdate<T>, T: Number<T>, O> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueMap struct takes three generic parameters now. There is scope to further refactor this. We could update AtomicTracker in a way that would allow us to get rid of Operation trait. I didn't want to move too many things around in this PR.

Copy link

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 69.74359% with 59 lines in your changes missing coverage. Please review.

Project coverage is 76.6%. Comparing base (108161c) to head (8b92798).

Files Patch % Lines
...pentelemetry-sdk/src/metrics/internal/histogram.rs 65.9% 48 Missing ⚠️
opentelemetry-sdk/src/metrics/internal/mod.rs 77.0% 11 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #2033     +/-   ##
=======================================
- Coverage   76.7%   76.6%   -0.1%     
=======================================
  Files        122     122             
  Lines      20828   20916     +88     
=======================================
+ Hits       15993   16041     +48     
- Misses      4835    4875     +40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

};

buckets.bin(index, value);
buckets.sum(value);
Copy link
Contributor Author

@utpilla utpilla Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difference now is that we would always update the sum. Previously, this was happening only when self.record_sum is true. I think we are not required to provide record_sum as a configuration option in the first place. Currently, it is set based on this condition:

let record_sum = !matches!(
kind,
InstrumentKind::UpDownCounter
| InstrumentKind::ObservableUpDownCounter
| InstrumentKind::ObservableGauge
);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec:

Arithmetic sum of Measurement values in population. This SHOULD NOT be collected when used with instruments that record negative measurements (e.g. UpDownCounter or ObservableGauge).

This scenario occurs only if a View is used to aggregate UpDownCounter/Gauge as Histogram. Lets address this (if we chose to), in a future PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I am unsure if SUM should be not reported, when -ve measurements are provided, which are currently not blockeable at API level for f64...

Proto wording is not super clear to me...Lets follow up later.
https://github.com/open-telemetry/opentelemetry-proto/blob/v0.9.0/opentelemetry/proto/metrics/v1/metrics.proto#L538-L543

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice gains!
The generic/awkwardness is not a blocker as they are purely internal implementations, and can be improved without breaking.

@cijothomas cijothomas merged commit fcada17 into open-telemetry:main Aug 16, 2024
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants