perf: Suppress telemetry using ContextFlags(usize) instead of bool #2861

Open
wants to merge 1 commit into
base: main

bantonsson
Contributor

bantonsson commented Mar 25, 2025

Changes

This PR aligns the Context struct and changes a bool into a flag field. It tries to mitigate the performance impact of #2821 on context attach/detach operations.
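
As a rough illustration of the idea, the sketch below packs the suppression state into a word-sized flag field instead of a `bool`. The `Context` here is a simplified stand-in, and the bit position, method bodies, and field names are assumptions made for illustration, not the code in this PR.

```rust
/// Hypothetical flag set stored in a single machine word.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct ContextFlags(usize);

impl ContextFlags {
    /// Bit 0 marks the context as telemetry-suppressed (assumed bit position).
    const TELEMETRY_SUPPRESSED: usize = 1 << 0;

    /// Returns a copy of the flags with the suppression bit set.
    fn with_telemetry_suppressed(self) -> Self {
        ContextFlags(self.0 | Self::TELEMETRY_SUPPRESSED)
    }

    /// Checks the suppression bit with a single mask-and-compare.
    fn is_telemetry_suppressed(self) -> bool {
        self.0 & Self::TELEMETRY_SUPPRESSED != 0
    }
}

/// Simplified stand-in for a context; a real context carries more state.
#[derive(Clone, Debug, Default)]
struct Context {
    flags: ContextFlags,
}

impl Context {
    /// Returns a new context with suppression enabled, leaving `self` unchanged.
    fn with_telemetry_suppressed(&self) -> Self {
        Context {
            flags: self.flags.with_telemetry_suppressed(),
        }
    }

    fn is_telemetry_suppressed(&self) -> bool {
        self.flags.is_telemetry_suppressed()
    }
}
```

Keeping the flags in one `usize` leaves room for further bits later without changing the size of the field.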

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

codecov bot commented Mar 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.3%. Comparing base (bc82d4f) to head (2f628ee).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2861   +/-   ##
=====================================
  Coverage   81.3%   81.3%           
=====================================
  Files        126     126           
  Lines      24156   24162    +6     
=====================================
+ Hits       19650   19656    +6     
  Misses      4506    4506           

@bantonsson bantonsson force-pushed the ban/context-suppression branch 3 times, most recently from d79a782 to dc038f9 Compare March 25, 2025 15:36
@scottgerring scottgerring reopened this Mar 26, 2025
@scottgerring
Contributor

👷 build bot output from this run:

| name | baseDuration | changesDuration | difference |
|---|---|---|---|
| context_attach/nested_cx/empty_cx | 31.7±0.14ns | 19.1±0.27ns | -40 |
| context_attach/nested_cx/single_value_cx | 33.7±0.22ns | 19.4±0.44ns | -42 |
| context_attach/nested_cx/span_cx | 33.0±0.29ns | 19.4±0.30ns | -41 |
| context_attach/out_of_order_cx_drop/empty_cx | 38.7±0.18ns | 19.8±0.19ns | -49 |
| context_attach/out_of_order_cx_drop/single_value_cx | 40.1±0.13ns | 20.8±0.48ns | -48 |
| context_attach/out_of_order_cx_drop/span_cx | 39.7±0.22ns | 20.8±0.32ns | -48 |
| context_attach/single_cx/empty_cx | 18.1±0.22ns | 12.5±0.31ns | -31 |
| context_attach/single_cx/single_value_cx | 17.6±0.08ns | 12.5±0.13ns | -29 |
| context_attach/single_cx/span_cx | 17.3±0.13ns | 12.5±0.25ns | -28 |
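
The difference column in this table is consistent with a rounded percentage change relative to the base duration. That reading is inferred from the numbers, not from documented bot behavior, and the function name below is hypothetical. A minimal sketch of the arithmetic:

```rust
/// Percentage change of `changes_ns` relative to `base_ns`.
/// Negative values mean the changed code ran faster than the base.
fn percent_difference(base_ns: f64, changes_ns: f64) -> f64 {
    (changes_ns - base_ns) / base_ns * 100.0
}
```

For the first row, `percent_difference(31.7, 19.1)` is about -39.7, which rounds to the reported -40. The bot presumably works from unrounded measurements, so recomputing from the displayed values can be off by a small amount.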

@bantonsson bantonsson force-pushed the ban/context-suppression branch from dc038f9 to 06eb4a1 Compare March 27, 2025 08:10
@bantonsson
Contributor Author

bantonsson commented Mar 27, 2025

These are the benchmark numbers from this run:

These numbers are not relevant since the code has changed completely.

@bantonsson bantonsson marked this pull request as ready for review March 27, 2025 10:28
@bantonsson bantonsson requested a review from a team as a code owner March 27, 2025 10:28
@bantonsson bantonsson force-pushed the ban/context-suppression branch 4 times, most recently from e3aca49 to ad3ad40 Compare March 28, 2025 09:14
@bantonsson bantonsson marked this pull request as draft March 28, 2025 14:58
@bantonsson
Contributor Author

This is still a draft until #2870 has been merged and all benchmarks are run properly.

@bantonsson bantonsson force-pushed the ban/context-suppression branch 6 times, most recently from baad2fe to 01874de Compare April 3, 2025 15:41
@bantonsson bantonsson changed the title from "perf: Optimize cloning of Context since it is immutable" to "perf: Suppress telemetry using ContextFlags(usize) instead of bool" Apr 4, 2025
@bantonsson
Contributor Author

New performance numbers from the new approach in this run:

| name | baseDuration | changesDuration | difference |
|---|---|---|---|
| context/has_active_span/in-cx/alt | 8.4±0.03ns | 8.4±0.05ns | 0.0 |
| context/has_active_span/in-cx/spec | 5.2±0.17ns | 5.0±0.16ns | -2.9 |
| context/has_active_span/no-cx/alt | 8.4±0.03ns | 8.4±0.03ns | 0.0 |
| context/has_active_span/no-cx/spec | 5.0±0.17ns | 4.7±0.19ns | -6.5 |
| context/has_active_span/no-sdk/alt | 8.4±0.02ns | 8.4±0.02ns | 0.0 |
| context/has_active_span/no-sdk/spec | 5.0±0.18ns | 4.7±0.17ns | -6.5 |
| context/is_recording/in-cx/alt | 4.7±0.19ns | 4.7±0.14ns | 0.0 |
| context/is_recording/in-cx/spec | 7.5±0.22ns | 7.5±0.23ns | 0.0 |
| context/is_recording/no-cx/alt | 4.7±0.14ns | 4.7±0.17ns | 0.0 |
| context/is_recording/no-cx/spec | 7.2±0.36ns | 7.2±0.39ns | +1.0 |
| context/is_recording/no-sdk/alt | 4.7±0.13ns | 4.7±0.14ns | 0.0 |
| context/is_recording/no-sdk/spec | 7.2±0.21ns | 7.2±0.24ns | 0.0 |
| context/is_sampled/in-cx/alt | 8.7±0.03ns | 8.7±0.04ns | 0.0 |
| context/is_sampled/in-cx/spec | 5.4±0.15ns | 5.3±0.18ns | -0.99 |
| context/is_sampled/no-cx/alt | 8.7±0.03ns | 8.7±0.04ns | 0.0 |
| context/is_sampled/no-cx/spec | 5.1±0.53ns | 5.0±0.17ns | -0.99 |
| context/is_sampled/no-sdk/alt | 8.7±0.03ns | 8.7±0.05ns | 0.0 |
| context/is_sampled/no-sdk/spec | 5.0±0.06ns | 5.0±0.19ns | 0.0 |
| context_attach/nested_cx/empty_cx | 47.4±1.16ns | 39.0±1.06ns | -18 |
| context_attach/nested_cx/single_value_cx | 48.7±1.19ns | 42.8±1.00ns | -12 |
| context_attach/nested_cx/span_cx | 48.5±0.44ns | 43.0±4.56ns | -12 |
| context_attach/out_of_order_cx_drop/empty_cx | 41.7±0.94ns | 40.6±1.20ns | -2.9 |
| context_attach/out_of_order_cx_drop/single_value_cx | 42.3±2.58ns | 42.0±0.89ns | -0.99 |
| context_attach/out_of_order_cx_drop/span_cx | 42.5±0.87ns | 42.0±0.85ns | -0.99 |
| context_attach/single_cx/empty_cx | 24.0±0.59ns | 19.4±0.36ns | -19 |
| context_attach/single_cx/single_value_cx | 23.4±0.52ns | 23.4±0.44ns | 0.0 |
| context_attach/single_cx/span_cx | 23.4±0.65ns | 23.1±0.51ns | -2.0 |
| exporter_disabled_concurrent_processor | 2.9±0.29ns | 3.1±0.08ns | +7.0 |
| exporter_disabled_simple_processor | 6.0±0.26ns | 6.2±0.18ns | +5.0 |
| telemetry_suppression/enter_telemetry_suppressed_scope | 27.1±0.18ns | 25.2±0.46ns | -7.4 |
| telemetry_suppression/is_current_telemetry_suppressed_false | 1.4±0.02ns | 1.3±0.03ns | -6.5 |
| telemetry_suppression/is_current_telemetry_suppressed_true | 1.4±0.05ns | 1.3±0.03ns | -6.5 |
| telemetry_suppression/normal_attach | 30.4±0.62ns | 28.0±0.86ns | -7.4 |

@bantonsson bantonsson force-pushed the ban/context-suppression branch from 01874de to 3416528 Compare April 4, 2025 07:40
@bantonsson bantonsson marked this pull request as ready for review April 4, 2025 07:42
@cijothomas
Member

@bantonsson can you run the benchmarks on your machine and see whether you observe the same? I am seeing a regression on my laptop. The attach benchmarks improve anyway, so we should still proceed with this PR, but I am curious how much we can trust the bench results from the CI machines!

telemetry_suppression/enter_telemetry_suppressed_scope
time: [10.170 ns 10.198 ns 10.224 ns]
change: [+8.3229% +8.8793% +9.4188%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
3 (3.00%) low mild
1 (1.00%) high severe
telemetry_suppression/normal_attach
time: [11.386 ns 11.440 ns 11.495 ns]
change: [+9.0850% +9.5688% +10.094%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) low mild
2 (2.00%) high mild
telemetry_suppression/is_current_telemetry_suppressed_false
time: [729.10 ps 731.32 ps 733.54 ps]
change: [-2.4232% -1.9738% -1.5230%] (p = 0.00 < 0.05)
Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
4 (4.00%) low severe
8 (8.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
telemetry_suppression/is_current_telemetry_suppressed_true
time: [730.30 ps 732.13 ps 734.16 ps]
change: [-1.7477% -1.2922% -0.7619%] (p = 0.00 < 0.05)
Change within noise threshold.
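
The verdict lines in the output above ("Performance has regressed.", "Change within noise threshold.") can be read against the reported confidence interval. The sketch below reproduces those verdicts under two assumptions that this thread does not confirm: a 0.05 significance level and a 1% noise threshold. The `Verdict` type and `classify` function are hypothetical names, not part of any benchmarking library.

```rust
/// Hypothetical verdict type for a benchmark comparison.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Classifies a change from the bounds of its confidence interval (in percent)
/// and its p-value. Both thresholds below are assumed values.
fn classify(lower_pct: f64, upper_pct: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE_LEVEL: f64 = 0.05; // assumption
    const NOISE_THRESHOLD_PCT: f64 = 1.0; // assumption

    if p_value >= SIGNIFICANCE_LEVEL {
        // The difference is not statistically significant.
        return Verdict::NoChange;
    }
    if upper_pct < -NOISE_THRESHOLD_PCT {
        // The whole interval lies below the noise band: faster.
        Verdict::Improved
    } else if lower_pct > NOISE_THRESHOLD_PCT {
        // The whole interval lies above the noise band: slower.
        Verdict::Regressed
    } else {
        // Significant, but the interval overlaps the noise band.
        Verdict::WithinNoise
    }
}
```

Under these assumptions, the interval [-1.7477%, -0.7619%] is classified as within noise because its upper bound does not clear -1%, which matches the last result above.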

@bantonsson
Contributor Author

bantonsson commented Apr 7, 2025

@cijothomas These are the numbers from my M1 Max Laptop, where I see a slight regression in checks and large improvements in entering.

Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (135M i
telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [14.479 ns 14.511 ns 14.549 ns]
                        change: [-21.316% -20.968% -20.611%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
telemetry_suppression/normal_attach
                        time:   [15.431 ns 15.483 ns 15.550 ns]
                        change: [-18.631% -18.300% -17.951%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (1
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [1.1357 ns 1.1436 ns 1.1545 ns]
                        change: [+2.1229% +2.7044% +3.3306%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (1.
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [1.1311 ns 1.1367 ns 1.1438 ns]
                        change: [+1.1998% +1.6296% +2.0577%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  14 (14.00%) high severe

And these are the numbers from my AMD Ryzen 5 3600 2.2GHz box, where I see a slight improvement in checks and large improvements in entering.

Benchmarking telemetry_suppression/enter_telemetry_suppressed_scope: Collecting 100 samples in estimated 2.0001 s (85M iteratio
telemetry_suppression/enter_telemetry_suppressed_scope
                        time:   [23.078 ns 23.081 ns 23.083 ns]
                        change: [-16.623% -16.588% -16.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
telemetry_suppression/normal_attach
                        time:   [24.008 ns 24.024 ns 24.039 ns]
                        change: [-17.823% -17.631% -17.412%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_false: Collecting 100 samples in estimated 2.0000 s (2.8B it
telemetry_suppression/is_current_telemetry_suppressed_false
                        time:   [715.65 ps 715.73 ps 715.82 ps]
                        change: [-1.7774% -1.7447% -1.7127%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking telemetry_suppression/is_current_telemetry_suppressed_true: Collecting 100 samples in estimated 2.0000 s (2.8B ite
telemetry_suppression/is_current_telemetry_suppressed_true
                        time:   [715.54 ps 715.63 ps 715.73 ps]
                        change: [-3.6408% -3.0627% -2.5393%] (p = 0.00 < 0.05)
                        Performance has improved.

The code seems to be highly sensitive to alignment, so use a bitfield
instead of a boolean.
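
One way to see why a `bool` field buys nothing in size is to compare the layouts directly. The sketch below uses two hypothetical `#[repr(C)]` structs with a single word-sized field next to the flag; the real `Context` has different fields and uses Rust's default layout, so this only illustrates padding and does not explain the benchmark results.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct with a word-sized field followed by a one-byte bool.
/// The compiler pads after the bool to satisfy the alignment of `usize`.
#[repr(C)]
struct WithBool {
    entries: usize,
    suppressed: bool,
}

/// Hypothetical struct that uses a full word for flags instead.
#[repr(C)]
struct WithFlags {
    entries: usize,
    flags: usize,
}

/// Returns `(size, alignment)` of `T` in bytes.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

On typical targets both structs have the same size and alignment, so the word-sized flag field occupies bytes that would otherwise be padding.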