
Update default values for explicit histogram bucket boundaries to better handle seconds #4527


Open
EFord36 opened this issue Apr 4, 2025 · 0 comments


EFord36 commented Apr 4, 2025

Is your feature request related to a problem?

Currently, if you set up a histogram with unit "s" and the default bucket boundaries, the boundaries are reasonable for network latencies measured in milliseconds, but not in seconds. For latencies in seconds, all or almost all measurements are likely to fall into the single bucket covering (0, 5] seconds, which makes the histogram uninformative to the user.
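To make the problem concrete, here is a small standalone Python sketch (not SDK code) that buckets a few hypothetical request latencies, recorded in seconds, against the specification's default explicit bucket boundaries. It assumes the usual explicit-bucket convention that each bucket includes its upper boundary, plus an overflow bucket above the last boundary; the sample latencies are made up for illustration.

```python
import bisect

# Default explicit bucket boundaries from the OpenTelemetry metrics SDK specification.
DEFAULT_BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]


def bucket_counts(values, boundaries):
    """Count values per bucket.

    There are len(boundaries) + 1 buckets:
    (-inf, b0], (b0, b1], ..., (b_last, +inf).
    """
    counts = [0] * (len(boundaries) + 1)
    for v in values:
        # bisect_left returns the index of the first boundary >= v,
        # i.e. the bucket whose (inclusive) upper bound contains v.
        counts[bisect.bisect_left(boundaries, v)] += 1
    return counts


# Hypothetical HTTP latencies in seconds (12 ms up to 3.2 s).
latencies_s = [0.012, 0.045, 0.080, 0.150, 0.300, 0.900, 1.7, 3.2]

counts = bucket_counts(latencies_s, DEFAULT_BOUNDARIES)
print(counts[1], sum(counts))  # prints "8 8": every sample lands in the (0, 5] bucket
```

Latencies spanning more than two orders of magnitude all collapse into one bucket, so the histogram carries essentially no distribution information.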

This issue is the same as open-telemetry/opentelemetry-dotnet#4797, which was opened for the .NET SDK. However, I suggest that we ideally go further than open-telemetry/opentelemetry-dotnet#4820, which closed that issue: that change only updates the default for known instruments that use seconds, whereas I propose making it the default for all histograms created with unit seconds. A similar change was suggested at the specification level in open-telemetry/opentelemetry-specification#3509, but that proposal applied to all SDKs, and as a blanket rule rather than one based on the unit passed in.

Describe the solution you'd like

If you set up a histogram with unit "s" and don't specify explicit bucket boundaries, the default boundaries should be reasonable for network latencies measured in seconds. (Of course, this depends on the use case, but I think a range of 5-1000ms is more useful than 5-10000s. It's hard to imagine buckets of 7500-10000 seconds being used often.)

If you set up a histogram with any other unit, the behaviour should stay the same as it is now.
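The proposed rule could be sketched as a unit-based lookup like the one below. This is standalone Python, and `default_boundaries_for_unit` is a hypothetical helper, not an existing SDK function. For unit "s" it assumes the boundaries that the OpenTelemetry HTTP semantic conventions recommend for duration histograms such as `http.server.request.duration` (5 ms up to 10 s); explicitly configured boundaries always take precedence, and any other unit keeps the current defaults.

```python
from typing import Optional, Sequence

# Current default from the OpenTelemetry metrics SDK specification.
DEFAULT_BOUNDARIES = (0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000)

# Boundaries recommended by the HTTP semantic conventions for duration
# histograms measured in seconds (5 ms up to 10 s).
SECONDS_BOUNDARIES = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10)


def default_boundaries_for_unit(unit: str, explicit: Optional[Sequence[float]] = None) -> tuple:
    """Hypothetical helper: choose bucket boundaries for a new histogram.

    - Explicitly configured boundaries always win.
    - Unit "s" gets seconds-friendly defaults.
    - Any other unit keeps today's behaviour.
    """
    if explicit is not None:
        return tuple(explicit)
    if unit == "s":
        return SECONDS_BOUNDARIES
    return DEFAULT_BOUNDARIES


print(default_boundaries_for_unit("s")[:3])       # (0.005, 0.01, 0.025)
print(default_boundaries_for_unit("ms")[:3])      # (0, 5, 10)
print(default_boundaries_for_unit("s", [1, 60]))  # (1, 60)
```

Keying on the unit rather than on a fixed list of known instrument names (the approach taken in open-telemetry/opentelemetry-dotnet#4820) means user-created duration histograms benefit too, while histograms in any other unit are unaffected.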

Describe alternatives you've considered

It's not clear to me, from reading the specification and past issues on this topic, whether an SDK is 'allowed' to make a change like this without a specification change, but I take open-telemetry/opentelemetry-dotnet#4820 as some evidence that SDKs can make at least some changes in this area. If this isn't allowed under the specification, the main alternatives would be attempting a specification-level change (which seems to have been tried before, unsuccessfully) or maintaining the status quo.

The current status quo appears to be: 'duration measurements should use seconds, but you have to provide explicit histogram buckets every time you create a duration histogram'. This feels like a big 'gotcha' to me, and an opportunity to improve the user experience.

Additional Context

open-telemetry/opentelemetry-dotnet#4797
open-telemetry/opentelemetry-dotnet#4820
open-telemetry/opentelemetry-specification#3509

Would you like to implement a fix?

Yes

Happy to work on a fix, but given the interaction with the specification, I don't want to spend too much time implementing it before knowing whether a PR would be considered.
