
Remove de-duping of attribute keys #1098


Open
lzchen opened this issue Jun 7, 2023 · 5 comments
Labels
A-metrics Area: issues related to metrics

Comments

@lzchen
Contributor

lzchen commented Jun 7, 2023

In the current metrics SDK implementation, attribute sets with entries that have the same keys are de-duped: the "first" unique key is taken and the rest are dropped. This behavior is not fully defined in the spec, so it is up to each language to decide on the specific implementation. De-duping does not make sense for a couple of reasons:

  1. Only the first entry for a key is kept; the rest are dropped. Is this the "correct" behavior? Who is to say that the first key is the one the user wants?
  2. The performance benefit only exists when the same key appears multiple times, which is uncommon. In the majority of cases the keys within an AttributeSet are already distinct, and performance actually degrades because of the constant key-existence checks against the seen hashmap.
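
A minimal sketch of the first-key-wins de-duping described above (illustrative only, not the actual SDK code); the point is that every call pays for the seen-set lookups even when all keys are already unique:

```rust
use std::collections::HashSet;

// First-wins de-duping over a list of key/value pairs. Every call performs a
// `seen` lookup per entry, even when the keys are already all distinct.
fn dedup_first_wins(attrs: &[(String, String)]) -> Vec<(String, String)> {
    let mut seen = HashSet::with_capacity(attrs.len());
    attrs
        .iter()
        .filter(|(k, _)| seen.insert(k.clone())) // returns false for a repeated key, so it is dropped
        .cloned()
        .collect()
}
```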

I ran two benchmark tests demonstrating this: the first, Counter/AddUniqueAttr, calls add with 1 million unique keys, and the second, Counter/AddDupeAttr, calls add with 1 million copies of the same key.

With de-duping:

Counter/AddUniqueAttr   time:   [338.88 ms **342.29 ms** 345.65 ms]
                        change: [+26.244% +28.995% +31.699%] (p = 0.00 < 0.05)
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Counter/AddDupeAttr     time:   [22.946 ms **23.200 ms** 23.475 ms]
                        change: [-90.635% -90.503% -90.361%] (p = 0.00 < 0.05)

Without de-duping:

Counter/AddUniqueAttr   time:   [236.92 ms **240.31 ms** 244.08 ms]
                        change: [-0.0665% +1.5192% +3.3808%] (p = 0.08 > 0.05)
                        Performance has improved.
Counter/AddDupeAttr     time:   [186.02 ms **187.53 ms** 189.18 ms]
                        change: [+1.8287% +2.9432% +4.0648%] (p = 0.00 < 0.05)
                        Performance has regressed.

Of course, de-duping gives a large improvement when the attributes really do contain duplicate keys, but that is probably not worth the extra overhead it adds for the far more common case of non-duplicate attributes.
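
For reference, a minimal Criterion sketch of the two benchmarks described above (the exact opentelemetry calls are an assumption and vary by SDK version; a real benchmark would also install an SdkMeterProvider with a reader so the SDK's attribute handling is actually exercised):

```rust
// benches/counter_attrs.rs (illustrative; benchmark names match the results above)
use criterion::{criterion_group, criterion_main, Criterion};
use opentelemetry::KeyValue;

fn counter_benches(c: &mut Criterion) {
    // Assumes a configured meter provider; a noop global meter would not
    // exercise the SDK's de-duping path.
    let meter = opentelemetry::global::meter("bench");
    let counter = meter.u64_counter("hits").init();

    // 1 million distinct keys vs. 1 million copies of the same key.
    let unique: Vec<KeyValue> = (0..1_000_000)
        .map(|i| KeyValue::new(format!("key{i}"), i as i64))
        .collect();
    let dupes: Vec<KeyValue> = (0..1_000_000)
        .map(|i| KeyValue::new("key", i as i64))
        .collect();

    c.bench_function("Counter/AddUniqueAttr", |b| b.iter(|| counter.add(1, &unique)));
    c.bench_function("Counter/AddDupeAttr", |b| b.iter(|| counter.add(1, &dupes)));
}

criterion_group!(benches, counter_benches);
criterion_main!(benches);
```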

Thoughts?

@shaun-cox
Contributor

I like the intention, but have another suggestion in case others don't find it palatable. You're pointing out this case:

impl From<&[KeyValue]> for AttributeSet

The root issue is that we want &[KeyValue] to have unique keys, in general, not just for building AttributeSets from them. I say this from the general assertion that when we say "a list of key/value pairs", we naturally mean a list where each key is actually unique among the others... otherwise we wouldn't have called them "keys".

First step: let's assume we can remove struct KeyValue and replace it with a more idiomatic (Key, Value) language tuple. (We could still provide pub type KeyValue = (Key, Value); as a type alias.)

Next, let's also assume the common case in which the keys themselves are known at compile time. This means we could have some new type to represent something like &'static [&'static str; N] (a fixed array of key names) or &'static [Key; N], plus a proc_macro to build those up, letting users create named static constant "key sets". The proc_macro would do, at compile time, the de-duping that is happening at run time above.

Then we can produce an iterator of (Key, Value) from a KeySet (an arbitrary name for the type above) and another Iterator<Item = Value>, just by using zip.

I know it's a bit hand-wavy, but I think we could do better in the API at composing key/value pairs in general, with the compiler doing the checking for dups of keys in the cases where all the keys are known at compile time anyway.
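
A rough sketch of the shape being proposed (KeySet is a placeholder name, Key/Value are stand-ins for the opentelemetry types, and the proc_macro that would reject duplicate keys at compile time is elided):

```rust
type Key = &'static str; // stand-in for opentelemetry::Key
type Value = String;     // stand-in for opentelemetry::Value

/// A set of attribute keys known at compile time, assumed unique because the
/// (hypothetical) macro that constructs it would verify uniqueness during compilation.
struct KeySet<const N: usize> {
    keys: &'static [Key; N],
}

impl<const N: usize> KeySet<N> {
    /// Pair the static keys with runtime values via `zip`; no run-time de-duping needed.
    fn with_values(&self, values: [Value; N]) -> impl Iterator<Item = (Key, Value)> + '_ {
        self.keys.iter().copied().zip(values)
    }
}

fn main() {
    static HTTP_KEYS: KeySet<2> = KeySet { keys: &["http.method", "http.route"] };
    for (k, v) in HTTP_KEYS.with_values(["GET".to_string(), "/users".to_string()]) {
        println!("{k}={v}");
    }
}
```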

@TommyCpp TommyCpp added the A-common Area:common issues that not related to specific pillar label Jun 26, 2023
@lzchen
Contributor Author

lzchen commented Jun 27, 2023

Might have to retract the original idea of removing de-duping in the API/SDK, since key uniqueness is a must as outlined here in the spec. Instead, maybe we make it a requirement for users to pass de-duped attribute keys into the API.

@shaun-cox

Your idea sounds interesting. @MadVikingGod is this along the lines of what you were referring to, i.e. what Go was trying to implement with fixed-array key names known at compile time?

@cijothomas
Member

Might have to retract the original idea of removing de-duping in the API/SDK, since key uniqueness is a must as outlined here in the spec.

The spec allows flexibility there. We could do the de-dup in the exporter thread, so it won't affect hot-path performance.
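
A minimal sketch of that idea (illustrative only; the types and the last-value-wins policy are assumptions, not the SDK's actual behavior):

```rust
use std::collections::HashMap;

// Stand-ins for opentelemetry::Key / opentelemetry::Value.
type Key = String;
type Value = String;

/// De-duplicate attribute keys once, off the hot path (e.g. during the
/// collect/export cycle), instead of on every `add()` call. Last value wins
/// here; first-wins would use `entry(k).or_insert(v)` instead.
fn dedup_for_export(raw: Vec<(Key, Value)>) -> Vec<(Key, Value)> {
    let mut merged: HashMap<Key, Value> = HashMap::with_capacity(raw.len());
    for (k, v) in raw {
        merged.insert(k, v); // later duplicates overwrite earlier ones
    }
    merged.into_iter().collect()
}
```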

@djc
Contributor

djc commented Sep 1, 2023

@shaun-cox I like your plan (except for the type alias, which IMO usually just makes the code harder to understand). I wonder if it is possible to write a const function that can check an array of strings for duplicates...
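
Such a check does appear to be expressible on stable Rust; a minimal sketch (illustrative names, not a proposed API):

```rust
// Compare two strings byte-by-byte in a const context.
const fn bytes_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

// O(n^2) duplicate check, evaluated entirely at compile time.
const fn has_duplicate_keys(keys: &[&str]) -> bool {
    let mut i = 0;
    while i < keys.len() {
        let mut j = i + 1;
        while j < keys.len() {
            if bytes_eq(keys[i].as_bytes(), keys[j].as_bytes()) {
                return true;
            }
            j += 1;
        }
        i += 1;
    }
    false
}

const KEYS: [&str; 3] = ["http.method", "http.route", "http.status_code"];
// Compilation fails here if KEYS ever contains a duplicate.
const _: () = assert!(!has_duplicate_keys(&KEYS), "duplicate attribute keys");
```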

@cijothomas cijothomas added A-metrics Area: issues related to metrics and removed A-common Area:common issues that not related to specific pillar labels Oct 4, 2023
@lalitb lalitb added this to the Metrics API Stable milestone Feb 1, 2024
@cijothomas
Member

#1989 already addresses this by avoiding the de-dup cost on the hot path, penalizing only those users who actually do have duplicates. Keeping the issue open, as we don't have tests to prove this!

@cijothomas cijothomas removed this from the Metrics API Stable milestone Dec 2, 2024