
Remove de-duping of attribute keys #1098


Open
lzchen opened this issue Jun 7, 2023 · 5 comments
Labels
A-metrics Area: issues related to metrics

Comments

@lzchen
Contributor

lzchen commented Jun 7, 2023

In the current metrics SDK implementation, attribute sets with entries that have the same keys are de-duped: the "first" unique key is taken and the rest are dropped. This behavior is not fully defined in the spec, so it is up to each language to decide on the specific implementation. De-duping does not make sense for a couple of reasons:

  1. Only the first entry for a key is kept; the rest are dropped. Is this the "correct" behavior? Who is to say that the first key is the one the user wants?
  2. The performance benefit only exists when the same key appears multiple times, which is uncommon. In the majority of cases the keys within an AttributeSet are already distinct, and performance actually degrades because of the constant key-existence checks against the seen hashmap.
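
A minimal sketch of the first-key-wins de-duping described above (illustrative only, not the actual SDK code); the point is that every call pays for the seen-set lookups even when all keys are already unique:

```rust
use std::collections::HashSet;

// First-wins de-duping over a list of key/value pairs. Every call performs a
// `seen` lookup per entry, even when the keys are already all distinct.
fn dedup_first_wins(attrs: &[(String, String)]) -> Vec<(String, String)> {
    let mut seen = HashSet::with_capacity(attrs.len());
    attrs
        .iter()
        .filter(|(k, _)| seen.insert(k.clone())) // returns false for a repeated key, so it is dropped
        .cloned()
        .collect()
}
```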

I ran two benchmark tests demonstrating this: the first, Counter/AddUniqueAttr, calls add with 1 million unique keys, and the second, Counter/AddDupeAttr, calls add with 1 million copies of the same key.

With de-duping:

Counter/AddUniqueAttr   time:   [338.88 ms **342.29 ms** 345.65 ms]
                        change: [+26.244% +28.995% +31.699%] (p = 0.00 < 0.05)
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Counter/AddDupeAttr     time:   [22.946 ms **23.200 ms** 23.475 ms]
                        change: [-90.635% -90.503% -90.361%] (p = 0.00 < 0.05)

Without de-duping:

Counter/AddUniqueAttr   time:   [236.92 ms **240.31 ms** 244.08 ms]
                        change: [-0.0665% +1.5192% +3.3808%] (p = 0.08 > 0.05)
                        Performance has improved.
Counter/AddDupeAttr     time:   [186.02 ms **187.53 ms** 189.18 ms]
                        change: [+1.8287% +2.9432% +4.0648%] (p = 0.00 < 0.05)
                        Performance has regressed.

Of course, de-duping gives a large improvement when the attributes really do contain duplicate keys, but that is probably not worth the extra overhead it adds for the far more common case of non-duplicate attributes.
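
For reference, a minimal Criterion sketch of the two benchmarks described above (the exact opentelemetry calls are an assumption and vary by SDK version; a real benchmark would also install an SdkMeterProvider with a reader so the SDK's attribute handling is actually exercised):

```rust
// benches/counter_attrs.rs (illustrative; benchmark names match the results above)
use criterion::{criterion_group, criterion_main, Criterion};
use opentelemetry::KeyValue;

fn counter_benches(c: &mut Criterion) {
    // Assumes a configured meter provider; a noop global meter would not
    // exercise the SDK's de-duping path.
    let meter = opentelemetry::global::meter("bench");
    let counter = meter.u64_counter("hits").init();

    // 1 million distinct keys vs. 1 million copies of the same key.
    let unique: Vec<KeyValue> = (0..1_000_000)
        .map(|i| KeyValue::new(format!("key{i}"), i as i64))
        .collect();
    let dupes: Vec<KeyValue> = (0..1_000_000)
        .map(|i| KeyValue::new("key", i as i64))
        .collect();

    c.bench_function("Counter/AddUniqueAttr", |b| b.iter(|| counter.add(1, &unique)));
    c.bench_function("Counter/AddDupeAttr", |b| b.iter(|| counter.add(1, &dupes)));
}

criterion_group!(benches, counter_benches);
criterion_main!(benches);
```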

Thoughts?

@shaun-cox
Contributor

I like the intention, but have another suggestion in case others don't find it palatable. You're pointing out this case:

impl From<&[KeyValue]> for AttributeSet

The root issue is that we want &[KeyValue] to have unique keys, in general, not just for building AttributeSets from them. I say this from the general assertion that when we say "a list of key/value pairs", we naturally mean a list where each key is actually unique among the others... otherwise we wouldn't have called them "keys".

First step: let's assume we can remove struct KeyValue and replace it with a more idiomatic (Key, Value) language tuple. (We could still provide pub type KeyValue = (Key, Value); as a type alias.)

Next, let's also assume the common case in which the keys themselves are known at compile time. This means we could have some new type to represent something like &'static [&'static str; N] (a fixed array of key names) or &'static [Key; N], plus a proc_macro to build those up, letting users create named static constant "key sets". The proc_macro would do, at compile time, the de-duping that is happening at run time above.

Then we can produce an iterator of (Key, Value) from a KeySet (an arbitrary name for the type above) and another Iterator<Item = Value>, just by using zip.

I know it's a bit hand-wavy, but I think we could do better in the API at composing key/value pairs in general, with the compiler doing the checking for dups of keys in the cases where all the keys are known at compile time anyway.
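
A rough sketch of the shape being proposed (KeySet is a placeholder name, Key/Value are stand-ins for the opentelemetry types, and the proc_macro that would reject duplicate keys at compile time is elided):

```rust
type Key = &'static str; // stand-in for opentelemetry::Key
type Value = String;     // stand-in for opentelemetry::Value

/// A set of attribute keys known at compile time, assumed unique because the
/// (hypothetical) macro that constructs it would verify uniqueness during compilation.
struct KeySet<const N: usize> {
    keys: &'static [Key; N],
}

impl<const N: usize> KeySet<N> {
    /// Pair the static keys with runtime values via `zip`; no run-time de-duping needed.
    fn with_values(&self, values: [Value; N]) -> impl Iterator<Item = (Key, Value)> + '_ {
        self.keys.iter().copied().zip(values)
    }
}

fn main() {
    static HTTP_KEYS: KeySet<2> = KeySet { keys: &["http.method", "http.route"] };
    for (k, v) in HTTP_KEYS.with_values(["GET".to_string(), "/users".to_string()]) {
        println!("{k}={v}");
    }
}
```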

@TommyCpp TommyCpp added the A-common Area:common issues that not related to specific pillar label Jun 26, 2023
@lzchen
Contributor Author

lzchen commented Jun 27, 2023

Might have to retract the original idea of removing de-duping in the API/SDK, since key uniqueness is a must as outlined here in the spec. Instead, maybe we make it a requirement for users to pass de-duped attribute keys into the API.

@shaun-cox

Your idea sounds interesting. @MadVikingGod is this along the lines of what you were referring to, i.e. what Go was trying to implement with fixed-array key names known at compile time?

@cijothomas
Member

Might have to retract the original idea of removing de-duping in the API/SDK, since key uniqueness is a must as outlined here in the spec.

The spec allows flexibility there. We could do the de-dup in the exporter thread, so it won't affect hot-path performance.
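
A minimal sketch of that idea (illustrative only; the types and the last-value-wins policy are assumptions, not the SDK's actual behavior):

```rust
use std::collections::HashMap;

// Stand-ins for opentelemetry::Key / opentelemetry::Value.
type Key = String;
type Value = String;

/// De-duplicate attribute keys once, off the hot path (e.g. during the
/// collect/export cycle), instead of on every `add()` call. Last value wins
/// here; first-wins would use `entry(k).or_insert(v)` instead.
fn dedup_for_export(raw: Vec<(Key, Value)>) -> Vec<(Key, Value)> {
    let mut merged: HashMap<Key, Value> = HashMap::with_capacity(raw.len());
    for (k, v) in raw {
        merged.insert(k, v); // later duplicates overwrite earlier ones
    }
    merged.into_iter().collect()
}
```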

@djc
Contributor

djc commented Sep 1, 2023

@shaun-cox I like your plan (except for the type alias, which IMO usually just makes the code harder to understand). I wonder if it is possible to write a const function that can check an array of strings for duplicates...
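
Such a check does appear to be expressible on stable Rust; a minimal sketch (illustrative names, not a proposed API):

```rust
// Compare two strings byte-by-byte in a const context.
const fn bytes_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

// O(n^2) duplicate check, evaluated entirely at compile time.
const fn has_duplicate_keys(keys: &[&str]) -> bool {
    let mut i = 0;
    while i < keys.len() {
        let mut j = i + 1;
        while j < keys.len() {
            if bytes_eq(keys[i].as_bytes(), keys[j].as_bytes()) {
                return true;
            }
            j += 1;
        }
        i += 1;
    }
    false
}

const KEYS: [&str; 3] = ["http.method", "http.route", "http.status_code"];
// Compilation fails here if KEYS ever contains a duplicate.
const _: () = assert!(!has_duplicate_keys(&KEYS), "duplicate attribute keys");
```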

@cijothomas cijothomas added A-metrics Area: issues related to metrics and removed A-common Area:common issues that not related to specific pillar labels Oct 4, 2023
@lalitb lalitb added this to the Metrics API Stable milestone Feb 1, 2024
@cijothomas
Member

#1989 already addresses this by avoiding the de-dup cost on the hot path, penalizing only those users who actually do have duplicates. Keeping the issue open, as we don't have tests to prove this!

@cijothomas cijothomas removed this from the Metrics API Stable milestone Dec 2, 2024