-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter/elasticsearch] Handle edge cases in metrics grouping causing TSDB version_conflict_engine_exception #39309
base: main
Are you sure you want to change the base?
Conversation
.chloggen/elasticsearchexporter_handle-metric-grouping-edge-cases.yaml
Outdated
Show resolved
Hide resolved
// e.g. index is already considered during routing and DS attributes do not need to be considered in hashing | ||
func mapHashExcludeReservedAttrs(hasher hash.Hash, m pcommon.Map, extra ...string) { | ||
func mapHashSortedExcludeReservedAttrs(hasher hash.Hash, m pcommon.Map, extra ...string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we consider using common libraries like:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/pdatautil/hash.go
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/internal/exp/metrics/identity
for the hash calculations? I think almost all the functions here could be replaced by existing hash options. For the exclusion of DataStream*
attributes we could create a custom hash option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not able to use identity directly, since some fields are used but are not TSDB dimensions, e.g. schema url.
pdatautil hash, on the other hand, looks good and will get rid of a lot of the hashing code we have. Still need to look into how to achieve exclusions. I think we'll be better off creating a map and passing it in, instead of writing an Option which means reimplementing all the funcs in pdatautil hash because they are private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll be better off creating a map and passing it in
which will involve a fair bit of copying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not able to use identity directly, since some fields are used but are not TSDB dimensions, e.g. schema url.
I thought so, the identity is a bit rigid.
Still need to look into how to achieve exclusions
We can implement a new HashOption (in elasticsearch-exporter for now) - something like WithMapExclusions(m pcommon.Map, excludedKeys ...string) HashOption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of writing an Option which means reimplementing all the funcs in pdatautil hash because they are private
Ah, right!!! everything is private. Maybe we can add a TODO for exporter and create a PR in pdatautil to have this hash option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO, shouldn't be a blocker for this PR.
return 0 | ||
} | ||
|
||
func (h ecsDataPointHasher) hashScope(_ pcommon.InstrumentationScope) uint32 { | ||
return 0 | ||
} | ||
|
||
func (h ecsDataPointHasher) hashDataPoint(_ datapoints.DataPoint) uint32 { | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This feels a bit misleading for the API. As an API user, calling hashScope, I would expect to get the hash of the instrumentation scope - using the hash or not should be controlling which hash to keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is my shot at the following requirements
- ecs hashes differently from otel as data point attributes and resource attributes are propagated to the root level
- hashing of resource should be done per-resource, scope done per-scope, to avoid rehashing and sorting resource attributes and scope attributes per-data point.
- (not a strict requirement) avoid leaking ecs vs otel difference to exporter level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the key to the discussion here is where to put the line between exporter logic and metrics grouping logic. Welcome any suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the root of the issue is that the mapping mode is leaking in the hashing logic. What do you think about making the dataPointGroupKey
aware of the mapping mode and calculate the key based on that. So, for example: for ecs
mode, it will populate the combined hash key only, whereas for otel
mode it will populate resource and scope keys? This will also mean that we will only need DataPointHasher
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, check https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/39309/files#r2039284735 - maybe we can remove the *DataPointHasher
completely but this is up to further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making the dataPointGroupKey aware of the mapping mode and calculate the key based on that. So, for example: for ecs mode, it will populate the combined hash key only, whereas for otel mode it will populate resource and scope keys? This will also mean that we will only need DataPointHasher.
Let's say there exist a function call getGroupKey(mode, resource, scope, dp) dataPointGroupKey
, doesn't it mean that it needs to be called per data point? Won't we lose the optimization to avoid sorting and rehashing resource and scope attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't we lose the optimization to avoid sorting and rehashing resource and scope attributes?
We can just have an interface to update the dataPointGroupKey
, something like UpdateWithScope
, UpdateWithDatapoint
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great advice. I've moved hash key to the grouping file to avoid leaking the implementation to exporter. cce4471
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
4b1a72c
to
6c9cbc4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, just some minor comments.
// TODO(carsonip): use opentelemetry-collector-contrib/pkg/pdatautil/hash.go when it can optionally exclude attributes | ||
// We could have used it now but it'll involve creating a new Map and copying things over. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create an issue/pr to track this?
type HashKey struct { | ||
resourceHash uint32 | ||
scopeHash uint32 | ||
dpHash uint32 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial idea was to not have {ecs, otel}DataPointHasher
at all and build the hash update logic in the HashKey
but I think we can discuss that in the follow-up PR when the pdatautil has the method for hashing map with exclude keys.
func (h *ECSDataPointHasher) HashKey() HashKey { | ||
merged := pcommon.NewMap() | ||
merged.EnsureCapacity(h.resource.Attributes().Len() + h.dp.Attributes().Len()) | ||
h.resource.Attributes().CopyTo(merged) | ||
// scope attributes are ignored in ECS mode | ||
for k, v := range h.dp.Attributes().All() { | ||
v.CopyTo(merged.PutEmpty(k)) | ||
} | ||
|
||
hasher := fnv.New32a() | ||
|
||
timestampBuf := make([]byte, 8) | ||
binary.LittleEndian.PutUint64(timestampBuf, uint64(h.dp.Timestamp())) | ||
hasher.Write(timestampBuf) | ||
|
||
mapHashSortedExcludeReservedAttrs(hasher, merged) | ||
|
||
return HashKey{ | ||
dpHash: hasher.Sum32(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we unify the behaviour of HashKey
for both the datapoint hashers? Currently, the ECS one is doing hash calculation in the HashKey
method whereas the OTel one does it on Update
.
Description
Fix 2 edge cases where metrics grouping were not done correctly.
Implementation:
Link to tracking issue
Fixes #38083
Testing
Added unit tests