Scrape errors for the /objectstore metrics path (also affects /array for the older versions) #108
Comments
I bet it was hitting an issue when a bucket was set to eradicate (detention). The delay is 24 hours by default. I'll experiment with this. Thanks for the bug report!
Also, if you'd like to have a chat or Zoom, please email us at
We have been creating and deleting buckets recently as part of load tests and similar work in new clusters, so it might indeed be related.
This is 100% related. In https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets_s3_perf.go#L17 we get the list of buckets and then query their performance five at a time. The problem is that the API also lists destroyed buckets; this can be solved by adding a destroyed=false filter.
I'll have a bit more of a think about whether that's how I want to handle this, since destroyed buckets still take up space and that space will still show up in the /array metrics, but it is the simplest solution.
Exactly @chrroberts-pure, I had the same issue and fixed it by adding SetQueryParam("destroyed", "false") at line 51.
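For reference, here is a minimal sketch of that filter in a resty-based client, in the spirit of buckets_s3_perf.go. The function name, endpoint path, and base URL are illustrative assumptions, not the exporter's actual code; only the SetQueryParam("destroyed", "false") call reflects the fix discussed above.

```go
package main

import (
	"fmt"

	"github.com/go-resty/resty/v2"
)

// listLiveBuckets fetches the bucket list with destroyed buckets filtered
// out, so the per-bucket performance queries never touch buckets that are
// pending eradication.
func listLiveBuckets(c *resty.Client) (string, error) {
	resp, err := c.R().
		SetQueryParam("destroyed", "false"). // exclude destroyed buckets
		Get("/api/2.4/buckets")              // placeholder endpoint path
	if err != nil {
		return "", err
	}
	return resp.String(), nil
}

func main() {
	client := resty.New().SetBaseURL("https://flashblade.example.com")
	body, err := listLiveBuckets(client)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```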
What
The exporter returned HTTP 500 and error messages on the /array or /objectstore metrics path (depending on the version in use) for precisely 24 hours.
How
We were running version 1.0.12, so the problem first showed up in the /array metrics path, affecting all the metrics relevant to array health. It started around April 14, 12:50 p.m. UTC, with no changes on our side.
As part of several attempts to fix the issue, we introduced some changes, but none of them had any immediate effect on the status.
After exactly 24 hours, the problem faded away on its own.
The logs didn't show anything useful either.
I cloned the project and explored the code but could not find an obvious cause; it feels like the exporter is trying to register the same metric more than once in the Prometheus registry 🤔
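That suspicion is easy to reproduce in isolation with prometheus/client_golang. The standalone sketch below (not taken from the exporter; the metric name is made up) shows the AlreadyRegisteredError a second registration of the same metric produces, the kind of failure a metrics handler could surface as an HTTP 500.

```go
package main

import (
	"errors"
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	reg := prometheus.NewRegistry()

	// First registration of the collector succeeds.
	g1 := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "example_bucket_space_bytes", // illustrative metric name
		Help: "Illustrative metric only.",
	})
	if err := reg.Register(g1); err != nil {
		fmt.Println("first register failed:", err)
		return
	}

	// Registering a second collector with the same name and labels fails
	// with prometheus.AlreadyRegisteredError.
	g2 := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "example_bucket_space_bytes",
		Help: "Illustrative metric only.",
	})
	var already prometheus.AlreadyRegisteredError
	if err := reg.Register(g2); errors.As(err, &already) {
		fmt.Println("duplicate register rejected:", err)
	}
}
```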