Scrape errors for the /objectstore metrics path (also affects /array for the older versions) #108

Open
B0go opened this issue Apr 16, 2025 · 7 comments · May be fixed by #111

B0go commented Apr 16, 2025

What

The exporter returned HTTP 500 with error messages on the /array or /objectstore metrics path (depending on the version in use) for precisely 24 hours:

An error has occurred while serving metrics:

105 error(s) occurred:
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"others_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"read_buckets_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"read_objects_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"write_buckets_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"write_objects_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_other_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_read_bucket_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_read_object_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_write_bucket_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
[...]

How

We were running version 1.0.12 at the time, so the problem first showed up on the /array metrics path, affecting all the metrics relevant to array health. It started around April 14 at 12:50 p.m. UTC with no changes on our side:

[Screenshot: scrape errors starting around April 14, 12:50 p.m. UTC]

As part of different attempts to fix the issue, we introduced some changes:

  • Upgraded the exporter to 1.1.3 and changed the configuration to use the new metric paths
  • Increased the scrape interval to 45s for all metric paths except for /array
  • Restarted the exporter multiple times
  • Changed the configuration to connect to only one FB cluster per exporter (used to be two)

None of those changes had an immediate effect.

After exactly 24 hours, the problem faded away:

[Screenshot: scrape errors stopping after exactly 24 hours]

The logs didn't show anything useful either:

● pure-exporter.service - Pure Exporter
     Loaded: loaded (/etc/systemd/system/pure-exporter.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2025-04-09 20:38:50 UTC; 4 days ago
   Main PID: 925921 (pure-exporter)
      Tasks: 36 (limit: 120666)
     Memory: 27.1M
        CPU: 1h 54min 22.323s
     CGroup: /system.slice/pure-exporter.service
             └─925921 /opt/pure_exporter/1.0.12/pure-exporter --tokens=/opt/pure_exporter/tokens.yml

Apr 09 20:38:50 host-1 systemd[1]: Started Pure Exporter.
Apr 09 20:38:50 host-1 pure-exporter[925921]: 2025/04/09 20:38:50 Start Pure FlashBlade exporter development on 0.0.0.0:9491

I cloned the project and explored the code but could not find an obvious cause; it feels like the exporter is trying to collect the same metric more than once into the Prometheus registry 🤔
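
For what it's worth, here is a minimal, self-contained sketch (not the exporter's actual collector; the metric name is borrowed from the error output above) showing how the Prometheus Go client produces exactly this failure when a collector emits the same name and label values twice within a single scrape:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

type bucketsCollector struct {
	desc *prometheus.Desc
}

func (c *bucketsCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *bucketsCollector) Collect(ch chan<- prometheus.Metric) {
	// Hypothetical: if the bucket list yields the same bucket twice during one
	// scrape, both samples share identical label values and the registry
	// rejects the whole scrape with HTTP 500.
	for _, bucket := range []string{"bucket1", "bucket1"} {
		ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 0,
			"others_per_sec", bucket) // label values: dimension, name
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&bucketsCollector{
		desc: prometheus.NewDesc(
			"purefb_buckets_s3_specific_performance_throughput_iops",
			"Sketch only: reproduces the duplicate-collection error.",
			[]string{"dimension", "name"}, nil,
		),
	})
	// Scraping /metrics returns:
	//   500 - collected metric ... was collected before with the same name and label values
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9491", nil))
}
```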

@chrroberts-pure (Collaborator)

I bet it was hitting an issue when a bucket was set to eradicate (destroyed and pending eradication). The delay is 24 hours by default.

I'll experiment with this. Thanks for the bug report!

@chrroberts-pure (Collaborator)

Also, if you'd like to have a chat or Zoom, please email us at [email protected] and we can schedule a time. Thank you!

B0go (Author) commented Apr 17, 2025

> I bet it was hitting an issue when a bucket was set to eradicate (destroyed and pending eradication). The delay is 24 hours by default.
>
> I'll experiment with this. Thanks for the bug report!

We have been creating and deleting buckets recently as part of load tests and other similar work in new clusters, so it might indeed be related.

@chrroberts-pure (Collaborator)

This is 100% related.

In https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets_s3_perf.go#L17 we get the list of buckets and then query their performance five at a time.

The problem is that the API lists destroyed buckets in /buckets, but they do not appear in /buckets/s3-specific-performance.

This can be solved by adding the filter ?destroyed=false to the GET /buckets request in https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets.go#L47.
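
For illustration, a rough sketch of what that change could look like, assuming the exporter's go-resty client (the package, function, and parameter names below are placeholders, not the repository's exact identifiers):

```go
package restclient

import (
	"fmt"

	"github.com/go-resty/resty/v2"
)

// getLiveBuckets lists only non-destroyed buckets, so every name later passed
// to /buckets/s3-specific-performance is guaranteed to exist there as well.
func getLiveBuckets(client *resty.Client, apiBase string, out interface{}) error {
	resp, err := client.R().
		SetQueryParam("destroyed", "false"). // exclude buckets pending eradication
		SetResult(out).
		Get(apiBase + "/buckets")
	if err != nil {
		return err
	}
	if resp.IsError() {
		return fmt.Errorf("GET /buckets returned %s", resp.Status())
	}
	return nil
}
```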

chrroberts-pure self-assigned this Apr 17, 2025
@chrroberts-pure (Collaborator)

I'll have a bit more of a think about whether that's how I want to handle this, since destroyed buckets still take up space (and that space still shows up in the /array metrics), but it is the simplest solution.

ofoing commented Apr 18, 2025

> This is 100% related.
>
> In https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets_s3_perf.go#L17 we get the list of buckets and then query their performance five at a time.
>
> The problem is that the API lists destroyed buckets in /buckets, but they do not appear in /buckets/s3-specific-performance.
>
> This can be solved by adding the filter ?destroyed=false to the GET /buckets request in https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets.go#L47.

Exactly @chrroberts-pure, I had the same issue and added this at line 51:

SetQueryParam("destroyed", "false").

chrroberts-pure (Collaborator) commented Apr 22, 2025

Thank you both, @ofoing and @B0go. I've opened #111, which will resolve this issue. The fix will be released in the next version.
