Scrape errors for the /objectstore metrics path (also affects /array for the older versions) #108

Open
B0go opened this issue Apr 16, 2025 · 7 comments · May be fixed by #111

B0go commented Apr 16, 2025

What

The exporter returned HTTP 500 with error messages on the /array or /objectstore metrics path (depending on the version in use) for precisely 24 hours:

An error has occurred while serving metrics:

105 error(s) occurred:
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"others_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"read_buckets_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"read_objects_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"write_buckets_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_throughput_iops" { label:{name:"dimension" value:"write_objects_per_sec"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_other_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_read_bucket_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_read_object_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
* collected metric "purefb_buckets_s3_specific_performance_latency_usec" { label:{name:"dimension" value:"usec_per_write_bucket_op"} label:{name:"name" value:"bucket1"} gauge:{value:0}} was collected before with the same name and label values
[...]

How

We were running version 1.0.12 at the time, so the problem first showed up on the /array metrics path, affecting all the metrics relevant to array health. It started around April 14 at 12:50 p.m. UTC with no changes on our side:

[Screenshot: scrape errors starting around April 14, 12:50 p.m. UTC]

As part of different attempts to fix the issue, we introduced some changes:

  • Upgraded the exporter to 1.1.3 and changed the configuration to use the new metric paths
  • Increased the scrape interval to 45s for all metric paths except for /array
  • Restarted the exporter multiple times
  • Changed the configuration to connect to only one FB cluster per exporter (used to be two)

None of those changes had an immediate effect.

After exactly 24 hours, the problem faded away:

[Screenshot: scrape errors stopping after exactly 24 hours]

The logs didn't show anything useful either:

● pure-exporter.service - Pure Exporter
     Loaded: loaded (/etc/systemd/system/pure-exporter.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2025-04-09 20:38:50 UTC; 4 days ago
   Main PID: 925921 (pure-exporter)
      Tasks: 36 (limit: 120666)
     Memory: 27.1M
        CPU: 1h 54min 22.323s
     CGroup: /system.slice/pure-exporter.service
             └─925921 /opt/pure_exporter/1.0.12/pure-exporter --tokens=/opt/pure_exporter/tokens.yml

Apr 09 20:38:50 host-1 systemd[1]: Started Pure Exporter.
Apr 09 20:38:50 host-1 pure-exporter[925921]: 2025/04/09 20:38:50 Start Pure FlashBlade exporter development on 0.0.0.0:9491

I cloned the project and explored the code but could not find an obvious cause; it feels like the exporter is trying to collect the same metric more than once into the Prometheus registry 🤔
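
For what it's worth, here is a minimal, self-contained sketch (not the exporter's actual collector; the metric name is borrowed from the error output above) showing how the Prometheus Go client produces exactly this failure when a collector emits the same name and label values twice within a single scrape:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

type bucketsCollector struct {
	desc *prometheus.Desc
}

func (c *bucketsCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *bucketsCollector) Collect(ch chan<- prometheus.Metric) {
	// Hypothetical: if the bucket list yields the same bucket twice during one
	// scrape, both samples share identical label values and the registry
	// rejects the whole scrape with HTTP 500.
	for _, bucket := range []string{"bucket1", "bucket1"} {
		ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 0,
			"others_per_sec", bucket) // label values: dimension, name
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&bucketsCollector{
		desc: prometheus.NewDesc(
			"purefb_buckets_s3_specific_performance_throughput_iops",
			"Sketch only: reproduces the duplicate-collection error.",
			[]string{"dimension", "name"}, nil,
		),
	})
	// Scraping /metrics returns:
	//   500 - collected metric ... was collected before with the same name and label values
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9491", nil))
}
```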

@chrroberts-pure (Collaborator)

I bet it was hitting an issue when a bucket was set to eradicate (destroyed and pending eradication). The delay is 24 hours by default.

I'll experiment with this. Thanks for the bug report!

@chrroberts-pure (Collaborator)

Also, if you'd like to have a chat or Zoom, please email us at [email protected] and we can schedule a time. Thank you!

B0go (Author) commented Apr 17, 2025

> I bet it was hitting an issue when a bucket was set to eradicate (destroyed and pending eradication). The delay is 24 hours by default.
>
> I'll experiment with this. Thanks for the bug report!

We have been creating and deleting buckets recently as part of load tests and other similar work in new clusters, so it might indeed be related.

@chrroberts-pure (Collaborator)

This is 100% related.

In https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets_s3_perf.go#L17 we get the list of buckets and then query their performance five at a time.

The problem is that the API lists destroyed buckets in /buckets, but they do not appear in /buckets/s3-specific-performance.

This can be solved by adding the filter ?destroyed=false to the GET /buckets request in https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets.go#L47.
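
For illustration, a rough sketch of what that change could look like, assuming the exporter's go-resty client (the package, function, and parameter names below are placeholders, not the repository's exact identifiers):

```go
package restclient

import (
	"fmt"

	"github.com/go-resty/resty/v2"
)

// getLiveBuckets lists only non-destroyed buckets, so every name later passed
// to /buckets/s3-specific-performance is guaranteed to exist there as well.
func getLiveBuckets(client *resty.Client, apiBase string, out interface{}) error {
	resp, err := client.R().
		SetQueryParam("destroyed", "false"). // exclude buckets pending eradication
		SetResult(out).
		Get(apiBase + "/buckets")
	if err != nil {
		return err
	}
	if resp.IsError() {
		return fmt.Errorf("GET /buckets returned %s", resp.Status())
	}
	return nil
}
```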

chrroberts-pure self-assigned this Apr 17, 2025
@chrroberts-pure (Collaborator)

I'll have a bit more of a think about whether that's how I want to handle this, since destroyed buckets still take up space (and that space still shows up in the /array metrics), but it is the simplest solution.

ofoing commented Apr 18, 2025

> This is 100% related.
>
> In https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets_s3_perf.go#L17 we get the list of buckets and then query their performance five at a time.
>
> The problem is that the API lists destroyed buckets in /buckets, but they do not appear in /buckets/s3-specific-performance.
>
> This can be solved by adding the filter ?destroyed=false to the GET /buckets request in https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/blob/main/internal/rest-client/buckets.go#L47.

Exactly @chrroberts-pure, I had the same issue and added this at line 51:

SetQueryParam("destroyed", "false").

chrroberts-pure (Collaborator) commented Apr 22, 2025

Thank you both, @ofoing and @B0go. I've opened #111, which will resolve this issue. The fix will be released in the next version.
