openshift-usage-scripts contains 2 scripts:
openshift-metrics/openshift-prometheus_metrics.py
that collects metrics from prometheus and writes those to a json file.openshift-metrics/merge.py
takes the collected metrics and produces CSV OpenShift usage reports, one by namespace and other by pod.
This was a fork of https://github.com/OCP-on-NERC/xdmod-openshift-scripts
In order to run the scripts, you need to set the following environment variables:
OPENSHIFT_TOKEN
foropenshift_prometheus_metrics.py
when collecting metrics from thanos/prometheus.
We are using the token for xdmod-reader
service account in xdmod-reader
namespace on nerc-prod cluster. You can extract the token with:
oc get secrets -n xdmod-reader --as system:admin xdmod-reader-token-m6s2m -o yaml | yq .data.token -r |base64 -d
.
When running the scripts, there are two methods of specifying the OpenShift Prometheus endpoint. The first is through an environment variable:
$ export OPENSHIFT_PROMETHEUS_URL=<prometheus url>
$ python -m openshift-metrics.openshift_prometheus_metrics.py
The second is directly on the command line:
$ python -m openshift-metrics.openshift_prometheus_metrics --openshift-url <prometheus url>
By default the script will pull data from the day before.
$ python -m openshift_metrics.openshift_prometheus_metrics \
--openshift-url https://thanos-querier-openshift-monitoring.apps.shift.nerc.mghpcc.org \
You can specify a data range to collect metrics for a time period like this:
$ python -m openshift_metrics.openshift_prometheus_metrics \
--openshift-url https://thanos-querier-openshift-monitoring.apps.shift.nerc.mghpcc.org \
--report-start-date 2022-03-01 \
--report-end-date 2022-03-07 \
This will collect metrics from March 1st to March 7th, inclusive.
You can generate the openshift usage report by passing it multiple metrics files
$ python -m openshift_metrics.merge \
metrics-2024-01-01-to-2024-01-07.json \
metrics-2024-01-08-to-2024-01-14.json \
metrics-2024-01-15-to-2024-01-16.json
This will merge the metrics and produce the openshift usage report of the period January 1st to January 16.
Output file name can be specified with the --output-file
flags. You can also pass in a bunch of files like this:
$ python -m openshift_metrics.merge data_2024_01/*.json
The openshift_prometheus_metrics.py
retrieves metrics at a pod level. It does so with the
following Prometheus query:
<prometheus_url>/api/v1/query_range?query=<metric>&start=<report_date>T00:00:00Z&end=<report_date>T23:59:59Z&step=<step_min>m
This query generates samples for "step_min" minutes. The script will then merge consecutive samples together if their metrics are the same.
The script queries the following metrics:
- kube_pod_resource_request{unit="cores"} unless on(pod, namespace) kube_pod_status_unschedulable
- Cores requested by a pods that are scheduled to run.
- 'kube_pod_resource_request{unit="bytes"} unless on(pod, namespace) kube_pod_status_unschedulable'
- Memory (RAM) requested by pods that are scheduled to run.
- 'kube_pod_resource_request{resource=~".gpu."} unless on(pod, namespace) kube_pod_status_unschedulable'
- GPU Requested by pods that are sheculed to run. The requested GPU resource must have the word "gpu" in it
to be captured by this query. E.g.
nvidia.com/gpu
- GPU Requested by pods that are sheculed to run. The requested GPU resource must have the word "gpu" in it
to be captured by this query. E.g.