Restrict cluster Prometheus rules to metrics of the corresponding cluster #555
Conversation
Signed-off-by: fpfuetsch <[email protected]>
@fpfuetsch I have an even better solution. I rewrote almost all the alerts so that they can be deployed by the operator chart, with the rules rewritten to match against the cluster name. Here is an example with a cluster label, where the cluster name is taken from the alert's labels:

```yaml
labels:
  severity: warning
  {{ .Values.monitoring.prometheusRule.clusterLabel }}: $labels.{{ .Values.monitoring.prometheusRule.clusterLabel }}
  namespace: {{ "{{ $labels.namespace }}" }}
  cnpg_cluster: {{ "{{ $labels.created_by_name }}" }}
```

I already have these. I just need to test them.
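For context, a complete rule of that shape might look roughly like the following in the operator chart. This is a minimal sketch, not the actual rewritten rules: the alert name and threshold are assumptions, while cnpg_pg_replication_lag is one of the metrics exposed by CloudNativePG's default monitoring queries.

```yaml
# Sketch only: alert name and threshold are illustrative assumptions.
# The inner {{ ... }} of Prometheus templating is escaped so Helm does
# not try to render it, matching the style of the comment above.
- alert: CNPGReplicationLagHigh
  expr: cnpg_pg_replication_lag > 300
  for: 5m
  labels:
    severity: warning
    namespace: {{ "{{ $labels.namespace }}" }}
    cnpg_cluster: {{ "{{ $labels.created_by_name }}" }}
```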
And rewrite them in the context of the operator chart.
@itay-grudev I have two questions regarding your approach:
I personally like the idea that alert rules for the clusters are part of the cluster deployment, while alert rules for the operator are part of the operator deployment.
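As a sketch of that split, a rule shipped by the cluster chart can scope its expression to the cluster's own pods, which is what this PR is about; the `cluster.fullname` helper below is an assumption about the chart's naming conventions, and CNPG instance pods are assumed to be named `<cluster-name>-<ordinal>`.

```yaml
# Minimal sketch of a cluster-scoped rule: the selector only matches pods
# belonging to this release's cluster. Helper name and threshold are assumed.
- alert: CNPGClusterHighReplicationLag
  expr: |
    max(cnpg_pg_replication_lag{namespace="{{ .Release.Namespace }}", pod=~"{{ include "cluster.fullname" . }}-[0-9]+"}) > 300
  for: 5m
  labels:
    severity: warning
```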
No. You'll have to implement the ignore logic in your alert routing or escalation software, like Grafana OnCall.
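For illustration, such an opt-out could be a single Alertmanager route; the cluster name and the use of a null receiver here are assumptions, not part of the proposal.

```yaml
# Hypothetical Alertmanager fragment: silence everything from one cluster
# by routing it to a receiver with no notification integrations.
route:
  routes:
    - matchers:
        - cnpg_cluster = "staging-db"  # assumed cluster name
      receiver: "null"
receivers:
  - name: "null"
```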
So you prefer opt-out (ignoring, silencing) instead of opt-in for alerts?
I am not sure, but it would make sense to have a set of alerts shipped with the operator. The only reason I implemented them at the cluster level is that I never figured out how to implement a Cluster Offline Alert without knowing that the cluster exists.
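To make the difficulty concrete: an offline alert has to fire on the absence of a series, so the rule must name the cluster it expects. A minimal sketch, assuming a `cnpg_cluster` label is attached at scrape time (e.g. via PodMonitor relabeling):

```yaml
# Hypothetical per-cluster offline alert: absent() needs the concrete cluster
# name, which the operator chart cannot know for clusters created later.
- alert: CNPGClusterOffline
  expr: absent(cnpg_collector_up{cnpg_cluster="my-cluster"})
  for: 5m
  labels:
    severity: critical
```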
@itay-grudev What do you think about merging this change until we find a good way to translate the alerts for the operator chart?
Closes #554