Cluster autoscaler is not able to scale down Rancher managed cluster #7981

Open
dirkdaems opened this issue Mar 26, 2025 · 3 comments
Labels
area/cluster-autoscaler area/provider/rancher kind/bug Categorizes issue or PR as related to a bug.

Comments


dirkdaems commented Mar 26, 2025

Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
Component version: v1.29.0

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.29.4+rke2r1

What environment is this in?:
Rancher managed Kubernetes cluster on an OpenStack based cloud at CloudFerro.

What did you expect to happen?:
After the workload stops, the autoscaler should scale the unneeded worker nodes back down.

What happened instead?:
The autoscaler was not able to scale down the worker nodes.

How to reproduce it (as minimally and precisely as possible):

  • Deploy a Rancher managed Kubernetes cluster on an OpenStack based cloud.
  • Start a workload that exceeds the quota or exhausts the OpenStack resources.
  • Stop the workload.
  • The autoscaler is no longer able to scale the cluster down.

Anything else we need to know?:

When the issue occurs, log entries like the following appear in the autoscaler pod logs:
I0326 12:39:22.919386 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-worker-eo2a-xlarge-c29f50b3-s74qs.novalocal", UID:"c23b0b6d-4f0c-484d-a5c5-c7c896c08be1", APIVersion:"v1", ResourceVersion:"101114911", FieldPath:""}): type: 'Warning' reason: 'ScaleDownFailed' failed to delete empty node: failed to delete nodes from group worker-eo2a-xlarge: could not find providerID in machine: k8s-worker-eo2a-xlarge-78b857bb76x5hgc6-gz6t8/fleet-default
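
The error says the autoscaler could not find a providerID on the machine object in the fleet-default namespace. A quick way to check this is to list the machine objects and their spec.providerID fields; this is a sketch, and the resource kind `machines.cluster.x-k8s.io` is an assumption based on the fleet-default namespace in the log (Cluster API machines as provisioned by Rancher):

```shell
# Against a live cluster one would run something like (assumed resource kind):
#   kubectl get machines.cluster.x-k8s.io -n fleet-default \
#     -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID
#
# Offline illustration of the same check on a sample manifest: a machine whose
# spec has no providerID is exactly what the autoscaler trips over.
python3 - <<'EOF'
machine = {"metadata": {"name": "k8s-worker-example"}, "spec": {}}  # hypothetical
provider_id = machine["spec"].get("providerID")
print("missing" if not provider_id else provider_id)  # prints: missing
EOF
```

A machine stuck with `<none>`/missing in the PROVIDERID column matches the failure in the log above.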

We previously logged #6778, which was closed because we thought the issue had been fixed by upgrading. We have now seen it again, so we are filing this ticket.

The autoscaler Grafana dashboard typically shows that the autoscaler is aware of the unneeded nodes, but scale-down keeps failing, most likely because of this providerID issue:

[Image: Grafana dashboard showing unneeded nodes while scale-down fails]

@dirkdaems dirkdaems added the kind/bug Categorizes issue or PR as related to a bug. label Mar 26, 2025
@Shubham82 (Contributor)

/area provider/rancher
/area cluster-autoscaler

@Shubham82 (Contributor)

cc @ctrox

@dirkdaems (Author)

When this happens, the corresponding Rancher machine Kubernetes resource cannot be deleted:
[Image: Rancher machine resource stuck and unable to be deleted]

To resolve the issue, the finalizers on the Rancher machine Kubernetes resource have to be removed manually.
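
The finalizer removal can be done with a merge patch; this is a sketch, and both the resource kind `machines.cluster.x-k8s.io` and the machine name are assumptions based on the error log (adjust to your cluster):

```shell
# On a live cluster the patch would look roughly like this:
#   kubectl patch machines.cluster.x-k8s.io <stuck-machine-name> \
#     -n fleet-default --type=merge -p '{"metadata":{"finalizers":[]}}'
#
# Offline preview of what that merge patch does to the object's metadata:
python3 - <<'EOF'
import json
machine = {"metadata": {"name": "example-machine",              # hypothetical
                        "finalizers": ["machine.cluster.x-k8s.io"]}}
# The merge patch replaces the finalizers list with an empty one, which lets
# the API server finish deleting the stuck object.
machine["metadata"]["finalizers"] = []
print(json.dumps(machine["metadata"]))
EOF
```

Note that clearing finalizers skips whatever cleanup those finalizers guard, so this is a workaround for the stuck state, not a fix for the underlying providerID bug.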
