
Error on Triggers during upgrade: statefulsets.apps "kafka-broker-dispatcher" not found #8535


Open
alanrichman opened this issue Mar 19, 2025 · 8 comments
Labels: kind/bug


@alanrichman

I am in the process of updating a Knative Operator installation to the latest version and am running into an issue with Triggers on the hop from 1.13.3 to 1.14.X with the Helm chart.

Describe the bug
After successfully updating the Knative Eventing and Serving resources, the Triggers go into a state of Ready=false, Reason=Schedule. The failure message is:

failed to schedule consumers: statefulsets.apps "kafka-broker-dispatcher" not found

This upgrade has also prompted the creation of new ConsumerGroups. The original ones are named with a GUID and contain a reference to their source in the spec; the new ones are named knative-trigger-<name-of-trigger> and are also Ready=false, Reason=Schedule. The failure message is also:

failed to schedule consumers: statefulsets.apps "kafka-broker-dispatcher" not found

It is correct that there is no StatefulSet by that name; there is, however, a Deployment by that name in the knative-eventing namespace.
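
For reference, a quick way to confirm this (assuming everything lives in the knative-eventing namespace):

$ kubectl get statefulset kafka-broker-dispatcher -n knative-eventing
Error from server (NotFound): statefulsets.apps "kafka-broker-dispatcher" not found

$ kubectl get deployment kafka-broker-dispatcher -n knative-eventing
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kafka-broker-dispatcher   1/1     1            1           309d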

Expected behavior

Full reconciliation, as happened for the upgrade to 1.13.3.

I am also seeing the same error in the logs of the kafka-controller deployment for the KnativeEventing resource:

{
    "level": "info",
    "ts": "2025-03-19T13:17:47.660Z",
    "logger": "kafka-broker-controller",
    "caller": "state/state.go:194",
    "msg": "failed to get statefulset",
    "commit": "a488e34-dirty",
    "knative.dev/pod": "kafka-controller-776c75785b-tzjnm",
    "error": "statefulsets.apps \"kafka-broker-dispatcher\" not found"
}
{
    "level": "error",
    "ts": "2025-03-19T13:17:47.660Z",
    "logger": "kafka-broker-controller",
    "caller": "consumergroup/reconciler.go:303",
    "msg": "Returned an error",
    "commit": "a488e34-dirty",
    "knative.dev/pod": "kafka-controller-776c75785b-tzjnm",
    "knative.dev/controller": "knative.dev.eventing-kafka-broker.control-plane.pkg.reconciler.consumergroup.Reconciler",
    "knative.dev/kind": "internal.kafka.eventing.knative.dev.ConsumerGroup",
    "knative.dev/traceid": "63b5340c-5619-4f15-bf8f-745307ba4bc0",
    "knative.dev/key": "default/knative-trigger-my-trigger",
    "targetMethod": "ReconcileKind",
    "error": "failed to schedule consumers: statefulsets.apps \"kafka-broker-dispatcher\" not found",
    "stacktrace": "knative.dev/eventing-kafka-broker/control-plane/pkg/client/internals/kafka/injection/reconciler/eventing/v1alpha1/consumergroup.(*reconcilerImpl).Reconcile\n\tknative.dev/eventing-kafka-broker/control-plane/pkg/client/internals/kafka/injection/reconciler/eventing/v1alpha1/consumergroup/reconciler.go:303\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/[email protected]/controller/controller.go:542\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/[email protected]/controller/controller.go:491"
}
...
{
    "level": "info",
    "ts": "2025-03-19T13:24:05.959Z",
    "logger": "kafka-broker-controller",
    "caller": "statefulset/autoscaler.go:182",
    "msg": "error while refreshing scheduler state (will retry){error 26 0  statefulsets.apps \"kafka-broker-dispatcher\" not found}",
    "commit": "a488e34-dirty",
    "knative.dev/pod": "kafka-controller-776c75785b-tzjnm",
    "component": "autoscaler"
}
{
    "level": "error",
    "ts": "2025-03-19T13:24:05.959Z",
    "logger": "kafka-broker-controller",
    "caller": "statefulset/autoscaler.go:168",
    "msg": "Failed to autoscale",
    "commit": "a488e34-dirty",
    "knative.dev/pod": "kafka-controller-776c75785b-tzjnm",
    "error": "statefulsets.apps \"kafka-broker-dispatcher\" not found",
    "stacktrace": "knative.dev/eventing/pkg/scheduler/statefulset.(*autoscaler).syncAutoscale.func1\n\tknative.dev/[email protected]/pkg/scheduler/statefulset/autoscaler.go:168\nk8s.io/apimachinery/pkg/util/wait.Poll.ConditionFunc.WithContext.func1\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:109\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:154\nk8s.io/apimachinery/pkg/util/wait.waitForWithContext\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:207\nk8s.io/apimachinery/pkg/util/wait.poll\n\tk8s.io/[email protected]/pkg/util/wait/poll.go:260\nk8s.io/apimachinery/pkg/util/wait.PollWithContext\n\tk8s.io/[email protected]/pkg/util/wait/poll.go:85\nk8s.io/apimachinery/pkg/util/wait.Poll\n\tk8s.io/[email protected]/pkg/util/wait/poll.go:66\nknative.dev/eventing/pkg/scheduler/statefulset.(*autoscaler).syncAutoscale\n\tknative.dev/[email protected]/pkg/scheduler/statefulset/autoscaler.go:165\nknative.dev/eventing/pkg/scheduler/statefulset.(*autoscaler).Start\n\tknative.dev/[email protected]/pkg/scheduler/statefulset/autoscaler.go:145\nknative.dev/eventing/pkg/scheduler/statefulset.New.func2\n\tknative.dev/[email protected]/pkg/scheduler/statefulset/scheduler.go:106"
}

My Triggers are pretty simple, as far as I know:

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: my-trigger
  annotations:
    kafka.eventing.knative.dev/delivery.order: ordered
  namespace: default
spec:
  broker: default-broker
  filter:
    attributes:
      brokersource: my
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: my-sink
      namespace: default

I reviewed the release notes for every version of Knative 1.14.X and did not see any deprecations or breaking changes, but please let me know if I am overlooking something.

To Reproduce

  1. Be on Knative 1.13
  2. Have sources, sinks, triggers, etc.
  3. Upgrade to 1.14+

Knative release version
  • Operator: 1.14.9
  • Eventing: 1.14.6
  • Serving: 1.14.1
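
For reference, this is roughly how I confirmed the versions; the .status.version field matches what the operator reports for KnativeEventing, and I am assuming the analogous field and the knative-serving namespace for Serving:

$ kubectl get knativeeventing -n knative-eventing -o jsonpath='{.items[0].status.version}'
$ kubectl get knativeserving -n knative-serving -o jsonpath='{.items[0].status.version}'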


alanrichman added the kind/bug label on Mar 19, 2025
@matzew
Member

matzew commented Mar 20, 2025

The Knative Eventing Broker for Apache Kafka in 1.13 uses a Deployment for the dispatcher:
https://github.com/knative-extensions/eventing-kafka-broker/blob/release-1.13/data-plane/config/broker/500-dispatcher.yaml

While on 1.14 it is a StatefulSet:
https://github.com/knative-extensions/eventing-kafka-broker/blob/release-1.14/data-plane/config/broker/500-dispatcher.yaml

Release Notes:
https://github.com/knative-extensions/eventing-kafka-broker/releases/tag/knative-v1.14.0

The kafka-broker-dispatcher StatefulSet now scales with KEDA if you enable the controller-autoscaler-keda feature flag (knative-extensions/eventing-kafka-broker#3813, @Cali0707)
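
A quick way to see the change, assuming the raw manifest paths mirror the blob URLs above:

$ curl -s https://raw.githubusercontent.com/knative-extensions/eventing-kafka-broker/release-1.13/data-plane/config/broker/500-dispatcher.yaml | grep '^kind:'
kind: Deployment

$ curl -s https://raw.githubusercontent.com/knative-extensions/eventing-kafka-broker/release-1.14/data-plane/config/broker/500-dispatcher.yaml | grep '^kind:'
kind: StatefulSet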

@alanrichman
Author

alanrichman commented Mar 20, 2025

@matzew Thanks! That explains why it's looking for a StatefulSet. Do you know what might be causing it not to be created as one? The dispatcher still exists, and gets created, as a Deployment.

Is there any config setting or other action I need to take to trigger this change?

@matzew
Member

matzew commented Mar 20, 2025

Are you on Eventing Kafka Broker 1.14?

@alanrichman
Author

I have my KnativeEventing resource declared like this:

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  version: 1.14.0

And it shows as ready:

$ kubectl get knativeeventing -n knative-eventing
NAME               VERSION   READY   REASON
knative-eventing   1.14.0    True

These are the deployments in its namespace:

$ kubectl get deployments -n knative-eventing
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
eventing-controller       1/1     1            1           39h
eventing-webhook          1/1     1            1           39h
imc-controller            1/1     1            1           39h
imc-dispatcher            1/1     1            1           39h
kafka-broker-dispatcher   1/1     1            1           309d
kafka-broker-receiver     1/1     1            1           317d
kafka-controller          1/1     1            1           39h
kafka-webhook-eventing    1/1     1            1           39h
mt-broker-controller      1/1     1            1           39h
mt-broker-filter          1/1     1            1           39h
mt-broker-ingress         1/1     1            1           39h
pingsource-mt-adapter     0/0     0            0           39h

And these are the StatefulSets:

$ kubectl get statefulsets -n knative-eventing
NAME                      READY   AGE
kafka-source-dispatcher   1/1     39h

This is all right after upgrading the KnativeEventing to 1.14.0 from 1.13.8. There are no pod cycles on the dispatcher or receiver deployments:

$ kubectl get pods -n knative-eventing
NAME                                                           READY   STATUS      RESTARTS   AGE
eventing-controller-d5b7d858f-rmwvc                            1/1     Running     0          5m58s
eventing-webhook-5d56685f75-n6ls8                              1/1     Running     0          5m57s
imc-controller-78d684955f-7cxmt                                1/1     Running     0          5m54s
imc-dispatcher-f9c6f466d-z587l                                 1/1     Running     0          5m54s
kafka-broker-dispatcher-f4bc9d445-s8c8t                        1/1     Running     0          4d10h
kafka-broker-receiver-55bff4788d-rxxgn                         1/1     Running     0          45h
kafka-controller-776c75785b-48xpl                              1/1     Running     0          5m48s
kafka-controller-post-install-eventing-1.14.0-6ql9h            1/1     Running     0          5m46s
kafka-source-dispatcher-0                                      1/1     Running     0          5m42s
kafka-webhook-eventing-7d6564fd44-ns8l6                        1/1     Running     0          5m47s
knative-kafka-storage-version-migrator-eventing-1.14.0-9stth   0/1     Completed   0          5m46s
mt-broker-controller-7868db58cc-6wtkh                          1/1     Running     0          5m52s
mt-broker-filter-5bdb4b8d48-b6scz                              1/1     Running     0          5m53s
mt-broker-ingress-58c7cd87c7-6pp2b                             1/1     Running     0          5m52s
storage-version-migration-eventing-eventing-1.14.0-wlvd2       0/1     Completed   0          5m51s

I get identical results after a full removal and reinstallation of the Knative Operator (1.14.9) using the same versions listed here.

@alanrichman
Author

@matzew Some other things I have tried since, to no avail, in case it's meaningful:

  • Deleted and recreated the KnativeEventing resource
  • Selectively terminated the deployments created for Knative Eventing and observed their recreation
  • Upgraded the Kafka installed in the cluster and retried

I have another cluster needing the same upgrades where I retried everything and got the same results.

@matzew
Member

matzew commented Mar 26, 2025

So your dispatcher is still a Deployment, not a StatefulSet.

Can you give details on the broker-dispatcher Deployment? Can you make sure it is on 1.14?

I am not sure why the upgrade did not work. BTW, did you install the Knative Kafka bits manually?
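
Something like this should show the image and labels of the dispatcher (assuming its container is the first one in the pod template):

$ kubectl get deployment kafka-broker-dispatcher -n knative-eventing -o jsonpath='{.spec.template.spec.containers[0].image}'
$ kubectl get deployment kafka-broker-dispatcher -n knative-eventing --show-labels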

@alanrichman
Author

alanrichman commented Mar 30, 2025

@matzew Sorry for taking a while to get back to you on this; I appreciate your help!

I am keeping this particular KnativeEventing on 1.13.8 while I am not working on it. I re-attempted the upgrade to 1.14.6 and my kafka-broker-dispatcher deployment is running this image: gcr.io/knative-releases/knative-kafka-broker-dispatcher@sha256:b7eb182d1970c19646b2ebf15868934a94a90a262d1f1075d55c3bfb3ef83699

This is the same image it was running while on 1.13.8, and I can see that it is the same image it was running while we were on 1.12.0, since the deployment is quite old and there is only one ReplicaSet for it, which is the same age as the deployment. My kafka-broker-receiver deployment is the same age and also does not have any new ReplicaSets.
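
For reference, this is roughly how I checked for new ReplicaSets; just a sketch, nothing fancy:

$ kubectl get replicasets -n knative-eventing | grep kafka-broker
$ kubectl rollout history deployment/kafka-broker-dispatcher -n knative-eventing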

The StatefulSet I do have, kafka-source-dispatcher, does roll to a new revision when upgrading the KnativeEventing resource to 1.14.6.

Given all of this, I tried deleting those two possibly-stale deployments and retrying the upgrade, which did not work. I then deleted the KnativeEventing resource and recreated it on version 1.14.6 to start with. It created these deployments:

$ kubectl get deployments -n knative-eventing
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
eventing-controller      1/1     1            1           3m32s
eventing-webhook         1/1     1            1           3m32s
imc-controller           1/1     1            1           3m29s
imc-dispatcher           1/1     1            1           3m28s
kafka-controller         1/1     1            1           3m22s
kafka-webhook-eventing   1/1     1            1           3m21s
mt-broker-controller     1/1     1            1           3m26s
mt-broker-filter         1/1     1            1           3m27s
mt-broker-ingress        1/1     1            1           3m27s
pingsource-mt-adapter    0/0     0            0           3m32s

And this StatefulSet:

 $ kubectl get statefulsets -n knative-eventing
NAME                      READY   AGE
kafka-source-dispatcher   1/1     3m41s

It seems weird to me that even a fresh KnativeEventing resource on 1.14.6 does not create the StatefulSet yet still reports a Ready status. Does that tell us anything of note? The KnativeEventing has this status:

status:
  conditions:
  - lastTransitionTime: "2025-03-30T21:46:30Z"
    status: "True"
    type: DependenciesInstalled
  - lastTransitionTime: "2025-03-30T21:46:31Z"
    status: "True"
    type: DeploymentsAvailable
  - lastTransitionTime: "2025-03-30T21:46:30Z"
    status: "True"
    type: InstallSucceeded
  - lastTransitionTime: "2025-03-30T21:46:31Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2025-03-30T21:45:57Z"
    status: "True"
    type: VersionMigrationEligible
  manifests:
  - /var/run/ko/knative-eventing/1.14.6
  - /var/run/ko/eventing-source/1.14/kafka
  observedGeneration: 1
  version: 1.14.6

But the triggers are still failing with this error:

status:
  annotations:
    group.id: knative-trigger-my-trigger
  conditions:
  - lastTransitionTime: "2025-03-30T21:43:34Z"
    message: Did you install the data plane for this component?
    reason: Data plane not available
    status: "False"
    type: BrokerReady
  - lastTransitionTime: "2025-03-18T23:51:24Z"
    status: "True"
    type: DeadLetterSinkResolved
  - lastTransitionTime: "2025-03-30T21:40:23Z"
    message: 'failed to schedule consumers: statefulsets.apps "kafka-broker-dispatcher"
      not found'
    reason: Schedule
    status: "False"
    type: DependencyReady
  - lastTransitionTime: "2025-03-30T21:46:00Z"
    reason: authentication-oidc feature disabled
    status: "True"
    type: OIDCIdentityCreated
  - lastTransitionTime: "2025-03-30T21:43:34Z"
    message: Did you install the data plane for this component?
    reason: Data plane not available
    status: "False"
    type: Ready
  - lastTransitionTime: "2025-03-30T21:40:23Z"
    status: "True"
    type: SubscriberResolved
  - lastTransitionTime: "2025-03-18T23:51:24Z"
    status: "True"
    type: SubscriptionReady
  observedGeneration: 1

As far as other Kafka components go, we have Kafka deployed in this cluster using Strimzi v0.45.0 and Kafka v3.9.0.

@alanrichman
Author

I noticed that the kafka-broker-dispatcher Deployment and the kafka-source-dispatcher StatefulSet are both using the gcr.io/knative-releases/knative-kafka-broker-dispatcher image. Is the correct state for Eventing 1.14.6 to have both as StatefulSets, or to have only the kafka-broker-dispatcher StatefulSet? Is it possible that my StatefulSet is getting the wrong name for some reason?

I ask because I have read a bunch of other issues in this repo where the poster shows that they only have a kafka-broker-dispatcher StatefulSet, e.g. knative-extensions/eventing-kafka-broker#3995 (comment).

That being said, I do not know how the StatefulSet would get the "wrong" name; I reviewed all of the ConfigMaps in the knative-eventing namespace and did not find any references that would drive that. Just a thought.
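
For reference, this is roughly how I searched the ConfigMaps; just a crude grep, not exhaustive:

$ kubectl get configmaps -n knative-eventing -o yaml | grep -i dispatcher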
