Fluentd Crash #194
Fluentd crashed with log:
What version of Kubernetes is your cluster running?
The Kubernetes version is 1.13. Fluentd crashes sometimes, not always, and I can see the metadata is correctly parsed. I guess it crashes only when parsing fails?
… On 2 Oct 2019, at 23:32, Richard Megginson wrote:
What version of kubernetes is your cluster?
Could it be a bug in kubeclient-4.5.0? @jkohen have you seen this testing the latest plugin with the latest kubeclient?
Can you try reverting the kubeclient version to 4.4.x?
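For anyone wanting to try this, something like the following hypothetical Gemfile pin should do it (emillg installs gems in a Dockerfile, where the equivalent would be a gem install with an explicit version):

```ruby
# Hypothetical Gemfile sketch: keep the plugin but hold kubeclient at 4.4.x,
# below the suspect 4.5.0 release.
source 'https://rubygems.org'

gem 'fluent-plugin-kubernetes_metadata_filter'
gem 'kubeclient', '~> 4.4.0'
```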
Hmm - https://bugzilla.redhat.com/show_bug.cgi?id=1743865 - investigating more.
Possible workaround (fix?) - see https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c8. If you can't view those comments, let me know.
@richm We use managed Kubernetes from a cloud provider. I have switched fluent-plugin-kubernetes_metadata_filter to v2.2.0, and it seems to be working well.
@emillg are you using kubernetes_metadata_filter v2.2.0 with kubeclient 4.5.0?
@richm I haven't seen that issue. It seems to be connected with namespace watches, and I only modified the pod listing. @cben does this error look familiar? I could be missing something, but I don't see any relevant changes between releases v2.1.6 (May 15, 2019) and v2.4.0 of this plugin.
I don't either - but we only recently updated the plugin to use kubeclient 4.4; before that it was using a quite old version of kubeclient. It may be that @emillg did not use kubeclient 4.4, moving directly to 4.5, and this problem is present in 4.x. The broader problem is that the primary maintainers of this plugin are Red Hatters working on OpenShift logging, which does not use the watch functionality. We have no upstream (here) CI coverage of this feature, no operational experience using "plain" Kubernetes, and no operational experience using watches. We need someone to step up and support the watches feature, or we will simply not support it anymore (unless for some reason we start using it).
Let me see if we can take a look at the problem we're seeing here. @qingling128, who can take a look? @richm, we can talk about maintenance beyond this case offline; my -at-google.com email is easy to guess from my GitHub username.
One recent change is that kubeclient started allowing both 3.y and 4.y versions for its http gem dependency. I just woke up, so I might be talking nonsense 🙃 [P.S. I'll also advertise that I'd be happy to add more maintainers to kubeclient, if anyone is interested. I don't have enough time lately, and my Red Hatter perspective is slightly limiting too.]
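If the http dependency is the trigger, one way to test the theory (a sketch, assuming the relevant gemspecs allow this combination) is to hold http on the 3.y line while keeping kubeclient 4.5.0:

```ruby
# Hypothetical Gemfile sketch: kubeclient 4.5.0 with the http gem forced to
# the 3.y line. If the crashes stop, the http 4.y parser change is implicated.
source 'https://rubygems.org'

gem 'kubeclient', '4.5.0'
gem 'http', '~> 3.0'
```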
That bugzilla used http-3.3.0, though.
@richm This is the Dockerfile I use now. Upgrading fluent-plugin-kubernetes_metadata_filter to a higher version makes it crash.
@emillg thanks - can you also attach your kubernetes_metadata plugin config?
@emillg - Are you using any proxy / firewall / AWS load balancer as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c8?
In fact we are seeing this error as well now. It's reproducible when the Kubernetes master API is temporarily unavailable (e.g. cluster upgrade, cluster resizing):
The agent that did not crash had the following versions:
The agent that crashed had the following versions:
The cause might be that the parser return value changed in the http gem between 3.x and 4.x.
Might be related to httprb/http#556.
After some digging, it seems the problem is in the watch calls in fluent-plugin-kubernetes_metadata_filter/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb (line 49 at a87ffaf) and fluent-plugin-kubernetes_metadata_filter/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb (line 36 at 2271293): neither expects the watch connection to terminate. As suggested in ManageIQ/kubeclient#275 (comment), we need to restart the watch when it ends, resuming from the latest resourceVersion.
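For readers following along, here is a minimal Ruby sketch of that list-then-watch-then-restart loop against the kubeclient API; handle_notice and the backoff are illustrative, not the plugin's actual code:

```ruby
require 'kubeclient'

# Minimal sketch of "list, then watch from that resourceVersion, and restart
# whenever the watch terminates". Client construction is elided.
def watch_pods_forever(client)
  loop do
    # List first: a consistent snapshot plus the version to start watching from.
    pods = client.get_pods
    version = pods.resourceVersion
    begin
      # A watch is conceptually infinite, but the connection can close at any
      # time (e.g. during a master upgrade), so treat termination as normal.
      client.watch_pods(resource_version: version).each do |notice|
        handle_notice(notice) # hypothetical handler for ADDED/MODIFIED/DELETED
        version = notice.object.metadata.resourceVersion
      end
    rescue StandardError => e
      warn "watch terminated (#{e.class}: #{e.message}); re-listing and restarting"
    end
    sleep 1 # illustrative backoff before re-listing
  end
end
```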
@qingling128 Thoughts about needing to rework #29 to resolve this issue?
@jcantrill - Yeah, I have a pending PR for this already and am currently trying to test it. How does the approach in #194 (comment) sound to you? Would you like to review the PR when it's out?
Seems reasonable. I don't know if it's necessary to keep track of the version, from what I recall. You simply do something like a list, extract the resource version, and start watching from that point.
That makes sense. I'll change it to add retries and go with the list-then-watch approach. Synced with @jkohen offline as well to confirm this is a legit approach.
Always restarting by "list then watch from that point" should work at a basic level, though events can be missed in the window between the watch dying and the new list. If that's undesired, keeping track of resourceVersion can reduce that risk, at least in theory, letting you resume from exactly the point the previous watch stopped - iff k8s still remembers that version.
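Concretely, that fallback could look like the sketch below; the 410 check reflects how Kubernetes signals an expired resourceVersion, which kubeclient surfaces as Kubeclient::HttpError (it can also arrive as an ERROR notice mid-stream, omitted here for brevity):

```ruby
# Hedged sketch: resume from the last seen resourceVersion; if the API server
# no longer remembers it (HTTP 410 Gone), fall back to a fresh list.
begin
  client.watch_pods(resource_version: last_seen_version).each do |notice|
    last_seen_version = notice.object.metadata.resourceVersion
    handle_notice(notice) # hypothetical handler
  end
rescue Kubeclient::HttpError => e
  raise unless e.error_code == 410 # anything else is a different failure
  last_seen_version = client.get_pods.resourceVersion # re-list and resume
end
```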
As for how the connection closed: I'm inclined to say it doesn't matter. Watches are conceptually infinite, but all TCP connections get closed eventually, one way or another. Calling code has to deal with an "infinite" watch finishing, and I don't see any reason it cares how exactly it finished.