Fluentd Crash #194
Fluentd crashed with log:
What version of Kubernetes is your cluster running?
The Kubernetes version is 1.13. Fluentd crashes sometimes, not always, and I can see the metadata is correctly parsed. I guess it crashes only when parsing fails?
… On 2 Oct 2019, at 23:32, Richard Megginson wrote:
What version of kubernetes is your cluster?
Could it be a bug in kubeclient-4.5.0? @jkohen have you seen this testing the latest plugin with the latest kubeclient?
Can you try reverting the kubeclient version to 4.4.x?
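For anyone wanting to try this, something like the following hypothetical Gemfile pin should do it (emillg installs gems in a Dockerfile, where the equivalent would be a gem install with an explicit version):

```ruby
# Hypothetical Gemfile sketch: keep the plugin but hold kubeclient at 4.4.x,
# below the suspect 4.5.0 release.
source 'https://rubygems.org'

gem 'fluent-plugin-kubernetes_metadata_filter'
gem 'kubeclient', '~> 4.4.0'
```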
Hmm - https://bugzilla.redhat.com/show_bug.cgi?id=1743865 - investigating more.
Possible workaround (fix?) - see https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c8. If you can't view those comments, let me know.
@richm We use managed Kubernetes from a cloud provider. I have switched fluent-plugin-kubernetes_metadata_filter to v2.2.0, and it seems to be working well.
@emillg are you using kubernetes_metadata_filter v2.2.0 with kubeclient 4.5.0?
@richm I haven't seen that issue. It seems to be connected with namespace watches, and I only modified the pod listing. @cben does this error look familiar? I could be missing something, but I don't see any relevant changes between releases v2.1.6 (May 15, 2019) and v2.4.0 of this plugin.
I don't either - but we only recently updated the plugin to use kubeclient 4.4; before that it was using a quite old version of kubeclient. It may be that @emillg did not use kubeclient 4.4, moving directly to 4.5, and this problem is present in 4.x. The broader problem is that the primary maintainers of this plugin are Red Hatters working on OpenShift logging, which does not use the watch functionality. We have no upstream (here) CI coverage of this feature, no operational experience using "plain" Kubernetes, and no operational experience using watches. We need someone to step up and support the watches feature, or we will simply not support it anymore (unless for some reason we start using it).
Let me see if we can take a look at the problem we're seeing here. @qingling128, who can take a look? @richm, we can talk about maintenance beyond this case offline; my -at-google.com email is easy to guess from my GitHub username.
One recent change is that kubeclient started allowing both 3.y and 4.y versions for its http gem dependency. I just woke up, so I might be talking nonsense 🙃 [P.S. I'll also advertise that I'd be happy to add more maintainers to kubeclient, if anyone is interested. I don't have enough time lately, and my Red Hatter perspective is slightly limiting too.]
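If the http dependency is the trigger, one way to test the theory (a sketch, assuming the relevant gemspecs allow this combination) is to hold http on the 3.y line while keeping kubeclient 4.5.0:

```ruby
# Hypothetical Gemfile sketch: kubeclient 4.5.0 with the http gem forced to
# the 3.y line. If the crashes stop, the http 4.y parser change is implicated.
source 'https://rubygems.org'

gem 'kubeclient', '4.5.0'
gem 'http', '~> 3.0'
```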
That bugzilla used http-3.3.0, though.
@richm This is the Dockerfile I use now. Upgrading fluent-plugin-kubernetes_metadata_filter to a higher version makes it crash.
@emillg thanks - can you also attach your kubernetes_metadata plugin config?
@emillg - Are you using any proxy / firewall / AWS load balancer as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c8?
In fact we are seeing this error as well now. It's reproducible when the Kubernetes master API is temporarily unavailable (e.g. cluster upgrade, cluster resizing):
The agent that did not crash had the following versions:
The agent that crashed had the following versions:
The cause might be that the parser return value changed in the http gem between 3.x and 4.x.
Might be related to httprb/http#556.
After some digging, it seems the problem is in the watch calls in fluent-plugin-kubernetes_metadata_filter/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb (line 49 at a87ffaf) and fluent-plugin-kubernetes_metadata_filter/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb (line 36 at 2271293): neither expects the watch connection to terminate. As suggested in ManageIQ/kubeclient#275 (comment), we need to restart the watch when it ends, resuming from the latest resourceVersion.
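For readers following along, here is a minimal Ruby sketch of that list-then-watch-then-restart loop against the kubeclient API; handle_notice and the backoff are illustrative, not the plugin's actual code:

```ruby
require 'kubeclient'

# Minimal sketch of "list, then watch from that resourceVersion, and restart
# whenever the watch terminates". Client construction is elided.
def watch_pods_forever(client)
  loop do
    # List first: a consistent snapshot plus the version to start watching from.
    pods = client.get_pods
    version = pods.resourceVersion
    begin
      # A watch is conceptually infinite, but the connection can close at any
      # time (e.g. during a master upgrade), so treat termination as normal.
      client.watch_pods(resource_version: version).each do |notice|
        handle_notice(notice) # hypothetical handler for ADDED/MODIFIED/DELETED
        version = notice.object.metadata.resourceVersion
      end
    rescue StandardError => e
      warn "watch terminated (#{e.class}: #{e.message}); re-listing and restarting"
    end
    sleep 1 # illustrative backoff before re-listing
  end
end
```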
@qingling128 Thoughts about needing to rework #29 to resolve this issue?
@jcantrill - Yeah, I have a pending PR for this already and am currently trying to test it. How does the approach in #194 (comment) sound to you? Would you like to review the PR when it's out?
Seems reasonable. I don't know if it's necessary to keep track of the version, from what I recall. You simply do something like a list, extract the resource version, and start watching from that point.
That makes sense. I'll change it to add retries and go with the list-then-watch approach. Synced with @jkohen offline as well to confirm this is a legit approach.
Always restarting by "list then watch from that point" should work at a basic level, though events can be missed in the window between the watch dying and the new list. If that's undesired, keeping track of resourceVersion can reduce that risk, at least in theory, letting you resume from exactly the point the previous watch stopped - iff k8s still remembers that version.
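Concretely, that fallback could look like the sketch below; the 410 check reflects how Kubernetes signals an expired resourceVersion, which kubeclient surfaces as Kubeclient::HttpError (it can also arrive as an ERROR notice mid-stream, omitted here for brevity):

```ruby
# Hedged sketch: resume from the last seen resourceVersion; if the API server
# no longer remembers it (HTTP 410 Gone), fall back to a fresh list.
begin
  client.watch_pods(resource_version: last_seen_version).each do |notice|
    last_seen_version = notice.object.metadata.resourceVersion
    handle_notice(notice) # hypothetical handler
  end
rescue Kubeclient::HttpError => e
  raise unless e.error_code == 410 # anything else is a different failure
  last_seen_version = client.get_pods.resourceVersion # re-list and resume
end
```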
As for how the connection closed: I'm inclined to say it doesn't matter. Watches are conceptually infinite, but all TCP connections get closed eventually, one way or another. Calling code has to deal with an "infinite" watch finishing, and I don't see any reason it cares how exactly it finished.