
Fluentd Crash #194


Closed
emillg opened this issue Oct 2, 2019 · 25 comments · Fixed by #204 or GoogleCloudPlatform/google-fluentd#235

Comments

@emillg

emillg commented Oct 2, 2019

Fluentd crashed with log:

#<Thread:0x00007fa17a7d3778@/usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:265 run> terminated with exception (report_on_exception is true):
/usr/local/bundle/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<': error reading from socket: Could not parse data entirely (1 != 0) (HTTP::ConnectionError)
	from /usr/local/bundle/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
	from /usr/local/bundle/gems/kubeclient-4.5.0/lib/kubeclient/watch_stream.rb:25:in `each'
	from /usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:36:in `start_namespace_watch'
	from /usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:265:in `block in configure'
/usr/local/bundle/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<': Could not parse data entirely (1 != 0) (HTTP::Parser::Error)
	from /usr/local/bundle/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
	from /usr/local/bundle/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
	from /usr/local/bundle/gems/kubeclient-4.5.0/lib/kubeclient/watch_stream.rb:25:in `each'
	from /usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:36:in `start_namespace_watch'
	from /usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:265:in `block in configure'
Unexpected error error reading from socket: Could not parse data entirely (1 != 0)
  /usr/local/bundle/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<'
  /usr/local/bundle/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
  /usr/local/bundle/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
  /usr/local/bundle/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
  /usr/local/bundle/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
  /usr/local/bundle/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
  /usr/local/bundle/gems/kubeclient-4.5.0/lib/kubeclient/watch_stream.rb:25:in `each'
  /usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:36:in `start_namespace_watch'
  /usr/local/bundle/gems/fluent-plugin-kubernetes_metadata_filter-2.4.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:265:in `block in configure'
@richm
Contributor

richm commented Oct 2, 2019

What version of Kubernetes is your cluster running?
Could it be a bug in kubeclient 4.5.0? @jkohen, have you seen this while testing the latest plugin with the latest kubeclient?

@emillg
Author

emillg commented Oct 2, 2019 via email

@richm
Contributor

richm commented Oct 2, 2019

Can you try reverting the kubeclient version to 4.4.x?

@richm
Contributor

richm commented Oct 2, 2019

hmm - https://bugzilla.redhat.com/show_bug.cgi?id=1743865 - investigating more

@richm
Contributor

richm commented Oct 2, 2019

possible workaround(fix?) - see https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c8 - if you can't view those comments, let me know

@emillg
Author

emillg commented Oct 3, 2019

@richm We use managed Kubernetes from a cloud provider. I have switched fluent-plugin-kubernetes_metadata_filter back to v2.2.0, and it seems to be working well.

@richm
Contributor

richm commented Oct 3, 2019

@emillg are you using kubernetes_metadata_filter v2.2.0 with kubeclient 4.5.0?

@jkohen
Contributor

jkohen commented Oct 3, 2019

@richm I haven't seen that issue. It seems to be connected with namespace watches, and I only modified the pod listing. @cben does this error look familiar?

I could be missing something, but I don't see any relevant changes between releases v2.1.6 (May 15, 2019) and v2.4.0 of kubernetes_metadata_filter, and I don't see any relevant changes between kubeclient v4.4.0 (May 3, 2019) and v4.5.0.

@richm
Contributor

richm commented Oct 3, 2019

I don't see any relevant changes between kubeclient v4.4.0 (May 3, 2019) and v4.5.0.

I don't either, but we only recently updated the plugin to use kubeclient 4.4; before that it was using a quite old version of kubeclient. It may be that @emillg never used kubeclient 4.4 and moved directly to 4.5, and this problem is present throughout 4.x.

The broader problem is that the primary maintainers of this plugin are Red Hatters working on OpenShift logging, which does not use the watch functionality. We have no upstream (here) CI coverage of this feature, no operational experience with "plain" Kubernetes, and no operational experience with watches. We need someone to step up and support the watch feature, or we will simply stop supporting it (unless for some reason we start using it ourselves).

@jkohen
Contributor

jkohen commented Oct 3, 2019

Let me see if we can take a look at the problem reported here. @qingling128, can you take a look?

@richm We can talk about maintenance beyond this case offline. My -at-google.com email is easy to guess from my GitHub username.

@cben

cben commented Oct 4, 2019

One recent change is that kubeclient started allowing both 3.y and 4.y versions of the http gem.
I think the traceback shows the crash happening inside http?
Can you try with http constrained to some 3.y version?
(And if that helps, would it be hard to bisect to the exact version?)

I just woke up so I might be talking nonsense 🙃

[P.S. I'll also advertise that I'd be happy to add more maintainers to kubeclient, if anyone is interested. I don't have enough time lately, and my Red Hatter perspective is slightly limiting too.]
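
For anyone who wants to try that, a minimal Gemfile sketch (assuming a bundler-managed install rather than plain gem install; versions are illustrative) pinning http to the 3.y series:

# Gemfile - hypothetical sketch for testing cben's suggestion: keep the
# current plugin stack but force the http gem down to the 3.y series.
source 'https://rubygems.org'

gem 'http', '~> 3.3'                                     # pin http to 3.y
gem 'kubeclient', '~> 4.5'                               # accepts http 3.y or 4.y
gem 'fluent-plugin-kubernetes_metadata_filter', '2.4.0'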

@cben

cben commented Oct 4, 2019

That bugzilla used http-3.3.0 though.

@emillg
Author

emillg commented Oct 4, 2019

@richm This is the Dockerfile I use now. Upgrading fluent-plugin-kubernetes_metadata_filter to a higher version causes the crash.

FROM fluent/fluentd:v1.7-debian-1

USER root
WORKDIR /home/fluent
ENV PATH /home/fluent/.gem/ruby/2.3.0/bin:$PATH

RUN buildDeps="sudo make gcc g++ libc-dev ruby-dev" \
    && apt-get update \
    && apt-get install -y --no-install-recommends $buildDeps \
    && echo 'gem: --no-document' >> /etc/gemrc \
    && gem install fluent-plugin-splunk-hec -v 1.1.2 \
    && gem install fluent-plugin-kubernetes_metadata_filter -v 2.2.0 \
    && gem install fluent-plugin-multi-format-parser -v 1.0.0 \
    && gem sources --clear-all \
    && SUDO_FORCE_REMOVE=yes apt-get purge -y --auto-remove \
                  -o APT::AutoRemove::RecommendsImportant=false \
                  $buildDeps \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem

COPY fluent.conf /fluentd/etc/

ENTRYPOINT ["fluentd"]
CMD ["-c", "/fluentd/etc/fluent.conf"]

@richm
Contributor

richm commented Oct 4, 2019

@emillg thanks - can you also attach your kubernetes_metadata plugin config?

@emillg
Author

emillg commented Oct 4, 2019

<source>
  @type tail
  path /var/log/containers/*.log
  exclude_path ["/var/log/containers/*_kube-system_*.log"]
  tag kubernetes.*
  <parse>
    @type regexp
    expression ^(?<time>.+)\s(stdout|stderr)\s\w+\s(?<log>.*)$
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

<filter kubernetes.**>
  @type kubernetes_metadata
  skip_master_url true
  skip_labels true
</filter>

@qingling128
Contributor

@emillg - Are you using any proxy / firewall / AWS load balancer as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c4 and https://bugzilla.redhat.com/show_bug.cgi?id=1743865#c8?

@qingling128
Contributor

In fact we are seeing this error as well now. It's reproducible when the Kubernetes master API is temporarily unavailable (e.g. cluster upgrade, cluster resizing):

2019-11-18 12:49:23 +0000 [info]: #0 following tail of /var/log/containers/heapster-67dc58ff6f-97fjf_kube-system_heapster-7fab495079f2ed44c3633a5c32023ee0a396d8fbd7968b22b40945796c0ce979.log
2019-11-18 12:49:23 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter] uses `#filter_stream` method.
2019-11-18 12:49:23 +0000 [info]: #0 following tail of /var/log/containers/heapster-67dc58ff6f-97fjf_kube-system_heapster-nanny-493553b8ed95852e8010c3f174e49d29023e8b4e91fcd8e4a51646ec8373d663.log
2019-11-18 12:49:23 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter] uses `#filter_stream` method.
2019-11-18 12:49:23 +0000 [info]: #0 following tail of /var/log/containers/heapster-67dc58ff6f-97fjf_kube-system_prom-to-sd-64be4ba515feec9da8c3111b9a32a251243777aad7a10d9071d91b1bfeeaba8c.log
2019-11-18 12:49:23 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter] uses `#filter_stream` method.
2019-11-18 12:53:29 +0000 [info]: #0 stats - namespace_cache_size: 0, pod_cache_size: 2, namespace_cache_miss: 24, pod_cache_host_updates: 6, pod_cache_watch_ignored: 2, pod_cache_api_updates: 15, id_cache_miss: 15, pod_cache_watch_updates: 1
2019-11-18 12:53:30 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::PrometheusFilter, Fluent::Plugin::RecordTransformerFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
#<Thread:0x0000000002c327b8@/opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:265 run> terminated with exception (report_on_exception is true):
/opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<': error reading from socket: Could not parse data entirely (1 != 0) (HTTP::ConnectionError)
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/kubeclient-4.4.1/lib/kubeclient/watch_stream.rb:25:in `each'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:36:in `start_namespace_watch'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:265:in `block in configure'
#<Thread:0x0000000002289928@/opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:263 run> terminated with exception (report_on_exception is true):
/opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<': error reading from socket: Could not parse data entirely (1 != 0) (HTTP::ConnectionError)
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/kubeclient-4.4.1/lib/kubeclient/watch_stream.rb:25:in `each'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:52:in `start_pod_watch'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:263:in `block in configure'
#<Thread:0x0000000002c328d0@/opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:263 run> terminated with exception (report_on_exception is true):
/opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<': error reading from socket: Could not parse data entirely (1 != 0) (HTTP::ConnectionError)
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/kubeclient-4.4.1/lib/kubeclient/watch_stream.rb:25:in `each'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:52:in `start_pod_watch'
	from /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:263:in `block in configure'
Unexpected error error reading from socket: Could not parse data entirely (1 != 0)
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/kubeclient-4.4.1/lib/kubeclient/watch_stream.rb:25:in `each'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/kubernetes_metadata_watch_pods.rb:52:in `start_pod_watch'
  /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:263:in `block in configure'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `<<'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/parser.rb:14:in `add'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:214:in `read_more'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/connection.rb:92:in `readpartial'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:30:in `readpartial'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/http-4.1.1/lib/http/response/body.rb:36:in `each'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/kubeclient-4.4.1/lib/kubeclient/watch_stream.rb:25:in `each'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:36:in `start_namespace_watch'
  2019-11-18 12:53:40 +0000 [error]: #0 /opt/google-fluentd/embedded/lib/ruby/gems/2.5.0/gems/fluent-plugin-kubernetes_metadata_filter-2.3.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:265:in `block in configure'
2019-11-18 12:53:40 +0000 [error]: #0 unexpected error error_class=HTTP::ConnectionError error="error reading from socket: Could not parse data entirely (1 != 0)"
  2019-11-18 12:53:40 +0000 [error]: #0 suppressed same stacktrace

The agent that did not crash had the following versions:

fluent-plugin-kubernetes_metadata_filter (2.2.0)
kubeclient (1.1.4)
http (0.9.8)

The agent that crashed had the following versions:

fluent-plugin-kubernetes_metadata_filter (2.4.0)
kubeclient (4.5.0)
http (4.1.1)

The cause might be that the parser return value changed in the http gem between versions 0.9.8 and 4.1.1, and the new value is no longer handled properly.

@qingling128
Contributor

Might be related to httprb/http#556.

@qingling128
Contributor

After some digging, it seems that the watcher.each calls in kubernetes_metadata_watch_namespaces.rb and kubernetes_metadata_watch_pods.rb need to handle retries when the connection is closed.

As suggested in ManageIQ/kubeclient#275 (comment), we need to:

  1. Retry when the connection is closed in the watch;
  2. Keep track of resource_version and pass it when retrying (see the sketch below)
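
A minimal sketch of that approach, with hypothetical names (process_namespace_event, a @client kubeclient instance, a log helper) rather than the actual plugin code:

# Hypothetical sketch: restart the watch when the connection drops, resuming
# from the last resourceVersion seen instead of crashing the watcher thread.
def start_namespace_watch
  resource_version = nil
  loop do
    begin
      watcher = @client.watch_namespaces(resource_version: resource_version)
      watcher.each do |notice|
        resource_version = notice.object.metadata.resourceVersion
        process_namespace_event(notice)   # hypothetical event handler
      end
    rescue HTTP::ConnectionError => e
      log.info "namespace watch connection closed, retrying: #{e}"
      # loop around and restart the watch from resource_version
    end
  end
end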

@jcantrill
Contributor

@qingling128 Any thoughts on whether #29 needs to be reworked to resolve this issue?

@qingling128
Contributor

@jcantrill - Yeah, I already have a pending PR for this and am currently testing it. How does the approach in #194 (comment) sound to you? Would you like to review the PR when it's out?

@jcantrill
Contributor

1. Retry when the connection is closed in the watch;

2. Keep track of resource_version and pass it when retrying

Seems reasonable. From what I recall, I'm not sure it's necessary to keep track of the version. You simply do a list, extract the resource version, and start watching from that point. The kube or oc binary with loglevel=9 will show the REST calls and the sequence of calls needed to do a proper watch.
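
For reference, that sequence maps onto kubeclient roughly as follows (a sketch; client is assumed to be a configured Kubeclient::Client):

require 'kubeclient'

# Sketch of the "list, then watch" sequence.
pods = client.get_pods                 # LIST: the EntityList carries a resourceVersion
version = pods.resourceVersion
client.watch_pods(resource_version: version) do |notice|
  # WATCH: only changes that happen after the list above are delivered
  puts "#{notice.type} #{notice.object.metadata.name}"
end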

@qingling128
Contributor

That makes sense. I'll change it to add retries, using the "list from resourceVersion 0, then start a watch" approach on each retry.

Synced with @jkohen offline as well to confirm this is a legitimate approach.

@cben

cben commented Jan 7, 2020

Always restarting with "list, then watch from that point" should work at a basic level.
The only downside is that it might miss a few changes that happen between the old watch disconnecting and the list request.

If that's undesired, keeping track of resourceVersion can reduce that risk, at least in theory, letting you resume from exactly the point the previous watch stopped, iff k8s still remembers that version.
You need a fallback to "list then watch" if you get an HTTP 410 Gone error though, which I guess is best expressed as a double loop (see the sketch below).
(In theory, even a pure "list then watch" loop risks a 410 and should catch it and loop again, but that's a simpler single loop.)
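
A sketch of that double loop (hypothetical; the caller's block stands in for the real event handling, and error handling is simplified):

require 'kubeclient'

# Hypothetical sketch of the double loop. `client` is a configured
# Kubeclient::Client; each watch event is yielded to the caller's block.
def watch_pods_forever(client)
  loop do                                  # outer loop: fresh list on 410 Gone
    pods = client.get_pods
    version = pods.resourceVersion         # resourceVersion at list time
    begin
      loop do                              # inner loop: resume watch from `version`
        begin
          client.watch_pods(resource_version: version) do |notice|
            # an expired version can also show up as an ERROR notice in the stream
            raise Kubeclient::HttpError.new(410, 'Gone', nil) if notice.type == 'ERROR'
            version = notice.object.metadata.resourceVersion
            yield notice
          end
        rescue HTTP::ConnectionError
          # the crash from this issue: treat a severed connection as a normal
          # disconnect and loop again, resuming the watch from `version`
        end
      end
    rescue Kubeclient::HttpError => e
      raise unless e.error_code == 410     # fall back to a fresh list only on 410
    end
  end
end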

@cben

cben commented Jan 7, 2020

About HTTP::ConnectionError: I guess it's inconsistent that Kubeclient leaks HTTP errors from watch as-is.
It's unclear to me from httprb/http#556 whether this is what "normal" disconnection by the apiserver looks like, or whether it's abnormal and violates HTTP.
Should kubeclient report it up but wrapped in a Kubeclient::HttpError (as it does with RestClient errors for all operations except watching), or should the error be swallowed so that .each just exits cleanly?
https://bugzilla.redhat.com/show_bug.cgi?id=1743865 suggests it's not the k8s apiserver's fault, but may be middleboxes severing the connection (in that case, raising the AWS load balancer timeout was confirmed to help).

I'm inclined to say it doesn't matter how the connection closed. Watches are conceptually infinite, but all TCP connections get closed eventually, one way or another. Calling code has to deal with an "infinite" watch finishing, and I don't see any reason it would care how exactly it finished.
=> So I propose that Kubeclient swallow the error and exit .each cleanly.
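
As a sketch, the proposal amounts to something like this (hypothetical helper, not the actual WatchStream source):

require 'json'
require 'http'

# Hypothetical sketch of the proposed behavior: stream watch notices off an
# http gem response and treat a severed connection as a clean end of the watch.
def each_watch_notice(response)
  buffer = +''
  response.body.each do |chunk|            # where HTTP::ConnectionError raises today
    buffer << chunk
    while (line = buffer.slice!(/.+\n/))   # one JSON watch notice per line
      yield JSON.parse(line)
    end
  end
rescue HTTP::ConnectionError
  # Severed by the apiserver or a middlebox. All watches end eventually;
  # finish the enumeration cleanly instead of leaking the transport error.
end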
