Graceful Websocket Connection Shutdown #3633
Comments
@StevenACoffman Curious for your opinion here. If you think my points are legitimate, I can work on a PR.
@john-markham I'm feverish at the moment, so I don't really trust my own technical judgement. 😅 However, I am always in favor of PRs that do not break existing behavior and only help fix problems. The tricky part is verifying both of those for websockets. Some of the recent contributors to websocket-related code probably have more hands-on experience and insight on this particular issue as well. @UnAfraid @telemenar @wiegell @vlad-tokarev @szgupta @df-wg Can you please take a look and share your thoughts?
@StevenACoffman Feel better. Thank you so much for the energy you bring to this project and for keeping it alive!
I'm closing this issue. After testing out "Add websocket shutdown grace period" (#3653) in my infra, I saw that my application-layer shutdown hook was not running in time before my load balancer's deregistration delay was up, which is what was causing 99% of the 1006s. I'm still skeptical that we implement the full closing handshake, but it seems like it just doesn't even matter 99% of the time. For the 1%, I think #3653 might be useful. For those in the future looking at this issue, I found this article: https://www.grammarly.com/blog/engineering/perfecting-smooth-rolling-updates-in-amazon-elastic-container-service/ which seems to suggest that these errors are unavoidable, but in my experience that was not the case. Another thing we are considering long term is migrating to a service mesh to get rid of the ALB-related headaches.
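For anyone hitting the same ordering problem, here is a minimal sketch of running a shutdown hook under a single time budget that is shorter than the load balancer's deregistration delay. Everything in it is an assumption for illustration: `shutdownStep`, `runShutdown`, `sleepStep`, the step names, and the 30s/25s figures are hypothetical and come from neither gqlgen nor the setup described above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// shutdownStep is one hypothetical stage of an application shutdown hook.
type shutdownStep struct {
	name string
	run  func(ctx context.Context) error
}

// runShutdown runs the steps in order under one shared deadline. The budget
// should be shorter than the load balancer's deregistration delay, so the
// hook finishes before the balancer drops the remaining connections.
// It returns the names of the steps that completed and the first error.
func runShutdown(budget time.Duration, steps []shutdownStep) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	var done []string
	for _, s := range steps {
		if err := ctx.Err(); err != nil {
			return done, fmt.Errorf("budget exhausted before %q: %w", s.name, err)
		}
		if err := s.run(ctx); err != nil {
			return done, fmt.Errorf("step %q: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// sleepStep simulates a step that needs d to finish but gives up as soon as
// the shared budget expires. A real step would cancel websocket contexts or
// call http.Server.Shutdown here instead of sleeping.
func sleepStep(name string, d time.Duration) shutdownStep {
	return shutdownStep{name: name, run: func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}}
}

func main() {
	// Hypothetical numbers: with a 30s deregistration delay, give the hook 25s.
	done, err := runShutdown(25*time.Second, []shutdownStep{
		sleepStep("cancel websocket contexts", 10*time.Millisecond),
		sleepStep("drain HTTP server", 10*time.Millisecond),
	})
	fmt.Println(done, err)
}
```

If a step overruns the budget, `runShutdown` reports which steps finished, which makes it easier to see from the logs whether the hook or the load balancer gave up first.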
Hi! How do we currently gracefully shut down websocket connections, and is it actually graceful?

This has been brought up before, but the discussion does not seem complete: #2847. The solution there implies that canceling outstanding request contexts in `InitFunc` when the server context shuts down is enough, and that `gqlgen` should just handle the Websocket shutdown rigamarole for us. Seems sensible.

Using this approach (or not; this solution seems to have no effect, which suggests to me that something is up in gqlgen), I see tons of these errors in my `ErrorFunc` on deploy:

My `InitFunc` (see below for full code) looks something like:

I see those error messages immediately after my `Server shutting down` log.

Looking at gqlgen's code, I should hit `closeOnCancel` then `c.close(...)` on server shutdown [...] `c.conn`? Isn't that part of the Websocket RFC?

These 1006 errors ultimately show up as `Protocol(ResetWithoutClosingHandshake)` in my downstream Rust proxy. I don't think these errors should always be retried (see snapview/tokio-tungstenite#101), which makes it hard to know what to do even further downstream on the client. These errors are nearly 100% correlated with my deployments and with pods being naturally cycled out, and are not reproducible in any other way.

It would be nice if `gorilla/websocket` provided an API for this like `net/http` does for shutting down non-Websocket connections, but it seems like it does not and will not (gorilla/websocket#448). `gqlgen` seems like the place where we should handle gracefully closing websocket connections, no? It seems like others in the community have started to work around this issue (ethereum/go-ethereum#30482 (comment)), which is really not ideal.

My full code:
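For reference, the closing handshake in RFC 6455 has the closing side send a Close frame, wait for the peer's Close frame in reply, and only then drop the TCP connection. Code 1006 is what an endpoint reports when the connection goes away without a Close frame. Below is a minimal sketch of a server-initiated handshake with a grace period. It is not gqlgen's or `gorilla/websocket`'s code: `frameConn`, `closeGracefully`, `fakeConn`, and `errNoPeerClose` are hypothetical names, and the fake connection exists only to make the sketch runnable.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// frameConn is a hypothetical, minimal view of a websocket connection.
// It is NOT gorilla/websocket's API; it only names the three operations
// the closing handshake needs.
type frameConn interface {
	WriteClose(code int, reason string) error // send a Close frame
	WaitPeerClose(deadline time.Time) error   // wait for the peer's Close frame
	Close() error                             // tear down the TCP connection
}

var errNoPeerClose = errors.New("peer did not answer the close frame in time")

// closeGracefully sends a Close frame, waits up to grace for the peer's
// Close frame, and only then closes the underlying connection.
func closeGracefully(c frameConn, code int, reason string, grace time.Duration) error {
	if err := c.WriteClose(code, reason); err != nil {
		c.Close() // the handshake cannot proceed; just release the socket
		return err
	}
	err := c.WaitPeerClose(time.Now().Add(grace))
	c.Close() // close either way so a silent peer cannot hold the socket open
	return err
}

// fakeConn simulates a peer that answers the Close frame after peerDelay.
// It does not sleep; it only compares the simulated reply time to the deadline.
type fakeConn struct {
	peerDelay time.Duration
	log       []string
}

func (f *fakeConn) WriteClose(code int, reason string) error {
	f.log = append(f.log, fmt.Sprintf("close-frame %d", code))
	return nil
}

func (f *fakeConn) WaitPeerClose(deadline time.Time) error {
	if time.Now().Add(f.peerDelay).After(deadline) {
		return errNoPeerClose
	}
	f.log = append(f.log, "peer-close")
	return nil
}

func (f *fakeConn) Close() error {
	f.log = append(f.log, "tcp-close")
	return nil
}

func main() {
	c := &fakeConn{}
	// 1001 ("going away") is the RFC 6455 status code for a server shutting down.
	err := closeGracefully(c, 1001, "server shutting down", time.Second)
	fmt.Println(c.log, err)
}
```

The grace period bounds how long a silent client can delay shutdown; when it expires the connection is closed anyway, and the peer would still observe an abnormal closure in that case.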