-
Notifications
You must be signed in to change notification settings - Fork 16
EXPERIMENTAL: Improve WebSocket Re-Connections #1216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 475184b The changes in this PR will be included in the next version bump. This PR includes changesets to release 6 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
packages/core/src/BaseSession.ts
Outdated
@@ -451,6 +461,12 @@ export class BaseSession { | |||
const payload = this.decode<JSONRPCRequest | JSONRPCResponse>(event.data) | |||
this.logger.wsTraffic({ type: 'recv', payload }) | |||
|
|||
if (this._waitingAuthStateUpdate && !isAuthStateEvent(payload)) { | |||
// we lost a `signalwire.authorization.state` probably in a WS reconnection | |||
// the server should always send before any other msg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Astaelan can you validate this assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be correct, I can't find any other messages that could sneak through. A ping maybe but should still never come before the auth state and connect response. If auth fails however, an error response may come without auth state, so you should validate that potential.
The other thing to note is that if you reconnect with an authorization state and nothing changes, then you may not get an auth state, because nothing changed, validate this scenario as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Astaelan does that mean after sending the “signalwire.connect”, the SDK should NOT send any request before receiving the “auth_state” event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The longer answer is that you should never depend on authorization state events in the first place. They can come whenever Hagrid decides it needs the client to retain new information on a reconnect, and then only sends the authorization state event if something actually changed. This may or may not happen right after connect, may or may not happen during an invite, even broadcast stuff can change it I believe. So I originally proposed NOT depending on the auth state event itself.
Instead, what I originally proposed, is that if the client starts a verto.invite request, it should not consider the invite/call valid until it receives the invite response. If there is going to be an authorization state update, it will happen during the invite before the response returns to the client. Approaching it based on the invite response means that getting an invite response is the trigger for a healthy call (if it is a successful response) and if you needed an auth state you would have gotten one before the response.
If you approach it like this, then the client would "throw away" knowledge about the verto.invite it sent if it gets disconnected before receiving the invite response. This would in turn mean that when reconnecting with the same protocol, it would ignore the verto.answer event pending for that previous call attempt, and could start a new call with a new call id. This would be the most reliable way since auth state is just a state update and should not be treated as gaurenteed to ever occur, there is a sort of delta check to only send if it changes.
What you CAN depend on, is that IF there is going to be an authorization state update during a verto.invite, then you WILL receive it before the verto.invite response. Therefore you can depend on the verto.invite response as the point when things are healthy for a reconnect/hijack/reattach.
Therefore to answer the original question in shorter form, it doesn't really matter. You may not get an auth state, so don't depend on it, depend on the invite response if you need something to control state and flow, but there is nothing stopping you from sending a ping or ping response or other things while waiting for an invite response. The main point here is not to assume you can reattach a call until after you get the invite response, if you get disconnected before the invite response you should throw the call away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Then I need to make changes...
The verto.invite
is easy to handle it like you said. My problem was with the verto.answer
The verto.answer
is mostly why I did this way,,,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for the SDK to understand the stored auth_state is an old/stale auth_state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, it's supposed to be blind and just send the latest. It doesn't know if it hasn't received one because of a disconnect. Making any assumptions about expecting to receive one is also sketchy as it may not change depending on the scenario (like reattaching attempts).
The only true point of safe knowledge is that once we receive an invite response, we can assume IF we needed an updated authorization state then we will have already received it before the invite response making this call "reattachable".
Maybe we need to think about this slightly different, the simplest way I can think about this is "An outbound call is not reattachable until we get the invite response", which means we should not try to reattach, answer, or deal with anything related to that call if we see information about it after a reconnect that says it's not reattachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link the ticket to this PR so I can find it, thanks.
Could you please share in which scenario the SDK sends this request twice?
What does it mean by lost? How does the SDK lose this during the call? |
@@ -78,8 +80,9 @@ export class SATSession extends JWTSession { | |||
variation: this.options.apiRequestRetriesDelayIncrement, | |||
}), | |||
expectedErrorHandler: (error) => { | |||
if (error?.message?.startsWith('Authentication failed')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please share in which scenario the SDK sends this request twice?
In the case of a double signalwire.connect
the server, don't fail the authentication. But with a "Method not recognized". You can verify in the production logs in the protocol signalwire_5d863653-9c10-414c-a9ee-5368b06da742_a988d370-f27a-4b97-8011-89b1bda89b83_b85f0246-f186-4d79-a42c-266cced37153
a real case.
The problem was two retry mechanisms in "parallel" for the signalwire.connect
. With this change, we let only the connection handle the retries.
An A real case in production can be verified with the connection id |
Discussions moved to https://github.com/signalwire/cloud-product/issues/14632 |
@jpsantosbh, since you mentioned this PR in the standup for review. I’m honestly not sure what we’re trying to achieve here, it seems the scope of the PR may have changed significantly. As @Astaelan mentioned, the SDK only needs to:
I strongly suggest splitting this PR work into multiple PRs, each addressing a single task. Otherwise, reviewing the PR will be very hard and would generate a lot of back-and-forth. |
Sure... |
Description
This changes are intented for internal test/usage since ATM the e2e corver were removed, because it was providing false positives.
Depending on the cause and the signalling stage the WebSocket reconnection needs to baheve differently.
signalwire.connetc
in the same sessionauthorizantion_state
sync is lostverto.invites
if the intance still have it refence stateType of change
Code snippets
In case of new feature or breaking changes, please include code snippets.