Skip to content

_close_event doesn't set on an error in SSHSession.connection_lost #743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RoketFlame opened this issue Apr 2, 2025 · 4 comments
Closed

Comments

@RoketFlame
Copy link

RoketFlame commented Apr 2, 2025

If an exception occurs when calling the SSHSession.connection_lost method (which can happen since I can inherit this class), we will get into an infinite loop because the SSHConnection's self._close_event.set() method in the _cleanup() method will not be invoked

asyncssh/channel.py

# class SSHChannel
    ...
    def _cleanup(self, exc: Optional[Exception] = None) -> None:
        """Clean up this channel"""

        if self._open_waiter:
            if not self._open_waiter.cancelled():  # pragma: no branch
                self._open_waiter.set_exception(
                    ChannelOpenError(OPEN_CONNECT_FAILED, "SSH connection closed")
                )

            self._open_waiter = None

        if self._request_waiters:
            for waiter in self._request_waiters:
                if not waiter.cancelled():  # pragma: no cover
                    if exc:
                        waiter.set_exception(exc)
                    else:
                        waiter.set_result(False)

            self._request_waiters = []

        if self._session is not None:
            self._session.connection_lost(exc) # <- Exception here
            self._session = None
    ...

asyncssh/connection.py

# class SSHConnection
    ...
    def _cleanup(self, exc: Optional[Exception]) -> None:
        """Clean up this connection"""

        self._cancel_keepalive_timer()

        for chan in list(self._channels.values()):
            chan.process_connection_close(exc) # <- Exception here

        for listener in list(self._local_listeners.values()):
            listener.close()

        while self._global_request_waiters:
            self._process_global_response(MSG_REQUEST_FAILURE, 0,
                                          SSHPacket(b''))

        if self._auth:
            self._auth.cancel()
            self._auth = None

        if self._error_handler:
            self._error_handler(self, exc)
            self._acceptor = None
            self._error_handler = None

        if self._wait and self._waiter and not self._waiter.cancelled():
            if exc:
                self._waiter.set_exception(exc)
            else: # pragma: no cover
                self._waiter.set_result(None)

            self._wait = None

        if self._owner: # pragma: no branch
            self._owner.connection_lost(exc)
            self._owner = None

        self._cancel_login_timer()
        self._close_event.set() # <- Will not be called

        self._inpbuf = b''

        if self._tunnel:
            self._tunnel.close()
            self._tunnel = None
    ...

call stack

base_events 1785 ERROR Exception in callback SSHClientConnection._cleanup(None)
handle: <Handle SSHClientConnection._cleanup(None)>
Traceback (most recent call last):
  File "/3.11.8/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/.venv/lib/python3.11/site-packages/asyncssh/connection.py", line 3364, in _cleanup
    super()._cleanup(exc)
  File "/.venv/lib/python3.11/site-packages/asyncssh/connection.py", line 971, in _cleanup
    chan.process_connection_close(exc)
  File "/.venv/lib/python3.11/site-packages/asyncssh/channel.py", line 458, in process_connection_close
    self._cleanup(exc)
  File "/.venv/lib/python3.11/site-packages/asyncssh/channel.py", line 223, in _cleanup
    self._session.connection_lost(exc)
  File "/.venv/lib/python3.11/site-packages/my_ssh.py", line 256, in connection_lost
    raise Exception('oops')
@ronf
Copy link
Owner

ronf commented Apr 2, 2025

Thanks for the report.

Uncaught exceptions generally bubble back out to the SSHConnection's internal_error() function, which triggers a connection-level shutdown via _force_close(), but I guess that's your point. The connection will iterate over all the open channels, closing them, and during that process the session's connection_lost() will end up being called again, triggering another internal error and attempt to clean up.

It looks like this same issue could apply to SSHClient/SSHServer, when their connection_lost() is called from SSHConnection.

I guess in this case it might be best to copy the session into a local variable and setting self._session = None before actually calling the callback. If there are multiple channels which raise exceptions, this might pass through internal error more than once, but it should eventually unwind all of that once all the session closes have been attempted.

The other option would be to catch and log errors in the cleanup code, without going all the back up to internal_error() once a cleanup has been started. Either way, there might be some remnants which are not properly cleaned up, but there isn't really much that can be done about that if the cleanup code itself is failing.

@ronf
Copy link
Owner

ronf commented Apr 5, 2025

Here's a first cut at a fix for this:

diff --git a/asyncssh/channel.py b/asyncssh/channel.py
index ddf1440..0778abf 100644
--- a/asyncssh/channel.py
+++ b/asyncssh/channel.py
@@ -26,6 +26,7 @@ import codecs
 import inspect
 import re
 import signal as _signal
+import sys
 from types import MappingProxyType
 from typing import TYPE_CHECKING, Any, AnyStr, Awaitable, Callable
 from typing import Dict, Generic, Iterable, List, Mapping, Optional
@@ -225,7 +226,13 @@ class SSHChannel(Generic[AnyStr], SSHPacketHandler):
             self._request_waiters = []

         if self._session is not None:
-            self._session.connection_lost(exc)
+            # pylint: disable=broad-except
+            try:
+                self._session.connection_lost(exc)
+            except Exception:
+                self.logger.debug1('Uncaught exception in session ignored',
+                                   exc_info=sys.exc_info)
+
             self._session = None

         self._close_event.set()
diff --git a/asyncssh/connection.py b/asyncssh/connection.py
index 94dcb5e..11a3d0d 100644
--- a/asyncssh/connection.py
+++ b/asyncssh/connection.py
@@ -1075,7 +1075,13 @@ class SSHConnection(SSHPacketHandler, asyncio.Protocol):
             self._wait = None

         if self._owner: # pragma: no branch
-            self._owner.connection_lost(exc)
+            # pylint: disable=broad-except
+            try:
+                self._owner.connection_lost(exc)
+            except Exception:
+                self.logger.debug1('Uncaught exception in owner ignored',
+                                   exc_info=sys.exc_info)
+
             self._owner = None

         self._cancel_login_timer()

Errors in the connection_lost() callback in either SSHClient/SSHServer or SSHClientSession/SSHServerSession are logged (at debug level 1) and include a traceback of the exception. However, processing of the cleanup continues on after that as if nothing happened. This should be enough to always close the AsyncSSH connection successfully, but there may be some cleanup in the session or owner which doesn't run completely.

@ronf
Copy link
Owner

ronf commented Apr 5, 2025

This fix is now available in the "develop" branch as commit 8ae6b2b. Thanks for reporting the issue!

@ronf
Copy link
Owner

ronf commented May 3, 2025

This fix is now included in AsyncSSH 2.21.0.

@ronf ronf closed this as completed May 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants