
Add message with debugging info to Cancelled #3256


Merged: 20 commits, merged on May 15, 2025.
newsfragments/3232.feature.rst: 1 addition, 0 deletions
@@ -0,0 +1 @@
:exc:`Cancelled` strings can now display the source and reason for a cancellation. Trio-internal sources of cancellation will set this string, and :meth:`CancelScope.cancel` now has a ``reason`` string parameter that can be used to attach info to any :exc:`Cancelled` to help in debugging.
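To make the new message format concrete, here is a standalone model of the `__str__` that this PR adds to `Cancelled` in `_exceptions.py`. `CancelledModel` is a hypothetical stand-in (the real `Cancelled` has no public constructor); only the concatenation logic is copied from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CancelledModel:
    """Hypothetical stand-in for the fields added to trio.Cancelled (not trio's API)."""

    source: str  # e.g. "deadline", "explicit", "nursery", "shutdown"
    source_task: Optional[str] = None  # repr() of the cancelling task, if known
    reason: Optional[str] = None  # free-form text passed to CancelScope.cancel()

    def __str__(self) -> str:
        # Same string building as the __str__ added in src/trio/_core/_exceptions.py.
        return (
            f"cancelled due to {self.source}"
            + ("" if self.reason is None else f" with reason {self.reason!r}")
            + ("" if self.source_task is None else f" from task {self.source_task}")
        )


print(CancelledModel(source="deadline"))
# cancelled due to deadline
print(CancelledModel(source="explicit", reason="stop", source_task="<Task 'main'>"))
# cancelled due to explicit with reason 'stop' from task <Task 'main'>
```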
src/trio/_channel.py: 3 additions, 1 deletion
@@ -547,7 +547,9 @@ async def context_manager(
yield wrapped_recv_chan
# User has exited context manager, cancel to immediately close the
# abandoned generator if it's still alive.
nursery.cancel_scope.cancel()
nursery.cancel_scope.cancel(
"exited trio.as_safe_channel context manager"
)
except BaseExceptionGroup as eg:
try:
raise_single_exception_from_group(eg)
src/trio/_core/_asyncgens.py: 3 additions, 1 deletion
@@ -230,7 +230,9 @@ async def _finalize_one(
# with an exception, not even a Cancelled. The inside
# is cancelled so there's no deadlock risk.
with _core.CancelScope(shield=True) as cancel_scope:
cancel_scope.cancel()
cancel_scope.cancel(
reason="disallow async work when closing async generators during trio shutdown"
)
await agen.aclose()
except BaseException:
ASYNCGEN_LOGGER.exception(
src/trio/_core/_exceptions.py: 49 additions, 3 deletions
@@ -1,12 +1,26 @@
from __future__ import annotations

from typing import TYPE_CHECKING
from functools import partial
from typing import TYPE_CHECKING, Literal

import attrs

from trio._util import NoPublicConstructor, final

if TYPE_CHECKING:
from collections.abc import Callable

from typing_extensions import Self, TypeAlias

CancelReasonLiteral: TypeAlias = Literal[
"KeyboardInterrupt",
"deadline",
"explicit",
"nursery",
"shutdown",
"unknown",
]


class TrioInternalError(Exception):
"""Raised by :func:`run` if we encounter a bug in Trio, or (possibly) a
@@ -34,6 +48,7 @@ class WouldBlock(Exception):


@final
@attrs.define(eq=False, kw_only=True)
class Cancelled(BaseException, metaclass=NoPublicConstructor):
"""Raised by blocking calls if the surrounding scope has been cancelled.

@@ -67,11 +82,42 @@ class Cancelled(BaseException, metaclass=NoPublicConstructor):

"""

source: CancelReasonLiteral
# repr(Task), so as to avoid gc troubles from holding a reference
source_task: str | None = None
Review comment by A5rocks (Contributor), Apr 29, 2025:

I'm not sure repr(Task) is actually that useful? Like yes, it says what function this was started in and maybe you can locate the task through some gc machinery through the id, but...

Maybe a weakref would be more useful so people can access attributes? I feel like "where was I spawned" is already answered by the stack trace. (Never mind, I just remembered this is the canceller, not the cancellee.)

Never mind, I didn't think this suggestion through. A weakref wouldn't work for the common case (a task cancelling a sibling task).

I'm not convinced a strong ref here would be bad -- a Task doesn't store the exception or the cancellation reason so there's no reference cycle I think? But a string here is fine.

PR author (Member):

If a Task contains a CancelScope and the scope gets cancelled within the same task, the scope will then hold a strong ref to a CancelReason, which will in turn point back to the Task. I think?

The repr(Task) on its own is perhaps not super useful, but if you have multiple simultaneous cancellations that are distinguished only by their source task, you can tell them apart visually even without other sources of the task id.
It does also contain the name of the function itself:
<Task 'trio._core._tests.test_run.test_Cancelled_str' at 0x\w*>
which could be very helpful if you have different functions spawned in a nursery.

A5rocks (Contributor), May 1, 2025:

Hmm, I guess task -> parent/child nursery -> cancel scope -> cancel reason -> task is a loop, yeah. That's annoying. (Or maybe CoroutineType stores frames as a strongref? That too.)
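The cycle under discussion can be reproduced with plain objects. This is a CPython-specific sketch with hypothetical `Task`, `Scope` and `Reason` classes (not trio's): with the cyclic garbage collector disabled, an object is freed as soon as its reference count drops to zero, so a weak reference that is still alive reveals a cycle. Storing `repr(task)` instead of the task itself, as the PR does, breaks the cycle.

```python
from __future__ import annotations

import gc
import weakref
from typing import Optional


class Task:
    """Hypothetical stand-in for trio's Task."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.scope: Optional[Scope] = None

    def __repr__(self) -> str:
        return f"<Task {self.name!r} at {id(self):#x}>"


class Reason:
    def __init__(self, source_task: object) -> None:
        self.source_task = source_task  # either the Task itself or repr(task)


class Scope:
    def __init__(self) -> None:
        self.cancel_reason: Optional[Reason] = None


def build(store_repr: bool) -> "weakref.ref[Task]":
    task = Task("worker")
    task.scope = Scope()
    # The task cancels its own scope and records itself as the canceller.
    task.scope.cancel_reason = Reason(repr(task) if store_repr else task)
    return weakref.ref(task)  # the local `task` goes away on return


def freed_by_refcount(store_repr: bool) -> bool:
    """True if the Task is freed immediately, i.e. no reference cycle keeps it alive."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return build(store_repr)() is None
    finally:
        gc.collect()  # reclaim any cycle this demo created
        if was_enabled:
            gc.enable()


print(freed_by_refcount(store_repr=True))   # True: the string breaks the cycle
print(freed_by_refcount(store_repr=False))  # False: task -> scope -> reason -> task
```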

reason: str | None = None

def __str__(self) -> str:
return "Cancelled"
return (
f"cancelled due to {self.source}"
+ ("" if self.reason is None else f" with reason {self.reason!r}")
+ ("" if self.source_task is None else f" from task {self.source_task}")
)

def __reduce__(self) -> tuple[Callable[[], Cancelled], tuple[()]]:
return (Cancelled._create, ())
# The `__reduce__` tuple does not support directly passing kwargs, and the
# kwargs are required so we can't use the third item for adding to __dict__,
# so we use partial.
return (
partial(
Cancelled._create,
source=self.source,
source_task=self.source_task,
reason=self.reason,
),
(),
)

if TYPE_CHECKING:
# for type checking on internal code
@classmethod
def _create(
cls,
*,
source: CancelReasonLiteral,
source_task: str | None = None,
reason: str | None = None,
) -> Self: ...


class BusyResourceError(Exception):
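The `__reduce__` change relies on the callable in a reduce tuple being allowed to be a `functools.partial`, which is how keyword-only arguments survive pickling. Without an override, `BaseException.__reduce__` would re-create the exception from `self.args` alone, and unpickling would fail because the keyword-only argument is required. A minimal sketch with a hypothetical class (`KwOnlyError` is not part of trio):

```python
from __future__ import annotations

import pickle
from functools import partial
from typing import Optional


class KwOnlyError(BaseException):
    """Hypothetical exception whose constructor takes keyword-only arguments."""

    def __init__(self, *, source: str, reason: Optional[str] = None) -> None:
        super().__init__()
        self.source = source
        self.reason = reason

    def __reduce__(self):
        # A reduce tuple is (callable, positional_args) and has no slot for
        # kwargs, so bind them into a partial, as the PR does for Cancelled.
        return (partial(KwOnlyError, source=self.source, reason=self.reason), ())


original = KwOnlyError(source="explicit", reason="shutting down")
restored = pickle.loads(pickle.dumps(original))
print(restored.source, restored.reason)  # explicit shutting down
```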
src/trio/_core/_run.py: 124 additions, 19 deletions
@@ -36,7 +36,12 @@
from ._asyncgens import AsyncGenerators
from ._concat_tb import concat_tb
from ._entry_queue import EntryQueue, TrioToken
from ._exceptions import Cancelled, RunFinishedError, TrioInternalError
from ._exceptions import (
Cancelled,
CancelReasonLiteral,
RunFinishedError,
TrioInternalError,
)
from ._instrumentation import Instruments
from ._ki import KIManager, enable_ki_protection
from ._parking_lot import GLOBAL_PARKING_LOT_BREAKER
@@ -305,7 +310,7 @@ def expire(self, now: float) -> bool:
did_something = True
# This implicitly calls self.remove(), so we don't need to
# decrement _active here
cancel_scope.cancel()
cancel_scope._cancel(CancelReason(source="deadline"))
# If we've accumulated too many stale entries, then prune the heap to
# keep it under control. (We only do this occasionally in a batch, to
# keep the amortized cost down)
@@ -314,6 +319,20 @@ def expire(self, now: float) -> bool:
return did_something


@attrs.define
class CancelReason:
"""Attached to a :class:`CancelScope` upon cancellation with details of the source of the
cancellation, which is then used to construct the string in a :exc:`Cancelled`.
Users can pass a ``reason`` str to :meth:`CancelScope.cancel` to set it.

Not publicly exported or documented.
"""

source: CancelReasonLiteral
source_task: str | None = None
reason: str | None = None


@attrs.define(eq=False)
class CancelStatus:
"""Tracks the cancellation status for a contiguous extent
@@ -468,6 +487,14 @@ def recalculate(self) -> None:
or current.parent_cancellation_is_visible_to_us
)
if new_state != current.effectively_cancelled:
if (
current._scope._cancel_reason is None
and current.parent_cancellation_is_visible_to_us
):
assert current._parent is not None
current._scope._cancel_reason = (
current._parent._scope._cancel_reason
)
current.effectively_cancelled = new_state
if new_state:
for task in current._tasks:
@@ -558,6 +585,8 @@ class CancelScope:
_cancel_called: bool = attrs.field(default=False, init=False)
cancelled_caught: bool = attrs.field(default=False, init=False)

_cancel_reason: CancelReason | None = attrs.field(default=None, init=False)

# Constructor arguments:
_relative_deadline: float = attrs.field(
default=inf,
@@ -594,7 +623,7 @@ def __enter__(self) -> Self:
self._relative_deadline = inf

if current_time() >= self._deadline:
self.cancel()
self._cancel(CancelReason(source="deadline"))
with self._might_change_registered_deadline():
self._cancel_status = CancelStatus(scope=self, parent=task._cancel_status)
task._activate_cancel_status(self._cancel_status)
@@ -883,19 +912,42 @@ def shield(self, new_value: bool) -> None:
self._cancel_status.recalculate()

@enable_ki_protection
def cancel(self) -> None:
"""Cancels this scope immediately.

This method is idempotent, i.e., if the scope was already
cancelled then this method silently does nothing.
def _cancel(self, cancel_reason: CancelReason | None) -> None:
"""Internal sources of cancellation should use this instead of :meth:`cancel`
in order to set a more detailed :class:`CancelReason`
Helper or high-level functions can use `cancel`.
"""
if self._cancel_called:
return

if self._cancel_reason is None:
self._cancel_reason = cancel_reason

with self._might_change_registered_deadline():
self._cancel_called = True

if self._cancel_status is not None:
self._cancel_status.recalculate()

@enable_ki_protection
def cancel(self, reason: str | None = None) -> None:
"""Cancels this scope immediately.

The optional ``reason`` argument accepts a string, which will be attached to
any resulting :exc:`Cancelled` exception to help you understand where that
cancellation is coming from and why it happened.

This method is idempotent, i.e., if the scope was already
cancelled then this method silently does nothing.
"""
try:
current_task = repr(_core.current_task())
except RuntimeError:
current_task = None
self._cancel(
CancelReason(reason=reason, source="explicit", source_task=current_task)
)

@property
def cancel_called(self) -> bool:
"""Readonly :class:`bool`. Records whether cancellation has been
@@ -924,7 +976,7 @@ def cancel_called(self) -> bool:
# but it makes the value returned by cancel_called more
# closely match expectations.
if not self._cancel_called and current_time() >= self._deadline:
self.cancel()
self._cancel(CancelReason(source="deadline"))
return self._cancel_called


@@ -1192,9 +1244,9 @@ def parent_task(self) -> Task:
"(`~trio.lowlevel.Task`): The Task that opened this nursery."
return self._parent_task

def _add_exc(self, exc: BaseException) -> None:
def _add_exc(self, exc: BaseException, reason: CancelReason | None) -> None:
Review comment by A5rocks (Contributor), Apr 29, 2025:

If the only callers of this are internal, IMO it would be cleaner to have them set the reason inline. Also, to avoid multiple comments for similar things, why doesn't this unconditionally set _cancel_reason = reason if it isn't None?

PR author (Member):

why doesn't this unconditionally set _cancel_reason = reason if it isn't None?

In case we get multiple sources of cancellation, I don't want to override the first one. In other places it's more critical, but here I could see a scenario where:

  1. Something causes a cancellation, be it a deadline or a crashing task or whatever
  2. a different task B gets cancelled, but they have an except Cancelled, and inside that handler they raise a different exception
  3. without the check `if self.cancel_scope._cancel_reason is None:`, the cause would now be set to task B raising an exception

So I'm pretty sure we need the `if`, which means we'd have to write it three times if we did it inline.
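A minimal model of the guard being discussed, mirroring `_cancel` in the diff: once a scope is cancelled, further cancellations are no-ops, and a reason is only recorded if none is set yet (for example, one already inherited from a parent scope). `ScopeModel` and the plain-string reasons are hypothetical simplifications, not trio code.

```python
from __future__ import annotations

from typing import Optional


class ScopeModel:
    """Hypothetical simplification of CancelScope's reason bookkeeping."""

    def __init__(self) -> None:
        self.cancel_called = False
        self.cancel_reason: Optional[str] = None

    def _cancel(self, reason: Optional[str]) -> None:
        if self.cancel_called:
            return  # idempotent: a second cancellation changes nothing
        if self.cancel_reason is None:
            # Keep any reason that was already recorded, e.g. inherited from a parent.
            self.cancel_reason = reason
        self.cancel_called = True


scope = ScopeModel()
scope._cancel("child task raised ValueError")
# Task B raises from its `except Cancelled` handler; the nursery cancels again.
scope._cancel("child task B raised RuntimeError")
print(scope.cancel_reason)  # child task raised ValueError
```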

PR author (Member):

Not relevant anymore.

A5rocks (Contributor), May 1, 2025:

I haven't looked at the code to see why this isn't relevant anymore, but I already typed up a response comment to this:

I'm not entirely convinced avoiding this is a good thing:

import trio

async def crasher():
  await trio.sleep(2)
  raise ValueError("...")

async def main():
  async with trio.open_nursery() as nursery:
    try:
      nursery.start_soon(crasher)
      try:
        await trio.sleep(10)  # app code
      finally:
        with trio.move_on_after(2, shield=True):
          await trio.sleep(3)  # cleanup code
    except trio.Cancelled as c:
      # what should c's cancel reason be?
      raise

trio.run(main)

This might matter, for instance, in code that shuts things down on exceptions and moves on after 2 seconds. If the cleanup code ran over its 2 seconds, the cancel reason would presumably (?) be that the exception happened, not that the timeout happened. I think it would make more sense for the reason to be the timeout.

(I haven't played with this PR yet so I'm not sure that's actually what will happen)


Would it make sense to establish some sort of causal mechanism? I.e. a field on CancelReason that points to the old CancelReason. (I guess Cancelled could store another Cancelled? But that might be bad for cycles.)

PR author (Member):

Yeah, the current behavior is that the crasher becomes the reason.

Storing a chain of Cancelled sounds tricky and very likely to induce gc problems. I'm pretty sure we'd have to store any raised Cancelled in the scope itself in order to be able to refer back to them.

... although the crash cancellation should be accessible somehow in the finally scope to be set as __context__. I wonder where that is getting lost

But storing a chain of reasons would be fairly straightforward and sounds like it might have some good use

PR author (Member):

I thought I found a repro where sys.exc_info() got cleared, but I might have been mistaken, and I don't remember the repro anymore.

But going back to your example:
I have a pretty strong intuition that the reason the nursery scope is canceled is because a child crashed. The deadline is the reason the inner scope inside the finally is canceled, but that cancellation will be swallowed by move_on_after and even in a world where we stored a chain of reasons the nursery scope would never see the deadline cancellation.

Contributor:

Good point: what's the cancel reason visible inside the move_on_after then?

PR author (Member):

I'm pretty sure it was `deadline`, with the Cancelled from the crashing child in its `__context__`, because of the shielding. I can add a test case for it.
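The behavior described here follows from the rule added to `CancelStatus.recalculate`: a scope that becomes cancelled only because its parent's cancellation is visible inherits the parent's reason, unless it already has one. A shielded scope never sees the parent's cancellation, so it keeps its own reason (here, its deadline). The sketch below is a simplified, hypothetical model: `Scope` and `Reason` are not trio classes, and the expiring deadline is reduced to an explicit `cancel` call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reason:
    source: str
    reason: Optional[str] = None


class Scope:
    """Hypothetical model of a cancel scope together with its CancelStatus."""

    def __init__(self, parent: Optional[Scope] = None, shield: bool = False) -> None:
        self.parent = parent
        self.shield = shield
        self.cancel_called = False
        self.cancel_reason: Optional[Reason] = None
        self.effectively_cancelled = False
        self.children: List[Scope] = []
        if parent is not None:
            parent.children.append(self)
        self.recalculate()  # a scope opened inside a cancelled parent starts cancelled

    def parent_cancellation_is_visible(self) -> bool:
        return self.parent is not None and not self.shield and self.parent.effectively_cancelled

    def cancel(self, reason: Reason) -> None:
        if self.cancel_called:
            return
        if self.cancel_reason is None:
            self.cancel_reason = reason
        self.cancel_called = True
        self.recalculate()

    def recalculate(self) -> None:
        visible = self.parent_cancellation_is_visible()
        new_state = self.cancel_called or visible
        if new_state != self.effectively_cancelled:
            if self.cancel_reason is None and visible:
                assert self.parent is not None
                self.cancel_reason = self.parent.cancel_reason  # inherit the parent's reason
            self.effectively_cancelled = new_state
            for child in self.children:
                child.recalculate()


nursery = Scope()
body = Scope(parent=nursery)  # unshielded code inside the nursery
nursery.cancel(Reason("nursery", "child task raised exception ValueError('...')"))
cleanup = Scope(parent=nursery, shield=True)  # like move_on_after(2, shield=True)
cleanup.cancel(Reason("deadline"))  # its 2-second deadline expires

print(body.cancel_reason.source)     # nursery
print(cleanup.cancel_reason.source)  # deadline
```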

Contributor:

Nice, yeah that behavior sounds nice. Returning to the earliest response you have, is that (nursery cancel -> raise a different exception in one of the tasks) the only case where things try to overwrite the cancellation reason? If so, I think it would be nicer to make nurseries not try to cancel if they are already cancelled (which would prevent the cancellation reason from being overwritten).


I also see that changing the deadline can potentially overwrite. I don't see why that would try to cancel anything if things are already cancelled... I guess just code simplicity.

I guess it makes sense to try to handle it in one place with a check on the cancellation reason, then. I just don't like it!

PR author (Member):

haha. Yeah if we rewrote everything from scratch we might implement it differently.

But I think there are a bunch of ways to re-cancel, including simply calling cs.cancel(...) multiple times.

self._pending_excs.append(exc)
self.cancel_scope.cancel()
self.cancel_scope._cancel(reason)

def _check_nursery_closed(self) -> None:
if not any([self._nested_child_running, self._children, self._pending_starts]):
@@ -1210,7 +1262,14 @@ def _child_finished(
) -> None:
self._children.remove(task)
if isinstance(outcome, Error):
self._add_exc(outcome.error)
self._add_exc(
outcome.error,
CancelReason(
source="nursery",
source_task=repr(task),
reason=f"child task raised exception {outcome.error!r}",
),
)
self._check_nursery_closed()

async def _nested_child_finished(
@@ -1220,7 +1279,14 @@ async def _nested_child_finished(
# Returns ExceptionGroup instance (or any exception if the nursery is in loose mode
# and there is just one contained exception) if there are pending exceptions
if nested_child_exc is not None:
self._add_exc(nested_child_exc)
self._add_exc(
nested_child_exc,
reason=CancelReason(
source="nursery",
source_task=repr(self._parent_task),
reason=f"Code block inside nursery contextmanager raised exception {nested_child_exc!r}",
),
)
self._nested_child_running = False
self._check_nursery_closed()

@@ -1231,7 +1297,13 @@ async def _nested_child_finished(
def aborted(raise_cancel: _core.RaiseCancelT) -> Abort:
exn = capture(raise_cancel).error
if not isinstance(exn, Cancelled):
self._add_exc(exn)
self._add_exc(
exn,
CancelReason(
source="KeyboardInterrupt",
source_task=repr(self._parent_task),
),
)
# see test_cancel_scope_exit_doesnt_create_cyclic_garbage
del exn # prevent cyclic garbage creation
return Abort.FAILED
@@ -1245,7 +1317,8 @@ def aborted(raise_cancel: _core.RaiseCancelT) -> Abort:
try:
await cancel_shielded_checkpoint()
except BaseException as exc:
self._add_exc(exc)
# there's no children to cancel, so don't need to supply cancel reason
self._add_exc(exc, reason=None)

popped = self._parent_task._child_nurseries.pop()
assert popped is self
@@ -1575,8 +1648,17 @@ def _attempt_delivery_of_any_pending_cancel(self) -> None:
if not self._cancel_status.effectively_cancelled:
return

reason = self._cancel_status._scope._cancel_reason

def raise_cancel() -> NoReturn:
raise Cancelled._create()
if reason is None:
raise Cancelled._create(source="unknown", reason="misnesting")
else:
raise Cancelled._create(
source=reason.source,
reason=reason.reason,
source_task=reason.source_task,
)

self._attempt_abort(raise_cancel)
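In this hunk, `raise_cancel` copies the fields of the scope's `CancelReason` into the `Cancelled` it raises, and falls back to `source="unknown", reason="misnesting"` when no reason was recorded. A hypothetical, trio-free sketch of that mapping (`FakeCancelled` and `make_cancelled` are illustrative names, not trio's):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CancelReason:
    source: str
    source_task: Optional[str] = None
    reason: Optional[str] = None


class FakeCancelled(BaseException):
    """Hypothetical stand-in for trio.Cancelled with keyword-only fields."""

    def __init__(
        self, *, source: str, source_task: Optional[str] = None, reason: Optional[str] = None
    ) -> None:
        super().__init__(source, source_task, reason)
        self.source = source
        self.source_task = source_task
        self.reason = reason


def make_cancelled(reason: Optional[CancelReason]) -> FakeCancelled:
    # Mirrors the branch in raise_cancel(): no recorded reason is treated as misnesting.
    if reason is None:
        return FakeCancelled(source="unknown", reason="misnesting")
    return FakeCancelled(
        source=reason.source, reason=reason.reason, source_task=reason.source_task
    )
```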

@@ -2075,15 +2157,27 @@ async def init(
)

# Main task is done; start shutting down system tasks
self.system_nursery.cancel_scope.cancel()
self.system_nursery.cancel_scope._cancel(
CancelReason(
source="shutdown",
reason="main task done, shutting down system tasks",
source_task=repr(self.init_task),
)
)

# System nursery is closed; finalize remaining async generators
await self.asyncgens.finalize_remaining(self)

# There are no more asyncgens, which means no more user-provided
# code except possibly run_sync_soon callbacks. It's finally safe
# to stop the run_sync_soon task and exit run().
run_sync_soon_nursery.cancel_scope.cancel()
run_sync_soon_nursery.cancel_scope._cancel(
CancelReason(
source="shutdown",
reason="main task done, shutting down run_sync_soon callbacks",
source_task=repr(self.init_task),
)
)

################
# Outside context problems
@@ -2926,7 +3020,18 @@ async def checkpoint() -> None:
if task._cancel_status.effectively_cancelled or (
task is task._runner.main_task and task._runner.ki_pending
):
with CancelScope(deadline=-inf):
cs = CancelScope(deadline=-inf)
if (
task._cancel_status._scope._cancel_reason is None
and task is task._runner.main_task
and task._runner.ki_pending
):
task._cancel_status._scope._cancel_reason = CancelReason(
source="KeyboardInterrupt"
Review comment (Contributor):

It's kind of strange to me to set this here. Is there no way for the thing raising KeyboardInterrupt to set this cancel reason?

PR author (Member):

Yeah it's a super ugly place.

KeyboardInterrupt is raised in two places. One is Task._attempt_delivery_of_pending_ki, where we ... probably could access self._parent_nursery.cancel_scope to set the reason. But the other one is in _ki.py KIManager.install(), and that one can't see shit. And if we want it to be analogous to Cancelled: Cancelled doesn't set the reason on raising; the reason is set when the scope is cancelled, and Cancelled is only raised later, when a checkpoint is actually hit. And #3233 would have to throw that away.

The current logic doesn't make sense in that regard either, though: the scope has already effectively been cancelled, but it's not until we checkpoint that we set the reason to KeyboardInterrupt.

I'd love to place it in Runner.deliver_ki, which is the one that actually sets ki_pending = True... but navigating from there to the relevant scope and setting a reason is non-trivial.

With the current setup there's even a case for not setting a reason at all, because the scope hasn't actually been cancelled. And during KI we don't raise Cancelled, so there is no consumer for the CancelReason. You can see it through introspection... but that seems like a big stretch during KI.

PR author (Member):

My current error when trying to do it in deliver_ki is that setting self.main_task._cancel_status._scope._cancel_reason fails because _cancel_status can be None... which confuses me, because there's a comment saying it can only be None in the init task, but I'm explicitly accessing main_task :S

But I'm starting to lean towards not setting the reason, though #3233 would have to revisit that.

PR author (Member):

Well, we can just leave the status quo and fix it in #3233, unless we expect the two PRs to land in separate releases.

)
assert task._cancel_status._scope._cancel_reason is not None
cs._cancel_reason = task._cancel_status._scope._cancel_reason
with cs:
await _core.wait_task_rescheduled(lambda _: _core.Abort.SUCCEEDED)

