Skip to content

std: stop using TLS in signal handler #140628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 18, 2025
Merged

Conversation

joboet
Copy link
Member

@joboet joboet commented May 3, 2025

TLS is not async-signal-safe, making its use in the signal handler used to detect stack overflows unsound (c.f. #133698). POSIX however lists two thread-specific identifiers that can be obtained in a signal handler: the current pthread_t and the address of errno. Since pthread_equal is not AS-safe, pthread_t should be considered opaque, so for our purposes, &errno is the only option. This however works nicely: we can use the address as a key into a map that stores information for each thread. This PR uses a BTreeMap protected by a spin lock to hold the guard page address and thread name and thus fixes #133698.

@joboet joboet added A-thread-locals Area: Thread local storage (TLS) T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 3, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 3, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 3, 2025
@rust-log-analyzer

This comment has been minimized.

@workingjubilee
Copy link
Member

thread 'main' panicked at tests/pass/weak_memory/weak.rs:145:5:
assertion failed: std::iter::repeat_with(|| f()).take(100).any(|x| x)

well, that doesn't seem good

@rustbot
Copy link
Collaborator

rustbot commented May 4, 2025

The Miri subtree was changed

cc @rust-lang/miri

@joboet
Copy link
Member Author

joboet commented May 4, 2025

well, that doesn't seem good

That's no cause for concern, these miri tests just assumed that there is no synchronisation between threads created independently from each other. That's no longer true however as the threads all acquire the thread info lock. I've fixed this now by moving the operations around a bit and by using yield_now.

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Miri test changes are right, but I find these tests a lot harder to read now. We should stop using thread::spawn and instead directly call pthread_create or so to avoid the spurious synchronization that std incurs.

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General approach seems fine to me.

//! Unfortunately, because thread local storage isn't async-signal-safe, we
//! cannot soundly use it in our stack overflow handler. While this works
//! without problems on most platforms, it can lead to undefined behaviour
//! on others (such as GNU/Linux). Luckily, the POSIX specification documents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we expect that all cfg(unix) platforms satisfy POSIX? Is that a safe assumption to make?

It seems a little odd to me that errno is somehow special (vs arbitrary thread-locals)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little odd to me that errno is somehow special (vs arbitrary thread-locals)...

The C runtime can rely on the fact that it's always loaded and use e.g. the initial-exec TLS model – which is async-signal-safe – for errno (that's what glibc does). Or they embed errno into the thread control block, which is accessible through the thread pointer (that's what musl does).

I guess we expect that all cfg(unix) platforms satisfy POSIX? Is that a safe assumption to make?

The expectation must only hold for platforms that enable the stack overflow handler.

Copy link
Member

@workingjubilee workingjubilee May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a safe assumption to make on various niche cases, however if they implement any threading, have signals, yet do not implement POSIX correctly-enough for this, they will have a lot of problems that we will immediately notice. In particular, this is a very mild demand: "implement errno in a POSIX-conforming way". If you have threads and signals AND want to run C code, you pretty much have to do this. Your only choice with being non-conformant in that case is to deliberately pick a non-conforming implementation that still somehow doesn't get stomped by C code running concurrently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice the only other implementation of errno I would expect to see is the global version that gets stomped by running concurrently and thus cannot support threading.

// This uses a `BTreeMap` instead of a hashmap since it supports constant
// initialization and automatically reduces the amount of memory used when
// items are removed.
static mut THREAD_INFO: BTreeMap<usize, ThreadInfo> = BTreeMap::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe s/static mut/ with a SyncUnsafeCell or something like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this, but I didn't think it any clearer...

// items are removed.
static mut THREAD_INFO: BTreeMap<usize, ThreadInfo> = BTreeMap::new();

struct UnlockOnDrop;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this be SpinGuard<'a, BTreeMap<usize, ThreadInfo>> or so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would necessitate Deref and DerefMut implementations that aren't really adding any clarity.

}
}

pub fn set_current_info(guard_page_range: Range<usize>, thread_name: Option<Box<str>>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not take the ThreadInfo struct as the argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the struct creation to be unnecessary clutter at the call site.

@Mark-Simulacrum
Copy link
Member

That's no longer true however as the threads all acquire the thread info lock

Are we OK with that guarantee propagating out to "userspace"? I guess it's sort of unavoidable if we want some kind of lock at thread creation/death but it feels a bit like that should be a more intentional decision?

I guess it's undocumented but might still be a bit hard to claw back if we found a way to eliminate the lock later.

@joboet
Copy link
Member Author

joboet commented May 5, 2025

I think the Miri test changes are right, but I find these tests a lot harder to read now.

I've added a helper function that abstracts all of the yield_now logic away, so the tests are hopefully clearer now.

@rust-log-analyzer

This comment has been minimized.

@joboet joboet force-pushed the async_signal_safe branch from e2e4fda to 3280a1f Compare May 5, 2025 09:06
@RalfJung
Copy link
Member

RalfJung commented May 5, 2025

disable the stack overflow handler on miri

Hm, I guess that is the other option. It'd also let us remove some other hacks from Miri probably, since we don't really support the functions that code uses. OTOH it does leave some code without our usual UB test coverage.

TLS is not async-signal-safe, making its use in the signal handler used to detect stack overflows unsound (c.f. rust-lang#133698). POSIX however lists two thread-specific identifiers that can be obtained in a signal handler: the current `pthread_t` and the address of `errno`. Since `pthread_equal` is not AS-safe, `pthread_t` should be considered opaque, so for our purposes, `&errno` is the only option. This however works nicely: we can use the address as a key into a map that stores information for each thread. This PR uses a `BTreeMap` protected by a spin lock to hold the guard page address and thread name and thus fixes rust-lang#133698.
@joboet joboet force-pushed the async_signal_safe branch from 3280a1f to fb91fdb Compare May 5, 2025 13:19
mod thread_info;

// miri doesn't model signals, and this code has some synchronization properties
// that we don't want to expose to user code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// that we don't want to expose to user code.
// that we don't want to expose to user code. Also, Miri fundamentally cannot even have a stack overflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Miri could definitely have stack overflows. It's just that random aborts aren't a useful think to emulate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could Miri have stack overflows? The stack is a Vec<Frame>. We can have a push fail due to OOM, but that will just abort the interpreter aprubtly in a well-defined way.

@Mark-Simulacrum
Copy link
Member

r=me modulo Ralf's suggestion. No strong opinion on the remaining unresolved threads, seems fine to leave as-is.

@joboet joboet force-pushed the async_signal_safe branch from fb91fdb to 2cdbd69 Compare May 16, 2025 12:26
@joboet
Copy link
Member Author

joboet commented May 16, 2025

@bors r=@Mark-Simulacrum

@bors
Copy link
Collaborator

bors commented May 16, 2025

📌 Commit 2cdbd69 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 16, 2025
fmease added a commit to fmease/rust that referenced this pull request May 18, 2025
…imulacrum

std: stop using TLS in signal handler

TLS is not async-signal-safe, making its use in the signal handler used to detect stack overflows unsound (c.f. rust-lang#133698). POSIX however lists two thread-specific identifiers that can be obtained in a signal handler: the current `pthread_t` and the address of `errno`. Since `pthread_equal` is not AS-safe, `pthread_t` should be considered opaque, so for our purposes, `&errno` is the only option. This however works nicely: we can use the address as a key into a map that stores information for each thread. This PR uses a `BTreeMap` protected by a spin lock to hold the guard page address and thread name and thus fixes rust-lang#133698.
bors added a commit to rust-lang-ci/rust that referenced this pull request May 18, 2025
Rollup of 6 pull requests

Successful merges:

 - rust-lang#127013 (Add `f16` formatting and parsing)
 - rust-lang#140154 (Cygwin support in rustc)
 - rust-lang#140628 (std: stop using TLS in signal handler)
 - rust-lang#140966 (Remove #![feature(let_chains)] from library and src/librustdoc)
 - rust-lang#140994 (replace `cc_detect::cc2ar` with `cc::try_get_archiver`)
 - rust-lang#141127 (bump windows crate for compiler,bootstrap and tools)

r? `@ghost`
`@rustbot` modify labels: rollup
fmease added a commit to fmease/rust that referenced this pull request May 18, 2025
…imulacrum

std: stop using TLS in signal handler

TLS is not async-signal-safe, making its use in the signal handler used to detect stack overflows unsound (c.f. rust-lang#133698). POSIX however lists two thread-specific identifiers that can be obtained in a signal handler: the current `pthread_t` and the address of `errno`. Since `pthread_equal` is not AS-safe, `pthread_t` should be considered opaque, so for our purposes, `&errno` is the only option. This however works nicely: we can use the address as a key into a map that stores information for each thread. This PR uses a `BTreeMap` protected by a spin lock to hold the guard page address and thread name and thus fixes rust-lang#133698.
bors added a commit to rust-lang-ci/rust that referenced this pull request May 18, 2025
Rollup of 10 pull requests

Successful merges:

 - rust-lang#127013 (Add `f16` formatting and parsing)
 - rust-lang#138940 (Stabilize the avx512 target features)
 - rust-lang#140154 (Cygwin support in rustc)
 - rust-lang#140490 (split `asm!` parsing and validation)
 - rust-lang#140628 (std: stop using TLS in signal handler)
 - rust-lang#140746 (name resolution for guard patterns)
 - rust-lang#140926 (Return value of coroutine_layout fn changed to Result with LayoutError)
 - rust-lang#141127 (bump windows crate for compiler,bootstrap and tools)
 - rust-lang#141214 (Miri subtree update)
 - rust-lang#141218 (gvn: avoid creating overlapping assignments)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request May 18, 2025
Rollup of 8 pull requests

Successful merges:

 - rust-lang#138940 (Stabilize the avx512 target features)
 - rust-lang#140490 (split `asm!` parsing and validation)
 - rust-lang#140628 (std: stop using TLS in signal handler)
 - rust-lang#140746 (name resolution for guard patterns)
 - rust-lang#140926 (Return value of coroutine_layout fn changed to Result with LayoutError)
 - rust-lang#141127 (bump windows crate for compiler,bootstrap and tools)
 - rust-lang#141214 (Miri subtree update)
 - rust-lang#141218 (gvn: avoid creating overlapping assignments)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 6f415e0 into rust-lang:master May 18, 2025
6 checks passed
@rustbot rustbot added this to the 1.89.0 milestone May 18, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 18, 2025
Rollup merge of rust-lang#140628 - joboet:async_signal_safe, r=Mark-Simulacrum

std: stop using TLS in signal handler

TLS is not async-signal-safe, making its use in the signal handler used to detect stack overflows unsound (c.f. rust-lang#133698). POSIX however lists two thread-specific identifiers that can be obtained in a signal handler: the current `pthread_t` and the address of `errno`. Since `pthread_equal` is not AS-safe, `pthread_t` should be considered opaque, so for our purposes, `&errno` is the only option. This however works nicely: we can use the address as a key into a map that stores information for each thread. This PR uses a `BTreeMap` protected by a spin lock to hold the guard page address and thread name and thus fixes rust-lang#133698.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-thread-locals Area: Thread local storage (TLS) O-unix Operating system: Unix-like S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std assumes that accessing an already-initialized TLS variable is async-signal-safe
8 participants