|
| 1 | +- Affected components: `bare-metal`; `cortex-m` and similar |
| 2 | +- Feature Name: sound_mutex |
| 3 | +- Start Date: 2019-10-23 |
| 4 | +- RFC PR: (leave this empty) |
| 5 | +- Rust Issue: (leave this empty) |
| 6 | + |
| 7 | +# Summary |
| 8 | + |
| 9 | +`bare_metal::Mutex`, re-exported in the `cortex-m` and other crates, is a mutex |
| 10 | +based on critical sections that temporarily disable all interrupts. As provided |
| 11 | +today this abstraction is unsound in multi-core context because `Mutex` can be |
| 12 | +stored in `static` variables, which are visible to all cores, but interrupt |
| 13 | +masking is *not* sufficient to synchronize (potentially) parallel access (i.e. |
| 14 | +access from different cores) to memory. |
| 15 | + |
| 16 | +This document proposes that we deprecate the existing `Mutex` abstraction in |
| 17 | +favor of a mutex that properly expresses the idea that mutexes based on |
| 18 | +interrupt-masking is only "Sync" in single-core context. |
| 19 | + |
| 20 | +# Motivation |
| 21 | + |
| 22 | +Today it's possible to write multi-core applications for homogeneous multi-core |
| 23 | +devices on stable Rust, as evidenced in the [lpcxpresso55S69] repository. In |
| 24 | +fact, it has been possible to write multi-core Cortex-M applications since Rust |
| 25 | +1.30.0, the release in which it became possible to write single-core Cortex-M |
| 26 | +applications on stable. |
| 27 | + |
| 28 | +[lpcxpresso55S69]: https://github.com/japaric/lpcxpresso55S69 |
| 29 | + |
| 30 | +Writing homogeneous multi-core applications requires no special build steps and |
| 31 | +these programs make use of the same compilation targets used for single-core |
| 32 | +applications (e.g. `thumbv7m-none-eabi`). For this reason, it is not possible to |
| 33 | +restrict the parts of crates like `cortex-m` that these multi-core applications |
| 34 | +can use. Therefore, these crates can *not* assume that they will only be used in |
| 35 | +single-core contexts as that may to lead to abstractions that are single-core |
| 36 | +sound but multi-core unsound. |
| 37 | + |
| 38 | +`cortex_m::Mutex` is one abstraction that assumes a single-core context. The |
| 39 | +following example demonstrates how this "safe" abstraction leads to a data race |
| 40 | +in a multi-core application. |
| 41 | + |
| 42 | +``` rust |
| 43 | +use core::cell::Cell; |
| 44 | +use cortex_m::{Mutex, interrupt}; |
| 45 | + |
| 46 | +static X: Mutex<Cell<u64>> = Mutex::new(Cell::new(0)); |
| 47 | + |
| 48 | +#[entry(0)] // entry point of core #0 |
| 49 | +fn main0() { |
| 50 | + interrupt::free(|cs| loop { |
| 51 | + let x = X.borrow(cs); |
| 52 | + x.set(x.get() + 1); // (A) |
| 53 | + }); |
| 54 | +} |
| 55 | + |
| 56 | +#[entry(1)] |
| 57 | +fn main1() { |
| 58 | + interrupt::free(|cs| loop { |
| 59 | + let x = X.borrow(cs); |
| 60 | + x.set(!x.get()); // (B) |
| 61 | + }); |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +Here the statements A and B run in parallel because disabling the interrupts on |
| 66 | +one core has no effect on other cores. |
| 67 | + |
| 68 | +Proliferation of this unsound `Mutex` API, specially in libraries, will |
| 69 | +inevitably lead to hard to debug soundness issues in multi-core programs (the |
| 70 | +root of the problem could be in a crate deep in the dependency graph). It is |
| 71 | +important to promptly remove this unsound API before multi-core applications |
| 72 | +become more widespread. |
| 73 | + |
| 74 | +# Detailed design |
| 75 | + |
| 76 | +## `SingleCore*` |
| 77 | + |
| 78 | +This document proposes adopting a new set of `Send` / `Sync` traits that |
| 79 | +are "only needs to be sound in single-core context" variants of the ones |
| 80 | +provided by the `core` crate. The `Send` and `Sync` traits in `core` will |
| 81 | +continue to mean "must be sound in single-core AND multi-core context". These |
| 82 | +traits will go into a separate crate so they can be used by other libraries and |
| 83 | +frameworks. The full implementation of these traits is shown below -- all names |
| 84 | +are up for debate: |
| 85 | + |
| 86 | +``` rust |
| 87 | +// crate: single-core-send-sync |
| 88 | + |
| 89 | +pub auto trait SingleCoreSend {} |
| 90 | + |
| 91 | +pub auto trait SingleCoreSync {} |
| 92 | + |
| 93 | +// replicate all the `Send` / `Sync` impls in core |
| 94 | +impl<T> !SingleCoreSync for *mut T {} |
| 95 | +impl<T> !SingleCoreSync for *const T {} |
| 96 | +unsafe impl<'a, T> SingleCoreSend for &'a T where T: SingleCoreSync {} |
| 97 | +unsafe impl<'a, T> SingleCoreSend for &'a mut T where T: SingleCoreSync {} |
| 98 | +// .. |
| 99 | + |
| 100 | +// all multi-core Send types are also single-core Send |
| 101 | +unsafe impl<T> SingleCoreSend for T where T: Send {} |
| 102 | + |
| 103 | +// all multi-core Sync types are also single-core Sync |
| 104 | +unsafe impl<T> SingleCoreSync for T where T: Sync {} |
| 105 | +``` |
| 106 | + |
| 107 | +With these traits in place the existing `Mutex` can be replaced with one that |
| 108 | +implements `SingleCoreSync`, but not `Sync`. |
| 109 | + |
| 110 | +``` rust |
| 111 | +// crate: bare-metal |
| 112 | + |
| 113 | +pub struct SingleCoreMutex<T> { /* .. */ } |
| 114 | + |
| 115 | +unsafe impl<T> SingleCoreSync for SingleCoreMutex<T> {} |
| 116 | + |
| 117 | +impl<T> SingleCoreMutex<T> { |
| 118 | + pub const fn new(data: T) -> Self { |
| 119 | + // .. |
| 120 | + } |
| 121 | + |
| 122 | + // .. and the rest of the `Mutex` API (e.g. `borrow`) .. |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +### Usage |
| 127 | + |
| 128 | +It is not possible to instantiate the `SingleCoreMutex` type in a static |
| 129 | +variable because it does not implement the `Sync` trait -- this is a soundness |
| 130 | +requirement and not a shortcoming; but existing and upcoming concurrency |
| 131 | +frameworks / libraries can loosen their `Send` bounds into `SingleCoreSend` |
| 132 | +bounds to accept `SingleCoreMutex` values. |
| 133 | + |
| 134 | +The code snippet below shows an hypothetical API to dynamically register |
| 135 | +interrupt handlers. This API uses the `SingleCoreSend` trait instead of the |
| 136 | +`Send` bound because it doesn't enable multi-core concurrency (the registered |
| 137 | +interrupt handlers will run on core that registered the handler). |
| 138 | + |
| 139 | +``` rust |
| 140 | +/// registers an interrupt handler in the vector table of the _calling_ core |
| 141 | +fn register(interrupt: Interrupt, handler: F) |
| 142 | +where |
| 143 | + F: FnMut() + SingleCoreSend + 'static, |
| 144 | + // ^^^^^^^^^^^^^^ |
| 145 | +{ |
| 146 | + // .. |
| 147 | +} |
| 148 | + |
| 149 | +#[entry] |
| 150 | +fn main() { |
| 151 | + // remember: there's a source code level transformation here; `X` and `Y` |
| 152 | + // have type `&'static mut _` |
| 153 | + static mut X: Mutex<Cell<u64>> = Mutex::new(Cell::new(0u64)); |
| 154 | + static mut Y: AtomicU32 = AtomicU32::new(0); |
| 155 | + |
| 156 | + let x: &'static _ = X; // coerces `&'static mut T` into `&'static T` |
| 157 | + |
| 158 | + register(Interrupt::A, || { |
| 159 | + let z: &'static Mutex<Cell<u64>> = x; // `x` is copied here |
| 160 | + // ^^^^^^^^^^^^^^^^^^^^^^^^^ OK; this is a `SingleCoreSend` type |
| 161 | + |
| 162 | + // .. do stuff with `z` .. |
| 163 | + }); |
| 164 | + |
| 165 | + let y: &'static _ = Y; |
| 166 | + |
| 167 | + register(Interrupt::B, || { |
| 168 | + let z: &'static AtomicU32 = y; // `y` is copied here |
| 169 | + // ^^^^^^^^^^^^^^^^^^ OK; this is a `Send` type |
| 170 | + |
| 171 | + // .. do stuff with `z` .. |
| 172 | + }); |
| 173 | + |
| 174 | + loop { |
| 175 | + // do stuff with `x` and `y` |
| 176 | + } |
| 177 | +} |
| 178 | +``` |
| 179 | + |
| 180 | +Here the statically allocated variables, `X` and `Y`, are accessed concurrently |
| 181 | +from `main` and the interrupt handlers `A` and `B`. All accesses are performed |
| 182 | +by the same core so `SingleCoreSend` is the right bound to use and it's OK to |
| 183 | +use the `SingleCoreMutex`. |
| 184 | + |
| 185 | +### Blockers |
| 186 | + |
| 187 | +The `SingleCore*` API depends on the `auto trait` feature, which as of Rust |
| 188 | +1.38.0 is still unstable (gated behind the `optin_builtin_traits` feature gate). |
| 189 | +Thus the implementation of this API will need to wait until the feature is |
| 190 | +stabilized. One could implement this API behind a Cargo feature that requires |
| 191 | +nightly Rust but that would hinder its adoption so this proposal advises against |
| 192 | +doing that. |
| 193 | + |
| 194 | +## Intermediate step |
| 195 | + |
| 196 | +Because the `SingleCore*` API is, time-wise, far away this document proposes, as |
| 197 | +an intermediate step, landing an `unsafe` `RacyMutex` type and deprecating / |
| 198 | +removing `Mutex` *today*. |
| 199 | + |
| 200 | +`RacyMutex` will pretty much replicate the API of `Mutex`, including its `Sync` |
| 201 | +implementation, but will have an `unsafe` constructor. The safety contract of |
| 202 | +the constructor is that the value has to be created in a single-core context; |
| 203 | +only within that context the rest of the `RacyMutex` API will be sound. |
| 204 | + |
| 205 | +The main parts of the `RacyMutex` API are shown below: |
| 206 | + |
| 207 | +``` rust |
| 208 | +// crate: bare-metal |
| 209 | + |
| 210 | +/// A mutex that's racy in multi-core contexts |
| 211 | +pub struct RacyMutex<T> { /* .. */ } |
| 212 | + |
| 213 | +/// IMPORTANT: `RacyMutex` is only `Sync` in a single-core context |
| 214 | +unsafe impl<T> Sync for RacyMutex<T> {} |
| 215 | + |
| 216 | +impl<T> RacyMutex<T> { |
| 217 | + /// Creates a new `RacyMutex` |
| 218 | + /// |
| 219 | + /// # Safety |
| 220 | + /// |
| 221 | + /// By constructing a `RacyMutex` the caller is asserting that the value |
| 222 | + /// will only be used in a single-core context; this is the case, for |
| 223 | + /// example, if the constructor is called in a single-core application. |
| 224 | + /// |
| 225 | + /// Using `RacyMutex` in a multi-core context is unsound. For that reason, |
| 226 | + /// `RacyMutex` must NEVER be used in a general purpose library; it should |
| 227 | + /// only be used in (a) HALs for single-core devices, and (b) applications |
| 228 | + /// for single-core devices. |
| 229 | + pub const unsafe fn new(data: T) -> Self { |
| 230 | + // .. |
| 231 | + } |
| 232 | + |
| 233 | + |
| 234 | + // .. and the rest of the `Mutex` API .. |
| 235 | +} |
| 236 | +``` |
| 237 | + |
| 238 | +### Migration path |
| 239 | + |
| 240 | +Existing applications that use `bare_metal::Mutex`-es in static variables can |
| 241 | +easily migrate to `RacyMutex` by wrapping the constructors in an `unsafe` block |
| 242 | +and replacing `Mutex` with `RacyMutex`. |
| 243 | + |
| 244 | +``` rust |
| 245 | +// this |
| 246 | +// static SHALL_WE_DANCE: Mutex<RefCell<Option<Partner>>> = Mutex::new(/* .. */); |
| 247 | + |
| 248 | +// becomes |
| 249 | +static SHALL_WE_DANCE: RacyMutex<RefCell<Option<Partner>>> = unsafe { |
| 250 | + // NOTE(unsafe): this is a single-core application |
| 251 | + RacyMutex::new(/* .. */) |
| 252 | +}; |
| 253 | + |
| 254 | +#[entry] |
| 255 | +fn main() { |
| 256 | + // .. |
| 257 | +} |
| 258 | + |
| 259 | +#[exception] |
| 260 | +fn SysTick() { |
| 261 | + interrupt::free(|cs| { |
| 262 | + // this stays the same (safe) |
| 263 | + let yes = SHALL_WE_DANCE.borrow(cs).borrow_mut().take().unwrap(); |
| 264 | + // .. |
| 265 | + }); |
| 266 | +} |
| 267 | +``` |
| 268 | + |
| 269 | +# Unresolved questions |
| 270 | + |
| 271 | +- Bikeshed all names |
| 272 | + |
| 273 | +- Should `RacyMutex` be removed once the `SingleCore*` API has been implemented? |
| 274 | + Or should both co-exist side by side? |
| 275 | + |
| 276 | +# Alternatives |
| 277 | + |
| 278 | +- Come up with a interrupt-masking / spinlock hybrid implementation for `Mutex` |
| 279 | + that's both single-core and multi-core `Sync`. |
| 280 | + |
| 281 | +- Remove `bare_metal::Mutex` without replacement. |
0 commit comments