diff --git a/rfcs/0000-acle.md b/rfcs/0000-acle.md
new file mode 100644
index 00000000..4688bb2b
--- /dev/null
+++ b/rfcs/0000-acle.md
@@ -0,0 +1,532 @@
+# Summary
+
+The purpose of this RFC is to provide an API to access low-level,
+ARM-specific SIMD and non-SIMD instructions on *stable* Rust.
+
+# Motivation
+
+*TODO* (stable channel, less error prone than writing your own assembly, no
+extra build dependency to assemble external files (e.g. `arm-none-eabi-gcc`))
+
+# Detailed design
+
+## ACLE: What and why
+
+*TODO*
+
+> The Arm architecture includes features that go beyond the set of operations
+> available to C/C++ programmers. The intention of the Arm C Language Extensions
+> (ACLE) is to allow the writing of applications and middleware code that is
+> portable across compilers, and across Arm architecture variants, while
+> exploiting the advanced features of the Arm architecture.
+
+## Memory barriers
+
+Reference: Section 7.3 "Memory barriers" of [ACLE].
+
+### API
+
+The following API will be available under `{core,std}::arch::{arm,aarch64}` and
+will be available for both the "arm" and "aarch64" architectures unless
+indicated otherwise (see `target_arch`).
+
+``` rust
+/// Generates a DMB (data memory barrier) instruction or equivalent CP15 instruction.
+///
+/// DMB ensures the observed ordering of memory accesses. Memory accesses of the
+/// specified type issued before the DMB are guaranteed to be observed (in the
+/// specified scope) before memory accesses issued after the DMB.
+///
+/// For example, DMB should be used between storing data, and updating a flag
+/// variable that makes that data available to another core.
+///
+/// The __dmb() intrinsic also acts as a compiler memory barrier of the
+/// appropriate type.
+pub unsafe fn __dmb<A>(arg: A) where A: Dmb { /* .. */ }
+
+/// Generates a DSB (data synchronization barrier) instruction or equivalent
+/// CP15 instruction.
+///
+/// DSB ensures the completion of memory accesses.
+/// A DSB behaves as the equivalent DMB and has additional properties. After a
+/// DSB instruction completes, all memory accesses of the specified type issued
+/// before the DSB are guaranteed to have completed.
+///
+/// The __dsb() intrinsic also acts as a compiler memory barrier of the
+/// appropriate type.
+pub unsafe fn __dsb<A>(arg: A) where A: Dsb { /* .. */ }
+
+/// Generates an ISB (instruction synchronization barrier) instruction or
+/// equivalent CP15 instruction.
+///
+/// This instruction flushes the processor pipeline fetch buffers, so that
+/// following instructions are fetched from cache or memory.
+///
+/// An ISB is needed after some system maintenance operations. An ISB is also
+/// needed before transferring control to code that has been loaded or modified
+/// in memory, for example by an overlay mechanism or just-in-time code
+/// generator. (Note that if instruction and data caches are separate,
+/// privileged cache maintenance operations would be needed in order to unify
+/// the caches.)
+///
+/// The only supported argument for the __isb() intrinsic is `SY` (15 in the C
+/// API), corresponding to the SY (full system) scope of the ISB instruction.
+pub unsafe fn __isb<A>(arg: A) where A: Isb { /* ..
*/ }
+
+// Arguments to the above intrinsics
+
+/// Full system is the required shareability domain, reads and writes are the
+/// required access types
+pub struct SY;
+
+/// Full system is the required shareability domain, writes are the required
+/// access type
+pub struct ST;
+
+/// Full system is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct LD;
+
+/// Inner Shareable is the required shareability domain, reads and writes are
+/// the required access types
+pub struct ISH;
+
+/// Inner Shareable is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct ISHLD;
+
+/// Inner Shareable is the required shareability domain, writes are the required
+/// access type
+pub struct ISHST;
+
+/// Non-shareable is the required shareability domain, reads and writes are the
+/// required access types
+pub struct NSH;
+
+/// Non-shareable is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct NSHLD;
+
+/// Non-shareable is the required shareability domain, writes are the required
+/// access type
+pub struct NSHST;
+
+/// Outer Shareable is the required shareability domain, reads and writes are
+/// the required access types
+pub struct OSH;
+
+/// Outer Shareable is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct OSHLD;
+
+/// Outer Shareable is the required shareability domain, writes are the required
+/// access type
+pub struct OSHST;
+
+// The following `struct`s implement the `Dmb` and `Dsb` traits:
+// SY, LD, ST, ISH, ISHLD, ISHST, NSH, NSHLD, NSHST, OSH, OSHLD, OSHST
+//
+// Only SY implements the `Isb` trait
+```
+
+### Example usage
+
+``` rust
+use core::arch::arm::{self, SY};
+
+unsafe {
+    // omitted: write to the SCB peripheral to invalidate some cache
+
+    arm::__dsb(SY);
+    arm::__isb(SY);
+}
+```
+
+In C, this would be written as:
+
+``` c
+// omitted part
+
+__dsb(0xF);
+__isb(0xF);
+```
+
+### Implementation
+
+Quoting [ACLE][] (Section 7.3):
+
+> The intrinsics in this section are available for all targets. They may be
+> no-ops (i.e. generate no code, but possibly act as a code motion barrier in
+> compilers) on targets where the relevant instructions do not exist, but only
+> if the property they guarantee would have held anyway. On targets where the
+> relevant instructions exist but are implemented as no-ops, these intrinsics
+> generate the instructions.
+
+Furthermore, Table 10.1 in [ACLE] indicates:
+
+| Instruction | Flags | Arch.     | Intrinsic or C code |
+| ----------- | ----- | --------- | ------------------- |
+| DMB         |       | 8, 7, 6-M | `__dmb`             |
+| DSB         |       | 8, 7, 6-M | `__dsb`             |
+| ISB         |       | 8, 7, 6-M | `__isb`             |
+
+Where the architecture numbers mean (quoting Section 10.1 of [ACLE]):
+
+> Architecture 8 means Armv8-A AArch32 and AArch64, 8-32 means Armv8-AArch32
+> only.
+
+> Architecture 7 means Armv7-A and Armv7-R.
+
+> In the sequence of Thumb-only architectures { 6-M, 7-M, 7E-M } each
+> architecture includes its predecessor instruction set.
+
+Thus the memory barriers will be implemented as follows:
+
+``` rust
+pub trait Dmb: sealed::Trait {
+    unsafe fn dmb(&self);
+}
+
+mod sealed {
+    // Must be `pub` so that the parent module can name it as a supertrait
+    pub trait Trait {}
+}
+
+pub unsafe fn __dmb<A>(arg: A) where A: Dmb {
+    arg.dmb()
+}
+
+impl sealed::Trait for SY {}
+
+impl Dmb for SY {
+    #[cfg(any(
+        target_feature = "mclass", // 6-M
+        target_feature = "v7",     // 7
+        target_arch = "aarch64"    // 8
+    ))]
+    unsafe fn dmb(&self) {
+        asm!("dmb 0xF" : : : "memory" : "volatile");
+    }
+
+    #[cfg(not(/* like above */))]
+    unsafe fn dmb(&self) {
+        // No-op but still a compiler barrier because of "memory"
+        asm!("" : : : "memory" : "volatile");
+    }
+}
+```
+
+That is, on sub-architectures where the DMB instruction doesn't exist, the
+`__dmb` intrinsic will be equivalent to a compiler barrier.
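
The sealed marker-type pattern described above can be tried on any host with the rough sketch below. It is not the proposed implementation: the names `dmb`, `isb`, `Sealed` and `OPTION` are hypothetical, only two of the marker `struct`s are shown, and `core::sync::atomic::compiler_fence` stands in for the inline assembly, which mirrors the compiler-barrier fallback but emits no DMB or ISB instruction.

```rust
use core::sync::atomic::{compiler_fence, Ordering};

mod sealed {
    // Public trait in a private module: other crates cannot name it, so
    // they cannot implement `Dmb` or `Isb` for their own types.
    pub trait Sealed {}
}

/// Marker for arguments accepted by `dmb` (hypothetical stand-in for `__dmb`).
pub trait Dmb: sealed::Sealed {
    /// Barrier option encoding, as passed to the C intrinsic (0xF for SY).
    const OPTION: u8;
}

/// Marker for arguments accepted by `isb` (hypothetical stand-in for `__isb`).
pub trait Isb: sealed::Sealed {}

pub struct SY;
pub struct ISH;

impl sealed::Sealed for SY {}
impl sealed::Sealed for ISH {}

impl Dmb for SY {
    const OPTION: u8 = 0xF;
}
impl Dmb for ISH {
    const OPTION: u8 = 0xB;
}

// Only SY is a valid ISB argument.
impl Isb for SY {}

/// Compiler barrier only; returns the option that a real DMB would encode.
pub fn dmb<A: Dmb>(_arg: A) -> u8 {
    compiler_fence(Ordering::SeqCst);
    A::OPTION
}

/// Compiler barrier only; a real implementation would emit ISB here.
pub fn isb<A: Isb>(_arg: A) {
    compiler_fence(Ordering::SeqCst);
}

fn main() {
    assert_eq!(dmb(SY), 0xF);
    assert_eq!(dmb(ISH), 0xB);
    isb(SY);
    // isb(ISH); // would not compile: `ISH` does not implement `Isb`
}
```

The point of the design is that an invalid barrier argument becomes a type error instead of a run-time or assembler error, which the commented-out `isb(ISH)` call illustrates.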
+
+## Hints
+
+References: Section 7.4 "Hints" and Section 7.7 "NOP" of [ACLE].
+
+### API
+
+The following API will be available under `{core,std}::arch::{arm,aarch64}` and
+will be available for both the "arm" and "aarch64" architectures unless
+indicated otherwise.
+
+``` rust
+/// Generates a WFI (wait for interrupt) hint instruction, or nothing.
+///
+/// The WFI instruction allows (but does not require) the processor to enter a
+/// low-power state until one of a number of asynchronous events occurs.
+pub unsafe fn __wfi() { /* .. */ }
+
+/// Generates a WFE (wait for event) hint instruction, or nothing.
+///
+/// The WFE instruction allows (but does not require) the processor to enter a
+/// low-power state until some event occurs such as a SEV being issued by
+/// another processor.
+pub unsafe fn __wfe() { /* .. */ }
+
+/// Generates a SEV (send a global event) hint instruction.
+///
+/// This causes an event to be signaled to all processors in a multiprocessor
+/// system. It is a NOP on a uniprocessor system.
+pub unsafe fn __sev() { /* .. */ }
+
+/// Generates a SEVL (send a local event) hint instruction.
+///
+/// This causes an event to be signaled to only the processor executing this
+/// instruction. In a multiprocessor system, it is not required to affect the
+/// other processors.
+pub unsafe fn __sevl() { /* .. */ }
+
+/// Generates a YIELD hint instruction.
+///
+/// This enables multithreading software to indicate to the hardware that it is
+/// performing a task, for example a spin-lock, that could be swapped out to
+/// improve overall system performance.
+pub unsafe fn __yield() { /* .. */ }
+
+/// Generates a DBG instruction.
+///
+/// This provides a hint to debugging and related systems. The argument must be
+/// a constant integer from 0 to 15 inclusive. See implementation documentation
+/// for the effect (if any) of this instruction and the meaning of the
+/// argument. This is available only when compiling for AArch32.
+#[rustc_args_required_const(0)]
+pub unsafe fn __dbg(_: u32) { /* .. */ }
+
+/// Generates an unspecified no-op instruction.
+///
+/// Note that not all architectures provide a distinguished NOP instruction. On
+/// those that do, it is unspecified whether this intrinsic generates it or
+/// another instruction. It is not guaranteed that inserting this instruction
+/// will increase execution time.
+pub unsafe fn __nop() { /* .. */ }
+```
+
+### Implementation
+
+Quoting Section 7.4 "Hints" of [ACLE][]:
+
+> The intrinsics in this section are available for all targets. They may be
+> no-ops (i.e. generate no code, but possibly act as a code motion barrier in
+> compilers) on targets where the relevant instructions do not exist. On
+> targets where the relevant instructions exist but are implemented as no-ops,
+> these intrinsics generate the instructions.
+
+So, as in the implementation of memory barriers, these intrinsics will do
+nothing on *some* sub-architectures.
+
+## System register access
+
+Reference: Section 9 "System register access" of [ACLE].
+
+### API
+
+The following API will be available under `{core,std}::arch::{arm,aarch64}` and
+will be available for both the "arm" and "aarch64" architectures.
+
+``` rust
+/// Reads a 32-bit system register
+pub unsafe fn __arm_rsr<R>(special_register: R) -> u32
+where
+    R: Rsr
+{
+    /* .. */
+}
+
+/// Reads a 64-bit system register
+pub unsafe fn __arm_rsr64<R>(special_register: R) -> u64
+where
+    R: Rsr64
+{
+    /* .. */
+}
+
+/// Reads a system register containing an address
+pub unsafe fn __arm_rsrp<R>(special_register: R) -> *const c_void
+where
+    R: Rsrp
+{
+    /* .. */
+}
+
+/// Writes a 32-bit system register
+pub unsafe fn __arm_wsr<R>(special_register: R, value: u32)
+where
+    R: Wsr
+{
+    /* .. */
+}
+
+/// Writes a 64-bit system register
+pub unsafe fn __arm_wsr64<R>(special_register: R, value: u64)
+where
+    R: Wsr64
+{
+    /* ..
*/
+}
+
+/// Writes a system register containing an address
+pub unsafe fn __arm_wsrp<R>(special_register: R, value: *const c_void)
+where
+    R: Wsrp
+{
+    /* .. */
+}
+```
+
+The values that can be used for the `special_register` argument depend on the
+target sub-architecture and whether the register is a 32-bit register or a
+64-bit register.
+
+Possible 32-bit system registers (see `__arm_rsr`, `__arm_rsrp`, `__arm_wsr`,
+and `__arm_wsrp`) include:
+
+- The values accepted in the `spec_reg` field of the MRS instruction, for
+  example CPSR. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the MSR (immediate)
+  instruction. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the VMRS instruction, for
+  example FPSID. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the VMSR instruction, for
+  example FPSCR. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the MSR and MRS instructions
+  with virtualization extensions, for example ELR_Hyp. See [ARMARM] for more
+  details.
+
+- The values specified in "Special register encodings used in Armv7-M system
+  instructions", for example PRIMASK. See [ARMv7M] for more details.
+
+Possible 64-bit system registers (see `__arm_rsr64` and `__arm_wsr64`)
+include:
+
+- The values accepted in the `pstatefield` of the MSR (immediate) instruction.
+  See [ARMARMv8] for more details.
+
+### Example usage
+
+``` rust
+use core::arch::arm::{self, BASEPRI};
+
+unsafe {
+    let new_val: u32 = /* .. */;
+    let f = /* some closure */;
+
+    let old_val = arm::__arm_rsr(BASEPRI);
+
+    // start of critical section
+    arm::__arm_wsr(BASEPRI, new_val);
+
+    f();
+
+    // end of critical section
+    arm::__arm_wsr(BASEPRI, old_val);
+}
+```
+
+In C you would write:
+
+``` c
+uint32_t old_val, new_val;
+void (*f)(void);
+
+new_val = /* .. */;
+f = /* ..
*/;
+
+old_val = __arm_rsr("BASEPRI");
+__arm_wsr("BASEPRI", new_val);
+f();
+__arm_wsr("BASEPRI", old_val);
+```
+
+### Implementation
+
+The important part of the implementation is ensuring that special register
+`struct`s are only available on the sub-architectures where they are
+physically present.
+
+``` rust
+pub trait Rsr: sealed::Trait {
+    unsafe fn rsr(&self) -> u32;
+}
+
+pub trait Wsr: sealed::Trait {
+    unsafe fn wsr(&self, value: u32);
+}
+
+mod sealed {
+    // Must be `pub` so that the parent module can name it as a supertrait
+    pub trait Trait {}
+}
+
+pub unsafe fn __arm_rsr<R>(special_register: R) -> u32
+where
+    R: Rsr
+{
+    // No trailing semicolon: the value read is the return value
+    special_register.rsr()
+}
+
+pub unsafe fn __arm_wsr<R>(special_register: R, value: u32)
+where
+    R: Wsr
+{
+    special_register.wsr(value)
+}
+
+#[cfg(target_feature = "mclass")]
+pub struct BASEPRI;
+
+#[cfg(target_feature = "mclass")]
+impl sealed::Trait for BASEPRI {}
+
+#[cfg(target_feature = "mclass")]
+impl Rsr for BASEPRI {
+    unsafe fn rsr(&self) -> u32 {
+        let r: u32;
+        asm!("mrs $0, BASEPRI" : "=r"(r) ::: "volatile");
+        r
+    }
+}
+
+#[cfg(target_feature = "mclass")]
+impl Wsr for BASEPRI {
+    unsafe fn wsr(&self, value: u32) {
+        asm!("msr BASEPRI, $0" :: "r"(value) : "memory" : "volatile");
+    }
+}
+```
+
+## 32-bit SIMD
+
+Reference: Section 8.5 "32-bit SIMD intrinsics" of [ACLE].
+
+*TODO*, but at least I should tell you that this is *not* about NEON. [ACLE]
+says so in Section 8.5.1 "Availability":
+
+> Armv6 introduced instructions to perform 32-bit SIMD operations (i.e. two
+> 16-bit operations or four 8-bit operations) on the Arm general-purpose
+> registers. These instructions are not related to the much more versatile
+> Advanced SIMD (NEON) extension, whose support is described in Advanced SIMD
+> (NEON) intrinsics.
+
+And in Section 5.4.9 "32-bit SIMD instructions":
+
+> __ARM_FEATURE_SIMD32 is defined to 1 if the 32-bit SIMD instructions are
+> supported and the intrinsics defined in 32-bit SIMD intrinsics are available.
+> This also implies support for the GE global flags which indicate byte-by-byte
+> comparison results.
+
+> __ARM_FEATURE_SIMD32 is deprecated in ACLE 2.0 for A-profile. Users are
+> encouraged to use NEON Intrinsics as an equivalent for the 32-bit SIMD
+> intrinsics functionality. However they are fully supported for M and
+> R-profiles. This is defined for AArch32 only.
+
+So these are mainly Cortex-M and Cortex-R specific intrinsics.
+
+This section will list the SIMD intrinsics to expose in `core::arch::arm` and
+will propose that they are only exposed on `+mclass` and `+rclass` ARMv7
+targets.
+
+# References
+
+## ACLE
+
+[ACLE]: #acle
+
+[ARM C Language Extensions Q2 2018](https://silver.arm.com/download/ARM_and_AMBA_Architecture/AR580-DA-70000-r0p0-06rel0/DDI0403E_c_armv7m_arm.pdf)
+
+## ARMARM
+
+[ARMARM]: #armarm
+
+[ARM Architecture Reference Manual (7-A / 7-R)](https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf)
+
+## ARMv7M
+
+[ARMv7M]: #armv7m
+
+[ARM Architecture Reference Manual (7-M)](https://static.docs.arm.com/ddi0403/eb/DDI0403E_B_armv7m_arm.pdf)
+
+## ARMARMv8
+
+[ARMARMv8]: #armarmv8
+
+[ARMv8-A Reference Manual](https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf)