diff --git a/rfcs/0000-acle.md b/rfcs/0000-acle.md
new file mode 100644
index 00000000..4688bb2b
--- /dev/null
+++ b/rfcs/0000-acle.md
@@ -0,0 +1,532 @@
+# Summary
+
+The purpose of this RFC is to provide an API to access to low level, ARM
+specific SIMD and non-SIMD instructions on *stable* Rust.
+
+# Motivation
+
+*TODO* (stable channel, less error prone than writing your own assembly, no
+extra build dependency to assembly external files (e.g `arm-none-eabi-gcc`))
+
+# Detailed design
+
+## ACLE: What and why
+
+*TODO*
+
+> The Arm architecture includes features that go beyond the set of operations
+> available to C/C++ programmers. The intention of the Arm C Language Extensions
+> (ACLE) is to allow the writing of applications and middleware code that is
+> portable across compilers, and across Arm architecture variants, while
+> exploiting the advanced features of the Arm architecture.
+
+## Memory barriers
+
+Reference: Section 7.3 "Memory barriers" of [ACLE].
+
+### API
+
+The following API will be available under `{core,std}::arch::{arm,aarch64}` and
+will be available for both the "arm" and "aarch64" architectures unless
+indicated otherwise (see `target_arch`).
+
+``` rust
+/// Generates a DMB (data memory barrier) instruction or equivalent CP15 instruction.
+///
+/// DMB ensures the observed ordering of memory accesses. Memory accesses of the
+/// specified type issued before the DMB are guaranteed to be observed (in the
+/// specified scope) before memory accesses issued after the DMB.
+///
+/// For example, DMB should be used between storing data, and updating a flag
+/// variable that makes that data available to another core.
+///
+/// The __dmb() intrinsic also acts as a compiler memory barrier of the
+/// appropriate type.
+pub unsafe fn __dmb(arg: A) where A: Dmb { /* .. */ }
+
+/// Generates a DSB (data synchronization barrier) instruction or equivalent
+/// CP15 instruction.
+///
+/// DSB ensures the completion of memory accesses. A DSB behaves as the
+/// equivalent DMB and has additional properties. After a DSB instruction
+/// completes, all memory accesses of the specified type issued before the DSB
+/// are guaranteed to have completed.
+///
+/// The __dsb() intrinsic also acts as a compiler memory barrier of the
+/// appropriate type.
+pub unsafe fn __dsb(arg: A) where A: Dsb { /* .. */ }
+
+/// Generates an ISB (instruction synchronization barrier) instruction or
+/// equivalent CP15 instruction.
+///
+/// This instruction flushes the processor pipeline fetch buffers, so that
+/// following instructions are fetched from cache or memory.
+///
+/// An ISB is needed after some system maintenance operations. An ISB is also
+/// needed before transferring control to code that has been loaded or modified
+/// in memory, for example by an overlay mechanism or just-in-time code
+/// generator. (Note that if instruction and data caches are separate,
+/// privileged cache maintenance operations would be needed in order to unify
+/// the caches.)
+///
+/// The only supported argument for the __isb() intrinsic is 15, corresponding
+/// to the SY (full system) scope of the ISB instruction.
+pub unsafe fn __isb(arg: A) where A: Isb { /* .. */ }
+
+// Arguments to the above intrinsics
+
+/// Full system is the required shareability domain, reads and writes are the
+/// required access types
+pub struct SY;
+
+/// Full system is the required shareability domain, writes are the required
+/// access type
+pub struct ST;
+
+/// Full system is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct LD;
+
+/// Inner Shareable is the required shareability domain, reads and writes are
+/// the required access types
+pub struct ISH;
+
+/// Inner Shareable is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct ISHLD;
+
+/// Inner Shareable is the required shareability domain, writes are the required
+/// access type
+pub struct ISHST;
+
+/// Non-shareable is the required shareability domain, reads and writes are the
+/// required access types
+pub struct NSH;
+
+/// Non-shareable is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct NSHLD;
+
+/// Non-shareable is the required shareability domain, writes are the required
+/// access type
+pub struct NSHST;
+
+/// Outer Shareable is the required shareability domain, reads and writes are
+/// the required access types
+pub struct OSH;
+
+/// Outher Shareable is the required shareability domain, reads are the required
+/// access type
+#[cfg(target_arch = "aarch64")]
+pub struct OSHLD;
+
+/// Outer Shareable is the required shareability domain, writes are the required
+/// access type
+pub struct OSHST;
+
+// The following `struct`s implement the `Dmb` and `Dsb` traits:
+// SY, LD, ST, ISH, ISHLD, ISHST, NSH, NSHST, OSH, OSHLD, OSHST
+//
+// Only SY implements the `Isb` trait
+```
+
+### Example usage
+
+``` rust
+use core::arch::arm::{self, SY}
+
+unsafe {
+ // omitted: write to the SCB peripheral to invalidate some cache
+
+ arm::__dsb(SY);
+ arm::__isb(SY);
+}
+```
+
+In C, this would be written as:
+
+``` c
+// ommitted part
+
+__dsb(0xF);
+__dmb(0xF);
+```
+
+### Implementation
+
+Quoting [ACLE][] (Section 7.3):
+
+> The intrinsics in this section are available for all targets. They may be
+> no-ops (i.e. generate no code, but possibly act as a code motion barrier in
+> compilers) on targets where the relevant instructions do not exist, but only
+> if the property they guarantee would have held anyway. On targets where the
+> relevant instructions exist but are implemented as no-ops, these intrinsics
+> generate the instructions.
+
+Furthermore the table 10.1 in [ACLE] indicates:
+
+| Instruction | Flags | Arch. | Intrinsic or C code |
+| ----------- | ----- | ----- | ------------------- |
+| DMB | | 8, 7, 6-M | __dmb |
+| DSB | | 8, 7, 6-M | __dsb |
+| ISB | | 8, 7, 6-M | __isb |
+
+Where the architecture numbers mean (quoting Section 10.1 of [ACLE] ):
+
+> Architecture 8 means Armv8-A AArch32 and AArch64, 8-32 means Armv8-AArch32
+> only.
+
+> Architecture 7 means Armv7-A and Armv7-R.
+
+> In the sequence of Thumb-only architectures { 6-M, 7-M, 7E-M } each
+> architecture includes its predecessor instruction set.
+
+Thus the memory barriers will be implemented as follows:
+
+``` rust
+pub trait Dmb: sealed::Trait {
+ unsafe fn dmb(&self);
+}
+
+mod sealed {
+ trait Trait {}
+}
+
+pub unsafe fn __dmb(arg: A) where A: Dmb {
+ arg.dmb()
+}
+
+impl Dmb for SY {
+ #[cfg(any(
+ target_feature = "mclass", // 6-M
+ target_feature = "v7", // 7
+ target_arch = "aarch64" // 8
+ ))]
+ unsafe fn dmb(&self) {
+ asm!("dmb 0xF" : : : "memory" : "volatile");
+ }
+
+ #[cfg(not(/* like above */)]
+ unsafe fn dmb(&self) {
+ // No-op but still a compiler barrier because of "memory"
+ asm!("" : : : "memory" : "volatile");
+ }
+}
+```
+
+That is on sub-architectures where the DMB instruction doesn't exist, the
+`__dmb` intrinsic will be equivalent to a compiler barrier.
+
+## Hints
+
+References: Section 7.4 "Hints" and Section "Section 7.7 NOP" of [ACLE].
+
+### API
+
+The following API will be available under `{core,std}::arch::{arm,aarch64}` and
+will be available for both the "arm" and "aarch64" architectures.
+
+``` rust
+/// Generates a WFI (wait for interrupt) hint instruction, or nothing.
+///
+/// The WFI instruction allows (but does not require) the processor to enter a
+/// low-power state until one of a number of asynchronous events occurs.
+pub unsafe fn __wfi() { /* .. */ }
+
+/// Generates a WFE (wait for event) hint instruction, or nothing.
+///
+/// The WFE instruction allows (but does not require) the processor to enter a
+/// low-power state until some event occurs such as a SEV being issued by
+/// another processor.
+pub unsafe fn __wfe() { /* .. */ }
+
+/// Generates a SEV (send a global event) hint instruction.
+///
+/// This causes an event to be signaled to all processors in a multiprocessor
+/// system. It is a NOP on a uniprocessor system.
+pub unsafe fn __sev() { /* .. */ }
+
+/// Generates a send a local event hint instruction.
+///
+/// This causes an event to be signaled to only the processor executing this
+/// instruction. In a multiprocessor system, it is not required to affect the
+/// other processors.
+pub unsafe fn __sevl() { /* .. */ }
+
+/// Generates a YIELD hint instruction.
+///
+/// This enables multithreading software to indicate to the hardware that it is
+/// performing a task, for example a spin-lock, that could be swapped out to
+/// improve overall system performance.
+pub unsafe fn __yield() { /* .. */ }
+
+/// Generates a DBG instruction.
+///
+/// This provides a hint to debugging and related systems. The argument must be
+/// a constant integer from 0 to 15 inclusive. See implementation documentation
+/// for the effect (if any) of this instruction and the meaning of the
+/// argument. This is available only when compliling for AArch32.
+#[rustc_args_required_const(0)]
+pub unsafe fn __dbg(_: u32) { /* .. */ }
+
+/// Generates an unspecified no-op instruction.
+///
+/// Note that not all architectures provide a distinguished NOP instruction. On
+/// those that do, it is unspecified whether this intrinsic generates it or
+/// another instruction. It is not guaranteed that inserting this instruction
+/// will increase execution time.
+pub unsafe fn __nop() { /* .. */ }
+```
+
+### Implementation
+
+Quoting Section 7.4 "Hints" of [ACLE][] :
+
+> The intrinsics in this section are available for all targets. They may be
+> no-ops (i.e. generate no code, but possibly act as a code motion barrier in
+> compilers) on targets where the relevant instructions do not exist. On
+> targets where the relevant instructions exist but are implemented as no-ops,
+> these intrinsics generate the instructions
+
+So like in the implementation of memory barriers these intrinsics will do
+nothing on *some* sub-architectures.
+
+## System register access
+
+Reference: Section 9 "System register access" of [ACLE].
+
+### API
+
+The following API will be available under `{core,std}::arch::{arm,aarch64}` and
+will be available for both the "arm" and "aarch64" architectures.
+
+``` rust
+/// Reads a 32-bit system register
+pub unsafe fn __arm_rsr(special_register: R) -> u32
+where
+ R: Rsr
+{
+ /* .. */
+}
+
+/// Reads a 64-bit system register
+pub unsafe fn __arm_rsr64(special_register: R) -> u64
+where
+ R: Rsr64
+{
+ /* .. */
+}
+
+/// Reads a system register containing an address
+pub unsafe fn __arm_rsrp(special_register: R) -> *const c_void
+where
+ R: Rsrp
+{
+ /* .. */
+}
+
+/// Writes a 32-bit system register
+pub unsafe fn __arm_wsr(special_register: R, value: u32)
+where
+ R: Wsr
+{
+ /* .. */
+}
+
+/// Writes a 64-bit system register
+pub unsafe fn __arm_wsr64(special_register: R, value: u64)
+where
+ R: Wsr64
+{
+ /* .. */
+}
+
+/// Writes a system register containing an address
+pub unsafe fn __arm_wsrp(special_register: R, value: *const c_void)
+where
+ R: Wsrp
+{
+ /* .. */
+}
+```
+
+The values that can be used for the `special_register` argument depend on the
+target subarchitecture and whether the register is a 32-bit register or a 64-bit
+register.
+
+Possible 32-bit system registers (see `__arm_rsr`, `__arm_rsrp`, `__arm_wsr`,
+and `__arm_wsrp`) include:
+
+- The values accepted in the `spec_reg` field of the MRS instruction, for
+ example CPSR. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the MSR (immediate)
+ instruction. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the VMRS instruction, for
+ example FPSID. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the VMSR instruction, for
+ example FPSCR. See [ARMARM] for more details.
+
+- The values accepted in the `spec_reg` field of the MSR and MRS instructions
+ with virtualization extensions, for example ELR_Hyp. See [ARMARM] for more
+ details.
+
+- The values specified in Special register encodings used in Armv7-M system
+ instructions, for example PRIMASK. See [ARMv7M] for more details.
+
+Possible 64-bit system registers (see `__arm_rsr64`, `__arm_rsrp64`,
+`__arm_wsr64`, and `__arm_wsrp64`) include:
+
+- The values accepted in the pstatefield of the MSR (immediate) instruction. See
+ [ARMARMv8] for more details.
+
+### Example usage
+
+``` rust
+use core::arch::arm::{BASEPRI, self};
+
+unsafe {
+ let new_val: u32 = /* .. */;
+ let f = /* some closure */;
+
+ let old_val = arm::__arm_rsr(BASEPRI);
+
+ // start of critical section
+ arm::__arm_wsr(BASEPRI, new_val);
+
+ f();
+
+ // end of critical section
+ arm::__arm_wsr(BASEPRI, old_val);
+}
+```
+
+In C you would write:
+
+``` c
+uint32_t old_val, new_val;
+void* f;
+
+new_val = /* .. */;
+f = /* .. */;
+
+old_val = __arm_rsr("BASEPRI");
+__arm_wsr("BASEPRI", new_val)
+f();
+__arm_wsr("BASEPRI", old_val)
+```
+
+### Implementation
+
+The important part of the implementation is double checking that special
+register `struct`s are only available on the sub-architectures where they are
+physically present.
+
+``` rust
+pub trait Rsr: sealed::Trait {
+ unsafe fn rsr(&self) -> u32;
+}
+
+pub trait Wsr: sealed::Trait {
+ unsafe fn wsr(&self, value: u32);
+}
+
+mod sealed {
+ trait Trait {}
+}
+
+pub unsafe fn __arm_rsr(special_register: R) -> u32
+where
+ R: Rsr
+{
+ special_register.rsr();
+}
+
+pub unsafe fn __arm_wsr(special_register: R, value: u32)
+where
+ R: Wsr
+{
+ special_register.wsr(value)
+}
+
+#[cfg(target_feature = "mclass")]
+pub struct BASEPRI;
+
+#[cfg(target_feature = "mclass")]
+impl Rsr for BASEPRI {
+ fn rsr(&self) -> u32 {
+ let r: u32;
+ asm!("mrs $0, BASEPRI" : "=r"(r) ::: "volatile");
+ r
+ }
+}
+
+#[cfg(target_feature = "mclass")]
+impl Wsr for BASEPRI {
+ fn wsr(&self, value: u32) {
+ asm!("msr BASEPRI, $0" :: "r"(value) : "memory" : "volatile")
+ }
+}
+```
+
+## 32-bit SIMD
+
+Reference: Section 8.5 "32-bit SIMD intrinsics" of [ACLE].
+
+*TODO*, but at least I should tell you that this is *not* about NEON. [ACLE]
+says so in Section 8.5.1 "Availability":
+
+> Armv6 introduced instructions to perform 32-bit SIMD operations (i.e. two
+> 16-bit operations or four 8-bit operations) on the Arm general-purpose
+> registers. These instructions are not related to the much more versatile
+> Advanced SIMD (NEON) extension, whose support is described in Advanced SIMD
+> (NEON) intrinsics.
+
+And in Section 5.4.9 "32-bit SIMD instructions"
+
+> __ARM_FEATURE_SIMD32 is defined to 1 if the 32-bit SIMD instructions are
+> supported and the intrinsics defined in 32-bit SIMD intrinsics are available.
+> This also implies support for the GE global flags which indicate byte-by-byte
+> comparison results.
+
+> __ARM_FEATURE_SIMD32 is deprecated in ACLE 2.0 for A-profile. Users are
+> encouraged to use NEON Intrinscs as an equivalent for the 32-bit SIMD
+> intrinsics functionality. However they are fully supported for M and
+> R-profiles. This is defined for AArch32 only.
+
+So these are mainly Cortex-M and Cortex-R specific intrinsics.
+
+This section will list the SIMD intrinsics to expose in `core::arch::arm` and
+will propose that they are only exposed on `+mclass` and `+rclass` ARMv7
+targets.
+
+# References
+
+## ACLE
+
+[ACLE]: #acle
+
+[ARM C Language Extensions Q2 2018](https://silver.arm.com/download/ARM_and_AMBA_Architecture/AR580-DA-70000-r0p0-06rel0/DDI0403E_c_armv7m_arm.pdf)
+
+## ARMARM
+
+[ARMARM]: #armarm
+
+[ARM Architecture Reference Manual (7-A / 7-R)](https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf)
+
+## ARMv7M
+
+[ARMv7M]: #armv7m
+
+[ARM Architecture Reference Manual (7-M)](https://static.docs.arm.com/ddi0403/eb/DDI0403E_B_armv7m_arm.pdf)
+
+## ARMARMv8
+
+[ARMARMv8]: #armarmv8
+
+[ARMv8-A Reference Manual](https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf)