Fix fence on non-x86 arch and miri #16
Conversation
At least crossbeam and event-listener also have the same issue, but fixing them is probably more complex...
I feel a bit uncomfortable with this commit. Admittedly, I don't know exactly what the role of the fence is here. This fence does not exist in Dmitry Vyukov's original implementation of the queue, so I guess it was added as part of the modifications that ensure that this queue is linearisable (unlike the original queue). That being said, if the cross-platform solution is indeed to place the load before the fence (this, I do not know), then I am pretty sure the Intel specialization that uses a `compare_exchange` is questionable. I did look at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html but could not see where it states that `lock cmpxchg; mov` is fine.
```diff
@@ -461,7 +464,11 @@ fn full_fence() {
     // x86 platforms is going to optimize this away.
     let a = AtomicUsize::new(0);
     let _ = a.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst);
+    // On x86, `lock cmpxchg; mov` is fine. See also https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
+    load_op()
```
FWIW, this is still Rust code -- so if Miri complains when running this branch of the code (which I suspect it will, since an SC RMW before a load cannot replace a fence after a load), then this code is still wrong.
When you write Rust code, the hardware memory model is all but irrelevant for program correctness. Only the Rust memory model counts.
EDIT: Oh I see this got reverted in #18.
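For reference, a minimal sketch contrasting the two shapes under discussion (the names `FLAG`, `rmw_then_load`, and `load_then_fence` are hypothetical, not this crate's code); under the Rust/C++ memory model only the second gives the fence-after-load guarantee:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

static FLAG: AtomicUsize = AtomicUsize::new(0);

// Shape 1 (the x86 specialization): an SC RMW on an unrelated atomic,
// then the load. In the Rust/C++ memory model the RMW on `a` orders
// nothing about the later load of FLAG; it does not act as a fence.
fn rmw_then_load() -> usize {
    let a = AtomicUsize::new(0);
    let _ = a.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst);
    FLAG.load(Ordering::Relaxed)
}

// Shape 2 (portable): the load first, then a SeqCst fence -- the
// `load; fence` pattern the model actually supports.
fn load_then_fence() -> usize {
    let v = FLAG.load(Ordering::Relaxed);
    fence(Ordering::SeqCst);
    v
}
```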
```diff
@@ -461,7 +464,11 @@ fn full_fence() {
     // x86 platforms is going to optimize this away.
```
The fact that you are hoping that "sane" compilers for particular targets are going to treat the memory model differently is a big red flag. The memory model is target-independent, and a whole bunch of optimizations run on this code (including its use of atomics) before any target-specific concerns are applied.
Inline assembly is the only correct choice here.
EDIT: Oh I see this got reverted in #18.
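A hedged sketch of what an inline-assembly version could look like (an illustration under the assumptions stated in the comments, not the fix that eventually landed):

```rust
// Sketch only: on x86, `asm!` without the `nomem` option is an opaque
// memory clobber, so the compiler cannot move atomic accesses across
// it, and `mfence` is a full barrier at the hardware level. Real
// implementations often prefer a `lock`-prefixed no-op over `mfence`
// for speed; `mfence` is used here for simplicity.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn full_fence() {
    unsafe {
        core::arch::asm!("mfence", options(nostack, preserves_flags));
    }
}

// Everywhere else, fall back to the portable SeqCst fence.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn full_fence() {
    core::sync::atomic::fence(core::sync::atomic::Ordering::SeqCst);
}
```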
I would usually expect that to be the case -- a relaxed load followed by an acquire-or-stronger fence can induce a synchronization edge. But I don't know the context for this particular code. Does something break, or perf go down badly, if the fence is moved after the load?
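As a reminder, the pattern being referred to is the standard fence-fence synchronization idiom (a generic example, not code from this crate):

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// Writer: a Release fence before the relaxed store of READY...
fn writer() {
    DATA.store(42, Ordering::Relaxed);
    fence(Ordering::Release);
    READY.store(true, Ordering::Relaxed);
}

// ...pairs with an Acquire fence after the relaxed load, creating a
// synchronizes-with edge once the reader has observed READY == true.
fn reader() -> Option<usize> {
    if READY.load(Ordering::Relaxed) {
        fence(Ordering::Acquire);
        Some(DATA.load(Ordering::Relaxed)) // guaranteed to read 42
    } else {
        None
    }
}
```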
The problem seems to be that the original author of this code confused a fence in the x86 hardware memory model with an atomic fence in the C++ memory model. (On x86, `lock cmpxchg; mov` (load from memory) is fine. See also https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. In the C++ memory model, and on many architectures, a fence for a load should be `load; fence`.)

Fixes bevyengine/bevy#5164
FYI @cbeuw
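For context, a sketch of the general shape such a fix takes (an assumed structure, not a verbatim copy of this PR's diff): keep the cheap SC-RMW trick only on x86 and outside Miri, and use a real SeqCst fence everywhere else.

```rust
use core::sync::atomic::{fence, AtomicUsize, Ordering};

// Sketch: gate the x86 specialization away from Miri and non-x86 targets.
fn full_fence() {
    if cfg!(all(any(target_arch = "x86", target_arch = "x86_64"), not(miri))) {
        // HACK: on x86 a `lock`-prefixed RMW is a full hardware barrier
        // and is typically cheaper than `mfence`. As the review notes,
        // the Rust memory model does not justify this; only the target
        // ISA does.
        let a = AtomicUsize::new(0);
        let _ = a.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst);
    } else {
        fence(Ordering::SeqCst);
    }
}
```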