Commit e20af0d

Create a blog on changes to 128-bit integers
1 parent 5c9d70b commit e20af0d

1 file changed

+298
-0
lines changed

@@ -0,0 +1,298 @@
1+
---
2+
layout: post
3+
title: "Changes to `u128`/`i128` layout in 1.77 and 1.78"
4+
author: Trevor Gross
5+
team: Lang
6+
---
7+
8+
Rust has long had an inconsistency with C regarding the alignment of 128-bit integers.
9+
This problem has recently been resolved, but the fix comes with some effects that are
10+
worth being aware of.
11+
12+
As a user, you most likely do not need to worry about these changes unless you are:
13+
14+
1. Assuming the alignment of `i128`/`u128` rather than using `align_of`
15+
1. Ignoring the `improper_ctypes*` lints and using these types in FFI
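To make the first point concrete, here is a minimal sketch of asking the compiler for the alignment instead of hardcoding 8 or 16. The helper name `is_aligned_for_u128` is hypothetical and only exists for this illustration.

```rust
use std::mem::align_of;

/// Hypothetical helper: returns true if `ptr` is suitably aligned for a
/// `u128` on the current target and compiler version.
fn is_aligned_for_u128(ptr: *const u8) -> bool {
    // Query the alignment rather than assuming a fixed value.
    (ptr as usize) % align_of::<u128>() == 0
}

fn main() {
    let value: u128 = 42;
    let ptr = &value as *const u128 as *const u8;
    println!("required alignment: {}", align_of::<u128>());
    println!("pointer is aligned: {}", is_aligned_for_u128(ptr));
}
```

Code written this way keeps working whether the compiler reports 8 or 16.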
16+
17+
There are also no changes to architectures other than x86-32 and x86-64. If your
18+
code makes heavy use of 128-bit integers, you may notice runtime performance increases
19+
at a possible cost of additional memory use.
20+
21+
This post is intended to clarify what changed, why it changed, and what to expect. If
22+
you are only looking for a compatibility matrix, jump to the
23+
[Compatibility](#compatibility) section.
24+
25+
# Background
26+
27+
Data types have two intrinsic values that relate to how they can be arranged in memory:
28+
size and alignment. A type's size is the amount of space it takes up in memory, and its
29+
alignment specifies which addresses it is allowed to be placed at.
30+
31+
The size of simple types like primitives is usually unambiguous, being the exact size of
32+
the data they represent with no padding (unused space). For example, an `i64` always has
33+
a size of 64 bits or 8 bytes.
34+
35+
Alignment, however, can seem less consistent. An 8-byte integer _could_ reasonably be
36+
stored at any memory address (1-byte aligned), but most 64-bit computers will get the
37+
best performance if it is instead stored at a multiple of 8 (8-byte aligned). So, like
38+
in other languages, primitives in Rust have this most efficient alignment by default.
39+
The effects of this can be seen when creating composite types: [^composite-playground]
40+
41+
```rust
42+
use core::mem::{align_of, offset_of};
43+
44+
#[repr(C)]
45+
struct Foo {
46+
a: u8, // 1-byte aligned
47+
b: u16, // 2-byte aligned
48+
}
49+
50+
#[repr(C)]
51+
struct Bar {
52+
a: u8, // 1-byte aligned
53+
b: u64, // 8-byte aligned
54+
}
55+
56+
fn main() {
    println!("Offset of b (u16) in Foo: {}", offset_of!(Foo, b));
    println!("Alignment of Foo: {}", align_of::<Foo>());
    println!("Offset of b (u64) in Bar: {}", offset_of!(Bar, b));
    println!("Alignment of Bar: {}", align_of::<Bar>());
}
60+
```
61+
62+
Output:
63+
64+
```text
65+
Offset of b (u16) in Foo: 2
66+
Alignment of Foo: 2
67+
Offset of b (u64) in Bar: 8
68+
Alignment of Bar: 8
69+
```
70+
71+
We see that within a struct, a type will always be placed such that its offset is a
72+
multiple of its alignment.
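The placement rule can be sketched as a small calculation. This is an illustrative model of `repr(C)`-style layout, not the compiler's actual implementation. The function names `round_up` and `c_layout` are made up for this example, and the sizes and alignments passed in are the x86-64 values for `u8`, `u16`, and `u64`.

```rust
/// Round `offset` up to the next multiple of `align` (assumed to be a power of two).
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Compute field offsets and total size from `(size, align)` pairs,
/// placing fields in declaration order.
fn c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offset = 0;
    let mut max_align = 1;
    let mut offsets = Vec::new();
    for &(size, align) in fields {
        // Each field starts at the next multiple of its own alignment.
        offset = round_up(offset, align);
        offsets.push(offset);
        offset += size;
        max_align = max_align.max(align);
    }
    // The struct's size is padded to a multiple of its largest field alignment.
    (offsets, round_up(offset, max_align))
}

fn main() {
    // Foo { a: u8, b: u16 } and Bar { a: u8, b: u64 } from the example above.
    println!("Foo: {:?}", c_layout(&[(1, 1), (2, 2)]));
    println!("Bar: {:?}", c_layout(&[(1, 1), (8, 8)]));
}
```

The computed offsets for `b` match what `offset_of!` reported: 2 in `Foo` and 8 in `Bar`.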
73+
74+
These numbers are not arbitrary; the application binary interface (ABI) says what they
75+
should be. In the x86-64 [psABI] (processor-specific ABI) for System V (Unix & Linux),
76+
_Figure 3.1: Scalar Types_ tells us exactly how primitives should be represented:
77+
78+
| C type | Rust equivalent | `sizeof` | Alignment (bytes) |
79+
| ---------------- | --------------- | -------- | ----------------- |
80+
| `char` | `i8` | 1 | 1 |
81+
| `unsigned char` | `u8` | 1 | 1 |
82+
| `short` | `i16` | 2 | 2 |
83+
| `unsigned short` | `u16` | 2 | 2 |
84+
| `long` | `i64` | 8 | 8 |
85+
| `unsigned long` | `u64` | 8 | 8 |
86+
87+
The ABI only specifies C types, but Rust follows the same definitions both for
88+
compatibility and for the performance benefits.
89+
90+
# The Incorrect Alignment Problem
91+
92+
It is easy to imagine that if two implementations disagree on the alignment of a data
93+
type, they would not be able to reliably share data containing that type. Well...
94+
95+
```rust
96+
println!("alignment of i128: {}", align_of::<i128>());
97+
```
98+
99+
```text
100+
// rustc 1.76.0
101+
alignment of i128: 8
102+
```
103+
104+
```c
105+
printf("alignment of __int128: %zu\n", _Alignof(__int128));
106+
```
107+
108+
```text
109+
// gcc 13.2
110+
alignment of __int128: 16
111+
112+
// clang 17.0.1
113+
alignment of __int128: 16
114+
```
115+
116+
Looks like Rust disagrees![^align-godbolt] Looking back at the [psABI], we can see that
117+
Rust indeed is in the wrong here:
118+
119+
| C type | Rust equivalent | `sizeof` | Alignment (bytes) |
120+
| ------------------- | --------------- | -------- | ----------------- |
121+
| `__int128` | `i128` | 16 | 16 |
122+
| `unsigned __int128` | `u128` | 16 | 16 |
123+
124+
It turns out this isn't because of something that Rust is actively doing incorrectly:
125+
layout of primitives comes from the LLVM codegen backend used by both Rust and Clang,
126+
among other languages, and it has the alignment for `i128` hardcoded to 8 bytes.
127+
128+
Clang does not have this issue only because of a workaround, where the alignment is
129+
manually set to 16 bytes before handing the type to LLVM. This fixes the layout issue
130+
but has been the source of some other minor problems.[^f128-segfault][^va-segfault]
131+
Rust does no such manual adjustment, hence the issue reported at
132+
<https://github.com/rust-lang/rust/issues/54341>.
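To illustrate what "manually setting the alignment" means, here is a user-level sketch that forces 16-byte alignment with a wrapper type. The name `AlignedU128` is hypothetical. This is not how Clang or `rustc` apply the adjustment internally, and it only affects storage layout, not how the value is passed to functions.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical wrapper that forces 16-byte alignment regardless of what the
/// compiler picks for a bare `u128`.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
struct AlignedU128(u128);

fn main() {
    let v = AlignedU128(1);
    println!("{:?}", v);
    println!("bare u128 alignment: {}", align_of::<u128>());
    println!("wrapper alignment:   {}", align_of::<AlignedU128>());
    println!("wrapper size:        {}", size_of::<AlignedU128>());
}
```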
133+
134+
# The Calling Convention Problem
135+
136+
It happens that there is an additional problem: LLVM does not always do the correct thing
137+
when passing 128-bit integers as function arguments. This was a [known issue in LLVM]
138+
before its [relevance to Rust was discovered].
139+
140+
When calling a function, the arguments get passed in registers until there are no more
141+
slots, then they get "spilled" to the stack. The ABI tells us what to do here as well,
142+
in the section _3.2.3 Parameter Passing_:
143+
144+
> Arguments of type `__int128` offer the same operations as INTEGERs, yet they do not
145+
> fit into one general purpose register but require two registers. For classification
146+
> purposes `__int128` is treated as if it were implemented as:
147+
>
148+
> ```c
149+
> typedef struct {
150+
> long low, high;
151+
> } __int128;
152+
> ```
153+
>
154+
> with the exception that arguments of type `__int128` that are stored in memory must be
155+
> aligned on a 16-byte boundary.
156+
157+
We can try this out by implementing the calling convention manually. In the below C
158+
example, inline assembly is used to call `foo(0xaf, val, val, val)` with `val` as
159+
`0x11223344556677889900aabbccddeeff`.
160+
161+
x86-64 uses the registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, and `r9` to pass function
162+
arguments, in that order (you guessed it, this is also in the ABI). Each register
163+
fits a word (64 bits), and anything that doesn't fit gets `push`ed to the
164+
stack.
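Since a 128-bit value needs two 64-bit slots, it helps to see how `val` splits into halves. The sketch below uses a hypothetical helper, `split_halves`, to compute the two words that the C example loads into consecutive slots.

```rust
/// Split a 128-bit value into two 64-bit halves, low half first.
fn split_halves(val: u128) -> (u64, u64) {
    let low = val as u64; // the cast truncates to the lower 64 bits
    let high = (val >> 64) as u64; // the upper 64 bits
    (low, high)
}

fn main() {
    let val: u128 = 0x11223344556677889900aabbccddeeff;
    let (low, high) = split_halves(val);
    println!("low half:  {low:#018x}");
    println!("high half: {high:#018x}");
}
```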
165+
166+
```c
167+
/* full example at https://godbolt.org/z/zGaK1T96c */
168+
169+
/* to see the issue, we need a padding value to "mess up" argument alignment */
170+
void foo(char pad, __int128 a, __int128 b, __int128 c) {
171+
printf("%#x\n", pad & 0xff);
172+
print_i128(a);
173+
print_i128(b);
174+
print_i128(c);
175+
}
176+
177+
int main() {
178+
asm(
179+
"movl $0xaf, %edi \n\t" /* 1st slot (edi): padding char */
180+
"movq $0x9900aabbccddeeff, %rsi \n\t" /* 2nd slot (rsi): lower half of `a` */
181+
"movq $0x1122334455667788, %rdx \n\t" /* 3rd slot (rdx): upper half of `a` */
182+
"movq $0x9900aabbccddeeff, %rcx \n\t" /* 4th slot (rcx): lower half of `b` */
183+
"movq $0x1122334455667788, %r8 \n\t" /* 5th slot (r8): upper half of `b` */
184+
"movq $0xdeadbeef4c0ffee0, %r9 \n\t" /* 6th slot (r9): should be unused, but
185+
* let's trick clang! */
186+
187+
/* reuse our stored registers to load the stack */
188+
"pushq %rdx \n\t" /* upper half of `c` gets passed on the stack */
189+
"pushq %rsi \n\t" /* lower half of `c` gets passed on the stack */
190+
"call foo \n\t" /* call the function */
191+
"addq $16, %rsp \n\t" /* reset the stack */
192+
);
193+
}
194+
```
195+
196+
Running the above with GCC prints the following expected output:
197+
198+
```
199+
0xaf
200+
0x11223344556677889900aabbccddeeff
201+
0x11223344556677889900aabbccddeeff
202+
0x11223344556677889900aabbccddeeff
203+
```
204+
205+
But running with Clang 17 prints:
206+
207+
```
208+
0xaf
209+
0x11223344556677889900aabbccddeeff
210+
0x11223344556677889900aabbccddeeff
211+
0x9900aabbccddeeffdeadbeef4c0ffee0
212+
```
213+
214+
Surprise!
215+
216+
This illustrates the second problem: LLVM expects an `i128` to be passed half in a
217+
register and half on the stack, but this is not allowed by the ABI.
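The surprising third value can be reproduced with plain arithmetic. The sketch below is one reading that is consistent with the output above: the low half of `c` is taken from `r9` and the high half from the top of the stack. The helper `join_halves` is hypothetical.

```rust
/// Combine two 64-bit halves into a 128-bit value.
fn join_halves(low: u64, high: u64) -> u128 {
    ((high as u128) << 64) | (low as u128)
}

fn main() {
    let r9: u64 = 0xdeadbeef4c0ffee0; // the "unused" sixth register slot
    let stack_top: u64 = 0x9900aabbccddeeff; // lower half of `c`, pushed last
    let stack_next: u64 = 0x1122334455667788; // upper half of `c`, pushed first

    // What the ABI requires: both halves of `c` are read from the stack.
    let expected = join_halves(stack_top, stack_next);
    // What Clang 17 printed: one half from `r9`, the other from the stack.
    let observed = join_halves(r9, stack_top);

    println!("expected: {expected:#x}");
    println!("observed: {observed:#x}");
}
```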
218+
219+
Since this comes from LLVM and has no reasonable workaround, this is a problem in
220+
both Clang and Rust.
221+
222+
# Solutions
223+
224+
Getting these problems resolved was a lengthy effort by many people, starting with a
225+
patch by compiler team member Simonas Kazlauskas in 2017: [D28990]. Unfortunately,
226+
this wound up reverted. It was later attempted again in [D86310] by LLVM contributor
227+
Harald van Dijk, which is the version that finally landed in October 2023.
228+
229+
Around the same time, Nikita Popov fixed the calling convention issue with [D158169].
230+
Both of these changes made it into LLVM 18, meaning all relevant ABI issues will be
231+
resolved in the Clang and Rust versions that use it (Clang 18 and Rust 1.78 when using
232+
the bundled LLVM).
233+
234+
However, `rustc` can also use the version of LLVM installed in the system rather than a
235+
bundled version, which may be older. To mitigate the chance of problems from differing
236+
alignment with the same `rustc` version, [a proposal] was introduced to manually
237+
correct the alignment, like Clang has been doing. This was implemented by Matthew Maurer
238+
in [#116672][#11672].
239+
240+
As mentioned above, part of the reason for an ABI to specify the alignment of a datatype
241+
is that it is more efficient on that architecture. We actually got to see that
242+
firsthand: the [initial performance run] with the manual alignment change showed
243+
nontrivial improvements to compiler performance (which relies heavily on 128-bit
244+
integers to store integer literals). The downside of increasing alignment is that
245+
composite types do not always fit together as nicely in memory, leading to an increase
246+
in memory usage. Unfortunately this meant some of the performance wins needed to be sacrificed
247+
to avoid an increased memory footprint.
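The memory cost is easy to see with a small struct. The sketch below uses stand-in types (`Int128Align8` and `Int128Align16`, both hypothetical) to model a 16-byte integer under the old and new alignment, so the result does not depend on the compiler version used to run it.

```rust
use std::mem::size_of;

/// Stand-in for a 16-byte integer with the old 8-byte alignment.
#[repr(C, align(8))]
struct Int128Align8([u8; 16]);

/// Stand-in for a 16-byte integer with the new 16-byte alignment.
#[repr(C, align(16))]
struct Int128Align16([u8; 16]);

#[repr(C)]
struct TaggedOld {
    tag: u8, // followed by 7 bytes of padding
    value: Int128Align8,
}

#[repr(C)]
struct TaggedNew {
    tag: u8, // followed by 15 bytes of padding
    value: Int128Align16,
}

fn main() {
    println!("size with 8-byte alignment:  {}", size_of::<TaggedOld>());
    println!("size with 16-byte alignment: {}", size_of::<TaggedNew>());
}
```

The same one-byte tag costs 8 extra bytes of padding per value under the larger alignment.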
248+
249+
[a proposal]: https://github.com/rust-lang/compiler-team/issues/683
250+
[#11672]: https://github.com/rust-lang/rust/pull/116672/
251+
[D158169]: https://reviews.llvm.org/D158169
252+
[D28990]: https://reviews.llvm.org/D28990
253+
[D86310]: https://reviews.llvm.org/D86310
254+
255+
# Compatibility
256+
257+
The most important question is how compatibility changed as a result of these fixes. In
258+
short, `i128` and `u128` with Rust using LLVM 18 (the default version starting with
259+
1.78) will be completely compatible with any version of GCC, as well as Clang 18 and
260+
above (released March 2024). All other combinations have some incompatible cases, which
261+
are summarized in the table below:
262+
263+
| Compiler 1 | Compiler 2 | Status |
264+
| ---------------------------------- | ------------------- | ----------------------------------- |
265+
| Rust ≥ 1.78 with bundled LLVM (18) | GCC (any version) | Fully compatible |
266+
| Rust ≥ 1.78 with bundled LLVM (18) | Clang ≥ 18 | Fully compatible |
267+
| Rust ≥ 1.77 with LLVM ≥ 18 | GCC (any version) | Fully compatible |
268+
| Rust ≥ 1.77 with LLVM ≥ 18 | Clang ≥ 18 | Fully compatible |
269+
| Rust ≥ 1.77 with LLVM ≥ 18 | Clang \< 18 | Storage compatible, has calling bug |
270+
| Rust ≥ 1.77 with LLVM \< 18 | GCC (any version) | Storage compatible, has calling bug |
271+
| Rust ≥ 1.77 with LLVM \< 18 | Clang (any version) | Storage compatible, has calling bug |
272+
| Rust \< 1.77[^l] | GCC (any version) | Incompatible |
273+
| Rust \< 1.77[^l] | Clang (any version) | Incompatible |
274+
| GCC (any version) | Clang ≥ 18 | Fully compatible |
275+
| GCC (any version) | Clang \< 18 | Storage compatible, has calling bug |
276+
277+
[^l]: Rust < 1.77 with LLVM 18 will have some degree of compatibility; this is just
278+
an uncommon combination.
279+
280+
# Effects & Future Steps
281+
282+
As mentioned in the introduction, most users will see no effects of this change
283+
unless they are already doing something questionable with these types.
284+
285+
Starting with Rust 1.77, it will be reasonably safe to start experimenting with
286+
128-bit integers in FFI, with some more certainty coming with the LLVM update
287+
in 1.78. There is [ongoing discussion] about lifting the lint in an upcoming
288+
version, but it remains to be seen when that will actually happen.
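For anyone who wants to experiment, here is a minimal sketch of a function that uses the C calling convention with a 128-bit argument. The function `add_u128` is hypothetical and is called from Rust here so the example stays self-contained. The `allow` attribute is only needed on compiler versions where the lint still fires for these types.

```rust
/// Hypothetical function using the C calling convention with 128-bit integers.
#[allow(improper_ctypes_definitions)]
pub extern "C" fn add_u128(a: u128, b: u128) -> u128 {
    a.wrapping_add(b)
}

fn main() {
    // The carry out of the low 64 bits must land in the high half.
    let sum = add_u128(u128::from(u64::MAX), 1);
    println!("{sum:#x}");
}
```

Calling such a function from C is only reliable with the compiler combinations listed as fully compatible in the table above.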
289+
290+
[relevance to Rust was discovered]: https://github.com/rust-lang/rust/issues/54341#issuecomment-1064729606
291+
[initial performance run]: https://github.com/rust-lang/rust/pull/116672/#issuecomment-1858600381
292+
[known issue in llvm]: https://github.com/llvm/llvm-project/issues/41784
293+
[psabi]: https://www.uclibc.org/docs/psABI-x86_64.pdf
294+
[ongoing discussion]: https://github.com/rust-lang/lang-team/issues/255
295+
[^align-godbolt]: https://godbolt.org/z/h94Ge1vMW
296+
[^composite-playground]: https://play.rust-lang.org/?version=beta&mode=debug&edition=2021&gist=c263ae121912284d3ba553290caa6778
297+
[^va-segfault]: https://github.com/llvm/llvm-project/issues/20283
298+
[^f128-segfault]: https://bugs.llvm.org/show_bug.cgi?id=50198
