
Commit 2c6a52f

revise to take feedback into account:
- only guarantee with one unit variant
- fieldless enums
- remove potentially confusing discussion of what compiler does today
- remove discussion of niche values from enums, not important (yet)
- generally reorganize layout rules to be by category of enum
1 parent b14035f commit 2c6a52f

1 file changed

+74
-110
lines changed

reference/src/representation/enums.md

Lines changed: 74 additions & 110 deletions
@@ -7,10 +7,14 @@ and which parts are still in a "preliminary" state.
77

88
[#10]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10
99

10-
## Background
10+
## Categories of enums
1111

12-
**C-like enums.** The simplest form of enum is simply a list of
13-
variants:
12+
**Empty enums.** Enums with no variants can never be instantiated and
13+
are equivalent to the `!` type. They do not accept any `#[repr]`
14+
annotations.
15+
16+
**Fieldless enums.** The simplest form of enum is one where none of
17+
the variants have any fields:
1418

1519
```rust
1620
enum SomeEnum {
@@ -19,13 +23,13 @@ enum SomeEnum {
1923
Variant3,
2024
```
2125

22-
Such enums are called "C-like" because they correspond quite closely
23-
with enums in the C language (though there are important differences
24-
as well, covered later). Presuming that they have more than one
25-
variant, these sorts of enums are always represented as a simple integer,
26-
though the size will vary.
26+
Such enums correspond quite closely with enums in the C language
27+
(though there are important differences as well). Presuming that they
28+
have more than one variant, these sorts of enums are always
29+
represented as a simple integer, though the size will vary.
2730

28-
C-like enums may also specify the value of their discriminants explicitly:
31+
Fieldless enums may also specify the value of their discriminants
32+
explicitly:
2933

3034
```rust
3135
enum SomeEnum {
@@ -51,17 +55,9 @@ enum Foo {
5155
}
5256
```
5357

54-
**Option-like enums.** As a special case of data-carrying enums, we
55-
identify "option-like" enums as enums where all of the variants but
56-
one have no fields, and one variant has a single field. The most
57-
common example is `Option` itself. In some cases, as described below,
58-
the compiler may apply special optimization rules to the layout of
59-
option-like enums. The **payload** of an option-like enum is the value
60-
of that single field.
61-
62-
## Enums with a specified representation
58+
## repr annotations accepted on enums
6359

64-
Enums may be annotation using the following `#[repr]` tags:
60+
In general, enums may be annotated using the following `#[repr]` tags:
6561

6662
- A specific integer type (called `Int` as a shorthand below):
6763
- `#[repr(u8)]`
@@ -79,25 +75,36 @@ Enums may be annotation using the following `#[repr]` tags:
7975
- `#[repr(C, u16)]`
8076
- etc
8177

82-
We cover each of the categories below. The layout rules for enums with
83-
explicit `#[repr]` annotations are specified in [RFC 2195][].
78+
Note that manually specifying the alignment using `#[repr(align)]` is
79+
not permitted on an enum.
8480

85-
[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html
81+
The set of repr annotations accepted by an enum depends on its category,
82+
as defined above:
83+
84+
- Empty enums: no repr annotations are permitted.
85+
- Fieldless enums: `#[repr(Int)]`-style and `#[repr(C)]` annotations are permitted, but `#[repr(C, Int)]` annotations are not.
86+
- Data-carrying enums: all repr annotations are permitted.
8687

87-
### Layout of an enum with no variants
88+
## Enum layout rules
8889

89-
An enum with no variants can never be instantiated and is logically
90-
equivalent to the "never type" `!`. Such enums are guaranteed to have
91-
the same layout as `!` (zero size and alignment 1).
90+
The rules for enum layout vary depending on the category.
9291

93-
### Layout of a C-like enum
92+
### Layout of an empty enum
9493

95-
If there is no `#[repr]` attached to a C-like enum, it is guaranteed
96-
to be represented as an integer of sufficient size to store the
97-
discriminants for all possible variants. The size is selected by the
98-
compiler but must be at least a `u8`.
94+
An **empty enum** is an enum with no variants; empty enums can never
95+
be instantiated and are logically equivalent to the "never type"
96+
`!`. `#[repr]` annotations are not accepted on empty enums. Empty
97+
enums are guaranteed to have the same layout as `!` (zero size and
98+
alignment 1).
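
Editor's note: the layout claim for empty enums can be checked with `size_of` and `align_of`. The following is a minimal sketch; `Void` and `void_layout` are hypothetical names introduced only for this illustration.

```rust
use std::mem::{align_of, size_of};

// `Void` is a hypothetical empty enum: it has no variants,
// so no value of this type can ever be constructed.
enum Void {}

// Returns (size, alignment) of the empty enum.
fn void_layout() -> (usize, usize) {
    (size_of::<Void>(), align_of::<Void>())
}

fn main() {
    let (size, align) = void_layout();
    // The text above states zero size and alignment 1.
    assert_eq!((size, align), (0, 1));
    println!("size = {size}, align = {align}");
}
```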
9999

100-
When a `#[repr(Int)]`-style annotation is attached to a C-like enum
100+
### Layout of a fieldless enum
101+
102+
If there is no `#[repr]` attached to a fieldless enum, it is
103+
guaranteed to be represented as an integer of sufficient size to store
104+
the discriminants for all possible variants. The size is selected by
105+
the compiler but must be at least a `u8`.
106+
107+
When a `#[repr(Int)]`-style annotation is attached to a fieldless enum
101108
(one without any data for its variants), it will cause the enum to be
102109
represented as a simple integer of the specified size `Int`. This must
103110
be sufficient to store all the required discriminant values.
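
Editor's note: a short sketch of the `#[repr(Int)]` rule for fieldless enums. The enum names `Small` and `Wide` and the helper `sizes` are assumptions made for illustration; they do not appear in the document.

```rust
use std::mem::size_of;

// Hypothetical fieldless enum stored as a single `u8`.
#[repr(u8)]
#[allow(dead_code)]
enum Small {
    A,
    B,
    C,
}

// Hypothetical fieldless enum stored as a `u32`, with explicit discriminants.
#[repr(u32)]
#[allow(dead_code)]
enum Wide {
    A = 1,
    B = 1000,
}

// Returns the sizes of the two enums in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Small>(), size_of::<Wide>())
}

fn main() {
    // The enum occupies exactly the space of the requested integer type.
    assert_eq!(sizes(), (1, 4));
    // Discriminants match the assigned (or automatically assigned) values.
    assert_eq!(Small::C as u8, 2);
    assert_eq!(Wide::B as u32, 1000);
}
```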
@@ -107,7 +114,7 @@ size as the C compiler would use for the given target for an
107114
equivalent C-enum declaration.
108115

109116
Combining a `C` and `Int` representation (e.g., `#[repr(C, u8)]`) is
110-
not permitted on a C-like enum.
117+
not permitted on a fieldless enum.
111118

112119
The values used for the discriminant will match up with what is
113120
specified (or automatically assigned) in the enum definition. For
@@ -128,12 +135,19 @@ enum Foo {
128135
**Unresolved question:** What about platforms where `-fshort-enums`
129136
is the default? Do we know/care about that?
130137

131-
### Layout for enums that carry data
138+
### Layout of data-carrying enums with an explicit repr annotation
132139

133-
For enums that carry data, the layout differs depending on whether
134-
C-compatibility is requested or not.
140+
This section concerns data-carrying enums **with an explicit repr
141+
annotation of some form**. The memory layout of such cases was
142+
specified in [RFC 2195][] and is therefore normative.
135143

136-
#### Non-C-compatible layouts
144+
[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html
145+
146+
The layout of data-carrying enums that do **not** have an explicit
147+
repr annotation is generally undefined, but with certain specific
148+
exceptions: see the next section for details.
149+
150+
#### Non-C-compatible representation selected
137151

138152
When an enum is tagged with `#[repr(Int)]` for some integral type
139153
`Int` (e.g., `#[repr(u8)]`), it will be represented as a C-union of a
@@ -176,15 +190,15 @@ Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are
176190
appears at offset 0 in both cases, so that we can read it to determine
177191
the current variant.
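
Editor's note: because the tag sits at offset 0 in every variant struct, it can be read through a pointer cast. The sketch below assumes an enum shaped like the `TwoCases` example from RFC 2195; the definition shown here and the helper `tag_of` are illustrative assumptions, since the original definition is elided from this diff.

```rust
// Assumed definition, modeled on the RFC 2195 example.
#[repr(u8)]
#[allow(dead_code)]
enum TwoCases {
    A(u8, u16),
    B(u16),
}

// Reads the `u8` tag stored at offset 0 of the enum.
fn tag_of(value: &TwoCases) -> u8 {
    // SAFETY: with `#[repr(u8)]`, the enum is laid out as a union of
    // `#[repr(C)]` structs whose first field is the `u8` tag, so the
    // first byte of any valid value is the tag.
    unsafe { *(value as *const TwoCases as *const u8) }
}

fn main() {
    // Tags are assigned in declaration order, starting from 0.
    assert_eq!(tag_of(&TwoCases::A(0, 0)), 0);
    assert_eq!(tag_of(&TwoCases::B(7)), 1);
}
```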
178192

179-
#### C-compatible layouts.
193+
#### C-compatible representation selected
180194

181195
When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C,
182196
u8)]`, the layout of enums is changed to better match C++ enums. In
183197
this mode, the data is laid out as a tuple of `(discriminant, union)`,
184198
where `union` represents a C union of all the possible variants. The
185199
type of the discriminant will be the integral type specified (`u8`,
186200
etc) -- if no type is specified, then the compiler will select one
187-
based on what a size a C-like enum would have with the same number of
201+
based on the size that a fieldless enum would have with the same number of
188202
variants.
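
Editor's note: the difference between the two layouts shows up in `size_of`. This sketch uses two hypothetical enums, `InlineTag` and `SeparateTag`, and assumes a typical target where `u16` has alignment 2.

```rust
use std::mem::size_of;

// `#[repr(u8)]`: a union of structs, each beginning with the tag.
// Variant `A` is laid out as { tag: u8, u8, u16 }, which is 4 bytes.
#[repr(u8)]
#[allow(dead_code)]
enum InlineTag {
    A(u8, u16),
    B,
}

// `#[repr(C, u8)]`: a (discriminant, union) pair. The union holds
// { u8, u16 } (4 bytes, alignment 2), so it starts at offset 2.
#[repr(C, u8)]
#[allow(dead_code)]
enum SeparateTag {
    A(u8, u16),
    B,
}

// Returns the sizes of the two enums in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<InlineTag>(), size_of::<SeparateTag>())
}

fn main() {
    // Assumes `u16` is 2-byte aligned, as on common targets.
    assert_eq!(sizes(), (4, 6));
}
```

The extra two bytes in the `#[repr(C, u8)]` case come from padding between the tag and the union, which the `#[repr(u8)]` layout avoids by packing the first field next to the tag.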
189203

190204
This layout, while more compatible and arguably more obvious, is also
@@ -252,27 +266,26 @@ struct MyEnum {
252266
};
253267
```
254268
255-
## Enums without a specified representation
269+
### Layout of data-carrying enums without a repr annotation
270+
271+
If no explicit `#[repr]` attribute is used, then the layout of a
272+
data-carrying enum is typically **not specified**. However, in certain
273+
select cases, there are **guaranteed layout optimizations** that may
274+
apply, as described below.
256275
257-
If no explicit `#[repr]` attribute is used, then the layout of most
258-
enums is not specified, with one crucial exception: option-like enums
259-
may in some cases use a compact layout that is identical to their
260-
payload.
276+
#### Discriminant elision on Option-like enums
261277
262278
(Meta-note: The content in this section is not described by any RFC
263279
and is therefore "non-normative".)
264280
265-
### Discriminant elision on Option-like enums
281+
**Definition.** An **option-like enum** is a 2-variant enum where:
266282
267-
**Definition.** An **option-like enum** is an enum which has:
268-
269-
- one variant with a single field,
270-
- other variants with no fields ("unit" variants).
283+
- one variant has a single field, and
284+
- the other variant has no fields (the "unit variant").
271285
272286
The simplest example is `Option<T>` itself, where the `Some` variant
273287
has a single field (of type `T`), and the `None` variant has no
274-
fields. But other enums that fit that same template (and even enums
275-
that include multiple `None`-like fields) fit.
288+
fields. Other enums that fit the same template also qualify.
276289
277290
**Definition.** The **payload** of an option-like enum is the single
278291
field which it contains; in the case of `Option<T>`, the payload has
@@ -284,15 +297,17 @@ may never be NULL, and hence defines a niche consisting of the
284297
bitstring `0`. Similarly, the standard library types [`NonZeroU8`]
285298
and friends may never be zero, and hence also define the value of `0`
286299
as a niche. (Types that define niche values will say so as part of the
287-
description of their representation invariant.)
300+
description of their representation invariant. As of this
301+
writing, representation invariants are the next topic up for discussion
302+
in the unsafe code guidelines process.)
288303
289304
[`NonZeroU8`]: https://doc.rust-lang.org/std/num/struct.NonZeroU8.html
290305
291-
**Option-like enums where the payload defines an adequate number of
292-
niche values are guaranteed to be represented without using any
293-
discriminant at all.** This is called **discriminant elision**. If
294-
discriminant elision is in effect, then the layout of the enum is
295-
equal to the layout of its payload.
306+
**Option-like enums where the payload defines at least one niche value
307+
are guaranteed to be represented using the same memory layout as their
308+
payload.** This is called **discriminant elision**, as there is no
309+
explicit discriminant value stored anywhere. Instead, niche values are
310+
used to represent the unit variant.
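
Editor's note: discriminant elision can be observed by comparing the size of an option-like enum with the size of its payload. In this sketch, `Maybe`, `same_size`, and `none_bits` are hypothetical names introduced for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// A hypothetical user-defined option-like enum:
// one variant with a single field, one unit variant.
#[allow(dead_code)]
enum Maybe<T> {
    Present(T),
    Absent,
}

// True when the wrapper type `W` is the same size as the payload type `P`.
fn same_size<W, P>() -> bool {
    size_of::<W>() == size_of::<P>()
}

// Returns the bit pattern used for `None` in `Option<&u8>`.
fn none_bits() -> usize {
    let none: Option<&u8> = None;
    // SAFETY: `Option<&u8>` has the same layout as a nullable pointer,
    // which is the same size as `usize`; `None` carries no provenance.
    unsafe { std::mem::transmute::<Option<&u8>, usize>(none) }
}

fn main() {
    // `&u8` and `NonZeroU8` each define a niche (the all-zero bitstring).
    assert!(same_size::<Option<&u8>, &u8>());
    assert!(same_size::<Option<NonZeroU8>, NonZeroU8>());
    assert!(same_size::<Maybe<&u8>, &u8>());
    // The unit variant is represented by the niche value: a null pointer.
    assert_eq!(none_bits(), 0);
}
```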
296311
297312
The most common example is that `Option<&u8>` can be represented as a
298313
nullable `&u8` reference -- the `None` variant is then represented
@@ -313,64 +328,13 @@ a nullable pointer. FFI interop often depends on this property.
313328
pointer (which is therefore equivalent to a C function pointer). FFI
314329
interop often depends on this property.
315330
316-
**Example.** Consider the following enum definitions:
331+
**Example.** The following enum definition is **not** option-like,
332+
as it has two unit variants:
317333
318334
```rust
319335
enum Enum1<T> {
320336
Present(T),
321337
Absent1,
322338
Absent2,
323339
}
324-
325-
enum Enum2 {
326-
A, B, C
327-
}
328340
```
329-
330-
`Enum1<&u8>` is not eligible for discriminant elision, since `&u8`
331-
defines a single niche value, but `Enum1` has two unit
332-
variants. However, `Enum2` has only three legal values (0 for `A`, 1
333-
for `B`, and 2 for `C`), and hence defines a plethora of niche values[^caveat].
334-
Therefore, `Enum1<Enum2>` is guaranteed to be laid out the same as
335-
`Enum2` ([consider the results of applying
336-
`size_of`](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eadff247f2c5713b8f3b6c9cda297711)).
337-
338-
[^caveat]: Strictly speaking, niche values are considered part of the "representation invariant" for an enum and not its type. Therefore, this section is added only as a preview for future unsafe-code-guidelines discussion.
339-
340-
### Other optimizations
341-
342-
The previous section specified a relatively narrow set of layout
343-
optimizations that are **guaranteed** by the compiler. However, the
344-
compiler is always free to perform **more** optimizations than this
345-
minimal set. For example, the compiler presently treats `Result<T,
346-
()>` and `Option<T>` as equivalent, but this behavior is not
347-
guaranteed to continue as `Result<T, ()>` is not considered
348-
"option-like".
349-
350-
As of this writing, the compiler's current behavior is to attempt to
351-
elide discriminants whenever possible. Furthermore, a variant whose
352-
only fields are of zero-size is considered a unit variant for this
353-
purpose. If eliding discriminants is not possible (e.g., because the
354-
payload does not define sufficient niche values), then the compiler
355-
will select an appropriate discriminant size `N` and use a
356-
representation roughly equivalent to `#[repr(N)]`, though without the
357-
strict `#[repr(C)]` guarantees on each struct. However, this behavior
358-
is not guaranteed to remain the same in future versions of the
359-
compiler and should not be relied upon. (While it is not expected that
360-
existing layout optimizations will be removed, it is possible -- it is
361-
also possible for the compiler to introduce new sorts of
362-
optimizations.)
363-
364-
## Niche values
365-
366-
C-like enums with N variants and no specified representation are
367-
guaranteed to supply niche values corresponding to 256 - N (presuming
368-
that is a positive number). This is because a C-like enum must be
369-
represented using an integer and that integer must correspond to a
370-
valid variant: the precise size of C-like enums is not specified but
371-
it must be at least one byte, which means that there are at least 256
372-
possible bitstrings (only N of which are valid).
373-
374-
Other enums -- or enums with a specified representation -- may supply
375-
niches if their representation invariant permits it, but that is not
376-
**guaranteed**.
