Skip to content

langref: improve packed struct memory layout description #21741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 16, 2025

Conversation

kj4tmp
Copy link
Contributor

@kj4tmp kj4tmp commented Oct 18, 2024

intended to close
#14384
#20647

Questions:

1. I don't know what to do about this TODO:

> TODO update these docs with a recommendation on how to use packed structs with MMIO
> (the use case for volatile packed structs) once this issue is resolved.
> Don't worry, there will be a good solution for this use case in zig.

  1. Does something different happen if the backing integer is specified as unsigned vs signed?

Looks like this:

image

Copy link
Contributor

@rootbeer rootbeer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have a suggestion and a question, but this is an improvement without addressing either of those.

@kj4tmp
Copy link
Contributor Author

kj4tmp commented Oct 19, 2024

I found an article about MMIO in zig: https://www.scattered-thoughts.net/writing/mmio-in-zig/

However, this was written in 2021, and integer-backed packed structs landed in 2022: #5049 (comment)

I will attempt to tailor the information into some recommendation for MMIO. Right now, I am thinking:

  1. remind user about alignment
  2. tell them how to override the alignment using over-align pointer and @ptrFromInt

But I won't be able to make a doc-test that actually uses @ptrFromInt because it would likely segfault so perhaps it will just be a paragraph

@kj4tmp kj4tmp marked this pull request as draft October 19, 2024 05:25
@kj4tmp kj4tmp marked this pull request as ready for review October 19, 2024 06:28
@kj4tmp
Copy link
Contributor Author

kj4tmp commented Oct 19, 2024

MMIO example needs close review from someone familiar with this use-case.

It is inspired by examples in the microzig framework:
https://github.com/ZigEmbeddedGroup/microzig/blob/1b8dff99cb3e034e853309c165c59f5c25bfa430/core/src/cpus/cortex_m.zig#L40

@rootbeer
Copy link
Contributor

Looks even better to me now! I like the MMIO text and example, and while it makes sense to me, I don't have any particular experience with them to validate it.

In the MMIO example, I'm torn between wanting to see more documentation about the _reserved field (the explicit padding is needed to make the underlying load/store operations not load/store undefined bits) and it not really being in scope for a Language Reference (nor specific to packed structs). AFAICT, this is a "best practice"? Since the semantics are defined by the IO device you're interacting with.

@kj4tmp
Copy link
Contributor Author

kj4tmp commented Oct 19, 2024

Most CPUs (except rare old old ones) cannot individually change bits in memory. They can usually only read / write from the smallest addressable unit of memory, which is usually 1 byte. In this example, a micro controller would typically have a special memory address that can be written to manipulate GPIO (general purpose input and output). This example assumes the user has read the microcontroller datasheet, found an address of a register (0x0123), and the datasheet says the first four bits are for GPIO0 through 3 and the last four bits are not used for anything and should be zero.

The thing to be careful about is not letting the reads / writes from memory to be optimized away (hence the volatile) and to have the current alignment for the point addressing the memory (1 byte in this case).

In the example, the entire byte at address 0x0123 will be overwritten on every call to the writeToGPIO function.

@kj4tmp
Copy link
Contributor Author

kj4tmp commented Oct 19, 2024

Technically I could remove the _reserved and I could remove the align(1) and everything would still likely work, since the alignment of a u4 would be 1 automatically, but I prefer the explicit-ness.

@kj4tmp
Copy link
Contributor Author

kj4tmp commented Nov 30, 2024

added another sentence explicitly mentioning padding due to alignment, since there are questions in the communities a lot about this.

Looks like this right now:

image

Copy link

@mnemnion mnemnion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a good job of clarifying some aspects of packed structs, but there are a few parts I think should be trimmed down, they're more appropriate to a tutorial than a langref.

<li>The backing integer has the same bit width as the fields' total bit width.</li>
<li>The backing integer is subject to the same rules as any integer, including {#link|alignment|Alignment#},
having a maximum bit count of 65535, and the host endianness.
On a big endian system, the first declared field will have the highest memory address, and on a little endian system, the lowest.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this part of the documentation is a good place to document how endianness works. This isn't really a documentation of what packed structs are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the harm in having this information here.
It also isn't just documentation "how endianness works", but how it interacts with struct layout here. The language used to be specified/implemented differently in this regard, and this point is often asked about (see inciting issue #20647 - andrew didn't close it, but reflagged it from bug to documentation issue ).
Ultimately the wording and scope is up to andrew to decide. I don't think trimming it now (on anyone else's opinion) is helpful.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GitHub comment system makes this unclear because of the context window: I was only referring to line 2188 here. Endianness is not really part of the user interface of packed structs, it's a fact about the backing integers.

The point of review is to come up with a better patch, so I disagree that it isn't helpful to do it. However, given the lack of consensus on this point, I think it's fine to err on the side of leaving it in.

</li>
<li>{#syntax#}bool{#endsyntax#} fields use exactly 1 bit.</li>
<li>Due to {#link|alignment|Alignment#}, the backing integer may require more memory that its bit width. {#link|@bitSizeOf|@bitSizeOf#} and {#link|@sizeOf|@sizeOf#} can interrogate the difference.</li>
<li>Field access and assignment can be understood as shorthand for bitshifts on the backing integer.</li>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be left off. The compiler has many options for implementing packed struct operations, many don't involve bitshifting at all.

How packed structs are translated into machine code is an implementation detail which doesn't need to be analogized here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is most relevant to the MMIO case, perhaps I could move it down there? See recent comments on this PR about how cpus cannot change individual bits in memory without a load/store of complete bytes.

I tried to keep it vague (using "can be understood as") because it makes sense that the compiler could optimize field assignments inside very large packed structs to only affect the changed memory, as you said, and because I do not know the exact specifics of the current or expected behavior when volatile is used.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I see is that it isn't a complete description. For instance pack.field3 = true might be implemented as backing_int | 0b0100, that's a mask, not a shift. The documentation is a user guide: a complete description of the implementation of packed struct operations would be inappropriate, so I don't think a partial analogy is correct here either.

Compare to the integer literal documentation. It does say that signed values are always two's complement, that's useful to know: the C standard, for instance, does not (or didn't until quite recently) specify an integer representation. But it doesn't say anything about how the Zig compiler implements large integers like u256, because while that's interesting, it isn't user documentation.

As for moving the idea to the MMIO case: yeah that makes perfect sense to me. A comment over a specific operation in the sample code saying "this is equivalent to {stuff}" which shows a mask, bitshift, whatever is correct, that's helpful, because it can be completely correct for the commented line, and illustrates the connection between packed struct field access and the manual-transmission way of doing these things in other low-level languages. So I'd say that's a good place for it.

By saying it's equivalent, it's also not a statement about what the implementation happens to do. Keeping implementation details out of the documentation is important, because it's no longer a detail if it's documented.

(the use case for volatile packed structs) once this issue is resolved.
Don't worry, there will be a good solution for this use case in zig.
Packed structs can be used to interact with memory-mapped input-output (MMIO), which is
common in embedded applications. A pointer of the correct alignment and address to a packed struct

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having an illustration of MMIO is on balance good, but there's too much detail here. I'd cut this to:

Packed structs can be used to interact with memory-mapped input-output (MMIO):

Followed by the illustration code. If you think the detail about the pointer alignment and address is important for understanding the code snippet (perhaps) it could be a comment in the code snippet.

@kj4tmp
Copy link
Contributor Author

kj4tmp commented Dec 9, 2024

For me to provide a truly adequate MMIO example (to satisfy the TODO), I need to know the following:

  1. Are field loads / stores of a volatile pointer to a packed struct guaranteed to load/store the entire backing integer, or only just enough for the field?
  2. Is pointer dereference operator on a volatile pointer (my_pointer.*) guaranteed to load / store the entire backing integer?

@kj4tmp
Copy link
Contributor Author

kj4tmp commented Jan 19, 2025

Regarding impending love to the PR queue:

As it stands right now, I think this PR is a strict improvement to the status quo and is mergable. If a core team member has feedback I am happy to implement it.
The adequacy of the MMIO example is a little language- specification-y (semantics of volatile pointers to packed structs) and thus might better be described by a core team member.
I could make an attempt but it would just be me interpreting the current behavior with godbolt.

As far as I know, most user code is careful to read/write the entire packed struct and not individual fields. For example: https://github.com/ZigEmbeddedGroup/microzig/blob/cf666d6137b85dfa5353b275c1b46b421f646f3f/core/src/mmio.zig#L47

@andrewrk andrewrk force-pushed the langref-packed-structs branch from 853a66d to 2c30e85 Compare April 11, 2025 21:17
@andrewrk andrewrk enabled auto-merge April 11, 2025 21:18
@andrewrk andrewrk force-pushed the langref-packed-structs branch from 2c30e85 to bd38c41 Compare April 16, 2025 02:19
@andrewrk andrewrk merged commit 3746b3d into ziglang:master Apr 16, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants