
Endian-aware integer types #3380

Open
oskarnp opened this issue Oct 4, 2019 · 14 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@oskarnp

oskarnp commented Oct 4, 2019

Proposal: Add integer types that represent a specific endianness:

i16le
u16le
i32le
u32le
...

i16be
u16be
i32be
u32be
...

Example: Casting a u32be to u32 would byte swap automatically if the host is little-endian.
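
For comparison, here is a minimal userland sketch of the behaviour described above, using the existing std.mem.nativeToBig and std.mem.bigToNative helpers. The wrapper name U32Be and its init/get methods are hypothetical stand-ins for illustration, not the proposed syntax:

```zig
const std = @import("std");

/// Hypothetical userland stand-in for the proposed `u32be`.
const U32Be = extern struct {
    /// Always stored in big-endian byte order, whatever the host is.
    raw: u32,

    fn init(native: u32) U32Be {
        // Byte swaps on little-endian hosts, no-op on big-endian hosts.
        return .{ .raw = std.mem.nativeToBig(u32, native) };
    }

    fn get(self: U32Be) u32 {
        // The inverse conversion, again a no-op on big-endian hosts.
        return std.mem.bigToNative(u32, self.raw);
    }
};
```

With the proposal, the two conversions would presumably be implied by an ordinary cast between u32be and u32 instead of being spelled out by hand.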

@tgschultz
Contributor

Can you outline why the various std.mem facilities or std.io.Serializer/Deserializer are insufficient?

@oskarnp
Author

oskarnp commented Oct 4, 2019

Can you outline why the various std.mem facilities or std.io.Serializer/Deserializer are insufficient?

They get the job done. But with some help from the type system I thought this could be a good feature to avoid mistakes. Feature was inspired by another language that has this (Odin).

@kyle-github

This was proposed some time ago, at least by me. At the time, IIRC, Andrew pointed to doing this via special packing types on structs instead. That was a nicer way to do this as it allowed for more flexibility and general utility.

I'll update this if I can find the issue. It was a long time ago...

@ikskuh
Contributor

ikskuh commented Oct 7, 2019

I don't think it's a good idea to enforce endianness in the type system. This will create huge performance penalties when adding foreign-endian integers.

Endianness should only be a concern of serialization facilities, and you can already create pretty good stuff with that in userland:

pub const MyType = struct
{
    pub const serializationTags = "fieldLE:le,fieldBE:be"; // for example, this can be read and interpreted by your serializer

    fieldLE: i32,
    fieldBE: i32
};

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 9, 2019
@andrewrk andrewrk added this to the 0.7.0 milestone Oct 9, 2019
@thejoshwolfe
Contributor

Does this proposal have any Real Actual Use Cases? Has anyone written code that we can see that would be improved by this feature? (here's an example of a real actual use case for a different feature.)

Artificial use cases using "Foo" and "MyType" etc are useful for illustrating what the proposal is, but not why it should be accepted.

@kyle-github

As I mentioned back in October, @andrewrk pointed out that this can be done by a special type of packed struct. @MasterQ32 also has a good point that this can be done by serialization routines.

There are two places where I use a specific endianness in C:

  1. binary files with fixed formats.
  2. network protocols (IP is big-endian, others such as some industrial protocols can be little-endian).

In C I use serialization routines. In Zig it seems like most of this can be done by comptime code generation by introspection of structs etc.
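
As an illustration of the serialization-routine approach in status-quo Zig, here is a small sketch that assumes Zig 0.12 or later (for the std.mem.readInt signature and the .big/.little spellings). The six-byte record layout and the names Parsed and parse are invented for the example:

```zig
const std = @import("std");

/// Hypothetical record: a big-endian u16 port followed by a little-endian u32 length.
const Parsed = struct {
    port: u16,
    len: u32,
};

fn parse(buf: *const [6]u8) Parsed {
    return .{
        // Each field names its wire endianness at the point of the read.
        .port = std.mem.readInt(u16, buf[0..2], .big),
        .len = std.mem.readInt(u32, buf[2..6], .little),
    };
}
```

The endianness lives in the parsing logic rather than in the field types, which is the trade-off this thread is debating.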

@Tetralux
Contributor

@kyle-github

by a special type of packed struct

How would that work? Would it mean that you cannot forget to swap the value?

In Zig it seems like most of this can be done by comptime code generation by introspection of structs etc.

At face value that seems overly complicated for such a simple thing. How would you envision using comptime code generation to detect this?

@gingerBill

gingerBill commented Apr 15, 2020

@andrewrk Are you suggesting that this should be done through intrinsics rather than at the type system level?

I personally use endian-specific types a lot, since many file formats and network formats have a specific endianness, and it is clearer to encode it in the type system than in the logic.


@andrewrk
Member

There are valid use cases for this feature. If this was implemented, there are even a few places in the std lib that would be updated to take advantage of it. However, I'm closing the issue because:

  • The alternative (status quo) to those use cases is acceptable, in terms of robustness, maintainability, and generated code quality.
  • This would increase the language size.

@andrewrk
Member

andrewrk commented Mar 9, 2025

Reopening and accepting. After 5 years using the language, and with the evolution of enums and packed structs, this feature makes more sense.

For starters it will be a non-breaking change, and endian-aware integer types will only be creatable via the @Type builtin.

@andrewrk andrewrk reopened this Mar 9, 2025
@andrewrk andrewrk modified the milestones: 0.7.0, 0.15.0 Mar 9, 2025
@andrewrk andrewrk added the accepted This proposal is planned. label Mar 9, 2025
@mlugg
Member

mlugg commented Mar 9, 2025

How will this interact with integers whose bit size is not a power-of-two >=8? Such integers do not have well-defined layout, so endianness isn't a particularly meaningful concept to the user. As such, it doesn't really make sense to have distinct e.g. u4le and u4be (whatever the equivalent @Type/@Int construction is). Perhaps std.builtin.Type.Int has endianness: ?Endianness, where:

  • @typeInfo sets that to null iff the integer has ill-defined layout
  • Reification of an integer with well-defined layout either a) disallows null or b) interprets null as "native"
  • Reification of an integer with ill-defined layout requires this field to be null

Also, how does this interact with packed struct? My assumption is that, since packed structs are concerned only with the bit-level representation of values, u32le and u32be will act identically in a packed struct (although of course, the distinction still matters when you load the field).

@andrewrk
Member

Language currently assumes 8 bits_per_byte.

Only integers with bit width evenly divisible by bits_per_byte will support non-native endian. In a post-#3806 world, only the "bag of bits" types would support this; mathematical integers would not support it.

Similarly, in status quo, only integers with bit width evenly divisible by bits_per_byte have well-defined in-memory layout. Others such as u3 do not. However all status quo integer types have a logical bit layout, which is why they are all allowed inside packed structs.

I think std.builtin.Type.Int would have endian: Endian = native_endian. I don't see any reason to introduce a third state, even for integer types without well-defined memory layout.

packed struct and @bitCast are intrinsically related; they both deal with logical bit layout. Endianness does not affect logical bit layout. This difference between memory layout and logical bit layout is one of the key benefits of having this in the type system. In practice, users will find it useful to choose the endianness carefully for the backing integer of packed structs and enums, when such things correspond to some ABI.
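
The distinction between logical bit layout and memory layout can be observed in status-quo Zig. The sketch below assumes Zig 0.12 or later; the names Pair and firstByteInMemory are made up for the illustration:

```zig
const std = @import("std");

/// Two bytes packed into a 16-bit backing integer.
/// The first field occupies the least significant bits.
const Pair = packed struct(u16) {
    low: u8,
    high: u8,
};

/// Logical view: the same on every target.
const pair: Pair = @bitCast(@as(u16, 0x1234));

/// Memory view: returns the byte at the lowest address,
/// which depends on the target's endianness.
fn firstByteInMemory(value: Pair) u8 {
    return std.mem.asBytes(&value)[0];
}
```

Here pair.low is 0x34 and pair.high is 0x12 regardless of host, while firstByteInMemory(pair) is 0x34 on a little-endian target and 0x12 on a big-endian one.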

@kj4tmp
Contributor

kj4tmp commented Mar 10, 2025

Who are we helping specifically? It sounds like we are only helping ABI use cases and not serialization use-cases.

I personally have the following serialization use cases which are still a bit awkward.

// user desires easy serializability and zero-copy networking
// user chooses big endian backed packed struct
// unfortunately, user must write fields in reverse order
pub const Header = packed struct(u112be) {
    ether_type: EtherType,
    src_mac: u48,
    dest_mac: u48,
};

// user can deserialize without much fuss from any system:
pub fn deserialize(comptime T: type, bytes: [@divExact(@bitSizeOf(T), 8)]u8) T {
    return @bitCast(bytes);
}

All we have accomplished for this use case is effectively tagging the packed struct as big endian for serialization.

Even with "endianness aware types", it's still a bit hoop-jumpy for me to know "what are the actual bits here". I have to first think about logical bit order, and then do the conversions to in-memory layout.
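
For reference, the reversed field order can be reproduced in status-quo Zig. This is a sketch assuming Zig 0.12 or later, with ether_type simplified to a plain u16 and the names WireHeader and parseHeader invented for the example:

```zig
const std = @import("std");

/// Ethernet header as it appears on the wire: dest_mac, src_mac, ether_type.
/// Fields are declared in reverse because the first packed field
/// occupies the least significant bits of the backing integer.
const WireHeader = packed struct(u112) {
    ether_type: u16,
    src_mac: u48,
    dest_mac: u48,
};

fn parseHeader(bytes: *const [14]u8) WireHeader {
    // Interpret the 14 wire bytes as one big-endian integer, so the first
    // byte on the wire lands in the most significant bits (dest_mac).
    return @bitCast(std.mem.readInt(u112, bytes, .big));
}
```

The explicit readInt call plays the role that the big-endian backing integer would play under the proposal; the field order still has to be reversed either way.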

To address the potential counter-argument of "binary serialization formats should not be represented in the type system", please
consider that most ubiquitous networking formats were designed in consideration of their ease of implementation
in C, but they are still arduous in C and Zig could possibly do better than C here (and already does with packed
structs).

I'm not sure there is a solution for everyone. I can understand that ABI people care more about logical bit order (because historically bit-mask flags have been done using logical bit shifts), but for people with problems like "I need to adhere to this bit-wise spec of behavior regardless of host endianness" it's still a bit awkward (see reversed field order for big endian protocols).

Perhaps we could simply expose more options to the user to manipulate the effects of field order? For example:

pub const Header = packed struct(u112, .first_field_has_lowest_memory_address) {
    dest_mac: u48be,
    src_mac: u48be,
    ether_type: EtherType, // enum(u16be)
};

To be clear, nothing about the status quo is blocking me. As it is right now, I can represent almost any binary format, just in a bit of an awkward way. And definitely in a better way than C.

And for some added context, here are a selection of some other real-world structs I have to work with:

(They all must be little endian with the first field transmitted first over the network. I wonder how much more clearly I could express the intent with this proposal!)

pub const LoopControlSettings = enum(u2) {
    auto = 0,
    auto_close,
    always_open,
    always_closed,
};

pub const DLControlRegisterCompact = packed struct {
    forwarding_rule: bool,
    temporary_loop_control: bool,
    reserved: u6 = 0,
    loop_control_port0: LoopControlSettings, // enum(u2)
    loop_control_port1: LoopControlSettings, // enum(u2)
    loop_control_port2: LoopControlSettings, // enum(u2)
    loop_control_port3: LoopControlSettings, // enum(u2)
};

pub const Header = packed struct(u80) {
    command: Command, // enum(u8)
    idx: u8 = 0,
    address: u32,
    length: u11,
    reserved: u3 = 0,
    circulating: bool,
    next: bool,
    irq: u16,
};
