Endian-aware integer types #3380
Comments
Can you outline why the various std.mem facilities or std.io.Serializer/Deserializer are insufficient? |
They get the job done. But with some help from the type system I thought this could be a good feature to avoid mistakes. Feature was inspired by another language that has this (Odin). |
This was proposed some time ago, at least by me. At the time, IIRC, Andrew pointed to doing this via special packing types on structs instead. That was a nicer way to do this as it allowed for more flexibility and general utility. I'll update this if I can find the issue. It was a long time ago... |
I don't think it's a good idea to enforce endianness in the type system. This will create huge performance penalties when adding foreign-endian integers. Endianness should only be a concern of serialization facilities, and you can already create pretty good stuff with that in userland:
pub const MyType = struct {
    pub const serializationTags = "fieldLE:be,fieldBE:le"; // for example, this can be read and interpreted by your serializer
    fieldLE: i32,
    fieldBE: i32,
}; |
Does this proposal have any Real Actual Use Cases? Has anyone written code that we can see that would be improved by this feature? (Here's an example of a real actual use case for a different feature.) Artificial use cases using "Foo" and "MyType" etc. are useful for illustrating what the proposal is, but not why it should be accepted. |
As I mentioned back in October, @andrewrk pointed out that this can be done by a special type of packed struct. @MasterQ32 also has a good point that this can be done by serialization routines. There are two places that I use specific endian-ness in C:
In C I use serialization routines. In Zig it seems like most of this can be done by comptime code generation by introspection of structs etc. |
How would that work? Would it mean that you cannot forget to swap the value?
At face value that seems overly complicated for such a simple thing. How would you envision using comptime code generation to detect this? |
@andrewrk Are you suggesting that this should be done through intrinsics rather than at the type system level? I personally use endian-specific types a lot, since many file formats and network formats have a specific endianness, and it is clearer to encode it in the type system than in the logic. |
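For illustration, encoding endianness in a type can be approximated in userland today. The sketch below is only an assumption about what such a wrapper could look like: `BigEndian`, `init`, and `get` are hypothetical names, not part of the proposal or of std, and it assumes `T` is a standard byte-multiple integer such as `u16` or `u32`.

```zig
const std = @import("std");

/// Hypothetical userland stand-in for a big-endian integer type such as `u32be`.
/// Assumes T is a standard byte-multiple integer (u16, u32, u64).
pub fn BigEndian(comptime T: type) type {
    return extern struct {
        /// Always stored in big-endian byte order, whatever the host is.
        raw: T,

        const Self = @This();

        /// Convert a native value into its big-endian storage form.
        pub fn init(native: T) Self {
            return .{ .raw = std.mem.nativeToBig(T, native) };
        }

        /// Convert the stored value back to native order (byte swap on little-endian hosts).
        pub fn get(self: Self) T {
            return std.mem.bigToNative(T, self.raw);
        }
    };
}

test "wrapper keeps big-endian bytes in memory" {
    const port = BigEndian(u16).init(0x1F90);
    try std.testing.expectEqual(@as(u16, 0x1F90), port.get());
    // The in-memory bytes are most-significant first on any host.
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0x1F, 0x90 }, std.mem.asBytes(&port));
}
```

Because the raw field can only be reached through `init` and `get`, forgetting the swap becomes a type error rather than a silent bug, which is the property the comment above is asking for.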
There are valid use cases for this feature. If this was implemented, there are even a few places in the std lib that would be updated to take advantage of it. However, I'm closing the issue because:
|
Reopening and accepting. After 5 years using the language, and with the evolution of enums and packed structs, this feature makes more sense. For starters it will be a non-breaking change, and endian-aware integer types will only be creatable via the |
How will this interact with integers whose bit size is not a power-of-two >=8? Such integers do not have well-defined layout, so endianness isn't a particularly meaningful concept to the user. As such, it doesn't really make sense to have distinct e.g.
Also, how does this interact with |
Language currently assumes 8 bits_per_byte. Only integers with bit width evenly divisible by bits_per_byte will support non-native endian. In a post-#3806 world, only the "bag of bits" types would support this; mathematical integers would not support it. Similarly, in status quo, only integers with bit width evenly divisible by bits_per_byte have well-defined in-memory layout. Others such as I think
|
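The width rule stated above can be written down as a comptime predicate. This is a minimal sketch assuming 8 bits per byte, as the comment does; `supportsForeignEndian` is a hypothetical name used only for illustration.

```zig
const std = @import("std");

/// Hypothetical predicate: true when T's bit width is evenly divisible by 8,
/// i.e. when the rule above would allow a non-native-endian variant of T.
fn supportsForeignEndian(comptime T: type) bool {
    return @bitSizeOf(T) % 8 == 0;
}

test "only byte-multiple widths qualify" {
    try std.testing.expect(supportsForeignEndian(u32));
    try std.testing.expect(supportsForeignEndian(u24));
    try std.testing.expect(!supportsForeignEndian(u7));
}
```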
Who are we helping specifically? It sounds like we are only helping ABI use cases and not serialization use cases. I personally have the following serialization use cases, which are still a bit awkward.
// user desires easy serializability and zero-copy networking
// user chooses big endian backed packed struct
// unfortunately, user must write fields in reverse order
pub const Header = packed struct(u112be) {
    ether_type: EtherType,
    src_mac: u48,
    dest_mac: u48,
};
// user can deserialize without much fuss from any system:
pub fn deserialize(comptime T: type, bytes: [@divExact(@bitSizeOf(T), 8)]u8) T {
    return @bitCast(bytes);
}
All we have accomplished for this use case is effectively tagging the packed struct as big endian for
Even with "endianness aware types", it's still a bit hoop-jumpy for me to know "what's the actual bits here". I have to first
To address the potential counter-argument of "binary serialization formats should not be represented in the type system", please
I'm not sure there is a solution for everyone, I can understand that ABI people care more about logical bit order (because historically
Perhaps we could simply expose more options to the user to manipulate the effects of field order? For example:
pub const Header = packed struct(u112, .first_field_has_lowest_memory_address) {
    dest_mac: u48be,
    src_mac: u48be,
    ether_type: EtherType, // enum(u16be)
};
To be clear, nothing about the status quo is blocking me. As it is right now, I can represent almost any binary format, just in a bit of an awkward way. And definitely in a better way than C. And for some added context, here is a selection of some other real-world structs I have to work with. (They all must be little endian, with the first field transmitted first over the network. I wonder how much more clearly I could express the intent with this proposal!)
pub const LoopControlSettings = enum(u2) {
    auto = 0,
    auto_close,
    always_open,
    always_closed,
};
pub const DLControlRegisterCompact = packed struct {
    forwarding_rule: bool,
    temporary_loop_control: bool,
    reserved: u6 = 0,
    loop_control_port0: LoopControlSettings, // enum(u2)
    loop_control_port1: LoopControlSettings, // enum(u2)
    loop_control_port2: LoopControlSettings, // enum(u2)
    loop_control_port3: LoopControlSettings, // enum(u2)
};
pub const Header = packed struct(u80) {
    command: Command, // enum(u8)
    idx: u8 = 0,
    address: u32,
    length: u11,
    reserved: u3 = 0,
    circulating: bool,
    next: bool,
    irq: u16,
};
|
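To make the status quo concrete, here is a rough sketch of how one of the structs above can be put on the wire as little endian today. The first field of a packed struct occupies the least significant bits of the backing integer, so little-endian byte order sends the first field first. The explicit `u16` backing type and the helper name `toLittleEndianBytes` are assumptions added for this example; they are not in the original comment.

```zig
const std = @import("std");

const LoopControlSettings = enum(u2) { auto = 0, auto_close, always_open, always_closed };

// Same fields as above; the explicit u16 backing integer is an assumption
// added here so the total width (1 + 1 + 6 + 4 * 2 = 16 bits) is checked.
const DLControlRegisterCompact = packed struct(u16) {
    forwarding_rule: bool,
    temporary_loop_control: bool,
    reserved: u6 = 0,
    loop_control_port0: LoopControlSettings,
    loop_control_port1: LoopControlSettings,
    loop_control_port2: LoopControlSettings,
    loop_control_port3: LoopControlSettings,
};

/// Hypothetical helper: serialize the register as two little-endian bytes.
fn toLittleEndianBytes(reg: DLControlRegisterCompact) [2]u8 {
    const bits: u16 = @bitCast(reg);
    // nativeToLittle is a no-op on little-endian hosts and a byte swap on big-endian ones.
    return @bitCast(std.mem.nativeToLittle(u16, bits));
}

test "first field lands in the first transmitted byte" {
    const reg = DLControlRegisterCompact{
        .forwarding_rule = true,
        .temporary_loop_control = false,
        .loop_control_port0 = .always_open,
        .loop_control_port1 = .auto,
        .loop_control_port2 = .auto,
        .loop_control_port3 = .auto,
    };
    // forwarding_rule is bit 0; loop_control_port0 (value 2) starts at bit 8.
    try std.testing.expectEqual(@as(u16, 0x0201), @as(u16, @bitCast(reg)));
    const bytes = toLittleEndianBytes(reg);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0x01, 0x02 }, &bytes);
}
```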
Proposal: Add integer types that represent a specific endianness:
Example: Casting a u32be to u32 would byte swap automatically if the host is little-endian.
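For comparison, the conversion described in the example can be written by hand with existing std.mem helpers. This is only a sketch of the equivalent manual step, not the proposed syntax; `fromBigEndian` is a hypothetical name, and the plain `u32` parameter stands in for a value whose bytes are laid out big-endian.

```zig
const std = @import("std");

/// Hypothetical helper: the manual equivalent of casting a `u32be` to `u32`.
/// `raw` holds big-endian bytes reinterpreted as a native u32.
fn fromBigEndian(raw: u32) u32 {
    // Byte swaps on little-endian hosts, does nothing on big-endian hosts.
    return std.mem.bigToNative(u32, raw);
}

test "big-endian storage converts to the native value" {
    const wire = [4]u8{ 0x12, 0x34, 0x56, 0x78 }; // most significant byte first
    const raw: u32 = @bitCast(wire);
    try std.testing.expectEqual(@as(u32, 0x12345678), fromBigEndian(raw));
}
```

Nothing in the type of `raw` records that it still needs converting, which is the gap the proposal aims to close.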