Allow specifying the type of a slice's length #14926
Comments
Upon first glance, there are two things that pop into my head:
i don't think this fits into Zig. if we had things like short pointers and operator overloading then maybe, but at that point Zig has lost a lot of its simplicity and imo you're better off making a comptime function that does this:
pub fn ShortSlice(comptime T: type) type {
    return struct {
        ptr: [*]const T,
        len: u32,

        pub inline fn slice(self: *const @This(), from: u32, to: u32) []const T {
            return self.ptr[from..to];
        }

        pub inline fn at(self: *const @This(), index: u32) T {
            return self.ptr[index];
        }
    };
}
also the reason I mention short pointers is because unless you can make the |
I amended the proposal.
I have not encountered this pattern a lot but if you do encounter it you should be able to find out pretty quickly by looking at the function's return type because if it is |
Note that |
Good point. The memory savings are probably less significant than I assumed.
Mentioned this in my previous comment:
test {
    @compileLog(@sizeOf(struct { ptr: [*]u8, len: u32 }));
}
these are the same if you swap |
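The size comparison raised in this comment can be checked with a small test. This is only a sketch, assuming a 32- or 64-bit target; the names `ShortLen` and `TinyLen` are made up for illustration. Because the pointer field forces pointer alignment, the struct is padded back up to the size of a regular slice:

```zig
const std = @import("std");

// Hypothetical hand-rolled "short slices": a many-item pointer plus a narrow length.
const ShortLen = struct { ptr: [*]u8, len: u32 };
const TinyLen = struct { ptr: [*]u8, len: u15 };

test "a narrower length does not shrink the pointer+length pair" {
    // The struct size is rounded up to the pointer's alignment, so on 32- and
    // 64-bit targets both structs end up as large as a regular slice.
    try std.testing.expect(@sizeOf(ShortLen) == @sizeOf([]u8));
    try std.testing.expect(@sizeOf(TinyLen) == @sizeOf([]u8));
}
```

Running this with `zig test` should pass on such targets, which is consistent with the remark that the memory savings are smaller than they first appear.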
See #1830
Preface
I'm proposing the possibility to specify the type of a slice's length.
A slice always allows me to index up to std.math.maxInt(usize) bytes of data, but that doesn't mean I ever actually want to index that much data.
For example, on modern 64-bit systems, all strings, usually represented as []const u8 / []u8, include a 64-bit length, so each of those strings can be as long as 18446744073709551615 characters. Most of the time I never need my strings to be this long, or data in general for that matter; this is about slices as a whole, and strings are just an example.
It has bothered me for a while now how easy it is to waste memory this way. In my code I have a real case where a specific type of string is simply never allowed to be longer than 32767 characters, meaning its length should be u15 instead of usize (which is 8 bytes (u64) on my system).
Proposal
I propose the syntax [Type]Type, where the first Type is optional and defaults to usize. It's easier to understand with examples:
- []const u8 = as before, a constant slice with a length of type usize
- [u15]u8 = a mutable slice with a length of type u15
- [u8:0]const u32 = a constant zero-terminated slice with a length of type u8
I think this syntax makes a lot of sense. So far, apart from :sentinel, you are able to put this inside the square brackets of list types:
- nothing = length of type usize (-> slice, such as []u8)
- * = no length (-> pointer, such as [*]u8)
- *c = no length (-> C pointer, such as [*c]u8)
- an integer = fixed length (-> array, such as [5]u8)
And now in addition:
- an unsigned integer type = length of that type (-> slice, such as [u8]u8)
The integer type has one restriction: it must be unsigned. Both as a length and as an index, a signed integer type (i.e. the possibility of a negative value) does not make sense.
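For comparison, the proposed [Len]T can be approximated in today's Zig with a generic struct. The following is a minimal sketch, not the proposed language feature: `LenSlice`, `init`, and `slice` are hypothetical names, and the unsigned check simply mirrors the restriction stated above.

```zig
const std = @import("std");

/// Hypothetical stand-in for the proposed `[Len]T`: a many-item pointer
/// paired with a length of a caller-chosen unsigned integer type.
pub fn LenSlice(comptime Len: type, comptime T: type) type {
    // Mirror the proposal's one restriction: the length type must be unsigned.
    if (std.math.minInt(Len) != 0) @compileError("length type must be unsigned");

    return struct {
        ptr: [*]const T,
        len: Len,

        const Self = @This();

        /// Wrap an ordinary slice; returns null if its length does not fit in `Len`.
        pub fn init(items: []const T) ?Self {
            const narrow = std.math.cast(Len, items.len) orelse return null;
            return Self{ .ptr = items.ptr, .len = narrow };
        }

        /// View the data as an ordinary slice again (the length widens to usize).
        pub fn slice(self: Self) []const T {
            return self.ptr[0..self.len];
        }
    };
}

test "LenSlice rejects slices longer than the length type allows" {
    const Short = LenSlice(u8, u8);
    const ok = Short.init("hello").?;
    try std.testing.expectEqualStrings("hello", ok.slice());

    const too_long = [_]u8{0} ** 300;
    try std.testing.expect(Short.init(&too_long) == null);
}
```

Unlike the proposed syntax, this wrapper cannot be indexed or iterated directly, which is part of what the proposal is asking the language to provide.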
Possible Problem
Another restriction that could be imposed upon the slice length type is that its bit size must be less than or equal to the bit size of usize. This however could cause one problem: if I have [u64]u8 in my code, it'll compile fine for a 64-bit system where usize = u64, but it will fail to compile for a 32-bit system where usize < u64.
Because of this we may not actually want to impose any bit size restriction on the slice length type.
So that means I can have [u128]const u8 on a 64-bit system but will only be able to access std.math.maxInt(u64) bytes, and I think that's OK. That's just how these limitations work.
That is because even slices with a custom length type will still be indexed using usizes (even if as a result of an implicit integer cast), because that's how a lot of hardware expects it as well (it needs a pointer-sized integer).
So I think this possible problem is avoided if we do it this way.
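The two points above, implicit widening of a narrow index and an oversized length being capped by what usize can hold, can be sketched in current Zig. The helper names `elementAt` and `addressableLen` are hypothetical and exist only for this illustration:

```zig
const std = @import("std");

// Hypothetical helper: the index arrives as a u8, as it might on a target
// whose natural data size is 8 bits.
fn elementAt(items: []const u8, index: u8) u8 {
    // No explicit cast needed: a u8 always fits in usize, so it coerces implicitly.
    return items[index];
}

// Hypothetical helper: narrow an oversized length to what the target can
// address. Returns null when the value does not fit in usize.
fn addressableLen(len: u128) ?usize {
    return std.math.cast(usize, len);
}

test "indexing goes through usize regardless of the declared width" {
    const data = [_]u8{ 10, 20, 30, 40 };
    try std.testing.expect(elementAt(&data, 2) == 30);
    try std.testing.expect(addressableLen(4).? == 4);
    try std.testing.expect(addressableLen(std.math.maxInt(u128)) == null);
}
```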
Advantages
struct { [*]Type, usize }. After this, what exactly a slice is represented as might gain more importance.
Other Commentary
Not sure if there are any real disadvantages aside from one more thing to learn (well, if you need it) and additional language complexity. I think it's worth it.
This proposal if implemented as described does not break any existing code.
It seems to be a general trend in Zig to assume usize for a lot of things. This includes for example for (0..x) |i| {} where i will implicitly be usize. I think it will ultimately be better to steer towards a language where the programmer has more control over the types of these kinds of things (see also: #14704).
I believe this proposal will also make progress towards making Zig a better language for processors where the data size is not the same as the pointer size, as is the case for the 6502 microprocessor. So for example in the case of the 6502, Zig should ultimately allow me to index data using either u8 (the 6502's data size) or u16 (the 6502's pointer size), because currently Zig will always implicitly cast to usize (= u16 on the 6502).
Related to that and this proposal in some ways I think is #5185.
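The remark about for (0..x) |i| {} can be observed directly. This is a small sketch, assuming a Zig version that supports range-based for loops; it shows that the capture is always usize and that getting back to a narrower type takes an explicit conversion:

```zig
const std = @import("std");

test "for-range captures are usize today" {
    var sum: u8 = 0;
    for (0..4) |i| {
        // The capture's type is fixed by the language, not chosen by the programmer.
        try std.testing.expect(@TypeOf(i) == usize);
        // Getting back to a narrower type takes an explicit, checked conversion.
        sum += std.math.cast(u8, i) orelse return error.Overflow;
    }
    try std.testing.expect(sum == 6);
}
```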