Allow specifying the type of a slice's length #14926

Closed
wooster0 opened this issue Mar 15, 2023 · 8 comments

@wooster0
Contributor

wooster0 commented Mar 15, 2023

Preface

I'm proposing the possibility to specify the type of a slice's length.

A slice always allows me to index up to std.math.maxInt(usize) elements (bytes, for a []u8), but that doesn't mean I ever actually want to index that much data.
For example, on modern 64-bit systems, every string, usually represented as []const u8/[]u8, carries a 64-bit length, so each of them can be up to 18446744073709551615 characters long. Most of the time I never need strings, or data in general, to be this long; this proposal is about slices as a whole, and strings are just an example.

It has bothered me for a while now how easy it is to waste memory this way. In my code I have a real case where a specific type of string is never allowed to be longer than 32767 characters, meaning its length could be a u15 instead of a usize (which is a u64, i.e. 8 bytes, on my system).
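The figures above can be double-checked at compile time. This is a minimal sketch; `shortLen` is a hypothetical helper invented for illustration, not part of std.

```zig
const std = @import("std");

comptime {
    // A slice's length is a usize; on a 64-bit target that is a u64,
    // whose maximum value is 18446744073709551615.
    std.debug.assert(std.math.maxInt(u64) == 18446744073709551615);
    // The 32767-character limit from the example is exactly maxInt(u15).
    std.debug.assert(std.math.maxInt(u15) == 32767);
}

/// Hypothetical helper: returns the length of `s` as a u15,
/// or null if the string is longer than 32767 characters.
fn shortLen(s: []const u8) ?u15 {
    return std.math.cast(u15, s.len);
}
```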

Proposal

I propose the syntax [Type]Type where the first Type is optional and defaults to usize. It's easier to understand with examples:

  • []const u8 = as before, a constant slice with a length of type usize
  • [u15]u8 = a mutable slice with a length of type u15
  • [u8:0]const u32 = a constant zero-terminated slice with a length of type u8

I think this syntax makes a lot of sense. So far, apart from a :sentinel, you can put the following inside the square brackets of array, slice, and pointer types:

  • Nothing = runtime-known length of type usize (-> slice, such as []u8)
  • * = no length (-> pointer, such as [*]u8)
  • *c = no length (-> C pointer, such as [*c]u8)
  • Integer = comptime-known length (-> array, such as [5]u8)

And now in addition:

  • Integer type = runtime-known length of that type (-> slice, such as [u8]u8)

The integer type has one restriction: it must be unsigned. A signed integer type (i.e. one that allows negative values) makes no sense either as a length or as an index.
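Until something like this exists in the language, the semantics can be approximated in userland. The sketch below is a hypothetical `LenSlice(Len, T)` type (the name is invented for illustration; it is not part of std) that rejects signed length types at compile time, mirroring the restriction above. For brevity it only covers mutable element types.

```zig
const std = @import("std");

/// Hypothetical userland stand-in for the proposed `[Len]T` syntax:
/// a many-item pointer plus a length of a caller-chosen integer type.
fn LenSlice(comptime Len: type, comptime T: type) type {
    // Mirror the proposal's restriction: the length type must be unsigned.
    // std.math.minInt only accepts integer types, so non-integer
    // types are rejected with a compile error as well.
    if (std.math.minInt(Len) != 0) {
        @compileError("slice length type must be an unsigned integer");
    }
    return struct {
        ptr: [*]T,
        len: Len,

        const Self = @This();

        /// Wraps an ordinary slice, or returns null if its length
        /// does not fit in `Len`.
        fn init(s: []T) ?Self {
            const len = std.math.cast(Len, s.len) orelse return null;
            return .{ .ptr = s.ptr, .len = len };
        }

        /// Converts back to an ordinary usize-length slice. Slicing
        /// requires `Len` to coerce to usize (i.e. be no wider than usize).
        fn toSlice(self: Self) []T {
            return self.ptr[0..self.len];
        }
    };
}
```

Usage: `LenSlice(u8, u8).init(&buf)` yields a value whose `len` field is a `u8`, or null if `buf` holds more than 255 elements.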

Possible Problem

Another restriction that could be imposed upon that slice length type is that the bit size must still be less than or equal to the bit size of usize. This however could cause one problem: if I have [u64]u8 in my code, it'll compile fine for a 64-bit system where usize = u64 but it will fail to compile for a 32-bit system where usize < u64.

Because of this we may not actually want to impose any bit size restriction on the slice length type.
That means I can have [u128]const u8 on a 64-bit system but will only be able to access std.math.maxInt(u64) bytes, and I think that's OK. That's just how these limitations work.
This is because even slices with a custom length type would still be indexed with usize (possibly via an implicit integer cast), since that is also what a lot of hardware expects: a pointer-sized integer.

So I think this possible problem is avoided if we do it this way.
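A rough sketch of the two points above, using hypothetical helpers `byteAt` and `addressableLen` (not std APIs): indexing with a narrower unsigned integer already coerces to usize today, and a length wider than usize can only ever address up to maxInt(usize) elements.

```zig
const std = @import("std");

/// Hypothetical helper: indexing with a u8 already works today because
/// the index is implicitly widened (coerced) to usize.
fn byteAt(bytes: []const u8, index: u8) u8 {
    return bytes[index];
}

/// Hypothetical helper: how many elements a length of any unsigned
/// integer type can actually address on the current target. Lengths
/// wider than usize (e.g. u128 on a 64-bit target) are clamped.
fn addressableLen(len: anytype) usize {
    return std.math.cast(usize, len) orelse std.math.maxInt(usize);
}
```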

Advantages

  • Possibly save memory and in turn improve performance (but see the comment further down in this issue about struct padding).
  • Make the bytes taken up by a slice's length more transparent to the programmer. I think many don't realize that a "slice" is just a fancy struct { [*]Type, usize }. With this change, how exactly a slice is represented in memory might get more attention.
  • Make slices customizable enough that I wouldn't have to create my own slice types (like the struct in the previous point) just to save memory, which results in ugly, hard-to-read code.
  • Make code possibly easier to understand. Giving a slice a specific length type might give the reader a better idea of what kind of data it holds.
  • Serve as an additional safeguard against bugs involving data that is longer than it is allowed to be.
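To make the "fancy struct" point concrete, here is a sketch comparing a built-in slice with a hand-written struct of the same shape. `SliceRepr` is a hypothetical name for illustration, and the size assertions assume a typical target where a slice is two pointer-sized words.

```zig
const std = @import("std");

// Roughly what a []T holds: a many-item pointer and a usize length.
// Illustrative only; the built-in slice type is not defined this way in std.
fn SliceRepr(comptime T: type) type {
    return struct { ptr: [*]T, len: usize };
}

comptime {
    // On common targets a slice is two pointer-sized words,
    // the same size as the hand-written struct.
    std.debug.assert(@sizeOf([]u8) == 2 * @sizeOf(usize));
    std.debug.assert(@sizeOf([]u8) == @sizeOf(SliceRepr(u8)));
}
```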

Other Commentary

Not sure if there are any real disadvantages aside from one more thing to learn (well, if you need it) and additional language complexity. I think it's worth it.

This proposal if implemented as described does not break any existing code.

It seems to be a general trend in Zig to assume usize for a lot of things. This includes, for example, for (0..x) |i| {}, where i is implicitly usize. I think it would ultimately be better to steer towards a language where the programmer has more control over the types of these kinds of things (see also: #14704).
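A small sketch of the loop behavior described above (`sumIndicesU16` is a hypothetical example function): the range loop's capture is a usize even when the bound is a u16, so getting a narrower index needs an explicit `@intCast`. This assumes Zig 0.11 or later, where range loops and single-argument `@intCast` exist.

```zig
const std = @import("std");

/// Hypothetical example: sums the indices 0..n.
fn sumIndicesU16(n: u16) u32 {
    var total: u32 = 0;
    for (0..n) |i| {
        // The capture is usize regardless of n's type.
        comptime std.debug.assert(@TypeOf(i) == usize);
        // Narrowing back to u16 cannot fail here, since i < n <= maxInt(u16).
        const small: u16 = @intCast(i);
        total += small;
    }
    return total;
}
```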


I believe this proposal would also move Zig toward being better suited to processors where the data size differs from the pointer size, as is the case for the 6502 microprocessor. For the 6502, Zig should ultimately let me index data using either u8 (the 6502's data size) or u16 (its pointer size), whereas currently Zig always implicitly casts indices to usize (= u16 on the 6502).

I think #5185 is related to this proposal in some ways.

wooster0 changed the title from "Allow changing the type of a slice's length" to "Allow specifying the type of a slice's length" on Mar 15, 2023
@leecannon
Contributor

Upon first glance, there are two things that pop into my head:

  • Can the type that is given be anything other than an unsigned integer? For example is i32 fine?
  • There is now the possibility of not knowing whether [comptimeFunction()]u8 is a slice or an array, without looking at comptimeFunction.

@nektro
Contributor

nektro commented Mar 15, 2023

I don't think this fits into Zig. If we had things like short pointers and operator overloading, then maybe, but at that point Zig has lost a lot of its simplicity, and IMO you're better off writing a comptime function that does this:

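const std = @import("std");

pub fn ShortSlice(comptime T: type) type {
    return struct {
        ptr: [*]const T,
        len: u32,

        /// Returns the elements in [from, to) as an ordinary slice.
        pub inline fn slice(self: *const @This(), from: u32, to: u32) []const T {
            std.debug.assert(from <= to and to <= self.len);
            return self.ptr[from..to];
        }

        /// Returns the element at `index`.
        pub inline fn at(self: *const @This(), index: u32) T {
            std.debug.assert(index < self.len);
            return self.ptr[index];
        }
    };
}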

Also, the reason I mention short pointers is that unless the .ptr field can be made 32 bits as well, you're not going to see any benefit from this, except as an optimization for machines with pointers narrower than 64 bits.

@wooster0
Contributor Author

Can the type that is given be anything other than an unsigned integer? For example is i32 fine?

I amended the proposal.

There is now the possibility of not knowing whether [comptimeFunction()]u8 is a slice or an array, without looking at comptimeFunction.

I haven't encountered this pattern much, but if you do, you can find out quickly by looking at the function's return type: if it returns type, the result is a slice; otherwise, it is an array.
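A sketch of that check under today's Zig, with hypothetical functions `bufferLen` and `LengthType`. The proposed `[LengthType()]u8` syntax does not compile today, so only the return types are inspected here.

```zig
const std = @import("std");

// Hypothetical functions, for illustration only.
fn bufferLen() usize {
    return 4;
}
fn LengthType() type {
    return u8;
}

comptime {
    // bufferLen returns an integer, so `[bufferLen()]u8` is an array type.
    std.debug.assert(@TypeOf(bufferLen()) == usize);
    std.debug.assert([bufferLen()]u8 == [4]u8);
    // LengthType returns `type`, so under the proposal `[LengthType()]u8`
    // would be a slice with a u8 length rather than an array.
    std.debug.assert(@TypeOf(LengthType()) == type);
}
```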

@IntegratedQuantum
Contributor

Note that on a 64-bit target @sizeOf(struct{ptr: [*]u8, len: u15}) is 16 bytes, the same as @sizeOf([]u8).
So this will not actually save memory.
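The reason is alignment padding. A sketch (the `ShortRepr` name is hypothetical): the struct takes the alignment of its pointer field, so on a 64-bit target its 10 bytes of fields (8 for the pointer, 2 for the u15) are rounded up to 16.

```zig
const std = @import("std");

// A pointer plus a u15 length, as in the comment above.
const ShortRepr = struct { ptr: [*]u8, len: u15 };

comptime {
    // The struct is aligned to its most-aligned field, the pointer...
    std.debug.assert(@alignOf(ShortRepr) == @alignOf([*]u8));
    // ...so its size is padded up to that of an ordinary []u8.
    std.debug.assert(@sizeOf(ShortRepr) == @sizeOf([]u8));
}
```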

@wooster0
Contributor Author

Good point. The memory savings are probably less significant than I assumed.

@rohlem
Contributor

rohlem commented Mar 15, 2023

Probably a duplicate of #1830, although this issue might be more searchable and, at first glance, looks more detailed than that one.
(And as noted there, with #3806 it would become even more customizable, e.g. returning a slice pointing to 1 to 4 elements.)

@nektro
Contributor

nektro commented Mar 15, 2023

As mentioned in my previous comment:

test {
    @compileLog(@sizeOf(struct { ptr: [*]u8, len: u32 }));
}
$ zig2 test test.zig -target x86_64-linux
Compile Log Output:
@as(comptime_int, 16)
$ zig2 test test.zig -target x86-linux
Compile Log Output:
@as(comptime_int, 8)

These sizes stay the same if you swap u32 for usize: on x86, usize is the same size as u32, and on x86_64 the struct is padded for alignment, since [*]T is 64 bits (the size of usize) on 64-bit platforms.

@wooster0
Contributor Author

See #1830

wooster0 closed this as not planned Mar 15, 2023
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants