remove vector type; support SIMD operations on arrays directly #23327
Comments
I've independently come up with more-or-less this proposal in conversations in the past, and I think it's a good idea. My version used builtins rather than directly allowing arithmetic operations, but I don't think I have any problems with allowing arithmetic ops directly on arrays. My reasoning for this is that vectors are predominantly intended to be used locally, where the optimizer can use the hell out of SIMD registers, so details such as memory layout really shouldn't matter that much. The only context where it seems somewhat important to me is, as you mention, calling conventions. However, I don't think that's really a problem:
One that comes to mind is accidentally typing `+` when you meant `++`.
That's a great example. To elaborate on the problem:

```zig
const std = @import("std");

const lhs = [_]i32{ 1, 2, 3, 4 };
const rhs = [_]i32{ 5, 6, 7, 8 };
const concatenated = lhs + rhs; // the author meant `lhs ++ rhs` (concatenation)

pub fn main() !void {
    for (concatenated) |elem| {
        std.log.info("elem: {d}", .{elem});
    }
}
```

With this proposal, this code would compile and run, and have incorrect behavior at runtime.
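For contrast, under status quo Zig the two operations cannot be confused, because element-wise arithmetic requires an explicit `@Vector` type. A minimal sketch (illustrative only, not code from this issue):

```zig
const std = @import("std");

pub fn main() void {
    const lhs = [_]i32{ 1, 2, 3, 4 };
    const rhs = [_]i32{ 5, 6, 7, 8 };

    // `++` concatenates arrays (operands must be comptime-known).
    const concatenated = lhs ++ rhs; // [8]i32: { 1, 2, 3, 4, 5, 6, 7, 8 }

    // Element-wise addition currently requires going through @Vector.
    const summed: [4]i32 = @as(@Vector(4, i32), lhs) + @as(@Vector(4, i32), rhs);
    // { 6, 8, 10, 12 }

    std.debug.print("{any}\n{any}\n", .{ concatenated, summed });
}
```

Under this proposal, `lhs ++ rhs` and `lhs + rhs` would both compile on plain arrays while meaning very different things, one character apart.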
What about vectors within aggregate structs? Now you need to allow that case too.

I think we also need to consider scalable vectors in all this, i.e. vectors whose size is scaled by some runtime-known constant. This is a relatively new concept seen on AArch64 (SVE) and RISC-V (RVV). It's not obvious to me that there's a nice way to add support for these without dedicated vector type syntax.
I agree with @alexrp's points. Also, I would consider this proposal blocked by the fact that none of the backends handle alignment properly, so an over-aligned array would not currently be able to produce the same machine code.

Note that it isn't just aligned vs. unaligned instructions: an array smaller than the full vector width would prevent even unaligned instructions, for fear of reading unmapped memory. Another problem is that array values would not have the correct alignment, and this would currently prevent vector instructions even on aligned arrays. We would also lose a type to represent vector masks, for which we currently use bool vectors.
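To illustrate the mask point: under status quo, comparisons on vectors produce `@Vector(N, bool)` values that plug directly into `@select`. A small sketch (illustrative, not from this thread):

```zig
const std = @import("std");

pub fn main() void {
    const a: @Vector(4, i32) = .{ 1, -2, 3, -4 };
    const zero: @Vector(4, i32) = @splat(0);

    // Comparison yields a bool vector, the current mask type.
    const mask: @Vector(4, bool) = a > zero;

    // The mask selects element-wise between two vectors.
    const relu: @Vector(4, i32) = @select(i32, mask, a, zero);

    std.debug.print("{any}\n", .{@as([4]i32, relu)}); // { 1, 0, 3, 0 }
}
```

If vector types were removed, `[4]bool` would have to take over this role, or masks would need some other representation.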
If arrays and SIMD operations become a single type, would that mean something like:

```zig
pub fn addDataSIMD(
    comptime T: type,
    comptime vector_size: usize,
    data_a: []const T,
    data_b: []const T,
    result: []T,
) !void {
    const Vector = @Vector(vector_size, T);
    if (data_a.len != data_b.len) return error.UnequalLength;
    if (data_a.len != result.len) return error.ResultLengthMismatch;
    const full_blocks = data_a.len / vector_size;
    // Process full blocks using SIMD
    for (0..full_blocks) |i| {
        const start = i * vector_size;
        const vec_a: Vector = data_a[start..][0..vector_size].*;
        const vec_b: Vector = data_b[start..][0..vector_size].*;
        result[start..][0..vector_size].* = vec_a + vec_b;
    }
    // Handle remaining elements
    const remainder = full_blocks * vector_size;
    for (remainder..data_a.len) |i| {
        result[i] = data_a[i] + data_b[i];
    }
}
```

will convert to:

```zig
pub fn addDataSIMD(
    comptime T: type,
    comptime vector_size: usize,
    data_a: []const T,
    data_b: []const T,
    result: []T,
) !void {
    if (data_a.len != data_b.len) return error.UnequalLength;
    if (data_a.len != result.len) return error.ResultLengthMismatch;
    const full_blocks = data_a.len / vector_size;
    // Process full blocks using SIMD
    for (0..full_blocks) |i| {
        const start = i * vector_size;
        const vec_a: [vector_size]T = data_a[start..][0..vector_size].*;
        const vec_b: [vector_size]T = data_b[start..][0..vector_size].*;
        result[start..][0..vector_size].* = vec_a + vec_b;
    }
    // Handle remaining elements - would this need to change now?
    const remainder = full_blocks * vector_size;
    for (remainder..data_a.len) |i| {
        result[i] = data_a[i] + data_b[i];
    }
}
```

or, since the vector size would no longer need to be passed in:

```zig
pub fn addDataSIMD(
    comptime T: type,
    data_a: []const T,
    data_b: []const T,
    result: []T,
) !void {
    if (data_a.len != data_b.len) return error.UnequalLength;
    if (data_a.len != result.len) return error.ResultLengthMismatch;
    // Hypothetical whole-slice operation
    result = data_a + data_b;
}
```

and if that is the case, how would we tell the addDataSIMD function what vector size to use? (Sorry if I missed any explanations for this in the above comments!) I think I might be over-complicating things by looking at slices instead of arrays, but I thought it was worth leaving here anyway, as it would be useful for signal/video processing and machine learning.
Your last step is invalid; this issue does not propose allowing arithmetic operators on slices, only arrays.

This proposal could naturally extend to that by allowing the same set of operations on slices, as written above.

I don't think so; even if we assume that you have some
Would it make sense in contexts other than GPUs? For cache reasons, you probably don't want to compute an operation over a full slice if you want to chain operations:

```zig
fn foo(a: []const u8, b: []const u8, out: []u8) void {
    out = a + b; // cache misses for all of "out"
    out += b; // cache misses again
}
```
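A sketch of that fusion concern (the function and its parameters are hypothetical, mirroring the snippet above): writing a single explicit loop touches each element of `out` exactly once, instead of making two whole-slice passes.

```zig
const std = @import("std");

// Hypothetical fused version of `out = a + b; out += b;` in one pass.
fn fooFused(a: []const u8, b: []const u8, out: []u8) void {
    std.debug.assert(a.len == b.len and a.len == out.len);
    for (a, b, out) |x, y, *o| {
        // Wrapping add, since plain u8 addition would panic on overflow.
        o.* = (x +% y) +% y;
    }
}
```

An optimizer may or may not fuse two whole-slice operations on its own; the explicit loop guarantees a single pass.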
From my experience designing Odin, it took a few years to finally figure out that I should keep fixed-length arrays and SIMD vector types separate. Here are a few of the reasons why I settled on what I did:
Things relevant to Zig:
Things relevant to LLVM:
I also forgot to mention that there are scalable SIMD vectors too, which don't map trivially to normal fixed-length arrays. See Arm's Scalable Vector Extension (SVE) as a brilliant example of this. From a type-system perspective, it's harder to represent them with just arrays, since they are a little different. As a programmer who might want to use this without resorting to assembly, it's going to be nigh impossible to represent without a dedicated SIMD type.

Relative to LLVM:
Thanks for the discussion, all.
I could have sworn there was an issue for this already but I couldn't find it, so here it is.
Arrays coerce to vectors. Vectors coerce to arrays. What's the point of a separate type? I don't know of any type safety argument.
Typically, SIMD operations are performed on arrays. Converting between vector and array is a chore that does not accomplish anything.
There is also the awkwardness of `@Vector` as a way to create the type. There are multiple proposals (see below) trying to make the syntax more palatable.

Problems:
The vector type affects the ABI of C calling convention functions. This is load-bearing in compiler_rt, for example:

zig/lib/compiler_rt/modti3.zig (lines 24 to 26 in 9c9d393)
If anyone comes up with examples of how this could lead to worsened type safety (i.e. cases where it would be easy to make a mistake and get a bug rather than a compile error), that would be a critical flaw likely to get this proposal rejected.
Finally, alignment. For many CPUs, vectors have larger alignment than arrays. This proposal is to keep arrays having the same alignment as status quo. To upgrade code that should lower to exactly the same machine code as before, those arrays that are used as vectors will need to be explicitly overaligned to vector alignment. Such extra alignment is a tradeoff; the more compact memory layout can help CPU cache efficiency, but vector alignment ensures that aligned vector load instructions can be selected rather than unaligned variants. Generally, programmers would be able to use default alignment, and then occasionally, after measuring, decide that the aligned vector load instructions are worth it to put an alignment annotation on the arrays that are used for SIMD operations.
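As a sketch of that tradeoff (variable names are illustrative, not from the proposal): an array can be explicitly overaligned to the vector's alignment so that aligned vector load instructions remain selectable.

```zig
const std = @import("std");

// Default alignment: compact layout, may force unaligned vector loads.
var compact: [4]f32 = .{ 1, 2, 3, 4 };

// Explicitly overaligned to the vector alignment, e.g. after measuring
// shows that aligned vector loads are worth the padding.
var overaligned: [4]f32 align(@alignOf(@Vector(4, f32))) = .{ 1, 2, 3, 4 };

pub fn main() void {
    std.debug.print("array align: {}, vector align: {}\n", .{
        @alignOf([4]f32), // typically 4
        @alignOf(@Vector(4, f32)), // typically larger, e.g. 16
    });
    _ = &compact;
    _ = &overaligned;
}
```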
Related:
- `.len` field to vector values #17886