remove vector type; support SIMD operations on arrays directly #23327

Closed · andrewrk opened this issue Mar 23, 2025 · 13 comments
Labels: breaking, proposal
Milestone: 0.15.0

Comments

@andrewrk (Member)

I could have sworn there was an issue for this already but I couldn't find it, so here it is.

Arrays coerce to vectors. Vectors coerce to arrays. What's the point of a separate type? I don't know of any type safety argument.

Typically, SIMD operations are performed on arrays. Converting between vector and array is a chore that does not accomplish anything.

There is also the awkwardness of @Vector as a way to create the type. There are multiple proposals (see below) trying to make the syntax more palatable.
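For illustration, here is the status-quo round trip that this chore entails (a minimal sketch; this is valid code today):

const a = [4]f32{ 1, 2, 3, 4 };
const b = [4]f32{ 5, 6, 7, 8 };
const va: @Vector(4, f32) = a; // array coerces to vector
const vb: @Vector(4, f32) = b;
const sum: [4]f32 = va + vb; // the vector result coerces back to an array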

Problems:

The vector type affects the ABI of C calling convention functions. This is load-bearing in compiler_rt for example:

const v2u64 = @Vector(2, u64);
fn __modti3_windows_x86_64(a: v2u64, b: v2u64) callconv(.c) v2u64 {

If anyone comes up with examples of how this could lead to worsened type safety (i.e. it could be easy to make a mistake and get a bug rather than a compile error), that would be a critical flaw in this proposal and would likely get it rejected.

Finally, alignment. On many CPUs, vectors have larger alignment requirements than arrays. This proposal keeps array alignment the same as the status quo. To upgrade code so that it lowers to exactly the same machine code as before, arrays that are used as vectors will need to be explicitly overaligned to vector alignment. Such extra alignment is a tradeoff: the more compact memory layout can help CPU cache efficiency, but vector alignment ensures that aligned vector load instructions can be selected rather than unaligned variants. Generally, programmers would be able to use default alignment, and then occasionally, after measuring, decide that aligned vector load instructions are worth an alignment annotation on the arrays used for SIMD operations.
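For illustration, a minimal sketch of such an annotation; the commented line is the hypothetical part under this proposal:

// Explicitly overaligned so aligned vector load instructions can be selected.
var samples: [4]f32 align(@alignOf(@Vector(4, f32))) = .{ 0, 0, 0, 0 };
// Hypothetical syntax under this proposal:
// samples = samples + samples;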

Related:

@andrewrk added the breaking and proposal labels and the 0.15.0 milestone Mar 23, 2025
@mlugg (Member) commented Mar 23, 2025

I've independently come up with more-or-less this proposal in conversations in the past, and I think it's a good idea. My version used builtins rather than directly allowing arithmetic operations, but I don't think I have any problems with allowing arithmetic ops directly on arrays.

My reasoning for this is that vectors are predominantly intended to be used locally, where the optimizer can use the hell out of SIMD registers, so details such as memory layout really shouldn't matter that much. The only context where it seems somewhat important to me is, as you mention, calling conventions. However, I don't think that's really a problem:

  • Within pure-Zig code, it shouldn't really matter what the returning / parameter passing convention for arrays/vectors is, because the optimizer is free to do whatever it wants. Any failure of the compiler to do so right now (I've not tested this!) I would view as an LLVM deficiency, because this doesn't seem particularly difficult (assuming the call isn't just inlined, consider the function body and look for loads which want overalignment, and overalign parameters correspondingly).
  • When interacting with C code, it seems reasonable to just have declarations in std.c (or similar) which "emulate" vector types in terms of calling convention; or, if that isn't possible (calling conventions are weird and I suspect some have vector CCs which can't be matched with other types), we could always add a new parameter annotation (similar syntax to noalias) to indicate a parameter should be passed as a vector (such an annotation would only be valid on functions with a non-Zig callconv). That is, something like extern fn foo(vector x: [3]u32) void. (This does reserve a keyword, which would be nice to avoid; perhaps with some kind of general attribute system which uses an enum of all available attributes.)

@silversquirl (Contributor)

examples of how this could lead to worsened type safety

One that comes to mind is accidentally typing a + b instead of a ++ b. It's quite specific though, since it would only work when a and b are the same length, and it's probably not a big deal in practice, since it's immediately very obvious something's gone wrong when you run the code.

@andrewrk (Member, Author)

That's a great example. To elaborate on the problem:

const std = @import("std");

const lhs = [_]i32{ 1, 2, 3, 4 };
const rhs = [_]i32{ 5, 6, 7, 8 };
const concatenated = lhs + rhs;

pub fn main() !void {
    for (concatenated) |elem| {
        std.log.info("elem: {d}", .{elem});
    }
}

With this proposal, this code would compile and run, and have incorrect behavior at runtime.
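For contrast, the intended operation:

const concatenated = lhs ++ rhs; // [8]i32{ 1, 2, 3, 4, 5, 6, 7, 8 }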

@alexrp (Member) commented Mar 23, 2025

  • we could always add a new parameter annotation (similar syntax to noalias) to indicate a parameter should be passed as a vector (such an annotation would only be valid on functions with a non-Zig callconv). That is, something like extern fn foo(vector x: [3]u32) void. (This does reserve a keyword, which would be nice to avoid; perhaps with some kind of general attribute system which uses an enum of all available attributes.)

What about vectors within aggregates, e.g. structs? Now you need to allow vector on fields or something to that effect. Doesn't it just start to look like vector types, only more awkward, at that point?

I think we also need to consider scalable vectors in all this, i.e. vectors whose size is scaled by some runtime-known constant. This is a relatively new concept that's seen on AArch64 and RISC-V. It's not obvious to me that there's a nice way to add support for these without dedicated vector type syntax.
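For context, status-quo vector code bakes the lane count into the type at compile time, which is exactly what scalable vectors lack (a minimal sketch):

fn sumLanes(comptime N: u32, v: @Vector(N, f32)) f32 {
    return @reduce(.Add, v); // N is statically known to codegen
}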

@jacobly0 (Member) commented Mar 23, 2025

I agree with @alexrp's points. Also, I would consider this proposal blocked on the fact that none of the backends handle alignment properly, so an over-aligned array would not currently be able to produce the same machine code.

aligned vector load instructions

Note that it isn't just aligned vs. unaligned instructions: a smaller array than the full vector width would prevent even unaligned instructions, for fear of reading unmapped memory.

Another problem is that array values would not have the correct alignment, and this would currently prevent vector instructions on even aligned arrays.

We would also lose a type to represent vector masks, which we currently use bool vectors for.
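For reference, a minimal status-quo example of a bool-vector mask:

const a: @Vector(4, i32) = .{ 1, 2, 3, 4 };
const b: @Vector(4, i32) = .{ 8, 1, 8, 1 };
const mask: @Vector(4, bool) = a > b; // lane-wise comparison yields a bool vector
const max = @select(i32, mask, a, b); // { 8, 2, 8, 4 }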

@spkgyk commented Mar 24, 2025

If arrays and vectors become a single type, would that mean something like:

pub fn addDataSIMD(
    comptime T: type,
    comptime vector_size: usize,
    data_a: []const T,
    data_b: []const T,
    result: []T,
) !void {
    const Vector = @Vector(vector_size, T);

    if (data_a.len != data_b.len) return error.UnequalLength;
    if (data_a.len != result.len) return error.ResultLengthMismatch;

    const full_blocks = data_a.len / vector_size;

    // Process full blocks using SIMD
    for (0..full_blocks) |i| {
        const start = i * vector_size;

        const vec_a: Vector = data_a[start..][0..vector_size].*;
        const vec_b: Vector = data_b[start..][0..vector_size].*;

        result[start..][0..vector_size].* = vec_a + vec_b;
    }

    // Handle remaining elements
    const remainder = full_blocks * vector_size;
    for (remainder..data_a.len) |i| {
        result[i] = data_a[i] + data_b[i];
    }
}

will convert to:

pub fn addDataSIMD(
    comptime T: type,
    comptime vector_size: usize,
    data_a: []const T,
    data_b: []const T,
    result: []T,
) !void {
    if (data_a.len != data_b.len) return error.UnequalLength;
    if (data_a.len != result.len) return error.ResultLengthMismatch;

    const full_blocks = data_a.len / vector_size;

    // Process full blocks using SIMD
    for (0..full_blocks) |i| {
        const start = i * vector_size;

        const vec_a: [vector_size]T = data_a[start..][0..vector_size].*;
        const vec_b: [vector_size]T = data_b[start..][0..vector_size].*;

        result[start..][0..vector_size].* = vec_a + vec_b;
    }

    // Handle remaining elements - would this need to change now?
    const remainder = full_blocks * vector_size;
    for (remainder..data_a.len) |i| {
        result[i] = data_a[i] + data_b[i];
    }
}

or, since @Vector automatically scales SIMD, would it simplify even further to:

pub fn addDataSIMD(
    comptime T: type,
    data_a: []const T,
    data_b: []const T,
    result: []T,
) !void {
    if (data_a.len != data_b.len) return error.UnequalLength;
    if (data_a.len != result.len) return error.ResultLengthMismatch;

    // Process everything using SIMD (hypothetical slice arithmetic)
    result = data_a + data_b;
}

And if that is the case, how would we tell the addDataSIMD function what vector size to use? (Sorry if I missed an explanation for this in the comments above!)

I think I might be over-complicating things by looking at slices instead of arrays, but I thought it was worth leaving here anyway as it would be useful for signal/video processing and machine learning.

@mlugg (Member) commented Mar 24, 2025

Your last step is invalid; this issue does not propose allowing arithmetic operators on slices, only arrays.

@Snektron (Collaborator) commented Mar 24, 2025

I think we also need to consider scalable vectors in all this, i.e. vectors whose size is scaled by some runtime-known constant. This is a relatively new concept that's seen on AArch64 and RISC-V. It's not obvious to me that there's a nice way to add support for these without dedicated vector type syntax.

This proposal could naturally extend to that by allowing the same set of operations on slices, as written above.

@alexrp (Member) commented Mar 24, 2025

This proposal could naturally extend to that by allowing the same set of operations on slices, like written above.

I don't think so; even if we assume that you have some @vscale() builtin that you can use to size your slices appropriately for the hardware you're running on, I don't see how the compiler would know to lower an operation like vscaled_result = vscaled_slice1 * vscaled_slice2 to actual hardware instructions using scalable vectors. By not encoding the length and vscale factor in the type, you've lost the necessary knowledge.

@AndrewKraevskii (Contributor)

I think we also need to consider scalable vectors in all this, i.e. vectors whose size is scaled by some runtime-known constant. This is a relatively new concept that's seen on AArch64 and RISC-V. It's not obvious to me that there's a nice way to add support for these without dedicated vector type syntax.

This proposal could naturally extend to that by allowing the same set of operations on slices, like written above.

Would it make sense in contexts other than GPUs? For cache reasons, you probably don't want to compute an operation over a full slice if you want to chain operations.

fn foo(a: []const u8, b: []const u8, out: []u8) void {
    out = a + b; // cache misses for all "out"
    out += b; // again cache misses
}
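A hedged sketch of the cache-friendlier alternative this alludes to: process in fixed-size blocks so that chained operations happen while the lanes are still in registers. The block width 16 and wrapping +% are arbitrary choices for illustration, and equal slice lengths are assumed:

fn fooChunked(a: []const u8, b: []const u8, out: []u8) void {
    const N = 16; // arbitrary block width for illustration
    var i: usize = 0;
    while (i + N <= out.len) : (i += N) {
        const va: @Vector(N, u8) = a[i..][0..N].*;
        const vb: @Vector(N, u8) = b[i..][0..N].*;
        out[i..][0..N].* = (va +% vb) +% vb; // both ops before touching memory again
    }
    while (i < out.len) : (i += 1) out[i] = (a[i] +% b[i]) +% b[i];
}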

@gingerBill commented Mar 24, 2025

From my experience designing Odin, it took a few years to finally figure out that I should keep fixed-length arrays and #simd vectors as separate types. I originally had arrays where array programming had to be SIMD, then removed the SIMD support and added basic array programming to all arrays, and then finally added a separate #simd type.

Here are the few reasons as to why I settled on what I did:

  • #simd vectors are similar to arrays but they do usually have very different alignment rules
    • Usually at least 16 bytes of alignment is required
    • align(16) [4]f32 might still be an array type, but that's not any clearer than @Vector(4, f32), which shows the intent of the type much better
  • #simd vectors have very different semantics when it comes to addressing/indexing. In Odin, you cannot index a lane from a #simd vector with normal-array syntax (i.e. a[i] or &a[i]) because how that maps down to instructions is very different from how a normal array works. So in Odin, it has this syntax:
    e := simd.extract(v, i)
    v = simd.replace(v, i, e)
    • This also helps guarantee the operations are SIMD-like rather than poorly optimized.
    • So v[i] = e is not allowed because it has to be made clear that when doing SIMD work, you are replacing a lane and then assigning the new vector to the original variable, not just changing one section of memory.
  • Conversions between #simd types of the same element type might not require any temporary memory, whilst if it were an array, a naive conversion would, and the optimizer might struggle in some cases to figure this out (e.g. aliasing issues)
  • Type conversions in Odin are typically explicit, and #simd <-> arrays are no different in that they require an explicit conversion. We supply simd.to_array, simd.from_array, simd.from_slice, et al.
    • I personally do not like the implicit conversions that Zig offers.
  • Array programming is defined for some operators but not all, for numerous pragmatic reasons. And for #simd in general, some operations must use the simd.*-based procedures rather than operators directly.
    • << and >> are not trivially defined for array-programming in Odin nor for #simd as they can lead to very annoying bugs or unwanted behaviour.
      • This is also coupled with how Odin defines << differently from C. In Odin, x << 2 is defined to have the same result as (x << 1) << 1, and that does not necessarily map well to #simd, so explicit procedures/intrinsics are required.
  • Things like x += y are not always what you want for certain operations depending on the type
  • When it comes to comparison-based operations, you will want separate calls for things like simd.lanes_eq rather than overloading ==
    • a == b is always defined to return an untyped boolean and not any form of array of booleans
    • And for something like simd.lanes_eq, you will want it to return something like #simd[N]i32/@Vector(N, i32) rather than @Vector(N, bool), as when doing a lot of SIMD work you want to treat the results like an integer
    • Order comparisons (<, <=, >, >=) are not defined for either arrays or #simd, and you must explicitly specify the kind of comparison you need in #simd (e.g. simd.lanes_lt)
  • You also do not want to allow any form of array programming on slices or other dynamic array types
    • too many checks (same length checks, alignment checks, etc)
    • doesn't scale well for different platforms
    • would require implicit allocations if you want the same syntax: slice_a + slice_b requires an allocation, or else always requires another buffer (out = a + b), but again needs too many checks to be useful as syntax; + is then too magical.
  • Odin's #simd[N]T restricts the element type to only integers, floats, or booleans
    • You could lift this restriction if you implemented the SIMD-like operations as functions, but then you don't get any of the syntax benefits.
  • Arrays and SIMD vectors might have completely different ABI requirements even if the alignments are the same
    • In Odin, [8]f32 and #simd[8]f32 might be passed very differently depending on the platform and ABI

Things relevant to Zig:

  • a + b for arrays could be easily confused for a ++ b, and vice versa
  • a * b for arrays could be easily confused for a ** b, and vice versa
  • If you are to keep @Vector, I'd strongly suggest improving the @splat syntax so it's less necessary (see the sketch after this list).
    • x * @as(@Vector(3, f32), @splat(2)) vs x * 2
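A minimal sketch of the status-quo forms referenced in that last bullet; result-location inference already shortens the splat a little:

const V = @Vector(3, f32);
const x: V = .{ 1, 2, 3 };
const two: V = @splat(2); // result-location form; still a separate step
const doubled = x * two; // vs. the hypothetical x * 2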

Things relevant to LLVM:

  • If you don't use vector types directly in LLVM, it sometimes will not generate the right code. The auto-vectorizer in LLVM is great, but it does need help sometimes, so don't rely on it fully
    • You will also need to enable target features per procedure for much SIMD work, and LLVM really wants you to write the source code with vectors, not fake-array code

@gingerBill

I also forgot to mention that there are scalable SIMD vectors too which don't trivially map to normal fixed-length arrays. See ARM's Scalable Vector Extension (SVE) as a brilliant example of this.

From a type system perspective, they're harder to represent with just arrays since they are a little different. And as a programmer who might want to use this without resorting to assembly, it's going to be nigh impossible to express without a dedicated SIMD type.

Relevant to LLVM:

  • LLVM now has a specialized SVE type which is of the form <vscale x N x eltty>.

@andrewrk (Member, Author)

Thanks for the discussion, all.

@andrewrk closed this as not planned Mar 25, 2025