-
Notifications
You must be signed in to change notification settings - Fork 1k
Vectorize [un]pack4x{I, U}8[Clamp]
on spv and msl
#7664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
c8e640f
to
ac120e9
Compare
This comment was marked as resolved.
This comment was marked as resolved.
[un]pack4x{I, U}8[Clamp]
on spv and msl[un]pack4x{I, U}8[Clamp]
on spv and msl
Emits vectorized SPIR-V code for the WGSL functions `unpack4xI8`, `unpack4xU8`, `pack4xI8`, `pack4xU8`, `pack4xI8Clamp`, `pack4xU8Clamp`. Exploits the following facts about SPIR-V ops: - `SClamp`, `UClamp`, and `OpUConvert` accept vector arguments, in which case results are computed per component; and - `OpBitcast` can cast between vectors and scalars, with a well-defined bit order that matches that required by the WGSL spec, see below. WGSL spec for `pack4xI8` [1]: > Component e[i] of the input is mapped to bits 8 x i through 8 x i + 7 > of the result. SPIR-V spec for `OpBitcast` [2]: > Within this mapping, any single component of `S` [remark: the type > with fewer but wider components] (mapping to multiple components of > `L` [remark: the type with more but narrower components]) maps its > lower-ordered bits to the lower-numbered components of `L`. [1] https://www.w3.org/TR/WGSL/#pack4xI8-builtin [2] https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast
Implements more direct conversions between 32-bit integers and 4x8-bit integer vectors using bit casting to/from `packed_[u]char4` when on MSL 2.1+ (older versions of MSL don't seem to support these bit casts). - `unpack4x{I, U}8(x)` becomes `[u]int4(as_type<packed_[u]char4>(x))`; - `pack4x{I, U}8(x)` becomes `as_type<uint>(packed_[u]char4(x))`; and - `pack4x{I, U}8Clamp(x)` becomes `as_type<uint>(packed_uchar4(metal::clamp(x, 0, 255)))`. These bit casts match the WGSL spec for these functions because Metal runs on little-endian machines.
Separates the Vulkan feature sets `VkPhysicalDeviceShaderFloat16Int8Features` and `VkPhysicalDevice16BitStorageFeatures`, which previously were used "together, or not at all". This commit should not change any behavior yet, but I'd like to run full CI tests on it for now. If the CI tests pass, I'll use this separation to enable the `shader_int8` feature separately from the rest of the features to enable optimizations of `[un]pack4x{I,U}8[Clamp]` on SPIR-V.
This allows declaring the SPIR-V capability "Int8", which allows us to generate faster code for `[un]pack4x{I, U}8[Clamp]`.
I added the required changes to This will need some squashing before being merged, but it might be easier to review the PR with the current history (please let me know if you'd like me to squash first). Notes to self: |
FWIW, the optimization of |
Connections
Related to, but independent of:
dot4{I, U}8Packed
on SPIR-V and HLSL #7574dot4{I,U}8Packed
on Metal #7653Description
Emits simpler code (using vectorized ops) for
unpack4xI8
,unpack4xU8
,pack4xI8
,pack4xU8
,pack4xI8Clamp
, andpack4xU8Clamp
on SPIR-V and Metal (with MSL 2.1+).SClamp
,UClamp
, andOpUConvert
accept vector arguments (in which case results are computed per component), and thatOpBitcast
can cast between vectors and scalars, with a well-defined bit order that matches that required by the WGSL spec (see below for details).unpack4x{I, U}8(x)
→[u]int4(as_type<packed_[u]char4>(x))
,pack4x{I, U}8(x)
→as_type<uint>(packed_[u]char4(x))
, andpack4x{I, U}8Clamp(x)
→as_type<uint>(packed_uchar4(metal::clamp(x, 0, 255)))
.Regarding byte order (this was not relevant for the dot product in #7653 but is relevant now):
Testing
uint
).Is there an easy way to do this?wgpu_gpu::shader::data_builtins::pack4x_i8
, but currently uses the polyfill]Squash or Rebase?
All commits should pass CI.Needs squashing.
Checklist
cargo fmt
.taplo format
.cargo clippy --tests
. If applicable, add:--target wasm32-unknown-unknown
cargo xtask test
to run tests.CHANGELOG.md
entry.