Description
The genx.matrix.dpas operator is lowered to a SIMT DPAS instruction. The operands of the DPAS instruction (A, B) have several constraints, depending on the data type of the vector element. These are the expected input operand shapes:
- If the size of the element is 16 bits (f16, bf16): the expected types of the input A, B operands are vector<8 x f16> or vector<8 x bf16>; the GENX translation generates a bitcast to vector<8 x i16> to satisfy IGC's requirements.
- If the size of the element is 32 bits (tf32): the expected types of the input A, B operands are vector<4 x tf32>, i.e., half of the row or column; the GENX translation generates a bitcast to vector<4 x i32>.
- If the size of the element is 8 bits (i8, u8): the expected types of the input A, B operands are vector<8 x i16/u16>, i.e., 2 adjacent i8/u8 elements are packed into one 16-bit element.
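To make the bitcast in the 16-bit case concrete, here is a minimal Python sketch that reinterprets f16 values as 16-bit integers without changing their bits. The helper name `bitcast_f16_to_i16` is hypothetical and only models the idea on the host; it is not the GENX translation itself. bf16 and tf32 are omitted because Python's `struct` module has no format code for them.

```python
import struct


def bitcast_f16_to_i16(values):
    """Reinterpret each IEEE half-precision value as a signed 16-bit integer.

    The bit pattern is preserved; only the type used to view it changes.
    (Hypothetical helper for illustration.)
    """
    n = len(values)
    raw = struct.pack(f"<{n}e", *values)      # 'e' = IEEE 754 binary16
    return list(struct.unpack(f"<{n}h", raw))  # 'h' = signed 16-bit integer


# A vector<8 x f16> operand viewed as vector<8 x i16>.
a = [1.0, 2.0, 0.5, -1.0, 0.0, 4.0, 8.0, 16.0]
a_bits = bitcast_f16_to_i16(a)
```

The element count is unchanged by the bitcast, so an 8-element f16 vector yields an 8-element i16 vector.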
For case (1) there are no issues, while for cases (2) and (3) the operator requirements force the producer to pack data before creating the operation. Specifically:
- case 2: if the producer's input types (for A, B) are vector<8 x tf32>, it needs to split each input into two vector<4 x tf32> between two SIMD lanes/warp elements before calling the GENX operation.
- case 3: if the producer's input types (for A, B) are vector<16 x i8/u8>, it needs to pack each pair of adjacent 8-bit elements into a vector<8 x i16/u16> before calling the GENX operation.
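The producer-side work for cases 2 and 3 can be sketched in Python as follows. This is only a model of the data movement, not the actual lowering. The function names are hypothetical, and two details are assumptions not stated above: that the first element of each 8-bit pair goes into the low byte of the 16-bit word, and that the 8-element vector is split into its first and second halves.

```python
def pack_i8_pairs(values):
    """Pack adjacent 8-bit elements into 16-bit words (case 3).

    Assumption: element 2k goes into the low byte and element 2k+1 into
    the high byte. The result is returned as unsigned 16-bit patterns.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of 8-bit elements")
    packed = []
    for lo, hi in zip(values[0::2], values[1::2]):
        # Mask to 8 bits so negative i8 values keep their two's-complement bits.
        packed.append((lo & 0xFF) | ((hi & 0xFF) << 8))
    return packed


def split_halves(values):
    """Split a vector into two equal halves (case 2).

    Assumption: the first half goes to one lane and the second half to the other.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of elements")
    mid = len(values) // 2
    return values[:mid], values[mid:]


# vector<16 x i8> -> vector<8 x i16>
packed = pack_i8_pairs(list(range(16)))

# vector<8 x tf32> (modelled as plain floats) -> two vector<4 x tf32>
first, second = split_halves([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

If the operation accepted the wider inputs directly, logic of this kind would move from every producer into the operation's lowering.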
The genx.matrix.dpas operator should be improved to allow inputs of type vector<8 x tf32> (case 2) and vector<16 x i8/u8> (case 3). The operation should take care of packing or splitting the input vectors appropriately.