Open
Description
My git version is 4903c11.
Description:
I am experiencing an inconsistent result when executing the same MLIR program with and without the -affine-loop-coalescing
.
Steps to Reproduce:
1. MLIR Program (test.mlir):
test.mlir:
module {
func.func private @printMemrefI32(tensor<*xi32>)
func.func @main() {
%0 = "tosa.const"() <{values = dense<2> : tensor<2x2x3xi32>}> : () -> tensor<2x2x3xi32>
%1 = "tosa.const"() <{values = dense<5> : tensor<2x1x2xi32>}> : () -> tensor<2x1x2xi32>
%2 = "tosa.const"() <{values = dense<0> : tensor<1xi32>}> : () -> tensor<1xi32>
%3 = "tosa.const"() <{values = dense<0> : tensor<1xi32>}> : () -> tensor<1xi32>
%4 = tosa.matmul %1, %0, %2, %3 : (tensor<2x1x2xi32>, tensor<2x2x3xi32>, tensor<1xi32>, tensor<1xi32>) -> tensor<2x1x3xi32>
%cast = tensor.cast %4 : tensor<2x1x3xi32> to tensor<*xi32>
call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
return
}
}
2. Command to Run Without -affine-loop-coalescing
:
/path/llvm-project/build/bin/mlir-opt test.mlir -pass-pipeline='builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))' | \
/path/llvm-project/build/bin/mlir-opt -tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops | \
/path/llvm-project/build/bin/mlir-opt -pass-pipeline="builtin.module(func.func(convert-affine-for-to-gpu{gpu-block-dims=1 gpu-thread-dims=0}))" | \
/path/llvm-project/build/bin/mlir-opt -lower-affine -gpu-lower-to-nvvm-pipeline | \
/path/llvm-project/build/bin/mlir-runner -e main -entry-point-result=void \
-shared-libs=/path/llvm-project/build/lib/libmlir_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_c_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_cuda_runtime.so.so
3. Output Without -affine-loop-coalescing
:
[[[20, 20, 20]],
[[20, 20, 20]]]
4. Command to Run With -affine-loop-coalescing
:
/path/llvm-project/build/bin/mlir-opt test.mlir -pass-pipeline='builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))' | \
/path/llvm-project/build/bin/mlir-opt -tosa-to-arith -one-shot-bufferize="bufferize-function-boundaries" -convert-linalg-to-affine-loops -affine-loop-coalescing | \
/path/llvm-project/build/bin/mlir-opt -pass-pipeline="builtin.module(func.func(convert-affine-for-to-gpu{gpu-block-dims=1 gpu-thread-dims=0}))" | \
/path/llvm-project/build/bin/mlir-opt -lower-affine -gpu-lower-to-nvvm-pipeline | \
/path/llvm-project/build/bin/mlir-runner -e main -entry-point-result=void \
-shared-libs=/path/llvm-project/build/lib/libmlir_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_c_runner_utils.so \
-shared-libs=/path/llvm-project/build/lib/libmlir_cuda_runtime.so.so
5. Output With -affine-loop-coalescing
:
[[[10, 10, 10]],
[[10, 10, 10]]]
I'm not sure if there is any bug in my program or if the wrong usage of the above passes caused this result.