I ran across a method that asked me to use a shared memory pool of size blocksize and pull from it a few times in a for/while loop. I found that the CPU backend had problems with this. Here is an MWE (without the shmem shenanigans):
using Test
using CUDA
using CUDAKernels
using KernelAbstractions

@kernel function f_test_kernel!(a)
    tid = @index(Global, Linear)
    @uniform N = length(a)
    @uniform b = 0
    for i = 1:10
        if tid < N
            b += 1
            @synchronize()
        end
    end
end

a = zeros(1024)

# works
wait(f_test_kernel!(CUDADevice(),256)(CuArray(a), ndrange=1024))

# doesn't work
wait(f_test_kernel!(CPU(),4)(a, ndrange=1024))
Note: without the if statement, everything works fine. I also tried a few different nested if statements to see if a similar error occurred, but could not replicate it. It seems to be specifically a loop after a conditional (although maybe a loop in a loop would also trigger it? Still digging).
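For reference, a sketch of the variant the note describes: the if guard is removed and everything else matches the MWE. It is reconstructed from the description rather than copied from a separately verified script, the kernel name f_test_kernel_noif! is a hypothetical label used only here, and it assumes the same event-based wait(...) API as the MWE.

```julia
using KernelAbstractions

# Same kernel body as the MWE, minus the `if tid < N` guard.
# The name `f_test_kernel_noif!` is hypothetical, chosen for this sketch.
@kernel function f_test_kernel_noif!(a)
    tid = @index(Global, Linear)
    @uniform N = length(a)
    @uniform b = 0
    for i = 1:10
        b += 1
        @synchronize()
    end
end

a = zeros(1024)

# Same launch configuration as the failing CPU call in the MWE.
wait(f_test_kernel_noif!(CPU(), 4)(a, ndrange=1024))
```

Comparing this against the MWE isolates the conditional around @synchronize() as the only difference between the two kernels.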
Error message (tid not defined):
ERROR: LoadError: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:322 [inlined]
[2] wait
@ ~/projects/KernelAbstractions.jl/src/cpu.jl:65 [inlined]
[3] wait (repeats 2 times)
@ ~/projects/KernelAbstractions.jl/src/cpu.jl:29 [inlined]
[4] top-level scope
@ ~/projects/simuleios/histograms/mwe4.jl:22
[5] include(fname::String)
@ Base.MainInclude ./client.jl:444
[6] top-level scope
@ REPL[7]:1
[7] top-level scope
@ ~/.julia/packages/CUDA/YpW0k/src/initialization.jl:52
nested task error: UndefVarError: tid not defined
Stacktrace:
[1] cpu_f_test_kernel!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, ::Vector{Float64})
@ ./none:0 [inlined]
[2] overdub
@ ./none:0 [inlined]
[3] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{Vector{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
@ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:157
[4] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{Vector{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
@ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:130
[5] (::KernelAbstractions.var"#33#34"{Nothing, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, Tuple{Int64}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, Tuple{Vector{Float64}}, KernelAbstractions.NDIteration.NoDynamicCheck}})()
@ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:22
in expression starting at /home/leios/projects/simuleios/histograms/mwe4.jl:22
I'll try my hand at it if I cannot find a workaround, but I figured I would create an issue here first.