Synchronization in a nested while loop causes loss of threadID #274

Open
@leios

I ran across a method that requires a shared memory pool of size blocksize, which is read from several times inside a for/while loop. The CPU backend has problems with this pattern. Here is an MWE (without the shmem shenanigans):

using Test
using CUDA
using CUDAKernels
using KernelAbstractions

@kernel function f_test_kernel!(a)
    tid = @index(Global, Linear)

    @uniform N = length(a)

    @uniform b = 0
    for i = 1:10
        if tid < N
            b += 1
            @synchronize()
        end
    end
end

a = zeros(1024)

# works (CUDA backend, workgroup size 256)
wait(f_test_kernel!(CUDADevice(),256)(CuArray(a), ndrange=1024))

# doesn't work (CPU backend, workgroup size 4): UndefVarError: tid not defined
wait(f_test_kernel!(CPU(),4)(a, ndrange=1024))

Note: without the if statement, everything works fine. I also tried a few different nested if statements (without the loop) to see if a similar error occurred, but could not replicate it. It seems to be specific to a @synchronize() inside a conditional inside a loop (although maybe a loop in a loop would also trigger it? Still digging).

Error message (tid not defined):

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] wait
   @ ~/projects/KernelAbstractions.jl/src/cpu.jl:65 [inlined]
 [3] wait (repeats 2 times)
   @ ~/projects/KernelAbstractions.jl/src/cpu.jl:29 [inlined]
 [4] top-level scope
   @ ~/projects/simuleios/histograms/mwe4.jl:22
 [5] include(fname::String)
   @ Base.MainInclude ./client.jl:444
 [6] top-level scope
   @ REPL[7]:1
 [7] top-level scope
   @ ~/.julia/packages/CUDA/YpW0k/src/initialization.jl:52

    nested task error: UndefVarError: tid not defined
    Stacktrace:
     [1] cpu_f_test_kernel!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}}, ::Vector{Float64})
       @ ./none:0 [inlined]
     [2] overdub
       @ ./none:0 [inlined]
     [3] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{Vector{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
       @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:157
     [4] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{Vector{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
       @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:130
     [5] (::KernelAbstractions.var"#33#34"{Nothing, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(4,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_test_kernel!)}, Tuple{Int64}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(4,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, Tuple{Vector{Float64}}, KernelAbstractions.NDIteration.NoDynamicCheck}})()
       @ KernelAbstractions ~/projects/KernelAbstractions.jl/src/cpu.jl:22
in expression starting at /home/leios/projects/simuleios/histograms/mwe4.jl:22

I'll try my hand at it if I cannot find a workaround, but I figured I would create an issue here first.
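
One workaround that might be worth trying is to hoist @synchronize() out of the conditional so that every work-item reaches the barrier and only the increment stays inside the branch. This is an untested sketch, not a confirmed fix: it assumes the failure comes from the barrier sitting inside the if, and it assumes the same event-based KernelAbstractions API used in the MWE above (where the kernel launch returns something that wait accepts). The name f_test_kernel_hoisted! is made up for illustration.

```julia
using KernelAbstractions

# Variant of the MWE kernel (hypothetical name): the barrier is moved out of
# the branch, so all work-items in a workgroup execute @synchronize().
@kernel function f_test_kernel_hoisted!(a)
    tid = @index(Global, Linear)

    @uniform N = length(a)

    @uniform b = 0
    for i = 1:10
        if tid < N
            b += 1          # only the guarded work stays inside the conditional
        end
        @synchronize()      # reached unconditionally on every loop iteration
    end
end

a = zeros(1024)

# Same CPU launch configuration as the failing call in the MWE.
# Assumption: the launch returns an event, as in the KernelAbstractions
# version used in this issue.
wait(f_test_kernel_hoisted!(CPU(), 4)(a, ndrange=1024))
```

Keeping barriers out of divergent branches is also the safer pattern on the GPU, where a barrier that only some threads of a block reach is generally not well defined, so this form should not change the intent of the CUDA version.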
