-
-
Notifications
You must be signed in to change notification settings - Fork 127
Re-integrate NNlibCUDA as a package extension #445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Were you looking at the master docs? There's actually a really nice section guiding how to do all this and be backwards compatible (maybe you are already here, I haven't reviewed the code) |
Yes, that's exactly the section I read and it was extremely helpful. My statement above was only referring to the question of where in the repo tests for package extensions should live, but a look at the PGFPlotsX.jl codebase helped with that :) |
@@ -0,0 +1,5 @@ | |||
# NNlib + CUDA.jl | |||
|
|||
This is a glue package which extends functions from [NNlib.jl](https://github.com/FluxML/NNlib.jl) to work with [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl). It should be loaded automatically when NNlib and CUDA are loaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid calling it a package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The glue package terminology is from the NNlibCUDA README, which was written back when the package extension proposal was still using "glue package". Will change.
@@ -100,4 +89,23 @@ include("impl/depthwiseconv_im2col.jl") | |||
include("impl/pooling_direct.jl") | |||
include("deprecations.jl") | |||
|
|||
function __init__() | |||
@require NNPACK_jll="a6bfbf70-4841-5cb9-aa18-3a8ad3c413ee" begin | |||
if isdefined(NNPACK_jll, :libnnpack) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could become an extension (in another PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather drop it altogether in the next breaking release because neither we nor any users appear to be using it (NNPACK itself has been basically unmaintained since 2020, and NNlib no longer lists it as a dep as of 0.7.4).
@static if !isdefined(Base, :get_extension) | ||
@require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin | ||
include("../ext/CUDAExt/CUDAExt.jl") | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes this PR a breaking change since it would conflict with NNlibCUDA
.
The alternative is to remove these lines altogether, and have people explicitly load NNlibCUDA
on julia <v1.9, but I think the @require
here makes life easier for downstream packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed breaking. IMO putting this in NNlib 0.9 is preferable to splitting the codebase and maintaining two sets of instructions for users. RE downstream, this PR also seeks to implement FluxML/Flux.jl#2132 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a 0.9 milestone. We should think a bit about any other changes that ought to be bundled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a TODO in this PR for one of them. Can't think of any others off the top of my head.
using NNlib: DenseConvDims | ||
import NNlib: conv!, ∇conv_filter!, ∇conv_data!, conv_bias_act! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not for this PR, but we should move imports to the top file of the module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's plenty of cleanup that should happen on the NNlibCUDA side (I already deleted a couple of useless files). Didn't want to change too much though since that introduces extra testing/review burden.
@@ -0,0 +1,67 @@ | |||
if isdefined(Base, :get_extension) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid the ugly duplication below? Shouldn't
using CUDA.CUDNN: ...
work in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no, and I found this out the hard way because I kept missing imports and making the import system freak out. This syntax is documented in https://pkgdocs.julialang.org/dev/creating-packages/#Requires.jl.
@testset "CUDA" begin | ||
Pkg.test("NNlibCUDA") | ||
include("cuda/runtests.jl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the folder structure to
include("ext/cuda/runtests.jl")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that Pkg expects a flat structure for each extension (i.e. ext/CUDAExt
is the src/
folder), so there's no good place for tests. Ref. https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait is the suggestion /test/ext/cuda/runtests.jl
or something like /ext/CUDAExt/...
?
The former just seems more verbose.
The latter is a bit messy as there's no /ext/CUDAExt/src/
folder. Unlike independent packages in the same repository. Maybe that's intentional --- extensions don't have an independent existence, so don't really deserve tests you can run separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext/CUDAExt
is required for the extension mechanism to work, AIUI. So if we were to move the tests into the extension dir, they'd have to be under ext/CUDAExt/test/...
and that's (as you note) messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion is to do create an ext
folder inside the test
ones, so have /test/ext/cuda/runtests.jl
instead of /test/cuda/runtests.jl
, this also considering we will likely have other extensions in the future. Not really important though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. It doesn't seem awful to have say /test/cuda/
and /test/amd/
at top level, without organising these into another layer of hierarchy.
They could be named /test/CUDAext/
etc to stress the connection I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is something we can put a pin in for the day when we get extensions which aren't just backends.
Attempting to time this on Julia 1.8: julia> @time using CUDA; @time cu(ones(3)) .+ 1;
7.658277 seconds (9.66 M allocations: 688.737 MiB, 3.56% gc time, 14.19% compilation time: 69% of which was recompilation)
25.145222 seconds (21.11 M allocations: 1.101 GiB, 1.98% gc time, 35.12% compilation time)
julia> @time @eval using NNlib, NNlibCUDA # tagged version
0.792670 seconds (404.08 k allocations: 27.676 MiB, 74.92% compilation time: 95% of which was recompilation)
# or
julia> @time @eval using NNlib # https://github.com/FluxML/NNlib.jl#bc/pkgext
5.517386 seconds (6.12 M allocations: 322.021 MiB, 2.49% gc time, 86.05% compilation time: 12% of which was recompilation)
julia> A, B, C = let n = 100
[cu(rand(n,n,n)) for _ in 1:3]
end;
julia> @time batched_mul!(C,A,B); # first run, no difference
2.235857 seconds (1.28 M allocations: 64.905 MiB, 2.07% gc time, 61.84% compilation time) # tagged
2.227217 seconds (1.28 M allocations: 64.869 MiB, 2.04% gc time, 62.05% compilation time) # PR
julia> @time batched_mul!(C,A,B); # second run, just noise
0.000216 seconds
0.000118 seconds Does this look right? 5s for Requires sounds like a lot. But perhaps not catastrophic given 30s for CUDA. (This is on quite a slow CPU.) Attempting to have a look with Julia nightly, I don't get this example to run:
|
If this is the same machine as FluxML/Flux.jl#2132 (comment), then 5s seems like a lot. @IanButterworth any tips for investigating what's taking up all that time? I don't understand what's going on with the batchedmul error. The line which should be adding the overload is attaching it directly to the NNlib namespace, so an incorrect import shouldn't be the culprit. Does NNlibCUDA work on nightly? I know CUDA.jl often won't support brand new nightlies, but it would be strange if that lack of support manifested this way. |
Yes, same machine. I did not try the Flux and NNlib PRs together. 5s is quite a lot of extra loading time. TTFG is mostly after loading though. Even my simple And using tagged versions of everything, I could install NNlibCUDA & run the same operation. I don't see what causes the failure above.
|
I see this on the latest nightly:
Is there possibly some bug where the 2nd method isn't looked up correctly? |
Looking for invalidations:
Nothing seems untoward or new, so is the latency more due to https://discourse.julialang.org/t/categoricalarrays-forces-openspecfun-jll-recompilation/87759/18? What I still don't understand is how to actually implement the Edit: JuliaPackaging/Requires.jl#65 might have a lead, is there a 2022-Julia friendly version of the same? |
subsided by #492 |
Follow-up to/dependency of FluxML/Flux.jl#2132.
Note that tests live under NNlib's src dir and not the extensions. I couldn't find any guidance on the Pkg docs about this, so I just followed what PGFPlotsX.jl (first registered package using extensions) does.
PR Checklist