-
Notifications
You must be signed in to change notification settings - Fork 14
Update MIOpen to support gfx1151 and use older version of frugally-deep #392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Originally mirrored from: https://github.com/Dobiasd/frugally-deep/archive/refs/tags/v0.16.2.tar.gz | ||
URL https://rocm-third-party-deps.s3.us-east-2.amazonaws.com/frugally-deep-0.16.2.tar.gz | ||
URL_HASH SHA256=b16af09606dcf02359de53b7c47323baaeda9a174e1c87e126c3127c55571971 | ||
# TODO: copy tarball to rocm-third-party-deps.s3.us-east-2.amazonaws.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to upload the specified version but can you elaborate why the downgraded is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marbre , without this frugally-deep downgrade, code using torch.nn.Conv2d
from pytorch would produce MIOpen Error: tensor_shape_variable needs to be an array RuntimeError: miopenStatusUnknownError
.
See
#244 (comment)
(Really appreciate your quick reviews!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marbre , I've amended the PR with a comment with links to the upstream MIOpen issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marbre as @scottt mentioned, more context available at ROCm/MIOpen#3588 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing me to it. Let me upload the artifact for you asap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. @BradPepersAMD if you're not aware of this yet, you probably should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO: copy tarball to rocm-third-party-deps.s3.us-east-2.amazonaws.com |
Uploaded it, thus you can drop the TODO :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marbre , commit added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 seems you still need to drop the TODO note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, @marbre thanks for catching that. PR updated.
Co-authored: Scott Tsai <[email protected]>
b923341
to
4220d64
Compare
Co-authored-by: Marius Brehler <[email protected]>
I have not much context on the MIOpen specific patches. @stellaraccident would you be able to review or can assign someone? |
Some further explainations of the patches to help review:
patches/amd-mainline/MIOpen/0001-driver-CMakeLists.txt-allow-MIOpenDriver-to-be-built.patch allows This patch changes third-party/nlohmann-json/CMakeLists.txt to build with
patches/amd-mainline/MIOpen/0002-Fix-build-for-gfx1151-gfx1036.patch gfx1151 wants a I'd personally love to learn what a I made MIOpen build for gfx1036 as well, as that was the iGPU available in my 9950X3D CPU and I manually tested the MIOpen GPU assembly kernels on both.
This conv2d fp16 specific kernel was causing illegal opcode errors on the gfx1151 patches/amd-mainline/MIOpen/0003-Disable-conv_wino_fury_RxS-for-gfx115x.patch I should probably file an issue here. I'm looking forward to learning how to find out which GPU instruction produced the illegal opcode error and fixing it. |
Ok, there are a few things that need to be done to land this. For the substantive changes to miopen, I at least need to get a PR opened upstream for feedback. I'll piece this apart and start landing today. |
I tried debugging this and am stumped - the ISA delta between RDNA3 and RDNA3.5 is pretty minimal to non-existent. hipcc doesn't complain about it during assembly either. I think there might be some very subtle differences but it's not clearly documented publicly yet. Ideally this hand-written ISA should be rewritten as MLIR which can then be lowered down to navi3/3.5 to handle those subtle idiosyncrasies. |
@BradPepersAMD is going to help advise on how to land these |
* Both changes were found to be required upon additional testing of MIOpen. * Includes a workaround for ROCm/MIOpen#3588 Part of breaking #392 up for landing.
* Both changes were found to be required upon additional testing of MIOpen. * Includes a workaround for ROCm/MIOpen#3588 Part of breaking #392 up for landing.
Ok, I've broken this down into a couple of patches and either landed or set up to land. #432 is the one with the key functional changes and includes links to issues and upstream PRs I created for each. Closing this as everything it included has been pushed through an appropriate channel. |
* Enables MIOpenDriver (requires patch to correct system deps). Always sets MIOPEN_BUILD_DRIVER=ON and propagates BUILD_TESTING from the parent project #433 * Patches some device feature words for gfx1151 and gfx1036 and simplifies macro logic (0002-Fix-build-for-gfx1151-gfx1036.patch) #434 * Patches conv_wino_fury_RxS to disable on gfx1151 (0003-Disable-conv_wino_fury_RxS-for-gfx115x.patch) #435 * Makes MIOpenDriver exit with failing status (0004-MIOpenDriver-exit-with-status-1-on-usage-error.patch) #436 Lands remaining parts of #392. Co-Authored-By: Scott Tsai <[email protected]>
Ok, the PRs are landed in TheRock. Upstream patches are being reviewed. @scottt you may want to make sure you are subscribed to the gfx1151 specific MIOpen patches as they said they may want to discuss more. |
I combined and reworked the MIOpen patches for gfx1151 from myself and @jammm and tested it a bit more with MIOpenDriver and pytorch.