Use cudaMemset/hipMemset to setup IndexShuffling kernel. #4016

levendlee · 2025-04-24T18:44:04Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755

facebook-github-bot · 2025-04-24T18:44:13Z

This pull request was exported from Phabricator. Differential Revision: D73602755

netlify · 2025-04-24T18:44:24Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`a331692`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/680f958c763243000760684c
😎 Deploy Preview	https://deploy-preview-4016--pytorch-fbgemm-docs.netlify.app

Summary: X-link: facebookresearch/FBGEMM#1104 Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead. Differential Revision: D73602755

Summary: Pull Request resolved: pytorch#4016 X-link: facebookresearch/FBGEMM#1104 Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead. Differential Revision: D73602755

…ytorch#4016) Summary: X-link: facebookresearch/FBGEMM#1104 Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync/hipMemsetAsync instead. Note that cudaMemsetAsync/hipMemsetAsync must be issued on the current stream to be compatible with CUDAGraph capture. Reviewed By: Alkaid-Benetnash Differential Revision: D73602755

Summary: X-link: facebookresearch/FBGEMM#1104 Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync instead. hipMemsetAsync is somehow more expensive than launching a kernel, so avoid using it for now. Reviewed By: Alkaid-Benetnash Differential Revision: D73602755

facebook-github-bot · 2025-04-28T18:41:18Z

This pull request has been merged in eeee38e.

facebook-github-bot added the cla signed label Apr 24, 2025

facebook-github-bot added the fb-exported label Apr 24, 2025

levendlee force-pushed the export-D73602755 branch from c2bbc70 to 282cf97 Compare April 24, 2025 20:07

levendlee force-pushed the export-D73602755 branch from 282cf97 to 6f57ffb Compare April 24, 2025 20:07

levendlee force-pushed the export-D73602755 branch from 6f57ffb to fce075c Compare April 24, 2025 20:07

levendlee force-pushed the export-D73602755 branch from fce075c to 807fd87 Compare April 24, 2025 20:10

levendlee force-pushed the export-D73602755 branch 2 times, most recently from a783b2b to ab6f083 Compare April 24, 2025 20:53

levendlee force-pushed the export-D73602755 branch from ab6f083 to 21b0fb0 Compare April 24, 2025 20:54

levendlee force-pushed the export-D73602755 branch from 21b0fb0 to cf1e7e2 Compare April 24, 2025 21:00

levendlee force-pushed the export-D73602755 branch from cf1e7e2 to f630cde Compare April 24, 2025 21:07

levendlee force-pushed the export-D73602755 branch from f630cde to 02cb1da Compare April 25, 2025 00:58

levendlee force-pushed the export-D73602755 branch from 5a56152 to 513eb3a Compare April 25, 2025 23:24

levendlee force-pushed the export-D73602755 branch from 513eb3a to 2b1606e Compare April 25, 2025 23:26

levendlee force-pushed the export-D73602755 branch from 2b1606e to 8cb941b Compare April 25, 2025 23:26

levendlee force-pushed the export-D73602755 branch from 8cb941b to d6a9b6f Compare April 25, 2025 23:34

levendlee force-pushed the export-D73602755 branch from d6a9b6f to 046f3f4 Compare April 27, 2025 23:02

levendlee force-pushed the export-D73602755 branch from 046f3f4 to f36a72e Compare April 28, 2025 14:37

levendlee force-pushed the export-D73602755 branch from f36a72e to 1d6e112 Compare April 28, 2025 14:37

levendlee force-pushed the export-D73602755 branch from 1d6e112 to 0cb545f Compare April 28, 2025 14:38

levendlee force-pushed the export-D73602755 branch from 0cb545f to ade50d9 Compare April 28, 2025 14:40

levendlee force-pushed the export-D73602755 branch from ade50d9 to a331692 Compare April 28, 2025 14:49

facebook-github-bot closed this in eeee38e Apr 28, 2025

facebook-github-bot added the Merged label Apr 28, 2025
