Use cudaMemset/hipMemset to setup IndexShuffling kernel. #4016

Closed
wants to merge 1 commit into from

levendlee
Member

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D73602755

netlify bot commented Apr 24, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit a331692
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/680f958c763243000760684c
😎 Deploy Preview https://deploy-preview-4016--pytorch-fbgemm-docs.netlify.app

levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 24, 2025
Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 24, 2025
Summary:
Pull Request resolved: pytorch#4016

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemset/hipMemset instead.

Differential Revision: D73602755
@levendlee levendlee force-pushed the export-D73602755 branch 2 times, most recently from a783b2b to ab6f083 Compare April 24, 2025 20:53
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
…ytorch#4016)

Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync/hipMemsetAsync instead.
Note that cudaMemsetAsync/hipMemsetAsync must be issued on the current stream to be compatible with CUDAGraph capture.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
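
The stream requirement in the summary above can be illustrated with a minimal standalone sketch. It uses only the plain CUDA runtime API, not the actual FBGEMM code: the buffer name `counters`, its size, and the explicitly created `stream` are assumptions for illustration. Inside PyTorch the stream would typically come from `at::cuda::getCurrentCUDAStream()` instead.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

#define CUDA_CHECK(expr)                                              \
  do {                                                                \
    cudaError_t err_ = (expr);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s failed: %s\n", #expr,                  \
                   cudaGetErrorString(err_));                         \
      std::exit(1);                                                   \
    }                                                                 \
  } while (0)

int main() {
  constexpr int kNumCounters = 128;  // hypothetical size of the setup buffer
  constexpr size_t kBytes = kNumCounters * sizeof(int);

  cudaStream_t stream;  // stands in for the framework's "current stream"
  CUDA_CHECK(cudaStreamCreate(&stream));

  // Allocate outside of capture and fill with non-zero bytes so the reset is observable.
  int* counters = nullptr;
  CUDA_CHECK(cudaMalloc(&counters, kBytes));
  CUDA_CHECK(cudaMemset(counters, 0xFF, kBytes));
  CUDA_CHECK(cudaDeviceSynchronize());

  // Record the reset into a graph. The memset is issued on the capturing
  // stream, so it becomes a memset node instead of running immediately.
  cudaGraph_t graph;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  CUDA_CHECK(cudaMemsetAsync(counters, 0, kBytes, stream));
  // The kernel that consumes `counters` would be launched on `stream` here.
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

  cudaGraphExec_t graph_exec;
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&graph_exec, graph, 0));

  // Replay the graph and verify that the buffer was zeroed.
  CUDA_CHECK(cudaGraphLaunch(graph_exec, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  std::vector<int> host(kNumCounters);
  CUDA_CHECK(cudaMemcpy(host.data(), counters, kBytes, cudaMemcpyDeviceToHost));
  for (int v : host) {
    if (v != 0) {
      std::fprintf(stderr, "counter was not reset\n");
      return 1;
    }
  }
  std::printf("counters reset\n");

  CUDA_CHECK(cudaGraphExecDestroy(graph_exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaFree(counters));
  CUDA_CHECK(cudaStreamDestroy(stream));
  return 0;
}
```

A synchronous `cudaMemset` in the captured region would not be recorded on the capturing stream, which is presumably why the summary moves to the `Async` variant on the current stream.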
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 25, 2025
…ytorch#4016)

Summary:
Pull Request resolved: pytorch#4016

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync/hipMemsetAsync instead.
Note that cudaMemsetAsync/hipMemsetAsync must be issued on the current stream to be compatible with CUDAGraph capture.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 27, 2025
Summary:

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync instead.
hipMemsetAsync is somehow more expensive than launching a kernel, so avoid it for now.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
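
One way to express the CUDA-only choice described above is a compile-time switch. The sketch below is an assumption about how such a split could look, not the code merged in this PR: the helper `reset_counters`, the kernel `zero_ints`, and the use of the `USE_ROCM` macro (as defined by PyTorch-style builds, where CUDA sources are translated to HIP) are all illustrative.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical fallback kernel: zeroes `n` ints, one element per thread.
__global__ void zero_ints(int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] = 0;
  }
}

// Hypothetical helper: resets `n` ints on `stream`.
// CUDA builds use the async memset; builds that define USE_ROCM keep a
// small kernel launch, following the PR's observation about hipMemsetAsync.
cudaError_t reset_counters(int* data, int n, cudaStream_t stream) {
  if (n <= 0) {
    return cudaSuccess;
  }
#if defined(USE_ROCM)
  constexpr int kThreads = 256;
  zero_ints<<<(n + kThreads - 1) / kThreads, kThreads, 0, stream>>>(data, n);
  return cudaGetLastError();
#else
  return cudaMemsetAsync(data, 0, n * sizeof(int), stream);
#endif
}

int main() {
  constexpr int kNumCounters = 1000;  // hypothetical buffer size
  int* counters = nullptr;
  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess ||
      cudaMalloc(&counters, kNumCounters * sizeof(int)) != cudaSuccess ||
      cudaMemset(counters, 0xFF, kNumCounters * sizeof(int)) != cudaSuccess ||
      cudaDeviceSynchronize() != cudaSuccess) {
    std::fprintf(stderr, "setup failed\n");
    return 1;
  }

  if (reset_counters(counters, kNumCounters, stream) != cudaSuccess ||
      cudaStreamSynchronize(stream) != cudaSuccess) {
    std::fprintf(stderr, "reset failed\n");
    return 1;
  }

  // Copy back and confirm that every counter is zero.
  std::vector<int> host(kNumCounters);
  cudaMemcpy(host.data(), counters, kNumCounters * sizeof(int),
             cudaMemcpyDeviceToHost);
  for (int v : host) {
    if (v != 0) {
      std::fprintf(stderr, "counter was not reset\n");
      return 1;
    }
  }
  std::printf("counters reset\n");

  cudaFree(counters);
  cudaStreamDestroy(stream);
  return 0;
}
```

Keeping both paths behind a single helper means callers stay identical on either platform, and the ROCm branch can be switched to the memset later if its cost changes.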
levendlee added a commit to levendlee/FBGEMM that referenced this pull request Apr 28, 2025
Summary:
Pull Request resolved: pytorch#4016

X-link: facebookresearch/FBGEMM#1104

Launching an ATen kernel just to do the setup is too expensive. Use cudaMemsetAsync instead.
hipMemsetAsync is somehow more expensive than launching a kernel, so avoid it for now.

Reviewed By: Alkaid-Benetnash

Differential Revision: D73602755
@facebook-github-bot
Contributor

This pull request has been merged in eeee38e.
