cudaTensorCoreGemm

Replace README references to "CUDA Toolkit 12.5" with general "CUDA T…

14b1bfd · Apr 30, 2025

Name	Last commit message	Last commit date
.vscode	add and update samples for CUDA 11.6	Jan 13, 2022
CMakeLists.txt	Update all sample CMakeLists.txt to include ENABLE_CUDA_DEBUG flag to…	Mar 26, 2025
README.md	Replace README references to "CUDA Toolkit 12.5" with general "CUDA T…	Apr 30, 2025
cudaTensorCoreGemm.cu	Apply consistent code formatting across the repo. Add clang-format an…	Mar 27, 2025

README.md

cudaTensorCoreGemm - CUDA Tensor Core GEMM

Description

CUDA sample demonstrating a GEMM computation using the Warp Matrix Multiply and Accumulate (WMMA) API introduced in CUDA 9.

This sample demonstrates the CUDA WMMA API, which uses the Tensor Cores introduced in the Volta chip family to accelerate matrix operations.
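
To make the WMMA flow concrete, here is a minimal sketch of a single warp multiplying one 16x16x16 tile. It is not the sample's `compute_gemm` kernel, which tiles a much larger problem. The names `wmma_tile_16x16x16` and `run_wmma_tile_demo` are hypothetical, and the sketch assumes an FP16-input, FP32-accumulate tile and a build such as `nvcc -arch=sm_70` (or newer) on a Tensor Core capable GPU.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// Hypothetical kernel: one warp computes D = A * B for a single 16x16 tile.
// A is row-major, B is column-major, both FP16; D is row-major FP32.
__global__ void wmma_tile_16x16x16(const half *a, const half *b, float *d)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);               // leading dimension is 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // acc = A * B + acc
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: fills A and B with ones, so every element of D
// should equal 16. Returns the number of mismatches, or -1 on a CUDA error.
int run_wmma_tile_demo()
{
    const int n = 16 * 16;
    half h_a[n], h_b[n];
    float h_d[n];
    for (int i = 0; i < n; ++i) {
        h_a[i] = __float2half(1.0f);
        h_b[i] = __float2half(1.0f);
    }

    half *d_a = nullptr, *d_b = nullptr;
    float *d_d = nullptr;
    cudaError_t err = cudaMalloc(&d_a, n * sizeof(half));
    if (err == cudaSuccess) err = cudaMalloc(&d_b, n * sizeof(half));
    if (err == cudaSuccess) err = cudaMalloc(&d_d, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, n * sizeof(half), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, n * sizeof(half), cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        wmma_tile_16x16x16<<<1, 32>>>(d_a, d_b, d_d);   // exactly one warp
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(h_d, d_d, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_d);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_d[i] != 16.0f) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int mismatches = run_wmma_tile_demo();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

All 32 threads of the warp must execute the `*_sync` calls together, which is why the launch uses a single block of 32 threads. A full GEMM would loop over the K dimension and assign one output tile (or several) to each warp.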

It also demonstrates the CUDA function attribute cudaFuncAttributeMaxDynamicSharedMemorySize, which lets the application reserve more dynamic shared memory per block than is available by default.
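
The opt-in described above can be sketched as follows. This is an illustration under stated assumptions, not the sample's own code: the kernel `fill_shared` and the helper `launch_with_opt_in_shared` are hypothetical names, the 64 KB request is an arbitrary size above the usual 48 KB default, and device 0 is assumed to be the target GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: fills n floats of dynamic shared memory with 1.0f
// and lets thread 0 write their sum to *out.
__global__ void fill_shared(float *out, int n)
{
    extern __shared__ float scratch[];
    for (int i = threadIdx.x; i < n; i += blockDim.x) scratch[i] = 1.0f;
    __syncthreads();
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i) sum += scratch[i];
        *out = sum;
    }
}

// Hypothetical helper: opts fill_shared in to `bytes` of dynamic shared memory,
// launches it, and stores the sum in *result.
// Returns 0 on success and a negative value on failure.
int launch_with_opt_in_shared(size_t bytes, float *result)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    // The opt-in limit varies by architecture, so check it before asking.
    if (bytes > prop.sharedMemPerBlockOptin) return -2;

    // Without this call, a launch requesting more than the default limit fails.
    if (cudaFuncSetAttribute(fill_shared,
                             cudaFuncAttributeMaxDynamicSharedMemorySize,
                             static_cast<int>(bytes)) != cudaSuccess) {
        return -3;
    }

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) return -4;

    const int n = static_cast<int>(bytes / sizeof(float));
    fill_shared<<<1, 256, bytes>>>(d_out, n);  // third launch parameter: dynamic shared bytes

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -5;
    }
    return 0;
}

int main()
{
    float sum = 0.0f;
    int status = launch_with_opt_in_shared(64 * 1024, &sum);
    printf("status: %d, sum: %.1f\n", status, sum);
    return status == 0 ? 0 : 1;
}
```

The attribute is set per kernel function, so each kernel that needs the larger allocation has to be opted in separately before its first such launch.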

Key Concepts

Matrix Multiply, WMMA, Tensor Cores

Supported SM Architectures

SM 7.0, SM 7.2, SM 7.5, SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, aarch64

CUDA APIs involved

CUDA Runtime API

cudaMemcpy, cudaFree, cudaGetErrorString, cudaGetLastError, cudaEventSynchronize, cudaFuncSetAttribute, cudaEventRecord, cudaMemset, cudaMalloc, cudaEventElapsedTime, cudaGetDeviceProperties, cudaEventCreate

Prerequisites

Download and install the CUDA Toolkit for your corresponding platform.
