tf32TensorCoreGemm - tf32 Tensor Core GEMM

Description

A CUDA sample demonstrating tf32 (e8m10) GEMM computation using the Warp Matrix Multiply and Accumulate (WMMA) API, whose tf32 support was introduced with CUDA 11 for the tensor cores of the Ampere chip family to speed up matrix operations. The sample also uses the async copy provided by the CUDA pipeline interface for global memory (gmem) to shared memory (shmem) loads, which improves kernel performance and reduces register pressure.
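To make the e8m10 layout concrete, here is a minimal host-side C++ sketch that emulates a float to tf32 conversion by keeping the sign bit, the 8 exponent bits, and the top 10 of the 23 mantissa bits. The helper name `round_to_tf32` is hypothetical and not part of the sample. The rounding mode (round to nearest, ties away from zero) is an assumption about how the device-side conversion behaves, so treat the code as an illustration of the precision loss rather than a bit-exact reference.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the sample): emulate float -> tf32 on the host.
// tf32 keeps 1 sign bit, 8 exponent bits and 10 mantissa bits, so the 13
// low-order mantissa bits of an IEEE-754 binary32 value are discarded.
// Assumption: round to nearest, ties away from zero.
float round_to_tf32(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the float as raw bits

    // Leave Inf and NaN untouched (exponent field is all ones).
    if ((bits & 0x7F800000u) == 0x7F800000u) {
        return x;
    }

    bits += 0x00001000u;  // add half a tf32 ulp (bit 12) to the magnitude
    bits &= 0xFFFFE000u;  // clear the 13 mantissa bits that tf32 does not store

    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}
```

Under these assumptions, two inputs that differ only in the low 13 mantissa bits map to the same tf32 value, which is why results from a tf32 GEMM are usually compared against a full-precision reference with a tolerance rather than for exact equality.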

Key Concepts

Matrix Multiply, WMMA, Tensor Cores

Supported SM Architectures

SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, aarch64

CUDA APIs involved

CUDA Runtime API

cudaMemcpy, cudaFree, cudaGetErrorString, cudaGetLastError, cudaEventSynchronize, cudaFuncSetAttribute, cudaEventRecord, cudaMemset, cudaMalloc, cudaEventElapsedTime, cudaGetDeviceProperties, cudaEventCreate

Dependencies needed to build/run

CPP11

Prerequisites

Download and install the CUDA Toolkit 12.5 for your platform. Make sure the dependencies listed in the Dependencies section above are installed.
