Releases · ROCm/rocBLAS

11 Apr 13:35

rocm-ci

rocm-6.4.0

80e5394

rocBLAS 4.4.0 for ROCm 6.4.0 (Latest)

Added

rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
On gfx12, all functions now support full rocblas_int dynamic range for batch_count
--ninja build option
Support for the GPU_TARGETS CMake variable

Changed

The rocblas-test client now omits the stress tests unless YAML-based testing or gtest_filter adds them
The default OpenMP thread count for rocBLAS clients is reduced to less than the logical core count
gemm_ex testing and timing reuse device memory
gemm_ex timing initializes matrices on the device

Optimized

Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
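The problem-size conditions quoted in the two items above can be written out as plain predicates. This is only a minimal sketch for checking whether a given call falls in the stated ranges; the function names and the single-letter encoding of TransA are hypothetical and not part of rocBLAS, and the release notes do not say how much faster any particular size becomes.

```python
def gemv_in_improved_range(trans_a: str, m: int, n: int) -> bool:
    """Return True for (TransA == N && m > 2*n) or (TransA == T).

    trans_a is a hypothetical one-letter code: "N" (no transpose) or "T"
    (transpose). Any other value is treated as outside the stated ranges.
    """
    if trans_a == "T":
        return True
    return trans_a == "N" and m > 2 * n


def syrk_herk_in_improved_range(n: int, k: int) -> bool:
    """Return True for (k > 500 && n < 4000)."""
    return k > 500 and n < 4000


if __name__ == "__main__":
    # A tall, non-transposed gemv (m > 2*n) matches the first condition.
    print(gemv_in_improved_range("N", 4096, 1024))
    # A syrk/herk update with a long k dimension and a moderate n.
    print(syrk_herk_in_improved_range(n=2048, k=1024))
```

Both comparisons are strict, so a gemv with m exactly equal to 2*n, or a syrk with k exactly 500, is outside the stated ranges.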

Resolved issues

gfx12: fixed ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
Added a gfortran package dependency for Azure Linux OS
Resolved outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
Fixed code object stripping for RPM packages

Upcoming changes

Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.
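For build scripts that still pass the deprecated variable on the command line, the rename can be applied mechanically. The helper below is a hypothetical sketch, not a rocBLAS or CMake tool: it assumes the variable is passed in the plain `-DNAME=value` form and leaves every other argument untouched. The target value gfx942 is only an example.

```python
OLD_PREFIX = "-DAMDGPU_TARGETS="
NEW_PREFIX = "-DGPU_TARGETS="


def migrate_cmake_args(args):
    """Return a copy of args with -DAMDGPU_TARGETS=... renamed to -DGPU_TARGETS=...

    Only the plain -DNAME=value form is handled (an assumption of this sketch);
    typed forms such as -DNAME:STRING=value are passed through unchanged.
    """
    migrated = []
    for arg in args:
        if arg.startswith(OLD_PREFIX):
            # Keep the value, swap only the variable name.
            migrated.append(NEW_PREFIX + arg[len(OLD_PREFIX):])
        else:
            migrated.append(arg)
    return migrated


if __name__ == "__main__":
    print(migrate_cmake_args(["cmake", "-DAMDGPU_TARGETS=gfx942", ".."]))
```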

19 Feb 17:47

rocm-ci

rocm-6.3.3

8ebd6c1

rocBLAS 4.3.0 for ROCm 6.3.3

rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.

28 Jan 15:44

rocm-ci

rocm-6.3.2

8ebd6c1

rocBLAS 4.3.0 for ROCm 6.3.2

rocBLAS code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.

20 Dec 16:12

rocm-ci

rocm-6.3.1

8ebd6c1

rocBLAS 4.3.0 for ROCm 6.3.1

rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.

03 Dec 19:49

rocm-ci

rocm-6.3.0

8ebd6c1

rocBLAS 4.3.0 for ROCm 6.3.0

Added

Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
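To illustrate when the `_64` variants matter, the sketch below checks whether any size argument exceeds the signed 32-bit range and, if so, forms the ILP64 name using the suffix rule stated above. It assumes the non-suffixed API takes 32-bit signed integer arguments. The helper names are hypothetical, and the resulting string is produced by the naming rule alone, so it should be checked against the library headers before use.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def needs_ilp64(*sizes: int) -> bool:
    """Return True if any size argument does not fit in a signed 32-bit int."""
    return any(size > INT32_MAX for size in sizes)


def api_name(base: str, *sizes: int) -> str:
    """Append the _64 suffix when the sizes require int64_t arguments."""
    return base + "_64" if needs_ilp64(*sizes) else base


if __name__ == "__main__":
    # Sizes that fit in 32 bits keep the original name.
    print(api_name("rocblas_sgemm", 4096, 4096, 4096))
    # A dimension beyond 2**31 - 1 selects the suffixed name.
    print(api_name("rocblas_sgemm", 3_000_000_000, 8, 8))
```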

Changed

amdclang is used as the default compiler instead of hipcc
Internal performance scripts use amd-smi instead of the deprecated rocm-smi

Optimized

Improved performance of Level 2 gbmv
Improved performance of Level 2 gemv for float and double precisions for problem sizes (TransA == N && m == n && m % 128 == 0), measured on a gfx942 GPU

Resolved issues

Fixed stbsv_strided_batched_64 Fortran binding

Upcoming changes

rocblas_Xgemm_kernel_name APIs are deprecated

06 Nov 19:55

rocm-ci

rocm-6.2.4

3171316

rocBLAS 4.2.4 for ROCm 6.2.4

Additions

Support for gfx1151

27 Sep 16:01

rocm-ci

rocm-6.2.2

c6de034

rocBLAS 4.2.1 for ROCm 6.2.2

rocBLAS code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

20 Sep 19:58

rocm-ci

rocm-6.2.1

c6de034

rocBLAS 4.2.1 for ROCm 6.2.1

Removals

Removed the Device_Memory_Allocation.pdf link from the documentation

Fixes

Fixed the error/warning message emitted during a rocblas_set_stream() call

02 Aug 16:15

rocm-ci

rocm-6.2.0

54f305c

rocBLAS 4.2.0 for ROCm 6.2.0

Additions

Level 2 functions and Level 3 trsm have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, axpy
Benchmark class for common timing code
An environment variable, ROCBLAS_DEFAULT_ATOMICS_MODE, to set the default atomics mode when a rocblas_handle is created
Extended dot_ex to support single-precision (fp32_r) input and double-precision (fp64_r) output and compute types

Optimizations

Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions. Performance improved sixfold for larger problem sizes, as measured on an MI210 GPU

Changes

Linux AOCL dependency updated to release 4.2 gcc build
Windows vcpkg dependencies updated to release 2024.02.14
Increased default device workspace from 32 to 128 MiB for architecture gfx9xx with xx >= 40
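The architecture condition in the last item can be read as a simple rule on the target name. The sketch below is one interpretation, not the library's implementation: it assumes "xx" means the last two characters of a six-character gfx9 name compared as hexadecimal digits (so gfx90a counts as below 40), and that all other architectures keep the previous 32 MiB default. The function name is hypothetical.

```python
def default_workspace_mib(arch: str) -> int:
    """Return the assumed default device workspace size in MiB for a gfx target.

    Assumptions of this sketch: feature suffixes after ':' are ignored, the
    two characters after 'gfx9' are hexadecimal digits, and every target that
    does not match gfx9xx with xx >= 40 keeps the older 32 MiB default.
    """
    name = arch.split(":")[0]  # drop suffixes such as ":xnack-"
    if len(name) == 6 and name.startswith("gfx9"):
        try:
            xx = int(name[4:], 16)
        except ValueError:
            return 32
        if xx >= 0x40:
            return 128
    return 32


if __name__ == "__main__":
    for target in ("gfx90a", "gfx942", "gfx1100"):
        print(target, default_workspace_mib(target))
```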

Deprecations

rocblas_gemm_ex3, gemm_batched_ex3, and gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. For future 8-bit float usage, refer to hipBLASLt: https://github.com/ROCm/hipBLASLt

12 Mar 18:30

rocm-ci

rocm-6.1.5

8443539

rocBLAS 4.1.2 for ROCm 6.1.5

rocBLAS code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.
