Releases: ROCm/rocBLAS
rocBLAS 4.4.0 for ROCm 6.4.0
Added
- rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
- On gfx12, all functions now support full rocblas_int dynamic range for batch_count
- --ninja build option
- Support for GPU_TARGETS cmake variable
Changed
- rocblas-test client removes the stress tests unless YAML-based testing or gtest_filter adds them
- rocblas clients OpenMP default threading is reduced to be less than the logical core count
- gemm_ex testing and timing reuses device memory
- gemm_ex timing initializes matrices on device
Optimized
- Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
- Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
- Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
- Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
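The size conditions quoted in the Optimized list can be read as plain predicates. The following is a minimal sketch for checking whether a given problem falls in the ranges these notes mention. The function names are hypothetical and are not part of rocBLAS, and the sketch says nothing about how the library selects kernels internally.

```python
def gemv_in_optimized_range(trans_a: str, m: int, n: int) -> bool:
    """Return True for (TransA == N and m > 2*n) or (TransA == T).

    trans_a is "N" or "T"; this string encoding is an assumption made
    for illustration, not the rocblas_operation enum.
    """
    if trans_a == "T":
        return True
    return trans_a == "N" and m > 2 * n


def syrk_herk_in_optimized_range(n: int, k: int) -> bool:
    """Return True for the quoted syrk/herk condition (k > 500 and n < 4000)."""
    return k > 500 and n < 4000
```

For example, a tall non-transposed gemv with m = 5000 and n = 1000 satisfies the first condition, while m = 2000 and n = 1000 does not, because the inequality is strict.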
Resolved issues
- gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
- Added a gfortran package dependency for Azure Linux OS
- Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
- Code object stripping for RPM packages
Upcoming changes
- Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.
rocBLAS 4.3.0 for ROCm 6.3.3
rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.
rocBLAS 4.3.0 for ROCm 6.3.2
rocBLAS code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.
rocBLAS 4.3.0 for ROCm 6.3.1
rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.
rocBLAS 4.3.0 for ROCm 6.3.0
Added
- Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
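The practical reason to reach for a _64 entry point is that a size argument no longer fits in a 32-bit signed integer. The following is a minimal sketch of that check, assuming the non-_64 API takes 32-bit integer size arguments. The helper name is hypothetical and not part of rocBLAS.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def needs_ilp64(*sizes: int) -> bool:
    """Return True if any size argument exceeds the 32-bit signed range.

    Assumption: every value passed in would otherwise go through a
    32-bit integer parameter of the non-_64 API.
    """
    return any(s > INT32_MAX for s in sizes)
```

A call such as needs_ilp64(4096, 4096, 4096) is false, so the 32-bit API is enough, whereas a dimension of 3,000,000,000 would require the int64_t variant.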
Changed
- amdclang is used as the default compiler instead of hipcc
- Internal performance scripts use amd-smi instead of the deprecated rocm-smi
Optimized
- Improved performance of Level 2 gbmv
- Improved performance of Level 2 gemv for float and double precisions for problem sizes (TransA == N && m==n && m % 128 == 0) measured on a gfx942 GPU
Resolved issues
- Fixed stbsv_strided_batched_64 Fortran binding
Upcoming changes
- rocblas_Xgemm_kernel_name APIs are deprecated
rocBLAS 4.2.4 for ROCm 6.2.4
Additions
- GFX1151 Support
rocBLAS 4.2.1 for ROCm 6.2.2
rocBLAS code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
rocBLAS 4.2.1 for ROCm 6.2.1
Removals
- Removed the Device_Memory_Allocation.pdf link from the documentation
Fixes
- Fixed error/warn message during rocblas_set_stream() call
rocBLAS 4.2.0 for ROCm 6.2.0
Additions
- Level 2 functions and level 3 trsm have additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
- Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, axpy
- Benchmark class for common timing code
- An environment variable "ROCBLAS_DEFAULT_ATOMICS_MODE" to set default atomics mode during creation of 'rocblas_handle'
- Extended dot_ex to support single-precision (fp32_r) input and double-precision (fp64_r) output and compute types
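Because ROCBLAS_DEFAULT_ATOMICS_MODE is read when a rocblas_handle is created, it has to be set in the environment before the application creates its handle. The following is a minimal sketch of preparing such an environment for a child process. The helper name is hypothetical, and the value "0" used in the usage comment is an assumption, since these notes do not list the accepted values.

```python
import os


def env_with_atomics_mode(mode: str, base=None) -> dict:
    """Return a copy of the environment with ROCBLAS_DEFAULT_ATOMICS_MODE set.

    mode is passed through unchanged; which values rocBLAS accepts is
    not stated in these notes, so check the rocBLAS documentation.
    """
    env = dict(os.environ if base is None else base)
    env["ROCBLAS_DEFAULT_ATOMICS_MODE"] = mode
    return env


# Hypothetical usage (the application path is a placeholder):
#   import subprocess
#   subprocess.run(["./my_rocblas_app"], env=env_with_atomics_mode("0"))
```

Copying the environment rather than mutating os.environ keeps the setting scoped to the launched process.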
Optimizations
- Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions. Performance improved sixfold for larger problem sizes, as measured on an MI210 GPU
Changes
- Linux AOCL dependency updated to release 4.2 gcc build
- Windows vcpkg dependencies updated to release 2024.02.14
- Increased default device workspace from 32 to 128 MiB for architecture gfx9xx with xx >= 40
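The pattern "gfx9xx with xx >= 40" can be made concrete with a small parser. The following is a minimal sketch that maps an architecture name to the default workspace size described above. The function name is hypothetical, and returning 32 MiB for every other architecture is an assumption, since the note only states the previous and new defaults for the matching targets.

```python
def default_workspace_mib(arch: str) -> int:
    """Return 128 for gfx9xx targets with decimal xx >= 40, otherwise 32.

    Feature suffixes such as ":sramecc+:xnack-" are ignored. Names whose
    last two characters are not decimal digits (for example gfx90a) do
    not match the pattern.
    """
    name = arch.split(":")[0]
    if len(name) == 6 and name.startswith("gfx9"):
        suffix = name[4:]
        if suffix.isdecimal() and int(suffix) >= 40:
            return 128
    return 32
```

Under these assumptions gfx940 and gfx942 map to 128 MiB, while gfx908, gfx90a, and gfx1100 map to 32 MiB.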
Deprecations
- rocblas_gemm_ex3, gemm_batched_ex3 and gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. Refer to hipBLASLt for future 8-bit float usage: https://github.com/ROCm/hipBLASLt
rocBLAS 4.1.2 for ROCm 6.1.5
rocBLAS code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.