Releases: ROCm/rocBLAS

rocBLAS 4.4.0 for ROCm 6.4.0

11 Apr 13:35
80e5394

Added

  • rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
  • On gfx12, all functions now support full rocblas_int dynamic range for batch_count
  • --ninja build option
  • Support for the GPU_TARGETS CMake variable

Changed

  • The rocblas-test client now excludes the stress tests unless YAML-based testing or gtest_filter adds them
  • The default OpenMP thread count for rocBLAS clients is reduced to be less than the logical core count
  • gemm_ex testing and timing reuse device memory
  • gemm_ex timing initializes matrices on the device

Optimized

  • Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
  • Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
  • Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
  • Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
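
The problem-size conditions quoted in the two items above can be written as plain predicates, which may help when deciding whether a workload falls in the ranges these notes mention. This is only a sketch: the function names are hypothetical, nothing here calls rocBLAS, and the notes do not say that sizes outside these ranges are unaffected.

```python
def gemv_in_noted_range(trans_a: str, m: int, n: int) -> bool:
    """Mirror the gemv condition: (TransA == N && m > 2*n) or (TransA == T).

    trans_a is "N" (no transpose) or "T" (transpose); any other value,
    such as "C", is not covered by the release note and returns False.
    """
    if trans_a == "T":
        return True
    return trans_a == "N" and m > 2 * n


def syrk_herk_in_noted_range(n: int, k: int) -> bool:
    """Mirror the syrk/herk condition: (k > 500 && n < 4000)."""
    return k > 500 and n < 4000


if __name__ == "__main__":
    print(gemv_in_noted_range("N", 8192, 1024))
    print(syrk_herk_in_noted_range(n=2048, k=1024))
```

Both comparisons are strict, as in the notes, so m == 2*n, k == 500, and n == 4000 fall outside the quoted ranges.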

Resolved issues

  • gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
  • Added a gfortran package dependency for Azure Linux OS
  • Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
  • Code object stripping for RPM packages

Upcoming changes

  • The CMake variable AMDGPU_TARGETS is deprecated. Use GPU_TARGETS instead.

rocBLAS 4.3.0 for ROCm 6.3.3

19 Feb 17:47
8ebd6c1

rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.

rocBLAS 4.3.0 for ROCm 6.3.2

28 Jan 15:44
8ebd6c1

rocBLAS code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.

rocBLAS 4.3.0 for ROCm 6.3.1

20 Dec 16:12
8ebd6c1

rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.

rocBLAS 4.3.0 for ROCm 6.3.0

03 Dec 19:49
8ebd6c1

Added

  • Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
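
As a rough illustration of when the _64 entry points matter, the sketch below checks whether any size or stride argument falls outside the signed 32-bit range. It assumes the standard API takes 32-bit rocblas_int arguments (the default build); the helper name needs_ilp64 is hypothetical and is not part of rocBLAS.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def needs_ilp64(*args: int) -> bool:
    """Return True if any argument cannot be held in a signed 32-bit integer.

    Assumption: the non-_64 API uses 32-bit rocblas_int for sizes, leading
    dimensions, increments, and batch counts.
    """
    return any(not (INT32_MIN <= a <= INT32_MAX) for a in args)


if __name__ == "__main__":
    # A gemm whose leading dimension exceeds the 32-bit range would need
    # the _64 variant under the assumption above.
    m, n, k, lda = 1024, 1024, 1024, 3_000_000_000
    print(needs_ilp64(m, n, k, lda))
```

The check covers only the argument values themselves. Whether a given function handles large data behind 32-bit arguments is a separate question that these notes address per release.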

Changed

  • amdclang is used as the default compiler instead of hipcc
  • Internal performance scripts use amd-smi instead of the deprecated rocm-smi

Optimized

  • Improved performance of Level 2 gbmv
  • Improved performance of Level 2 gemv for float and double precisions for problem sizes (TransA == N && m == n && m % 128 == 0), measured on a gfx942 GPU

Resolved issues

  • Fixed stbsv_strided_batched_64 Fortran binding

Upcoming changes

  • rocblas_Xgemm_kernel_name APIs are deprecated

rocBLAS 4.2.4 for ROCm 6.2.4

06 Nov 19:55
3171316

Additions

  • gfx1151 support

rocBLAS 4.2.1 for ROCm 6.2.2

27 Sep 16:01

rocBLAS code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

rocBLAS 4.2.1 for ROCm 6.2.1

20 Sep 19:58

Removals

  • Removed the Device_Memory_Allocation.pdf link from the documentation

Fixes

  • Fixed the error/warning message emitted during a rocblas_set_stream() call

rocBLAS 4.2.0 for ROCm 6.2.0

02 Aug 16:15
54f305c

Additions

  • Level 2 functions and Level 3 trsm have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
  • Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, axpy
  • Benchmark class for common timing code
  • An environment variable, ROCBLAS_DEFAULT_ATOMICS_MODE, to set the default atomics mode during creation of a rocblas_handle
  • Extended dot_ex to support single-precision (fp32_r) input and double-precision (fp64_r) output and compute types

Optimizations

  • Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions. Performance improved sixfold for larger problem sizes, as measured on an MI210 GPU

Changes

  • Linux AOCL dependency updated to release 4.2 gcc build
  • Windows vcpkg dependencies updated to release 2024.02.14
  • Increased default device workspace from 32 to 128 MiB for architecture gfx9xx with xx >= 40
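
The architecture rule in the last item can be sketched as a small lookup. Everything here is illustrative: the function name is hypothetical, the two-character suffix is read as hexadecimal so that names such as gfx90a parse (an assumption about how "xx >= 40" is meant), and the 32 MiB value for all other architectures is inferred from the "from 32" wording rather than stated for each target.

```python
import re

MIB = 1024 * 1024


def default_workspace_bytes(arch: str) -> int:
    """Return the assumed default device workspace size for an arch name.

    Accepts names like "gfx942" or "gfx942:sramecc+:xnack-"; only the part
    before the first ':' is inspected.
    """
    base = arch.split(":", 1)[0].lower()
    match = re.fullmatch(r"gfx9([0-9a-f]{2})", base)
    # Assumption: "xx >= 40" compares the suffix as a hex number (0x40).
    if match and int(match.group(1), 16) >= 0x40:
        return 128 * MIB
    return 32 * MIB


if __name__ == "__main__":
    for name in ("gfx90a", "gfx942", "gfx1100"):
        print(name, default_workspace_bytes(name) // MIB, "MiB")
```

For the gfx9 names checked here (gfx906, gfx90a, gfx940, gfx942), a decimal reading of the numeric suffixes gives the same result, so the hexadecimal choice only matters for suffixes containing letters.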

Deprecations

  • rocblas_gemm_ex3, gemm_batched_ex3, and gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. Refer to hipBLASLt for future 8-bit float usage: https://github.com/ROCm/hipBLASLt

rocBLAS 4.1.2 for ROCm 6.1.5

12 Mar 18:30
8443539

rocBLAS code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.