While implementing an algorithm, I ran into a question about how best to use OpenBLAS. I have several matrix multiplications inside an `omp parallel` section, and the resulting matrices should be summed. So each one is just $C = C + A * B$ (i.e., the usual `dgemm` routine, with $C$ shared and $A$ and $B$ private in the `omp parallel` section). Could you clarify whether OpenBLAS handles the synchronization optimally here when the library is built with `USE_OPENMP=1 USE_LOCKING=1`? I mean that the summation in $C = C + A * B$ should be done only after a block of $A * B$ has been calculated (so the elements of $C$ are not updated very often). Could you please tell me whether I have the right idea about the implementation in OpenBLAS, or do I need to take care of this myself? And if it would be better to study your code instead of asking such questions directly, just say so!)
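To make the "on my own" variant concrete, here is a minimal sketch of what I mean: each thread accumulates its products into a private copy of $C$ and adds it to the shared $C$ once at the end, so the shared matrix is touched only once per thread. The names `sum_products` and `gemm_accumulate` are hypothetical, the matrices are assumed to be square, row-major and stored back to back, and `gemm_accumulate` is only a naive stand-in for the place where the real code would call `cblas_dgemm` with `beta = 1.0`. This says nothing about what OpenBLAS does internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Naive stand-in for the real call, which would be roughly
 * cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *             n, n, n, 1.0, A, n, B, n, 1.0, C, n);
 * Computes C += A * B for row-major n x n matrices. */
static void gemm_accumulate(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (int j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}

/* Hypothetical helper: C += sum over k of A_k * B_k, where A_k and B_k are
 * the k-th n x n blocks of the contiguous arrays A and B.
 * Returns 0 on success, -1 if a thread could not allocate its buffer
 * (in that case the shared C is left unchanged). */
int sum_products(int n, int count, const double *A, const double *B, double *C)
{
    assert(n > 0 && count >= 0);
    const size_t nn = (size_t)n * (size_t)n;
    int failed = 0;

    #pragma omp parallel
    {
        /* Private, zero-initialised accumulator for this thread. */
        double *C_local = calloc(nn, sizeof *C_local);
        if (C_local == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* Each thread handles its share of the products without
         * touching the shared C. */
        #pragma omp for schedule(static)
        for (int k = 0; k < count; ++k)
            if (C_local != NULL)
                gemm_accumulate(n, A + (size_t)k * nn, B + (size_t)k * nn, C_local);
        /* The implicit barrier after the loop makes 'failed' visible here. */

        if (!failed) {
            /* One synchronised update of the shared C per thread. */
            #pragma omp critical
            for (size_t i = 0; i < nn; ++i)
                C[i] += C_local[i];
        }
        free(C_local);
    }
    return failed ? -1 : 0;
}
```

The cost is one extra $n \times n$ buffer per thread, and the merge step is serialized; for a large thread count an OpenMP array reduction might be the better choice.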