|
| 1 | +\pagebreak |
| 2 | +\chapter{OpenMP Affinity} |
| 3 | +\label{chap:openmp_affinity} |
| 4 | + |
| 5 | +OpenMP Affinity consists of a \code{proc\_bind} policy (thread affinity policy) and a specification of |
| 6 | +places (\texttt{"}location units\texttt{"} or \plc{processors} that may be cores, hardware |
| 7 | +threads, sockets, etc.). |
| 8 | +OpenMP Affinity enables users to bind computations to specific places. |
| 9 | +The placement will hold for the duration of the parallel region. |
| 10 | +However, the runtime is free to migrate the OpenMP threads |
| 11 | +to different cores (hardware threads, sockets, etc.) prescribed within a given place, |
| 12 | +if two or more cores (hardware threads, sockets, etc.) have been assigned to a given place. |
| 13 | + |
| 14 | +Often the binding can be managed without resorting to explicitly setting places. |
| 15 | +Without the specification of places in the \code{OMP\_PLACES} variable, |
| 16 | +the OpenMP runtime will distribute and bind threads using the entire range of processors for |
| 17 | +the OpenMP program, according to the \code{OMP\_PROC\_BIND} environment variable |
| 18 | +or the \code{proc\_bind} clause. When places are specified, the OpenMP runtime |
| 19 | +binds threads to the places according to a default distribution policy, or |
| 20 | +the policy specified in the \code{OMP\_PROC\_BIND} environment variable or the |
| 21 | +\code{proc\_bind} clause. |
| 22 | + |
| 23 | +In the OpenMP Specifications document, a processor refers to an execution unit that |
| 24 | +is enabled for an OpenMP thread to use. A processor is a core when there is |
| 25 | +no SMT (Simultaneous Multi-Threading) support or SMT is disabled. When |
| 26 | +SMT is enabled, a processor is a hardware thread (HW-thread). (This is the |
| 27 | +usual case, but the execution unit is actually implementation defined.) Processors |
| 28 | +are numbered sequentially from 0 to the number of cores less one (without SMT), or |
| 29 | +0 to the number of HW-threads less one (with SMT). OpenMP places use the processor number to designate |
| 30 | +binding locations (unless an \texttt{"}abstract name\texttt{"} is used). |
| 31 | + |
| 32 | + |
| 33 | +The processors available to a process may be a subset of the system's |
| 34 | +processors. This restriction may be the result of a |
| 35 | +wrapper process controlling the execution (such as \code{numactl} on Linux systems), |
| 36 | +compiler options, library-specific environment variables, or default |
| 37 | +kernel settings. For instance, the execution of multiple MPI processes, |
| 38 | +launched on a single compute node, will each have a subset of processors as |
| 39 | +determined by the MPI launcher or set by MPI affinity environment |
| 40 | +variables for the MPI library. %Forked threads within an MPI process |
| 41 | +%(for a hybrid execution of MPI and OpenMP code) inherit the valid |
| 42 | +%processor set for execution from the parent process (the initial task region) |
| 43 | +%when a parallel region forks threads. The binding policy set in |
| 44 | +%\code{OMP\_PROC\_BIND} or the \code{proc\_bind} clause will be applied to |
| 45 | +%the subset of processors available to \plc{the particular} MPI process. |
| 46 | + |
| 47 | +%Also, setting an explicit list of processor numbers in the \code{OMP\_PLACES} |
| 48 | +%variable before an MPI launch (which involves more than one MPI process) will |
| 49 | +%result in unspecified behavior (and doesn't make sense) because the set of |
| 50 | +%processors in the places list must not contain processors outside the subset |
| 51 | +%of processors for an MPI process. A separate \code{OMP\_PLACES} variable must |
| 52 | +%be set for each MPI process, and is usually accomplished by launching a script |
| 53 | +%which sets \code{OMP\_PLACES} specifically for the MPI process. |
| 54 | + |
| 55 | +Threads of a team are positioned onto places in a compact manner, a |
| 56 | +scattered distribution, or onto the master's place, by setting the |
| 57 | +\code{OMP\_PROC\_BIND} environment variable or the \code{proc\_bind} clause to |
| 58 | +\plc{close}, \plc{spread}, or \plc{master}, respectively. When |
| 59 | +\code{OMP\_PROC\_BIND} is set to FALSE, no binding is enforced; |
| 60 | +when the value is TRUE, threads are bound in an implementation-defined manner to |
| 61 | +a set of places in the \code{OMP\_PLACES} variable or to places |
| 62 | +defined by the implementation if the \code{OMP\_PLACES} variable |
| 63 | +is not set. |
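
As a minimal sketch of the clause form, the C fragment below requests a \plc{spread} distribution for a single parallel region. The function name \code{sum\_with\_spread} and the loop body are hypothetical and only illustrate where the clause is placed; the binding takes effect only when the code is compiled with OpenMP enabled and the implementation supports thread affinity.

```c
/* Hypothetical helper: sums the integers 0..n-1 in a parallel loop whose
 * threads are distributed across the available places (spread policy).
 * Without OpenMP support the pragma is ignored and the loop runs serially. */
long sum_with_spread(long n)
{
    long sum = 0;

    /* The proc_bind clause takes precedence over OMP_PROC_BIND for this
     * region only; if OMP_PROC_BIND is false, the clause is ignored. */
    #pragma omp parallel for proc_bind(spread) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;

    return sum;
}
```

Replacing \plc{spread} with \plc{close} or \plc{master} in the clause selects the other two policies; the computed result is the same in every case, since only the thread placement changes.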
| 64 | + |
| 65 | +The \code{OMP\_PLACES} variable can also be set to an abstract name |
| 66 | +(\plc{threads}, \plc{cores}, \plc{sockets}) to specify that a place is |
| 67 | +either a single hardware thread, a core, or a socket, respectively. |
| 68 | +This form of \code{OMP\_PLACES} is most useful when the |
| 69 | +number of threads is equal to the number of hardware threads, cores, |
| 70 | +or sockets. It can also be used with a \plc{close} or \plc{spread} |
| 71 | +distribution policy when the equality doesn't hold. |
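
One way to check what an abstract name resolves to on a given system is to query the place list at run time. The sketch below assumes a runtime that provides the OpenMP 4.5 routines \code{omp\_get\_num\_places} and \code{omp\_get\_place\_num\_procs}; the helper name \code{report\_places} is hypothetical. For example, with \code{OMP\_PLACES=cores} on a system with two HW-threads per core, each place would be expected to contain two processors.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints how many processors each place contains and
 * returns the number of places, or 0 when built without OpenMP. */
int report_places(void)
{
#ifdef _OPENMP
    int nplaces = omp_get_num_places();

    for (int p = 0; p < nplaces; p++)
        printf("place %d: %d processor(s)\n", p, omp_get_place_num_procs(p));

    return nplaces;
#else
    return 0;
#endif
}
```

Running a program that calls this helper under different settings of \code{OMP\_PLACES} (\plc{threads}, \plc{cores}, \plc{sockets}) shows how the number and size of the places change.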
| 72 | + |
| 73 | + |
| 74 | +% We need an example of using sockets, cores and threads: |
| 75 | + |
| 76 | +% case 1 threads: |
| 77 | + |
| 78 | +% Hyper-Threads on (2 hardware threads per core) |
| 79 | +% 1 socket x 4 cores x 2 HW-threads |
| 80 | +% |
| 81 | +% export OMP_NUM_THREADS=4 |
| 82 | +% export OMP_PLACES=threads |
| 83 | +% |
| 84 | +% core # 0 1 2 3 |
| 85 | +% processor # 0,1 2,3 4,5 6,7 |
| 86 | +% thread # 0 * _ _ _ _ _ _ _ #mask for thread 0 |
| 87 | +% thread # 1 _ _ * _ _ _ _ _ #mask for thread 1 |
| 88 | +% thread # 2 _ _ _ _ * _ _ _ #mask for thread 2 |
| 89 | +% thread # 3 _ _ _ _ _ _ * _ #mask for thread 3 |
| 90 | + |
| 91 | +% case 2 cores: |
| 92 | +% |
| 93 | +% Hyper-Threads on (2 hardware threads per core) |
| 94 | +% 1 socket x 4 cores x 2 HW-threads |
| 95 | +% |
| 96 | +% export OMP_NUM_THREADS=4 |
| 97 | +% export OMP_PLACES=cores |
| 98 | +% |
| 99 | +% core # 0 1 2 3 |
| 100 | +% processor # 0,1 2,3 4,5 6,7 |
| 101 | +% thread # 0 * * _ _ _ _ _ _ #mask for thread 0 |
| 102 | +% thread # 1 _ _ * * _ _ _ _ #mask for thread 1 |
| 103 | +% thread # 2 _ _ _ _ * * _ _ #mask for thread 2 |
| 104 | +% thread # 3 _ _ _ _ _ _ * * #mask for thread 3 |
| 105 | + |
| 106 | +% case 3 sockets: |
| 107 | +% |
| 108 | +% No Hyper-Threads |
| 109 | +% 3 sockets x 4 cores |
| 110 | +% |
| 111 | +% export OMP_NUM_THREADS=3 |
| 112 | +% export OMP_PLACES=sockets |
| 113 | +% |
| 114 | +% socket # 0 1 2 |
| 115 | +% processor # 0,1,2,3 4,5,6,7 8,9,10,11 |
| 116 | +% thread # 0 * * * * _ _ _ _ _ _ _ _ #mask for thread 0 |
| 117 | +% thread # 1 _ _ _ _ * * * * _ _ _ _ #mask for thread 1 |
| 118 | +% thread # 2 _ _ _ _ _ _ _ _ * * * * #mask for thread 2 |