Commit 3052c10

author
Henry Jin
committed
v5.0.1 release
1 parent eaec9ed commit 3052c10

File tree

406 files changed

+14120
-559
lines changed

Chap_memory_model.tex

+8-9
Original file line number | Diff line number | Diff line change
@@ -48,15 +48,14 @@ \chapter{Memory Model}
4848
In this chapter, examples illustrate how race conditions may arise for accesses
4949
to variables with a \plc{shared} data-sharing attribute when flush operations
5050
are not properly employed. A race condition can exist when two or more threads
51-
are involved in accessing a variable in which not all of the accesses are
52-
reads; that is, a WaR, RaW or WaW condition exists (R=read, a=after, W=write).
53-
A RaR does not produce a race condition. In particular, a data race will arise
54-
when conflicting accesses do not have a well-defined \emph{completion order}.
55-
The existence of data races in OpenMP programs result in undefined behavior,
56-
and so they should generally be avoided for programs to be correct. The
57-
completion order of accesses to a shared variable is guaranteed in OpenMP
58-
through a set of memory consistency rules that are described in the \plc{OpenMP
59-
Memory Consitency} section of the OpenMP Specifications document.
51+
are involved in accessing a variable and at least one of the accesses modifies
52+
the variable. In particular, a data race will arise when conflicting accesses
53+
do not have a well-defined \emph{completion order}. The existence of data
54+
races in OpenMP programs result in undefined behavior, and so they should
55+
generally be avoided for programs to be correct. The completion order of
56+
accesses to a shared variable is guaranteed in OpenMP through a set of memory
57+
consistency rules that are described in the \plc{OpenMP Memory Consitency}
58+
section of the OpenMP Specifications document.
6059

6160
%This chapter also includes examples that exhibit non-sequentially consistent
6261
%(\emph{non-SC}) behavior. Sequential consistency (\emph{SC}) is the desirable
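
Reviewer note: the revised wording (two or more threads access a variable and at least one access modifies it) can be illustrated with a small C sketch. The function names below are hypothetical and do not come from the Examples sources. If the shared-counter loop were written with a plain `count++`, the increments from different threads would be conflicting accesses with no defined completion order. The two variants below show common ways to avoid that, assuming an OpenMP-enabled compiler (the code also runs correctly as serial C if the pragmas are ignored).

```c
#include <assert.h>

/* Hypothetical helper: counts even entries using one shared counter.
 * The atomic directive makes each increment an indivisible update,
 * so the concurrent modifications of 'count' no longer race. */
long count_even_atomic(const int *data, int n)
{
    long count = 0;
    assert(n >= 0);

    #pragma omp parallel for shared(count)
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp atomic update
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: same result, but each thread accumulates into a
 * private copy of 'count' that is combined at the end of the loop. */
long count_even_reduction(const int *data, int n)
{
    long count = 0;
    assert(n >= 0);

    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0)
            count++;
    }
    return count;
}
```

The reduction form usually scales better because threads do not contend on a single memory location; the atomic form is closer to the original shared-variable structure.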

Chap_tasking.tex

+3-2
@@ -40,8 +40,9 @@ \chapter{Tasking}
4040
execution at a scheduling point and return later. The thread is tied
4141
to the task. Scheduling points can be introduced with the \code{taskyield}
4242
construct. With an \code{untied} clause any other thread is allowed to continue
43-
the task. An \code{if} clause with a \plc{true} expression allows the
44-
generating thread to immediately execute the task as an undeferred task.
43+
the task. An \code{if} clause with an expression that evaluates to \plc{false}
44+
results in an \emph{undeferred} task, which instructs the runtime to suspend
45+
the generating task until the undeferred task completes its execution.
4546
By including the data environment of the generating task into the generated task with the
4647
\code{mergeable} and \code{final} clauses, task generation overhead can be reduced.
4748
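
Reviewer note: the corrected sentence says that an `if` clause whose expression evaluates to false produces an undeferred task, and that the generating task is suspended until that task completes. A minimal C sketch of that behavior follows; `run_undeferred` is a hypothetical name chosen for illustration, not part of the Examples sources.

```c
#include <assert.h>

/* Hypothetical helper: generates one undeferred task and relies on the
 * generating task being suspended until that task has finished. */
int run_undeferred(int input)
{
    int result = 0;

    #pragma omp parallel num_threads(2) shared(result)
    #pragma omp single
    {
        /* if(0) evaluates to false, so the task is undeferred. */
        #pragma omp task if(0) shared(result) firstprivate(input)
        {
            result = input * 2;
        }

        /* The generating task resumes only after the undeferred task has
         * completed, so the assignment above has already taken place.
         * Without if(0) this check could run before the task executes. */
        assert(result == input * 2);
    }
    return result;
}
```

With the `if(0)` clause removed, the same check would need a `taskwait` before it to be safe.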

Examples_Chapt.tex

+5
@@ -19,3 +19,8 @@ \chapter*{Examples}
1919
\item \plc{f90} -- Fortran code in free form.
2020
\end{compactitem}
2121

22+
Some of the example labels may include version information
23+
(\code{\small{}omp\_\plc{verno}}) to indicate features that are illustrated
24+
by an example for a specific OpenMP version, such as ``\plc{scan.1.c}
25+
\;(\code{\small{}omp\_5.0}).''
26+

Examples_SIMD.tex

+16-16
@@ -5,9 +5,9 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
55
The following example illustrates the basic use of the \code{simd} construct
66
to assure the compiler that the loop can be vectorized.
77

8-
\cexample{SIMD}{1}
8+
\cexample[4.0]{SIMD}{1}
99

10-
\ffreeexample{SIMD}{1}
10+
\ffreeexample[4.0]{SIMD}{1}
1111

1212
\clearpage
1313
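
Reviewer note: for readers without the referenced SIMD.1 sources at hand, the basic use of the `simd` construct described above can be sketched as follows. This is not the text of the shipped example; `scaled_add` and its parameters are hypothetical names, and the `restrict` qualifiers are an added assumption that the arrays do not overlap.

```c
#include <assert.h>

/* Hypothetical helper: out[i] = a[i] + scale * b[i].
 * The simd construct asserts to the compiler that the iterations can be
 * executed concurrently in vector lanes. */
void scaled_add(double *restrict out, const double *restrict a,
                const double *restrict b, double scale, int n)
{
    assert(n >= 0);

    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = a[i] + scale * b[i];
}
```

A call such as `scaled_add(out, a, b, 0.5, 4)` fills the first four elements of `out`; whether vector instructions are actually generated depends on the compiler and target.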

@@ -40,9 +40,9 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
4040
necessary to assure that the each vector operation has its own \plc{tmp}
4141
variable.
4242

43-
\cexample{SIMD}{2}
43+
\cexample[4.0]{SIMD}{2}
4444

45-
\ffreeexample{SIMD}{2}
45+
\ffreeexample[4.0]{SIMD}{2}
4646

4747
\pagebreak
4848
A thread that encounters a SIMD construct executes a vectorized code of the
@@ -52,9 +52,9 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
5252
illustrates the use of \code{private} and \code{reduction} clauses in a SIMD
5353
construct.
5454

55-
\cexample{SIMD}{3}
55+
\cexample[4.0]{SIMD}{3}
5656

57-
\ffreeexample{SIMD}{3}
57+
\ffreeexample[4.0]{SIMD}{3}
5858

5959

6060
\pagebreak
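
Reviewer note: the `private` and `reduction` clauses on a SIMD construct, mentioned in the hunk above, can be sketched as below. This is an illustrative stand-in rather than the SIMD.3 source; the function name and arguments are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: sum of squared differences of two arrays.
 * 'tmp' is private so each SIMD lane works on its own copy, and 'sum'
 * is a reduction variable so the per-lane partial sums are combined. */
double sum_squared_diff(const double *a, const double *b, int n)
{
    double sum = 0.0;
    double tmp;
    assert(n >= 0);

    #pragma omp simd private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = a[i] - b[i];
        sum += tmp * tmp;
    }
    return sum;
}
```

Declaring `tmp` inside the loop body would make it private implicitly; the explicit clause is kept here to mirror the clauses named in the text.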
@@ -68,19 +68,19 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
6868
be 16 or greater, for correct code execution. If the value of \plc{m} is less
6969
than 16, the behavior is undefined.
7070

71-
\cexample{SIMD}{4}
71+
\cexample[4.0]{SIMD}{4}
7272

73-
\ffreeexample{SIMD}{4}
73+
\ffreeexample[4.0]{SIMD}{4}
7474

7575
\pagebreak
7676
The following SIMD construct instructs the compiler to collapse the \plc{i} and
7777
\plc{j} loops into a single SIMD loop in which SIMD chunks are executed by
7878
threads of the team. Within the workshared loop chunks of a thread, the SIMD
7979
chunks are executed in the lanes of the vector units.
8080

81-
\cexample{SIMD}{5}
81+
\cexample[4.0]{SIMD}{5}
8282

83-
\ffreeexample{SIMD}{5}
83+
\ffreeexample[4.0]{SIMD}{5}
8484

8585

8686
%%% section
@@ -95,9 +95,9 @@ \section{\code{inbranch} and \code{notinbranch} Clauses}
9595
the function is always called conditionally in the SIMD loop inside
9696
the function \plc{myaddfloat}.
9797

98-
\cexample{SIMD}{6}
98+
\cexample[4.0]{SIMD}{6}
9999

100-
\ffreeexample{SIMD}{6}
100+
\ffreeexample[4.0]{SIMD}{6}
101101

102102

103103
In the code below, the function \plc{fib()} is called in the main program and
@@ -106,9 +106,9 @@ \section{\code{inbranch} and \code{notinbranch} Clauses}
106106
version for the function \plc{fib()} while retaining the original scalar
107107
version of the \plc{fib()} function.
108108

109-
\cexample{SIMD}{7}
109+
\cexample[4.0]{SIMD}{7}
110110

111-
\ffreeexample{SIMD}{7}
111+
\ffreeexample[4.0]{SIMD}{7}
112112

113113

114114

@@ -124,7 +124,7 @@ \section{Loop-Carried Lexical Forward Dependence}
124124

125125
This test assures that the compiler preserves the loop carried lexical forward-dependence for generating a correct SIMD code.
126126

127-
\cexample{SIMD}{8}
127+
\cexample[4.0]{SIMD}{8}
128128

129-
\ffreeexample{SIMD}{8}
129+
\ffreeexample[4.0]{SIMD}{8}
130130

Examples_acquire_release.tex

+8-8
@@ -67,8 +67,8 @@ \section{Synchronization Based on Acquire/Release Semantics}
6767
\plc{x} equals 10.
6868

6969
\pagebreak
70-
\cexample{acquire_release}{1}
71-
\ffreeexample{acquire_release}{1}
70+
\cexample[5.0]{acquire_release}{1}
71+
\ffreeexample[5.0]{acquire_release}{1}
7272

7373
In the second example, the \code{critical} constructs are exchanged with
7474
\code{atomic} constructs that have \textit{explicit} memory ordering specified. When the
@@ -77,8 +77,8 @@ \section{Synchronization Based on Acquire/Release Semantics}
7777
assignment to \plc{x} on thread 0 happens before the read of \plc{x} on thread
7878
1. Therefore, thread 1 will print ``x = 10''.
7979

80-
\cexample{acquire_release}{2}
81-
\ffreeexample{acquire_release}{2}
80+
\cexample[5.0]{acquire_release}{2}
81+
\ffreeexample[5.0]{acquire_release}{2}
8282

8383
\pagebreak
8484
In the third example, \code{atomic} constructs that specify relaxed atomic
@@ -105,8 +105,8 @@ \section{Synchronization Based on Acquire/Release Semantics}
105105
%}
106106
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%3
107107

108-
\cexample{acquire_release}{3}
109-
\ffreeexample{acquire_release}{3}
108+
\cexample[5.0]{acquire_release}{3}
109+
\ffreeexample[5.0]{acquire_release}{3}
110110

111111
Example 4 will fail to order the write to \plc{x} on thread 0 before the read
112112
from \plc{x} on thread 1. Importantly, the implicit release flush on exit from
@@ -137,5 +137,5 @@ \section{Synchronization Based on Acquire/Release Semantics}
137137
%by thread 0.
138138
%}
139139

140-
\cexample{acquire_release_broke}{4}
141-
\ffreeexample{acquire_release_broke}{4}
140+
\cexample[5.0]{acquire_release_broke}{4}
141+
\ffreeexample[5.0]{acquire_release_broke}{4}
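
Reviewer note: the second example described in this file (atomic constructs with explicit memory ordering) can be sketched in C as below. It is a sketch under assumptions, not the shipped acquire_release.2 source: `handoff` is a hypothetical name, the single-thread fallback is added so the function also terminates when only one thread is available or the pragmas are ignored, and the `release`/`acquire` clauses on `atomic` require a compiler with OpenMP 5.0 support.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: thread 0 publishes x through flag y with a release
 * write; the consumer spins on y with acquire reads and then reads x. */
int handoff(void)
{
    int x = 0;      /* payload written by thread 0 */
    int y = 0;      /* flag used for synchronization */
    int seen = -1;  /* value of x observed by the consumer */

    #pragma omp parallel num_threads(2) shared(x, y, seen)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0, nthreads = 1;
#endif
        if (tid == 0) {
            x = 10;
            #pragma omp atomic write release
            y = 1;
        }
        /* Consumer: thread 1, or thread 0 itself if the team has one thread. */
        if (tid == 1 || nthreads == 1) {
            int flag = 0;
            while (flag == 0) {
                #pragma omp atomic read acquire
                flag = y;
            }
            /* The acquire read that observed y == 1 synchronizes with the
             * release write, so the assignment x = 10 is visible here. */
            seen = x;
        }
    }
    return seen;
}
```

Replacing both clauses with `relaxed` would remove the ordering guarantee on `x`, which is the situation the third example in this file addresses with explicit flushes.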

Examples_affinity.tex

+10-10
@@ -34,9 +34,9 @@ \subsection{Spread Affinity Policy}
3434
of places in the parent's place partition, for the machine architecture depicted
3535
above. Note that the threads are bound to the first place of each subpartition.
3636

37-
\cexample{affinity}{1}
37+
\cexample[4.0]{affinity}{1}
3838

39-
\fexample{affinity}{1}
39+
\fexample[4.0]{affinity}{1}
4040

4141
It is unspecified on which place the master thread is initially started. If the
4242
master thread is initially started on p0, the following placement of threads will
@@ -75,9 +75,9 @@ \subsection{Spread Affinity Policy}
7575
thread) execute on the parent's place. The next \plc{T/P} threads execute on the next
7676
place in the place partition, and so on, with wrap around.
7777

78-
\cexample{affinity}{2}
78+
\cexample[4.0]{affinity}{2}
7979

80-
\ffreeexample{affinity}{2}
80+
\ffreeexample[4.0]{affinity}{2}
8181

8282
It is unspecified on which place the master thread is initially started. If the
8383
master thread is initially started on p0, the following placement of threads will
@@ -130,9 +130,9 @@ \subsection{Close Affinity Policy}
130130
of places in parent's place partition, for the machine architecture depicted above.
131131
The place partition is not changed by the \code{close} policy.
132132

133-
\cexample{affinity}{3}
133+
\cexample[4.0]{affinity}{3}
134134

135-
\fexample{affinity}{3}
135+
\fexample[4.0]{affinity}{3}
136136

137137
It is unspecified on which place the master thread is initially started. If the
138138
master thread is initially started on p0, the following placement of threads will
@@ -171,9 +171,9 @@ \subsection{Close Affinity Policy}
171171
place in the place partition, and so on, with wrap around. The place partition
172172
is not changed by the \code{close} policy.
173173

174-
\cexample{affinity}{4}
174+
\cexample[4.0]{affinity}{4}
175175

176-
\ffreeexample{affinity}{4}
176+
\ffreeexample[4.0]{affinity}{4}
177177

178178
It is unspecified on which place the master thread is initially started. If the
179179
master thread is initially running on p0, the following placement of threads will
@@ -225,9 +225,9 @@ \subsection{Master Affinity Policy}
225225
the partition list for the machine architecture depicted above. The place partition
226226
is not changed by the master policy.
227227

228-
\cexample{affinity}{5}
228+
\cexample[4.0]{affinity}{5}
229229

230-
\fexample{affinity}{5}
230+
\fexample[4.0]{affinity}{5}
231231

232232
It is unspecified on which place the master thread is initially started. If the
233233
master thread is initially running on p0, the following placement of threads will

Examples_affinity_display.tex

+6-6
@@ -28,9 +28,9 @@ \section{Affinity Display}
2828
In the last parallel region, the thread affinities are reported
2929
because the thread affinity has changed.
3030

31-
\cexample{affinity_display}{1}
31+
\cexample[5.0]{affinity_display}{1}
3232

33-
\ffreeexample{affinity_display}{1}
33+
\ffreeexample[5.0]{affinity_display}{1}
3434

3535

3636
In the following example 2 threads are forked, and each executes on a socket. Next,
@@ -58,9 +58,9 @@ \section{Affinity Display}
5858
and the thread affinity (\%A). In the nested parallel region within the \plc{socket\_work} routine
5959
the affinities for the threads on each socket are printed according to this format.
6060

61-
\cexample{affinity_display}{2}
61+
\cexample[5.0]{affinity_display}{2}
6262

63-
\ffreeexample{affinity_display}{2}
63+
\ffreeexample[5.0]{affinity_display}{2}
6464

6565
The next example illustrates more details about affinity formatting.
6666
First, the \code{omp\_get\_affininity\_format()} API routine is used to
@@ -98,7 +98,7 @@ \section{Affinity Display}
9898
clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement.
9999
It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
100100

101-
\cexample{affinity_display}{3}
101+
\cexample[5.0]{affinity_display}{3}
102102

103-
\ffreeexample{affinity_display}{3}
103+
\ffreeexample[5.0]{affinity_display}{3}
104104

Examples_affinity_query.tex

+2-2
@@ -37,7 +37,7 @@ \section{Affinity Query Functions}
3737
information. For instance, the socket number and proc\_id's for a socket
3838
can be found in the /proc/cpuinfo text file on Linux systems.
3939

40-
\cexample{affinity_query}{1}
40+
\cexample[4.5]{affinity_query}{1}
4141

42-
\ffreeexample{affinity_query}{1}
42+
\ffreeexample[4.5]{affinity_query}{1}
4343

Examples_allocators.tex

+2-2
@@ -57,7 +57,7 @@ \section{ Memory Allocators}
5757

5858
%\pagebreak
5959

60-
\cexample{allocators}{1}
61-
\ffreeexample{allocators}{1}
60+
\cexample[5.0]{allocators}{1}
61+
\ffreeexample[5.0]{allocators}{1}
6262

6363

Examples_array_sections.tex

+8-8
@@ -8,31 +8,31 @@ \section{Array Sections in Device Constructs}
88
This example shows the invalid usage of two separate sections of the same array
99
inside of a \code{target} construct.
1010

11-
\cexample{array_sections}{1}
11+
\cexample[4.0]{array_sections}{1}
1212

13-
\ffreeexample{array_sections}{1}
13+
\ffreeexample[4.0]{array_sections}{1}
1414

1515
\pagebreak
1616
This example shows the invalid usage of two separate sections of the same array
1717
inside of a \code{target} construct.
1818

19-
\cexample{array_sections}{2}
19+
\cexample[4.0]{array_sections}{2}
2020

21-
\ffreeexample{array_sections}{2}
21+
\ffreeexample[4.0]{array_sections}{2}
2222

2323
\pagebreak
2424
This example shows the valid usage of two separate sections of the same array inside
2525
of a \code{target} construct.
2626

27-
\cexample{array_sections}{3}
27+
\cexample[4.0]{array_sections}{3}
2828

29-
\ffreeexample{array_sections}{3}
29+
\ffreeexample[4.0]{array_sections}{3}
3030

3131
\pagebreak
3232
This example shows the valid usage of a wholly contained array section of an already
3333
mapped array section inside of a \code{target} construct.
3434

35-
\cexample{array_sections}{4}
35+
\cexample[4.0]{array_sections}{4}
3636

37-
\ffreeexample{array_sections}{4}
37+
\ffreeexample[4.0]{array_sections}{4}
3838

Examples_array_shaping.tex

+9-1
@@ -23,5 +23,13 @@ \section{Array Shaping}
2323
around the shape-operator and $a$ to ensure the correct precedence
2424
over array-section operations.
2525

26-
\cnexample{array_shaping}{1}
26+
\cnexample[5.0]{array_shaping}{1}
2727
\ccppspecificend
28+
29+
The shape operator is not defined for Fortran. Explicit array shaping
30+
of procedure arguments can be used instead to achieve a similar goal.
31+
Below is the Fortran-equivalent of the above example that illustrates
32+
the support of transferring two rows of noncontiguous boundary
33+
data in the \code{target}~\code{update} directive.
34+
35+
\ffreeexample[5.0]{array_shaping}{1}

Examples_associate.tex

+3-3
@@ -11,13 +11,13 @@ \section{Fortran \code{ASSOCIATE} Construct}
1111
attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private}
1212
clause.
1313

14-
\fnexample{associate}{1}
14+
\fnexample[4.0]{associate}{1}
1515

1616
In next example, within the \code{parallel} construct, the association name \plc{thread\_id}
1717
is associated with the private copy of \plc{i}. The print statement should output the
1818
unique thread number.
1919

20-
\fnexample{associate}{2}
20+
\fnexample[4.0]{associate}{2}
2121

2222
The following example illustrates the effect of specifying a selector name on a data-sharing
2323
attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v}
@@ -27,6 +27,6 @@ \section{Fortran \code{ASSOCIATE} Construct}
2727
Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel}
2828
region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.
2929

30-
\ffreenexample{associate}{3}
30+
\ffreenexample[4.0]{associate}{3}
3131
\fortranspecificend
3232

Examples_async_target_nowait.tex

+2-2
@@ -26,6 +26,6 @@ \subsection{\code{nowait} Clause on \code{target} Construct}
2626
little time is spent by the \plc{target task} in setting
2727
up and tearing down the the target execution, \code{static} scheduling may be desired.
2828

29-
\cexample{async_target}{3}
29+
\cexample[4.5]{async_target}{3}
3030

31-
\ffreeexample{async_target}{3}
31+
\ffreeexample[4.5]{async_target}{3}

Examples_async_target_nowait_depend.tex

+2-2
@@ -11,8 +11,8 @@ \subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clau
1111

1212
The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
1313

14-
\cexample{async_target}{4}
14+
\cexample[4.5]{async_target}{4}
1515

16-
\ffreeexample{async_target}{4}
16+
\ffreeexample[4.5]{async_target}{4}
1717

1818
%end
