OpenMP
diff --git a/Chap_memory_model.tex b/Chap_memory_model.tex  (+8 -9)
diff --git a/Chap_tasking.tex b/Chap_tasking.tex  (+3 -2)
diff --git a/Examples_Chapt.tex b/Examples_Chapt.tex  (+5)
diff --git a/Examples_SIMD.tex b/Examples_SIMD.tex  (+16 -16)
diff --git a/Examples_acquire_release.tex b/Examples_acquire_release.tex  (+8 -8)
diff --git a/Examples_affinity.tex b/Examples_affinity.tex  (+10 -10)
diff --git a/Examples_affinity_display.tex b/Examples_affinity_display.tex  (+6 -6)
diff --git a/Examples_affinity_query.tex b/Examples_affinity_query.tex  (+2 -2)
diff --git a/Examples_allocators.tex b/Examples_allocators.tex  (+2 -2)
diff --git a/Examples_array_sections.tex b/Examples_array_sections.tex  (+8 -8)
diff --git a/Examples_array_shaping.tex b/Examples_array_shaping.tex  (+9 -1)
diff --git a/Examples_associate.tex b/Examples_associate.tex  (+3 -3)
diff --git a/Examples_async_target_nowait.tex b/Examples_async_target_nowait.tex  (+2 -2)
diff --git a/Examples_async_target_nowait_depend.tex b/Examples_async_target_nowait_depend.tex  (+2 -2)
@@ -48,15 +48,14 @@ \chapter{Memory Model}
 In this chapter, examples illustrate how race conditions may arise for accesses
 to variables with a \plc{shared} data-sharing attribute when flush operations
 are not properly employed.  A race condition can exist when two or more threads
-are involved in accessing a variable in which not all of the accesses are
-reads; that is, a WaR, RaW or WaW condition exists (R=read, a=after, W=write).
-A RaR does not produce a race condition. In particular, a data race will arise
-when conflicting accesses do not have a well-defined \emph{completion order}.
-The existence of data races in OpenMP programs result in undefined behavior,
-and so they should generally be avoided for programs to be correct.  The
-completion order of accesses to a shared variable is guaranteed in OpenMP
-through a set of memory consistency rules that are described in the \plc{OpenMP
-Memory Consitency} section of the OpenMP Specifications document.
+are involved in accessing a variable and at least one of the accesses modifies
+the variable.  In particular, a data race will arise when conflicting accesses
+do not have a well-defined \emph{completion order}.  The existence of data
+races in OpenMP programs results in undefined behavior, and so they should
+generally be avoided for programs to be correct.  The completion order of
+accesses to a shared variable is guaranteed in OpenMP through a set of memory
+consistency rules that are described in the \plc{OpenMP Memory Consistency}
+section of the OpenMP Specifications document.
 
 %This chapter also includes examples that exhibit non-sequentially consistent
 %(\emph{non-SC}) behavior. Sequential consistency (\emph{SC}) is the desirable
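The race-condition definition in the hunk above can be illustrated with a small C sketch. It is not one of the document's numbered examples; the function names `count_with_atomic` and `count_with_reduction` are hypothetical. Several threads increment one shared counter, and the sketch shows two common ways to give the conflicting updates a well-defined completion order.

```c
/* Hypothetical helper: counts to n with a counter shared by all threads.
   The atomic construct orders the conflicting updates, so no data race occurs. */
int count_with_atomic(int n)
{
    int counter = 0;
    #pragma omp parallel for shared(counter)
    for (int i = 0; i < n; i++) {
        /* Without the atomic construct, "counter++" would be an
           unsynchronized read-modify-write by several threads: a data race. */
        #pragma omp atomic update
        counter++;
    }
    return counter;
}

/* Hypothetical alternative: each thread updates a private copy of the
   counter, and the copies are combined when the loop ends. */
int count_with_reduction(int n)
{
    int counter = 0;
    #pragma omp parallel for reduction(+:counter)
    for (int i = 0; i < n; i++)
        counter++;
    return counter;
}
```

Both functions return `n` for any non-negative `n`, whether the code is compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang) or as plain serial C.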
 
@@ -40,8 +40,9 @@ \chapter{Tasking}
 execution at a scheduling point and return later.  The thread is tied
 to the task.  Scheduling points can be introduced with the \code{taskyield}
 construct.  With an \code{untied} clause any other thread is allowed to continue
-the task.  An \code{if} clause with a \plc{true} expression allows the 
-generating thread to immediately execute the task as an undeferred task.
+the task.  An \code{if} clause with an expression that evaluates to \plc{false} 
+results in an \emph{undeferred} task, which instructs the runtime to suspend
+the generating task until the undeferred task completes its execution.
 By including the data environment of the generating task into the generated task with the 
 \code{mergeable} and \code{final} clauses, task generation overhead can be reduced.
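The behavior of an `if` clause that evaluates to false can be sketched in C as follows. This is an illustrative fragment, not part of the Examples sources, and the function name `undeferred_result` is hypothetical.

```c
/* Hypothetical helper illustrating if(0) on a task construct. */
int undeferred_result(void)
{
    int x = 0;

    #pragma omp parallel num_threads(2) shared(x)
    #pragma omp single
    {
        /* The if clause evaluates to false, so the task is undeferred:
           the generating task is suspended until this task completes. */
        #pragma omp task if(0) shared(x)
        x = 42;

        /* No taskwait is needed here: the undeferred task has finished
           before the generating task resumes. */
        x += 1;
    }
    return x;
}
```

Under these assumptions the function returns 43. With a deferred task (no `if` clause, or one that evaluates to true) the update `x += 1` could run before or concurrently with `x = 42`, and a `taskwait` would be needed before it.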
 
 
@@ -19,3 +19,8 @@ \chapter*{Examples}
 \item \plc{f90} -- Fortran code in free form.
 \end{compactitem}
 
+Some example labels include version information
+(\code{\small{}omp\_\plc{verno}}) to indicate that the example illustrates
+features of a specific OpenMP version, such as ``\plc{scan.1.c}
+\;(\code{\small{}omp\_5.0}).''
+
@@ -5,9 +5,9 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 The following example illustrates the basic use of the \code{simd} construct 
 to assure the compiler that the loop can be vectorized.
 
-\cexample{SIMD}{1}
+\cexample[4.0]{SIMD}{1}
 
-\ffreeexample{SIMD}{1}
+\ffreeexample[4.0]{SIMD}{1}
 
 \clearpage
 
@@ -40,9 +40,9 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 necessary to assure that each vector operation has its own \plc{tmp} 
 variable.
 
-\cexample{SIMD}{2}
+\cexample[4.0]{SIMD}{2}
 
-\ffreeexample{SIMD}{2}
+\ffreeexample[4.0]{SIMD}{2}
 
 \pagebreak
 A thread that encounters a SIMD construct executes a vectorized code of the 
@@ -52,9 +52,9 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 illustrates the use of \code{private} and \code{reduction} clauses in a SIMD 
 construct.
 
-\cexample{SIMD}{3}
+\cexample[4.0]{SIMD}{3}
 
-\ffreeexample{SIMD}{3}
+\ffreeexample[4.0]{SIMD}{3}
 
 
 \pagebreak
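As a rough illustration of the `private` and `reduction` clauses on a SIMD construct described in this hunk, the following C sketch computes a sum of squared differences. It is not the text of the SIMD.3 example; the function name `sum_sq_diff` and its parameters are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the sum of (a[i] - b[i])^2 for i in [0, n). */
double sum_sq_diff(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    double tmp;

    #pragma omp simd private(tmp) reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        tmp = a[i] - b[i];   /* each SIMD lane uses its own copy of tmp */
        sum += tmp * tmp;    /* per-lane partial sums are combined after the loop */
    }
    return sum;
}
```

Without `private(tmp)` the lanes would share one `tmp`, and without the `reduction` clause the updates to `sum` would form a loop-carried dependence that prevents safe vectorization.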
@@ -68,19 +68,19 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 be 16 or greater, for correct code execution.  If the value of \plc{m} is less 
 than 16, the behavior is undefined.
 
-\cexample{SIMD}{4}
+\cexample[4.0]{SIMD}{4}
 
-\ffreeexample{SIMD}{4}
+\ffreeexample[4.0]{SIMD}{4}
 
 \pagebreak
 The following SIMD construct instructs the compiler to collapse the \plc{i} and 
 \plc{j} loops into a single SIMD loop in which SIMD chunks are executed by 
 threads of the team. Within the workshared loop chunks of a thread, the SIMD 
 chunks are executed in the lanes of the vector units.
 
-\cexample{SIMD}{5}
+\cexample[4.0]{SIMD}{5}
 
-\ffreeexample{SIMD}{5}
+\ffreeexample[4.0]{SIMD}{5}
 
 
 %%% section
@@ -95,9 +95,9 @@ \section{\code{inbranch} and \code{notinbranch} Clauses}
 the function is always called conditionally in the SIMD loop inside 
 the function \plc{myaddfloat}.
 
-\cexample{SIMD}{6}
+\cexample[4.0]{SIMD}{6}
 
-\ffreeexample{SIMD}{6}
+\ffreeexample[4.0]{SIMD}{6}
 
 
 In the code below, the function \plc{fib()} is called in the main program and 
@@ -106,9 +106,9 @@ \section{\code{inbranch} and \code{notinbranch} Clauses}
 version for the function \plc{fib()} while retaining the original scalar 
 version of the \plc{fib()} function.
 
-\cexample{SIMD}{7}
+\cexample[4.0]{SIMD}{7}
 
-\ffreeexample{SIMD}{7}
+\ffreeexample[4.0]{SIMD}{7}
 
 
 
@@ -124,7 +124,7 @@ \section{Loop-Carried Lexical Forward Dependence}
 
 This test assures that the compiler preserves the loop carried lexical forward-dependence for generating a correct SIMD code.
 
-\cexample{SIMD}{8}
+\cexample[4.0]{SIMD}{8}
 
-\ffreeexample{SIMD}{8}
+\ffreeexample[4.0]{SIMD}{8}
 
@@ -67,8 +67,8 @@ \section{Synchronization Based on Acquire/Release Semantics}
 \plc{x} equals 10.
 
 \pagebreak
-\cexample{acquire_release}{1}
-\ffreeexample{acquire_release}{1}
+\cexample[5.0]{acquire_release}{1}
+\ffreeexample[5.0]{acquire_release}{1}
 
 In the second example, the \code{critical} constructs are exchanged with
 \code{atomic} constructs that have \textit{explicit} memory ordering specified. When the
@@ -77,8 +77,8 @@ \section{Synchronization Based on Acquire/Release Semantics}
 assignment to \plc{x} on thread 0 happens before the read of \plc{x} on thread
 1. Therefore, thread 1 will print ``x = 10''.
 
-\cexample{acquire_release}{2}
-\ffreeexample{acquire_release}{2}
+\cexample[5.0]{acquire_release}{2}
+\ffreeexample[5.0]{acquire_release}{2}
 
 \pagebreak
 In the third example, \code{atomic} constructs that specify relaxed atomic
@@ -105,8 +105,8 @@ \section{Synchronization Based on Acquire/Release Semantics}
 %}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%3
 
-\cexample{acquire_release}{3}
-\ffreeexample{acquire_release}{3}
+\cexample[5.0]{acquire_release}{3}
+\ffreeexample[5.0]{acquire_release}{3}
 
 Example 4 will fail to order the write to \plc{x} on thread 0 before the read
 from \plc{x} on thread 1. Importantly, the implicit release flush on exit from
@@ -137,5 +137,5 @@ \section{Synchronization Based on Acquire/Release Semantics}
 %by thread 0.
 %}
 
-\cexample{acquire_release_broke}{4}
-\ffreeexample{acquire_release_broke}{4}
+\cexample[5.0]{acquire_release_broke}{4}
+\ffreeexample[5.0]{acquire_release_broke}{4}
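The release/acquire pattern discussed in these hunks can be sketched in C as below. This is a minimal illustration, not the text of the acquire_release examples. The function name `publish_and_read` is hypothetical, and the sketch assumes a compiler that accepts the OpenMP 5.0 memory-order clauses on `atomic`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: thread 0 publishes x, thread 1 reads it.
   Returns the value observed by the reader. */
int publish_and_read(void)
{
    int x = 0, flag = 0, seen = -1;

    #pragma omp parallel num_threads(2) shared(x, flag, seen)
    {
        int tid = 0, nthreads = 1;   /* serial defaults when OpenMP is off */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        if (tid == 0) {
            x = 10;                  /* ordinary (non-atomic) write */
            /* Release: the write to x is ordered before the flag update. */
            #pragma omp atomic write release
            flag = 1;
        }
        /* If the team has only one thread, that thread also acts as reader. */
        if (tid == 1 || nthreads == 1) {
            int ready = 0;
            while (!ready) {
                /* Acquire: once flag == 1 is seen, the write to x is visible. */
                #pragma omp atomic read acquire
                ready = flag;
            }
            seen = x;
        }
    }
    return seen;
}
```

The function returns 10. If both atomics used only relaxed ordering and no flush, the read of `x` would not be ordered after the write and the program would contain a data race.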
@@ -34,9 +34,9 @@ \subsection{Spread Affinity Policy}
 of places in the parent's place partition, for the machine architecture depicted 
 above. Note that the threads are bound to the first place of each subpartition.
 
-\cexample{affinity}{1}
+\cexample[4.0]{affinity}{1}
 
-\fexample{affinity}{1}
+\fexample[4.0]{affinity}{1}
 
 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially started on p0, the following placement of threads will 
@@ -75,9 +75,9 @@ \subsection{Spread Affinity Policy}
 thread) execute on the parent's place. The next \plc{T/P} threads execute on the next 
 place in the place partition, and so on, with wrap around. 
 
-\cexample{affinity}{2}
+\cexample[4.0]{affinity}{2}
 
-\ffreeexample{affinity}{2}
+\ffreeexample[4.0]{affinity}{2}
 
 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially started on p0, the following placement of threads will 
@@ -130,9 +130,9 @@ \subsection{Close Affinity Policy}
 of places in parent's place partition, for the machine architecture depicted above. 
 The place partition is not changed by the \code{close} policy.
 
-\cexample{affinity}{3}
+\cexample[4.0]{affinity}{3}
 
-\fexample{affinity}{3}
+\fexample[4.0]{affinity}{3}
 
 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially started on p0, the following placement of threads will 
@@ -171,9 +171,9 @@ \subsection{Close Affinity Policy}
 place in the place partition, and so on, with wrap around. The place partition 
 is not changed by the \code{close} policy.
 
-\cexample{affinity}{4}
+\cexample[4.0]{affinity}{4}
 
-\ffreeexample{affinity}{4}
+\ffreeexample[4.0]{affinity}{4}
 
 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially running on p0, the following placement of threads will 
@@ -225,9 +225,9 @@ \subsection{Master Affinity Policy}
 the partition list for the machine architecture depicted above. The place partition 
 is not changed by the master policy.
 
-\cexample{affinity}{5}
+\cexample[4.0]{affinity}{5}
 
-\fexample{affinity}{5}
+\fexample[4.0]{affinity}{5}
 
 It is unspecified on which place the master thread is initially started. If the 
 master thread is initially running on p0, the following placement of threads will 
 
@@ -28,9 +28,9 @@ \section{Affinity Display}
 In the last parallel region, the thread affinities are reported
 because the thread affinity has changed.
 
-\cexample{affinity_display}{1}
+\cexample[5.0]{affinity_display}{1}
 
-\ffreeexample{affinity_display}{1}
+\ffreeexample[5.0]{affinity_display}{1}
 
 
 In the following example 2 threads are forked, and each executes on a socket. Next,
@@ -58,9 +58,9 @@ \section{Affinity Display}
 and the thread affinity (\%A). In the nested parallel region within the \plc{socket\_work} routine
 the affinities for the threads on each socket are printed according to this format.
 
-\cexample{affinity_display}{2}
+\cexample[5.0]{affinity_display}{2}
 
-\ffreeexample{affinity_display}{2}
+\ffreeexample[5.0]{affinity_display}{2}
 
 The next example illustrates more details about affinity formatting.
 First, the \code{omp\_get\_affinity\_format()} API routine is used to 
@@ -98,7 +98,7 @@ \section{Affinity Display}
 clause and the \plc{if(nchars >= max\_req\_store) max\_req\_store=nchars} statement. 
 It is used to report possible truncation (if \plc{max\_req\_store} > \plc{buffer\_store}).
 
-\cexample{affinity_display}{3}
+\cexample[5.0]{affinity_display}{3}
 
-\ffreeexample{affinity_display}{3}
+\ffreeexample[5.0]{affinity_display}{3}
 
@@ -37,7 +37,7 @@ \section{Affinity Query Functions}
 information.  For instance, the socket number and proc\_id's for a socket 
 can be found in the /proc/cpuinfo text file on Linux systems.
 
-\cexample{affinity_query}{1}
+\cexample[4.5]{affinity_query}{1}
 
-\ffreeexample{affinity_query}{1}
+\ffreeexample[4.5]{affinity_query}{1}
 
@@ -57,7 +57,7 @@ \section{ Memory Allocators}
 
 %\pagebreak
 
-    \cexample{allocators}{1}
-\ffreeexample{allocators}{1}
+\cexample[5.0]{allocators}{1}
+\ffreeexample[5.0]{allocators}{1}
 
 
@@ -8,31 +8,31 @@ \section{Array Sections in Device Constructs}
 This example shows the invalid usage of two separate sections of the same array 
 inside of a \code{target} construct.
 
-\cexample{array_sections}{1}
+\cexample[4.0]{array_sections}{1}
 
-\ffreeexample{array_sections}{1}
+\ffreeexample[4.0]{array_sections}{1}
 
 \pagebreak
 This example shows the invalid usage of two separate sections of the same array 
 inside of a \code{target} construct.
 
-\cexample{array_sections}{2}
+\cexample[4.0]{array_sections}{2}
 
-\ffreeexample{array_sections}{2}
+\ffreeexample[4.0]{array_sections}{2}
 
 \pagebreak
 This example shows the valid usage of two separate sections of the same array inside 
 of a \code{target} construct.
 
-\cexample{array_sections}{3}
+\cexample[4.0]{array_sections}{3}
 
-\ffreeexample{array_sections}{3}
+\ffreeexample[4.0]{array_sections}{3}
 
 \pagebreak
 This example shows the valid usage of a wholly contained array section of an already 
 mapped array section inside of a \code{target} construct.
 
-\cexample{array_sections}{4}
+\cexample[4.0]{array_sections}{4}
 
-\ffreeexample{array_sections}{4}
+\ffreeexample[4.0]{array_sections}{4}
 
@@ -23,5 +23,13 @@ \section{Array Shaping}
 around the shape-operator and $a$ to ensure the correct precedence 
 over array-section operations.
 
-\cnexample{array_shaping}{1}
+\cnexample[5.0]{array_shaping}{1}
 \ccppspecificend
+
+The shape operator is not defined for Fortran.  Explicit array shaping
+of procedure arguments can be used instead to achieve a similar goal.
+Below is the Fortran equivalent of the above example, which illustrates
+transferring two rows of noncontiguous boundary data with the
+\code{target}~\code{update} directive.
+
+\ffreeexample[5.0]{array_shaping}{1}
@@ -11,13 +11,13 @@ \section{Fortran \code{ASSOCIATE} Construct}
 attribute rule, the associate name \plc{b} is not allowed to be specified on the \code{private} 
 clause.
 
-\fnexample{associate}{1}
+\fnexample[4.0]{associate}{1}
 
 In next example, within the \code{parallel} construct, the association name \plc{thread\_id} 
 is associated with the private copy of \plc{i}. The print statement should output the 
 unique thread number.
 
-\fnexample{associate}{2}
+\fnexample[4.0]{associate}{2}
 
 The following example illustrates the effect of specifying a selector name on a data-sharing 
 attribute clause. The associate name \plc{u} is associated with \plc{v} and the variable \plc{v} 
@@ -27,6 +27,6 @@ \section{Fortran \code{ASSOCIATE} Construct}
 Attribute Rules section in the OpenMP 4.0 API Specifications). Inside the \code{parallel} 
 region, \plc{v} has the value of -1 and \plc{u} has the value of the original \plc{v}.
 
-\ffreenexample{associate}{3}
+\ffreenexample[4.0]{associate}{3}
 \fortranspecificend
 
@@ -26,6 +26,6 @@ \subsection{\code{nowait} Clause on \code{target} Construct}
 little time is spent by the \plc{target task} in setting 
 up and tearing down the target execution, \code{static} scheduling may be desired. 
 
-\cexample{async_target}{3}
+\cexample[4.5]{async_target}{3}
 
-\ffreeexample{async_target}{3}
+\ffreeexample[4.5]{async_target}{3}
@@ -11,8 +11,8 @@ \subsection{Asynchronous \code{target} with \code{nowait} and \code{depend} Clau
 
 The \code{nowait} clause on the \code{target} construct creates a deferrable \plc{target task}, allowing the encountering task to continue execution without waiting for the completion of the \plc{target task}.
 
-\cexample{async_target}{4}
+\cexample[4.5]{async_target}{4}
 
-\ffreeexample{async_target}{4}
+\ffreeexample[4.5]{async_target}{4}
 
 %end
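The combination of `nowait` and `depend` on a `target` construct described above can be sketched in C as follows. This is an illustrative fragment rather than the async_target.4 example; the function name `async_fill_and_sum` and the size `N` are assumptions. On a system without an offload device the target region is expected to fall back to host execution.

```c
#define N 8

/* Hypothetical helper: fills v in a deferrable target task, then sums it
   in a host task that depends on the target task's output. */
int async_fill_and_sum(void)
{
    int v[N];
    int sum = 0;

    #pragma omp parallel num_threads(2) shared(v, sum)
    #pragma omp single
    {
        /* nowait makes this a deferrable target task: the encountering
           task continues without waiting for it to complete. */
        #pragma omp target map(from: v) nowait depend(out: v)
        for (int i = 0; i < N; i++)
            v[i] = i;

        /* The in-dependence on v delays this task until the target task
           has finished producing v. */
        #pragma omp task shared(v, sum) depend(in: v)
        for (int i = 0; i < N; i++)
            sum += v[i];

        /* Wait for both child tasks before leaving the single region. */
        #pragma omp taskwait
    }
    return sum;
}
```

With `N` equal to 8 the function returns 0 + 1 + ... + 7 = 28. Dropping the `depend` clauses would let the summing task start before `v` has been filled.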