Commit eaec9ed

Author: Henry Jin
synced with v5.0.0 of the examples-internal repo
1 parent 156a12c, commit eaec9ed

170 files changed, +5822 -233 lines changed

Changes.log

+5
@@ -1,3 +1,8 @@
+[02-Feb-2018] Note
+This "Changes.log" is no longer updated. Please use History.tex and
+the git log messages for changes.
+
+
 [20-May-2016] Version 4.5.0
 Changes from 4.0.2ltx

Chap_SIMD.tex

+1 -1
@@ -33,7 +33,7 @@ \chapter{SIMD}
 \code{uniform}, and \code{aligned}), a requested vector length
 (\code{simdlen}), and designate whether the function is always/never
 called conditionally in a loop (\code{inbranch}/\code{notinbranch}).
-The latter is for optimizing peformance.
+The latter is for optimizing performance.

 Also, the \code{simd} construct has been combined with the worksharing loop
 constructs (\code{for simd} and \code{do simd}) to enable simultaneous thread
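
The clauses named in the hunk above can be illustrated with a minimal C sketch. The function names `scale_by` and `scale_array` and the vector length of 4 are assumptions chosen for illustration; they do not come from the commit or from the examples document.

```c
/* Hypothetical SIMD-enabled function: fact is the same for every SIMD lane
 * (uniform), a vector length of 4 is requested (simdlen), and the function
 * is declared as never being called conditionally (notinbranch). */
#pragma omp declare simd uniform(fact) simdlen(4) notinbranch
double scale_by(double x, double fact)
{
    return x * fact;
}

/* Hypothetical caller: the loop is vectorized with the simd construct and
 * calls scale_by unconditionally, which matches the notinbranch clause. */
void scale_array(double *a, int n, double fact)
{
    int i;

    #pragma omp simd
    for (i = 0; i < n; i++)
        a[i] = scale_by(a[i], fact);
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the loop simply runs as scalar code with the same result.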

Chap_data_environment.tex

+2 -2
@@ -44,7 +44,7 @@ \chapter{Data Environment}
 \bigskip
 DATA-MAPPING ATTRIBUTES

-The \code{map} clause on a device construct explictly specifies how the list items in
+The \code{map} clause on a device construct explicitly specifies how the list items in
 the clause are mapped from the encountering task's data environment (on the host)
 to the corresponding item in the device data environment (on the device).
 The common \plc{list items} are arrays, array sections, scalars, pointers, and
@@ -55,7 +55,7 @@ \chapter{Data Environment}
 is mapped as a zero-length array section, as is a C++ variable that is a reference to a pointer.
 % Waiting for response from Eric on this.

-Without explict mapping, non-scalar and non-pointer variables within the scope of the \code{target}
+Without explicit mapping, non-scalar and non-pointer variables within the scope of the \code{target}
 construct are implicitly mapped with a \plc{map-type} of \code{tofrom}.
 Without explicit mapping, scalar variables within the scope of the \code{target}
 construct are not mapped, but have an implicit firstprivate data-sharing
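
As a rough illustration of the mapping rules quoted in these hunks, the following C sketch maps an array section explicitly and leaves the scalars unmapped. The routine name `scale_on_device` is hypothetical, and the sketch assumes a compiler that falls back to host execution when no device is available.

```c
/* Hypothetical routine: scales n elements of v inside a target region. */
void scale_on_device(double *v, int n, double factor)
{
    int i;

    /* The array section v[0:n] is mapped explicitly with map-type tofrom.
     * The scalars n and factor are not listed in a map clause, so they rely
     * on the implicit firstprivate attribute described in the text. */
    #pragma omp target map(tofrom: v[0:n])
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        v[i] *= factor;
}
```

The pointer `v` itself is a pointer variable, so only the explicit array section makes the pointed-to data available on the device.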

Chap_memory_model.tex

+61 -34
@@ -2,44 +2,71 @@
 \chapter{Memory Model}
 \label{chap:memory_model}

-In this chapter, examples illustrate race conditions on access to variables with
-shared data-sharing attributes. A race condition can exist when two
-or more threads are involved in accessing a variable in which not all
-of the accesses are reads; that is, a WaR, RaW or WaW condition
-exists (R=read, a=after, W=write). A RaR does not produce a race condition.
-Ensuring thread execution order at
-the processor level is not enough to avoid race conditions, because the
-local storage at the processor level (registers, caches, etc.)
-must be synchronized so that a consistent view of the variable in the
-memory hierarchy can be seen by the threads accessing the variable.
+OpenMP provides a shared-memory model that allows all threads on a given
+device shared access to \emph{memory}. For a given OpenMP region that may be
+executed by more than one thread or SIMD lane, variables in memory may be
+\emph{shared} or \emph{private} with respect to those threads or SIMD lanes. A
+variable's data-sharing attribute indicates whether it is shared (the
+\emph{shared} attribute) or private (the \emph{private}, \emph{firstprivate},
+\emph{lastprivate}, \emph{linear}, and \emph{reduction} attributes) in the data
+environment of an OpenMP region. While private variables in an OpenMP region
+are new copies of the original variable (with same name) that may then be
+concurrently accessed or modified by their respective threads or SIMD lanes, a
+shared variable in an OpenMP region is the same as the variable of the same
+name in the enclosing region. Concurrent accesses or modifications to a
+shared variable may therefore require synchronization to avoid data races.

-OpenMP provides a shared-memory model which allows all threads access
-to \plc{memory} (shared data). Each thread also has exclusive
-access to \plc{threadprivate memory} (private data). A private
-variable referenced in an OpenMP directive's structured block is a
-new version of the original variable (with the same name) for each
-task (or SIMD lane) within the code block. A private variable is
-initially undefined (except for variables in \code{firstprivate}
-and \code{linear} clauses), and the original variable value is
-unaltered by assignments to the private variable, (except for
-\code{reduction}, \code{lastprivate} and \code{linear} clauses).
+OpenMP's memory model also includes a \emph{temporary view} of memory that is
+associated with each thread. Two different threads may see different values for
+a given variable in their respective temporary views. Threads may employ flush
+operations for the purposes of making their temporary view of a variable
+consistent with the value of the variable in memory. The effect of a given
+flush operation is characterized by its flush properties -- some combination of
+\emph{strong}, \emph{release}, and \emph{acquire} -- and, for \emph{strong}
+flushes, a \emph{flush-set}.

-Private variables in an outer \code{parallel} region can be
-shared by implicit tasks of an inner \code{parallel} region
-(with a \code{share} clause on the inner \code{parallel} directive).
-Likewise, a private variable may be shared in the region of an
-explicit \code{task} (through a \code{shared} clause).
+A \emph{strong} flush will force consistency between the temporary view and the
+memory for all variables in its \emph{flush-set}. Furthermore all strong flushes in a
+program that have intersecting flush-sets will execute in some total order, and
+within a thread strong flushes may not be reordered with respect to other
+memory operations on variables in its flush-set. \emph{Release} and
+\emph{acquire} flushes operate in pairs. A release flush may ``synchronize''
+with an acquire flush, and when it does so the local memory operations that
+precede the release flush will appear to have been completed before the local
+memory operations on the same variables that follow the acquire flush.

+Flush operations arise from explicit \code{flush} directives, implicit
+\code{flush} directives, and also from the execution of \code{atomic}
+constructs. The \code{flush} directive forces a consistent view of local
+variables of the thread executing the \code{flush}. When a list is supplied on
+the directive, only the items (variables) in the list are guaranteed to be
+flushed. Implied flushes exist at prescribed locations of certain constructs.
+For the complete list of these locations and associated constructs, please
+refer to the \plc{flush Construct} section of the OpenMP Specifications
+document.
+
+In this chapter, examples illustrate how race conditions may arise for accesses
+to variables with a \plc{shared} data-sharing attribute when flush operations
+are not properly employed. A race condition can exist when two or more threads
+are involved in accessing a variable in which not all of the accesses are
+reads; that is, a WaR, RaW or WaW condition exists (R=read, a=after, W=write).
+A RaR does not produce a race condition. In particular, a data race will arise
+when conflicting accesses do not have a well-defined \emph{completion order}.
+The existence of data races in OpenMP programs results in undefined behavior,
+and so they should generally be avoided for programs to be correct. The
+completion order of accesses to a shared variable is guaranteed in OpenMP
+through a set of memory consistency rules that are described in the \plc{OpenMP
+Memory Consistency} section of the OpenMP Specifications document.
+
+%This chapter also includes examples that exhibit non-sequentially consistent
+%(\emph{non-SC}) behavior. Sequential consistency (\emph{SC}) is the desirable
+%property that the results of a multi-threaded program are as if all operations
+%are performed in some total order, consistent with the program order of
+%operations performed by each thread. OpenMP guarantees that a correct program
+%(i.e. a program that does not have a data race) will exhibit SC behavior
+%so long as the only \code{atomic} constructs it uses are SC atomic directives.

-The \code{flush} directive forces a consistent view of local variables
-of the thread executing the \code{flush}.
-When a list is supplied on the directive, only the items (variables)
-in the list are guaranteed to be flushed.

-Implied flushes exist at prescribed locations of certain constructs.
-For the complete list of these locations and associated constructs,
-please refer to the \plc{flush Construct} section of the OpenMP
-Specifications document.

 % The following table lists construct in which implied flushes exist, and the
 % location of their execution.
@@ -102,4 +129,4 @@ \chapter{Memory Model}
 % specific storage location accessed atomically (specified as the \plc{x} variable
 % in \plc{atomic Construct} subsection of the OpenMP Specifications document).

-Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
+% Examples 1-3 show the difficulty of synchronizing threads through \code{flush} and \code{atomic} directives.
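
The distinction between shared and private variables in the rewritten chapter text, and the need to synchronize conflicting accesses to a shared variable, can be sketched in C as follows. The function `count_multiples_of_three` is a hypothetical illustration and is not one of the numbered examples in the document.

```c
/* Hypothetical helper: counts the multiples of 3 in the range [0, n). */
int count_multiples_of_three(int n)
{
    int count = 0;   /* shared: every thread updates the same variable */
    int i;           /* loop variable: private in the worksharing loop */

    #pragma omp parallel for shared(count)
    for (i = 0; i < n; i++) {
        int hit = (i % 3 == 0);   /* declared in the loop body, so private */
        if (hit) {
            /* Without the atomic construct, concurrent increments of the
             * shared variable would be conflicting writes (a WaW data race). */
            #pragma omp atomic update
            count++;
        }
    }
    return count;
}
```

For `n = 10` the multiples are 0, 3, 6 and 9, so the function returns 4 regardless of how many threads execute the loop.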

Chap_program_control.tex

+1 -1
@@ -24,7 +24,7 @@ \chapter{Program Control}
 activates the corresponding region.
 The \code{cancel} construct is activated by the first encountering thread, and it
 continues execution at the end of the named region.
-The \code{cancel} construct is also a concellation point for any other thread of the team
+The \code{cancel} construct is also a cancellation point for any other thread of the team
 to also continue execution at the end of the named region.

 Also, once the specified region has been activated for cancellation any thread that encounters

Chap_synchronization.tex

+25 -13
@@ -19,10 +19,15 @@ \chapter{Synchronization}
 On a finer scale the \code{atomic} construct allows only a single thread at
 a time to have atomic access to a storage location involving a single read,
 write, update or capture statement, and a limited number of combinations
-when specifying the \code{capture} \plc{atomic-clause} clause. The \plc{atomic-clause} clause
-is required for some expression statements, but are not required for
-\code{update} statements. Please see the details in the \plc{atomic Construct}
-subsection of the \plc{Directives} chapter in the OpenMP Specifications document.
+when specifying the \code{capture} \plc{atomic-clause} clause. The
+\plc{atomic-clause} clause is required for some expression statements, but is
+not required for \code{update} statements. The \plc{memory-order} clause can be
+used to specify the degree of memory ordering enforced by an \code{atomic}
+construct. From weakest to strongest, they are \code{relaxed} (the default),
+acquire and/or release clauses (specified with \code{acquire}, \code{release},
+or \code{acq\_rel}), and \code{seq\_cst}. Please see the details in the
+\plc{atomic Construct} subsection of the \plc{Directives} chapter in the OpenMP
+Specifications document.

 % The following three sentences were stolen from the spec.
 The \code{ordered} construct either specifies a structured block in a loop,
@@ -37,15 +42,22 @@ \chapter{Synchronization}
 dependence. The \code{depend} clause with a \code{source}
 \plc{dependence-type} specifies dependence satisfaction.

-The \code{flush} directive is a stand-alone construct that forces a thread's
-temporal local storage (view) of a variable to memory where a consistent view
-of the variable storage can be accesses. When the construct is used without
-a variable list, all the locally thread-visible data as defined by the
-base language are flushed. A construct with a list applies the flush
-operation only to the items in the list. The \code{flush} construct also
-effectively insures that no memory (load or store) operation for
-the variable set (list items, or default set) may be reordered across
-the \code{flush} directive.
+The \code{flush} directive is a stand-alone construct for enforcing consistency
+between a thread's view of memory and the view of memory for other threads (see
+the Memory Model chapter of this document for more details). When the construct
+is used with an explicit variable list, a \plc{strong flush} that forces a
+thread's temporary view of memory to be consistent with the actual memory is
+applied to all listed variables. When the construct is used without an explicit
+variable list and without a \plc{memory-order} clause, a strong flush is
+applied to all locally thread-visible data as defined by the base language, and
+additionally the construct provides both acquire and release memory ordering
+semantics. When an explicit variable list is not present and a
+\plc{memory-order} clause is present, the construct provides acquire and/or
+release memory ordering semantics according to the \plc{memory-order} clause,
+but no strong flush is performed. A resulting strong flush that applies to a
+set of variables effectively ensures that no memory (load or store)
+operation for the affected variables may be reordered across the \code{flush}
+directive.

 General-purpose routines provide mutual exclusion semantics through locks,
 represented by lock variables.
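
The \plc{memory-order} clauses added to the \code{atomic} description can be sketched with a single, non-blocking handoff between two sections. This is an assumption-laden illustration rather than an example from the document: the function name `handoff_once` and the payload value 42 are hypothetical, and the `release` and `acquire` clauses require a compiler that implements OpenMP 5.0.

```c
/* Hypothetical handoff: one section publishes a payload, the other checks
 * the flag once. Returns 42 if the flag was seen set, or -1 if the reader
 * ran before the writer published. */
int handoff_once(void)
{
    int payload = 0;
    int ready = 0;
    int observed = -1;

    #pragma omp parallel sections shared(payload, ready, observed)
    {
        #pragma omp section
        {
            payload = 42;
            /* Release ordering: the store to payload completes before the
             * flag can be observed as set. */
            #pragma omp atomic write release
            ready = 1;
        }
        #pragma omp section
        {
            int seen;
            /* Acquire ordering: if the flag is seen as set, the read of
             * payload below is ordered after the writer's store. */
            #pragma omp atomic read acquire
            seen = ready;
            if (seen)
                observed = payload;
        }
    }
    return observed;
}
```

The reader deliberately checks the flag only once instead of spinning, so the sketch cannot hang when both sections are executed by a single thread; the price is that the result may be -1 when the reader runs first.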

Examples_SIMD.tex

+7 -3
@@ -8,6 +8,8 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 \cexample{SIMD}{1}

 \ffreeexample{SIMD}{1}
+
+\clearpage


 When a function can be inlined within a loop the compiler has an opportunity to
@@ -24,7 +26,7 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 The \code{declare} \code{simd} constructs also illustrate the use of
 \code{uniform} and \code{linear} clauses. The \code{uniform(fact)} clause
 indicates that the variable \plc{fact} is invariant across the SIMD lanes. In
-the \plc{add2} function \plc{a} and \plc{b} are included in the \code{unform}
+the \plc{add2} function \plc{a} and \plc{b} are included in the \code{uniform}
 list because the C pointer and the Fortran array references are constant. The
 \plc{i} index used in the \plc{add2} function is included in a \code{linear}
 clause with a constant-linear-step of 1, to guarantee a unity increment of the
@@ -42,7 +44,7 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}

 \ffreeexample{SIMD}{2}

-
+\pagebreak
 A thread that encounters a SIMD construct executes a vectorized code of the
 iterations. Similar to the concerns of a worksharing loop a loop vectorized
 with a SIMD construct must assure that temporary and reduction variables are
@@ -55,6 +57,7 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}
 \ffreeexample{SIMD}{3}


+\pagebreak
 A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that
 there are no loop-carried dependencies for vectors of size \plc{N} or below. If
 the \code{safelen} clause is not specified, then the default safelen value is
@@ -69,7 +72,7 @@ \section{\code{simd} and \code{declare} \code{simd} Constructs}

 \ffreeexample{SIMD}{4}

-
+\pagebreak
 The following SIMD construct instructs the compiler to collapse the \plc{i} and
 \plc{j} loops into a single SIMD loop in which SIMD chunks are executed by
 threads of the team. Within the workshared loop chunks of a thread, the SIMD
@@ -110,6 +113,7 @@ \section{\code{inbranch} and \code{notinbranch} Clauses}


 %%% section
+\pagebreak
 \section{Loop-Carried Lexical Forward Dependence}
 \label{sec:SIMD_forward_dep}
