[flang] Restructure runtime to avoid recursion (relanding) #143993
base: main
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-flang-semantics
Author: Peter Klausler (klausler)
Changes
Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy.
Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds.
Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues.
(Relanding this patch after reverting initial attempt due to some test failures that needed some time to analyze and fix.)
Fixes #142481.
Patch is 236.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143993.diff
32 Files Affected:
diff --git a/flang-rt/include/flang-rt/runtime/environment.h b/flang-rt/include/flang-rt/runtime/environment.h
index 16258b3bbba9b..e579f6012ce86 100644
--- a/flang-rt/include/flang-rt/runtime/environment.h
+++ b/flang-rt/include/flang-rt/runtime/environment.h
@@ -64,6 +64,9 @@ struct ExecutionEnvironment {
bool defaultUTF8{false}; // DEFAULT_UTF8
bool checkPointerDeallocation{true}; // FORT_CHECK_POINTER_DEALLOCATION
+ enum InternalDebugging { WorkQueue = 1 };
+ int internalDebugging{0}; // FLANG_RT_DEBUG
+
// CUDA related variables
std::size_t cudaStackLimit{0}; // ACC_OFFLOAD_STACK_SIZE
bool cudaDeviceIsManaged{false}; // NV_CUDAFOR_DEVICE_IS_MANAGED
diff --git a/flang-rt/include/flang-rt/runtime/stat.h b/flang-rt/include/flang-rt/runtime/stat.h
index 070d0bf8673fb..dc372de53506a 100644
--- a/flang-rt/include/flang-rt/runtime/stat.h
+++ b/flang-rt/include/flang-rt/runtime/stat.h
@@ -24,7 +24,7 @@ class Terminator;
enum Stat {
StatOk = 0, // required to be zero by Fortran
- // Interoperable STAT= codes
+ // Interoperable STAT= codes (>= 11)
StatBaseNull = CFI_ERROR_BASE_ADDR_NULL,
StatBaseNotNull = CFI_ERROR_BASE_ADDR_NOT_NULL,
StatInvalidElemLen = CFI_INVALID_ELEM_LEN,
@@ -36,7 +36,7 @@ enum Stat {
StatMemAllocation = CFI_ERROR_MEM_ALLOCATION,
StatOutOfBounds = CFI_ERROR_OUT_OF_BOUNDS,
- // Standard STAT= values
+ // Standard STAT= values (>= 101)
StatFailedImage = FORTRAN_RUNTIME_STAT_FAILED_IMAGE,
StatLocked = FORTRAN_RUNTIME_STAT_LOCKED,
StatLockedOtherImage = FORTRAN_RUNTIME_STAT_LOCKED_OTHER_IMAGE,
@@ -49,10 +49,14 @@ enum Stat {
// Additional "processor-defined" STAT= values
StatInvalidArgumentNumber = FORTRAN_RUNTIME_STAT_INVALID_ARG_NUMBER,
StatMissingArgument = FORTRAN_RUNTIME_STAT_MISSING_ARG,
- StatValueTooShort = FORTRAN_RUNTIME_STAT_VALUE_TOO_SHORT,
+ StatValueTooShort = FORTRAN_RUNTIME_STAT_VALUE_TOO_SHORT, // -1
StatMoveAllocSameAllocatable =
FORTRAN_RUNTIME_STAT_MOVE_ALLOC_SAME_ALLOCATABLE,
StatBadPointerDeallocation = FORTRAN_RUNTIME_STAT_BAD_POINTER_DEALLOCATION,
+
+ // Dummy status for work queue continuation, declared here to perhaps
+ // avoid collisions
+ StatContinue = 201
};
RT_API_ATTRS const char *StatErrorString(int);
diff --git a/flang-rt/include/flang-rt/runtime/type-info.h b/flang-rt/include/flang-rt/runtime/type-info.h
index 5e79efde164f2..80301a313282f 100644
--- a/flang-rt/include/flang-rt/runtime/type-info.h
+++ b/flang-rt/include/flang-rt/runtime/type-info.h
@@ -154,12 +154,17 @@ class SpecialBinding {
RT_API_ATTRS bool IsArgDescriptor(int zeroBasedArg) const {
return (isArgDescriptorSet_ >> zeroBasedArg) & 1;
}
- RT_API_ATTRS bool isTypeBound() const { return isTypeBound_; }
+ RT_API_ATTRS bool IsTypeBound() const { return isTypeBound_ != 0; }
RT_API_ATTRS bool IsArgContiguous(int zeroBasedArg) const {
return (isArgContiguousSet_ >> zeroBasedArg) & 1;
}
- template <typename PROC> RT_API_ATTRS PROC GetProc() const {
- return reinterpret_cast<PROC>(proc_);
+ template <typename PROC>
+ RT_API_ATTRS PROC GetProc(const Binding *bindings = nullptr) const {
+ if (bindings && isTypeBound_ > 0) {
+ return reinterpret_cast<PROC>(bindings[isTypeBound_ - 1].proc);
+ } else {
+ return reinterpret_cast<PROC>(proc_);
+ }
}
FILE *Dump(FILE *) const;
@@ -193,6 +198,8 @@ class SpecialBinding {
// When false, the defined I/O subroutine must have been
// called via a generic interface, not a generic TBP.
std::uint8_t isArgDescriptorSet_{0};
+ // When a special binding is type-bound, this is its binding's index (plus 1,
+ // so that 0 signifies that it's not type-bound).
std::uint8_t isTypeBound_{0};
// True when a FINAL subroutine has a dummy argument that is an array that
// is CONTIGUOUS or neither assumed-rank nor assumed-shape.
@@ -240,6 +247,7 @@ class DerivedType {
RT_API_ATTRS bool noFinalizationNeeded() const {
return noFinalizationNeeded_;
}
+ RT_API_ATTRS bool noDefinedAssignment() const { return noDefinedAssignment_; }
RT_API_ATTRS std::size_t LenParameters() const {
return lenParameterKind().Elements();
@@ -322,6 +330,7 @@ class DerivedType {
bool noInitializationNeeded_{false};
bool noDestructionNeeded_{false};
bool noFinalizationNeeded_{false};
+ bool noDefinedAssignment_{false};
};
} // namespace Fortran::runtime::typeInfo
diff --git a/flang-rt/include/flang-rt/runtime/work-queue.h b/flang-rt/include/flang-rt/runtime/work-queue.h
new file mode 100644
index 0000000000000..0daa7bc4d3384
--- /dev/null
+++ b/flang-rt/include/flang-rt/runtime/work-queue.h
@@ -0,0 +1,555 @@
+//===-- include/flang-rt/runtime/work-queue.h -------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// Internal runtime utilities for work queues that replace the use of recursion
+// for better GPU device support.
+//
+// A work queue comprises a list of tickets. Each ticket class has a Begin()
+// member function, which is called once, and a Continue() member function
+// that can be called zero or more times. A ticket's execution terminates
+// when either of these member functions returns a status other than
+// StatContinue. When that status is not StatOk, then the whole queue
+// is shut down.
+//
+// By returning StatContinue from its Continue() member function,
+// a ticket suspends its execution so that any nested tickets that it
+// may have created can be run to completion. It is the responsibility
+// of each ticket class to maintain resumption information in its state
+// and manage its own progress. Most ticket classes inherit from
+// class ComponentsOverElements, which implements an outer loop over all
+// components of a derived type, and an inner loop over all elements
+// of a descriptor, possibly with multiple phases of execution per element.
+//
+// Tickets are created by WorkQueue::Begin...() member functions.
+// There is one of these for each "top level" recursive function in the
+// Fortran runtime support library that has been restructured into this
+// ticket framework.
+//
+// When the work queue is running tickets, it always selects the last ticket
+// on the list for execution -- "work stack" might have been a more accurate
+// name for this framework. This ticket may, while doing its job, create
+// new tickets, and since those are pushed after the active one, the first
+// such nested ticket will be the next one executed to completion -- i.e.,
+// the order of nested WorkQueue::Begin...() calls is respected.
+// Note that a ticket's Continue() member function won't be called again
+// until all nested tickets have run to completion and it is once again
+// the last ticket on the queue.
+//
+// Example for an assignment to a derived type:
+// 1. Assign() is called, and its work queue is created. It calls
+// WorkQueue::BeginAssign() and then WorkQueue::Run().
+// 2. Run calls AssignTicket::Begin(), which pushes a ticket via
+// BeginFinalize() and returns StatContinue.
+// 3. FinalizeTicket::Begin() and FinalizeTicket::Continue() are called
+// until one of them returns StatOk, which ends the finalization ticket.
+// 4. AssignTicket::Continue() is then called; it creates a DerivedAssignTicket
+// and then returns StatOk, which ends the ticket.
+// 5. At this point, only one ticket remains. DerivedAssignTicket::Begin()
+// and ::Continue() are called until they are done (not StatContinue).
+// Along the way, it may create nested AssignTickets for components,
+// and suspend itself so that they may each run to completion.
+
+#ifndef FLANG_RT_RUNTIME_WORK_QUEUE_H_
+#define FLANG_RT_RUNTIME_WORK_QUEUE_H_
+
+#include "flang-rt/runtime/connection.h"
+#include "flang-rt/runtime/descriptor.h"
+#include "flang-rt/runtime/stat.h"
+#include "flang-rt/runtime/type-info.h"
+#include "flang/Common/api-attrs.h"
+#include "flang/Runtime/freestanding-tools.h"
+#include <flang/Common/variant.h>
+
+namespace Fortran::runtime::io {
+class IoStatementState;
+struct NonTbpDefinedIoTable;
+} // namespace Fortran::runtime::io
+
+namespace Fortran::runtime {
+class Terminator;
+class WorkQueue;
+
+// Ticket worker base classes
+
+template <typename TICKET> class ImmediateTicketRunner {
+public:
+ RT_API_ATTRS explicit ImmediateTicketRunner(TICKET &ticket)
+ : ticket_{ticket} {}
+ RT_API_ATTRS int Run(WorkQueue &workQueue) {
+ int status{ticket_.Begin(workQueue)};
+ while (status == StatContinue) {
+ status = ticket_.Continue(workQueue);
+ }
+ return status;
+ }
+
+private:
+ TICKET &ticket_;
+};
+
+// Base class for ticket workers that operate elementwise over descriptors
+class Elementwise {
+public:
+ RT_API_ATTRS Elementwise(
+ const Descriptor &instance, const Descriptor *from = nullptr)
+ : instance_{instance}, from_{from} {
+ instance_.GetLowerBounds(subscripts_);
+ if (from_) {
+ from_->GetLowerBounds(fromSubscripts_);
+ }
+ }
+ RT_API_ATTRS bool IsComplete() const { return elementAt_ >= elements_; }
+ RT_API_ATTRS void Advance() {
+ ++elementAt_;
+ instance_.IncrementSubscripts(subscripts_);
+ if (from_) {
+ from_->IncrementSubscripts(fromSubscripts_);
+ }
+ }
+ RT_API_ATTRS void SkipToEnd() { elementAt_ = elements_; }
+ RT_API_ATTRS void Reset() {
+ elementAt_ = 0;
+ instance_.GetLowerBounds(subscripts_);
+ if (from_) {
+ from_->GetLowerBounds(fromSubscripts_);
+ }
+ }
+
+protected:
+ const Descriptor &instance_, *from_{nullptr};
+ std::size_t elements_{instance_.Elements()};
+ std::size_t elementAt_{0};
+ SubscriptValue subscripts_[common::maxRank];
+ SubscriptValue fromSubscripts_[common::maxRank];
+};
+
+// Base class for ticket workers that operate over derived type components.
+class Componentwise {
+public:
+ RT_API_ATTRS Componentwise(const typeInfo::DerivedType &);
+ RT_API_ATTRS bool IsComplete() const { return componentAt_ >= components_; }
+ RT_API_ATTRS void Advance() {
+ ++componentAt_;
+ GetComponent();
+ }
+ RT_API_ATTRS void SkipToEnd() {
+ component_ = nullptr;
+ componentAt_ = components_;
+ }
+ RT_API_ATTRS void Reset() {
+ component_ = nullptr;
+ componentAt_ = 0;
+ GetComponent();
+ }
+ RT_API_ATTRS void GetComponent();
+
+protected:
+ const typeInfo::DerivedType &derived_;
+ std::size_t components_{0}, componentAt_{0};
+ const typeInfo::Component *component_{nullptr};
+ StaticDescriptor<common::maxRank, true, 0> componentDescriptor_;
+};
+
+// Base class for ticket workers that operate over derived type components
+// in an outer loop, and elements in an inner loop.
+class ComponentsOverElements : public Componentwise, public Elementwise {
+public:
+ RT_API_ATTRS ComponentsOverElements(const Descriptor &instance,
+ const typeInfo::DerivedType &derived, const Descriptor *from = nullptr)
+ : Componentwise{derived}, Elementwise{instance, from} {
+ if (Elementwise::IsComplete()) {
+ Componentwise::SkipToEnd();
+ }
+ }
+ RT_API_ATTRS bool IsComplete() const { return Componentwise::IsComplete(); }
+ RT_API_ATTRS void Advance() {
+ SkipToNextElement();
+ if (Elementwise::IsComplete()) {
+ Elementwise::Reset();
+ Componentwise::Advance();
+ }
+ }
+ RT_API_ATTRS void SkipToNextElement() {
+ phase_ = 0;
+ Elementwise::Advance();
+ }
+ RT_API_ATTRS void SkipToNextComponent() {
+ phase_ = 0;
+ Elementwise::Reset();
+ Componentwise::Advance();
+ }
+ RT_API_ATTRS void Reset() {
+ phase_ = 0;
+ Elementwise::Reset();
+ Componentwise::Reset();
+ }
+
+protected:
+ int phase_{0};
+};
+
+// Base class for ticket workers that operate over elements in an outer loop,
+// type components in an inner loop.
+class ElementsOverComponents : public Elementwise, public Componentwise {
+public:
+ RT_API_ATTRS ElementsOverComponents(const Descriptor &instance,
+ const typeInfo::DerivedType &derived, const Descriptor *from = nullptr)
+ : Elementwise{instance, from}, Componentwise{derived} {
+ if (Componentwise::IsComplete()) {
+ Elementwise::SkipToEnd();
+ }
+ }
+ RT_API_ATTRS bool IsComplete() const { return Elementwise::IsComplete(); }
+ RT_API_ATTRS void Advance() {
+ SkipToNextComponent();
+ if (Componentwise::IsComplete()) {
+ Componentwise::Reset();
+ Elementwise::Advance();
+ }
+ }
+ RT_API_ATTRS void SkipToNextComponent() {
+ phase_ = 0;
+ Componentwise::Advance();
+ }
+ RT_API_ATTRS void SkipToNextElement() {
+ phase_ = 0;
+ Componentwise::Reset();
+ Elementwise::Advance();
+ }
+
+protected:
+ int phase_{0};
+};
+
+// Ticket worker classes
+
+// Implements derived type instance initialization
+class InitializeTicket : public ImmediateTicketRunner<InitializeTicket>,
+ private ComponentsOverElements {
+public:
+ RT_API_ATTRS InitializeTicket(
+ const Descriptor &instance, const typeInfo::DerivedType &derived)
+ : ImmediateTicketRunner<InitializeTicket>{*this},
+ ComponentsOverElements{instance, derived} {}
+ RT_API_ATTRS int Begin(WorkQueue &);
+ RT_API_ATTRS int Continue(WorkQueue &);
+};
+
+// Initializes one derived type instance from the value of another
+class InitializeCloneTicket
+ : public ImmediateTicketRunner<InitializeCloneTicket>,
+ private ComponentsOverElements {
+public:
+ RT_API_ATTRS InitializeCloneTicket(const Descriptor &clone,
+ const Descriptor &original, const typeInfo::DerivedType &derived,
+ bool hasStat, const Descriptor *errMsg)
+ : ImmediateTicketRunner<InitializeCloneTicket>{*this},
+ ComponentsOverElements{original, derived}, clone_{clone},
+ hasStat_{hasStat}, errMsg_{errMsg} {}
+ RT_API_ATTRS int Begin(WorkQueue &) { return StatContinue; }
+ RT_API_ATTRS int Continue(WorkQueue &);
+
+private:
+ const Descriptor &clone_;
+ bool hasStat_{false};
+ const Descriptor *errMsg_{nullptr};
+ StaticDescriptor<common::maxRank, true, 0> cloneComponentDescriptor_;
+};
+
+// Implements derived type instance finalization
+class FinalizeTicket : public ImmediateTicketRunner<FinalizeTicket>,
+ private ComponentsOverElements {
+public:
+ RT_API_ATTRS FinalizeTicket(
+ const Descriptor &instance, const typeInfo::DerivedType &derived)
+ : ImmediateTicketRunner<FinalizeTicket>{*this},
+ ComponentsOverElements{instance, derived} {}
+ RT_API_ATTRS int Begin(WorkQueue &);
+ RT_API_ATTRS int Continue(WorkQueue &);
+
+private:
+ const typeInfo::DerivedType *finalizableParentType_{nullptr};
+};
+
+// Implements derived type instance destruction
+class DestroyTicket : public ImmediateTicketRunner<DestroyTicket>,
+ private ComponentsOverElements {
+public:
+ RT_API_ATTRS DestroyTicket(const Descriptor &instance,
+ const typeInfo::DerivedType &derived, bool finalize)
+ : ImmediateTicketRunner<DestroyTicket>{*this},
+ ComponentsOverElements{instance, derived}, finalize_{finalize} {}
+ RT_API_ATTRS int Begin(WorkQueue &);
+ RT_API_ATTRS int Continue(WorkQueue &);
+
+private:
+ bool finalize_{false};
+};
+
+// Implements general intrinsic assignment
+class AssignTicket : public ImmediateTicketRunner<AssignTicket> {
+public:
+ RT_API_ATTRS AssignTicket(Descriptor &to, const Descriptor &from, int flags,
+ MemmoveFct memmoveFct, const typeInfo::DerivedType *declaredType)
+ : ImmediateTicketRunner<AssignTicket>{*this}, to_{to}, from_{&from},
+ flags_{flags}, memmoveFct_{memmoveFct}, declaredType_{declaredType} {}
+ RT_API_ATTRS int Begin(WorkQueue &);
+ RT_API_ATTRS int Continue(WorkQueue &);
+
+private:
+ RT_API_ATTRS bool IsSimpleMemmove() const {
+ return !toDerived_ && to_.rank() == from_->rank() && to_.IsContiguous() &&
+ from_->IsContiguous() && to_.ElementBytes() == from_->ElementBytes();
+ }
+ RT_API_ATTRS Descriptor &GetTempDescriptor();
+
+ Descriptor &to_;
+ const Descriptor *from_{nullptr};
+ int flags_{0}; // enum AssignFlags
+ MemmoveFct memmoveFct_{nullptr};
+ StaticDescriptor<common::maxRank, true, 0> tempDescriptor_;
+ const typeInfo::DerivedType *declaredType_{nullptr};
+ const typeInfo::DerivedType *toDerived_{nullptr};
+ Descriptor *toDeallocate_{nullptr};
+ bool persist_{false};
+ bool done_{false};
+};
+
+// Implements derived type intrinsic assignment.
+template <bool IS_COMPONENTWISE>
+class DerivedAssignTicket
+ : public ImmediateTicketRunner<DerivedAssignTicket<IS_COMPONENTWISE>>,
+ private std::conditional_t<IS_COMPONENTWISE, ComponentsOverElements,
+ ElementsOverComponents> {
+public:
+ using Base = std::conditional_t<IS_COMPONENTWISE, ComponentsOverElements,
+ ElementsOverComponents>;
+ RT_API_ATTRS DerivedAssignTicket(const Descriptor &to, const Descriptor &from,
+ const typeInfo::DerivedType &derived, int flags, MemmoveFct memmoveFct,
+ Descriptor *deallocateAfter)
+ : ImmediateTicketRunner<DerivedAssignTicket>{*this},
+ Base{to, derived, &from}, flags_{flags}, memmoveFct_{memmoveFct},
+ deallocateAfter_{deallocateAfter} {}
+ RT_API_ATTRS int Begin(WorkQueue &);
+ RT_API_ATTRS int Continue(WorkQueue &);
+
+private:
+ static constexpr bool isComponentwise_{IS_COMPONENTWISE};
+ bool toIsContiguous_{this->instance_.IsContiguous()};
+ bool fromIsContiguous_{this->from_->IsContiguous()};
+ int flags_{0};
+ MemmoveFct memmoveFct_{nullptr};
+ Descriptor *deallocateAfter_{nullptr};
+ StaticDescriptor<common::maxRank, true, 0> fromComponentDescriptor_;
+};
+
+namespace io::descr {
+
+template <io::Direction DIR>
+class DescriptorIoTicket
+ : public ImmediateTicketRunner<DescriptorIoTicket<DIR>>,
+ private Elementwise {
+public:
+ RT_API_ATTRS DescriptorIoTicket(io::IoStatementState &io,
+ const Descriptor &descriptor, const io::NonTbpDefinedIoTable *table,
+ bool &anyIoTookPlace)
+ : ImmediateTicketRunner<DescriptorIoTicket>(*this),
+ Elementwise{descriptor}, io_{io}, table_{table},
+ anyIoTookPlace_{anyIoTookPlace} {}
+ RT_API_ATTRS int Begin(WorkQueue &);
+ RT_API_ATTRS int Continue(WorkQueue &);
+ RT_API_ATTRS bool &anyIoTookPlace() { return anyIoTookPlace_; }
+
+private:
+ io::IoStatementState &io_;
+ const io::NonTbpDefinedIoTable *table_{nullptr};
+ bool &anyIoTookPlace_;
+ common::optional<typeInfo::SpecialBinding> nonTbpSpecial_;
+ const typeInfo::DerivedType *derived_{nullptr};
+ const typeInfo::SpecialBinding *special_{nullptr};
+ StaticDescriptor<common::maxRank, true, 0> elementDescriptor_;
+};
+
+template <io::Direction DIR>
+class DerivedIoTicket : public ImmediateTicketRunner<DerivedIoTicket<DIR>>,
+ private ElementsOverComponents {
+public:
+ RT_API_ATTRS DerivedIoTicket(io::IoStatementState &io,
+ const Descriptor &descriptor, const typeInfo::DerivedType &derived,
+ const io::NonTbpDefinedIoTable *table, bool &anyIoTookPlace)
+ : ImmediateTicketRunner<DerivedIoTicket>(*this),
+ ElementsOverComponents{descriptor, derived}, io_{io}, table_{table},
+ anyIoTookPlace_{anyIoTookPlace} {}
+ RT_API_ATTRS int Begin(WorkQueue &) { return StatContinue; }
+ RT_API_ATTRS int Continue(WorkQueue &);
+
+private:
+ io::IoStatementState &io_;
+ const io::NonTbpDefinedIoTable *table_{nullptr};
+ bool &anyIoTookPlace_;
+};
+
+} // namespace io::descr
+
+struct NullTicket {
+ RT_API_ATTRS int Begin(WorkQueue &) const { return StatOk; }
+ RT_API_ATTRS int Continue(WorkQueue &) const { return StatOk; }
+};
+
+struct Ticket {
+ RT_API_ATTRS int Continue(WorkQueue &);
+ bool begun{false};
+ std::variant<NullTicket, InitializeTicket, InitializeCloneTicket,
+ FinalizeTicket, DestroyTicket, AssignTicket, DerivedAssignTicket<false>,
+ DerivedAssignTicket<true>,
+ io::descr::DescriptorIoTicket<io::Direction::Output>,
+ io::descr::DescriptorIoTicket<io::Direction::Input>,
+ io::descr::DerivedIoTicket<io::Direction::Output>,
+ io::descr::DerivedIoTicket<io::Direction::Input>>
+ u;
+};
+
+class WorkQueue {
+public:
+ RT_API_ATTRS explicit WorkQueue(Terminator &terminator)
+ : termin...
[truncated]
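The two loop base classes in work-queue.h differ only in which loop is outermost. As a rough illustration of the ComponentsOverElements order (components outside, elements inside), here is a minimal sketch that replaces descriptors and type information with plain counters. The names MiniComponentsOverElements, Step, and Trace are hypothetical and do not appear in the patch, and the per-element phase counter is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Visit = std::pair<std::size_t, std::size_t>; // (component, element)

// Hypothetical simplification: counters instead of descriptors and type info.
class MiniComponentsOverElements {
public:
  MiniComponentsOverElements(std::size_t components, std::size_t elements)
      : components_{components}, elements_{elements} {
    if (elements_ == 0) {
      componentAt_ = components_; // no elements: nothing to visit at all
    }
  }
  bool IsComplete() const { return componentAt_ >= components_; }
  void Advance() {
    ++elementAt_;
    if (elementAt_ >= elements_) { // inner loop done: move to next component
      elementAt_ = 0;
      ++componentAt_;
    }
  }
  std::size_t componentAt() const { return componentAt_; }
  std::size_t elementAt() const { return elementAt_; }

private:
  std::size_t components_;
  std::size_t elements_;
  std::size_t componentAt_{0};
  std::size_t elementAt_{0};
};

// One resumable step, loosely analogous to a ticket's Continue().
// Returns true while more work remains.
bool Step(MiniComponentsOverElements &at, std::vector<Visit> &log) {
  if (at.IsComplete()) {
    return false;
  }
  log.emplace_back(at.componentAt(), at.elementAt());
  at.Advance();
  return !at.IsComplete();
}

// Drives Step() to completion and returns the order of visits.
std::vector<Visit> Trace(std::size_t components, std::size_t elements) {
  MiniComponentsOverElements at{components, elements};
  std::vector<Visit> log;
  while (Step(at, log)) {
  }
  return log;
}
```

Because all position state lives in the object, a caller can stop after any Step() and resume later, which is the property the ticket framework depends on.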
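The type-info.h hunk above changes isTypeBound_ from a flag into a binding index stored with a +1 bias, so that 0 still means "not type-bound". The sketch below illustrates only that encoding and the lookup order in GetProc(). MiniBinding, MiniSpecialBinding, and the three procedures are hypothetical stand-ins, not the runtime's real types.

```cpp
#include <cassert>
#include <cstdint>

using Proc = int (*)(); // hypothetical procedure signature

struct MiniBinding { // hypothetical stand-in for typeInfo::Binding
  Proc proc;
};

class MiniSpecialBinding {
public:
  // bindingIndexPlusOne == 0 means "not type-bound".
  MiniSpecialBinding(Proc proc, std::uint8_t bindingIndexPlusOne)
      : proc_{proc}, isTypeBound_{bindingIndexPlusOne} {}
  bool IsTypeBound() const { return isTypeBound_ != 0; }
  // With a binding table and a type-bound special binding, the table entry
  // is used; otherwise the directly stored procedure is returned.
  Proc GetProc(const MiniBinding *bindings = nullptr) const {
    if (bindings && isTypeBound_ > 0) {
      return bindings[isTypeBound_ - 1].proc; // undo the +1 bias
    }
    return proc_;
  }

private:
  Proc proc_;
  std::uint8_t isTypeBound_;
};

int DirectProc() { return 1; }
int FirstTableProc() { return 10; }
int SecondTableProc() { return 20; }
```

A std::uint8_t with this bias can address at most 255 table entries. Whether that limit matters in practice is not stated in the visible part of the diff.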
Thanks for the fix!
It fixed the reducer. However, it regressed the original test case, in which both b1 and c1 are arrays; the bounds checking fails with this patch. Here is the reducer.
module m
  type base
    integer :: i
  contains
    procedure :: bassgn
    generic :: assignment(=) => bassgn
  end type
  type, extends(base) :: child
    integer :: j
  contains
    procedure :: cassgn
    generic :: assignment(=) => cassgn
  end type
  type container
    class(base), allocatable :: b1(:)
    class(child), allocatable :: c1(:)
  end type
  interface assignment(=)
    module procedure arraytoarray
  end interface
contains
  impure elemental subroutine bassgn ( a, b )
    class(base), intent(out) :: a
    type(base), intent(in) :: b
    a%i = b%i + 1
    select type ( a )
    type is ( child )
      a%j = b%i + 2
    end select
  end subroutine
  impure elemental subroutine cassgn ( a, b )
    class(child), intent(out) :: a
    type(child), intent(in) :: b
    a%i = b%i + 2
    a%j = b%j + 2
  end subroutine
end module
program genericAssignmentDtIntrinAssgn029
  use m
  type(container) :: c1, c2, c3
  pointer :: c2
  allocatable :: c3
  allocate ( c2, c3 )
  allocate ( c1%b1(-9:-7), c1%c1(-100:-98) )
  c1 = container( (/ base(1), base(2), base(3) /), (/ child(4,5), child(6,7), child(8,9) /) ) !<- this assignment should deallocate c1%b1 and c1%c1 first
  print *, c1%b1%i, c1%c1%i, c1%c1%j, 'bounds', lbound(c1%b1), ubound(c1%b1), lbound(c1%c1), ubound(c1%c1)
end program
The assignment to c1 should deallocate both c1%b1 and c1%c1 first, as the shape is different. So the bounds output should be 1 3 1 3.
LGTM
Never mind, it took me too long to read through it all and @DanielCChen found something I didn't.
The shape actually doesn't change here -- the components were allocated with three elements each originally. EDIT: But component assignment during intrinsic assignment should always reallocate, same shape or not, so the bounds should change regardless. Will fix.
5c23517 to f448482
f448482 to 33959da
LGTM.
All the test cases are fixed.
Thanks!
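For readers unfamiliar with the pattern, the core idea of this PR (replacing recursion with a list of suspendable tickets, where the last ticket always runs next) can be sketched in a few lines. This is a toy model under stated assumptions, not the runtime's implementation: MiniQueue, VisitTicket, Node, and VisitAll are hypothetical names, the nested data is a simple tree rather than a derived type, and each ticket pushes at most one nested ticket before suspending.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-ins for StatOk and StatContinue.
enum MiniStat { MiniOk = 0, MiniContinue = 201 };

// Hypothetical nested data, standing in for a derived type with components.
struct Node {
  std::string name;
  std::vector<Node> children;
};

class MiniQueue;

// A ticket keeps its own resumption state (nextChild) between calls.
struct VisitTicket {
  const Node *node;
  std::size_t nextChild{0};
  bool begun{false};
  int Begin(MiniQueue &);
  int Continue(MiniQueue &);
};

class MiniQueue {
public:
  void BeginVisit(const Node &node) { tickets_.push_back(VisitTicket{&node}); }
  int Run() {
    while (!tickets_.empty()) {
      auto last = std::prev(tickets_.end()); // always run the last ticket
      int status = last->begun ? last->Continue(*this) : last->Begin(*this);
      last->begun = true;
      if (status == MiniContinue) {
        continue; // suspended; a nested ticket, if any, is now last
      }
      tickets_.erase(last); // this ticket is finished
      if (status != MiniOk) {
        tickets_.clear(); // an error shuts the whole queue down
        return status;
      }
    }
    return MiniOk;
  }
  std::vector<std::string> visited;

private:
  std::list<VisitTicket> tickets_; // std::list keeps element addresses stable
};

int VisitTicket::Begin(MiniQueue &queue) {
  queue.visited.push_back(node->name);
  return MiniContinue;
}

int VisitTicket::Continue(MiniQueue &queue) {
  if (nextChild < node->children.size()) {
    queue.BeginVisit(node->children[nextChild++]); // nested ticket
    return MiniContinue; // suspend until the nested ticket completes
  }
  return MiniOk;
}

// Visits a tree in pre-order with no recursive calls.
std::vector<std::string> VisitAll(const Node &root) {
  MiniQueue queue;
  queue.BeginVisit(root);
  int status = queue.Run();
  assert(status == MiniOk);
  (void)status;
  return queue.visited;
}
```

The call depth stays constant regardless of how deeply the data nests; the depth shows up as the length of the ticket list instead, which is what makes the stack size computable ahead of time.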