Skip to content

[SYCL] Check that some device global uninitialized before locking #18041

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 13 additions & 4 deletions sycl/source/detail/context_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -338,16 +338,23 @@ void context_impl::removeAssociatedDeviceGlobal(const void *DeviceGlobalPtr) {
void context_impl::addDeviceGlobalInitializer(
ur_program_handle_t Program, const std::vector<device> &Devs,
const RTDeviceBinaryImage *BinImage) {
if (BinImage->getDeviceGlobals().empty())
return;
std::lock_guard<std::mutex> Lock(MDeviceGlobalInitializersMutex);
for (const device &Dev : Devs) {
auto Key = std::make_pair(Program, getSyclObjImpl(Dev)->getHandleRef());
MDeviceGlobalInitializers.emplace(Key, BinImage);
auto [Iter, Inserted] = MDeviceGlobalInitializers.emplace(Key, BinImage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of questions:

  1. I see MDeviceGlobalInitializers is an std::map. Do we really need the elements to be sorted? If not, we can change to std::unordered_map, which should provide better performance.
  2. Are we expecting the inserted elements to be consecutive? Could we use emplace_hint instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I see MDeviceGlobalInitializers is an std::map. Do we really need the elements to be sorted? If not, we can change to std::unordered_map, which should provide better performance.

Sure, done.

  1. Are we expecting the inserted elements to be consecutive? Could we use emplace_hint instead?

The key is a pair of pointers. I'm not sure what can be say about order of them.

Copy link
Contributor

@maarquitos14 maarquitos14 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key is a pair of pointers. I'm not sure what can be say about order of them.

Yep, that is my concern too. Maybe it's worth it to try and measure to see if it helps? I'll approve anyway in case you want to do that in a separate PR.

if (Inserted && !Iter->second.MDeviceGlobalsFullyInitialized)
++MDeviceGlobalNotInitializedCnt;
}
}

std::vector<ur_event_handle_t> context_impl::initializeDeviceGlobals(
ur_program_handle_t NativePrg,
const std::shared_ptr<queue_impl> &QueueImpl) {
if (!MDeviceGlobalNotInitializedCnt.load(std::memory_order_acquire))
return {};

const AdapterPtr &Adapter = getAdapter();
const DeviceImplPtr &DeviceImpl = QueueImpl->getDeviceImplPtr();
std::lock_guard<std::mutex> NativeProgramLock(MDeviceGlobalInitializersMutex);
Expand All @@ -369,16 +376,17 @@ std::vector<ur_event_handle_t> context_impl::initializeDeviceGlobals(
[&Adapter](const ur_event_handle_t &Event) {
return get_event_info<info::event::command_execution_status>(
Event, Adapter) == info::event_command_status::complete;
return false;
});
// Release the removed events.
for (auto EventIt = NewEnd; EventIt != InitEventsRef.end(); ++EventIt)
Adapter->call<UrApiKind::urEventRelease>(*EventIt);
// Remove them from the collection.
InitEventsRef.erase(NewEnd, InitEventsRef.end());
// If there are no more events, we can mark it as fully initialized.
if (InitEventsRef.empty())
if (InitEventsRef.empty()) {
InitRef.MDeviceGlobalsFullyInitialized = true;
--MDeviceGlobalNotInitializedCnt;
}
return InitEventsRef;
} else if (InitRef.MDeviceGlobalsFullyInitialized) {
// MDeviceGlobalsFullyInitialized could have been set while we were
Expand All @@ -387,7 +395,7 @@ std::vector<ur_event_handle_t> context_impl::initializeDeviceGlobals(
}

// There were no events and it was not set as fully initialized, so this is
// responsible for intializing the device globals.
// responsible for initializing the device globals.
auto DeviceGlobals = InitRef.MBinImage->getDeviceGlobals();
std::vector<std::string> DeviceGlobalIds;
DeviceGlobalIds.reserve(DeviceGlobals.size());
Expand All @@ -402,6 +410,7 @@ std::vector<ur_event_handle_t> context_impl::initializeDeviceGlobals(
// globals are trivially fully initialized and we can end early.
if (DeviceGlobalEntries.empty()) {
InitRef.MDeviceGlobalsFullyInitialized = true;
--MDeviceGlobalNotInitializedCnt;
return {};
}

Expand Down
15 changes: 13 additions & 2 deletions sycl/source/detail/context_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -307,10 +307,21 @@ class context_impl {
std::vector<ur_event_handle_t> MDeviceGlobalInitEvents;
};

std::map<std::pair<ur_program_handle_t, ur_device_handle_t>,
DeviceGlobalInitializer>
using HandleDevicePair = std::pair<ur_program_handle_t, ur_device_handle_t>;

struct HandleDevicePairHash {
std::size_t operator()(const HandleDevicePair &Key) const {
return std::hash<ur_program_handle_t>{}(Key.first) ^
std::hash<ur_device_handle_t>{}(Key.second);
}
};

std::unordered_map<HandleDevicePair, DeviceGlobalInitializer,
HandleDevicePairHash>
MDeviceGlobalInitializers;
std::mutex MDeviceGlobalInitializersMutex;
// The number of device globals that have not been initialized yet.
std::atomic<size_t> MDeviceGlobalNotInitializedCnt = 0;

// For device_global variables that are not used in any kernel code we still
// allow copy operations on them. MDeviceGlobalUnregisteredData stores the
Expand Down