Skip to content

scx_rustland_core: Improve sync wakeups #1855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 12, 2025
Merged

scx_rustland_core: Improve sync wakeups #1855

merged 2 commits into from
May 12, 2025

Conversation

arighi
Copy link
Contributor

@arighi arighi commented May 11, 2025

Certain pipe-intensive workloads are not capable of maxing out all available CPUs.

Improve this by routing the wakee to the waker's CPU during sync wakeups. This helps maintain locality and balance load more effectively, improving overall CPU utilization.

Example (using
https://github.com/marioroy/mce-sandbox/blob/main/bin/algorithm3.pl):

$ ./algorithm3.pl 1e12 --threads=8

  • before:
    0[||||||||||||||||||      58.3%]   4[|||||||||||||||||||||||100.0%]
    1[||||||||||||||||||||||||85.9%]   5[||||||||||||||||||||||||95.6%]
    2[||||||||||||||||||||||||87.8%]   6[||||||||||||||||||||||||91.2%]
    3[|||||||||||||||||||||||100.0%]   7[||||||||||||||||||||||||94.4%]
  • after:
    0[|||||||||||||||||||||||100.0%]   4[|||||||||||||||||||||||100.0%]
    1[|||||||||||||||||||||||100.0%]   5[|||||||||||||||||||||||100.0%]
    2[|||||||||||||||||||||||100.0%]   6[|||||||||||||||||||||||100.0%]
    3[|||||||||||||||||||||||100.0%]   7[|||||||||||||||||||||||100.0%]

* never perform a direct dispatch otherwise we may have starvation
* with rapid producer/consumer workloads.
*/
if (is_wake_sync(wake_flags))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is one of the more tricky cases to handle. IIRC on large machines this can cross a lot of cross LLC migrations. I had an idea of having a waker mask on the task ctx so that if the waker has woke the task multiple times then it's maybe good to do the migration? Still feels very heuristic and hand wavy though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should eventually be handled by the load balancing algorithm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the sync wakeup scenario is tricky, because we can have sync wakeups that are pure signalling wakeups (basically with no data passed) and in this case we don't want to migrate and producer/consumer wakeups, where we want to migrate. The problem is, we don't have enough information to reliably distinguish between the two. So no matter what we do, we'll end up improving some workloads while regressing others, and vice versa.

I was considering to use a wake_sync counter per-task that decays over time: if it's over a certain threshold it's likely a signal-heavy wakeup and we don't migrate, if it's below a certain threshold is likely a handoff wkeup and we migrate. And in case of rustland_core maybe we can pass this info to the user-space scheduler. But this also feels like a very heuristic-based approach...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in case of rustland_core maybe we can pass this info to the user-space scheduler.

I like this idea, could be interesting to try a few different approaches in userspace.

Copy link
Contributor

@hodgesds hodgesds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

arighi added 2 commits May 12, 2025 07:42
Certain pipe-intensive workloads are not capable of maxing out all
available CPUs.

Improve this by routing the wakee to the waker's CPU during sync
wakeups. This helps maintain locality and balance load more effectively,
improving overall CPU utilization.

Example (using
https://github.com/marioroy/mce-sandbox/blob/main/bin/algorithm3.pl):

 $ ./algorithm3.pl 1e12 --threads=8

 - before:

    0[||||||||||||||||||      58.3%]   4[|||||||||||||||||||||||100.0%]
    1[||||||||||||||||||||||||85.9%]   5[||||||||||||||||||||||||95.6%]
    2[||||||||||||||||||||||||87.8%]   6[||||||||||||||||||||||||91.2%]
    3[|||||||||||||||||||||||100.0%]   7[||||||||||||||||||||||||94.4%]

 - after:

    0[|||||||||||||||||||||||100.0%]   4[|||||||||||||||||||||||100.0%]
    1[|||||||||||||||||||||||100.0%]   5[|||||||||||||||||||||||100.0%]
    2[|||||||||||||||||||||||100.0%]   6[|||||||||||||||||||||||100.0%]
    3[|||||||||||||||||||||||100.0%]   7[|||||||||||||||||||||||100.0%]

Signed-off-by: Andrea Righi <[email protected]>
Consider the waker's CPU as idle on a sync wakeup only if we couldn't
find any idle CPU in the system.

This fixes a performance regression (in terms of fps) with certain games
such as Rocket League.

Fixes: e4fb2c0 ("scx_rustland_core: Improve sync wakeups")
Signed-off-by: Andrea Righi <[email protected]>
@arighi arighi force-pushed the rustland-sync-wakeup branch from 870c6ae to 40a9321 Compare May 12, 2025 05:43
@arighi
Copy link
Contributor Author

arighi commented May 12, 2025

Added a commit on top to fix a performance regression with Rocket League.

@arighi
Copy link
Contributor Author

arighi commented May 12, 2025

Thinking more about the sync wakeup approach in scx_rustland_core, I think it'd be nice to pass the waker and wakeup type to the user-space scheduler. And delegate the decision to user space.

Drafting a possible implementation:

enum {
    WAKE_IRQ,
    WAKE_TASK,
    WAKE_SYNC,
}

If the current CPU is idle, then it's a wakeup from IRQ. If it's not idle, it could be still a wakeup from IRQ or a task wakeup, where current is the waker (maybe we can introduce another flag in the kernel to better distinguish between a IRQ wakeup or task wakeup - this can be useful for other schedulers as well). And for the WAKE_SYNC case we already have SCX_WAKE_SYNC.

For the waker we can just store current->pid in the task context, or 0 in case of WAKE_IRQ.

Then we add waker_pid and wake_type to the context passed to user space. WDYT?

@arighi arighi added this pull request to the merge queue May 12, 2025
Merged via the queue into main with commit 40579ba May 12, 2025
32 checks passed
@arighi arighi deleted the rustland-sync-wakeup branch May 12, 2025 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants