V6.2 timerslack+cgroups #1

Draft: wants to merge 9 commits into master

randombtree (Owner)

Make timer slack useful

Hrtimer background and problems

Linux has a concept of "timer slack", but in its current implementation it only means delaying a timer by up to the slack time (current default: 50 us). If a process wants to behave nicely and sets a larger slack, e.g. 1 second, every timer in that process will effectively be delayed by 1 second. This is partly because hrtimers are sorted by their hard timeout (time + slack), and it would be expensive to scan through all timers to find those that have soft-expired (i.e. time < now).

This obviously means that timer slack has no positive effect on power savings: hrtimers can't be expired before their hard timeout, so they still always result in a timer interrupt.
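A simplified userspace model of the problem (not the actual kernel code; `struct model_timer` and `run_hard_ordered()` are hypothetical names). Timers are assumed to be queued in hard-expiry order, and at an interrupt they are expired front to back until the first one whose soft expiry has not passed. A timer with a large slack sits behind that one and is never inspected, even though its soft expiry passed long ago.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a timer, times in nanoseconds. */
struct model_timer {
	uint64_t soft;   /* requested expiry ("time") */
	uint64_t slack;  /* permitted extra delay */
};

static uint64_t hard_expiry(const struct model_timer *t)
{
	return t->soft + t->slack;
}

/*
 * @timers is assumed to be sorted by hard expiry (soft + slack). Expire
 * timers in that order, stopping at the first one that has not
 * soft-expired. Returns the number of timers that ran.
 */
static size_t run_hard_ordered(const struct model_timer *timers, size_t n,
			       uint64_t now)
{
	size_t ran = 0;

	for (size_t i = 0; i < n; i++) {
		/* Check the assumed ordering: ascending hard expiry. */
		assert(i == 0 ||
		       hard_expiry(&timers[i - 1]) <= hard_expiry(&timers[i]));
		if (timers[i].soft > now)
			break;  /* timers behind this one are never looked at */
		ran++;          /* "run" the timer */
	}
	return ran;
}
```

For example, with timers (soft, slack) of (100, 0), (900, 0) and (0, 1000), an interrupt at t = 100 runs only the first one; the third has been soft-expired since t = 0 but waits for a later interrupt.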

Solution

The solution is to augment the rbtree used to store hrtimers: the timers are stored in soft-expiry order (i.e. the time without the slack), and the slack propagates through the augmented rbtree, which lets us find the lowest hard timeout (i.e. time + slack) in the tree. This lowest hard timeout is used to program the timer hardware, but we can now opportunistically execute timers whose soft timeouts have already passed, reducing the number of timer interrupts.
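The augmented value can be written as a recurrence. The notation is ours, not the patch's: $e_n$ is a node's soft expiry (the rbtree key), $s_n$ its slack, and $m(n)$ the lowest hard timeout in the subtree rooted at $n$.

```math
m(n) = \min\bigl(e_n + s_n,\ m(\mathrm{left}(n)),\ m(\mathrm{right}(n))\bigr), \qquad m(\varnothing) = +\infty
```

The hardware is then programmed for $m(\mathrm{root})$. Because $m(n)$ depends only on $n$ and its two children, it only needs to be recomputed along the nodes touched by an insertion, erase or rotation, while the timers with $e_n \le \mathrm{now}$ are simply the leftmost nodes of the tree.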

And another thing (possibly to be split out into its own PR)...

As timer slack becomes useful, changing the global default timer slack can give some power savings. The other part adds cgroup support for setting a per-cgroup timer slack. The timer slack is inherited from parent cgroups and can be changed at any point to affect only parts of the cgroup hierarchy.

The rb_add_augmented* functions, like their equivalents in rbtree.h, remove some
of the boilerplate code needed when implementing augmented rbtrees.

The addition also affects the augmented callbacks, as an insert callback has
to be added, slightly changing the augmented rbtree API.

Signed-off-by: Roger Blomgren <[email protected]>
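A minimal userspace sketch of the idea behind an "add and propagate" helper. It assumes a plain unbalanced binary search tree (no rebalancing, hence no rotate callback) and hypothetical names (`aug_node`, `aug_add`, `item`); it is not the kernel's rbtree API. The caller supplies a `less` comparison and a `propagate` callback, and the helper refreshes the augmented value (here, a subtree size) on every ancestor of the inserted node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type, embedded in the user's struct (like struct rb_node). */
struct aug_node {
	struct aug_node *left, *right, *parent;
};

/* Ordering callback: true if @a sorts before @b. */
typedef bool (*aug_less_fn)(struct aug_node *a, struct aug_node *b);
/* Recompute @n's augmented value from @n and its children. */
typedef void (*aug_propagate_fn)(struct aug_node *n);

/*
 * Link @node into the tree at @root according to @less, then refresh the
 * augmented value of every node on the path back up to the root. The tree
 * is never rebalanced, so no rotation callback is needed in this sketch.
 */
static void aug_add(struct aug_node **root, struct aug_node *node,
		    aug_less_fn less, aug_propagate_fn propagate)
{
	struct aug_node **link = root, *parent = NULL;

	assert(root && node && less && propagate);
	while (*link) {
		parent = *link;
		link = less(node, parent) ? &parent->left : &parent->right;
	}
	node->left = node->right = NULL;
	node->parent = parent;
	*link = node;

	for (struct aug_node *n = node; n; n = n->parent)
		propagate(n);
}

/* Example user: ordered by key, augmented with the subtree size. */
struct item {
	int key;
	size_t size;            /* augmented value: nodes in this subtree */
	struct aug_node node;
};

#define to_item(n) ((struct item *)((char *)(n) - offsetof(struct item, node)))

static size_t item_size(struct aug_node *n)
{
	return n ? to_item(n)->size : 0;
}

static bool item_less(struct aug_node *a, struct aug_node *b)
{
	return to_item(a)->key < to_item(b)->key;
}

static void item_propagate(struct aug_node *n)
{
	to_item(n)->size = 1 + item_size(n->left) + item_size(n->right);
}
```

In a real rbtree, rebalancing rotations also reshape subtrees, which is why the kernel's augmented callbacks include propagate, copy and rotate hooks in addition to the insertion path shown here.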
Augmented rbtrees can be used for e.g. specifying timeout ranges.

Signed-off-by: Roger Blomgren <[email protected]>
Previously, hrtimers mostly expired at timeout + slack, as the rbtree was sorted
on that value. Now, keep the hrtimer rbtree sorted on the "soft" expiry time,
i.e. without the slack. The optimal timeout value for the rbtree (the lowest
timeout + slack) is kept as an augmented value, thus still allowing an idle
system to wait for a timer up until timeout + slack.

This patch will make the timer slack (at large values) more useful, as timer
timeouts can truly be merged to happen at the same timer interrupt.

This work is based on patches from Venkatesh Pallipadi, albeit heavily modified.

Originally-by: Venkatesh Pallipadi https://lkml.org/lkml/2011/9/23/261

Signed-off-by: Roger Blomgren <[email protected]>
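The merging effect can be illustrated with a small simulation. This is a model, not kernel code, and the names are hypothetical: the hardware is always programmed for the earliest hard expiry among pending timers, and every timer whose soft expiry has passed runs at that interrupt. Linear scans stand in for the augmented rbtree lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a timer, times in nanoseconds. */
struct model_timer {
	uint64_t soft;   /* expiry without slack */
	uint64_t slack;  /* permitted extra delay */
	int done;        /* set once the timer has run */
};

/*
 * Count timer interrupts when the hardware is always programmed for the
 * earliest hard expiry (soft + slack) among pending timers, and every
 * pending timer whose soft expiry has passed runs at that interrupt.
 */
static size_t count_interrupts(struct model_timer *t, size_t n)
{
	size_t irqs = 0, pending = n;

	while (pending) {
		uint64_t fire = UINT64_MAX;
		size_t before = pending;

		for (size_t i = 0; i < n; i++)        /* earliest hard expiry */
			if (!t[i].done && t[i].soft + t[i].slack < fire)
				fire = t[i].soft + t[i].slack;

		irqs++;
		for (size_t i = 0; i < n; i++)        /* run soft-expired timers */
			if (!t[i].done && t[i].soft <= fire) {
				t[i].done = 1;
				pending--;
			}

		/* The timer that set @fire is always soft-expired, so we progress. */
		assert(pending < before);
	}
	return irqs;
}
```

With a 1000 ns slack, three timers due at 0, 100 and 500 ns are all served by a single interrupt at 1000 ns; with zero slack, each of them needs its own interrupt.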
This patch doesn't introduce any behavioural changes, but is a
preparatory patch for a dynamic timer slack.

Conversion mostly done by Coccinelle (and some by hand):

@ replace_ts @
expression F;
expression list EL1, EL2;
struct task_struct *T;
symbol current;
@@
(
-F(EL1, T->timer_slack_ns, EL2)
+F(EL1, get_task_timer_slack_ns(T), EL2)
|
-F(T->timer_slack_ns, EL2)
+F(get_task_timer_slack_ns(T), EL2)
|
-F(T->timer_slack_ns)
+F(get_task_timer_slack_ns(T))
|
-F = T->timer_slack_ns
+F = get_task_timer_slack_ns(T)
|
-F(EL1, current->timer_slack_ns, EL2)
+F(EL1, get_task_timer_slack_ns(current), EL2)
|
-F(current->timer_slack_ns, EL2)
+F(get_task_timer_slack_ns(current), EL2)
|
-F(current->timer_slack_ns)
+F(get_task_timer_slack_ns(current))
|
-F = current->timer_slack_ns
+F = get_task_timer_slack_ns(current)
)

Signed-off-by: Roger Blomgren <[email protected]>
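Since the conversion is described as not changing behaviour, the accessor presumably just returns the field at this stage. A userspace sketch under that assumption, using a stand-in `struct task_struct` with only the relevant field and `uint64_t` in place of the kernel's `u64`; `example_range_end()` is a hypothetical caller in the converted form.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's struct task_struct: only the field used here. */
struct task_struct {
	uint64_t timer_slack_ns;
};

/*
 * Assumed accessor body: returns the field unchanged, so converted callers
 * behave exactly as before. A later patch can compute the value differently
 * without touching the callers again.
 */
static inline uint64_t get_task_timer_slack_ns(const struct task_struct *t)
{
	return t->timer_slack_ns;
}

/* Hypothetical converted caller: end of the allowed expiry range. */
static uint64_t example_range_end(const struct task_struct *t, uint64_t expires)
{
	uint64_t slack = get_task_timer_slack_ns(t);

	assert(expires <= UINT64_MAX - slack);  /* avoid wrap-around */
	return expires + slack;
}
```

Routing every reader through one function is what makes the later dynamic (cgroup-based) slack a local change.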
This patch shouldn't change the behaviour of timer slack at all, but is a
preparatory patch for cgroup-based timer slack.

Signed-off-by: Roger Blomgren <[email protected]>
randombtree marked this pull request as draft April 6, 2023 13:31
…eout.

softirq_expires_next is the earliest hard timeout value (timeout + slack) for
the base, but there can be timers whose timeout (sans slack) < now. As the
timers are now sorted in soft-expiry order, we get the next timer cheaply and
might as well run it if it has soft-expired, possibly avoiding a wakeup from
idle later.

Signed-off-by: Roger Blomgren <[email protected]>
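A sketch of this look-ahead, assuming the pending timers are modelled as an ascending array of soft expiries (nanoseconds) with `next` indexing the leftmost pending one; `run_soft_expired()` is a hypothetical name. Because the queue is ordered by soft expiry, the check is a single comparison at the front, and stopping at the first timer that has not soft-expired cannot skip any timer that has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * @soft holds the soft expiries of pending timers in ascending order and
 * *next is the index of the leftmost pending timer. Run every timer at the
 * front whose soft expiry has passed; slack does not matter here, since a
 * soft-expired timer may run at any time. Returns the number of timers run.
 */
static size_t run_soft_expired(const uint64_t *soft, size_t n, size_t *next,
			       uint64_t now)
{
	size_t ran = 0;

	while (*next < n && soft[*next] <= now) {
		/* Check the assumed soft-expiry ordering. */
		assert(*next == 0 || soft[*next - 1] <= soft[*next]);
		(*next)++;  /* "run" the timer by removing it from the queue */
		ran++;
	}
	return ran;
}
```

With soft expiries 100, 200 and 900 ns and now = 500 ns, the first two timers run and the scan stops at the third.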
…re idle.

With hrtimer storing timers in soft-expiry order, it's cheap to look ahead and
check whether there are soft-expired timers that could be run before idling the
CPU. This COULD result in power savings when user space uses large-enough timer
slack values.

Signed-off-by: Roger Blomgren <[email protected]>
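A sketch of the decision made before idling, under stated assumptions: `idle_sleep_ns()` and `struct model_timer` are hypothetical, the timers are given as an array sorted by soft expiry, and the minimum hard expiry is found by a linear scan where the patch would read the rbtree's augmented value instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a timer, times in nanoseconds. */
struct model_timer {
	uint64_t soft;   /* expiry without slack */
	uint64_t slack;  /* permitted extra delay */
};

/*
 * @t holds pending timers sorted by soft expiry. Returns 0 if the leftmost
 * timer has soft-expired (run timers now instead of idling); otherwise the
 * time the CPU may sleep, i.e. until the earliest hard expiry. Returns
 * UINT64_MAX when no timer is pending.
 */
static uint64_t idle_sleep_ns(const struct model_timer *t, size_t n,
			      uint64_t now)
{
	uint64_t hard_min = UINT64_MAX;

	if (n == 0)
		return UINT64_MAX;      /* nothing pending: no timer wakeup */
	if (t[0].soft <= now)
		return 0;               /* soft-expired: run it before idling */

	for (size_t i = 0; i < n; i++) {
		assert(i == 0 || t[i - 1].soft <= t[i].soft);
		if (t[i].soft + t[i].slack < hard_min)
			hard_min = t[i].soft + t[i].slack;
	}
	/* hard_min >= t[0].soft > now, so the result is positive. */
	return hard_min - now;
}
```

Only the leftmost timer needs checking for soft expiry, which is what makes the look-ahead cheap with the new ordering.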
css_filter_for_each_descendant_pre behaves like its unfiltered sibling, except
that a filter function is applied to each node. If the filter returns false
for a CSS node, the node and its descendants are left out of the iteration.

Signed-off-by: Roger Blomgren <[email protected]>
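A recursive userspace model of the filtering semantics described above. It assumes a first-child/next-sibling tree; `tnode`, `filtered_pre_order()` and the `blocked` flag are hypothetical and do not mirror the css iterator's implementation or signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node with first-child / next-sibling links. */
struct tnode {
	int id;
	bool blocked;
	struct tnode *child, *sibling;
};

typedef bool (*tnode_filter_fn)(const struct tnode *n, void *data);
typedef void (*tnode_visit_fn)(struct tnode *n, void *data);

/*
 * Pre-order walk of the subtree rooted at @n. If @filter rejects a node,
 * neither that node nor any of its descendants is visited.
 */
static void filtered_pre_order(struct tnode *n, tnode_filter_fn filter,
			       tnode_visit_fn visit, void *data)
{
	if (!filter(n, data))
		return;                 /* prune the whole subtree */
	visit(n, data);
	for (struct tnode *c = n->child; c; c = c->sibling)
		filtered_pre_order(c, filter, visit, data);
}

/* Example callbacks: record visited ids, skip blocked subtrees. */
struct visit_log {
	int ids[16];
	size_t len;
};

static void log_visit(struct tnode *n, void *data)
{
	struct visit_log *log = data;

	assert(log->len < 16);
	log->ids[log->len++] = n->id;
}

static bool not_blocked(const struct tnode *n, void *data)
{
	(void)data;
	return !n->blocked;
}
```

For a root 0 with children 1 and 2, where 1 is blocked, the walk visits 0, 2 and 2's descendants, but nothing below 1.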
Cgroups can now have different timer slack values (cgroup.timer_slack_ns). The
timer slack is inherited by descendant cgroups, which can override the
inherited value for their own subtree of descendants if necessary. A process
that hasn't changed its timer slack value through the appropriate prctl
(PR_SET_TIMERSLACK) will use the one provided by its cgroup, which can be either
shorter or longer than the default 50 us timer slack previously used in Linux.
The 50 us timer slack remains the default if the cgroup values are left
untouched.

Example inheritance in a cgroup hierarchy where a new timer slack R is set on
the root cgroup, and N and O are set on the corresponding descendants:

       {R,s}
      /     \
   {N,r}   {_,R}
    /        \
  {_,N}     {O,r}
    |         |
  {_,N}     {_,O}

where {X,y} denotes the cgroup's own timer slack (X) and the slack it inherits
(y). The effective timer slack is shown in upper case. An underscore (_) means
that no timer slack has been set for that cgroup, in which case the inherited
timer slack is used, e.g. {_,Y} means the inherited timer slack (Y) is in effect.

Signed-off-by: Roger Blomgren <[email protected]>
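The inheritance rules in the example above can be modelled as follows. This is a sketch under assumptions: `struct cg` and both helpers are hypothetical, a value of 0 stands for "_" (nothing set), and the effective value is recomputed by walking up the hierarchy, whereas the diagram suggests each cgroup keeps its inherited value alongside its own setting. A nonzero `prctl_slack_ns` models a task that set its own slack through the prctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_SLACK_NS 50000ULL  /* the 50 us default */

/* Hypothetical cgroup model; own_slack_ns == 0 stands for "_" (not set). */
struct cg {
	const struct cg *parent;
	uint64_t own_slack_ns;
};

/* Effective slack: own value if set, else the nearest ancestor's, else default. */
static uint64_t cg_effective_slack(const struct cg *c)
{
	for (; c; c = c->parent)
		if (c->own_slack_ns)
			return c->own_slack_ns;
	return DEFAULT_SLACK_NS;
}

/* A task's slack: its own prctl value if it set one, else its cgroup's. */
static uint64_t task_effective_slack(uint64_t prctl_slack_ns, const struct cg *c)
{
	return prctl_slack_ns ? prctl_slack_ns : cg_effective_slack(c);
}
```

With R = 1 ms, N = 200 us and O = 5 us, the {_,N} cgroups resolve to 200 us, {_,R} to 1 ms and {_,O} to 5 us, matching the diagram; a hierarchy where nothing is set resolves to the 50 us default.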
randombtree force-pushed the v6.2-timerslack+cgroups branch from b70a37a to 04b9f81 April 11, 2023 09:23
randombtree self-assigned this Apr 11, 2023