forked from torvalds/linux
V6.2 timerslack+cgroups #1
Draft
randombtree wants to merge 9 commits into master from v6.2-timerslack+cgroups
The rb_add_augmented* functions, like their equivalents in rbtree.h, remove some of the boilerplate code needed when implementing augmented rbtrees. The addition also affects the augmented callbacks: an insert callback has to be added, which slightly changes the augmented rbtree API. Signed-off-by: Roger Blomgren <[email protected]>
Augmented rbtrees can be used for e.g. specifying timeout ranges. Signed-off-by: Roger Blomgren <[email protected]>
Previously, hrtimers mostly expired at timeout + slack, because the rbtree was sorted on that value. Now, keep the hrtimer rbtree sorted on the "soft" expiry time, i.e. without the slack. The optimal timeout value for the rbtree is kept as an augmented value, thus allowing an idle system to still wait for a timer up until timeout + slack. This patch makes large timer slack values more useful, as timer timeouts can truly be merged to happen at the same timer interrupt. This work is based on patches from Venkatesh Pallipadi, albeit heavily modified. Originally-by: Venkatesh Pallipadi https://lkml.org/lkml/2011/9/23/261 Signed-off-by: Roger Blomgren <[email protected]>
This patch doesn't introduce any behavioural changes, but is a preparatory patch for a dynamic timer slack.

Conversion mostly done by Coccinelle (and some by hand):

    @ replace_ts @
    expression F;
    expression list EL1, EL2;
    struct task_struct *T;
    symbol current;
    @@
    (
    -F(EL1, T->timer_slack_ns, EL2)
    +F(EL1, get_task_timer_slack_ns(T), EL2)
    |
    -F(T->timer_slack_ns, EL2)
    +F(get_task_timer_slack_ns(T), EL2)
    |
    -F(T->timer_slack_ns, EL2)
    +F(get_task_timer_slack_ns(T), EL2)
    |
    -F(T->timer_slack_ns)
    +F(get_task_timer_slack_ns(T))
    |
    -F = T->timer_slack_ns
    +F = get_task_timer_slack_ns(T)
    |
    -F(EL1, current->timer_slack_ns, EL2)
    +F(EL1, get_task_timer_slack_ns(current), EL2)
    |
    -F(current->timer_slack_ns, EL2)
    +F(get_task_timer_slack_ns(current), EL2)
    |
    -F(current->timer_slack_ns, EL2)
    +F(get_task_timer_slack_ns(current), EL2)
    |
    -F(current->timer_slack_ns)
    +F(get_task_timer_slack_ns(current))
    |
    -F = current->timer_slack_ns
    +F = get_task_timer_slack_ns(current)
    )

Signed-off-by: Roger Blomgren <[email protected]>
This patch shouldn't change the behaviour of timer slack at all, but is a preparatory patch for cgroup-based timer slack. Signed-off-by: Roger Blomgren <[email protected]>
…eout. The softirq_expires_next is the least hard timeout value (timeout + slack) for the base, but there can be timers where timeout (sans slack) < now. As the timers are now sorted in softexpires order, we get the next timer cheaply and might as well run it if it's available, possibly avoiding a wakeup from idle later. Signed-off-by: Roger Blomgren <[email protected]>
…re idle. With hrtimer storing timers in a soft-expires order, it's cheap to look ahead if there are soft-expired timers that could be run before idling the CPU. This COULD result in power saving when using large-enough timer slack values in user space. Signed-off-by: Roger Blomgren <[email protected]>
css_filter_for_each_descendant_pre behaves like its unfiltered sibling, except that a filter function is applied on each node. If the filter returns false for a CSS node, the node and its descendants will be left out from the iterator. Signed-off-by: Roger Blomgren <[email protected]>
Cgroups can now have different timer slack values (cgroup.timer_slack_ns). The timer slack is inherited down to the descendant cgroups, which can override the inherited value for their own subtree of descendants if necessary. A process that hasn't changed its timer slack value through the appropriate prctl will use the cgroup-provided one, which can be either shorter or longer than the default 50 us timer slack previously used in Linux. The 50 us timer slack remains the default if the cgroup values are left untouched.

Example inheritance in a cg-hierarchy that has a new timer slack set as R on the root cgroup, in addition to N and O set in the corresponding descendants:

             {R,s}
             /   \
        {N,r}     {_,R}
         /           \
    {_,N}           {O,r}
      |               |
    {_,N}           {_,O}

where {X,y} denotes the timer slack (X) and the inherited slack (y). The effective timer slack is in upper case, e.g. {_,Y} means the default inherited timer slack (Y) is used. Underscore (_) denotes a default timer slack, in which case the inherited timer slack is used.

Signed-off-by: Roger Blomgren <[email protected]>
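The filtered pre-order walk described in the css_filter_for_each_descendant_pre commit above can be illustrated with a small userspace sketch. This is not the kernel iterator: `struct node`, `walk_filtered_pre`, `keep_non_negative` and `sum_ids` are hypothetical names, the walk is recursive rather than macro-based, and applying the filter to the starting node as well is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a css node (first-child / next-sibling layout). */
struct node {
    int id;
    struct node *first_child;
    struct node *next_sibling;
};

typedef bool (*node_filter_fn)(const struct node *n);
typedef void (*node_visit_fn)(const struct node *n, void *ctx);

/*
 * Pre-order walk: visit a node before its children. A node rejected by
 * the filter is skipped together with its whole subtree.
 */
static void walk_filtered_pre(const struct node *n, node_filter_fn keep,
                              node_visit_fn visit, void *ctx)
{
    if (!n || !keep(n))
        return;
    visit(n, ctx);
    for (const struct node *c = n->first_child; c; c = c->next_sibling)
        walk_filtered_pre(c, keep, visit, ctx);
}

/* Example filter (illustrative only): prune nodes with a negative id. */
static bool keep_non_negative(const struct node *n)
{
    return n->id >= 0;
}

/* Example visitor: accumulate the ids of visited nodes into an int. */
static void sum_ids(const struct node *n, void *ctx)
{
    *(int *)ctx += n->id;
}
```

A filter like this is a natural fit for propagating an inherited value: a subtree whose root has its own override can be skipped in one step instead of being visited and ignored node by node.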
b70a37a to 04b9f81
Make timer slack useful
Hrtimer background and problems
Linux has had a concept of "timer slack", but in its current implementation it only means delaying a timer by a slack time (current default 50 us). If a process wants to behave nicely and sets a larger slack, e.g. 1 second, every timer in that process will effectively be delayed by 1 second. This is partly because the hrtimers are sorted by the hard timeout (time + slack), and it would be expensive to peek through all timers to find those that have soft-expired (i.e. time < now).
As a result, timer slack has no positive effect on power savings: hrtimers can't be expired before the hard timeout, so each of them always results in a timer interrupt.
Solution
The solution is to augment the rbtree used to store hrtimers: the timers are stored in soft-expiry order (i.e. the time without the slack), and the slack propagates through the augmented rbtree, which lets us find the lowest hard timeout (i.e. time + slack) in the tree. This lowest hard timeout is used to program the timer hardware, but now we can opportunistically execute timers that have lower soft timeouts, reducing timer interrupts.
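A minimal userspace sketch of the idea, assuming a plain unbalanced binary search tree instead of the kernel's rbtree. All names here (`struct timer_node`, `timer_insert`, `next_hw_event`, `count_soft_expired`) are hypothetical and do not come from the patches; the sketch only shows how sorting on the soft expiry and carrying the minimum hard expiry as an augmented value fit together.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a timer; field names are illustrative. */
struct timer_node {
    uint64_t soft;      /* soft expiry: requested time, without slack */
    uint64_t hard;      /* hard expiry: soft + slack */
    uint64_t min_hard;  /* augmented: smallest hard expiry in this subtree */
    struct timer_node *left, *right;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Recompute the augmented value from the node itself and its children. */
static void update_min_hard(struct timer_node *n)
{
    n->min_hard = n->hard;
    if (n->left)
        n->min_hard = min_u64(n->min_hard, n->left->min_hard);
    if (n->right)
        n->min_hard = min_u64(n->min_hard, n->right->min_hard);
}

/* Insert keyed on the soft expiry, fixing up min_hard on the way back. */
static struct timer_node *timer_insert(struct timer_node *root,
                                       struct timer_node *t)
{
    if (!root) {
        t->left = t->right = NULL;
        t->min_hard = t->hard;
        return t;
    }
    if (t->soft < root->soft)
        root->left = timer_insert(root->left, t);
    else
        root->right = timer_insert(root->right, t);
    update_min_hard(root);
    return root;
}

/* Deadline to program into the timer hardware: read off the root. */
static uint64_t next_hw_event(const struct timer_node *root)
{
    return root ? root->min_hard : UINT64_MAX;
}

/* Count timers that have soft-expired at 'now' and could run right away. */
static size_t count_soft_expired(const struct timer_node *n, uint64_t now)
{
    if (!n)
        return 0;
    size_t c = count_soft_expired(n->left, now);
    if (n->soft > now)
        return c;   /* everything to the right expires later still */
    return c + 1 + count_soft_expired(n->right, now);
}
```

For example, with three timers whose (soft, hard) expiries are (100, 1100), (500, 550) and (540, 2540), the hardware is programmed for 550, and at that single wakeup all three timers have soft-expired and can run together. Sorted on the hard expiry alone, the same timers would fire at 550, 1100 and 2540.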
And another thing (possibly split it out to own PR)...
As timer slack becomes useful, changing the global default timer slack can give some power savings. The other part adds cgroup support for setting a per-cgroup timer slack. The timer slack is inherited from parent cgroups and can be overridden at any point in the hierarchy, so that a change affects only that part of the cgroup hierarchy.
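The inheritance rule can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch's implementation: `struct cg`, `struct task`, `cg_effective_slack_ns` and `task_timer_slack_ns` are hypothetical names, the value 0 is assumed to mean "not overridden", and the lookup walks up to the nearest ancestor with a value, whereas the kernel code may instead push values down to descendants.

```c
#include <stddef.h>
#include <stdint.h>

#define SLACK_UNSET      0ULL      /* assumption: 0 means "not overridden" */
#define DEFAULT_SLACK_NS 50000ULL  /* the 50 us default mentioned above */

/* Hypothetical stand-in for a cgroup with an optional slack override. */
struct cg {
    const struct cg *parent;
    uint64_t slack_ns;             /* own setting, or SLACK_UNSET */
};

/* Hypothetical stand-in for a task. */
struct task {
    const struct cg *cg;
    uint64_t prctl_slack_ns;       /* set by the task itself, or SLACK_UNSET */
};

/* The nearest cgroup (self or ancestor) with an explicit value wins. */
static uint64_t cg_effective_slack_ns(const struct cg *c)
{
    for (; c; c = c->parent)
        if (c->slack_ns != SLACK_UNSET)
            return c->slack_ns;
    return DEFAULT_SLACK_NS;
}

/* A value set by the task itself takes precedence over the cgroup's. */
static uint64_t task_timer_slack_ns(const struct task *t)
{
    if (t->prctl_slack_ns != SLACK_UNSET)
        return t->prctl_slack_ns;
    return cg_effective_slack_ns(t->cg);
}
```

Walking up on each lookup keeps the sketch short. Propagating the effective value down the tree whenever a setting changes trades a more expensive write for a constant-time read, which is likely the better fit for a value consulted on every timer start.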