-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Remove overly permissive "kill a worker" algorithm #1457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There’s a new sheriff in town and workers can now no longer be killed. “Kill” was a poor term to use anyway and also not needed as worker termination already covered all angles we care about except for one. Now, workers can solely be terminated due to close(), terminate(), or disappearing documents. They can of course continued to be GC’d as long as that is not observable, not even from the network. Fixes #1004.
@jungkees @jakearchibald you probably want to figure out how this affects service workers. |
I would still like some sign-off from other worker teams besides Mozilla's saying that they intend to conform to this updated spec and not kill workers due to e.g. OOM or user experience reasons. |
OOM is still allowed per top-level clause. No need to define that at the API-level unless we do it everywhere. |
WebKit prefers to leave the "User agents may invoke the "kill a worker" processing model on a worker at any time" language. As Domenic mentioned, "user experience reasons" is one class of reasons why we might. There might be others. |
@beidson do you actually use that in your implementation? Or do you have a reasonable lifetime model plus OOM clauses (which the specification already grants for all features). |
We could utilize the current clause AND have a "reasonable lifetime model plus OOM clauses", so this "OR" question is a false choice. User experience extends to the user's entire software and hardware environment, not just a web page they're currently viewing. Also, running OOM is not the only resource constraint that might come up. This is a 100% hypothetical example that we have not implemented, but it is realistc - If the battery on a user's iPhone is running low, and especially if the user has turned on "Low Power Mode", we might want to take drastic steps to keep a whole CPU core idle as well as avoid context switches by sharing one core, so we might want to end the worker. If pressed, I'm sure I can come up with other realistic scenarios supported by the current general language that are not supported by the more specific OOM clause. And then, even after I came up with that list, the current clause is a safety valve against unforeseen scenarios whereas the OOM clause is not. |
We're talking about this right now in person. While we agree that the current language might be way too generous, removing it without a replacement - only allowing OOM killing - is definitely too restrictive. We're collecting out thoughts, give us a bit. :) |
Thanks. Please also talk to the folks implementing Our point of view is that killing the threads independent from the browsing context they are spawn from does not gain you much and actually leads to a bunch of subtle issues. |
I think this is a false equivalence.
In the common case, perhaps not. In exhaustive cases, it probably can.
No doubt! |
I filed tc39/ecma262#886 on JavaScript since by not taking this we're contradicting that standard. |
Just to repeat myself from a comment I made in that bug, killing a worker that is "holding a lock" (not a spec concept since we only have atomics) or is otherwise inside some "critical section" (also not a spec concept, ditto) will tend to lead to deadlocks as other workers can't make progress. Since the window/tab can't block on entering that critical section it will remain interactive but potentially non-functional. All of that would be at least manageable in principle with the current worker semantics if HTML had some kind of notification mechanism for sudden worker death, but it does not. It is possible for the programmer to guard against many problems of worker execution by handling exceptions, but the worker being gunned down by the UA is not one of those. The assumption seems to have been that workers should be considered as lightweight remote hosts (they come and go and can fail) but shared memory requires them to be more like concurrent processes. |
Shared memory makes workers like threads, not processes. In native world, it also doesn't make sense to kill an arbitrary thread in a multi-threaded process. If the UA want to keep the right of killing something to save resource, it can
|
We could make this a little weaker, since it should be okay to terminate a SharedWorker and descendants for instance. We just need to draw the lines at the agent cluster, and a new SharedWorker is effectively a new agent cluster. No idea if that resolves the objection however. |
@beidson's concern was about when service workers can be killed, not dedicated workers. I think it would be fine if dedicated workers "can solely be terminated due to close(), terminate(), or |
For what its worth, I think there are limits to this part:
For example:
I'm all for making GC reasonably hard to observe, but I'm not sure we can realistically implement the guarantee in that statement. |
@domenic if you could review this that'd be appreciated. I wanted to use the GitHub UI, but with the new review suggestions feature from GitHub it appears that's broken. I'll promise less about GC in the eventual commit message, though other than network/storage I/O I don't think we make GC observable (or intend to, anyway). It seems like service workers is poking a bunch of holes though that they should maybe fix. |
The change looks good, and thanks very much to Apple for getting back to us. @annevk, can you help me understand why we have two algorithms in the first place, and why "terminate" is the one we want to keep? The differences seem to be that "kill" is allowed to wait a user-agent defined amount of time, and "terminate" empties the port message queue. I'm really confused why these are different in the current spec, and I'd feel better about changing the spec if I understood the original spec. |
I think kill a worker defines to wait to avoid exposing too much details as to why it's happening. Whereas terminate is the direct result of invoking |
What about the message-queue-emptying? Any idea why it wasn't there for kill? It seems like a good thing to keep, certainly, so again I'm fine with this PR as-is, but I'm just confused about the history here. |
I found https://lists.w3.org/Archives/Public/public-html-diffs/2008Dec/0048.html after some digging which led @domenic and I to conclude that this is probably safe to merge. The algorithms used to be very different and got closer over time as well. There is probably some further lifetime cleanup to be done, in particular around MessagePort management. I'll merge this tomorrow if nobody has raised further concerns by then. |
cc @nhiroki |
There’s a new sheriff in town and workers can now no longer be killed.
“Kill” was a poor term to use anyway and also not needed as worker
termination already covered all angles we care about except for one.
Now, workers can solely be terminated due to close(), terminate(), or
disappearing documents. They can of course continued to be GC’d as long
as that is not observable, not even from the network.
Fixes #1004.