Remove overly permissive "kill a worker" algorithm #1457

annevk · 2016-06-22T12:16:58Z

There’s a new sheriff in town and workers can now no longer be killed.
“Kill” was a poor term to use anyway and also not needed as worker
termination already covered all angles we care about except for one.

Now, workers can solely be terminated due to close(), terminate(), or
disappearing documents. They can of course continued to be GC’d as long
as that is not observable, not even from the network.

Fixes #1004.

There’s a new sheriff in town and workers can now no longer be killed. “Kill” was a poor term to use anyway and also not needed as worker termination already covered all angles we care about except for one. Now, workers can solely be terminated due to close(), terminate(), or disappearing documents. They can of course continued to be GC’d as long as that is not observable, not even from the network. Fixes #1004.

annevk · 2016-06-22T12:22:19Z

@jungkees @jakearchibald you probably want to figure out how this affects service workers.

domenic · 2016-06-23T16:25:07Z

I would still like some sign-off from other worker teams besides Mozilla's saying that they intend to conform to this updated spec and not kill workers due to e.g. OOM or user experience reasons.

annevk · 2016-06-23T16:29:29Z

OOM is still allowed per top-level clause. No need to define that at the API-level unless we do it everywhere.

beidson · 2016-06-27T18:03:20Z

WebKit prefers to leave the "User agents may invoke the "kill a worker" processing model on a worker at any time" language.

As Domenic mentioned, "user experience reasons" is one class of reasons why we might. There might be others.

annevk · 2016-06-28T07:31:55Z

@beidson do you actually use that in your implementation? Or do you have a reasonable lifetime model plus OOM clauses (which the specification already grants for all features).

beidson · 2016-06-28T15:32:56Z

do you actually use that in your implementation? Or do you have a reasonable lifetime model plus OOM clauses (which the specification already grants for all features).

We could utilize the current clause AND have a "reasonable lifetime model plus OOM clauses", so this "OR" question is a false choice.

User experience extends to the user's entire software and hardware environment, not just a web page they're currently viewing. Also, running OOM is not the only resource constraint that might come up.

This is a 100% hypothetical example that we have not implemented, but it is realistc - If the battery on a user's iPhone is running low, and especially if the user has turned on "Low Power Mode", we might want to take drastic steps to keep a whole CPU core idle as well as avoid context switches by sharing one core, so we might want to end the worker.

If pressed, I'm sure I can come up with other realistic scenarios supported by the current general language that are not supported by the more specific OOM clause.

And then, even after I came up with that list, the current clause is a safety valve against unforeseen scenarios whereas the OOM clause is not.

beidson · 2016-06-28T16:33:55Z

We're talking about this right now in person.

While we agree that the current language might be way too generous, removing it without a replacement - only allowing OOM killing - is definitely too restrictive.

We're collecting out thoughts, give us a bit. :)

annevk · 2016-06-28T16:42:27Z

Thanks. Please also talk to the folks implementing SharedArrayBuffer and consider that as web pages start to move updating their tree into workers, killing arbitrary workers is the same as killing the current tab.

Our point of view is that killing the threads independent from the browsing context they are spawn from does not gain you much and actually leads to a bunch of subtle issues.

beidson · 2016-06-28T16:47:00Z

as web pages start to move updating their tree into workers, killing arbitrary workers is the same as killing the current tab.

I think this is a false equivalence.

Our point of view is that killing the threads independent from the browsing context they are spawn from does not gain you much

In the common case, perhaps not.

In exhaustive cases, it probably can.

and actually leads to a bunch of subtle issues.

No doubt!

annevk · 2017-04-12T08:04:25Z

I filed tc39/ecma262#886 on JavaScript since by not taking this we're contradicting that standard.

lars-t-hansen · 2017-04-12T09:22:24Z

I filed tc39/ecma262#886 on JavaScript since by not taking this we're contradicting that standard.

Just to repeat myself from a comment I made in that bug, killing a worker that is "holding a lock" (not a spec concept since we only have atomics) or is otherwise inside some "critical section" (also not a spec concept, ditto) will tend to lead to deadlocks as other workers can't make progress. Since the window/tab can't block on entering that critical section it will remain interactive but potentially non-functional.

All of that would be at least manageable in principle with the current worker semantics if HTML had some kind of notification mechanism for sudden worker death, but it does not. It is possible for the programmer to guard against many problems of worker execution by handling exceptions, but the worker being gunned down by the UA is not one of those. The assumption seems to have been that workers should be considered as lightweight remote hosts (they come and go and can fail) but shared memory requires them to be more like concurrent processes.

duanyao · 2017-04-12T10:47:37Z

Shared memory makes workers like threads, not processes. In native world, it also doesn't make sense to kill an arbitrary thread in a multi-threaded process.

If the UA want to keep the right of killing something to save resource, it can

Kill an entire tab (usually a background one). When a user switches to that tab again, UA just reload it, so the impact on user experience is minimized. It seems some mobile browsers already have this feature.
Kill workers that explicitly allow it. E.g. add an option to the constructor:
```
new Worker('/path/to/js', { autokillable: true })
```
Workers that share a memory must be all autokillable or not at all; they must be killed as a whole.

annevk · 2017-04-12T10:51:41Z

We could make this a little weaker, since it should be okay to terminate a SharedWorker and descendants for instance. We just need to draw the lines at the agent cluster, and a new SharedWorker is effectively a new agent cluster. No idea if that resolves the objection however.

hober · 2017-04-18T18:12:27Z

@beidson's concern was about when service workers can be killed, not dedicated workers. I think it would be fine if dedicated workers "can solely be terminated due to close(), terminate(), or
disappearing documents" (and non-observably from gc).

wanderview · 2017-04-18T18:32:24Z

For what its worth, I think there are limits to this part:

They can of course continued to be GC’d as long as that is not observable, not even from the network.

For example:

You can arrange to get a ServiceWorker state change event if the Worker is GC'd while its the last controlled client.
You can poll using Clients API.
Perhaps APIs like broadcast channel can be used to detect GC by the absence of any more messages.
Probably through IDB transactions where the worker is blocking the window from proceeding.

I'm all for making GC reasonably hard to observe, but I'm not sure we can realistically implement the guarantee in that statement.

annevk · 2017-04-18T18:50:44Z

@domenic if you could review this that'd be appreciated. I wanted to use the GitHub UI, but with the new review suggestions feature from GitHub it appears that's broken.

I'll promise less about GC in the eventual commit message, though other than network/storage I/O I don't think we make GC observable (or intend to, anyway). It seems like service workers is poking a bunch of holes though that they should maybe fix.

domenic · 2017-04-18T18:55:01Z

The change looks good, and thanks very much to Apple for getting back to us.

@annevk, can you help me understand why we have two algorithms in the first place, and why "terminate" is the one we want to keep? The differences seem to be that "kill" is allowed to wait a user-agent defined amount of time, and "terminate" empties the port message queue. I'm really confused why these are different in the current spec, and I'd feel better about changing the spec if I understood the original spec.

annevk · 2017-04-18T19:01:31Z

I think kill a worker defines to wait to avoid exposing too much details as to why it's happening. Whereas terminate is the direct result of invoking terminate(). Though I think in practice the latter can also end up having to wait a bit due to parallelism. Anyway, that's my speculation, I can't say for sure. There's definitely room for future cleanup here.

domenic · 2017-04-18T19:19:30Z

What about the message-queue-emptying? Any idea why it wasn't there for kill? It seems like a good thing to keep, certainly, so again I'm fine with this PR as-is, but I'm just confused about the history here.

annevk · 2017-04-18T19:49:52Z

I found https://lists.w3.org/Archives/Public/public-html-diffs/2008Dec/0048.html after some digging which led @domenic and I to conclude that this is probably safe to merge. The algorithms used to be very different and got closer over time as well.

There is probably some further lifetime cleanup to be done, in particular around MessagePort management.

I'll merge this tomorrow if nobody has raised further concerns by then.

littledan · 2017-04-19T13:49:45Z

cc @nhiroki

domenic added normative change topic: workers labels Jun 23, 2016

domenic added needs implementer interest Moving the issue forward requires implementers to express interest do not merge yet Pull request must not be merged per rationale in comment labels Jun 29, 2016

annevk mentioned this pull request Apr 12, 2017

Do Agent Cluster requirements have TC39 consensus? tc39/ecma262#886

Closed

annevk removed the do not merge yet Pull request must not be merged per rationale in comment label Apr 18, 2017

annevk assigned domenic and unassigned domenic Apr 18, 2017

annevk removed the needs implementer interest Moving the issue forward requires implementers to express interest label Apr 18, 2017

annevk merged commit 77c694e into master Apr 19, 2017

annevk deleted the revoking-the-license-to-kill branch April 19, 2017 07:38

Remove overly permissive "kill a worker" algorithm #1457

Remove overly permissive "kill a worker" algorithm #1457

Uh oh!

Conversation

annevk commented Jun 22, 2016

Uh oh!

annevk commented Jun 22, 2016

Uh oh!

domenic commented Jun 23, 2016

Uh oh!

annevk commented Jun 23, 2016

Uh oh!

beidson commented Jun 27, 2016

Uh oh!

annevk commented Jun 28, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

beidson commented Jun 28, 2016

Uh oh!

beidson commented Jun 28, 2016

Uh oh!

annevk commented Jun 28, 2016

Uh oh!

beidson commented Jun 28, 2016

Uh oh!

annevk commented Apr 12, 2017

Uh oh!

lars-t-hansen commented Apr 12, 2017

Uh oh!

duanyao commented Apr 12, 2017

Uh oh!

annevk commented Apr 12, 2017

Uh oh!

hober commented Apr 18, 2017

Uh oh!

wanderview commented Apr 18, 2017

Uh oh!

annevk commented Apr 18, 2017

Uh oh!

domenic commented Apr 18, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

annevk commented Apr 18, 2017

Uh oh!

domenic commented Apr 18, 2017

Uh oh!

annevk commented Apr 18, 2017

Uh oh!

littledan commented Apr 19, 2017

Uh oh!

Uh oh!

annevk commented Jun 28, 2016 •

edited

Loading

domenic commented Apr 18, 2017 •

edited

Loading