Skip to content

Remove overly permissive "kill a worker" algorithm #1457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 19, 2017

Conversation

annevk
Copy link
Member

@annevk annevk commented Jun 22, 2016

There’s a new sheriff in town and workers can now no longer be killed.
“Kill” was a poor term to use anyway and also not needed as worker
termination already covered all angles we care about except for one.

Now, workers can solely be terminated due to close(), terminate(), or
disappearing documents. They can of course continued to be GC’d as long
as that is not observable, not even from the network.

Fixes #1004.

There’s a new sheriff in town and workers can now no longer be killed.
“Kill” was a poor term to use anyway and also not needed as worker
termination already covered all angles we care about except for one.

Now, workers can solely be terminated due to close(), terminate(), or
disappearing documents. They can of course continued to be GC’d as long
as that is not observable, not even from the network.

Fixes #1004.
@annevk
Copy link
Member Author

annevk commented Jun 22, 2016

@jungkees @jakearchibald you probably want to figure out how this affects service workers.

@domenic
Copy link
Member

domenic commented Jun 23, 2016

I would still like some sign-off from other worker teams besides Mozilla's saying that they intend to conform to this updated spec and not kill workers due to e.g. OOM or user experience reasons.

@annevk
Copy link
Member Author

annevk commented Jun 23, 2016

OOM is still allowed per top-level clause. No need to define that at the API-level unless we do it everywhere.

@beidson
Copy link

beidson commented Jun 27, 2016

WebKit prefers to leave the "User agents may invoke the "kill a worker" processing model on a worker at any time" language.

As Domenic mentioned, "user experience reasons" is one class of reasons why we might. There might be others.

@annevk
Copy link
Member Author

annevk commented Jun 28, 2016

@beidson do you actually use that in your implementation? Or do you have a reasonable lifetime model plus OOM clauses (which the specification already grants for all features).

@beidson
Copy link

beidson commented Jun 28, 2016

do you actually use that in your implementation? Or do you have a reasonable lifetime model plus OOM clauses (which the specification already grants for all features).

We could utilize the current clause AND have a "reasonable lifetime model plus OOM clauses", so this "OR" question is a false choice.

User experience extends to the user's entire software and hardware environment, not just a web page they're currently viewing. Also, running OOM is not the only resource constraint that might come up.

This is a 100% hypothetical example that we have not implemented, but it is realistc - If the battery on a user's iPhone is running low, and especially if the user has turned on "Low Power Mode", we might want to take drastic steps to keep a whole CPU core idle as well as avoid context switches by sharing one core, so we might want to end the worker.

If pressed, I'm sure I can come up with other realistic scenarios supported by the current general language that are not supported by the more specific OOM clause.

And then, even after I came up with that list, the current clause is a safety valve against unforeseen scenarios whereas the OOM clause is not.

@beidson
Copy link

beidson commented Jun 28, 2016

We're talking about this right now in person.

While we agree that the current language might be way too generous, removing it without a replacement - only allowing OOM killing - is definitely too restrictive.

We're collecting out thoughts, give us a bit. :)

@annevk
Copy link
Member Author

annevk commented Jun 28, 2016

Thanks. Please also talk to the folks implementing SharedArrayBuffer and consider that as web pages start to move updating their tree into workers, killing arbitrary workers is the same as killing the current tab.

Our point of view is that killing the threads independent from the browsing context they are spawn from does not gain you much and actually leads to a bunch of subtle issues.

@beidson
Copy link

beidson commented Jun 28, 2016

as web pages start to move updating their tree into workers, killing arbitrary workers is the same as killing the current tab.

I think this is a false equivalence.

Our point of view is that killing the threads independent from the browsing context they are spawn from does not gain you much

In the common case, perhaps not.

In exhaustive cases, it probably can.

and actually leads to a bunch of subtle issues.

No doubt!

@domenic domenic added needs implementer interest Moving the issue forward requires implementers to express interest do not merge yet Pull request must not be merged per rationale in comment labels Jun 29, 2016
@annevk
Copy link
Member Author

annevk commented Apr 12, 2017

I filed tc39/ecma262#886 on JavaScript since by not taking this we're contradicting that standard.

@lars-t-hansen
Copy link

I filed tc39/ecma262#886 on JavaScript since by not taking this we're contradicting that standard.

Just to repeat myself from a comment I made in that bug, killing a worker that is "holding a lock" (not a spec concept since we only have atomics) or is otherwise inside some "critical section" (also not a spec concept, ditto) will tend to lead to deadlocks as other workers can't make progress. Since the window/tab can't block on entering that critical section it will remain interactive but potentially non-functional.

All of that would be at least manageable in principle with the current worker semantics if HTML had some kind of notification mechanism for sudden worker death, but it does not. It is possible for the programmer to guard against many problems of worker execution by handling exceptions, but the worker being gunned down by the UA is not one of those. The assumption seems to have been that workers should be considered as lightweight remote hosts (they come and go and can fail) but shared memory requires them to be more like concurrent processes.

@duanyao
Copy link

duanyao commented Apr 12, 2017

Shared memory makes workers like threads, not processes. In native world, it also doesn't make sense to kill an arbitrary thread in a multi-threaded process.

If the UA want to keep the right of killing something to save resource, it can

  • Kill an entire tab (usually a background one). When a user switches to that tab again, UA just reload it, so the impact on user experience is minimized. It seems some mobile browsers already have this feature.

  • Kill workers that explicitly allow it. E.g. add an option to the constructor:

    new Worker('/path/to/js', { autokillable: true })

    Workers that share a memory must be all autokillable or not at all; they must be killed as a whole.

@annevk
Copy link
Member Author

annevk commented Apr 12, 2017

We could make this a little weaker, since it should be okay to terminate a SharedWorker and descendants for instance. We just need to draw the lines at the agent cluster, and a new SharedWorker is effectively a new agent cluster. No idea if that resolves the objection however.

@hober
Copy link
Contributor

hober commented Apr 18, 2017

@beidson's concern was about when service workers can be killed, not dedicated workers. I think it would be fine if dedicated workers "can solely be terminated due to close(), terminate(), or
disappearing documents" (and non-observably from gc).

@wanderview
Copy link
Member

For what its worth, I think there are limits to this part:

They can of course continued to be GC’d as long as that is not observable, not even from the network.

For example:

  1. You can arrange to get a ServiceWorker state change event if the Worker is GC'd while its the last controlled client.
  2. You can poll using Clients API.
  3. Perhaps APIs like broadcast channel can be used to detect GC by the absence of any more messages.
  4. Probably through IDB transactions where the worker is blocking the window from proceeding.

I'm all for making GC reasonably hard to observe, but I'm not sure we can realistically implement the guarantee in that statement.

@annevk annevk removed the do not merge yet Pull request must not be merged per rationale in comment label Apr 18, 2017
@annevk annevk assigned domenic and unassigned domenic Apr 18, 2017
@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

@domenic if you could review this that'd be appreciated. I wanted to use the GitHub UI, but with the new review suggestions feature from GitHub it appears that's broken.

I'll promise less about GC in the eventual commit message, though other than network/storage I/O I don't think we make GC observable (or intend to, anyway). It seems like service workers is poking a bunch of holes though that they should maybe fix.

@annevk annevk removed the needs implementer interest Moving the issue forward requires implementers to express interest label Apr 18, 2017
@domenic
Copy link
Member

domenic commented Apr 18, 2017

The change looks good, and thanks very much to Apple for getting back to us.

@annevk, can you help me understand why we have two algorithms in the first place, and why "terminate" is the one we want to keep? The differences seem to be that "kill" is allowed to wait a user-agent defined amount of time, and "terminate" empties the port message queue. I'm really confused why these are different in the current spec, and I'd feel better about changing the spec if I understood the original spec.

@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

I think kill a worker defines to wait to avoid exposing too much details as to why it's happening. Whereas terminate is the direct result of invoking terminate(). Though I think in practice the latter can also end up having to wait a bit due to parallelism. Anyway, that's my speculation, I can't say for sure. There's definitely room for future cleanup here.

@domenic
Copy link
Member

domenic commented Apr 18, 2017

What about the message-queue-emptying? Any idea why it wasn't there for kill? It seems like a good thing to keep, certainly, so again I'm fine with this PR as-is, but I'm just confused about the history here.

@annevk
Copy link
Member Author

annevk commented Apr 18, 2017

I found https://lists.w3.org/Archives/Public/public-html-diffs/2008Dec/0048.html after some digging which led @domenic and I to conclude that this is probably safe to merge. The algorithms used to be very different and got closer over time as well.

There is probably some further lifetime cleanup to be done, in particular around MessagePort management.

I'll merge this tomorrow if nobody has raised further concerns by then.

@annevk annevk merged commit 77c694e into master Apr 19, 2017
@annevk annevk deleted the revoking-the-license-to-kill branch April 19, 2017 07:38
@littledan
Copy link
Contributor

cc @nhiroki

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Workers: Curtail the browser's license to kill
8 participants