Presently, sled-agent's `InstanceRunner` has two mechanisms for shutting
down a VMM: sending an instance state PUT request to the
`propolis-server` process for the `Stopped` state, or forcibly
terminating the `propolis-server` and tearing down the zone. At present,
when a request to stop an instance is sent to the sled-agent, it uses
the first mechanism, where Propolis is politely asked to stop the
instance --- which I'll refer to as "graceful shutdown". The forceful
termination path is used when asked to unregister an instance where the
VMM has not started up yet, when encountering an unrecoverable VMM
error, or when killing an instance that was making use of an expunged
disk.

Currently, these two paths don't really overlap: when Nexus asks
a sled-agent to stop an instance, all it will do is politely ask
Propolis to please stop the instance gracefully, and will only fall back
to violently shooting the zone in the face if Propolis returns the error
that indicates it never knew about that instance in the first place.
This means that, should a VMM get *stuck* while shutting down the
instance, stopping it will never complete successfully, and the Propolis
zone won't get cleaned up. This can happen due to e.g. [a Crucible
activation that will never complete][1]. Thus, the sled-agent should
attempt to violently terminate a Propolis zone when a graceful shutdown
of the VMM fails to complete in a timely manner.

This commit introduces a timeout for the graceful shutdown process.
Now, when we send a PUT request to Propolis with the `Stopped` instance
state, the sled-agent will start a 10-minute timer. If no update from
Propolis that indicates the instance has transitioned to `Stopped` is
received before the timer completes, the sled-agent will proceed with
the forceful termination of the Propolis zone.
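
To make the intended control flow concrete, here is a minimal sketch of the
"wait for `Stopped`, else force-terminate" decision. It is not the actual
`InstanceRunner` code: `VmmState`, `ShutdownOutcome`, and
`await_graceful_stop` are hypothetical names, and it uses a blocking
`std::sync::mpsc` channel purely to stay self-contained, whereas the real
sled-agent is presumably asynchronous. Treating a closed update channel the
same as a timeout is also an assumption of this sketch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical, simplified VMM states (not the real Propolis API types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmmState {
    Running,
    Stopping,
    Stopped,
}

/// What the caller should do once the wait is over.
#[derive(Debug, PartialEq, Eq)]
enum ShutdownOutcome {
    /// The VMM reported `Stopped` before the deadline.
    Graceful,
    /// No `Stopped` update arrived in time; tear the zone down forcibly.
    ForceTerminate,
}

/// Wait up to `grace` for a `Stopped` update after the stop request has
/// been sent. Other state updates do not reset the deadline.
fn await_graceful_stop(
    updates: &Receiver<VmmState>,
    grace: Duration,
) -> ShutdownOutcome {
    let deadline = Instant::now() + grace;
    loop {
        // Time left until the single, fixed deadline (zero once passed).
        let remaining = deadline.saturating_duration_since(Instant::now());
        match updates.recv_timeout(remaining) {
            Ok(VmmState::Stopped) => return ShutdownOutcome::Graceful,
            // Intermediate states (e.g. `Stopping`): keep waiting.
            Ok(_) => continue,
            // Assumption: a closed channel is handled like a timeout.
            Err(RecvTimeoutError::Timeout)
            | Err(RecvTimeoutError::Disconnected) => {
                return ShutdownOutcome::ForceTerminate
            }
        }
    }
}
```

With the 10-minute timer described above, a caller would pass
`Duration::from_secs(10 * 60)` as `grace` and run the forceful zone teardown
when the result is `ShutdownOutcome::ForceTerminate`. The deadline is
computed once, so a VMM that keeps reporting `Stopping` cannot extend the
wait indefinitely.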

Fixes #4004.

[1]: #4004 (comment)