[nexus] add background task for cleaning up abandoned VMMs #5812
Conversation
The delete query doesn't actually indicate whether the record was present or not, so we don't need to handle that case.
// - not deleted yet
.filter(dsl::time_deleted.is_null())
// - not pointed to by their corresponding instances
.left_join(
(As mentioned in the hypervisor meeting) the fact that this is a `left_join`, not an `inner_join`, I think means we would also find any "VMMs that are 'destroyed', but not 'deleted', as long as instances aren't pointing at them".
I think this is actually a nice-to-have feature -- could (should?) this background task be responsible for destroying VMMs that are in a terminal state?
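To make that concrete, here is a minimal, self-contained sketch of this style of query against a hypothetical schema (the `vmm` and `instance` table definitions, the text-valued `state` column, and the `debug_query` printing are illustrative assumptions, not omicron's real schema or code). Because the join is a `left_join` with an explicit `ON` clause, VMMs that no instance points at survive the join with NULLs on the instance side, and the final `is_null()` filter keeps exactly those rows:

```rust
// Requires diesel's `postgres` and `uuid` features.
use diesel::prelude::*;

// Hypothetical, simplified tables -- not omicron's actual schema.
diesel::table! {
    vmm (id) {
        id -> Uuid,
        time_deleted -> Nullable<Timestamptz>,
        state -> Text,
    }
}

diesel::table! {
    instance (id) {
        id -> Uuid,
        active_propolis_id -> Nullable<Uuid>,
        target_propolis_id -> Nullable<Uuid>,
    }
}

diesel::allow_tables_to_appear_in_same_query!(vmm, instance);

fn main() {
    use instance::dsl as instance_dsl;
    use vmm::dsl;

    let query = dsl::vmm
        // - in the Destroyed state
        .filter(dsl::state.eq("destroyed"))
        // - not deleted yet
        .filter(dsl::time_deleted.is_null())
        // - not pointed to by their corresponding instances: a LEFT JOIN keeps
        //   VMMs with no matching instance row (the instance side comes back NULL)
        .left_join(
            instance_dsl::instance.on(instance_dsl::active_propolis_id
                .eq(dsl::id.nullable())
                .or(instance_dsl::target_propolis_id.eq(dsl::id.nullable()))),
        )
        // Anti-join: keep only the rows where no instance matched.
        .filter(instance_dsl::id.is_null())
        .select(dsl::id);

    // No database needed: just print the SQL this query would generate.
    println!("{}", diesel::debug_query::<diesel::pg::Pg, _>(&query));
}
```

In this sketch, the `left_join` is what lets VMMs with no referencing instance row at all (for example, destroyed VMMs that were never assigned to an instance) show up in the result, which is the behavior discussed above.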
Yeah, we discussed this during the hypervisor sync --- I've updated the comments for this to more accurately reflect that this task will also clean up VMMs that are `Destroyed` and have never been assigned to an instance. The one case that this task is not responsible for is VMMs that are `Destroyed`/`Failed` while actively being used to run an instance; that is the responsibility of the (future) `instance-update` saga.
Sounds good - I think I was a bit narrowly focused on "what is the data", rather than "how did we get in this state", in my review, but your comments here make a lot of sense, especially in the context of the upcoming `instance-update` saga.
let vmm_id = vmm.id;
slog::trace!(opctx.log, "Deleting abandoned VMM"; "vmm" => %vmm_id);
// Attempt to remove the abandoned VMM's sled resource reservation.
match self.datastore.sled_reservation_delete(opctx, vmm_id).await {
The one spot of this PR that feels weird to me is the overlap with functionality in `notify_instance_updated` (`omicron/nexus/src/app/instance.rs`, line 1970 at c2f3515):

pub(crate) async fn notify_instance_updated(
When a VMM is destroyed, that function seems to:
- Update counters in the project / fleet about "virtual provisioning" (see: `virtual_provisioning_collection_delete_instance`)
- Unassign producers from oximeter (see: `unassign_producer`)
- Update instance and VMM states to indicate that the VMM has been destroyed
- Delete the sled reservation (physical usage)
- Mark the VMM deleted
(We have previously identified that it is a problem these steps aren't atomic! But that's an issue which exists independently of this PR)
In contrast, this background task seems to:
- Delete the sled reservation (physical usage)
- Mark the VMM deleted
So, I have a smattering of follow-up questions here:
- Should this background task take over this responsibility for destroying VMMs from `notify_instance_updated`? e.g., should that function start kicking this background task?
- That function also seems to destroy VMMs in the `Failed` state. Is that scope we would want this task to handle too?
- Should the "physical usage" tracking (sled reservation) and "virtual usage" tracking (`virtual_provisioning_collection...`) updates be more tightly coupled? I think they're both already atomic.
I think our discussion in the hypervisor sync and on Matrix today has cleared this up --- for readers other than @smklein, though, the goal is that eventually:
- The behavior that currently runs in `notify_instance_updated` will move to a saga (see "Perform instance state transitions in `instance-update` saga", #5749), and the Nexus upcall will just update the instance's state and kick off that saga
- When a VMM that's actively in use by an instance is destroyed/failed, a bunch of cleanup needs to happen, including virtual provisioning, Oximeter producer, and networking; the update saga will do this
- When a VMM is abandoned after a migration (either because it was a failed migration target, or because it was the old VMM left over after a successful migration), the cleanup is much simpler, as the instance is still around and still using the virtual provisioning resources. Only the physical sled resources consumed by the individual Propolis process need to be released; the other resources are still in use by the instance, just not via this VMM.
This background task is only responsible for abandoned VMMs, so it only needs to do the simple cleanup process for them; it doesn't deallocate resources owned by instances when an in-use VMM fails or is stopped.
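As a rough illustration of that "simple cleanup process", here is a hedged sketch of handling a single abandoned VMM. The `Datastore` trait below is a stand-in for the real Nexus datastore methods quoted in the diff above (`sled_reservation_delete`, `vmm_mark_deleted`); the signatures and error type are assumptions for illustration, not omicron's actual API:

```rust
use uuid::Uuid;

/// Stand-in error type; the real task uses the Nexus datastore's error type.
#[derive(Debug)]
struct DbError(String);

/// Stand-in for the two datastore calls used by the task (assumed signatures).
trait Datastore {
    /// Release the physical sled resources reserved for this VMM's Propolis.
    fn sled_reservation_delete(&self, vmm_id: Uuid) -> Result<(), DbError>;
    /// Set `time_deleted` on the VMM record; `Ok(true)` means this call did it.
    fn vmm_mark_deleted(&self, vmm_id: Uuid) -> Result<bool, DbError>;
}

/// The "simple cleanup" for one abandoned VMM: release its sled reservation,
/// then mark the VMM record deleted. Nothing owned by the *instance* (virtual
/// provisioning counters, Oximeter producer, networking) is touched here.
fn cleanup_abandoned_vmm(ds: &impl Datastore, vmm_id: Uuid) -> Result<(), DbError> {
    ds.sled_reservation_delete(vmm_id)?;
    // `false` here just means another caller already set time_deleted, which
    // is fine -- the record stays deleted either way.
    let _newly_deleted = ds.vmm_mark_deleted(vmm_id)?;
    Ok(())
}
```

Returning the error instead of aborting lets the task's main loop tally failures per VMM and keep working through the rest of the list in the same activation.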
Nice, clean work, looks good to me! Just one comment, but quite minor.
Co-authored-by: Sean Klein <[email protected]>
Note: This change is part of the ongoing instance lifecycle management work in PR #5749. It's not actually necessary on its own; it's just a component of the upcoming instance updater saga. However, I thought it would be easier to review if I factored this change out into a separate PR that can be reviewed and merged on its own.
The instance update saga (see PR #5749) will only clean up after VMMs whose IDs appear in an `instance` record. When a live migration finishes (successfully or not), we want to allow a new migration to begin as soon as possible, which means we have to unlink the "unused" side of the migration --- the source if the migration succeeded, or the target if it failed --- from the instance, even though that VMM may not be fully destroyed yet. Once this happens, the instance update saga will no longer be able to clean up these VMMs, so we'll need a separate task that cleans up these "abandoned" VMMs in the background.
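As a tiny illustration of which side of a finished migration becomes the abandoned VMM (the names here are hypothetical, not omicron's actual types):

```rust
// Hypothetical names, purely for illustration.
enum MigrationOutcome {
    Succeeded,
    Failed,
}

struct MigrationVmms {
    source_vmm_id: uuid::Uuid,
    target_vmm_id: uuid::Uuid,
}

/// Which VMM gets unlinked from the instance (and later reaped)?
fn abandoned_vmm(outcome: MigrationOutcome, vmms: &MigrationVmms) -> uuid::Uuid {
    match outcome {
        // The instance now runs on the target, so the old source is abandoned.
        MigrationOutcome::Succeeded => vmms.source_vmm_id,
        // The instance stays on the source, so the failed target is abandoned.
        MigrationOutcome::Failed => vmms.target_vmm_id,
    }
}
```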
This branch introduces an `abandoned_vmm_reaper` background task that's responsible for doing this. It queries the database to list VMMs which are:

- in the `Destroyed` state
- not deleted yet (i.e., their `time_deleted` has not been set)
- not pointed to by their corresponding instances (neither the `active_propolis_id` nor the `target_propolis_id` equals the VMM's ID)

For any VMMs returned by this query, the `abandoned_vmm_reaper` task will:

- remove the `sled_resource` reservation for that VMM
- set the `time_deleted` on the VMM record if it was not already set

This cleanup process will be executed periodically in the background.
Eventually, the background task will also be explicitly triggered by the
instance update saga when it knows it has abandoned a VMM.
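A minimal sketch of that activation model, written with plain tokio as a stand-in for Nexus's actual background-task machinery (all names and structure here are illustrative assumptions): the task wakes on a periodic timer, and other code can also activate it immediately through a shared handle.

```rust
use std::sync::Arc;
use tokio::sync::Notify;
use tokio::time::{interval, Duration};

async fn reap_abandoned_vmms() {
    // ... query for abandoned VMMs and clean each one up ...
}

/// Spawn the reaper loop; the returned handle can be used to trigger an
/// immediate activation (e.g. by the instance update saga).
fn spawn_reaper(period: Duration) -> Arc<Notify> {
    let notify = Arc::new(Notify::new());
    let trigger = notify.clone();
    tokio::spawn(async move {
        let mut ticker = interval(period);
        loop {
            // Wake on whichever comes first: the periodic tick or an
            // explicit activation from elsewhere in Nexus.
            tokio::select! {
                _ = ticker.tick() => {}
                _ = trigger.notified() => {}
            }
            reap_abandoned_vmms().await;
        }
    });
    notify
}

#[tokio::main]
async fn main() {
    let handle = spawn_reaper(Duration::from_secs(60));
    // Explicit activation, as the instance update saga would eventually do:
    handle.notify_one();
    // A real service runs forever; sleep briefly so this demo exits cleanly.
    tokio::time::sleep(Duration::from_millis(100)).await;
}
```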
As an aside, I noticed that the current implementation of `DataStore::vmm_mark_deleted` will always unconditionally set the `time_deleted` field on a VMM record, even if it's already set. This is "probably fine" for overall correctness: the VMM remains deleted, so the operation is still idempotent-ish. But it's not great, as it means that any queries for VMMs deleted before a certain timestamp may not be strictly correct, and we're updating the database more frequently than we really need to. So, I've gone ahead and changed it to only set `time_deleted` if the record's `time_deleted` is null, using `check_if_exists` so that the method still returns `Ok` if the record was already deleted --- the caller can inspect the returned `bool` to determine whether or not they were the actual deleter, but the query still doesn't fail.
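For illustration, here is a sketch of that conditional update written with plain Diesel against the same hypothetical `vmm` table used in the earlier query sketch, rather than with omicron's `check_if_exists` helper; it approximates the behavior described above by using the affected-row count to report whether this call was the one that set `time_deleted`.

```rust
// Requires diesel's `postgres`, `uuid`, and `chrono` features.
use diesel::pg::PgConnection;
use diesel::prelude::*;

// Hypothetical, simplified table -- not omicron's actual schema.
diesel::table! {
    vmm (id) {
        id -> Uuid,
        time_deleted -> Nullable<Timestamptz>,
        state -> Text,
    }
}

/// Only set `time_deleted` if it is still NULL, and report whether this call
/// was the one that actually performed the deletion.
fn vmm_mark_deleted(
    conn: &mut PgConnection,
    vmm_id: uuid::Uuid,
) -> QueryResult<bool> {
    use vmm::dsl;

    let rows = diesel::update(
        dsl::vmm
            .filter(dsl::id.eq(vmm_id))
            // Never overwrite an existing time_deleted timestamp.
            .filter(dsl::time_deleted.is_null()),
    )
    .set(dsl::time_deleted.eq(chrono::Utc::now()))
    .execute(conn)?;

    // One row updated => we were the deleter; zero rows => the record was
    // already deleted (or doesn't exist -- the real `check_if_exists`-based
    // query can distinguish those two cases, this simplified sketch cannot).
    Ok(rows == 1)
}
```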