
Commit 4fd9dd2

[nexus] Add a schema change to fix instance counter underflow (#5838)
This is a follow-up PR to #5830, which fixed the root cause. Due to a bug in the virtual provisioning query, it was possible to undercount the virtual provisioning information for instances, which would cause an integer underflow in the "total CPU/RAM provisioned" counters for a project, silo, or fleet. Although #5830 fixed the root cause, systems already in the field may still hold invalid values if they hit this bug. This PR adds a schema change that resets these values to a known state, exploiting the fact that schema changes occur while instances are offline.
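A minimal Rust sketch of the failure mode follows. The `ProvisioningCounter` type and its methods are hypothetical, and the sketch assumes an unsigned counter; this commit does not show omicron's actual column types or update queries. If an instance's CPUs were never added to the total, releasing them later subtracts more than was ever provisioned.

```rust
// Hypothetical model of a provisioning total; not omicron's real types.
#[derive(Debug, Default)]
struct ProvisioningCounter {
    cpus_provisioned: u64,
}

impl ProvisioningCounter {
    fn provision(&mut self, cpus: u64) {
        self.cpus_provisioned += cpus;
    }

    /// Releases `cpus`, returning `None` (and leaving the total unchanged)
    /// when the subtraction would underflow.
    fn try_release(&mut self, cpus: u64) -> Option<u64> {
        let next = self.cpus_provisioned.checked_sub(cpus)?;
        self.cpus_provisioned = next;
        Some(next)
    }
}

fn main() {
    let mut counter = ProvisioningCounter::default();
    // Instance A is counted correctly.
    counter.provision(2);
    // Instance B's 4 CPUs were never added (the undercount), so releasing them fails.
    assert_eq!(counter.try_release(4), None);
    // Plain wrapping arithmetic would instead produce a huge bogus total.
    assert_eq!(2u64.wrapping_sub(4), u64::MAX - 1);
    // The schema change resets the total while no instances are running.
    counter.cpus_provisioned = 0;
    println!("after reset: {}", counter.cpus_provisioned);
}
```

Resetting to zero is only correct because schema changes run with instances offline, so the true CPU/RAM total at that moment is zero.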
Parent commit: 01bc9ad


3 files changed (+32 additions, -2 deletions)


nexus/db-model/src/schema_versions.rs

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,7 @@ use std::collections::BTreeMap;
 ///
 /// This must be updated when you change the database schema. Refer to
 /// schema/crdb/README.adoc in the root of this repository for details.
-pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(71, 0, 0);
+pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(72, 0, 0);
 
 /// List of all past database schema versions, in *reverse* order
 ///
@@ -29,6 +29,7 @@ static KNOWN_VERSIONS: Lazy<Vec<KnownVersion>> = Lazy::new(|| {
 // | leaving the first copy as an example for the next person.
 // v
 // KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
+KnownVersion::new(72, "fix-provisioning-counters"),
 KnownVersion::new(71, "add-saga-unwound-vmm-state"),
 KnownVersion::new(70, "separate-instance-and-vmm-states"),
 KnownVersion::new(69, "expose-stage0"),
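The Rust change follows the convention visible in this diff: `SCHEMA_VERSION` is bumped, and a new `KnownVersion` entry naming the migration directory goes at the top of the reverse-ordered list. The sketch below checks that convention using simplified, hypothetical stand-ins for `SemverVersion` and `KnownVersion`. It is not omicron's actual validation code, and the rule that major versions decrease by exactly one is an assumption drawn from the `next_int` template comment.

```rust
// Simplified, hypothetical stand-ins for the types used in schema_versions.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SemverVersion(u64, u64, u64);

struct KnownVersion {
    semver: SemverVersion,
    relative_path: &'static str,
}

impl KnownVersion {
    fn new(major: u64, relative_path: &'static str) -> Self {
        KnownVersion { semver: SemverVersion(major, 0, 0), relative_path }
    }
}

const SCHEMA_VERSION: SemverVersion = SemverVersion(72, 0, 0);

/// Checks that the newest (first) entry equals SCHEMA_VERSION and that
/// each entry's major version is exactly one more than the next entry's.
fn check_versions(known: &[KnownVersion]) -> Result<(), String> {
    let newest = known.first().ok_or("no known versions")?;
    if newest.semver != SCHEMA_VERSION {
        return Err(format!(
            "newest known version {:?} does not match SCHEMA_VERSION {:?}",
            newest.semver, SCHEMA_VERSION
        ));
    }
    for pair in known.windows(2) {
        if pair[0].semver.0 != pair[1].semver.0 + 1 {
            return Err(format!(
                "{} does not directly follow {}",
                pair[0].relative_path, pair[1].relative_path
            ));
        }
    }
    Ok(())
}

fn main() {
    let known = vec![
        KnownVersion::new(72, "fix-provisioning-counters"),
        KnownVersion::new(71, "add-saga-unwound-vmm-state"),
        KnownVersion::new(70, "separate-instance-and-vmm-states"),
        KnownVersion::new(69, "expose-stage0"),
    ];
    match check_versions(&known) {
        Ok(()) => println!("versions consistent"),
        Err(e) => println!("error: {e}"),
    }
}
```

Forgetting either half of the change (the constant or the list entry) makes the first check fail, which is the kind of mistake such a check is meant to catch early.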

schema/crdb/dbinit.sql

Lines changed: 1 addition & 1 deletion
@@ -4076,7 +4076,7 @@ INSERT INTO omicron.public.db_metadata (
 version,
 target_version
 ) VALUES
-(TRUE, NOW(), NOW(), '71.0.0', NULL)
+(TRUE, NOW(), NOW(), '72.0.0', NULL)
 ON CONFLICT DO NOTHING;
 
 COMMIT;
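`dbinit.sql` records the same version string that `SCHEMA_VERSION` holds, so a freshly initialized database starts at 72.0.0. A hypothetical consistency check could extract the quoted version from the `db_metadata` insert and compare it with the Rust constant. The helper names and parsing approach below are illustrative only, and the SQL sample is an abbreviated excerpt of the hunk above.

```rust
// Hypothetical mirror of the Rust constant, as (major, minor, patch).
const SCHEMA_VERSION: (u64, u64, u64) = (72, 0, 0);

// Abbreviated excerpt of the db_metadata insert; other columns omitted.
const SAMPLE: &str = "\
INSERT INTO omicron.public.db_metadata (
    version,
    target_version
) VALUES
    (TRUE, NOW(), NOW(), '72.0.0', NULL)
ON CONFLICT DO NOTHING;";

/// Returns the first single-quoted string on a line.
fn quoted_version(line: &str) -> Option<&str> {
    let start = line.find('\'')? + 1;
    let len = line[start..].find('\'')?;
    Some(&line[start..start + len])
}

/// Parses exactly three dot-separated integers.
fn parse_semver(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// Finds the db_metadata insert, skips to its VALUES line, and parses the
/// version from the tuple on the following line.
fn dbinit_version(sql: &str) -> Option<(u64, u64, u64)> {
    let mut lines = sql.lines().map(str::trim);
    lines.by_ref().find(|l| l.starts_with("INSERT INTO omicron.public.db_metadata"))?;
    lines.by_ref().find(|l| l.ends_with("VALUES"))?;
    parse_semver(quoted_version(lines.next()?)?)
}

fn main() {
    match dbinit_version(SAMPLE) {
        Some(v) if v == SCHEMA_VERSION => println!("dbinit.sql matches SCHEMA_VERSION"),
        other => println!("mismatch: {other:?}"),
    }
}
```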
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+-- This change fixes provisioning counters, alongside the
+-- underflow fix provided in https://github.com/oxidecomputer/omicron/pull/5830.
+-- Although this underflow has been fixed, it could have resulted
+-- in invalid accounting, which is mitigated by this schema change.
+--
+-- This update is currently occurring offline, so we exploit
+-- that fact to identify that all instances *should* be terminated
+-- before racks are updated. If they aren't, and an instance is in the
+-- "running" state when an update occurs, the propolis zone would be
+-- terminated, while the running database record remains. In this case,
+-- the only action we could take on the VMM would be to delete it,
+-- which would attempt to delete the "virtual provisioning resource"
+-- record anyway. This case is already idempotent, and would be a safe
+-- operation even if the "virtual_provisioning_resource" has already
+-- been removed.
+
+SET LOCAL disallow_full_table_scans = OFF;
+
+-- First, ensure that no instance records exist.
+DELETE FROM omicron.public.virtual_provisioning_resource
+WHERE resource_type='instance';
+
+-- Next, update the collections to identify that there
+-- are no instances running.
+UPDATE omicron.public.virtual_provisioning_collection
+SET
+cpus_provisioned = 0,
+ram_provisioned = 0;
+
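The migration's effect can be modeled in memory: drop every `virtual_provisioning_resource` row whose `resource_type` is `'instance'`, then set `cpus_provisioned` and `ram_provisioned` to zero on every collection. The `Db`, `Collection`, and `release_instance` names are hypothetical, as is the `Disk` resource type. The sketch also assumes, as the SQL comment implies, that counters are decremented only when a resource row is actually deleted; that is what makes a later VMM cleanup safe after the reset.

```rust
use std::collections::HashMap;

// Hypothetical resource types; only 'instance' appears in the migration.
#[derive(Debug, Clone, PartialEq)]
enum ResourceType {
    Instance,
    Disk, // assumed extra type, kept to show the DELETE leaves it alone
}

#[derive(Debug, Clone, PartialEq)]
struct Collection {
    cpus_provisioned: i64,
    ram_provisioned: i64,
}

// Toy stand-in for the two tables the migration touches.
struct Db {
    resources: HashMap<u32, ResourceType>,          // resource id -> type
    collections: HashMap<&'static str, Collection>, // e.g. project/silo/fleet
}

impl Db {
    /// Mirrors the DELETE and UPDATE statements in the migration.
    fn fix_provisioning_counters(&mut self) {
        self.resources.retain(|_, ty| *ty != ResourceType::Instance);
        for c in self.collections.values_mut() {
            c.cpus_provisioned = 0;
            c.ram_provisioned = 0;
        }
    }

    /// Decrements counters only if the resource row existed, so releasing
    /// an already-removed instance is a no-op. In this toy model every
    /// collection is treated as an ancestor of every resource.
    fn release_instance(&mut self, id: u32, cpus: i64, ram: i64) -> bool {
        let removed = self.resources.remove(&id).is_some();
        if removed {
            for c in self.collections.values_mut() {
                c.cpus_provisioned -= cpus;
                c.ram_provisioned -= ram;
            }
        }
        removed
    }
}

fn main() {
    let mut db = Db {
        resources: HashMap::from([(1, ResourceType::Instance), (2, ResourceType::Disk)]),
        // Corrupted totals left behind by the undercounting bug (illustrative).
        collections: HashMap::from([(
            "fleet",
            Collection { cpus_provisioned: -4, ram_provisioned: -1024 },
        )]),
    };
    db.fix_provisioning_counters();
    // A leftover VMM for instance 1 is cleaned up after the migration.
    let removed = db.release_instance(1, 4, 1024);
    println!("removed={removed} fleet={:?}", db.collections["fleet"]);
}
```

Because the cleanup finds no resource row to delete, the freshly reset totals stay at zero rather than being pushed negative again.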
