Commit 5215d85

background task for service zone nat (#4857)
Currently the logic for configuring NAT for service zones is deeply nested and crosses sled-agent HTTP API boundaries. The cleanest way to deliver eventual consistency for service zone NAT entries was to pull the zone information from inventory and use that to generate NAT entries to reconcile against the `ipv4_nat_entry` table. This covers us in the following scenarios:

### RSS:

* User provides configuration to RSS
* RSS process ultimately creates a sled plan and a service plan
* Application of the service plan by sled-agents creates zones
* Zone creation makes direct calls to dendrite to configure NAT (it is the only way it can be done at this time)
* Eventually the Nexus zones are launched and handoff to Nexus is complete
* The inventory task runs, recording zone locations to the db
* The service zone NAT background task reads inventory from the db and uses the data to generate records for the `ipv4_nat_entry` table, then triggers a dendrite sync
* The sync is ultimately a no-op because the NAT entries already exist in dendrite (dendrite operations are idempotent)

### Cold boot:

* Sled-agents create switch zones if they are managing a scrimlet, and subsequently create the zones written to their ledgers. This may result in direct calls to dendrite.
* Once Nexus is back up, inventory collection resumes
* The service zone NAT background task reads inventory from the db to reconcile entries in the `ipv4_nat_entry` table and then triggers a dendrite sync
* If NAT is out of date on dendrite, it is updated on trigger

### Dendrite crash

* If dendrite crashes and restarts, it immediately contacts Nexus for a re-sync (pre-existing logic from earlier NAT RPW work)
* Service zone and instance NAT entries are now both present in the RPW table, so all NAT entries are restored

### Migration / relocation of a service zone

* A new zone gets created on a sled in the rack. A direct call to dendrite is made (it uses the same logic as pre-Nexus zone creation).
* The inventory task records the new location of the service zone
* The service zone NAT background task uses inventory to update the table, adding and removing the necessary NAT entries and triggering a dendrite update

Considerations
---

Because this relies on data from the inventory task, which runs on a periodic timer (600s), and because this task also runs on a periodic timer (30s), there may be some latency in picking up changes. A few potential avenues for improvement:

* Plumb additional logic into service zone NAT configuration that enables direct updates to the `ipv4_nat_entry` table once Nexus is online. Of note, this would further bifurcate the logic of pre-Nexus and post-Nexus state management. At this moment, this seems to be the most painful approach. An argument can be made that we ultimately should be lifting the NAT configuration logic _out_ of service zone creation instead.
* Decrease the timer for the inventory task. This is the simplest change, but it would result in more frequent collection, increasing overhead. I do not know _how much_ this would increase overhead. Maybe it is negligible.
* Plumb in the ability to trigger the inventory collection task for interesting control plane events. This would let us keep the _relatively_ infrequent timing intervals but refresh on demand when needed.

Related
---

* Closes #4650
* Extracted from #4822
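The reconciliation at the heart of the background task can be sketched as a set difference between the desired NAT entries (generated from inventory) and the entries currently in the `ipv4_nat_entry` table. The types and function below are illustrative stand-ins, not the actual omicron models:

```rust
use std::collections::BTreeSet;

// Illustrative stand-in for a row in the `ipv4_nat_entry` table.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct NatEntry {
    external_address: String, // e.g. "192.168.1.31"
    first_port: u16,
    last_port: u16,
}

// Given the desired state (derived from inventory) and the current state
// (read from the database), compute which entries to add and which to remove.
// Applying both sets and then triggering a dendrite sync is safe to repeat,
// since dendrite operations are idempotent.
fn reconcile(
    desired: &BTreeSet<NatEntry>,
    current: &BTreeSet<NatEntry>,
) -> (Vec<NatEntry>, Vec<NatEntry>) {
    let to_add = desired.difference(current).cloned().collect();
    let to_remove = current.difference(desired).cloned().collect();
    (to_add, to_remove)
}

fn main() {
    let entry = |addr: &str| NatEntry {
        external_address: addr.to_string(),
        first_port: 0,
        last_port: u16::MAX,
    };
    // A service zone moved: 10.0.0.3 is stale, 10.0.0.1 is new.
    let desired: BTreeSet<_> = [entry("10.0.0.1"), entry("10.0.0.2")].into();
    let current: BTreeSet<_> = [entry("10.0.0.2"), entry("10.0.0.3")].into();
    let (add, remove) = reconcile(&desired, &current);
    assert_eq!(add, vec![entry("10.0.0.1")]);
    assert_eq!(remove, vec![entry("10.0.0.3")]);
}
```

Because the task recomputes the full desired set from inventory each activation, a missed update is corrected on the next 30s tick rather than being lost.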
1 parent 80cc001 commit 5215d85

File tree

21 files changed: +770, -31 lines

common/src/address.rs

Lines changed: 6 additions & 0 deletions
@@ -18,6 +18,12 @@ pub const AZ_PREFIX: u8 = 48;
 pub const RACK_PREFIX: u8 = 56;
 pub const SLED_PREFIX: u8 = 64;
 
+/// maximum possible value for a tcp or udp port
+pub const MAX_PORT: u16 = u16::MAX;
+
+/// minimum possible value for a tcp or udp port
+pub const MIN_PORT: u16 = u16::MIN;
+
 /// The amount of redundancy for internal DNS servers.
 ///
 /// Must be less than or equal to MAX_DNS_REDUNDANCY.
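Since `MAX_PORT` and `MIN_PORT` cover the whole `u16` domain, any single `u16` is a valid port; the constants are mainly useful for expressing full-range NAT allocations and validating inclusive ranges. A small self-contained sketch (the `valid_port_range` helper is hypothetical, not part of this commit):

```rust
// Mirrors the constants added in common/src/address.rs.
pub const MAX_PORT: u16 = u16::MAX; // 65535
pub const MIN_PORT: u16 = u16::MIN; // 0

// Hypothetical helper: check that an inclusive port range is well-formed.
// Every u16 already lies in [MIN_PORT, MAX_PORT], so ordering is the only
// remaining condition to verify.
fn valid_port_range(first: u16, last: u16) -> bool {
    (MIN_PORT..=MAX_PORT).contains(&first) && first <= last
}

fn main() {
    assert!(valid_port_range(MIN_PORT, MAX_PORT)); // full-range allocation
    assert!(!valid_port_range(1024, 80));          // reversed range rejected
}
```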

common/src/nexus_config.rs

Lines changed: 16 additions & 0 deletions
@@ -334,6 +334,8 @@ pub struct BackgroundTaskConfig {
     pub inventory: InventoryConfig,
     /// configuration for phantom disks task
     pub phantom_disks: PhantomDiskConfig,
+    /// configuration for service zone nat sync task
+    pub sync_service_zone_nat: SyncServiceZoneNatConfig,
 }
 
 #[serde_as]
@@ -376,6 +378,14 @@ pub struct NatCleanupConfig {
     pub period_secs: Duration,
 }
 
+#[serde_as]
+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+pub struct SyncServiceZoneNatConfig {
+    /// period (in seconds) for periodic activations of this background task
+    #[serde_as(as = "DurationSeconds<u64>")]
+    pub period_secs: Duration,
+}
+
 #[serde_as]
 #[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
 pub struct InventoryConfig {
@@ -517,6 +527,7 @@ mod test {
     };
     use crate::address::{Ipv6Subnet, RACK_PREFIX};
     use crate::api::internal::shared::SwitchLocation;
+    use crate::nexus_config::SyncServiceZoneNatConfig;
     use camino::{Utf8Path, Utf8PathBuf};
     use dropshot::ConfigDropshot;
     use dropshot::ConfigLogging;
@@ -665,6 +676,7 @@ mod test {
             inventory.nkeep = 11
             inventory.disable = false
             phantom_disks.period_secs = 30
+            sync_service_zone_nat.period_secs = 30
             [default_region_allocation_strategy]
             type = "random"
             seed = 0
@@ -769,6 +781,9 @@ mod test {
                 phantom_disks: PhantomDiskConfig {
                     period_secs: Duration::from_secs(30),
                 },
+                sync_service_zone_nat: SyncServiceZoneNatConfig {
+                    period_secs: Duration::from_secs(30)
+                }
             },
             default_region_allocation_strategy:
                 crate::nexus_config::RegionAllocationStrategy::Random {
@@ -827,6 +842,7 @@ mod test {
             inventory.nkeep = 3
             inventory.disable = false
             phantom_disks.period_secs = 30
+            sync_service_zone_nat.period_secs = 30
             [default_region_allocation_strategy]
             type = "random"
             "##,
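For reference, the operator-facing shape of the new knob is a single TOML key alongside the existing background-task settings. This fragment is assembled from the test fixture lines in the diff above; the table header is an assumption, since the fixtures show only the keys:

```toml
# Background-task settings; the section name below is assumed,
# only the keys appear in this commit's test fixtures.
inventory.nkeep = 3
inventory.disable = false
phantom_disks.period_secs = 30
sync_service_zone_nat.period_secs = 30
```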

dev-tools/omdb/tests/env.out

Lines changed: 12 additions & 0 deletions
@@ -70,6 +70,10 @@ task: "phantom_disks"
     detects and un-deletes phantom disks
 
 
+task: "service_zone_nat_tracker"
+    ensures service zone nat records are recorded in NAT RPW table
+
+
 ---------------------------------------------
 stderr:
 note: using Nexus URL http://127.0.0.1:REDACTED_PORT
@@ -139,6 +143,10 @@ task: "phantom_disks"
     detects and un-deletes phantom disks
 
 
+task: "service_zone_nat_tracker"
+    ensures service zone nat records are recorded in NAT RPW table
+
+
 ---------------------------------------------
 stderr:
 note: Nexus URL not specified. Will pick one from DNS.
@@ -195,6 +203,10 @@ task: "phantom_disks"
     detects and un-deletes phantom disks
 
 
+task: "service_zone_nat_tracker"
+    ensures service zone nat records are recorded in NAT RPW table
+
+
 ---------------------------------------------
 stderr:
 note: Nexus URL not specified. Will pick one from DNS.

dev-tools/omdb/tests/successes.out

Lines changed: 11 additions & 0 deletions
@@ -264,6 +264,10 @@ task: "phantom_disks"
     detects and un-deletes phantom disks
 
 
+task: "service_zone_nat_tracker"
+    ensures service zone nat records are recorded in NAT RPW table
+
+
 ---------------------------------------------
 stderr:
 note: using Nexus URL http://127.0.0.1:REDACTED_PORT/
@@ -369,6 +373,13 @@ task: "phantom_disks"
     number of phantom disks deleted: 0
     number of phantom disk delete errors: 0
 
+task: "service_zone_nat_tracker"
+  configured period: every 30s
+  currently executing: no
+  last completed activation: iter 2, triggered by an explicit signal
+    started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
+  last completion reported error: inventory collection is None
+
 ---------------------------------------------
 stderr:
 note: using Nexus URL http://127.0.0.1:REDACTED_PORT/

docs/how-to-run.adoc

Lines changed: 77 additions & 25 deletions
@@ -498,41 +498,93 @@ Follow the instructions to set up the https://github.com/oxidecomputer/oxide.rs[
 oxide auth login --host http://192.168.1.21
 ----
 
+=== Configure quotas for your silo
+
+Setting resource quotas is required before you can begin uploading images, provisioning instances, etc.
+In this example we'll update the recovery silo so we can provision instances directly from it:
+
+[source,console]
+----
+$ oxide api /v1/system/silos/recovery/quotas --method PUT --input - <<EOF
+{
+  "cpus": 9999999999,
+  "memory": 999999999999999999,
+  "storage": 999999999999999999
+}
+EOF
+
+# example response
+{
+  "cpus": 9999999999,
+  "memory": 999999999999999999,
+  "silo_id": "fa12b74d-30f8-4d5a-bc0e-4d229f13c6e5",
+  "storage": 999999999999999999
+}
+----
+
 === Create an IP pool
 
 An IP pool is needed to provide external connectivity to Instances. The addresses you use here should be addresses you've reserved from the external network (see <<_external_networking>>).
 
+Here we will first create an IP pool for the recovery silo:
+
 [source,console]
-----
-$ oxide ip-pool range add --pool default --first 192.168.1.31 --last 192.168.1.40
-success
-IpPoolRange {
-    id: 4a61e65a-d96d-4c56-9cfd-dc1e44d9e99b,
-    ip_pool_id: 1b1289a7-cefe-4a7e-a8c9-d93330846301,
-    range: V4(
-        Ipv4Range {
-            first: 192.168.1.31,
-            last: 192.168.1.40,
-        },
-    ),
-    time_created: 2023-08-02T16:31:43.679785Z,
-}
-----
+----
+$ oxide api /v1/system/ip-pools --method POST --input - <<EOF
+{
+  "name": "default",
+  "description": "default ip-pool"
+}
+EOF
+
+# example response
+{
+  "description": "default ip-pool",
+  "id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
+  "name": "default",
+  "time_created": "2024-01-16T22:51:54.679751Z",
+  "time_modified": "2024-01-16T22:51:54.679751Z"
+}
+----
+
+Now we will associate the pool with the recovery silo:
+
+[source,console]
+----
+$ oxide api /v1/system/ip-pools/default/silos --method POST --input - <<EOF
+{
+  "silo": "recovery",
+  "is_default": true
+}
+EOF
+
+# example response
+{
+  "ip_pool_id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
+  "is_default": true,
+  "silo_id": "5c0aca09-d7ee-4be6-b7b1-060655659f74"
+}
+----
 
-With SoftNPU you will generally also need to configure Proxy ARP. Below, `IP_POOL_START` and `IP_POOL_END` are the first and last addresses you used in the previous command:
+Now we will add an address range to the recovery silo:
 
 [source,console]
 ----
-# dladm won't return leading zeroes but `scadm` expects them
-$ SOFTNPU_MAC=$(dladm show-vnic sc0_1 -p -o macaddress | gsed 's/\b\(\w\)\b/0\1/g')
-$ pfexec zlogin sidecar_softnpu /softnpu/scadm \
-  --server /softnpu/server \
-  --client /softnpu/client \
-  standalone \
-  add-proxy-arp \
-  $IP_POOL_START \
-  $IP_POOL_END \
-  $SOFTNPU_MAC
+$ oxide api /v1/system/ip-pools/default/ranges/add --method POST --input - <<EOF
+{
+  "first": "$IP_POOL_START",
+  "last": "$IP_POOL_END"
+}
+EOF
+
+# example response
+{
+  "id": "6209516e-2b38-4cbd-bff4-688ffa39d50b",
+  "ip_pool_id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
+  "range": {
+    "first": "192.168.1.35",
+    "last": "192.168.1.40"
+  },
+  "time_created": "2024-01-16T22:53:43.179726Z"
+}
 ----
 
 === Create a Project and Image

nexus/db-model/src/ipv4_nat_entry.rs

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ use serde::Serialize;
 use uuid::Uuid;
 
 /// Values used to create an Ipv4NatEntry
-#[derive(Insertable, Debug, Clone)]
+#[derive(Insertable, Debug, Clone, Eq, PartialEq)]
 #[diesel(table_name = ipv4_nat_entry)]
 pub struct Ipv4NatValues {
     pub external_address: Ipv4Net,

nexus/db-model/src/ipv4net.rs

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ use std::net::Ipv4Addr;
     Clone,
     Copy,
     Debug,
+    Eq,
     PartialEq,
     AsExpression,
     FromSqlRow,

nexus/db-model/src/ipv6net.rs

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ use crate::RequestAddressError;
     Clone,
     Copy,
     Debug,
+    Eq,
     PartialEq,
     AsExpression,
     FromSqlRow,

nexus/db-model/src/macaddr.rs

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ use serde::Serialize;
     Clone,
     Copy,
     Debug,
+    Eq,
     PartialEq,
     AsExpression,
     FromSqlRow,

nexus/db-model/src/schema.rs

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ use omicron_common::api::external::SemverVersion;
 ///
 /// This should be updated whenever the schema is changed. For more details,
 /// refer to: schema/crdb/README.adoc
-pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(28, 0, 0);
+pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(29, 0, 0);
 
 table! {
     disk (id) {

nexus/db-model/src/vni.rs

Lines changed: 9 additions & 1 deletion
@@ -14,7 +14,15 @@ use serde::Deserialize;
 use serde::Serialize;
 
 #[derive(
-    Clone, Debug, Copy, AsExpression, FromSqlRow, Serialize, Deserialize,
+    Clone,
+    Debug,
+    Copy,
+    AsExpression,
+    FromSqlRow,
+    Serialize,
+    Deserialize,
+    Eq,
+    PartialEq,
 )]
 #[diesel(sql_type = sql_types::Int4)]
 pub struct Vni(pub external::Vni);
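The `Eq`/`PartialEq` derives added across these db-model types exist so that NAT values generated from inventory can be compared structurally against rows already in the database when deciding whether anything needs to change. A minimal illustration of the pattern, using simplified stand-in types rather than the actual diesel models:

```rust
// Simplified stand-ins for the db-model types gaining Eq/PartialEq.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Vni(u32);

#[derive(Debug, Clone, PartialEq, Eq)]
struct Ipv4NatValues {
    external_address: [u8; 4],
    first_port: u16,
    last_port: u16,
    vni: Vni,
}

// Structural comparison is only possible because every field type
// (including the Vni wrapper) implements PartialEq.
fn needs_update(generated: &Ipv4NatValues, existing: &Ipv4NatValues) -> bool {
    generated != existing
}

fn main() {
    let a = Ipv4NatValues {
        external_address: [192, 168, 1, 31],
        first_port: 0,
        last_port: 16383,
        vni: Vni(100),
    };
    let b = a.clone();
    assert!(!needs_update(&a, &b)); // identical values: no work to do
}
```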
