[external API] alerts: the renamening #8169

hawkw · 2025-05-15T19:11:03Z

In conversation with @ahl, we have determined that the external API for webhooks added in #7277 should be changed to focus on "alerts" as the first-class user-facing concept, with "webhooks" as one delivery mechanism for alerts. This way, we can talk about alerts as an entity in the API that exists independently of the webhooks that deliver them, and the same alert types can be shared with other alert delivery mechanisms if any are added in the future.

What we currently refer to as "webhook events" and "webhook event classes" are therefore renamed to "alerts" and "alert classes". The current concept of "webhook receivers" is generalized to an "alert receiver" resource, of which webhook receivers are (currently) the only subtype. This way, if we add other mechanisms of delivering alerts in the future (email, first-class Slack integration, etc), we can introduce new subtypes of alert receivers. I've restructured the API to have both /v1/alert-receivers/... and /v1/webhook-receivers/... routes, with operations common to all alert receivers (list, view, add/remove subscriptions, delete) under the alert-receivers route, and operations related to webhook-specific configuration (add/remove secrets, probe, deliveries) under the webhook-receivers route. I've also changed the AlertReceiver view to have a "kind" enum that stores the subtype-specific configuration; currently, this will only ever be "webhook", but I thought it was worth doing this now to make future additions cause less breakage for API consumers.
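To make the "kind" enum concrete, here is a rough, std-only sketch of the shape described above. It is not the actual Nexus view type: the field names (`name`, `subscriptions`, `endpoint`) and the `describe` helper are assumptions made for illustration.

```rust
/// Webhook-specific configuration (fields are illustrative assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct WebhookReceiverConfig {
    pub endpoint: String,
}

/// Subtype-specific configuration for an alert receiver.
/// Today there is only one variant; new delivery mechanisms would add more.
#[derive(Debug, Clone, PartialEq)]
pub enum AlertReceiverKind {
    Webhook(WebhookReceiverConfig),
}

/// Fields common to all alert receivers, plus the kind-specific config.
#[derive(Debug, Clone, PartialEq)]
pub struct AlertReceiver {
    pub name: String,
    pub subscriptions: Vec<String>,
    pub kind: AlertReceiverKind,
}

/// Hypothetical consumer: it matches on `kind`, so a variant added later
/// shows up as a new match arm rather than a change in the view's shape.
pub fn describe(rx: &AlertReceiver) -> String {
    match &rx.kind {
        AlertReceiverKind::Webhook(cfg) => {
            format!("{}: webhook -> {}", rx.name, cfg.endpoint)
        }
    }
}
```

The point of introducing the enum now is that consumers already write the `match`, so adding a second receiver kind later extends an existing pattern instead of reshaping the response.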

This is, admittedly, a somewhat large diff, but fortunately, most of it is just renaming stuff and moving it around. Reviewers can focus more or less exclusively on the changes to the external API routes and models, and maybe the database migrations. Any mistakes while renaming and moving things around have already been caught by the Rust compiler. :)

had to go back and un-rename some columns since apparently CRDB can't do that (sad face!)

i gotta stop forgetting expectorate tests

david-crespo · 2025-05-15T22:35:54Z

I think this is good. One thing that throws me off (but I could definitely get used to it) is that you create a webhook receiver with POST /v1/webhook-receivers but then you list and view it with /v1/alert-receivers. I can see why it works that way, and maybe with two kinds of receiver, the structure would be more obvious. On the other hand it's rare for end users to be looking at a list like what's in nexus_tags.txt. The docs site sidebar is the closest thing, but the structure there is not as visible as I'd like, mostly because we're using plain English for the titles, which I don't think we can avoid.

smklein · 2025-05-16T16:36:34Z

nexus/db-model/src/alert_subscription.rs

+    Clone, Debug, Queryable, Selectable, Insertable, Serialize, Deserialize,
+)]
+#[diesel(table_name = alert_subscription)]
+pub struct AlertRxSubscription {


Note for myself: this was moved from nexus/db-model/src/webhook_rx.rs

nexus/db-queries/src/db/datastore/saga.rs


smklein · 2025-05-16T16:41:38Z

nexus/external-api/src/lib.rs

+        receiver: Query<params::AlertReceiverSelector>,
+        state_filter: Query<params::AlertDeliveryStateFilter>,
+        pagination: Query<PaginatedByTimeAndId>,
+    ) -> Result<HttpResponseOk<ResultsPage<views::WebhookDelivery>>, HttpError>;


Is it weird that this is titled "list alert-deliveries", but it's returning "webhook delivery" objects?

namely: Should this be an "enum-of-one" if we expect there will be non-webhook alerts in the future?

So, I wasn't really sure what to do about this endpoint, honestly. Since it lists deliveries by receiver ID, the type of the response entries will all be the same type of delivery based on the type of receiver.

They could be an enum, but ideally, there would be a single enum containing the list of deliveries, so the user only has to handle the potentially variable type a single time, and once you inspect the top level enum, all the entries are the same type. I thought about doing this, but it felt a bit awkward with dropshot's ResultsPage --- though we could return an enum of multiple types of ResultsPages.

Alternatively, we could just make this endpoint specific to receiver types, so this one could be webhook-deliveries and we could just add separate delivery list endpoints if we were to add other receiver types in future. That might be the better move, since then there's no enum at all.

So, I wasn't really sure what to do about this endpoint, honestly. Since it lists deliveries by receiver ID, the type of the response entries will all be the same type of delivery based on the type of receiver.

Why would we expect these to differ by receiver type?

They could be an enum, but ideally, there would be a single enum containing the list of deliveries, so the user only has to handle the potentially variable type a single time, and once you inspect the top level enum, all the entries are the same type. I thought about doing this, but it felt a bit awkward with dropshot's ResultsPage --- though we could return an enum of multiple types of ResultsPages.

A ResultsPage of enums would be great; an enum of ResultsPage would fuck up pagination magic.

Alternatively, we could just make this endpoint specific to receiver types, so this one could be webhook-deliveries and we could just add separate delivery list endpoints if we were to add other receiver types in future. That might be the better move, since then there's no enum at all.

Again, since multiple receiver types is speculative at this point, let's choose something relatively consistent and re-evaluate when we have a second receiver.
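For reference, the "page of enums" option discussed in this thread could be sketched as below. `Page<T>` is a simplified stand-in, not dropshot's actual `ResultsPage`, and the delivery fields and the page-token scheme (last returned alert ID) are assumptions made purely for illustration.

```rust
/// Stand-in for a paginated response: one page of items plus a token
/// for the next page. (Simplified; not dropshot's actual type.)
#[derive(Debug)]
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_page: Option<String>,
}

/// Webhook delivery record (fields are illustrative assumptions).
#[derive(Debug, PartialEq)]
pub struct WebhookDelivery {
    pub alert_id: u64,
    pub status_code: Option<u16>,
}

/// Per-item enum: each delivery carries its own kind.
#[derive(Debug, PartialEq)]
pub enum AlertDelivery {
    Webhook(WebhookDelivery),
}

/// Build a page of enum-wrapped deliveries. Pagination stays generic over
/// the item type, so the paging logic is unaffected by how many variants
/// the item enum has.
pub fn page_of_deliveries(
    deliveries: Vec<WebhookDelivery>,
    limit: usize,
) -> Page<AlertDelivery> {
    let has_more = deliveries.len() > limit;
    let items: Vec<AlertDelivery> = deliveries
        .into_iter()
        .take(limit)
        .map(AlertDelivery::Webhook)
        .collect();
    // Hypothetical token: the last returned alert ID, only if items remain.
    let next_page = if has_more {
        items.last().map(|d| match d {
            AlertDelivery::Webhook(w) => w.alert_id.to_string(),
        })
    } else {
        None
    };
    Page { items, next_page }
}
```

With this shape the page wrapper is the same for every receiver kind, whereas an enum of pages would put the pagination fields inside each variant.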

smklein · 2025-05-16T16:42:53Z

nexus/tests/integration_tests/endpoints.rs

@@ -1246,7 +1249,7 @@ pub static DEMO_WEBHOOK_SECRET_CREATE: LazyLock<params::WebhookSecretCreate> =
        secret: "TRUSTNO1".to_string(),
    });

-// pub static DEMO_WEBHOOK_SUBSCRIPTION: LazyLock<shared::WebhookSubscription> =
+// pub static DEMO_ALERT_SUBSCRIPTION: LazyLock<shared::WebhookSubscription> =


Nit: I know this is commented out code (why is this here?) but should this be a shared::AlertSubscription now?

nexus/types/src/external_api/shared.rs

Co-authored-by: Sean Klein <[email protected]>

hawkw · 2025-05-16T17:16:04Z

@david-crespo re:

One thing that throws me off (but I could definitely get used to it) is that you create a webhook receiver with POST /v1/webhook-receivers but then you list and view it with /v1/alert-receivers. I can see why it works that way, and maybe with two kinds of receiver, the structure would be more obvious.

Yeah, I agree that this feels a bit weird.

One thing I considered doing is also having a GET /v1/webhook-receivers route to list/view webhook receivers only, in addition to the GET routes for /v1/alert-receivers. That would return the webhook-specific models, and the view route would 404 if you requested a receiver ID/name that was a type other than webhook. I can see a couple advantages of this: it makes the API feel a bit more "complete", and it also provides you a way to get a webhook-receiver-specific model without having to handle the enum if you know the receiver you want is a webhook receiver (and similarly, it gives you a way to list only webhook receivers). On the other hand, it means we have two separate routes that list/view the same entities, which could be confusing for users, and it requires us to maintain more endpoints. What do you think? Is it worth adding routes like that?

hawkw · 2025-05-16T17:26:22Z

Oh, @smklein, one other thing: there are a couple of places where we now have tables that have a few columns with unfortunate names ("event_class" and "event_id" rather than "alert_class" and "alert_id") because CRDB doesn't support renaming columns idempotently. Do you think it's worth changing the migrations to drop those tables and create new ones with nicer names, instead of just renaming the table?

hawkw · 2025-05-16T17:29:45Z

schema/crdb/alerts-renamening/up04.sql

Ughhhh apparently this won't work as written, since you can't "ALTER TYPE IF EXISTS":

──── STDERR: omicron-nexus::test_all integration_tests::schema::dbinit_equals_sum_of_all_up
log file: /tmp/test_all-fef1124d0eceaf7d-dbinit_equals_sum_of_all_up.3128075.0.log
note: configured to log to "/tmp/test_all-fef1124d0eceaf7d-dbinit_equals_sum_of_all_up.3128075.0.log"
thread 'integration_tests::schema::dbinit_equals_sum_of_all_up' panicked at nexus/tests/integration_tests/schema.rs:70:38:
Failed to execute update: Error { kind: Db, cause: Some(DbError { severity: "ERROR", parsed_severity: Some(Error), code: SqlState(E42601), message: "at or near \"exists\": syntax error", detail: Some("source SQL:\nALTER TYPE IF EXISTS omicron.public.webhook_event_class\n ^"), hint: Some("try \\h ALTER TYPE"), position: None, where_: None, schema: None, table: None, column: None, datatype: None, constraint: None, file: Some("lexer.go"), line: Some(271), routine: Some("Error") }) }

I guess we'll have to drop the previous enums and add new ones, that's a bummer...

ahl

took a pass comparing what we had determined in the CLI with the changes you're proposing to the API

ahl · 2025-05-16T17:28:54Z

nexus/external-api/src/lib.rs

@@ -3660,79 +3693,48 @@ pub trait NexusExternalApi {
    /// queued for re-delivery.
    #[endpoint {
        method = POST,
-        path = "/v1/webhooks/receivers/{receiver}/probe",
-        tags = ["system/webhooks"],
+        path = "/v1/webhook-receivers/{receiver}/probe",


did you consider nesting webhook-receivers under alerts?

yeah, I had initially wanted to do /v1/alert-receivers/webhooks. however, we can't nest routes under a route which can also look up a resource by name. because there's a /v1/alert-receivers/{name-or-id} route, we can't also have a /v1/alert-receivers/webhooks/{name-or-id} route, since it's unclear whether /webhooks should be treated as a receiver name or as a routable path segment.

i did also consider nesting all of this stuff under a top-level /v1/alerts, so /v1/alerts/receivers, /v1/alerts/webhook-receivers, and so on. however, /v1/alerts is also the route for looking up the actual alert resource (currently used for resend but maybe also for actually getting the payload etc in future). alerts are only looked up by UUID and never by name, so we could nest other routes under /v1/alerts, but it felt a bit weird, and i didn't want to put the alert-lookup route under /v1/alerts/alerts because...that's gross.

also, @david-crespo had previously told me that we try to keep the public API as "flat" as possible rather than nesting, and i think the /v1/alerts, /v1/alert-receivers, /v1/alert-deliveries etc structure is closer to what we've done elsewhere? if either of you have suggestions for a better structure given all that, though, i'm all ears!

@david-crespo can you comment? It seems like "as flat as possible... but not flatter" might be the addendum.
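The routing ambiguity described in this thread can be illustrated with a toy model. This is a simplified sketch, not dropshot's router: `Segment`, `parse_template`, and `conflicts` are hypothetical names, and the rule encoded here (a literal and a variable may not occupy the same position under a shared prefix) is an assumption that approximates the constraint being discussed.

```rust
/// A path segment in a route template: fixed text or a `{variable}`.
#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    Literal(String),
    Variable(String),
}

/// Parse a template like "/v1/alert-receivers/{receiver}" into segments.
pub fn parse_template(template: &str) -> Vec<Segment> {
    template
        .split('/')
        .filter(|s| !s.is_empty())
        .map(|s| {
            if s.starts_with('{') && s.ends_with('}') {
                Segment::Variable(s[1..s.len() - 1].to_string())
            } else {
                Segment::Literal(s.to_string())
            }
        })
        .collect()
}

/// Two templates conflict if, after an identical prefix, one has a literal
/// where the other has a variable: a request segment such as "webhooks"
/// could then be either a fixed path component or a receiver name.
pub fn conflicts(a: &str, b: &str) -> bool {
    let (a, b) = (parse_template(a), parse_template(b));
    for (x, y) in a.iter().zip(b.iter()) {
        match (x, y) {
            (Segment::Literal(l), Segment::Literal(r)) if l == r => continue,
            (Segment::Variable(_), Segment::Variable(_)) => continue,
            // Different literals: the routes diverge and cannot collide.
            (Segment::Literal(_), Segment::Literal(_)) => return false,
            // Literal vs. variable at the same position: ambiguous.
            _ => return true,
        }
    }
    false
}
```

Under this model, `/v1/alert-receivers/{receiver}` conflicts with `/v1/alert-receivers/webhooks/{receiver}` but not with the flat `/v1/webhook-receivers/{receiver}`, which matches the structure chosen in this PR.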

ahl · 2025-05-16T17:29:46Z

nexus/external-api/src/lib.rs

-        path = "/v1/webhooks/receivers/{receiver}/probe",
-        tags = ["system/webhooks"],
+        path = "/v1/webhook-receivers/{receiver}/probe",
+        tags = ["system/alerts"],
    }]
    async fn webhook_receiver_probe(


is this particular to webhook receivers? In the CLI we had decided to call this subcommand oxide alert receiver probe i.e. assuming it would not be specific to webhooks but general for all receivers.

it returns a response model that is webhook-specific (it contains information about the status code of the HTTP response etc).

we could make that model be an enum with a single variant if we think it's preferable for this to be a route that applies to all receiver types. but, my thinking was that it's possible there might be some future receiver types that cannot be probed (can you synchronously probe an email?)
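As a sketch of the trade-off above: if probe were a route shared by all receivers, it would need some answer for kinds that cannot be probed synchronously. The code below is illustrative only. The `Email` variant is hypothetical and does not exist in the API, `probe_any` is an invented name, and the `send` closure stands in for actually issuing an HTTP request.

```rust
/// Webhook-specific probe outcome: the HTTP status the endpoint returned,
/// or `None` if no response arrived.
#[derive(Debug, PartialEq)]
pub struct WebhookProbeResult {
    pub status_code: Option<u16>,
}

/// Receiver kinds; `Email` is hypothetical and does not exist in the API.
pub enum ReceiverKind {
    Webhook { endpoint: String },
    Email { address: String },
}

/// A probe shared by all receivers has to decide what to do for kinds with
/// no synchronous notion of "probe"; in this sketch it returns an error.
/// `send` is a stand-in for performing the HTTP request.
pub fn probe_any(
    kind: &ReceiverKind,
    send: impl Fn(&str) -> Option<u16>,
) -> Result<WebhookProbeResult, String> {
    match kind {
        ReceiverKind::Webhook { endpoint } => Ok(WebhookProbeResult {
            status_code: send(endpoint.as_str()),
        }),
        ReceiverKind::Email { .. } => {
            Err("this receiver kind cannot be probed".to_string())
        }
    }
}
```

Keeping probe under the webhook-specific routes avoids that error arm entirely, at the cost of the CLI and API naming diverging.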

ahl · 2025-05-16T17:31:22Z

nexus/external-api/src/lib.rs

+        path = "/v1/alert-deliveries",
+        tags = ["system/alerts"],
+    }]
+    async fn alert_delivery_list(


we had decided to call this alert receiver log in the CLI. It is particular to a specific receiver, right?

yes, it's particular to a receiver, which is taken as a query param. i think log is a good name for the CLI, but in the API it felt like it should be a "list of delivery resources"...it could also be /v1/alert-receivers/{name-or-id}/deliveries or something?

/v1/alert-receivers/{name-or-id}/deliveries

That's what I was thinking.

yeah, we should be able to do that. i'll note that the previous route was /v1/webhook-deliveries?receiver=... and i think that may also have been at @david-crespo's urging IIRC.

ahl · 2025-05-16T17:32:57Z

nexus/external-api/src/lib.rs

    #[endpoint {
-        method = PUT,
-        path = "/v1/webhooks/receivers/{receiver}",
-        tags = ["system/webhooks"],
+        method = POST,
+        path = "/v1/alert-receivers/{receiver}/subscriptions",
+        tags = ["system/alerts"],
    }]
-    async fn webhook_receiver_update(
+    async fn alert_receiver_subscription_add(
        rqctx: RequestContext<Self::Context>,
-        path_params: Path<params::WebhookReceiverSelector>,
-        params: TypedBody<params::WebhookReceiverUpdate>,
-    ) -> Result<HttpResponseUpdatedNoContent, HttpError>;
+        path_params: Path<params::AlertReceiverSelector>,
+        params: TypedBody<params::AlertSubscriptionCreate>,
+    ) -> Result<HttpResponseCreated<views::AlertSubscriptionCreated>, HttpError>;

-    /// Delete webhook receiver
+    /// Remove alert receiver subscription
    #[endpoint {
        method = DELETE,
-        path = "/v1/webhooks/receivers/{receiver}",
-        tags = ["system/webhooks"],
+        path = "/v1/alert-receivers/{receiver}/subscriptions/{subscription}",
+        tags = ["system/alerts"],
    }]


you made what I thought was a good suggestion in the CLI to call these oxide alert receiver subscribe and oxide alert receiver unsubscribe. Do you now prefer subscription add/remove? I think I prefer the former, but my only strong preference is that CLI and API match in this regard.

i like the "subscribe"/"unsubscribe" naming for the CLI. per this comment from @david-crespo we prefer to use verbs like "add"/"remove" in API routes because they're consistent with other API operations, rather than descriptive verbs like "subscribe"/"unsubscribe". personally i think i would strongly prefer the more descriptive verbs in the CLI and don't have strong preferences about whether we should also use that in the API...perhaps it's more important for both to be consistent with each other than to use the more descriptive verbs, i dunno...

Sounds fine; just wanted to make sure we're making the decision eyes open.

i definitely think there's value in using the same verbs in the CLI and the API, because then the CLI becomes a sort of teaching tool for the API: if you've done something manually, you know exactly where to look if you want to do it programmatically. but, i'm not totally sure how strongly we've weighed that against other concerns in the past. are there any cases where we've intentionally chosen to use different verbs (or nouns!) in the CLI, or are they always the same?


ahl · 2025-05-16T17:38:39Z

@david-crespo re:

One thing that throws me off (but I could definitely get used to it) is that you create a webhook receiver with POST /v1/webhook-receivers but then you list and view it with /v1/alert-receivers. I can see why it works that way, and maybe with two kinds of receiver, the structure would be more obvious.

What about creation via /v1/alert-receivers/webhook POST?

One thing I considered doing is also having a GET /v1/webhook-receivers route to list/view webhook receivers only, in addition to the GET routes for /v1/alert-receivers.

My suggestion is that we do one or the other for now i.e. "list all receivers" (which happen to just be webhooks) or "list webhook receivers" (which happen to be all receivers). In the future, I can't imagine the utility for listing JUST the webhooks, but I have been known to lack imagination!

That would return the webhook-specific models, and the view route would 404 if you requested a receiver ID/name that was a type other than webhook. I can see a couple advantages of this: it makes the API feel a bit more "complete", and it also provides you a way to get a webhook-receiver-specific model without having to handle the enum if you know the receiver you want is a webhook receiver (and similarly, it gives you a way to list only webhook receivers). On the other hand, it means we have two separate routes that list/view the same entities, which could be confusing for users, and it requires us to maintain more endpoints. What do you think? Is it worth adding routes like that?

Given that a second flavor of webhook is speculation at this point, I would suggest just pick something and re-evaluate when we have something more concrete in the future.

hawkw · 2025-05-16T18:41:03Z

@david-crespo re:

One thing that throws me off (but I could definitely get used to it) is that you create a webhook receiver with POST /v1/webhook-receivers but then you list and view it with /v1/alert-receivers. I can see why it works that way, and maybe with two kinds of receiver, the structure would be more obvious.

What about creation via /v1/alert-receivers/webhook POST?

Unfortunately we can't have a route like that, as receivers are looked up by name, and there's an ambiguity as to whether /webhook is interpreted as a name of a receiver to look up or a fixed path segment (as i discussed in #8169 (comment))

hawkw added 16 commits May 8, 2025 14:47

WHEW

b8fdb3a

more pain

0f3c38b

reticulating views

6694d25

rename some internal files

4f4f371

split "webhook" and "alert" fns into separate files

9f6b866

rename absolutely E•V•E•R•Y•T•H•I•N•G

84fd31e

more fixy

57168cf

omdb expectorate update

01cc404

polar should use correct name

b1dd690

update authz endpoint tests

2385fda

update webhook tests

b850c5f

migration to rename DB tables

33abcdd

had to go back and un-rename some columns since apparently CRDB can't do that (sad face!)

rename typed uuid kind i forgot about

904c9e1

more internal renaming

571abf1

Merge branch 'main' into eliza/s/webhook/alert

3e62416

post merge fixup

47b19dc

hawkw requested review from ahl, smklein and david-crespo May 15, 2025 19:11

hawkw added 5 commits May 15, 2025 12:18

upadte expectorate query tests

52c4d77

make clippy happy

63352e6

docs embetterment

953196c

forgot to commit some of the docs embetterment

ac3e846

OH GOD THERES MORE OF THEM

9a6ba64

i gotta stop forgetting expectorate tests

smklein reviewed May 16, 2025


hawkw and others added 3 commits May 16, 2025 09:53

Update shared.rs

ea8feae

Co-authored-by: Sean Klein <[email protected]>

Update shared.rs

cd89bce

Co-authored-by: Sean Klein <[email protected]>

Update shared.rs

858c231

Co-authored-by: Sean Klein <[email protected]>

hawkw added 4 commits May 16, 2025 10:19

fix accidentally renamed tests

a05697c

fix gross attribute formatting

1df6022

remove commented out code

fdadcf3

fix name of migration

e5267f8

hawkw commented May 16, 2025


ahl reviewed May 16, 2025


hawkw added 3 commits May 16, 2025 13:17

MAKE THE MIGRATIONS ACTUALLY WORK

10168a0

update openapi again

31b01fd

whoopsie

affced5
