Implement PUT /api/v1/trusted_publishing/tokens API endpoint (#11131)
Conversation
This fn can be used to decode a JSON web token without verifying its signature or claims. Only the `iss` claim will actually be decoded, since we use that to find the correct decoding key for the JWT issuer.
This defaults to the domain name (crates.io / staging.crates.io) and controls the expected `aud` claim of the OIDC JWT in the Trusted Publishing token exchange.
```rust
pub struct AccessToken(SecretString);

impl AccessToken {
    const PREFIX: &str = "cio_tp_";
```
for regular tokens we use `cio` plus 32 alphanumeric characters. when we implemented GitHub Secret Scanning we were told to use a more unique prefix with a separator. therefore the temporary access tokens are using the `cio_tp_` prefix to make them easier to recognize with fewer false positives.
```rust
    /// Generate a new access token.
    pub fn generate() -> Self {
        Self::from_u64s(rand::random(), rand::random())
    }

    /// Create an access token from two u64 values.
    ///
    /// This is used internally by the `generate()` fn and is extracted
    /// to a separate function for testing purposes.
    fn from_u64s(r1: u64, r2: u64) -> Self {
        let plaintext = format!("{}{r1:016x}{r2:016x}", Self::PREFIX);
        Self(SecretString::from(plaintext))
    }
```
- is 128 bits of randomness enough?
- https://docs.github.com/en/code-security/secret-scanning/secret-scanning-partnership-program/secret-scanning-partner-program#identify-your-secrets-and-create-regular-expressions suggests adding a (32-bit) checksum at the end for validation purposes. That would require us to find and add yet another dependency. Not sure if that's really worth it? 🤔
> is 128 bits of randomness enough?

Yep, 128 bits of entropy is sufficient! You could also do 256 bits to match what PyPI does for its tokens, but that's IMO overkill.
> suggests adding a (32-bit) checksum at the end for validation purposes. That would require us to find and add yet another dependency. Not sure if that's really worth it? 🤔

I think it depends partially on expected load/how expensive you expect credential revocation to be -- if there's a mass revocation event where GitHub is sending you tens of thousands of potential credentials, it can be helpful to have a pre-DB filter that weeds out false positives.
(This of course doesn't prevent someone from spamming you with fake tokens, since a CRC32 or similar is easy to crunch. But it limits the spam computationally, i.e. requires the spammer to waste time on that and keeps them from having a multiplicative impact on the DB itself.)
TL;DR: if the DB roundtrip is or could become expensive for token revocation, then a checksum or check-digit sequence is a good idea. Otherwise, you could probably leave it out of the MVP and version it in later 🙂
(I haven't used it myself, but https://crates.io/crates/crc32fast might be a good candidate?)
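For illustration, such a pre-DB filter can even be sketched without a new dependency. The token layout below (prefix + hex payload + 8 hex checksum digits) and the function names are hypothetical, not what this PR implements; the CRC-32 is the standard ISO-HDLC variant that `crc32fast` also computes:

```rust
// Std-only CRC-32 (ISO-HDLC / zlib polynomial, bitwise variant), sketched
// here in lieu of a crate like crc32fast.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all-ones when the LSB is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Hypothetical pre-DB filter: reject tokens whose trailing 8 hex digits
// don't match the CRC-32 of the payload, without touching the database.
fn has_valid_checksum(token: &str) -> bool {
    let Some(body) = token.strip_prefix("cio_tp_") else {
        return false;
    };
    if body.len() < 8 {
        return false;
    }
    let (payload, checksum_hex) = body.split_at(body.len() - 8);
    u32::from_str_radix(checksum_hex, 16) == Ok(crc32(payload.as_bytes()))
}
```

A spammer can still satisfy a CRC-32, as noted above, but it forces them to spend the computation and keeps random noise away from the DB.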
In the past, I've generally regretted not adding in the support for the checksum. 😓 That said, the tokens often looked like other hashes, so there were a lot of false positive submissions.
Since these have the unique prefix the number of submissions should end up restricted to things that actually look like tokens.
> TL;DR: if the DB roundtrip is or could become expensive for token revocation, then a checksum or check-digit sequence is a good idea. Otherwise, you could probably leave it out of the MVP and version it in later 🙂
+1 to leaving this out for the MVP and only implementing if it becomes an issue later on.
```rust
        Self {
            iss: GITHUB_ISSUER_URL.into(),
            nbf: now,
            iat: now,
            exp: now + 30 * 60,
            jti: "example-id".into(),
            sub: format!("repo:{owner_name}/{repository_name}"),
            aud: AUDIENCE.into(),

            environment: environment.map(|s| s.into()),
            r#ref: "refs/heads/main".into(),
            sha: "example-sha".into(),
            repository: format!("{owner_name}/{repository_name}"),
            repository_owner: owner_name.into(),
            actor_id: "12".into(),
            repository_visibility: "private".into(),
            repository_id: "74".into(),
            repository_owner_id: owner_id.to_string(),
            run_id: "example-run-id".into(),
            run_number: "10".into(),
            run_attempt: "2".into(),
            runner_environment: "github-hosted".into(),
            actor: "octocat".into(),
            workflow: "example-workflow".into(),
            head_ref: "".into(),
            base_ref: "".into(),
            event_name: "workflow_dispatch".into(),
            ref_type: "branch".into(),
            job_workflow_ref: format!(
                "{owner_name}/{repository_name}/.github/workflows/{workflow_filename}@refs/heads/main"
            ),
```
```rust
static WORKFLOW_REF_RE: LazyLock<regex::Regex> =
    LazyLock::new(|| regex::Regex::new(r"([^/]+\.(yml|yaml))(@.+)").unwrap());
```
PyPI uses a "lookahead match" pattern, but that does not appear to be supported by the regex crate. I've slightly adjusted the pattern to make it work. Since all the test cases from PyPI are passing, I'm fairly confident in the change.
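For intuition, what the pattern isolates (the workflow file name plus its `@ref` suffix at the end of a `job_workflow_ref`) can be approximated with plain string operations. This helper is hypothetical and std-only, just to illustrate the semantics, not the PR's actual code:

```rust
// Approximation of the WORKFLOW_REF_RE semantics: from a value like
// "owner/repo/.github/workflows/ci.yml@refs/heads/main", pull out the
// workflow file name ("ci.yml"), rejecting inputs without an @ref or
// without a .yml/.yaml file.
fn extract_workflow_filename(job_workflow_ref: &str) -> Option<&str> {
    let (path, _git_ref) = job_workflow_ref.split_once('@')?;
    let filename = path.rsplit('/').next()?;
    if filename.ends_with(".yml") || filename.ends_with(".yaml") {
        Some(filename)
    } else {
        None
    }
}
```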
```rust
/// The main implementation of the [`OidcKeyStore`] trait.
///
/// This struct fetches OIDC keys from a remote provider and caches them. If
/// a key is not found in the cache, it will attempt to refresh the cached
/// key set, unless the cache has just recently been refreshed.
pub struct RealOidcKeyStore {
    issuer_uri: String,
    client: reqwest::Client,
    cache: RwLock<Cache>,
}
```
This implementation roughly matches how PyPI caches the JWKS, though they save it in Redis with an expiration time, while we cache them in memory on each API server.
I hope the code comments make it clear enough how the implementation works.
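The caching policy can be modeled roughly like this. This is an illustrative sketch only: the real `RealOidcKeyStore` holds its cache behind an `RwLock` and fetches the JWKS over HTTP, and the 60-second cooldown below is an assumed value, not taken from the PR:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Assumed throttle between JWKS refreshes (not the PR's actual value).
const REFRESH_COOLDOWN: Duration = Duration::from_secs(60);

struct Cache {
    keys: HashMap<String, String>, // key id -> serialized JWK (simplified)
    last_refresh: Option<Instant>,
}

impl Cache {
    fn get_or_refresh(
        &mut self,
        key_id: &str,
        fetch: impl Fn() -> HashMap<String, String>,
    ) -> Option<String> {
        // Fast path: the key is already cached.
        if let Some(key) = self.keys.get(key_id) {
            return Some(key.clone());
        }
        // Cache miss: refresh the key set, unless we refreshed very
        // recently (throttles requests to the provider's JWKS endpoint).
        let refreshed_recently = self
            .last_refresh
            .is_some_and(|t| t.elapsed() < REFRESH_COOLDOWN);
        if !refreshed_recently {
            self.keys = fetch();
            self.last_refresh = Some(Instant::now());
        }
        self.keys.get(key_id).cloned()
    }
}
```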
```rust
use std::sync::LazyLock;

const PRIVATE_KEY_PEM: &[u8] = br#"
-----BEGIN PRIVATE KEY-----
```
I'm somewhat surprised GitHub allowed me to push this 😂
As the code comment above says: DO NOT USE THESE IN PRODUCTION!!!1
```diff
@@ -37,6 +38,7 @@ fn main() -> anyhow::Result<()> {
         .databases_from_config(&config.db)
         .github(github)
         .github_oauth_from_config(&config)
+        .trustpub_providers(&list("TRUSTPUB_PROVIDERS")?)
```
this essentially acts as a feature flag for each of the providers. it allows us to enable e.g. github on the staging environment, but not yet on production. once we add new providers it will also allow us to gradually enable them. if a provider is breached, it would also allow us to temporarily disable it.
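A sketch of how such a provider flag might be parsed and consulted. The comma-separated format and the issuer mapping below are assumptions for illustration, not the PR's actual code (the GitHub Actions issuer URL is the documented `https://token.actions.githubusercontent.com`):

```rust
// Parse a comma-separated feature-flag list, e.g. TRUSTPUB_PROVIDERS=github
fn enabled_providers(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}

// Check whether a JWT's issuer belongs to an enabled provider.
fn provider_enabled(enabled: &[String], issuer: &str) -> bool {
    enabled.iter().any(|name| match name.as_str() {
        // GitHub Actions OIDC issuer
        "github" => issuer == "https://token.actions.githubusercontent.com",
        // unknown provider names are ignored
        _ => false,
    })
}
```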
```diff
@@ -186,6 +190,9 @@ impl Server {
                 .unwrap_or_default()
         );

+        let domain_name = dotenvy::var("DOMAIN_NAME").unwrap_or_else(|_| "crates.io".into());
+        let trustpub_audience = var("TRUSTPUB_AUDIENCE")?.unwrap_or_else(|| domain_name.clone());
```
this allows us to configure the expected `aud` claim, if necessary. it defaults to the domain name (crates.io / staging.crates.io).
Nice! This may or may not be applicable in your case, but noting: PyPI intentionally chose not to use the domain here, since we thought we might want to change the authentication and/or exchange subdomain at some point. So PyPI uses `pypi` and TestPyPI uses `testpypi`.

(This probably doesn't matter in practice, just flagging as a very small difference! In practice, any identifier that serves as a domain separator is fine, DNS name or whatever else 🙂)
```rust
/// Exchange an OIDC token for a temporary access token.
#[utoipa::path(
    put,
    path = "/api/v1/trusted_publishing/tokens",
    request_body = inline(json::ExchangeRequest),
    tag = "trusted_publishing",
    responses((status = 200, description = "Successful Response", body = inline(json::ExchangeResponse))),
)]
pub async fn exchange_trustpub_token(
    state: AppState,
    json: json::ExchangeRequest,
) -> AppResult<Json<json::ExchangeResponse>> {
```
in case you didn't find it in the large diff: this is the main entrypoint of the new API endpoint 😉
```rust
/// Add a new OIDC keystore to the application
pub fn with_oidc_keystore(
    mut self,
    issuer_url: impl Into<String>,
    keystore: MockOidcKeyStore,
) -> Self {
    self.oidc_key_stores
        .insert(issuer_url.into(), Box::new(keystore));
    self
}
```
by default, all tests start with no Trusted Publishing providers enabled, but this fn can be used to explicitly add them (with `MockOidcKeyStore` implementations).
```rust
    /// Generate a new access token.
    pub fn generate() -> Self {
        Self::from_u64s(rand::random(), rand::random())
```
Small thing, but two questions:

- Why two `u64`s instead of a single `[u8; 16]`? The per-call overhead for the RNG is minuscule, but a single call would eliminate it and would prevent any implied separation between the halves of the uniformly random token.
- Does crates.io have a forking/multi-process model? If so, `rand::random()` might be risky -- I believe it uses `ThreadRng` by default, which isn't guaranteed to be self-healing over process forks (i.e., it needs to be explicitly reseeded to prevent duplicated randomness between forks). If there are multiple processes at play that can call this `generate()` function, then my recommendation would be to use `OsRng` instead.

(The historical reason for not using `OsRng` is performance concerns around the system calls it can incur, but on Linux `getrandom` has been in the vDSO since 6.11.)
> Does crates.io have a forking/multi-process model?

unless I'm misunderstanding our application, crates.io only uses a single process, but then has multiple worker threads inside the process. since we don't have a GIL in Rust, or are generally single-threaded like in JS, there is usually no need for multiple processes.

> Why two `u64`s instead of a single `[u8; 16]`?

no real reason actually. it seemed easiest to get to the hex output from there, but I'm totally open to alternatives 😅
> unless I'm misunderstanding our application, crates.io only uses a single process, but then has multiple worker threads inside the process. since we don't have a GIL in Rust, or are generally single-threaded like in JS, there is usually no need for multiple processes.

Gotcha, thanks! The concern with `rand::random` is only when there are multiple processes, not multiple threads, so that API should be fine 🙂

> no real reason actually. it seemed easiest to get to the hex output from there, but I'm totally open to alternatives 😅

It's an extreme nitpick from me so feel free to ignore! But from a type/domain coherence perspective I'd recommend `[u8; 16]` so that it's obvious that there's no difference between the "halves" of the uniformly random token 🙂
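As a sketch of that suggestion, the hex body can be built from a single `[u8; 16]` just as easily as from two `u64`s. The prefix mirrors the one in the PR; the function name is hypothetical:

```rust
// Build the token's plaintext from one uniformly random 16-byte array
// (in the real code the bytes would come from the RNG, e.g. rand::random()).
fn token_from_bytes(bytes: [u8; 16]) -> String {
    const PREFIX: &str = "cio_tp_";
    let mut out = String::with_capacity(PREFIX.len() + 32);
    out.push_str(PREFIX);
    for b in bytes {
        // two lowercase hex digits per byte, zero-padded
        out.push_str(&format!("{b:02x}"));
    }
    out
}
```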
```rust
    let new_token = AccessToken::generate();

    let new_token_model = NewToken {
        expires_at: chrono::Utc::now() + chrono::Duration::minutes(30),
```
Will 30 minutes be long enough for cargo builds to complete? I know there was some discussion around this.
rust-lang/rfcs#3691 (comment)
Would it make sense to make this configurable via an environment variable to make it easy for us to adjust?
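One way such a server-side override could look. The `TRUSTPUB_TOKEN_TTL_MINUTES` variable name and the helper are made up for illustration; the PR itself hardcodes 30 minutes:

```rust
use std::time::Duration;

// Default token lifetime, matching the hardcoded value in the PR.
const DEFAULT_TTL_MINUTES: u64 = 30;

// Resolve the token lifetime from an optional override string (e.g. the
// value of a hypothetical TRUSTPUB_TOKEN_TTL_MINUTES environment variable),
// falling back to the default on absence or parse failure.
fn token_ttl(override_minutes: Option<&str>) -> Duration {
    let minutes = override_minutes
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(DEFAULT_TTL_MINUTES);
    Duration::from_secs(minutes * 60)
}

// In the server setup this might be wired up as:
// let ttl = token_ttl(dotenvy::var("TRUSTPUB_TOKEN_TTL_MINUTES").ok().as_deref());
```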
> Would it make sense to make this configurable via an environment variable to make it easy for us to adjust?

I could make it configurable, though I'm wondering whether it would be better to accept it as a request parameter. I guess 30min is probably fine for this MVP though? :)
As a datapoint: we currently do 15 minutes for PyPI, but all of you are eminently more qualified to know what makes more sense here!
(Whatever expiry you'd pick, I'd recommend not making it client configurable -- IMO TP's benefit is easiest to justify when expiries are consistent, i.e. users can't configure their way into a longer-lived credential by accident.)
```rust
/// Exchange an OIDC token for a temporary access token.
#[utoipa::path(
    put,
```
To be compliant with RFC 8693, this should be done via a POST.
seems like that RFC requires quite a few other things, like using `application/x-www-form-urlencoded` (which PyPI also doesn't do). I wasn't aware that we were supposed to follow that RFC? 🤔

the rest of the API generally uses `PUT` to create resources and `POST` for editing existing resources, which is why I chose `PUT` here.
Yeah, I don't believe PyPI follows that RFC either (at least, I didn't follow it while implementing Trusted Publishing 😅).
IMO the state/parameters used here are pretty distinct from the standard OAuth2 Token Exchange anyways, so deviating here seems like not a huge deal.
```rust
pub struct ExchangeRequest {
    pub jwt: String,
}
```
We probably want to expand and adjust this to be in line with RFC 8693. The `jwt` here in the original implementation will be the `subject_token` in the spec.

Here's what I'd expect an example token exchange request to look like (a lot of implementations also support JSON content types here too):
```http
POST /api/v1/trusted_publishing/tokens
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange&
subject_token=eyJhbGciOiJSUzI1NiIs...github-oidc-id-token...&
subject_token_type=urn:ietf:params:oauth:token-type:id_token&
audience=crates.io&
requested_token_type=urn:ietf:params:oauth:token-type:access_token
```
This is a big one... I'm sorry! 😅
This PR implements the token exchange API endpoint used for the Trusted Publishing flow. This is roughly how it works:
1. It uses the `iss` claim (Issuer) to figure out who issued this token and compares it against a list of supported Trusted Publishing providers (configurable via `TRUSTPUB_PROVIDERS`, but currently only supporting GitHub Actions).
2. It saves the `jti` claim from the JWT to prevent token reuse by attempting to insert a new row into the `trustpub_used_jtis` table, or returns an error if a uniqueness constraint is triggered.
3. It queries the `trustpub_configs_github` table for a configuration matching the other claims in the JWT.
4. A new temporary access token is generated, saved in the `trustpub_tokens` table, and then returned as the API response.

As with the previous PRs, this implementation is largely inspired by the corresponding implementation in https://github.com/pypi/warehouse.
I've prepared the code to be somewhat provider-agnostic, but at this point the endpoint is still hardcoded to only support GitHub Actions. Refactoring this to completion can be done once we add support for another provider.
With that context I hope the PR is somewhat reviewable. It is definitely best reviewed commit by commit.
Related:

- PUT /api/v1/trusted_publishing/github_configs API endpoint (#11113)