Implement PUT /api/v1/trusted_publishing/tokens API endpoint (#11131)
Conversation
This fn can be used to decode a JSON web token without verifying its signature or claims. Only the `iss` claim will actually be decoded, since we use that to find the correct decoding key for the JWT issuer.
This defaults to the domain name (crates.io / staging.crates.io) and controls the expected `aud` claim of the OIDC JWT in the Trusted Publishing token exchange.
```rust
pub struct AccessToken(SecretString);

impl AccessToken {
    const PREFIX: &str = "cio_tp_";
```
for regular tokens we use `cio` plus 32 alphanumeric characters. when we implemented GitHub Secret Scanning we were told to use a more unique prefix with a separator. therefore the temporary access tokens are using the `cio_tp_` prefix to make them easier to recognize with fewer false positives.
```rust
    /// Generate a new access token.
    pub fn generate() -> Self {
        Self::from_u64s(rand::random(), rand::random())
    }

    /// Create an access token from two u64 values.
    ///
    /// This is used internally by the `generate()` fn and is extracted
    /// to a separate function for testing purposes.
    fn from_u64s(r1: u64, r2: u64) -> Self {
        let plaintext = format!("{}{r1:016x}{r2:016x}", Self::PREFIX);
        Self(SecretString::from(plaintext))
    }
```
- is 128 bits of randomness enough?
- https://docs.github.com/en/code-security/secret-scanning/secret-scanning-partnership-program/secret-scanning-partner-program#identify-your-secrets-and-create-regular-expressions suggests adding a (32-bit) checksum at the end for validation purposes. That would require us to find and add yet another dependency. Not sure if that's really worth it? 🤔
> is 128 bits of randomness enough?

Yep, 128 bits of entropy is sufficient! You could also do 256 bits to match what PyPI does for its tokens, but that's IMO overkill.
> suggests adding a (32-bit) checksum at the end for validation purposes. That would require us to find and add yet another dependency. Not sure if that's really worth it? 🤔

I think it depends partially on expected load/how expensive you expect credential revocation to be -- if there's a mass revocation event where GitHub is sending you tens of thousands of potential credentials, it can be helpful to have a pre-DB filter that weeds out false positives.
(This of course doesn't prevent someone from spamming you with fake tokens, since a CRC32 or similar is easy to crunch. But it limits the spam computationally, i.e. requires the spammer to waste time on that and keeps them from having a multiplicative impact on the DB itself.)
TL;DR: if the DB roundtrip is or could become expensive for token revocation, then a checksum or check-digit sequence is a good idea. Otherwise, you could probably leave it out of the MVP and version it in later 🙂
(I haven't used it myself, but https://crates.io/crates/crc32fast might be a good candidate?)
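For illustration, such a pre-DB filter can even be sketched without a new dependency. The token layout below (prefix + hex payload + 8 hex checksum digits) and the function names are hypothetical, not what this PR implements; the CRC-32 is the standard ISO-HDLC variant that `crc32fast` also computes:

```rust
// Std-only CRC-32 (ISO-HDLC / zlib polynomial, bitwise variant), sketched
// here in lieu of a crate like crc32fast.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all-ones when the LSB is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Hypothetical pre-DB filter: reject tokens whose trailing 8 hex digits
// don't match the CRC-32 of the payload, without touching the database.
fn has_valid_checksum(token: &str) -> bool {
    let Some(body) = token.strip_prefix("cio_tp_") else {
        return false;
    };
    if body.len() < 8 {
        return false;
    }
    let (payload, checksum_hex) = body.split_at(body.len() - 8);
    u32::from_str_radix(checksum_hex, 16) == Ok(crc32(payload.as_bytes()))
}
```

A spammer can still satisfy a CRC-32, as noted above, but it forces them to spend the computation and keeps random noise away from the DB.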
In the past, I've generally regretted not adding in the support for the checksum. 😓 That said, the tokens often looked like other hashes, so there were a lot of false positive submissions.
Since these have the unique prefix the number of submissions should end up restricted to things that actually look like tokens.
> TL;DR: if the DB roundtrip is or could become expensive for token revocation, then a checksum or check-digit sequence is a good idea. Otherwise, you could probably leave it out of the MVP and version it in later 🙂
+1 to leaving this out for the MVP and only implementing if it becomes an issue later on.
```rust
        Self {
            iss: GITHUB_ISSUER_URL.into(),
            nbf: now,
            iat: now,
            exp: now + 30 * 60,
            jti: "example-id".into(),
            sub: format!("repo:{owner_name}/{repository_name}"),
            aud: AUDIENCE.into(),

            environment: environment.map(|s| s.into()),
            r#ref: "refs/heads/main".into(),
            sha: "example-sha".into(),
            repository: format!("{owner_name}/{repository_name}"),
            repository_owner: owner_name.into(),
            actor_id: "12".into(),
            repository_visibility: "private".into(),
            repository_id: "74".into(),
            repository_owner_id: owner_id.to_string(),
            run_id: "example-run-id".into(),
            run_number: "10".into(),
            run_attempt: "2".into(),
            runner_environment: "github-hosted".into(),
            actor: "octocat".into(),
            workflow: "example-workflow".into(),
            head_ref: "".into(),
            base_ref: "".into(),
            event_name: "workflow_dispatch".into(),
            ref_type: "branch".into(),
            job_workflow_ref: format!(
                "{owner_name}/{repository_name}/.github/workflows/{workflow_filename}@refs/heads/main"
            ),
```
```rust
static WORKFLOW_REF_RE: LazyLock<regex::Regex> =
    LazyLock::new(|| regex::Regex::new(r"([^/]+\.(yml|yaml))(@.+)").unwrap());
```
PyPI uses a "lookahead match" pattern, but that does not appear to be supported by the regex crate. I've slightly adjusted the pattern to make it work. Since all the test cases from PyPI are passing, I'm fairly confident in the change.
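For intuition, what the pattern isolates (the workflow file name plus its `@ref` suffix at the end of a `job_workflow_ref`) can be approximated with plain string operations. This helper is hypothetical and std-only, just to illustrate the semantics, not the PR's actual code:

```rust
// Approximation of the WORKFLOW_REF_RE semantics: from a value like
// "owner/repo/.github/workflows/ci.yml@refs/heads/main", pull out the
// workflow file name ("ci.yml"), rejecting inputs without an @ref or
// without a .yml/.yaml file.
fn extract_workflow_filename(job_workflow_ref: &str) -> Option<&str> {
    let (path, _git_ref) = job_workflow_ref.split_once('@')?;
    let filename = path.rsplit('/').next()?;
    if filename.ends_with(".yml") || filename.ends_with(".yaml") {
        Some(filename)
    } else {
        None
    }
}
```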
```rust
/// The main implementation of the [`OidcKeyStore`] trait.
///
/// This struct fetches OIDC keys from a remote provider and caches them. If
/// a key is not found in the cache, it will attempt to refresh the cached
/// key set, unless the cache has just recently been refreshed.
pub struct RealOidcKeyStore {
    issuer_uri: String,
    client: reqwest::Client,
    cache: RwLock<Cache>,
}
```
This implementation roughly matches how PyPI caches the JWKS, though they save it in Redis with an expiration time, while we cache them in memory on each API server.
I hope the code comments make it clear enough how the implementation works.
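The caching policy can be modeled roughly like this. This is an illustrative sketch only: the real `RealOidcKeyStore` holds its cache behind an `RwLock` and fetches the JWKS over HTTP, and the 60-second cooldown below is an assumed value, not taken from the PR:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Assumed throttle between JWKS refreshes (not the PR's actual value).
const REFRESH_COOLDOWN: Duration = Duration::from_secs(60);

struct Cache {
    keys: HashMap<String, String>, // key id -> serialized JWK (simplified)
    last_refresh: Option<Instant>,
}

impl Cache {
    fn get_or_refresh(
        &mut self,
        key_id: &str,
        fetch: impl Fn() -> HashMap<String, String>,
    ) -> Option<String> {
        // Fast path: the key is already cached.
        if let Some(key) = self.keys.get(key_id) {
            return Some(key.clone());
        }
        // Cache miss: refresh the key set, unless we refreshed very
        // recently (throttles requests to the provider's JWKS endpoint).
        let refreshed_recently = self
            .last_refresh
            .is_some_and(|t| t.elapsed() < REFRESH_COOLDOWN);
        if !refreshed_recently {
            self.keys = fetch();
            self.last_refresh = Some(Instant::now());
        }
        self.keys.get(key_id).cloned()
    }
}
```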
```rust
use std::sync::LazyLock;

const PRIVATE_KEY_PEM: &[u8] = br#"
-----BEGIN PRIVATE KEY-----
```
I'm somewhat surprised GitHub allowed me to push this 😂
As the code comment above says: DO NOT USE THESE IN PRODUCTION!!!1
```diff
@@ -37,6 +38,7 @@ fn main() -> anyhow::Result<()> {
         .databases_from_config(&config.db)
         .github(github)
         .github_oauth_from_config(&config)
+        .trustpub_providers(&list("TRUSTPUB_PROVIDERS")?)
```
this essentially acts as a feature flag for each of the providers. it allows us to enable e.g. github on the staging environment, but not yet on production. once we add new providers it will also allow us to gradually enable them. if a provider is breached, it would also allow us to temporarily disable it.
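A sketch of how such a provider flag might be parsed and consulted. The comma-separated format and the issuer mapping below are assumptions for illustration, not the PR's actual code (the GitHub Actions issuer URL is the documented `https://token.actions.githubusercontent.com`):

```rust
// Parse a comma-separated feature-flag list, e.g. TRUSTPUB_PROVIDERS=github
fn enabled_providers(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}

// Check whether a JWT's issuer belongs to an enabled provider.
fn provider_enabled(enabled: &[String], issuer: &str) -> bool {
    enabled.iter().any(|name| match name.as_str() {
        // GitHub Actions OIDC issuer
        "github" => issuer == "https://token.actions.githubusercontent.com",
        // unknown provider names are ignored
        _ => false,
    })
}
```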
```diff
@@ -186,6 +190,9 @@ impl Server {
                 .unwrap_or_default()
         );

+        let domain_name = dotenvy::var("DOMAIN_NAME").unwrap_or_else(|_| "crates.io".into());
+        let trustpub_audience = var("TRUSTPUB_AUDIENCE")?.unwrap_or_else(|| domain_name.clone());
```
this allows us to configure the expected `aud` claim, if necessary. it defaults to the domain name (crates.io / staging.crates.io).
Nice! This may or may not be applicable in your case, but noting: PyPI intentionally chose not to use the domain here, since we thought we might want to change the authentication and/or exchange subdomain at some point. So PyPI uses `pypi` and TestPyPI uses `testpypi`.

(This probably doesn't matter in practice, just flagging as a very small difference! In practice, any identifier that serves as a domain separator is fine, DNS name or whatever else 🙂)
```rust
/// Exchange an OIDC token for a temporary access token.
#[utoipa::path(
    put,
    path = "/api/v1/trusted_publishing/tokens",
    request_body = inline(json::ExchangeRequest),
    tag = "trusted_publishing",
    responses((status = 200, description = "Successful Response", body = inline(json::ExchangeResponse))),
)]
pub async fn exchange_trustpub_token(
    state: AppState,
    json: json::ExchangeRequest,
) -> AppResult<Json<json::ExchangeResponse>> {
```
in case you didn't find it in the large diff: this is the main entrypoint of the new API endpoint 😉
```rust
/// Add a new OIDC keystore to the application
pub fn with_oidc_keystore(
    mut self,
    issuer_url: impl Into<String>,
    keystore: MockOidcKeyStore,
) -> Self {
    self.oidc_key_stores
        .insert(issuer_url.into(), Box::new(keystore));
    self
}
```
by default, all tests start with no Trusted Publishing providers enabled, but this fn can be used to explicitly add them (with `MockOidcKeyStore` implementations).
```rust
    /// Generate a new access token.
    pub fn generate() -> Self {
        Self::from_u64s(rand::random(), rand::random())
```
Small thing, but two questions:

- Why two `u64`s instead of a single `[u8; 16]`? The per-call overhead for the RNG is minuscule, but a single call would eliminate it and would prevent any implied separation between the halves of the uniformly random token.
- Does crates.io have a forking/multi-process model? If so, `rand::random()` might be risky -- I believe it uses `ThreadRng` by default, which isn't guaranteed to be self-healing over process forks (i.e., it needs to be explicitly reseeded to prevent duplicated randomness between forks). If there are multiple processes at play that can call this `generate()` function, then my recommendation would be to use `OsRng` instead.

(The historical reason for not using `OsRng` is performance concerns around the system calls it can incur, but on Linux `getrandom` has been in the vDSO since 6.11.)
> Does crates.io have a forking/multi-process model?

unless I'm misunderstanding our application, crates.io only uses a single process, but then has multiple worker threads inside the process. since we don't have a GIL in Rust, or are generally single-threaded like in JS, there is usually no need for multiple processes.

> Why two `u64`s instead of a single `[u8; 16]`?

no real reason actually. it seemed easiest to get to the hex output from there, but I'm totally open to alternatives 😅
> unless I'm misunderstanding our application, crates.io only uses a single process, but then has multiple worker threads inside the process. since we don't have a GIL in Rust, or are generally single-threaded like in JS, there is usually no need for multiple processes.

Gotcha, thanks! The concern with `rand::random` is only when there are multiple processes, not multiple threads, so that API should be fine 🙂

> no real reason actually. it seemed easiest to get to the hex output from there, but I'm totally open to alternatives 😅

It's an extreme nitpick from me so feel free to ignore! But from a type/domain coherence perspective I'd recommend `[u8; 16]` so that it's obvious that there's no difference between the "halves" of the uniformly random token 🙂
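As a sketch of that suggestion, the hex body can be built from a single `[u8; 16]` just as easily as from two `u64`s. The prefix mirrors the one in the PR; the function name is hypothetical:

```rust
// Build the token's plaintext from one uniformly random 16-byte array
// (in the real code the bytes would come from the RNG, e.g. rand::random()).
fn token_from_bytes(bytes: [u8; 16]) -> String {
    const PREFIX: &str = "cio_tp_";
    let mut out = String::with_capacity(PREFIX.len() + 32);
    out.push_str(PREFIX);
    for b in bytes {
        // two lowercase hex digits per byte, zero-padded
        out.push_str(&format!("{b:02x}"));
    }
    out
}
```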
```rust
    let new_token = AccessToken::generate();

    let new_token_model = NewToken {
        expires_at: chrono::Utc::now() + chrono::Duration::minutes(30),
```
Will 30 minutes be long enough for cargo builds to complete? I know there was some discussion around this.
rust-lang/rfcs#3691 (comment)
Would it make sense to make this configurable via an environment variable to make it easy for us to adjust?
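One way such a server-side override could look. The `TRUSTPUB_TOKEN_TTL_MINUTES` variable name and the helper are made up for illustration; the PR itself hardcodes 30 minutes:

```rust
use std::time::Duration;

// Default token lifetime, matching the hardcoded value in the PR.
const DEFAULT_TTL_MINUTES: u64 = 30;

// Resolve the token lifetime from an optional override string (e.g. the
// value of a hypothetical TRUSTPUB_TOKEN_TTL_MINUTES environment variable),
// falling back to the default on absence or parse failure.
fn token_ttl(override_minutes: Option<&str>) -> Duration {
    let minutes = override_minutes
        .and_then(|v| v.parse::<u64>().ok())
        .unwrap_or(DEFAULT_TTL_MINUTES);
    Duration::from_secs(minutes * 60)
}

// In the server setup this might be wired up as:
// let ttl = token_ttl(dotenvy::var("TRUSTPUB_TOKEN_TTL_MINUTES").ok().as_deref());
```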
> Would it make sense to make this configurable via an environment variable to make it easy for us to adjust?

I could make it configurable, though I'm wondering whether it would be better to accept it as a request parameter. I guess 30min is probably fine for this MVP though? :)
As a datapoint: we currently do 15 minutes for PyPI, but all of you are eminently more qualified to know what makes more sense here!
(Whatever expiry you'd pick, I'd recommend not making it client configurable -- IMO TP's benefit is easiest to justify when expiries are consistent, i.e. users can't configure their way into a longer-lived credential by accident.)
```rust
/// Exchange an OIDC token for a temporary access token.
#[utoipa::path(
    put,
```
To be compliant with RFC 8693, this should be done via a POST.
seems like that RFC requires quite a few other things, like using `application/x-www-form-urlencoded` (which PyPI also doesn't do). I wasn't aware that we were supposed to follow that RFC? 🤔

the rest of the API generally uses `PUT` to create resources and `POST` for editing existing resources, which is why I chose `PUT` here.
Yeah, I don't believe PyPI follows that RFC either (at least, I didn't follow it while implementing Trusted Publishing 😅).
IMO the state/parameters used here are pretty distinct from the standard OAuth2 Token Exchange anyways, so deviating here seems like not a huge deal.
```rust
pub struct ExchangeRequest {
    pub jwt: String,
}
```
We probably want to expand and adjust this to be in line with RFC 8693. The `jwt` here in the original implementation will be the `subject_token` in the spec.

Here's what I'd expect an example token exchange request to look like (a lot of implementations also support JSON content types here too):
```http
POST /api/v1/trusted_publishing/tokens
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange&
subject_token=eyJhbGciOiJSUzI1NiIs...github-oidc-id-token...&
subject_token_type=urn:ietf:params:oauth:token-type:id_token&
audience=crates.io&
requested_token_type=urn:ietf:params:oauth:token-type:access_token
```
This is a big one... I'm sorry! 😅
This PR implements the token exchange API endpoint used for the Trusted Publishing flow. This is roughly how it works:
1. It uses the `iss` claim (Issuer) to figure out who issued this token and compares it against a list of supported Trusted Publishing providers (configurable via `TRUSTPUB_PROVIDERS`, but currently only supporting GitHub Actions).
2. It saves the `jti` claim from the JWT to prevent token reuse by attempting to insert a new row into the `trustpub_used_jtis` table, or returns an error if a uniqueness constraint is triggered.
3. It queries the `trustpub_configs_github` table for a configuration matching the other claims in the JWT.
4. A new temporary access token is generated, saved in the `trustpub_tokens` table, and then returned as the API response.

As with the previous PRs, this implementation is largely inspired by the corresponding implementation in https://github.com/pypi/warehouse.
I've prepared the code to be somewhat provider-agnostic, but at this point the endpoint is still hardcoded to only support GitHub Actions. Refactoring this to completion can be done once we add support for another provider.
With that context I hope the PR is somewhat reviewable. It is definitely best reviewed commit by commit.
Related:

- PUT /api/v1/trusted_publishing/github_configs API endpoint (#11113)