-
Notifications
You must be signed in to change notification settings - Fork 239
[Feature] Noise XKpsk3 integration (2025 version) #5692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
Is there any rationale of the choice for XKpsk3? I.e. how to do DoS mitigation? Also I'm curious how this relates to the announcement of planning to use McEliece. |
d00427a
to
3be921e
Compare
@timkuijsten Thanks for the question! We decided to use XKpsk3 because it provides the best forward secrecy, authentication, replay protection, and identity-hiding guarantees for client-server settings where the client cannot be identified based on its source IP address. Also, as PQ-security is on our roadmap, we opted for the variant with PSK, as the pre-shared key can be later used to inject Post-Quantum safety into the protocol. We also decided to use the XKpsk3 between nodes, as we are considering supporting private gateways in the future, so the identity of the initiator may not be obvious from the source IP address (entry-mixnode). Then, for the rest of the connections (mix node to mix node, and mix node to exit) we also use the same pattern to keep the usage across the network uniform. In terms of DoS protection, deploying Noise at the transport layer helps shield the Sphinx layer, as each message must first be validated by Noise before any Sphinx processing occurs. However, Noise itself can still be vulnerable to DoS attacks. WireGuard mitigates this risk using a cookie-based mechanism. We could adopt a similar approach, or explore alternative strategies — all of which are currently under consideration. |
let node_id = nym_node.node_id; | ||
|
||
let role: NodeRole = rewarded_set.role(node_id).into(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you copied it from skimmed
response but you dropped filtering for inactive nodes, what gives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since (at least for the moment) we need this endpoint to get the Noise keys, we also need it for inactive nodes (tests and stuff) so I did not why we would need it only for active ones.
That being said, I can add the filtering back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it's not designed to be a noise-exlusive endpoint : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the option for later in d4d6e42
I didn't implement the active
version of the endpoint though
match self.as_str().find("psk") { | ||
Some(n) => { | ||
let psk_index = n + 3; | ||
let psk_char = self.as_str().chars().nth(psk_index).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about it, given we're using hardcoded patterns, couldn't we also hardcode that psk_position
to avoid unwrap
alltogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding/modifying patterns, I don't want to have that info in two places, that can lead to mismatch if not maintained properly.
The snow
library is parsing the pattern string to get its scheme so it's not really far from that.
Given that, I feel like this is still the better option no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding/modifying patterns, I don't want to have that info in two places, that can lead to mismatch if not maintained properly.
that's why we write unit tests : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can derive EnumIter
on the guy and use it in tests to make sure every variant is always covered and returns expected value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added tests to ensure validity of all the noise pattern we hardcode in 090026d
Let's keep the hardcoded position in the tests though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why not also use constant for the psk_position
itself xD (I guess I really don't like unwraps that could be removed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reckon having the automated construction somewhere and the constants somewhere else is a good thing to ensure validity.
I'm team automated construction in the actual code and constants in the tests, if the tests pass, those unwrap
s cannot fail.
Alternatively, I can remove them and make the fn
fallible, since it's already used in a fallible function. But again, if the tests pass, they can't fail
common/nymnoise/src/config.rs
Outdated
.support | ||
.inner | ||
.load() | ||
.get(&ip_addr.to_canonical()) // SW default bind address being [::]:1789, it can happen that a responder sees the ipv6-mapped address of the initiator, this check for that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would that matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, wouldn't it be better if we always and only used canonical address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does matter because I have seen a case where the responder sees the IPv6-mapped version of the initiator's IP, and it needs to canonicalize it to find it.
As for only using the canonical version, using solely the IP for this is kinda shaky, so let's put all the chances on our side no? We're more likely to miss a Noise supporting node than to think a node is supporting it when it doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. wouldn't using canonical address help us in reducing chances of accidentally having duplicates of the same underlying node?
} | ||
|
||
// Only for phased update | ||
//SW This can lead to some troubles if two nodes shares the same IP and one support Noise but not the other. This in only for the progressive update though and there is no workaround |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i hope this doesn't introduce any problems/vulnerabilities, because somebody will attempt to use that.
couldn't we simply say that if we have connection from ip A.B.C.D and it doesn't use noise, all connections from that ip are not allowed to use noise (and vice versa)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They might exploit and block you if you don't update. If you do, they can't.
We could but then it means somebody can force a downgrade on your up-to-date node. Better to cause problems on older than newer nodes no?
common/nymnoise/src/stream.rs
Outdated
return Poll::Ready(Err(io::ErrorKind::InvalidInput.into())); | ||
}; | ||
noise_buf.truncate(len); | ||
match projected_self.inner_stream.start_send(noise_buf.into()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but start_send
does not imply the data has been sent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added flushing afterwards in fc16dc6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't work as you wouldn't be able to flush it before sending is finished. you'd need some sort of state machine to keep track of it. i think. perhaps use one of tokio-util's adapters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold up, Tokio is doing exactly that in their SinkWriter implementation to go from a Sink to an AsyncWrite :
https://docs.rs/tokio-util/latest/src/tokio_util/io/sink_writer.rs.html#104
let res = self | ||
.client | ||
.get_all_basic_nodes() | ||
.get_all_expanded_nodes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a tiny issue. currently nym api will return unimplemented on that endpoint : (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well let's sure to update the nym-api before the nodes.
This endpoint is implemented in that PR
.flat_map(|n| { | ||
n.basic.ip_addresses.iter().map(|ip_addr| { | ||
( | ||
SocketAddr::new(*ip_addr, n.basic.mix_port), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, hold on a second here. i'm not sure this is the port you want to be using. this is the port the remote node is listening on. when it tries to establish connection to your node, it will be different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly the info I want. The port info is used by initiator, which is sending to your node.
The responder will only use the IP part that is constructed here, where I'm removing the port info :
nym/common/nymnoise/src/config.rs
Line 70 in fba6c26
let noise_support = new |
I guess XK + cookie and making sure there is no responder-side state for unauthenticated connections (like rosenpass did) would put you in a good position. Are there already any public discussions, designs or transcripts? |
common/nymnoise/src/config.rs
Outdated
.support | ||
.inner | ||
.load() | ||
.get(&ip_addr.to_canonical()) // SW default bind address being [::]:1789, it can happen that a responder sees the ipv6-mapped address of the initiator, this check for that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. wouldn't using canonical address help us in reducing chances of accidentally having duplicates of the same underlying node?
common/nymnoise/src/stream.rs
Outdated
|
||
Poll::Ready(Some(Ok(noise_msg))) => { | ||
//We have a new moise msg | ||
let mut dec_msg = vec![0u8; MAXMSGLEN]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, do we always have to allocate so much data per each message? i don't think that was the noise design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit : Reading the code (from snow
too) again, indeed allocating noise_message.len() - TAGLEN
will suffice, fixed in ea233c7
let mut dec_msg = vec![0u8; MAXMSGLEN]; | ||
let len = match projected_self.noise { | ||
Some(transport_state) => { | ||
match transport_state.read_message(&noise_msg, &mut dec_msg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just return those bytes immediately here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we have things in the buffer that needs to be returned before that.
A call to this poll_read
might end up with us having data we have decrypted but can't return yet.
Hence we need to check it first before returning that part
common/nymnoise/src/stream.rs
Outdated
Poll::Ready(Err(err)) => Poll::Ready(Err(err)), | ||
|
||
Poll::Ready(Ok(())) => { | ||
let mut noise_buf = BytesMut::zeroed(MAXMSGLEN + TAGLEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not reuse existing buf or at least use its length? it's seems very wasteful to always allocate 60k for each tiny write...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory the writes can be up to 60k bytes though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit : Reading the code (from snow
too) again, indeed allocating buf.len() + TAGLEN
will suffice, fixed in ea233c7
} | ||
} | ||
|
||
impl AsyncWrite for NoiseStream { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would https://docs.rs/tokio-util/latest/tokio_util/io/struct.SinkWriter.html work instead of implementing AsyncWrite
? actually, why do we even need AsyncWrite + AsyncRead
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without Noise, we're wrapping the TcpStream into a tokio_util::codec::Framed
with the sphinx packet codec.
To code a one-for-one replacement, NoiseStream has to be AsyncWrite + AsyncRead
in a way or another
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SinkWriter doesn't work for the same reason StreamReader doesn't. We need to handle the encryption in the middle
@timkuijsten There are no public discussions or designs yet no. |
But nym uses TCP right? Isn't TCP's three-way-handshake enough for IP ownership confirmation? (unlike WireGuard which needs the cookie because it uses UDP). |
@timkuijsten For the moment it does indeed. That's a fair point, I'll need to think about it. |
Err(err) => { | ||
error!("Failed to perform Noise handshake with {address} - {err}"); | ||
// we failed to finish the noise handshake - increase reconnection attempt | ||
self.current_reconnection.fetch_add(1, Ordering::SeqCst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't we end up in a constant reconnection loop if receiver doesn't support noise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is a bit misleading tbh, my bad.
If the receiver doesn't support Noise, there won't be any handshake, and we will just return the TcpStream
wrapped in a Connection
for compatibility
match self.as_str().find("psk") { | ||
Some(n) => { | ||
let psk_index = n + 3; | ||
let psk_char = self.as_str().chars().nth(psk_index).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why not also use constant for the psk_position
itself xD (I guess I really don't like unwraps that could be removed)
|
||
while !handshake.is_handshake_finished() { | ||
if handshake.is_my_turn() { | ||
self.send_handshake_msg(&mut handshake).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking slightly deeper into it. i'm a bit confused: why are you having a method on self
that takes handshake
argument that originally was part of self but you temporarily removed it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need the inner_stream
from self
to send the encrypted handshake message.
I'm taking the handshake
out from self
to prevent duplicate calls to this fn
from messing with the first one.
99fc052
to
48045f7
Compare
Description
Noise PR #4360 is dead, long live Noise PR #5692.
No stacked PRs this time, commits description are quite explicit (and it's less of a mess than last time)
This change is