[Feature] Noise XKpsk3 integration (2025 version) #5692

simonwicky · 2025-04-07T12:27:02Z

Description

Noise PR #4360 is dead, long live Noise PR #5692.

No stacked PRs this time; the commit descriptions are quite explicit (and it's less of a mess than last time).

vercel · 2025-04-07T12:30:46Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
nym-explorer-v2	❌ Failed (Inspect)			May 21, 2025 7:59am

2 Skipped Deployments

Name	Status	Preview	Comments	Updated (UTC)
docs-nextra	⬜️ Ignored (Inspect)	Visit Preview		May 21, 2025 7:59am
nym-next-explorer	⬜️ Ignored (Inspect)	Visit Preview		May 21, 2025 7:59am

timkuijsten · 2025-04-07T22:29:27Z

Is there a rationale for choosing XKpsk3? For instance, how do you plan to handle DoS mitigation? I'm also curious how this relates to the announcement of plans to use McEliece.

aniampio · 2025-04-10T07:51:15Z

@timkuijsten Thanks for the question! We decided to use XKpsk3 because it provides the best forward secrecy, authentication, replay protection, and identity-hiding guarantees for client-server settings where the client cannot be identified based on its source IP address. Also, as PQ security is on our roadmap, we opted for the variant with a PSK, as the pre-shared key can later be used to inject post-quantum safety into the protocol. We also decided to use XKpsk3 between nodes, as we are considering supporting private gateways in the future, so the identity of the initiator may not be obvious from the source IP address (entry-mixnode). For the rest of the connections (mix node to mix node, and mix node to exit) we use the same pattern to keep usage across the network uniform.

In terms of DoS protection, deploying Noise at the transport layer helps shield the Sphinx layer, as each message must first be validated by Noise before any Sphinx processing occurs. However, Noise itself can still be vulnerable to DoS attacks. WireGuard mitigates this risk using a cookie-based mechanism. We could adopt a similar approach or explore alternative strategies, all of which are currently under consideration.

jstuczyn · 2025-04-10T10:33:48Z

nym-api/src/nym_nodes/handlers/unstable/semi_skimmed.rs

+        let node_id = nym_node.node_id;
+
+        let role: NodeRole = rewarded_set.role(node_id).into();
+


you copied it from skimmed response but you dropped filtering for inactive nodes, what gives?

Since (at least for the moment) we need this endpoint to get the Noise keys, we also need it for inactive nodes (tests and stuff), so I did not see why we would need it only for active ones.
That being said, I can add the filtering back.

but it's not designed to be a noise-exclusive endpoint : )

Added the option for later in d4d6e42
I didn't implement the active version of the endpoint though
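
A minimal sketch of what an `active_only` switch on the response builder could look like. The `SemiSkimmedNode` struct, its fields, and the `NodeRole` variants below are assumptions made for illustration and are not taken from the PR; only the idea of an optional filter on inactive nodes comes from the thread.

```rust
/// Hypothetical stand-in for the role a node holds in the rewarded set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NodeRole {
    Mixnode,
    EntryGateway,
    ExitGateway,
    Inactive,
}

/// Hypothetical, trimmed-down node record; a real response carries more fields.
#[derive(Clone, Debug, PartialEq, Eq)]
struct SemiSkimmedNode {
    node_id: u32,
    role: NodeRole,
    noise_key: Option<[u8; 32]>,
}

/// Returns every node when `active_only` is false, and only nodes holding an
/// active role otherwise.
fn build_response(nodes: &[SemiSkimmedNode], active_only: bool) -> Vec<SemiSkimmedNode> {
    nodes
        .iter()
        .filter(|n| !active_only || n.role != NodeRole::Inactive)
        .cloned()
        .collect()
}
```

With `active_only = false` the endpoint keeps serving Noise keys for inactive nodes, which is the behaviour the tests mentioned above rely on.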

jstuczyn · 2025-04-10T10:36:42Z

common/nymnoise/src/config.rs

+        match self.as_str().find("psk") {
+            Some(n) => {
+                let psk_index = n + 3;
+                let psk_char = self.as_str().chars().nth(psk_index).unwrap();


thinking about it, given we're using hardcoded patterns, couldn't we also hardcode that psk_position to avoid unwrap altogether?

When adding/modifying patterns, I don't want to have that info in two places, that can lead to mismatch if not maintained properly.

The snow library is parsing the pattern string to get its scheme so it's not really far from that.

Given that, I feel like this is still the better option no?

> When adding/modifying patterns, I don't want to have that info in two places, that can lead to mismatch if not maintained properly.

that's why we write unit tests : )

you can derive EnumIter on the guy and use it in tests to make sure every variant is always covered and returns expected value

I added tests to ensure validity of all the noise patterns we hardcode in 090026d
Let's keep the hardcoded position in the tests though

but why not also use constant for the psk_position itself xD (I guess I really don't like unwraps that could be removed)

I reckon having the automated construction somewhere and the constants somewhere else is a good thing to ensure validity.
I'm team automated construction in the actual code and constants in the tests, if the tests pass, those unwraps cannot fail.
Alternatively, I can remove them and make the fn fallible, since it's already used in a fallible function. But again, if the tests pass, they can't fail
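
A rough sketch of the two positions in this thread side by side: the PSK position is parsed from the pattern name (here without `unwrap`, returning `Option`), while a separate table of constants lives on the test side. The `NoisePattern` variants, the `ALL` array, and `expected_psk_position` are hypothetical names for illustration; the pattern strings follow the Noise protocol naming convention.

```rust
/// Hypothetical enum of hardcoded patterns; the variant list is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NoisePattern {
    XKpsk3,
    IKpsk2,
}

impl NoisePattern {
    /// Every variant, so tests can iterate without an extra derive crate.
    const ALL: [NoisePattern; 2] = [NoisePattern::XKpsk3, NoisePattern::IKpsk2];

    fn as_str(&self) -> &'static str {
        match self {
            NoisePattern::XKpsk3 => "Noise_XKpsk3_25519_AESGCM_SHA256",
            NoisePattern::IKpsk2 => "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s",
        }
    }

    /// Parses the digit that follows "psk". Returns `None` instead of
    /// panicking when there is no PSK modifier or the digit is missing.
    fn psk_position(&self) -> Option<u8> {
        let name = self.as_str();
        // `find` yields a byte offset, so slice by bytes rather than
        // indexing by character count.
        let digit_start = name.find("psk")? + "psk".len();
        let digit = name[digit_start..].chars().next()?;
        digit.to_digit(10).map(|d| d as u8)
    }
}

/// Test-side constants, deliberately kept apart from the parser.
fn expected_psk_position(pattern: NoisePattern) -> Option<u8> {
    match pattern {
        NoisePattern::XKpsk3 => Some(3),
        NoisePattern::IKpsk2 => Some(2),
    }
}
```

Iterating over `NoisePattern::ALL` in a unit test and comparing `psk_position()` against `expected_psk_position()` gives the coverage that deriving an iterator on the enum would provide.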

jstuczyn · 2025-04-10T10:41:30Z

common/nymnoise/src/config.rs

+                    .support
+                    .inner
+                    .load()
+                    .get(&ip_addr.to_canonical()) // SW default bind address being [::]:1789, it can happen that a responder sees the ipv6-mapped address of the initiator, this check for that


why would that matter?

also, wouldn't it be better if we always and only used canonical address?

It does matter because I have seen a case where the responder sees the IPv6-mapped version of the initiator's IP, and it needs to canonicalize it to find it.

As for only using the canonical version, using solely the IP for this is kinda shaky, so let's put all the chances on our side no? We're more likely to miss a Noise supporting node than to think a node is supporting it when it doesn't

I'm not sure I understand. wouldn't using canonical address help us in reducing chances of accidentally having duplicates of the same underlying node?
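
To make the "always and only canonical" suggestion concrete, here is a small sketch in which the table canonicalizes on both insert and lookup, so an IPv4-mapped IPv6 address and its plain IPv4 form always hit the same entry. `NoiseSupport` and its methods are hypothetical; `IpAddr::to_canonical` is the standard library call already used in the diff above.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical lookup table from a node's IP to its announced Noise key.
#[derive(Default)]
struct NoiseSupport {
    inner: HashMap<IpAddr, [u8; 32]>,
}

impl NoiseSupport {
    /// Canonicalize on insert so `::ffff:a.b.c.d` and `a.b.c.d` share one entry.
    /// Returns the previous key if the address was already known.
    fn insert(&mut self, ip: IpAddr, key: [u8; 32]) -> Option<[u8; 32]> {
        self.inner.insert(ip.to_canonical(), key)
    }

    /// Canonicalize on lookup too, whatever form the socket reported.
    fn get(&self, ip: IpAddr) -> Option<&[u8; 32]> {
        self.inner.get(&ip.to_canonical())
    }
}
```

With this shape, a node announced as `::ffff:192.0.2.7` and later seen as `192.0.2.7` cannot end up as two separate entries.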

jstuczyn · 2025-04-10T10:43:36Z

common/nymnoise/src/config.rs

+    }
+
+    // Only for phased update
+    //SW This can lead to some troubles if two nodes shares the same IP and one support Noise but not the other. This in only for the progressive update though and there is no workaround


i hope this doesn't introduce any problems/vulnerabilities, because somebody will attempt to use that.

couldn't we simply say that if we have connection from ip A.B.C.D and it doesn't use noise, all connections from that ip are not allowed to use noise (and vice versa)

They might exploit and block you if you don't update. If you do, they can't.

We could but then it means somebody can force a downgrade on your up-to-date node. Better to cause problems on older than newer nodes no?

jstuczyn · 2025-04-10T11:01:42Z

common/nymnoise/src/stream.rs

+                    return Poll::Ready(Err(io::ErrorKind::InvalidInput.into()));
+                };
+                noise_buf.truncate(len);
+                match projected_self.inner_stream.start_send(noise_buf.into()) {


but start_send does not imply the data has been sent

Added flushing afterwards in fc16dc6

this won't work as you wouldn't be able to flush it before sending is finished. you'd need some sort of state machine to keep track of it. i think. perhaps use one of tokio-util's adapters?

Hold up, Tokio is doing exactly that in their SinkWriter implementation to go from a Sink to an AsyncWrite :
https://docs.rs/tokio-util/latest/src/tokio_util/io/sink_writer.rs.html#104
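
The accepted-versus-delivered distinction raised here also exists in the synchronous standard library, which makes it easy to demonstrate without an async runtime. The sketch below is only an analogy, not the `Sink` API: `BufWriter::write_all` plays the role of `start_send`, and `flush` plays the role of `poll_flush`. It assumes the payload is smaller than the writer's internal buffer.

```rust
use std::io::{BufWriter, Write};

/// Writes `payload` through a buffered writer and reports how many bytes had
/// reached the underlying writer before and after the explicit flush.
fn bytes_on_wire_before_and_after_flush(payload: &[u8]) -> std::io::Result<(usize, usize)> {
    // `Vec<u8>` stands in for the socket, `BufWriter` for the framed sink.
    let mut writer = BufWriter::new(Vec::new());

    // Comparable to `start_send`: the data is accepted, not necessarily delivered.
    writer.write_all(payload)?;
    let before = writer.get_ref().len();

    // Comparable to `poll_flush`: buffered data is pushed to the underlying writer.
    writer.flush()?;
    let after = writer.get_ref().len();

    Ok((before, after))
}
```

For a short payload the first count is zero and the second equals the payload length, which is why a write path that stops after the send step can leave data sitting in the buffer.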

jstuczyn · 2025-04-10T11:03:54Z

nym-node/src/node/shared_network.rs

        let res = self
            .client
-            .get_all_basic_nodes()
+            .get_all_expanded_nodes()


there's a tiny issue. currently nym api will return unimplemented on that endpoint : (

Well, let's make sure to update the nym-api before the nodes.
This endpoint is implemented in that PR

jstuczyn · 2025-04-10T11:05:36Z

nym-node/src/node/shared_network.rs

+            .flat_map(|n| {
+                n.basic.ip_addresses.iter().map(|ip_addr| {
+                    (
+                        SocketAddr::new(*ip_addr, n.basic.mix_port),


wait, hold on a second here. i'm not sure this is the port you want to be using. this is the port the remote node is listening on. when it tries to establish connection to your node, it will be different

This is exactly the info I want. The port info is used by initiator, which is sending to your node.
The responder will only use the IP part that is constructed here, where I'm removing the port info :

nym/common/nymnoise/src/config.rs

Line 70 in fba6c26

let noise_support = new
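
A sketch of the two views described in this reply, under assumed types: the initiator keys its table by the full listening address it dials, while the responder drops the port because an incoming connection arrives from an ephemeral source port. `AnnouncedNode`, `initiator_table`, and `responder_table` are hypothetical names; the field names mirror the diff above.

```rust
use std::collections::{HashMap, HashSet};
use std::net::{IpAddr, SocketAddr};

/// Hypothetical subset of what a node announces through the API.
struct AnnouncedNode {
    ip_addresses: Vec<IpAddr>,
    mix_port: u16,
    noise_key: [u8; 32],
}

/// Initiator view: keyed by the full listening address that gets dialed.
fn initiator_table(nodes: &[AnnouncedNode]) -> HashMap<SocketAddr, [u8; 32]> {
    nodes
        .iter()
        .flat_map(|n| {
            n.ip_addresses
                .iter()
                .map(move |ip| (SocketAddr::new(*ip, n.mix_port), n.noise_key))
        })
        .collect()
}

/// Responder view: the peer's source port is ephemeral, so only the IP is kept.
fn responder_table(initiator: &HashMap<SocketAddr, [u8; 32]>) -> HashSet<IpAddr> {
    initiator.keys().map(|addr| addr.ip()).collect()
}
```

An incoming connection from `192.0.2.7:51234` would miss the initiator table, which is keyed on port 1789, but still match the responder set on its IP alone.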

timkuijsten · 2025-04-10T13:10:59Z

> In terms of DoS protection, deploying Noise at the transport layer helps shield the Sphinx layer, as each message must first be validated by Noise before any Sphinx processing occurs. However, Noise itself can still be vulnerable to DoS attacks. WireGuard mitigates this risk using a cookie-based mechanism. We could adopt a similar approach or explore alternative strategies, all of which are currently under consideration.

I guess XK + cookie and making sure there is no responder-side state for unauthenticated connections (like rosenpass did) would put you in a good position. Are there already any public discussions, designs or transcripts?

jstuczyn · 2025-04-11T10:18:49Z

common/nymnoise/src/stream.rs

+
+            Poll::Ready(Some(Ok(noise_msg))) => {
+                //We have a new moise msg
+                let mut dec_msg = vec![0u8; MAXMSGLEN];


I wonder, do we always have to allocate so much data per each message? i don't think that was the noise design

Edit : Reading the code (from snow too) again, indeed allocating noise_message.len() - TAGLEN will suffice, fixed in ea233c7

jstuczyn · 2025-04-11T10:21:54Z

common/nymnoise/src/stream.rs

+                let mut dec_msg = vec![0u8; MAXMSGLEN];
+                let len = match projected_self.noise {
+                    Some(transport_state) => {
+                        match transport_state.read_message(&noise_msg, &mut dec_msg) {


why not just return those bytes immediately here

Maybe we have things in the buffer that need to be returned before that.
A call to this poll_read might end up with us having data we have decrypted but can't return yet.
Hence we need to check it first before returning that part
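
A toy synchronous model of the situation described here, in which the caller's buffer is smaller than a decrypted message, so leftover bytes must be served before any new message is touched. Everything in it is hypothetical: `BufferedReader` is not the PR's type, `incoming` stands in for the framed stream, and `decrypt` is an identity placeholder for the transport state.

```rust
use std::collections::VecDeque;

/// Toy model of the read side of an encrypted stream.
struct BufferedReader {
    /// Encrypted messages not yet processed (stand-in for the inner stream).
    incoming: VecDeque<Vec<u8>>,
    /// Decrypted bytes that did not fit into an earlier caller buffer.
    pending: VecDeque<u8>,
}

impl BufferedReader {
    fn new(messages: Vec<Vec<u8>>) -> Self {
        Self { incoming: messages.into(), pending: VecDeque::new() }
    }

    /// Placeholder for decryption; a real implementation would strip the tag.
    fn decrypt(msg: Vec<u8>) -> Vec<u8> {
        msg
    }

    /// Copies up to `out.len()` bytes into `out`, oldest bytes first.
    fn read(&mut self, out: &mut [u8]) -> usize {
        // Only pull a new message when nothing decrypted is still waiting.
        if self.pending.is_empty() {
            if let Some(msg) = self.incoming.pop_front() {
                self.pending.extend(Self::decrypt(msg));
            }
        }
        let n = out.len().min(self.pending.len());
        for (slot, byte) in out.iter_mut().zip(self.pending.drain(..n)) {
            *slot = byte;
        }
        n
    }
}
```

Returning freshly decrypted bytes straight away would reorder data whenever `pending` is non-empty, which is the case this check guards against.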

jstuczyn · 2025-04-11T10:23:22Z

common/nymnoise/src/stream.rs

+            Poll::Ready(Err(err)) => Poll::Ready(Err(err)),
+
+            Poll::Ready(Ok(())) => {
+                let mut noise_buf = BytesMut::zeroed(MAXMSGLEN + TAGLEN);


why not reuse existing buf or at least use its length? it seems very wasteful to always allocate 60k for each tiny write...

In theory the writes can be up to 60k bytes though

Edit : Reading the code (from snow too) again, indeed allocating buf.len() + TAGLEN will suffice, fixed in ea233c7
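
The sizing rule agreed on in the two buffer threads (write side: `buf.len() + TAGLEN`, read side: `noise_message.len() - TAGLEN`) can be sketched as two small helpers. The constant values are assumptions taken from the Noise specification (65535-byte maximum message, 16-byte authentication tag); the PR's own `MAXMSGLEN` may be defined differently, and the helper names are hypothetical.

```rust
/// Assumed from the Noise specification: a message is at most 65535 bytes
/// and each encrypted payload carries a 16-byte authentication tag.
const MAXMSGLEN: usize = 65535;
const TAGLEN: usize = 16;

/// Buffer size needed to encrypt `plaintext_len` bytes, or `None` if the
/// resulting message would exceed the maximum message size.
fn ciphertext_len(plaintext_len: usize) -> Option<usize> {
    let total = plaintext_len.checked_add(TAGLEN)?;
    (total <= MAXMSGLEN).then_some(total)
}

/// Buffer size needed to decrypt a message of `ciphertext_len` bytes, or
/// `None` if the message is oversized or too short to contain a tag.
fn plaintext_len(ciphertext_len: usize) -> Option<usize> {
    if ciphertext_len > MAXMSGLEN {
        return None;
    }
    ciphertext_len.checked_sub(TAGLEN)
}
```

Under these assumptions a 100-byte write needs a 116-byte buffer rather than one sized for the largest possible message.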

jstuczyn · 2025-04-11T10:24:30Z

common/nymnoise/src/stream.rs

+    }
+}
+
+impl AsyncWrite for NoiseStream {


would https://docs.rs/tokio-util/latest/tokio_util/io/struct.SinkWriter.html work instead of implementing AsyncWrite? actually, why do we even need AsyncWrite + AsyncRead?

Without Noise, we're wrapping the TcpStream into a tokio_util::codec::Framed with the sphinx packet codec.
To code a one-for-one replacement, NoiseStream has to be AsyncWrite + AsyncRead one way or another

SinkWriter doesn't work for the same reason StreamReader doesn't. We need to handle the encryption in the middle

simonwicky · 2025-04-14T12:10:40Z

@timkuijsten There are no public discussions or designs yet no.
Using cookies is a good lead, although it's just used to confirm the IP address so it can then be limited with other means.

timkuijsten · 2025-04-14T12:17:51Z

> Using cookies is a good lead, although it's just used to confirm the IP address so it can then be limited with other means.

But nym uses TCP right? Isn't TCP's three-way-handshake enough for IP ownership confirmation? (unlike WireGuard which needs the cookie because it uses UDP).

simonwicky · 2025-04-14T13:02:21Z

@timkuijsten For the moment it does indeed. That's a fair point, I'll need to think about it.

jstuczyn · 2025-04-25T09:26:39Z

common/client-libs/mixnet-client/src/client.rs

+                            Err(err) => {
+                                error!("Failed to perform Noise handshake with {address} - {err}");
+                                // we failed to finish the noise handshake - increase reconnection attempt
+                                self.current_reconnection.fetch_add(1, Ordering::SeqCst);


wouldn't we end up in a constant reconnection loop if receiver doesn't support noise?

The error message is a bit misleading tbh, my bad.
If the receiver doesn't support Noise, there won't be any handshake, and we will just return the TcpStream wrapped in a Connection for compatibility
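
A sketch of the fallback described in this reply, under heavy assumptions: the handshake is only attempted when a Noise key is known for the peer, so a peer without Noise support never produces a handshake error in the first place. The `Connection` variants, `ConnectError`, and `upgrade` are hypothetical, and the handshake itself is passed in as a closure rather than implemented.

```rust
/// Hypothetical connection wrapper; `S` is the underlying stream type.
/// A real `Noise` variant would hold an encrypting wrapper, not the bare stream.
#[derive(Debug, PartialEq)]
enum Connection<S> {
    Plain(S),
    Noise(S),
}

#[derive(Debug, PartialEq)]
enum ConnectError {
    HandshakeFailed,
}

/// Wraps `stream` as-is when no key is known for the peer, and runs the
/// supplied handshake only when one is.
fn upgrade<S>(
    stream: S,
    peer_key: Option<[u8; 32]>,
    handshake: impl FnOnce(S, [u8; 32]) -> Result<S, ConnectError>,
) -> Result<Connection<S>, ConnectError> {
    match peer_key {
        // No announced key: no handshake, hence no handshake failure to retry on.
        None => Ok(Connection::Plain(stream)),
        Some(key) => handshake(stream, key).map(Connection::Noise),
    }
}
```

In this shape only the `Some(key)` branch can return an error, so the reconnection counter would grow only for peers that announced a key and then failed the handshake.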

jstuczyn · 2025-04-25T10:04:03Z

common/nymnoise/src/stream.rs

+
+        while !handshake.is_handshake_finished() {
+            if handshake.is_my_turn() {
+                self.send_handshake_msg(&mut handshake).await?;


looking slightly deeper into it. i'm a bit confused: why are you having a method on self that takes handshake argument that originally was part of self but you temporarily removed it?

We still need the inner_stream from self to send the encrypted handshake message.
I'm taking the handshake out from self to prevent duplicate calls to this fn from messing with the first one.
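
The "take it out of self" pattern can be sketched with `Option::take`: moving the handshake state out leaves `None` behind, so a second call finds nothing to run, while `self` stays free to be borrowed for sending. This is a toy model with hypothetical types; `sent` stands in for the inner stream and the message counter stands in for the real handshake state machine.

```rust
#[derive(Debug, PartialEq)]
enum HandshakeError {
    AlreadyStartedOrFinished,
}

/// Hypothetical stand-in for the handshake state machine.
struct Handshake {
    remaining_messages: u8,
}

struct NoiseStream {
    handshake: Option<Handshake>,
    /// Stand-in for the inner stream: records what was "sent".
    sent: Vec<u8>,
}

impl NoiseStream {
    fn perform_handshake(&mut self) -> Result<(), HandshakeError> {
        // Moving the state out leaves `None` behind, so a duplicate call
        // cannot interfere with the one already in progress.
        let mut handshake = self
            .handshake
            .take()
            .ok_or(HandshakeError::AlreadyStartedOrFinished)?;
        while handshake.remaining_messages > 0 {
            // `self` is still needed for the stream, hence the method on self
            // that receives the handshake as an argument.
            self.send_handshake_msg(&mut handshake);
        }
        Ok(())
    }

    fn send_handshake_msg(&mut self, handshake: &mut Handshake) {
        self.sent.push(handshake.remaining_messages);
        handshake.remaining_messages -= 1;
    }
}
```

Because the handshake is a local value during the loop, borrowing `self` mutably for the send does not conflict with borrowing the handshake.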

simonwicky requested a review from jstuczyn April 7, 2025 12:27

simonwicky force-pushed the simon/noise_nodes_2025 branch from d00427a to 3be921e Compare April 9, 2025 09:45

simonwicky requested a review from octol April 9, 2025 11:29

benedettadavico added this to the Tourist milestone Apr 9, 2025

jstuczyn reviewed Apr 10, 2025

jstuczyn reviewed Apr 11, 2025

jstuczyn reviewed Apr 11, 2025

jstuczyn reviewed Apr 11, 2025

simonwicky added this to the Godiva milestone Apr 22, 2025

jstuczyn reviewed Apr 25, 2025

simonwicky modified the milestones: Appenzeller, Brie May 5, 2025

simonwicky and others added 19 commits May 20, 2025 16:03

d21eff3  add noise lib
6cf8d79  adding proper noise key announcing on nodes
eabad28  add semi-skimmed endpoint to distribute noise key
ddf7c3b  noise handshake common
1ee8a5a  noise handshake responder side
8f950ee  noise handshake initiator side
690a189  enable noise by announcing keys
bfa52f7  fix wasm client by conditionnally import mixnet client in client-core
cb17314  additional Polish; missing features, extra test, etc
c6675bb  some comments and minor improvements for future versions
cc28575  appease the clippy god
7e2e122  appease the clippy god
162b742  resolve non stream-related PR comments
cc4dcff  fix asyncread and asyncwrite op following PR comment
cd61e7c  add active_only option for semi-skimmed node build_response
32fc3c1  improve noisestream creation and test noisepatterns
7c10417  restore start_send use
4188c9f  change buffer allocation method and use connection timeout
48045f7  add multiple output for semi-skimmed endpoint

simonwicky force-pushed the simon/noise_nodes_2025 branch from 99fc052 to 48045f7 Compare May 20, 2025 14:19

Bump ns-api version

a66b2ae

backwards compatibility for mixnodes announced keys

8125c0a
