use mac flows to filter xde traffic #61 #62
Not yet had the luxury to test on real NICs, but we have progress (below taken from a running omicron + SoftNPU instance):
This currently relies on utterly abusing the internals of
So, the interesting news is that this still works, and works on Intel NICs. Sadly, performance (in the latency sense) is practically identical:

EDIT: Marginal changes are expected on these numbers; my test setup is capped at 2x1GbE and these latency measurements only cover

C2S results from #62 after `master`
C2S results repeated
So between a few runs we're basically in the same ballpark, possibly a little worse off as we've now added in

# master
kyle@farme:~/gits/opte$ pfexec opteadm set-xde-underlay igb0 igb1
kyle@farme:~/gits/opte$ ifconfig | grep igb
igb0: flags=1000942<BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4> mtu 9000 index 3
igb1: flags=1000942<BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4> mtu 9000 index 4
igb0: flags=20002104941<UP,RUNNING,PROMISC,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 3
igb1: flags=20002104941<UP,RUNNING,PROMISC,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 4
# git switch use-the-flow-luke-61, driver recompile, ...
kyle@farme:~/gits/opte$ pfexec opteadm set-xde-underlay igb0 igb1
kyle@farme:~/gits/opte$ ifconfig | grep igb
igb0: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 9000 index 3
igb1: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 9000 index 4
igb0: flags=20002104841<UP,RUNNING,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 3
igb1: flags=20002104841<UP,RUNNING,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 4

I don't yet know why zone-to-zone over simnets is broken on CI -- from what I recall it worked on my local helios box before I acquired a second test node.
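
For readers comparing the two `ifconfig` transcripts above: the flag words on `master` and on the branch differ in a single bit, which is the one `ifconfig` prints as `PROMISC`. The sketch below decodes that difference. The bit values are assumptions taken from illumos' `<net/if.h>` (they are consistent with the flag names printed above), and `cleared_flags` is a hypothetical helper, not part of OPTE.

```rust
/// Interface flag bits, named as `ifconfig` prints them.
/// ASSUMPTION: numeric values follow illumos' <net/if.h>.
const FLAGS: &[(u64, &str)] = &[
    (0x0000_0001, "UP"),
    (0x0000_0002, "BROADCAST"),
    (0x0000_0040, "RUNNING"),
    (0x0000_0100, "PROMISC"),
    (0x0000_0800, "MULTICAST"),
    (0x0100_0000, "IPv4"),
    (0x0200_0000, "IPv6"),
];

/// Hypothetical helper: names of the known flags that are set in `old`
/// but cleared in `new`.
fn cleared_flags(old: u64, new: u64) -> Vec<&'static str> {
    let cleared = old & !new;
    FLAGS
        .iter()
        .filter(|&&(bit, _)| cleared & bit != 0)
        .map(|&(_, name)| name)
        .collect()
}
```

Calling `cleared_flags(0x1000942, 0x1000842)` with the IPv4 flag words for `igb0` returns only `PROMISC`, and the same holds for the IPv6 flag words.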
What do you get when running a similar traffic flow between the raw IPv6 addresses?
While running an iperf session over each underlay link for 100s:

kyle@farme:~/gits/opte$ cargo kbench in-situ have-a-go
Finished bench [optimized + debuginfo] target(s) in 0.15s
Running benches/xde.rs (target/release/deps/xde-ae24f11c9169898b)
###----------------------###
::: DTrace running... :::
:::Type 'exit' to finish.:::
###----------------------###
dtrace: description 'profile-201us ' matched 2 probes
exit
###---------------------###
:::Awaiting out files...:::
###---------------------###
###-----###
:::done!:::
###-----###
ERROR: No stack counts found
Failed to create flamegraph for xde_rx.
ERROR: No stack counts found
Failed to create flamegraph for xde_mc_tx.

No hits for non-Geneve traffic. The flows themselves are:

kyle@farme:~/gits/opte$ flowadm show-flow
FLOW        LINK    IPADDR    PROTO    LPORT    RPORT    DSFLD
igb0_xde    igb0    --        udp      6081     --       --
igb1_xde    igb1    --        udp      6081     --       --

So far as I can tell we can't jointly specify IP addr + family + port, c.f.
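
To make the match criteria in the flow table concrete, here is a minimal sketch of the predicate those two rows describe: UDP traffic whose local (destination, for inbound packets) port is 6081, the IANA-assigned Geneve port. The `Flow` and `Inbound` types and the `matches` function are hypothetical illustrations and do not reflect how MAC's flow classifier is actually implemented.

```rust
/// IP protocol number for UDP.
const IPPROTO_UDP: u8 = 17;
/// IANA-assigned UDP port for Geneve; the LPORT shown in the flow table.
const GENEVE_PORT: u16 = 6081;

/// Hypothetical stand-in for one row of `flowadm show-flow`.
struct Flow {
    link: &'static str,
    proto: u8,
    local_port: u16,
}

/// Hypothetical summary of the headers of a packet received on a link.
struct Inbound {
    link: &'static str,
    ip_proto: u8,
    dst_port: u16,
}

/// True when the packet arrived on the flow's link and carries the flow's
/// transport protocol and local port. No IP address is compared, mirroring
/// the `--` in the IPADDR column.
fn matches(flow: &Flow, pkt: &Inbound) -> bool {
    flow.link == pkt.link
        && flow.proto == pkt.ip_proto
        && flow.local_port == pkt.dst_port
}
```

Under this reading, a UDP packet to port 6081 on `igb0` hits `igb0_xde`, while traffic to any other port or protocol on the same link does not, which is consistent with the empty flamegraphs for non-Geneve traffic above.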
7af1fe6 to 11d6bc6
Today, we get our TX and RX pathways on underlay devices for XDE by creating a secondary MAC client on each device. As part of this process we must attach a unicast MAC address (or specify `MAC_OPEN_FLAGS_NO_UNICAST_ADDR`) during creation to spin up a valid datapath, otherwise we can receive packets on our promiscuous mode handler but any sent packets are immediately dropped by MAC. However, datapath setup then fails to supply a dedicated ring/group for the new client, and the device is reduced to pure software classification. This hard-disables any ring polling threads, and so all packet processing occurs in the interrupt context. This limits throughput and increases OPTE's blast radius on control plane/crucible traffic between sleds.

This PR places a hold onto the underlay NICs via `dls`, and makes use of `dls_open`/`dls_close` to acquire a valid transmit pathway onto the original (primary) MAC client, to which we can also attach a promiscuous callback. As desired, we are back in hardware classification.

This work is orthogonal to #62 (and related efforts) which will get us out of promiscuous mode -- both are necessary parts of making optimal use of the illumos networking stack.

Closes #489.
Accidentally over-counted by the ETH header, and illumos is sending down chunks of size n×MSS. This would split one TSO packet into three (versus baseline two).
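
One way to read the arithmetic behind that fix, sketched below: if illumos hands down a payload of exactly n×MSS bytes and the length being segmented mistakenly includes the Ethernet header, the extra bytes spill into one additional segment. This interpretation is an assumption drawn from the commit message alone; the MSS value of 1448 and the `segment_count` helper are hypothetical and not taken from the OPTE source.

```rust
/// Length of an untagged Ethernet header in bytes.
const ETH_HDR_LEN: usize = 14;

/// Hypothetical helper: number of segments needed to carry `len` bytes
/// when each segment holds at most `mss` bytes (ceiling division).
fn segment_count(len: usize, mss: usize) -> usize {
    assert!(mss > 0, "mss must be non-zero");
    (len + mss - 1) / mss
}
```

With an assumed MSS of 1448 and a payload of 2×1448 bytes, `segment_count(2 * 1448, 1448)` gives two segments, while `segment_count(2 * 1448 + ETH_HDR_LEN, 1448)` gives three, matching the "three versus baseline two" described above.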
CI on recent PRs is breaking, due to rustup 1.28.0+ no longer autoinstalling the correct Rust toolchain version. This hurts us immediately since we have *two* toolchains (pinned nightly and stable), and deliberately specified the nightly for some tooling.

This PR changes this over to use buildomat's auto-installation for the stable variant, and the new toolchain show -> install pattern for nightly. This also lets us place `$NIGHTLY` into most of our `cargo fmt` invocations, which should reduce the busywork in future compiler bumps for XDE.
a98ac2f to 4e0570f
Squashed down since the initial work here is a pain to rebase every time due to massive changes to OPTE over the years.

Co-authored-by: Ryan Zezeski <[email protected]>
05a7ce9 to effa48f
Rebased this atop tunneled LSO and IPv6 fastpath. On
One significant downside is that we've lost aggregate bandwidth. Runs with
This work is on the back burner at the moment as there is more pressing work that can get done; sticking with promisc isn't a problem for the near-term future.