Is there any way to tell what PeerId has sent Inbound message in InboundUpgrade? #2602

folex · 2022-04-02T17:17:27Z

Some Context

I'm debugging a heisenbug which happens periodically: peer A sends message to peer B, but peer B is unable to read it due to unexpected end of file error.

And I'm having trouble reproducing and/or figuring out, which peer causes this. It would help me a lot if I could tell who has sent the message, i.e. PeerId of the sender.

I don't see how to get that info inside upgrade_inbound:

fn upgrade_inbound(self, mut socket: Socket, info: Self::Info)

Socket is multistream_select::negotiated::Negotiated<rw_stream_sink::RwStreamSink<Chan>>

The Question

Is there a way to tell sender PeerId inside InboundUpgrade implementation like this one upgrade.rs#L115?

Version

libp2p-core: 0.32
libp2p: 0.43

P.S.

Error happens inside read_length_prefixed. Exception looks like this

Error processing inbound ProtocolMessage: unexpected end of file

     Location:
         /home/circleci/project/particle-protocol/src/libp2p_protocol/upgrade.rs:126:30

     Stack backtrace:
        0: core::ops::function::Fn::call
        1: eyre::capture_handler
        2: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
        3: <libp2p_core::either::EitherFuture2<AFut,BFut> as core::future::future::Future>::poll
        4: futures_util::future::future::FutureExt::poll_unpin
        5: <libp2p_swarm::connection::handler_wrapper::SubstreamUpgrade<UserData,Upgrade> as core::future::future::Future>::poll
        6: futures_util::stream::stream::StreamExt::poll_next_unpin
        7: libp2p_swarm::connection::handler_wrapper::HandlerWrapper<TProtoHandler>::poll
        8: libp2p_swarm::connection::Connection<THandler>::poll
        9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
       10: futures_executor::thread_pool::PoolState::work
       11: std::sys_common::backtrace::__rust_begin_short_backtrace
       12: core::ops::function::FnOnce::call_once{{vtable.shim}}
       13: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                  at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/alloc/src/boxed.rs:1854:9
       14: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                  at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/alloc/src/boxed.rs:1854:9
       15: std::sys::unix::thread::Thread::new::thread_start
                  at ./rustc/ec4bcaac450279b029f3480b8b8f1b82ab36a5eb/library/std/src/sys/unix/thread.rs:108:17
       16: start_thread
       17: clone

Protocol implementation is here https://github.com/fluencelabs/fluence/blob/master/particle-protocol/src/libp2p_protocol/upgrade.rs#L126

The text was updated successfully, but these errors were encountered:

mxinden · 2022-04-06T16:01:22Z

InboundUpgrade has no way to tell who the sender is, given that the security handshake, and thus the authentication of the sender might not have happened yet.

The only way I see is logging the PeerId once the security handshake has been successful and then try to correlate the log line with the panic. Though I guess this is happening on a busy machine and thus hard to correlate, correct?

folex · 2022-04-13T12:03:28Z

Yes, exactly. It's very hard to correlate.

folex · 2022-04-13T12:06:11Z

If it's not possible to provide PeerId information there, maybe it makes sense provide address info? Like multiaddr.

That would allow me to understand who is on the other end of the socket.

If that makes sense, I can try to implement that.

mxinden · 2022-04-17T08:13:05Z

If it's not possible to provide PeerId information there, maybe it makes sense provide address info? Like multiaddr.

Note that we are using the same InboundUpgrade and OutboundUpgrade across connection and stream upgrades. E.g. in the latter case we don't know the address, i.e. the address is not relevant. We could break the trait into two one for streams one for connections, though I am not sure that is worth it?

folex · 2022-04-19T20:14:40Z

Well, in OutboundUpgrade case it seems relevant, too, and for similar reason.

Imagine a situation, where you're trying to communicate with some peer, and communication fails at Inbound/Outbound Upgrade level. How do you debug it in an open big network?

Let me try to give more specific arguments.

InboundUpgrade

In my case, some peers close connection before fully writing protocol message. That's InboundUpgrade.

And that's exactly what happened in my case. And I think I have a fix for that (as described here libp2p/rust-yamux#117).

BUT since the network is rather inert when it comes to updates, some peers still follow invalid algorithm for sending data (ie closing too early). So my logs are still full of EOF errors, and it makes me uneasy, because I can't be sure I have fixed the problem.

If I could tell multiaddress/peerid/etc of the sender, I could go to that peer and ask for it's version via other protocol (eg identify), or at least tell if it's one of peers managed by me or not.

OutboundUpgrade

This would be a hypothetical example. Consider the following situation:

I send protocol message to some peer, but sending fails. It throws an EOF error for some peers, and messages do not get delivered.

If PeerId/multiaddress would be possible to learn, then I could pinpoint the peer at failure, and try to debug it somehow. Know its version, maybe talk to hoster of that peer, etc.

Conclusion :)

I see address information as vital for debugging networking problems. It is also feels natural to me to be able to access addresses of sockets, since OS has that information if I'm not mistaken.

WDYT?

Thanks for your openness and readiness to discuss possible changes in such a core API, really appreciate that. Cheers!

mxinden · 2022-04-23T09:29:49Z

I am still a bit reserved to add an additional argument to OutboundInboundUpgrade for the sake of debugging.

Would it not be an option to return an error from OutboundUpgrade or InboundUpgrade and then log the error along with the peer information in ConnectionHandler::inject_dial_upgrade_error or inject_listen_upgrade_error or NetworkBehaviour::inject_dial_failure or inject_listen_failure?

I might be missing something.

folex · 2022-04-25T10:17:00Z

Thank you, that sounds like what I originally wanted to know!

folex mentioned this issue Apr 25, 2022

Log connection failures fluencelabs/nox#1271

Merged

thomaseizinger closed this as completed Mar 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Is there any way to tell what PeerId has sent Inbound message in InboundUpgrade? #2602

Is there any way to tell what PeerId has sent Inbound message in InboundUpgrade? #2602

folex commented Apr 2, 2022

mxinden commented Apr 6, 2022

Uh oh!

folex commented Apr 13, 2022

Uh oh!

folex commented Apr 13, 2022

Uh oh!

mxinden commented Apr 17, 2022

Uh oh!

folex commented Apr 19, 2022 •

edited

Loading

Uh oh!

mxinden commented Apr 23, 2022

Uh oh!

folex commented Apr 25, 2022

Uh oh!

Is there any way to tell what PeerId has sent Inbound message in InboundUpgrade? #2602

Is there any way to tell what PeerId has sent Inbound message in InboundUpgrade? #2602

Comments

folex commented Apr 2, 2022

Some Context

The Question

Version

P.S.

mxinden commented Apr 6, 2022

Uh oh!

folex commented Apr 13, 2022

Uh oh!

folex commented Apr 13, 2022

Uh oh!

mxinden commented Apr 17, 2022

Uh oh!

folex commented Apr 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

InboundUpgrade

OutboundUpgrade

Conclusion :)

Uh oh!

mxinden commented Apr 23, 2022

Uh oh!

folex commented Apr 25, 2022

Uh oh!

folex commented Apr 19, 2022 •

edited

Loading