Commit 77cab14

Merge pull request #1382 from TheBlueMatt/2022-03-gossip-queries-sucks
Fix gossip using `gossip_timestamp_filter` instead of queries
2 parents eb50201 + 1386a0c commit 77cab14

File tree

1 file changed: +81 -180 lines changed


lightning/src/routing/network_graph.rs

Lines changed: 81 additions & 180 deletions
@@ -401,82 +401,91 @@ where C::Target: chain::Access, L::Target: Logger
 			return ();
 		}
 
-		// Send a gossip_timestamp_filter to enable gossip message receipt. Note that we have to
-		// use a "all timestamps" filter as sending the current timestamp would result in missing
-		// gossip messages that are simply sent late. We could calculate the intended filter time
-		// by looking at the current time and subtracting two weeks (before which we'll reject
-		// messages), but there's not a lot of reason to bother - our peers should be discarding
-		// the same messages.
+		// The lightning network's gossip sync system is completely broken in numerous ways.
+		//
+		// Given no broadly-available set-reconciliation protocol, the only reasonable approach is
+		// to do a full sync from the first few peers we connect to, and then receive gossip
+		// updates from all our peers normally.
+		//
+		// Originally, we could simply tell a peer to dump us the entire gossip table on startup,
+		// wasting lots of bandwidth but ensuring we have the full network graph. After the initial
+		// dump peers would always send gossip and we'd stay up-to-date with whatever our peer has
+		// seen.
+		//
+		// In order to reduce the bandwidth waste, "gossip queries" were introduced, allowing you
+		// to ask for the SCIDs of all channels in your peer's routing graph, and then only request
+		// channel data which you are missing. Except there was no way at all to identify which
+		// `channel_update`s you were missing, so you still had to request everything, just in a
+		// very complicated way with some queries instead of just getting the dump.
+		//
+		// Later, an option was added to fetch the latest timestamps of the `channel_update`s to
+		// make efficient sync possible, however it has yet to be implemented in lnd, which makes
+		// relying on it useless.
+		//
+		// After gossip queries were introduced, support for receiving a full gossip table dump on
+		// connection was removed from several nodes, making it impossible to get a full sync
+		// without using the "gossip queries" messages.
+		//
+		// Once you opt into "gossip queries", the only way to receive any gossip updates that a
+		// peer sees after you connect is to send a `gossip_timestamp_filter` message. This
+		// message, as the name implies, tells the peer to not forward any gossip messages with a
+		// timestamp older than a given value (not the time the peer received the filter, but the
+		// timestamp in the update message, which is often hours behind when the peer received the
+		// message).
+		//
+		// Obnoxiously, `gossip_timestamp_filter` isn't *just* a filter, but it's also a request for
+		// your peer to send you the full routing graph (subject to the filter). Thus, in order to
+		// tell a peer to send you any updates as it sees them, you have to also ask for the full
+		// routing graph to be synced. If you set a timestamp filter near the current time, peers
+		// will simply not forward any new updates they see to you which were generated some time
+		// ago (which is not uncommon). If you instead set a timestamp filter near 0 (or two weeks
+		// ago), you will always get the full routing graph from all your peers.
+		//
+		// Most lightning nodes today opt to simply turn off receiving gossip data which only
+		// propagated some time after it was generated, and, worse, often disable gossiping with
+		// several peers after their first connection. The second behavior can cause gossip to not
+		// propagate fully if there are cuts in the gossiping subgraph.
+		//
+		// In an attempt to cut a middle ground between always fetching the full graph from all of
+		// our peers and never receiving gossip from peers at all, we send all of our peers a
+		// `gossip_timestamp_filter`, with the filter time set either two weeks ago or an hour ago.
+		//
+		// For no-std builds, we bury our head in the sand and do a full sync on each connection.
+		let should_request_full_sync = self.should_request_full_sync(&their_node_id);
+		#[allow(unused_mut, unused_assignments)]
+		let mut gossip_start_time = 0;
+		#[cfg(feature = "std")]
+		{
+			gossip_start_time = SystemTime::now().duration_since(UNIX_EPOCH).expect("Time must be > 1970").as_secs();
+			if should_request_full_sync {
+				gossip_start_time -= 60 * 60 * 24 * 7 * 2; // 2 weeks ago
+			} else {
+				gossip_start_time -= 60 * 60; // an hour ago
+			}
+		}
+
 		let mut pending_events = self.pending_events.lock().unwrap();
 		pending_events.push(MessageSendEvent::SendGossipTimestampFilter {
 			node_id: their_node_id.clone(),
 			msg: GossipTimestampFilter {
 				chain_hash: self.network_graph.genesis_hash,
-				first_timestamp: 0,
+				first_timestamp: gossip_start_time as u32, // 2106 issue!
 				timestamp_range: u32::max_value(),
 			},
 		});
-
-		// Check if we need to perform a full synchronization with this peer
-		if !self.should_request_full_sync(&their_node_id) {
-			return ();
-		}
-
-		let first_blocknum = 0;
-		let number_of_blocks = 0xffffffff;
-		log_debug!(self.logger, "Sending query_channel_range peer={}, first_blocknum={}, number_of_blocks={}", log_pubkey!(their_node_id), first_blocknum, number_of_blocks);
-		pending_events.push(MessageSendEvent::SendChannelRangeQuery {
-			node_id: their_node_id.clone(),
-			msg: QueryChannelRange {
-				chain_hash: self.network_graph.genesis_hash,
-				first_blocknum,
-				number_of_blocks,
-			},
-		});
 	}
 
-	/// Statelessly processes a reply to a channel range query by immediately
-	/// sending an SCID query with SCIDs in the reply. To keep this handler
-	/// stateless, it does not validate the sequencing of replies for multi-
-	/// reply ranges. It does not validate whether the reply(ies) cover the
-	/// queried range. It also does not filter SCIDs to only those in the
-	/// original query range. We also do not validate that the chain_hash
-	/// matches the chain_hash of the NetworkGraph. Any chan_ann message that
-	/// does not match our chain_hash will be rejected when the announcement is
-	/// processed.
-	fn handle_reply_channel_range(&self, their_node_id: &PublicKey, msg: ReplyChannelRange) -> Result<(), LightningError> {
-		log_debug!(self.logger, "Handling reply_channel_range peer={}, first_blocknum={}, number_of_blocks={}, sync_complete={}, scids={}", log_pubkey!(their_node_id), msg.first_blocknum, msg.number_of_blocks, msg.sync_complete, msg.short_channel_ids.len(),);
-
-		log_debug!(self.logger, "Sending query_short_channel_ids peer={}, batch_size={}", log_pubkey!(their_node_id), msg.short_channel_ids.len());
-		let mut pending_events = self.pending_events.lock().unwrap();
-		pending_events.push(MessageSendEvent::SendShortIdsQuery {
-			node_id: their_node_id.clone(),
-			msg: QueryShortChannelIds {
-				chain_hash: msg.chain_hash,
-				short_channel_ids: msg.short_channel_ids,
-			}
-		});
-
+	fn handle_reply_channel_range(&self, _their_node_id: &PublicKey, _msg: ReplyChannelRange) -> Result<(), LightningError> {
+		// We don't make queries, so should never receive replies. If, in the future, the set
+		// reconciliation extensions to gossip queries become broadly supported, we should revert
+		// this code to its state pre-0.0.106.
 		Ok(())
 	}
 
-	/// When an SCID query is initiated the remote peer will begin streaming
-	/// gossip messages. In the event of a failure, we may have received
-	/// some channel information. Before trying with another peer, the
-	/// caller should update its set of SCIDs that need to be queried.
-	fn handle_reply_short_channel_ids_end(&self, their_node_id: &PublicKey, msg: ReplyShortChannelIdsEnd) -> Result<(), LightningError> {
-		log_debug!(self.logger, "Handling reply_short_channel_ids_end peer={}, full_information={}", log_pubkey!(their_node_id), msg.full_information);
-
-		// If the remote node does not have up-to-date information for the
-		// chain_hash they will set full_information=false. We can fail
-		// the result and try again with a different peer.
-		if !msg.full_information {
-			return Err(LightningError {
-				err: String::from("Received reply_short_channel_ids_end with no information"),
-				action: ErrorAction::IgnoreError
-			});
-		}
-
+	fn handle_reply_short_channel_ids_end(&self, _their_node_id: &PublicKey, _msg: ReplyShortChannelIdsEnd) -> Result<(), LightningError> {
+		// We don't make queries, so should never receive replies. If, in the future, the set
+		// reconciliation extensions to gossip queries become broadly supported, we should revert
+		// this code to its state pre-0.0.106.
 		Ok(())
 	}
 
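For readers skimming the hunk above, the filter-time selection it introduces boils down to the standalone sketch below. The function name and its boolean parameter are illustrative only and not part of the patch; the constants and the u32 cast mirror the added code.

use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative sketch of the logic added above: pick a `first_timestamp` two
// weeks in the past for peers we want a full graph dump from, and one hour in
// the past for everyone else. The cast to u32 is the "2106 issue!" noted in
// the patch, since 32-bit Unix timestamps run out in 2106.
fn example_gossip_start_time(should_request_full_sync: bool) -> u32 {
    let mut gossip_start_time = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("Time must be > 1970")
        .as_secs();
    if should_request_full_sync {
        gossip_start_time -= 60 * 60 * 24 * 7 * 2; // 2 weeks = 1_209_600 seconds
    } else {
        gossip_start_time -= 60 * 60; // 1 hour = 3_600 seconds
    }
    gossip_start_time as u32
}

Setting the filter two weeks back, rather than at the current time, is what makes peers send the full routing graph on connection, per the comment block above.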
@@ -1541,7 +1550,7 @@ mod tests {
 	use routing::network_graph::{NetGraphMsgHandler, NetworkGraph, NetworkUpdate, MAX_EXCESS_BYTES_FOR_RELAY};
 	use ln::msgs::{Init, OptionalField, RoutingMessageHandler, UnsignedNodeAnnouncement, NodeAnnouncement,
 		UnsignedChannelAnnouncement, ChannelAnnouncement, UnsignedChannelUpdate, ChannelUpdate,
-		ReplyChannelRange, ReplyShortChannelIdsEnd, QueryChannelRange, QueryShortChannelIds, MAX_VALUE_MSAT};
+		ReplyChannelRange, QueryChannelRange, QueryShortChannelIds, MAX_VALUE_MSAT};
 	use util::test_utils;
 	use util::logger::Logger;
 	use util::ser::{Readable, Writeable};
@@ -2278,15 +2287,16 @@ mod tests {
 	}
 
 	#[test]
+	#[cfg(feature = "std")]
 	fn calling_sync_routing_table() {
+		use std::time::{SystemTime, UNIX_EPOCH};
+
 		let network_graph = create_network_graph();
 		let (secp_ctx, net_graph_msg_handler) = create_net_graph_msg_handler(&network_graph);
 		let node_privkey_1 = &SecretKey::from_slice(&[42; 32]).unwrap();
 		let node_id_1 = PublicKey::from_secret_key(&secp_ctx, node_privkey_1);
 
 		let chain_hash = genesis_block(Network::Testnet).header.block_hash();
-		let first_blocknum = 0;
-		let number_of_blocks = 0xffff_ffff;
 
 		// It should ignore if gossip_queries feature is not enabled
 		{
@@ -2296,132 +2306,23 @@ mod tests {
 			assert_eq!(events.len(), 0);
 		}
 
-		// It should send a query_channel_message with the correct information
+		// It should send a gossip_timestamp_filter with the correct information
 		{
 			let init_msg = Init { features: InitFeatures::known(), remote_network_address: None };
 			net_graph_msg_handler.peer_connected(&node_id_1, &init_msg);
 			let events = net_graph_msg_handler.get_and_clear_pending_msg_events();
-			assert_eq!(events.len(), 2);
+			assert_eq!(events.len(), 1);
 			match &events[0] {
 				MessageSendEvent::SendGossipTimestampFilter{ node_id, msg } => {
 					assert_eq!(node_id, &node_id_1);
 					assert_eq!(msg.chain_hash, chain_hash);
-					assert_eq!(msg.first_timestamp, 0);
+					let expected_timestamp = SystemTime::now().duration_since(UNIX_EPOCH).expect("Time must be > 1970").as_secs();
+					assert!((msg.first_timestamp as u64) >= expected_timestamp - 60*60*24*7*2);
+					assert!((msg.first_timestamp as u64) < expected_timestamp - 60*60*24*7*2 + 10);
 					assert_eq!(msg.timestamp_range, u32::max_value());
 				},
 				_ => panic!("Expected MessageSendEvent::SendChannelRangeQuery")
 			};
-			match &events[1] {
-				MessageSendEvent::SendChannelRangeQuery{ node_id, msg } => {
-					assert_eq!(node_id, &node_id_1);
-					assert_eq!(msg.chain_hash, chain_hash);
-					assert_eq!(msg.first_blocknum, first_blocknum);
-					assert_eq!(msg.number_of_blocks, number_of_blocks);
-				},
-				_ => panic!("Expected MessageSendEvent::SendChannelRangeQuery")
-			};
-		}
-
-		// It should not enqueue a query when should_request_full_sync return false.
-		// The initial implementation allows syncing with the first 5 peers after
-		// which should_request_full_sync will return false
-		{
-			let network_graph = create_network_graph();
-			let (secp_ctx, net_graph_msg_handler) = create_net_graph_msg_handler(&network_graph);
-			let init_msg = Init { features: InitFeatures::known(), remote_network_address: None };
-			for n in 1..7 {
-				let node_privkey = &SecretKey::from_slice(&[n; 32]).unwrap();
-				let node_id = PublicKey::from_secret_key(&secp_ctx, node_privkey);
-				net_graph_msg_handler.peer_connected(&node_id, &init_msg);
-				let events = net_graph_msg_handler.get_and_clear_pending_msg_events();
-				if n <= 5 {
-					assert_eq!(events.len(), 2);
-				} else {
-					// Even after the we stop sending the explicit query, we should still send a
-					// gossip_timestamp_filter on each new connection.
-					assert_eq!(events.len(), 1);
-				}
-
-			}
-		}
-	}
-
-	#[test]
-	fn handling_reply_channel_range() {
-		let network_graph = create_network_graph();
-		let (secp_ctx, net_graph_msg_handler) = create_net_graph_msg_handler(&network_graph);
-		let node_privkey_1 = &SecretKey::from_slice(&[42; 32]).unwrap();
-		let node_id_1 = PublicKey::from_secret_key(&secp_ctx, node_privkey_1);
-
-		let chain_hash = genesis_block(Network::Testnet).header.block_hash();
-
-		// Test receipt of a single reply that should enqueue an SCID query
-		// matching the SCIDs in the reply
-		{
-			let result = net_graph_msg_handler.handle_reply_channel_range(&node_id_1, ReplyChannelRange {
-				chain_hash,
-				sync_complete: true,
-				first_blocknum: 0,
-				number_of_blocks: 2000,
-				short_channel_ids: vec![
-					0x0003e0_000000_0000, // 992x0x0
-					0x0003e8_000000_0000, // 1000x0x0
-					0x0003e9_000000_0000, // 1001x0x0
-					0x0003f0_000000_0000, // 1008x0x0
-					0x00044c_000000_0000, // 1100x0x0
-					0x0006e0_000000_0000, // 1760x0x0
-				],
-			});
-			assert!(result.is_ok());
-
-			// We expect to emit a query_short_channel_ids message with the received scids
-			let events = net_graph_msg_handler.get_and_clear_pending_msg_events();
-			assert_eq!(events.len(), 1);
-			match &events[0] {
-				MessageSendEvent::SendShortIdsQuery { node_id, msg } => {
-					assert_eq!(node_id, &node_id_1);
-					assert_eq!(msg.chain_hash, chain_hash);
-					assert_eq!(msg.short_channel_ids, vec![
-						0x0003e0_000000_0000, // 992x0x0
-						0x0003e8_000000_0000, // 1000x0x0
-						0x0003e9_000000_0000, // 1001x0x0
-						0x0003f0_000000_0000, // 1008x0x0
-						0x00044c_000000_0000, // 1100x0x0
-						0x0006e0_000000_0000, // 1760x0x0
-					]);
-				},
-				_ => panic!("expected MessageSendEvent::SendShortIdsQuery"),
-			}
-		}
-	}
-
-	#[test]
-	fn handling_reply_short_channel_ids() {
-		let network_graph = create_network_graph();
-		let (secp_ctx, net_graph_msg_handler) = create_net_graph_msg_handler(&network_graph);
-		let node_privkey = &SecretKey::from_slice(&[41; 32]).unwrap();
-		let node_id = PublicKey::from_secret_key(&secp_ctx, node_privkey);
-
-		let chain_hash = genesis_block(Network::Testnet).header.block_hash();
-
-		// Test receipt of a successful reply
-		{
-			let result = net_graph_msg_handler.handle_reply_short_channel_ids_end(&node_id, ReplyShortChannelIdsEnd {
-				chain_hash,
-				full_information: true,
-			});
-			assert!(result.is_ok());
-		}
-
-		// Test receipt of a reply that indicates the peer does not maintain up-to-date information
-		// for the chain_hash requested in the query.
-		{
-			let result = net_graph_msg_handler.handle_reply_short_channel_ids_end(&node_id, ReplyShortChannelIdsEnd {
-				chain_hash,
-				full_information: false,
-			});
-			assert!(result.is_err());
-			assert_eq!(result.err().unwrap().err, "Received reply_short_channel_ids_end with no information");
 		}
 	}
 
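A note on the `// 2106 issue!` comment in the first hunk: `first_timestamp` is a 32-bit field, so a Unix timestamp narrowed to u32 stops being representable in February 2106. A purely illustrative check of that bound (not part of the patch):

fn main() {
    // u32::MAX seconds (4_294_967_295) after the Unix epoch lands in early 2106,
    // so casting today's u64 Unix timestamp down to u32 still fits, but the
    // representation runs out in 2106.
    let years_past_1970 = u32::MAX as f64 / (365.25 * 24.0 * 60.0 * 60.0);
    println!("u32 seconds cover about {:.1} years past 1970 (until ~2106)", years_past_1970);
}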
0 commit comments