As a prepatory step to making gossip replies synchronous, we will move
the ErrPeerExiting error into the lnpeer package so that it can be
imported by the discovery package. With synchronous sends, this error
can now be detected and handled by goroutines in the syncer, and cause
them to exit instead of continue to sending backlogs.
This commit creates a distinct replyHandler, completely isolating the
requesting state machine from the processing of queries from the remote
peer. Before the two were interlaced, and the syncer could only reply to
messages in certain states. Now the two will be complete separated,
which is preliminary step to make the replies synchronous (as otherwise
we would be blocking our own requesting state machine).
With this changes, the channelGraphSyncer of each peer will drive the
replyHanlder of the other. The two can now operate independently, or
even spun up conditionally depending on advertised support for gossip
queries, as shown below:
A B
channelGraphSyncer ---control-msg--->
replyHandler
channelGraphSyncer <--control-msg----
gossiper <--gossip-msgs----
<--control-msg---- channelGraphSyncer
replyHandler
---control-msg---> channelGraphSyncer
---gossip-msgs---> gossiper
This race was possible due to us making a subscription request before
the ChannelRouter has started. We address it by creating a dummy
subscription before proceeding to the real one to ensure we can do so
successfully. We use a dummy one in order to not consume an update from
the real one. This addresses the common "timed out waiting for opened
channel" flake within the integration test suite since the subscription
was never properly created, so we'd never be notified of when new graph
updates were received.
In this commit, we speed up the `TestChainWatcherDataLossProtect`
_considerably_ by enumerating relevant tests using table driven tests
rather than generating random tests via the `testing/quick` package.
Each of these test cases are also run in parallel bringing down the
execution time of this test from a few minutes, to a few seconds.
In this commit, we begin to queue any active syncers until the initial
historical sync has completed. We do this to ensure we can properly
handle any new channel updates at tip. This is required for fresh nodes
that are syncing the channel graph for the first time. If we begin
accepting updates at tip while the initial historical sync is still
ongoing, then we risk not processing certain updates since we've yet to
learn of the channels themselves.
In this commit, we add logic to handle a peer with whom we're performing
an initial historical sync disconnecting. This is required to ensure we
get as much of the graph as possible when starting a fresh node. It will
also serve useful to ensure we do not get stalled once we prevent active
GossipSyncers from starting until the initial historical sync has
completed.
Now that the roundRobinHandler is no longer present, this commit aims to
clean up and simplify some of the logic surrounding initializing/tearing
down new/stale GossipSyncers from the SyncManager. Along the way, we
also synchronize these calls with the syncerHandler, which will serve
useful in future work that allows us to recovery from initial historical
sync disconnections.
Since ActiveSync GossipSyncers no longer synchronize our state with the
remote peers, none of the logic surrounding the round-robin is required
within the SyncManager.
In this commit, we remove the ability for ActiveSync GossipSyncers to
synchronize our graph with our remote peers. This serves as a starting
point towards allowing the daemon to only synchronize our graph through
historical syncs, which will be routinely done by the SyncManager.