In this commit, we fix a slight bug in the existing implantation that
would cause no channel recovery if the recovering node was already
connected to their channel peer. As we need the link to be known at the
time of connection, if we're already connected, then the chan sync
message won't be sent again. By first disconnecting an existing peer, we
ensure that during the next connection (after the recovered channel is
added to the database), then the regular chan sync message exchange will
take place as expected. # Please enter the commit message for your
changes.
In this commit, we modify the starting link logic to always send the
chan sync message to the remote peer in a synchronous manner. Otherwise,
it's possible that we fail very quickly below this block, and don't ever
send the message to the remote peer.
In this commit, we fix a bug in the existing logic for ConnectPeer that
would cause an SCB restore to fail if we were already connected to the
peer. To fix this, we now instead will just return with a success if
we're already connected to the peer.
In this commit, we modify the main loop in `processChanPolicyUpdate` to
send updates for private channels directly to the remote peer via the
reliable message sender. This fixes a prior issue where the remote peer
wouldn't receive new updates as this method doesn't go through the
traditional path for channel updates.
In this commit, we add a new test case to exercise a recent bug fix to
ensure that we no longer broadcast private channel policy changes. Along
the way, a few helper functions were added to slim down the test to the
core logic compared to some of the existing tests in this package. In
the future, these new helper functions should be utilized more widely for
tests in this package in order to cut down on some of the duplicated
logic.
This commit removes the MarkEdgeZombie method from channeldb. This
method is currently not used in any live code paths in production, and
is only used in unit tests. However, incorrect usage of this method
could result in an edge being present in both the zombie and channel
indexes, which deviates from any state we would expect to see in
production. Removing the method will help mitigate the potential for
writing incorrect unit tests in the future, by forcing zombie edges to
be created via the relevant, production APIs, e.g. DeleteChannelEdge.
The existing unit tests that use this method have been modified to use
the DeleteChannelEdge instead. No regressions were discovered in the
process.
This commit modifies FetchChanInfos to skip any channels that are not in
the graph at the time of the call. Currently the entire call will fail
if the edge is not found, which stalls a gossip sync in the following
scenario:
1. Remote peer queries for a channel range
2. We return the set of channel ids in that range
3. A channel from that set is removed from the graph, e.g. via close.
4. Remote peer queries for removed edge, causing the query to fail.
To remedy this, we will now skip any edges that are not known in the
database at the time of the query. This prevents the syncer state
machines from halting, which otherwise could only be resolved by
disconnecting and reconnecting.
To avoid the ChainService still attempting to access the database when
it gets closed, re-order the stop order such that the Chainservice gets
stopped before closing the DB.
This commit adds logging of the reason to go to chain for a channel.
This can help users to find out the reason why a channels forced closed.
To get all go to chain reasons, an optimization to break early is
removed. This optimization was not significant, because the normal flow
already examined all htlcs. In the exceptional case where we need to go
to chain, it does not weigh up against logging all go to chain reasons.
This commit reduces the number of channels a syncer will request from
the remote node in a single QueryShortChanIDs message. The current size
is derived from the chunkSize, which is meant to signal the maximum
number of short chan ids that can fit in a single ReplyChannelRange
message. For EncodingSortedPlain, this number is 8000, and we use the
same number to dictate the size of the batch from the remote peer.
We modify this by introducing a separately configurable batchSize, so
that both can be tuned independently. The value is chosen to reduce the
amount of buffering the remote party will perform, only requiring them
queue 500 responses, as opposed to 8000. In turn, this reduces larges
spikes in allocation on the remote node at the expense of a few extra
round trips for the control messages. However, will be negligible since
the control messages are much smaller than the messages being returned.
In this commit, we fix an oversight that overrode the default ban
duration for neutrino to 5s from the default 24 hrs. We correct this by
raising the ban duration to 48hrs. In the future in order to ignore
these peers persistently, we'll need to start to persist our ban list.
However that's a change for another time.
This commit modifies FilterKnownChanIDs to skip edges that
we ourselves have deemed zombies. This prevents us from requesting
the updates from them, as this wastes bandwidth and cpu cycles.
This commits exposes the various parameters around going to chain and
accepting htlcs in a clear way.
In addition to this, it reverts those parameters to what they were
before the merge of commit d1076271456bdab1625ea6b52b93ca3e1bd9aed9.
This commit adds optional jitter to our initial reconnection to our
persistent peers. Currently we will attempt reconnections to all peers
simultaneously, which results in large amount of contention as the
number of channels a node has grows.
We resolve this by adding a randomized delay between 0 and 30 seconds
for all persistent peers. This spreads out the load and contention to
resources such as the database, read/write pools, and memory
allocations. On my node, this allows to start up with about 80% of the
memory burst compared to the all-at-once approach.
This also has a second-order effect in better distributing messages sent
at constant intervals, such as pings. This reduces the concurrent jobs
submitted to the read and write pools at any given time, resulting in
better reuse of read/write buffers and fewer bursty allocation and
garbage collection cycles.