This commit removes the MarkEdgeZombie method from channeldb. This
method is currently not used in any live code paths in production, and
is only used in unit tests. However, incorrect usage of this method
could result in an edge being present in both the zombie and channel
indexes, which deviates from any state we would expect to see in
production. Removing the method will help mitigate the potential for
writing incorrect unit tests in the future, by forcing zombie edges to
be created via the relevant, production APIs, e.g. DeleteChannelEdge.
The existing unit tests that use this method have been modified to use
the DeleteChannelEdge instead. No regressions were discovered in the
process.
This commit modifies FetchChanInfos to skip any channels that are not in
the graph at the time of the call. Currently the entire call will fail
if the edge is not found, which stalls a gossip sync in the following
scenario:
1. Remote peer queries for a channel range
2. We return the set of channel ids in that range
3. A channel from that set is removed from the graph, e.g. via close.
4. Remote peer queries for removed edge, causing the query to fail.
To remedy this, we will now skip any edges that are not known in the
database at the time of the query. This prevents the syncer state
machines from halting, which otherwise could only be resolved by
disconnecting and reconnecting.
To avoid the ChainService still attempting to access the database when
it gets closed, re-order the stop order such that the Chainservice gets
stopped before closing the DB.
This commit adds logging of the reason to go to chain for a channel.
This can help users to find out the reason why a channels forced closed.
To get all go to chain reasons, an optimization to break early is
removed. This optimization was not significant, because the normal flow
already examined all htlcs. In the exceptional case where we need to go
to chain, it does not weigh up against logging all go to chain reasons.
This commit reduces the number of channels a syncer will request from
the remote node in a single QueryShortChanIDs message. The current size
is derived from the chunkSize, which is meant to signal the maximum
number of short chan ids that can fit in a single ReplyChannelRange
message. For EncodingSortedPlain, this number is 8000, and we use the
same number to dictate the size of the batch from the remote peer.
We modify this by introducing a separately configurable batchSize, so
that both can be tuned independently. The value is chosen to reduce the
amount of buffering the remote party will perform, only requiring them
queue 500 responses, as opposed to 8000. In turn, this reduces larges
spikes in allocation on the remote node at the expense of a few extra
round trips for the control messages. However, will be negligible since
the control messages are much smaller than the messages being returned.
In this commit, we fix an oversight that overrode the default ban
duration for neutrino to 5s from the default 24 hrs. We correct this by
raising the ban duration to 48hrs. In the future in order to ignore
these peers persistently, we'll need to start to persist our ban list.
However that's a change for another time.
This commit modifies FilterKnownChanIDs to skip edges that
we ourselves have deemed zombies. This prevents us from requesting
the updates from them, as this wastes bandwidth and cpu cycles.
This commits exposes the various parameters around going to chain and
accepting htlcs in a clear way.
In addition to this, it reverts those parameters to what they were
before the merge of commit d1076271456bdab1625ea6b52b93ca3e1bd9aed9.
This commit adds optional jitter to our initial reconnection to our
persistent peers. Currently we will attempt reconnections to all peers
simultaneously, which results in large amount of contention as the
number of channels a node has grows.
We resolve this by adding a randomized delay between 0 and 30 seconds
for all persistent peers. This spreads out the load and contention to
resources such as the database, read/write pools, and memory
allocations. On my node, this allows to start up with about 80% of the
memory burst compared to the all-at-once approach.
This also has a second-order effect in better distributing messages sent
at constant intervals, such as pings. This reduces the concurrent jobs
submitted to the read and write pools at any given time, resulting in
better reuse of read/write buffers and fewer bursty allocation and
garbage collection cycles.
This commit restructures the creation of various tls related object. It
also fixes a bug where wildcard IP addresses where only instantiated for
the main RPC server and not the WalletUnlocker service.
This prevents a panic during test failure due to a child test calling
FailNow on a parent test context. The sub tests now capture the
testing.T object provided in each closure as opposed to ignoring it and
using the parent context.
This commit increases the default worker timeout currently backing the
read and write pools. This allows the read and write pools to sustain
regular bursty traffic such as ping/pong without releasing their buffers
back to the underlying gc queue. In the future, jitter can be added to
our ping and/or gossip messages to reduce the concurrent usage of read
and write pools, which will make this change even more effective.
In this commit, we address a bug where we'd attempt to replace the
stale active syncer when it transitioned to a passive syncer. This
replacement logic is only intended to happen when the active syncer
disconnects, as rotateActiveSyncerCandidate chooses and queues its own
replacement.
As required by the spec:
> SHOULD send all gossip messages whose timestamp is greater or equal to
first_timestamp, and less than first_timestamp plus timestamp_range.