In this commit, we fix a bug that would cause a node with a hodl HTLC to
cancel back the HTLC upon restart if the invoice has been settled, but
the HTLC is still present on the commitment transaction. A fix for the
HTLC still being present (not triggering a new commitment) has been
fixed recently. However, for older nodes with a lingering HTLC, on
restart it would be failed back.
In this commit, we make the check stricter by only performing these
checks for HTLCs that are in the open state. This ensures that we'll
only check this constraints the first time around, before the HTLC has
been transitioned to the accepted state.
Assuming a graph size of 50,000 channels, an interval of 20 minutes
would cause nodes to consume about 600MB per month in bandwidth doing
these routine historical sync spot checks. In this commit, we increase
to one hour, which consumes about 300MB per month.
This commit adds a brief delay when sending our channel reestablish
message if the link contains a restored channel to ensure we first have
a stable connection. Sending the message will cause the remote peer to
force close the channel, which currently may not be resumed reliably if
the connection is being torn town simultaneously. This delay can be
removed after the force close is reliable, but in the meantime it
improves the reliability of successfully closing out the channel and
allows the `channel_backup_restore/restore_during_creation` to pass
reliably.
In this commit, we fix a slight bug in the existing implantation that
would cause no channel recovery if the recovering node was already
connected to their channel peer. As we need the link to be known at the
time of connection, if we're already connected, then the chan sync
message won't be sent again. By first disconnecting an existing peer, we
ensure that during the next connection (after the recovered channel is
added to the database), then the regular chan sync message exchange will
take place as expected. # Please enter the commit message for your
changes.
In this commit, we modify the starting link logic to always send the
chan sync message to the remote peer in a synchronous manner. Otherwise,
it's possible that we fail very quickly below this block, and don't ever
send the message to the remote peer.
In this commit, we fix a bug in the existing logic for ConnectPeer that
would cause an SCB restore to fail if we were already connected to the
peer. To fix this, we now instead will just return with a success if
we're already connected to the peer.
In this commit, we modify the main loop in `processChanPolicyUpdate` to
send updates for private channels directly to the remote peer via the
reliable message sender. This fixes a prior issue where the remote peer
wouldn't receive new updates as this method doesn't go through the
traditional path for channel updates.
In this commit, we add a new test case to exercise a recent bug fix to
ensure that we no longer broadcast private channel policy changes. Along
the way, a few helper functions were added to slim down the test to the
core logic compared to some of the existing tests in this package. In
the future, these new helper functions should be utilized more widely for
tests in this package in order to cut down on some of the duplicated
logic.
The idea of the batch counter is to increase it for commit tx updates,
so that if the commit tx cannot be updated immediately (revocation
window exhausted), the batch ticker makes sure it happens later.
The batch counter was increased for forwarded htlcs, but not for exit hop
resolutions.
This lead to the situation where the commitment tx would not be updated,
even though the htlc was settled locally. When no other changes happen
on the channel, the htlc eventually reaches its expiry and the channel
is force closed.
This commit removes the MarkEdgeZombie method from channeldb. This
method is currently not used in any live code paths in production, and
is only used in unit tests. However, incorrect usage of this method
could result in an edge being present in both the zombie and channel
indexes, which deviates from any state we would expect to see in
production. Removing the method will help mitigate the potential for
writing incorrect unit tests in the future, by forcing zombie edges to
be created via the relevant, production APIs, e.g. DeleteChannelEdge.
The existing unit tests that use this method have been modified to use
the DeleteChannelEdge instead. No regressions were discovered in the
process.
This commit modifies FetchChanInfos to skip any channels that are not in
the graph at the time of the call. Currently the entire call will fail
if the edge is not found, which stalls a gossip sync in the following
scenario:
1. Remote peer queries for a channel range
2. We return the set of channel ids in that range
3. A channel from that set is removed from the graph, e.g. via close.
4. Remote peer queries for removed edge, causing the query to fail.
To remedy this, we will now skip any edges that are not known in the
database at the time of the query. This prevents the syncer state
machines from halting, which otherwise could only be resolved by
disconnecting and reconnecting.
To avoid the ChainService still attempting to access the database when
it gets closed, re-order the stop order such that the Chainservice gets
stopped before closing the DB.
This commit adds logging of the reason to go to chain for a channel.
This can help users to find out the reason why a channels forced closed.
To get all go to chain reasons, an optimization to break early is
removed. This optimization was not significant, because the normal flow
already examined all htlcs. In the exceptional case where we need to go
to chain, it does not weigh up against logging all go to chain reasons.
This commit reduces the number of channels a syncer will request from
the remote node in a single QueryShortChanIDs message. The current size
is derived from the chunkSize, which is meant to signal the maximum
number of short chan ids that can fit in a single ReplyChannelRange
message. For EncodingSortedPlain, this number is 8000, and we use the
same number to dictate the size of the batch from the remote peer.
We modify this by introducing a separately configurable batchSize, so
that both can be tuned independently. The value is chosen to reduce the
amount of buffering the remote party will perform, only requiring them
queue 500 responses, as opposed to 8000. In turn, this reduces larges
spikes in allocation on the remote node at the expense of a few extra
round trips for the control messages. However, will be negligible since
the control messages are much smaller than the messages being returned.
In this commit, we fix an oversight that overrode the default ban
duration for neutrino to 5s from the default 24 hrs. We correct this by
raising the ban duration to 48hrs. In the future in order to ignore
these peers persistently, we'll need to start to persist our ban list.
However that's a change for another time.