In this commit, we extend the graph's FetchChannelEdgesByID and
HasChannelEdge methods to also check the zombie index whenever the edge
to be looked up doesn't exist within the edge index. We do this to
signal to callers that the edge is known, but only as a zombie, and the
only information that we have about the edge are the node public keys of
the two parties involved in the edge.
In the event that an edge does exist within the zombie index, we make
an additional check on edge policies to ensure they are not within the
router's pruning window, indicating that it is a fresh update.
We mark the edges as zombies when pruning them to ensure we don't
attempt to reprocess them later on. This also applies to channels that
have been removed from the graph due to being stale.
In this commit, we add a zombie edge index to the database. This allows
us to quickly determine across restarts whether we're attempting to
process an edge we've previously deemed as zombie.
Bumps the default read and write handlers to be well
above the average number of peers a node has. Since
the worker counts specify only a maximum number of
concurrent read/write workers, it is expected that
the actual usage would converge to the requirements
of the node anyway. However, in preparation for a
major release, this is a conservative measure to
ensure that the default values aren't too low and
improve network instability.
In this commit, we add a 5 minute idle timer to
the write handler. After catching the write
timeouts, it's been observed that some connections
have trouble reading a message for several hours.
This typically points to a deeper issue w/ the peer
or, e.g. the remote peer switched networks. This now
mirrors the idle timeout used in the read handler,
such that we will disconnect a peer if we are unable
to send or receive a message from the peer after 5
minutes.
We also modify the readHandler to drain its
idleTimer's channel in the even that the timer had
already fired, but we successfully sent the message.
This commit reduces the peer's write timeout to 5s.
Now that the peer catches write timeouts and doesn't
disconnect, this will ensure we spend less time blocking
in the write pool in case others also need to access the
workers concurrently. Slower peers will now only block
for 5s, after every reattempt w/ exponential backoff.
This commit modifies the writeHandler to catch timeout
errors, and retry writes to the socket after a small
backoff, which increases exponentially from 5s to 1m.
With the growing channel graph size, some lower-powered
devices can be slow to pull messages off the wire during
validation. The current behavior will cause us to
disconnect the peer, and resend all of the messages that
the remote peer is slow to validate. Catching the timeout
helps in preventing such expensive reconnection cycles,
especially as the network continues to grow.
This is also a preliminary step to reducing the
write timeout constant. This will allow concurrent usage
of the write pools w/out devoting excessive amounts of
time blocking the pool for slow peers.
Commit ca28c0b removed the dep section,
so it shouldn't be mentioned as a target in other targets.
Also, Glide doesn't seem to be used anymore as well.
This is a better alternative than retrieving it from the graph as it's
possible that the channel is pruned from it doing to not having any
updates within the past two weeks.
This commit modifies the breach arbiter to monitor
all breached inputs for spends and remove them from
the set of inputs to be swept if they are spent to
a terminal state. Prior, we would only watch for
spends on htlcs that may need to transition and
sweep the corresponding second-level htlc.
With these changes, we will no monitor commitment
outputs for spends, as well as spends from the
second level htlcs themselves. If either of these
is detected, we remove them from the set of inputs
to sweep via the justice transaction because there
is nothing the breach arbiter can do.
This functionality will be needed when adding
watchtower support, as the breach arbiter must
detect the case when the tower sweeps on behalf of
the user and stop pursuing the sweep itself. In
addition, this now properly handles the potential
case where somehow the remote party is able to sweep
the their commitment or second-level htlc to their
wallet, and prevent the breach arbiter from trying
to sweep the outputs as it would now.
Note that in the latter event, the internal
accounting may still be incorrect, as it is assumed
that all breached funds return to the victim.
However, these issues will deferred and fixed at a
later date, as the more crucial aspect is that the
breach arbiter doesn't blow up as a result of towers
sweeping channels.
In this commit, we convert all the `Update` calls which are serial, to
use `Batch` calls which are optimistically batched together for
concurrent writers. This should increase performance slightly during the
initial graph sync, and also updates at tip as we can coalesce more of
these individual transactions into a single transaction.