In this commit, we modify the way we handle FeeInsufficientErrors to
more aggressively route around nodes that repeatedly return the same
error to us. This will ensure we skip older nodes on the network which
are running a buggier older version of lnd. Eventually most nodes will
upgrade to this new version, making this change less needed.
We also update the existing test to properly use a multi-hop route to
ensure that we route around the offending node.
In this commit, we add a new node to the current default test graph
that we use for our path finding tests. This new node connects roasbeef
to sophon via a new route with very high fees. With this new node and
the two channels it adds, we can properly test that we’ll route around
failures that we run into during payment routing.
In this commit, we add vertex pruning for any non-final CLTV error.
Before this commit, we assumed that any source of this error was due to
the local node setting the incorrect time lock. However, it’s been
recently noticed on main net that there’re a set of nodes that seem to
not be properly scanned to the chain. Without this patch, users aren’t
able to route successfully as atm, we’ll stop all path finding attempts
if we encounter this.
In this commit, we address a number of edge cases that were unaccounted
for when responding to errors that can be sent back due to an HTLC
routing failure. Namely:
* We’ll no longer stop payment attempts if we’re unable to apply a
channel update, instead, we’ll log the error, prune the channel and
continue.
* We’ll no remember which channels were pruned due to insufficient
fee errors. If we ever get a repeat fee error from a channel, then we
prune it. This ensure that we don’t get stuck in a loop due to a node
continually advertising the same fees.
* We also correct an error in which node we’d prune due to a
temporary or permanent node failure. Before this commit, we would prune
the next node, when we should actually be pruning the node that sent us
the error.
Finally, we also add a new test to exercise the fee insufficient error
handling and channel pruning.
Fixes#865.
In this commit, we add a new field to the LightningPayment struct:
PayAttemptTimeout. This new field allows the caller to control exactly
how much time should be spent attempting to route a payment to the
destination. The default value we’ll use is 60 seconds, but callers are
able to specify a diff value. Once the timeout has passed, we’ll
abandon th e payment attempt, and return an error back to the original
caller.
In this commit, we add a set of new methods to check the freshness of
an edge/node. This will allow callers to skip expensive validation in
the case that the router already knows of an item, or knows of a
fresher version of that time.
A set of tests have been added to ensure basic correctness of these new
methods.
In router_test FindRoutes is passing DefaultFinalCLTVDelta in place
where numPaths is expected. This commit passes a default numPaths for
function calls to FindRoutes so that final cltv delta are correctly
passed.
In this commit, we modify the edgeWeight function that’s used within
the findPath method to weight fees more heavily than the time lock
value at an edge. We do this in order to greedily prefer lower fees
during path finding. This is a simple stop gap in place of more complex
weighting parameters that will be investigated later.
We also modify the edge distance to use an int64 rather than a float.
Finally an additional test has been added in order to excessive this
new change. Before the commit, the test was failing as we preferred the
route with lower total time lock.
In this commit, we modify the caching structure to return a set of
cached routes for a request if the number of routes requested is less
than or equal to the number of cached of routes.
In this commit, we modify the findPaths method to take the max number
of routes to return. With this change, FindRoutes can eventually itself
also take a max number of routes in order to make the function useable
again.
In order to reduce high CPU utilization during the initial network view
sync, we slash down the total number of active in-flight jobs that can
be launched.
In this commit, we now account for a case where a node sends us a
FailPermanentChannelFailure during a payment attempt. Before this
commit, we wouldn’t properly prune the edge to avoid re-using it. We
remedy this by properly attempting to prune the edge if possible.
Future changes well send a FailPermanentChannelFailure in the case that
we ned to go on-chain for an outgoing HTLC, and cancel back the
incoming HTLC.
In this commit, we fix an existing bug that could cause lnd to crash if
we sent a payment, and the *destination* sent a temp channel failure
error message. When handling such a message, we’ll look in the nextHop
map to see which channel was *after* the node that sent the payment.
However, if the destination sends this error, then there’ll be no entry
in this map.
To address this case, we now add a prevHop map. If we attempt to lookup
a node in the nextHop map, and they don’t have an entry, then we’ll
consult the prevHop map.
We also update the set of tests to ensure that we’re properly setting
both the prevHop map and the nextHop map.
This commit adds synchronization around the processing
of multiple ChannelEdgePolicy updates for the same
channel ID at the same time.
This fixes a bug that could cause the database access
HasChannelEdge to be out of date when the goroutine
came to the point where it was calling UpdateEdgePolicy.
This happened because a second goroutine would have
called UpdateEdgePolicy in the meantime.
This bug was quite benign, as if this happened at
runtime, we would eventually get the ChannelEdgePolicy
we had lost again, either from a peer sending it to
us, or if we would fail a payment since we were using
outdated information. However, it would cause some of
the tests to flake, since losing routing information
made payments we expected to go through fail if this
happened.
This is fixed by introducing a new mutex type, that
when locking and unlocking takes an additional
(id uint64) parameter, keeping an internal map
tracking what ID's are currently locked and the
count of goroutines waiting for the mutex. This
ensure we can still process updates concurrently,
only avoiding updates with the same channel ID from
being run concurrently.
In this commit, we modify the pruning semantics of the missionControl
struct. Before this commit, on each payment attempt, we would fetch a
new graph pruned view each time. This served to instantly propagate any
detected failures to all outstanding payment attempts. However, this
meant that we could at times get stuck in a retry loop if sends take a
few second, then we may prune an edge, try another, then the original
edge is now unpruned.
To remedy this, we now introduce the concept of a paymentSession. The
session will start out as a snapshot of the latest graph prune view.
Any payment failures are now reported directly to the paymentSession
rather than missionControl. The rationale for this is that
edges/vertexes pruned as result of failures will never decay for a
local payment session, only for the global prune view. With this in
place, we ensure that our set of prune view only grows for a session.
Fixes#536.
Before this commit, we wouldn’t properly set the TotalFees attribute.
As a result, our sorting algorithm at the end to select candidate
routes would simply maintain the time-lock order rather than also sort
by total fees. This commit fixes this issue and also allows the test
added in the prior commit to pass.
This commit fixes an existing bug within the ChannelRouter. Prior to
this commit, if the chain view skipped blocks or for some reason we had
a gap in blocks delivered, then we would simply accept them. This had
the potential to cause us to miss on-chain channel closure events. To
remedy this, we won’t process any blocks whose heights aren’t
*strictly* increasing.
A longer term fix would be to have the ChainView take a block height,
and re-dispatch any notifications from that height to the current
height.
In this commit, we implement adherence of the disabled bit within a
ChannelUpdate during path finding. If a channel is marked as disabled,
then we won’t attempt to route through it. A test has been added to
exercise this new check.
In this commit, we update path finding to skip an edge if the amount
we’re trying to route through it is below the MinHTLC (in mSAT) value
for that node. We also add a new test to exercise this behavior. In
order for out test to work properly, we’ve modified the JSON to make
the edge to Goku have a higher min HTLC value.
In this commit, we modify the high value passed into UpdateFilter upon
restart. Before this commit, we would pass in the prune height, which
would cause a full rescan within the FilteredChainView if the best
height as > than the prune height. This was redundant as we would
shortly carry out a manual rescan in the method below. To fix this, we
now pass in the bestHeight, this isn’t an issue as the
syncGraphWithChain method will manually scan up to that best height.
In this commit, we add a new abstraction, the ValidationBarrier. This
struct will be used to allow parallel validation of announcements
within notes AuthenticatedGossiper as well as the ChannelRouter.
Naively validating the announcement in parallel would run into issues
as it would be possible for validate an update announcement, before
validating the channel announcement itself. We solve this by creating a
waiting dependance using the ValidationBarrier to ensure that the
defendant jobs wait until their parents have been full validated.
In this commit we ensure that if this is the first time that the
ChannelRouter is starting, then we set the pruned height+hash to the
current best height. Otherwise, it’s possible that we attempt to update
the filter with a 0 prune height, which will restart a historical
rescan unnecessarily.
In this commit we ensure that we only update the filter, if we have a
non-zero chain view. Otherwise, a mini rescan may be kicked off
unnecessarily if we don’t yet know of any channels yet in the greater
graph.