This PR replaces the previously used edge and node ignore lists in path
finding by a probability based system. It modifies path finding so that
it not only compares routes on fee and time lock, but also takes route
success probability into account.
Allowing routes to be compared based on success probability is achieved
by introducing a 'virtual' cost of a payment attempt and using that to
translate probability into another cost factor.
TestRouterPaymentStateMachine tests that the router interacts as
expected with the ControlTower during a payment lifecycle, such that it
payment attempts are not sent twice to the switch, and results are
handled after a restart.
This commit makes the router use the ControlTower to drive the payment
life cycle state machine, to keep track of active payments across
restarts. This lets the router resume payments on startup, such that
their final results can be handled and stored when ready.
This commit moves the responsibility of generating a unique payment ID
from the switch to the router. This will make it easier for the router
to keep track of which HTLCs were successfully forwarded onto the
network, as it can query the switch for existing HTLCs as long as the
paymentIDs are kept.
The router is expected to maintain a map from paymentHash->paymentID,
such that they can be replayed on restart. This also lets the router
check the status of a sent payment after a restart, by querying the
switch for the paymentID in question.
In this commit, we add an additional heuristic when running with
AssumeChannelValid. Since AssumeChannelValid being present assumes that
we're not able to quickly determine whether channels are valid, we also
assume that any channels with the disabled bit set on both sides are
considered zombie. This should be relatively safe to do, since the
disabled bits are usually set when the channel is closed on-chain. In
the case that they aren't, we'll have to wait until both edges haven't
had a new update within two weeks to prune them.
We do this to ensure we don't prune too aggressively, as it's possible
that we've only received the channel announcement for a channel, but not
its accompanying channel updates.
In this commit, we extend the graph's FetchChannelEdgesByID and
HasChannelEdge methods to also check the zombie index whenever the edge
to be looked up doesn't exist within the edge index. We do this to
signal to callers that the edge is known, but only as a zombie, and the
only information that we have about the edge are the node public keys of
the two parties involved in the edge.
In the event that an edge does exist within the zombie index, we make
an additional check on edge policies to ensure they are not within the
router's pruning window, indicating that it is a fresh update.
Currently public keys are represented either as a 33-byte array (Vertex) or as a
btcec.PublicKey struct. The latter isn't useable as index into maps and
cannot be used easily in compares. Therefore the 33-byte array
representation is used predominantly throughout the code base.
This commit converts the argument types of source and target nodes for
path finding to Vertex. Path finding executes no crypto operations and
using Vertex simplifies the code.
Additionally, it prepares for the path finding source parameter to be
exposed over rpc in a follow up commit without requiring conversion back
and forth between Vertex and btcec.PublicKey.
This commit allows the execution of QueryRoutes to be controlled using
lists of black-listed edges and nodes. Any path returned will not pass
through the edges and/or nodes on the list.
Since the MaxHTLC field was recently added to the ChannelEdgePolicy struct,
and the Flags field was broken into ChannelFlags and MessageFlags, the
test edge policies should be updated accordingly.
This commit is a step to split the lnwallet package. It puts the Input
interface and implementations in a separate package along with all their
dependencies from lnwallet.
In this commit:
* we partition lnwire.ChanUpdateFlag into two (ChanUpdateChanFlags and
ChanUpdateMsgFlags), from a uint16 to a pair of uint8's
* we rename the ChannelUpdate.Flags to ChannelFlags and add an
additional MessageFlags field, which will be used to indicate the
presence of the optional field HtlcMaximumMsat within the ChannelUpdate.
* we partition ChannelEdgePolicy.Flags into message and channel flags.
This change corresponds to the partitioning of the ChannelUpdate's Flags
field into MessageFlags and ChannelFlags.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
In this commit we introduce pruning of channel edges instead of channels.
Channel failures apply to a single direction and it is unnecessarily
restricting to prune both directions.
In this commit the dependency of unmarshallRoute on edge policies being
available is removed. Edge policies may be unknown and reported as nil.
SendToRoute does not need the policies, but it does need pubkeys of the
route hops. In this commit, unmarshallRoute is modified so that it
takes the pubkeys from edgeInfo instead of channelEdgePolicy.
In addition to this, the route structure is simplified. No more connection
to the database at that point. Fees are determined based on incoming and
outgoing amounts.
- Extend SendRequest and QueryRoutesRequest protos
- newRoute function takes fee limit and cuts off routes that exceed it
- queryRoutes, payInvoice and sendPayment commands take the feeLimit inputs and pass them down to newRoute
- When no feeLimit is included, don't enforce any feeLimits at all (by setting feeLimit to maxValue)
In this commit, we introduce a new method to the channel router's config
struct: QueryBandwidth. This method allows the channel router to query
for the up-to-date available bandwidth of a particular link. In the case
that this link emanates from/to us, then we can query the switch to see
if the link is active (if not bandwidth is zero), and return the current
best estimate for the available bandwidth of the link. If the link,
isn't one of ours, then we can thread through the total maximal
capacity of the link.
In order to implement this, the missionControl struct will now query the
switch upon creation to obtain a fresh bandwidth snapshot. We take care
to do this in a distinct db transaction in order to now introduced a
circular waiting condition between the mutexes in bolt, and the channel
state machine.
The aim of this change is to reduce the number of unnecessary failures
during HTLC payment routing as we'll now skip any links that are
inactive, or just don't have enough bandwidth for the payment. Nodes
that have several hundred channels (all of which in various states of
activity and available bandwidth) should see a nice gain from this w.r.t
payment latency.
In this commit, we update the TestSendPaymentErrorPathPruning test to
reflect the new behavior w.r.t how we respond to UnknownPeer errors. In
this new test, we expect that we'll find alternative route in light of
us getting an UnknownPeer error "pointing" to our destination node.
In this commit, we modify our path finding algorithm to take an
additional set of edges that are currently not known to us that are
used to temporarily extend our graph with during a payment session.
These edges should assist the sender of a payment in successfully
constructing a path to the destination.
These edges should usually represent private channels, as they are not
publicly advertised to the network for routing.
In this commit, we modify the way we handle FeeInsufficientErrors to
more aggressively route around nodes that repeatedly return the same
error to us. This will ensure we skip older nodes on the network which
are running a buggier older version of lnd. Eventually most nodes will
upgrade to this new version, making this change less needed.
We also update the existing test to properly use a multi-hop route to
ensure that we route around the offending node.
In this commit, we address a number of edge cases that were unaccounted
for when responding to errors that can be sent back due to an HTLC
routing failure. Namely:
* We’ll no longer stop payment attempts if we’re unable to apply a
channel update, instead, we’ll log the error, prune the channel and
continue.
* We’ll no remember which channels were pruned due to insufficient
fee errors. If we ever get a repeat fee error from a channel, then we
prune it. This ensure that we don’t get stuck in a loop due to a node
continually advertising the same fees.
* We also correct an error in which node we’d prune due to a
temporary or permanent node failure. Before this commit, we would prune
the next node, when we should actually be pruning the node that sent us
the error.
Finally, we also add a new test to exercise the fee insufficient error
handling and channel pruning.
Fixes#865.
In this commit, we add a set of new methods to check the freshness of
an edge/node. This will allow callers to skip expensive validation in
the case that the router already knows of an item, or knows of a
fresher version of that time.
A set of tests have been added to ensure basic correctness of these new
methods.
In router_test FindRoutes is passing DefaultFinalCLTVDelta in place
where numPaths is expected. This commit passes a default numPaths for
function calls to FindRoutes so that final cltv delta are correctly
passed.
In this commit, we modify the edgeWeight function that’s used within
the findPath method to weight fees more heavily than the time lock
value at an edge. We do this in order to greedily prefer lower fees
during path finding. This is a simple stop gap in place of more complex
weighting parameters that will be investigated later.
We also modify the edge distance to use an int64 rather than a float.
Finally an additional test has been added in order to excessive this
new change. Before the commit, the test was failing as we preferred the
route with lower total time lock.
Before this commit, we wouldn’t properly set the TotalFees attribute.
As a result, our sorting algorithm at the end to select candidate
routes would simply maintain the time-lock order rather than also sort
by total fees. This commit fixes this issue and also allows the test
added in the prior commit to pass.
In this commit we add a new test to the set of unit tests for the
ChannelRouter: TestRouterChansClosedOfflinePruneGraph. This tests that
if channels are closed while the ChannelRouter is down, then upon
restart the channels are properly recognized as being closed.
In order to maintain the original essence of the test, we need to clear
the state of missionControl with each attempt, essentially advancing
time between each payment attempt.
This commit implements 2-week zombie channel pruning. This means that
every GraphPruneInterval (currently set to one hour), we’ll scan the
channel graph, marking any channels which haven’t had *both* edges
updated in 2 weeks as a “zombie”. During the second pass, all “zombie”
channel are removed from the channel graph all together.
Adding this functionality means we’ll ensure that we maintain a
“healthy” network view, which will cut down on the number of failed
HTLC routing attempts, and also reflect an active portion of the graph.
In this commit we modify the existing
TestSendPaymentRouteFailureFallback to use a non-critical error aside
from FailChannelDisabled. This is necessary as the behavior of the
current error handling can fail due to us sending in a nil error.
This commit introduces the requirement specified in BOLT#7,
where we ignore any node announcements for a specific node
if we yet haven't seen any channel announcements where this
node takes part. This is to prevent someone DoS-ing the
network with cheap node announcements. In the router this
is enforced by requiring a call to AddNode(node_id) to
be preceded by an AddEdge(edge_id) call, where node_id is
one of the nodes in edge_id.
Within the network, it's important that when an HTLC forwarding failure
occurs, the recipient is notified in a timely manner in order to ensure
that errors are graceful and not unknown. For that reason with
accordance to BOLT №4 onion failure obfuscation have been added.
This commit modifies the processing in the routing package eo new
announcements. Previously, if we cgot a cnew channel announcement but
didn’t yet know of the verses that the chanell connected, the
cnnounacment would be accepted. This behavior was eronoues as if the
channel were to be queried for, the DB query would fail as we would be
unable to retrieve the two nodes involved int he channel.
To avoid such an error case, we will now _reject_ any channel
announcements in which we don’t yet have a valid node announcement for
the connected nodes. This case has been inserted into the handling of
channel announcement, a new test has been added, and finally older
tests have also been updated to ensure that nodes are added to the
database _before_ the edge is.
This commit modifies the routing package to no longer use the
ChainNotifier for pruning the channel graph. Instead, we now use the
FilteredChainView interface to more (from the ChannelRouter’s PoV)
efficiently maintain the channel graph.
Rather than scanning the _entire_ block manually, we now rely on the
FilteredChainView to provide us with FilteredBlocks which include
_only_ the relevant transactions that we care about.
Originally we adding the edge without proof in order to able to use it
for payments path constrcution. This method will allow us to populate
the announcement proof after the exchange of the half proofs and
constrcutrion of full proof finished.
In this commit the routing package was divided on two separete one,
this was done because 'routing' package start take too much responsibily
on themself, so with following commit:
Routing pacakge:
Enitites:
* channeldb.ChannelEdge
* channeldb.ChannelPolicy
* channeldb.NodeLightning
Responsibilities:
* send topology notification
* find payment paths
* send payment
* apply topology changes to the graph
* prune graph
* validate that funding point exist and corresponds to given one
* to be the source of topology data
Discovery package:
Entities:
* lnwire.AnnounceSignature
* lnwire.ChannelAnnouncement
* lnwire.NodeAnnouncement
* lnwire.ChannelUpdateAnnouncement
Responsibilities:
* validate announcement signatures
* sync topology with newly connected peers
* handle the premature annoucement
* redirect topology changes to the router susbsystem
* broadcast announcement to the rest of the network
* exchange channel announcement proofs
Before that moment all that was in the 'routing' which is quite big for
one subsystem.
split
This commit adds payment route failure fallback to SendPayment. By
this, we mean that we now take all the possible routes found during
path finding and try them in series. Either a route fails and we move
onto the next one, or the route is successful and we terminate early.
With this commit, sending payments using lnd is now much more robust as
if there exists an eligible route with sufficient capacity, it will be
utilized.
This commit modifies the existing FindRoute method on the ChannelRouter
to now use the KSP implementation added in a prior commit.
This new method FindRoutes, is able to find all the possible paths
between a source and destination. The method takes all paths reported
by findPaths, and attempt to turn each of them into a route. A route
differs from a path in that is has complete time-lock and fee
information. Some paths may not be able to be turned into routes as
once fees are accounted for the have an insufficient flow. We then take
the routes, sort them by total fee (with time-lock being a
time-breaker), then return them in sorted order.