In this commit, we add vertex pruning for any non-final CLTV error.
Before this commit, we assumed that any source of this error was due to
the local node setting the incorrect time lock. However, it’s been
recently noticed on main net that there’re a set of nodes that seem to
not be properly scanned to the chain. Without this patch, users aren’t
able to route successfully as atm, we’ll stop all path finding attempts
if we encounter this.
In this commit, we address a number of edge cases that were unaccounted
for when responding to errors that can be sent back due to an HTLC
routing failure. Namely:
* We’ll no longer stop payment attempts if we’re unable to apply a
channel update, instead, we’ll log the error, prune the channel and
continue.
* We’ll no remember which channels were pruned due to insufficient
fee errors. If we ever get a repeat fee error from a channel, then we
prune it. This ensure that we don’t get stuck in a loop due to a node
continually advertising the same fees.
* We also correct an error in which node we’d prune due to a
temporary or permanent node failure. Before this commit, we would prune
the next node, when we should actually be pruning the node that sent us
the error.
Finally, we also add a new test to exercise the fee insufficient error
handling and channel pruning.
Fixes#865.
In this commit, we add a new field to the LightningPayment struct:
PayAttemptTimeout. This new field allows the caller to control exactly
how much time should be spent attempting to route a payment to the
destination. The default value we’ll use is 60 seconds, but callers are
able to specify a diff value. Once the timeout has passed, we’ll
abandon th e payment attempt, and return an error back to the original
caller.
In this commit, we add a set of new methods to check the freshness of
an edge/node. This will allow callers to skip expensive validation in
the case that the router already knows of an item, or knows of a
fresher version of that time.
A set of tests have been added to ensure basic correctness of these new
methods.
In this commit, we modify the caching structure to return a set of
cached routes for a request if the number of routes requested is less
than or equal to the number of cached of routes.
In order to reduce high CPU utilization during the initial network view
sync, we slash down the total number of active in-flight jobs that can
be launched.
In this commit, we now account for a case where a node sends us a
FailPermanentChannelFailure during a payment attempt. Before this
commit, we wouldn’t properly prune the edge to avoid re-using it. We
remedy this by properly attempting to prune the edge if possible.
Future changes well send a FailPermanentChannelFailure in the case that
we ned to go on-chain for an outgoing HTLC, and cancel back the
incoming HTLC.
In this commit, we fix an existing bug that could cause lnd to crash if
we sent a payment, and the *destination* sent a temp channel failure
error message. When handling such a message, we’ll look in the nextHop
map to see which channel was *after* the node that sent the payment.
However, if the destination sends this error, then there’ll be no entry
in this map.
To address this case, we now add a prevHop map. If we attempt to lookup
a node in the nextHop map, and they don’t have an entry, then we’ll
consult the prevHop map.
We also update the set of tests to ensure that we’re properly setting
both the prevHop map and the nextHop map.
This commit adds synchronization around the processing
of multiple ChannelEdgePolicy updates for the same
channel ID at the same time.
This fixes a bug that could cause the database access
HasChannelEdge to be out of date when the goroutine
came to the point where it was calling UpdateEdgePolicy.
This happened because a second goroutine would have
called UpdateEdgePolicy in the meantime.
This bug was quite benign, as if this happened at
runtime, we would eventually get the ChannelEdgePolicy
we had lost again, either from a peer sending it to
us, or if we would fail a payment since we were using
outdated information. However, it would cause some of
the tests to flake, since losing routing information
made payments we expected to go through fail if this
happened.
This is fixed by introducing a new mutex type, that
when locking and unlocking takes an additional
(id uint64) parameter, keeping an internal map
tracking what ID's are currently locked and the
count of goroutines waiting for the mutex. This
ensure we can still process updates concurrently,
only avoiding updates with the same channel ID from
being run concurrently.
In this commit, we modify the pruning semantics of the missionControl
struct. Before this commit, on each payment attempt, we would fetch a
new graph pruned view each time. This served to instantly propagate any
detected failures to all outstanding payment attempts. However, this
meant that we could at times get stuck in a retry loop if sends take a
few second, then we may prune an edge, try another, then the original
edge is now unpruned.
To remedy this, we now introduce the concept of a paymentSession. The
session will start out as a snapshot of the latest graph prune view.
Any payment failures are now reported directly to the paymentSession
rather than missionControl. The rationale for this is that
edges/vertexes pruned as result of failures will never decay for a
local payment session, only for the global prune view. With this in
place, we ensure that our set of prune view only grows for a session.
Fixes#536.
This commit fixes an existing bug within the ChannelRouter. Prior to
this commit, if the chain view skipped blocks or for some reason we had
a gap in blocks delivered, then we would simply accept them. This had
the potential to cause us to miss on-chain channel closure events. To
remedy this, we won’t process any blocks whose heights aren’t
*strictly* increasing.
A longer term fix would be to have the ChainView take a block height,
and re-dispatch any notifications from that height to the current
height.
In this commit, we modify the high value passed into UpdateFilter upon
restart. Before this commit, we would pass in the prune height, which
would cause a full rescan within the FilteredChainView if the best
height as > than the prune height. This was redundant as we would
shortly carry out a manual rescan in the method below. To fix this, we
now pass in the bestHeight, this isn’t an issue as the
syncGraphWithChain method will manually scan up to that best height.
In this commit we ensure that if this is the first time that the
ChannelRouter is starting, then we set the pruned height+hash to the
current best height. Otherwise, it’s possible that we attempt to update
the filter with a 0 prune height, which will restart a historical
rescan unnecessarily.
In this commit we ensure that we only update the filter, if we have a
non-zero chain view. Otherwise, a mini rescan may be kicked off
unnecessarily if we don’t yet know of any channels yet in the greater
graph.
For Part 1 of Issue #275. Create isolated private struct in
networkHandler goroutine that will de-duplicate
announcements added to the batch. The struct contains maps
for each of channel announcements, channel updates, and
node announcements to keep track of unique announcements.
The struct has a Reset method to reset stored announcements, an
AddMsg(lnwire.Message) method to add a new message to the current
batch, and a Batch method to return the set of de-duplicated
announcements.
Also fix a few minor typos.
This commit alters the behavior of the router's logic on
startup, ensuring that the chain view is filtered using
the router's latest prune height. Before, the chain was
filtered using the bestHeight variable, which was
uninitialized, benignly forcing a rescan from genesis.
In tracking down this, we realized that we should
actually be using the prune height, as this is
representative of the channel view loaded from disk.
The best height/hash are now only used during
startup to determine if we are out of sync.
In this commit we fix an existing bug within the ChannelRouter. Before
this commit, we would sync our graph prune state, *then* update the
cain filter. This is incorrect as the blocks we manually pruned may
have included channel closing transactions. As a result, we would miss
the pruning of a set of channels, and assume that they were still
active.
In this commit, we fix this by reversing the order: we first update the
chain filter and THEN sync the channel graph.
In this commit we fix a slight bug within the existing SendPayment loop
which would cause the wrong error to be returned to users. Prior to
this commit, if we received an update identical to what we were already
aware of, then that error would be returned rather than the
ForwardingError that encapsulated this update.
In this commit with remedy this by properly returning the exact error.
Partially fixes#391.
In this commit we restore the in memory ChannelRouter as we’ll no
dynamically set the ChannelRouter’s pointer within he spec path finding
test example.
In this commit, we’ll now optionally allow the user to pass in the CLTV
delta value specified by the recipient a payment. If the value isn’t
specified, then we’ll use the current global default for the payment.
In this commit, we modify the FindRoutes method to pass in the CLTV
expiry for the final hop. If the value isn’t passed in, then we’ll use
the current global default value in place.
In this commit, we’ve removed the selfNode attribute from memory, as
the set of new tests we’ll write, will depend on us being able to
switch the source node dynamically from the database itself.
In this commit, from the PoV of the SendPayment method we now delegate
all path finding+verification to missionControl. This change doesn’t
materially affect anything, it simply expands the abstraction to make
way for future features that more heavily utilize mission control.
In this commit we modify the SendPayment loop to optimize for
time-to-first-payment-success-or-failure. The prior logic would first
attempt to find at least 100 routes to the destination, then
iteratively prune them away as errors were encountered. In this commit,
we modify this approach to instead take a lazy approach: we first find
the current “best” path, attempt to send to that, and if an error
occurs we prune a section of the graph by reporting to missionControl,
then continue.
With this new approach, if the first known path has sufficient
capacity, and is available, then the payment speed is greatly improved
from the PoV of users. Additionally, we avoid the excessive computation
of crawling most of the graph in the k-shortest paths loop. With the
decay on missionControl, all routes will now feed information into the
central knowledge hung, allowing all payments to iteratively find out
the inactive portions of the payment graph.
This commit modifies the path finding logic such that all path finding
is done inside a _single_ database transaction. With this change, we
ensure that we don’t end up possibly creating hundreds of database
transactions slowing down the path finding and payment sending process
all together.
This commit adds basic route pruning in response to HTLC onion errors.
With this new change, the router will now prune routes in response to
HTLC errors, which will reduce the time to payment success, and also
avoid a bunch of unnecessary network traffic.
We now respond to two errors lnwire.FailTemporaryChannelFailure and
lnwire.FailUnknownNextPeer. In response to the first error, we’ll prune
all routes that contain the channel which was unable to be routed over.
In response to the second error we’ll prune all routes that contain the
node which couldn’t be found.
In this commit we modify the newRoute function to also add the source
node to the nextHopMap index. With this addition the indexes will now
allow the router to react based on failures that occur during the
_first_ hop, meaning the channel directly attached to the source node.
This commit implements 2-week zombie channel pruning. This means that
every GraphPruneInterval (currently set to one hour), we’ll scan the
channel graph, marking any channels which haven’t had *both* edges
updated in 2 weeks as a “zombie”. During the second pass, all “zombie”
channel are removed from the channel graph all together.
Adding this functionality means we’ll ensure that we maintain a
“healthy” network view, which will cut down on the number of failed
HTLC routing attempts, and also reflect an active portion of the graph.
Use binary.Read/Write in functions to serialize and deserialize
channel close summary and HTLC boolean data, as well as in
methods to put and fetch channel funding info. Remove lnd
implementations of readBool and writeBool as they are no
longer needed. Also fix a few minor typos.
Use sort.Slice in FindRoutes function in routing/router.go, as part
of the move to use new language features. Remove sortableRoutes type
wrapper for slice of Routes since it is no longer needed to sort routes.
This commit modifies the way we currently interpret errors when sending
payments via the SendToSwitch method. We split the errors into two
broad sections: critical errors which cause us to abandon the payment
dispatch all together, and errors which are transient meaning we should
continue trying to remainder of the returned routes.
Note that we haven’t yet properly implemented all the necessary
measures such as filtering edges that are detected as being temporarily
inactive, etc.
This change should correct erroneous behavior such as continuing to try
all available routes in the face of an invalid payment hash error and
the like.
This commit modifies the way we do path caching. Rather than only
caching within SendPayment, we now cache routes within FindRoutes. This
is more natural as SendPayment eventually calls FindRoute. As a result
of this commit, queries to FindRoute are now properly cached, speeding
up applications which are focused on graph visualization or querying
rather than sending payments.
This commit fixes an oversight in the path finding code when converting
a path into a route. Currently, for the last hop, we’d emplace the
expiry delta of the last hop within the per-hop payload. This was left
over from a prior version of the specification.
To fix this, we’ll now emplace the _absolute_ final HTLC expiry with
the payload, such that, the final hop that verify that the HTLC has not
been tampered with in flight.
This commit makes the routing cache invalidation a bit more aggressive.
We now invalidate the cache on each new block as the routes in the
cache are based on the current block height. Using the cached items may
cause our routes to fail due to them having time locks which have
already expired.
This commit introduces the requirement specified in BOLT#7,
where we ignore any node announcements for a specific node
if we yet haven't seen any channel announcements where this
node takes part. This is to prevent someone DoS-ing the
network with cheap node announcements. In the router this
is enforced by requiring a call to AddNode(node_id) to
be preceded by an AddEdge(edge_id) call, where node_id is
one of the nodes in edge_id.
Within the network, it's important that when an HTLC forwarding failure
occurs, the recipient is notified in a timely manner in order to ensure
that errors are graceful and not unknown. For that reason with
accordance to BOLT №4 onion failure obfuscation have been added.
This commit fixes a send on closed channel panic by adding additional
synchronization when cancelling the notifications for a particular
topology client. We now ensure that all goroutines belonging to a
particular topology client exit fully before we close the notification
channel in order to avoid a panic.
This commit replaces the hard-coded 5000 satoshi fees with calls to the
FeeEstimator interface. This should provide a way to cleanly plug in
additional fee calculation algorithms in the future. This change
affected quite a few tests. When possible, the tests were changed to
assert amounts sent rather than balances so that fees wouldn't need to
be taken into account. There were several tests for which this wasn't
possible, so calls to the static fee calculator were made.
This commit fixes a panic due to a send on a closed channel that could
possibly occur depending on the order of channel closes when a client
goes to cancel a topology notification client.
Previously we closed the ntfnChan first, this would possible result in
a panic as the goroutine may have succeeded on a send at the same time
the channel was closed. Instead, we now close the `exit` channel first
which is meant to be a signal to the goroutine that the client has been
canceled.
This commit modifies the processing in the routing package eo new
announcements. Previously, if we cgot a cnew channel announcement but
didn’t yet know of the verses that the chanell connected, the
cnnounacment would be accepted. This behavior was eronoues as if the
channel were to be queried for, the DB query would fail as we would be
unable to retrieve the two nodes involved int he channel.
To avoid such an error case, we will now _reject_ any channel
announcements in which we don’t yet have a valid node announcement for
the connected nodes. This case has been inserted into the handling of
channel announcement, a new test has been added, and finally older
tests have also been updated to ensure that nodes are added to the
database _before_ the edge is.
This commit modifies the routing package to no longer use the
ChainNotifier for pruning the channel graph. Instead, we now use the
FilteredChainView interface to more (from the ChannelRouter’s PoV)
efficiently maintain the channel graph.
Rather than scanning the _entire_ block manually, we now rely on the
FilteredChainView to provide us with FilteredBlocks which include
_only_ the relevant transactions that we care about.
This commit implements some minor coding style, commenting and naming
clean up after the recent major discovery service was merged into the
codebase.
Highlights of the naming changes:
* fundingManager.SendToDiscovery -> SendAnnouncement
* discovery.Discovery -> discovery.AuthenticatedGossiper
The rest of the changes consist primary of grammar fixes and proper
column wrapping.
Originally we adding the edge without proof in order to able to use it
for payments path constrcution. This method will allow us to populate
the announcement proof after the exchange of the half proofs and
constrcutrion of full proof finished.
Change the name of fields of messages which are belong to the discovery
subsystem in a such way so they were the same with the names that are
defined in the specification.
In this commit the routing package was divided on two separete one,
this was done because 'routing' package start take too much responsibily
on themself, so with following commit:
Routing pacakge:
Enitites:
* channeldb.ChannelEdge
* channeldb.ChannelPolicy
* channeldb.NodeLightning
Responsibilities:
* send topology notification
* find payment paths
* send payment
* apply topology changes to the graph
* prune graph
* validate that funding point exist and corresponds to given one
* to be the source of topology data
Discovery package:
Entities:
* lnwire.AnnounceSignature
* lnwire.ChannelAnnouncement
* lnwire.NodeAnnouncement
* lnwire.ChannelUpdateAnnouncement
Responsibilities:
* validate announcement signatures
* sync topology with newly connected peers
* handle the premature annoucement
* redirect topology changes to the router susbsystem
* broadcast announcement to the rest of the network
* exchange channel announcement proofs
Before that moment all that was in the 'routing' which is quite big for
one subsystem.
split
This commit modifies address handling in the NodeAnnouncement struct,
switching from net.TCPAddr to []net.Addr. This enables more flexible
address handling with multiple types and multiple addresses for each
node. This commit addresses the first part of issue #131 .
This commit fixes the issue of broken builds in versions other than go
1.7.5 by sorting according to the sort.Interface interface rather than
the newly available sort.Slice function.
This commit adds caching to our route finding. Caching is done on a
tuple-basis mapping a (dest, amt) pair to a previously calculated set
of shortest paths. The cache invalidated on two occasions: when a block
closes a set of transactions, or we received a new channel update or
channel announcement message.
With this change, payments are now snappier from the PoV of an
application developer as we no longer need to do a series of disk-seeks
before we dispatch each payment.
This commit adds payment route failure fallback to SendPayment. By
this, we mean that we now take all the possible routes found during
path finding and try them in series. Either a route fails and we move
onto the next one, or the route is successful and we terminate early.
With this commit, sending payments using lnd is now much more robust as
if there exists an eligible route with sufficient capacity, it will be
utilized.
This commit modifies the existing FindRoute method on the ChannelRouter
to now use the KSP implementation added in a prior commit.
This new method FindRoutes, is able to find all the possible paths
between a source and destination. The method takes all paths reported
by findPaths, and attempt to turn each of them into a route. A route
differs from a path in that is has complete time-lock and fee
information. Some paths may not be able to be turned into routes as
once fees are accounted for the have an insufficient flow. We then take
the routes, sort them by total fee (with time-lock being a
time-breaker), then return them in sorted order.
This commit modifies the findRoute function to decouple the
validation+creation of a route, from the path finding algorithm itself.
When we say “route”, we mean the full payment route complete with
time-lock and fee information. When we say “path” we simple mean an
ordered set of channel edges from one node to another target node.
With this commit we can now perform path finding independent of route
creation which will be needed in the up coming refactor to implement a
new modified k-shortest paths algorithm.
This commit slightly modified findRoute to accept the node which should
be used as the starting point in our path finding algorithm. With this
change, as we move to a k-shortest paths algorithm this modification
will be needed as all of our path finding attempts won’t always
originate from a the same starting point.
This commit adds some new functionality to the channel router: the
ability to dispatch notification to registered clients upon either a
channel being closed, a new node appearing, or an exiting client being
updated or opened for the first time.
With this change, the integration tests will now be able to eliminate
most of the sleep as we gain a new syntonization point into the
propagation of information within the test network. Additionally, this
also paves the way for client side software to dynamically visualize
the channel graph in real-time as nodes+channels are updated.