Hop maps were used in a test to verify the population of the hop map
itself and further only in a single function (getFailedChannelID).
Rewrote that function and removed the hop maps completely.
There is the general assumption that channel edge policy nodes are
ordered such that the node1 pubkey is smaller than the key of node 2. In
the test graph, this assumption didn't hold. This commit fixes the test
graph and also adds a check to prevent this from happening again.
This commit adds a new test that checks that the bandwidth hints are
considered correclty for local channels, and that disable flags are
ignored in this case.
To decouple our own path finding from the graph state, we don't consider
the disable bit when attempting to use local channels. Instead the
bandwidth hints will be zero for local inactive channels.
We alos modify the unit test to check that the disable flag is ignored
for local edges.
Fixes the following issues:
- If the channel update of FailFeeInsufficient contains an invalid channel
update, it is not possible to properly add to the failed channels set.
- FailAmountBelowMinimum may apply a channel update, but does not retry.
- FailIncorrectCltvExpiry immediately prunes the vertex without
trying one more time.
In this commit, the logic for all three policy related errors is
aligned.
In this commit we add a check to HtlcSatifiesPolicy to verify that the
time lock for the outgoing htlc that is requested in the onion packet
isn't too far in the future.
Without this check, anyone could force an unreasonably long time lock on
the forwarding node.
In this commit the dependency of unmarshallRoute on edge policies being
available is removed. Edge policies may be unknown and reported as nil.
SendToRoute does not need the policies, but it does need pubkeys of the
route hops. In this commit, unmarshallRoute is modified so that it
takes the pubkeys from edgeInfo instead of channelEdgePolicy.
In addition to this, the route structure is simplified. No more connection
to the database at that point. Fees are determined based on incoming and
outgoing amounts.
Previously, gossiper was the only object that validated channel
updates. Because updates can also be received as part of a
failed payment session in the routing package, validation logic
needs to be available there too. Gossiper already depends on
routing and having routing call the validation logic inside
gossiper would be a circular dependency. Therefore the validation
was moved to routing.
We make sure to return an error other than ErrIgnored, as ErrIgnored is
expected to only be returned for updates where we already have the
necessary information in the database.
In case of a channel ID found in the rejectCache, there was a
possibility that we had rejected an invalid update for this channel
earlier, and when attempting to add the current update we wouldn't
distinguish the failure to add from an outdated/ignored update.
The commit ensures that for every channel, there will always
be two entries in the edges bucket. If the policy from one or
both ends of the channel is unknown, it is marked as such.
This allows efficient lookup of incoming edges. This is
required for backwards payment path finding.
In this commit, we introduce a nice optimization with regards to lnd's
interaction with a bitcoind backend. Within lnd, we currently have three
different subsystems responsible for watching the chain: chainntnfs,
lnwallet, and routing/chainview. Each of these subsystems has an active
RPC and ZMQ connection to the underlying bitcoind node. This would incur
a toll on the underlying bitcoind node and would cause us to miss ZMQ
events, which are crucial to lnd. We remedy this issue by sharing the
same connection to a bitcoind node between the different clients within
lnd.
In this commit, we modify the test to explitlcy give the neutrino
backend more time to catch up compared to the RPC backends. We do this
as a recent change has been made in the neutrino backend to wait for the
filter headers to finish syncing before proceeding with the rescan
itself. As a result, we'll need to account for this in the test and
sleep enough to give the backend a chance to catch up.
In this commit, we ensure that the neutrino backend meets the target
interface, and also we update the API usage for the internal neutrino
rescan struct to use the new InputWithScript struct.
In this commit, we update the existing UpdateFilter method to take the
new channeldb.EdgePoint struct in place of the prior wire.OutPoint. We
must do this as the caller now typically has this type due to the
preparation to enable lnd to be able to be compatible with the new
neutrino protocol.
In this commit, we fix a slight race condition that can occur when we go
to add a shell node for a node announcement, but then right afterwards,
a new block arrives that causes us to prune an unconnected node. To
ensure this doesn't happen, we now add shell nodes within the same db
transaction as AddChannelEdge. This ensures that the state is fully
consistent and shell nodes will be added atomically along with the new
channel edge.
As a result of this change, we no longer need to add shell nodes within
the ChannelRouter, as the database will take care of this operation as
it should.
In this commit, we fix an existing bug that could at times lead to a
panic if a user manually crafts a route via SendToRoute, and that route
results in a payment error. The fix is simple: create the map even
though it won't be used in the sessions since the user is feeding the
router manual routes.
In this commit, we modify the granularity of the locking
around the filterMtx in the bitcoind chainview, such that
we only lock once per block connected or filter update.
Currently, we acquire and release the lock for every
update to the map.
We also fix a bug that would cause us to not fully remove
all previous outpoints spent by a txn when doing manual
filter, as we previously would only remove the first output
detected.
In this commit, we modify the granularity of the locking
around the filterMtx in the btcd chainview, such that we
only lock once per block connected or filter update.
Currently, we acquire and release the lock for every
update to the map.
We also fix a bug that would cause us to not fully remove
all previous outpoints spent by a txn when doing manual
filter, as we previously would only remove the first output
detected.
In this commit, we update the generateSphinxPacket to use newLogClosure
to delay the spew evaluation until log print time. Before this commit,
even if we weren't on the trace logging level, the spew call would
always be evaluated.
In this commit, a new weight function is introduced. This will create a
meaningful effect of time lock on route selection. Also, removes the
squaring of the fee term. This led to suboptimal routes.
Unit test added that covers the weight function and asserts that the
lowest fee route is indeed returned.
This comment extends the unit tests for NewRoute with checks
on the total time lock for a route as well as the expected time
lock values for every hop along the route.
This commit fixes the logic inside the newRoute function to
address the following problems:
- Fee calculation for a hop does not include the fee that needs
to be paid to the next hop.
- The incoming channel capacity "sanity" check does not include
the fee to be paid to the current hop.
In this commit, we fix the incorrect expiry values in the
spec_example.json test file. Many of the time locks were incorrect which
allowed bugs within the path finding logic related to CLTV deltas to go
un-detected.
In this commit, we fix an existing bug in the newRoute method. Before
this commit we would use the time lock delta of the current hop to
compute the outgoing time lock for the current hop. This is incorrect as
the time lock delta of the _outgoing_ hop should be used, as this is
what we're paying for "transit" on. This is a bug left over from when we
switched the meaning of the CLTV delta on the ChannelUpdate message
sometime last year.
The fix is simple: use the CLTV delta of the prior (later in the route)
hop.
- Extend SendRequest and QueryRoutesRequest protos
- newRoute function takes fee limit and cuts off routes that exceed it
- queryRoutes, payInvoice and sendPayment commands take the feeLimit inputs and pass them down to newRoute
- When no feeLimit is included, don't enforce any feeLimits at all (by setting feeLimit to maxValue)
In this commit, we modify the recent refactoring of the mission control
sub-system to overload the existing payment session, rather than create
a brand new one. This allows us to re-use more of the existing logic, and
also feedback into mission control the failures incurred by any user
selected routes.
In this commit, we introduce a new method to the channel router's config
struct: QueryBandwidth. This method allows the channel router to query
for the up-to-date available bandwidth of a particular link. In the case
that this link emanates from/to us, then we can query the switch to see
if the link is active (if not bandwidth is zero), and return the current
best estimate for the available bandwidth of the link. If the link,
isn't one of ours, then we can thread through the total maximal
capacity of the link.
In order to implement this, the missionControl struct will now query the
switch upon creation to obtain a fresh bandwidth snapshot. We take care
to do this in a distinct db transaction in order to now introduced a
circular waiting condition between the mutexes in bolt, and the channel
state machine.
The aim of this change is to reduce the number of unnecessary failures
during HTLC payment routing as we'll now skip any links that are
inactive, or just don't have enough bandwidth for the payment. Nodes
that have several hundred channels (all of which in various states of
activity and available bandwidth) should see a nice gain from this w.r.t
payment latency.
This commit alters the neutrino chainview such that it
caches the filter entries corresponding to watched
outpoints at the moment they are added to the filter.
Previously, we would rederive each filter entry when
reconstructing the relevant filter entries, which
would lead to unnecessary work on the gc. Now, each is
created at most once, and reused across subsequent
reconstructions.
Adds a new error ErrVBarrierShuttingDown that is returned
from WaitForDependants if the validation barrier's quit
chan is closed. This allows any blocked goroutines to
distinguish whether the dependent task has been completed,
or if validation should be aborted entirely.
This commit improves the shutdown of the router's
pending validation tasks, by ensuring the pending
tasks exit early if the validation barrier
receives a shutdown request.
Currently, any goroutines blocked by WaitForDependants
will continue execution after a shutdown is signaled.
This may lead to unnexpected behavior as the relation
between updates is no longer upheld. It also has the
side effect of slowing down shutdown, since we
continue to process the remaining updates.
To remedy this, WaitForDependants now returns an error
that signals if a shutdown was requested. The blocked
goroutines can exit early upon seeing this error,
without also signaling completion of their task to
the dependent tasks, which should will now properly
wait to read the validation barrier's quit signal.
In this commit, we update the TestSendPaymentErrorPathPruning test to
reflect the new behavior w.r.t how we respond to UnknownPeer errors. In
this new test, we expect that we'll find alternative route in light of
us getting an UnknownPeer error "pointing" to our destination node.
In this commit we fix an lingering bug in the Mission Control logic we
execute in response to the FailUnknownNextPeer error. Historically, we
would treat this as the _next_ node not being online. As a result, we
would then prune away the vertex from the current reachable graph all
together. It was recently realized, that this would at times be a bit
_tooo_ aggressive if the channel we attempt to route over was faulty,
down, or the incoming node had connectivity issues with the outgoing
node.
In light of this realization, we'll now instead only prune the _edge_
that we attempted to route over. This ensures that we'll continue to
explore the possible edges. Additionally, this guards us against failure
modes where nodes report FailUnknownNextPeer to other nodes in an
attempt to more closely control our retry logic.
This change is a stop gap on the path to a more intelligent set of
autopilot heuristics.
Fixes#1114.
In this commit, we modify our path finding algorithm to take an
additional set of edges that are currently not known to us that are
used to temporarily extend our graph with during a payment session.
These edges should assist the sender of a payment in successfully
constructing a path to the destination.
These edges should usually represent private channels, as they are not
publicly advertised to the network for routing.
In this commit, we introduce the ability for payment sessions to store
an additional set of edges that can be used to assist a payment in
successfully reaching its destination.
In this commit, we add a new field of routing hints to payments over the
Lightning Network. These routing hints can later be used within the path
finding algorithm in order to craft a path that will reach the
destination succesfully.
In this commit, we modify the way we handle FeeInsufficientErrors to
more aggressively route around nodes that repeatedly return the same
error to us. This will ensure we skip older nodes on the network which
are running a buggier older version of lnd. Eventually most nodes will
upgrade to this new version, making this change less needed.
We also update the existing test to properly use a multi-hop route to
ensure that we route around the offending node.
In this commit, we add a new node to the current default test graph
that we use for our path finding tests. This new node connects roasbeef
to sophon via a new route with very high fees. With this new node and
the two channels it adds, we can properly test that we’ll route around
failures that we run into during payment routing.
In this commit, we add vertex pruning for any non-final CLTV error.
Before this commit, we assumed that any source of this error was due to
the local node setting the incorrect time lock. However, it’s been
recently noticed on main net that there’re a set of nodes that seem to
not be properly scanned to the chain. Without this patch, users aren’t
able to route successfully as atm, we’ll stop all path finding attempts
if we encounter this.
In this commit, we address a number of edge cases that were unaccounted
for when responding to errors that can be sent back due to an HTLC
routing failure. Namely:
* We’ll no longer stop payment attempts if we’re unable to apply a
channel update, instead, we’ll log the error, prune the channel and
continue.
* We’ll no remember which channels were pruned due to insufficient
fee errors. If we ever get a repeat fee error from a channel, then we
prune it. This ensure that we don’t get stuck in a loop due to a node
continually advertising the same fees.
* We also correct an error in which node we’d prune due to a
temporary or permanent node failure. Before this commit, we would prune
the next node, when we should actually be pruning the node that sent us
the error.
Finally, we also add a new test to exercise the fee insufficient error
handling and channel pruning.
Fixes#865.
In this commit, we add a new field to the LightningPayment struct:
PayAttemptTimeout. This new field allows the caller to control exactly
how much time should be spent attempting to route a payment to the
destination. The default value we’ll use is 60 seconds, but callers are
able to specify a diff value. Once the timeout has passed, we’ll
abandon th e payment attempt, and return an error back to the original
caller.
In this commit, we add a set of new methods to check the freshness of
an edge/node. This will allow callers to skip expensive validation in
the case that the router already knows of an item, or knows of a
fresher version of that time.
A set of tests have been added to ensure basic correctness of these new
methods.
In router_test FindRoutes is passing DefaultFinalCLTVDelta in place
where numPaths is expected. This commit passes a default numPaths for
function calls to FindRoutes so that final cltv delta are correctly
passed.
In this commit, we modify the edgeWeight function that’s used within
the findPath method to weight fees more heavily than the time lock
value at an edge. We do this in order to greedily prefer lower fees
during path finding. This is a simple stop gap in place of more complex
weighting parameters that will be investigated later.
We also modify the edge distance to use an int64 rather than a float.
Finally an additional test has been added in order to excessive this
new change. Before the commit, the test was failing as we preferred the
route with lower total time lock.
In this commit, we modify the caching structure to return a set of
cached routes for a request if the number of routes requested is less
than or equal to the number of cached of routes.
In this commit, we modify the findPaths method to take the max number
of routes to return. With this change, FindRoutes can eventually itself
also take a max number of routes in order to make the function useable
again.
In order to reduce high CPU utilization during the initial network view
sync, we slash down the total number of active in-flight jobs that can
be launched.
In this commit, we now account for a case where a node sends us a
FailPermanentChannelFailure during a payment attempt. Before this
commit, we wouldn’t properly prune the edge to avoid re-using it. We
remedy this by properly attempting to prune the edge if possible.
Future changes well send a FailPermanentChannelFailure in the case that
we ned to go on-chain for an outgoing HTLC, and cancel back the
incoming HTLC.
In this commit, we fix an existing bug that could cause lnd to crash if
we sent a payment, and the *destination* sent a temp channel failure
error message. When handling such a message, we’ll look in the nextHop
map to see which channel was *after* the node that sent the payment.
However, if the destination sends this error, then there’ll be no entry
in this map.
To address this case, we now add a prevHop map. If we attempt to lookup
a node in the nextHop map, and they don’t have an entry, then we’ll
consult the prevHop map.
We also update the set of tests to ensure that we’re properly setting
both the prevHop map and the nextHop map.
This commit adds synchronization around the processing
of multiple ChannelEdgePolicy updates for the same
channel ID at the same time.
This fixes a bug that could cause the database access
HasChannelEdge to be out of date when the goroutine
came to the point where it was calling UpdateEdgePolicy.
This happened because a second goroutine would have
called UpdateEdgePolicy in the meantime.
This bug was quite benign, as if this happened at
runtime, we would eventually get the ChannelEdgePolicy
we had lost again, either from a peer sending it to
us, or if we would fail a payment since we were using
outdated information. However, it would cause some of
the tests to flake, since losing routing information
made payments we expected to go through fail if this
happened.
This is fixed by introducing a new mutex type, that
when locking and unlocking takes an additional
(id uint64) parameter, keeping an internal map
tracking what ID's are currently locked and the
count of goroutines waiting for the mutex. This
ensure we can still process updates concurrently,
only avoiding updates with the same channel ID from
being run concurrently.
In this commit, we modify the pruning semantics of the missionControl
struct. Before this commit, on each payment attempt, we would fetch a
new graph pruned view each time. This served to instantly propagate any
detected failures to all outstanding payment attempts. However, this
meant that we could at times get stuck in a retry loop if sends take a
few second, then we may prune an edge, try another, then the original
edge is now unpruned.
To remedy this, we now introduce the concept of a paymentSession. The
session will start out as a snapshot of the latest graph prune view.
Any payment failures are now reported directly to the paymentSession
rather than missionControl. The rationale for this is that
edges/vertexes pruned as result of failures will never decay for a
local payment session, only for the global prune view. With this in
place, we ensure that our set of prune view only grows for a session.
Fixes#536.
Before this commit, we wouldn’t properly set the TotalFees attribute.
As a result, our sorting algorithm at the end to select candidate
routes would simply maintain the time-lock order rather than also sort
by total fees. This commit fixes this issue and also allows the test
added in the prior commit to pass.
This commit fixes an existing bug within the ChannelRouter. Prior to
this commit, if the chain view skipped blocks or for some reason we had
a gap in blocks delivered, then we would simply accept them. This had
the potential to cause us to miss on-chain channel closure events. To
remedy this, we won’t process any blocks whose heights aren’t
*strictly* increasing.
A longer term fix would be to have the ChainView take a block height,
and re-dispatch any notifications from that height to the current
height.
In this commit, we implement adherence of the disabled bit within a
ChannelUpdate during path finding. If a channel is marked as disabled,
then we won’t attempt to route through it. A test has been added to
exercise this new check.
In this commit, we update path finding to skip an edge if the amount
we’re trying to route through it is below the MinHTLC (in mSAT) value
for that node. We also add a new test to exercise this behavior. In
order for out test to work properly, we’ve modified the JSON to make
the edge to Goku have a higher min HTLC value.
In this commit, we modify the high value passed into UpdateFilter upon
restart. Before this commit, we would pass in the prune height, which
would cause a full rescan within the FilteredChainView if the best
height as > than the prune height. This was redundant as we would
shortly carry out a manual rescan in the method below. To fix this, we
now pass in the bestHeight, this isn’t an issue as the
syncGraphWithChain method will manually scan up to that best height.
In this commit, we add a new abstraction, the ValidationBarrier. This
struct will be used to allow parallel validation of announcements
within notes AuthenticatedGossiper as well as the ChannelRouter.
Naively validating the announcement in parallel would run into issues
as it would be possible for validate an update announcement, before
validating the channel announcement itself. We solve this by creating a
waiting dependance using the ValidationBarrier to ensure that the
defendant jobs wait until their parents have been full validated.
In this commit we ensure that if this is the first time that the
ChannelRouter is starting, then we set the pruned height+hash to the
current best height. Otherwise, it’s possible that we attempt to update
the filter with a 0 prune height, which will restart a historical
rescan unnecessarily.
In this commit we ensure that we only update the filter, if we have a
non-zero chain view. Otherwise, a mini rescan may be kicked off
unnecessarily if we don’t yet know of any channels yet in the greater
graph.
Run go fmt so config file is formatted correctly. Also rename
newVertex to NewVertex in pathfind_test and notifications_test
as it is now exported from the routing package.
For Part 1 of Issue #275. Create isolated private struct in
networkHandler goroutine that will de-duplicate
announcements added to the batch. The struct contains maps
for each of channel announcements, channel updates, and
node announcements to keep track of unique announcements.
The struct has a Reset method to reset stored announcements, an
AddMsg(lnwire.Message) method to add a new message to the current
batch, and a Batch method to return the set of de-duplicated
announcements.
Also fix a few minor typos.
This commit alters the behavior of the router's logic on
startup, ensuring that the chain view is filtered using
the router's latest prune height. Before, the chain was
filtered using the bestHeight variable, which was
uninitialized, benignly forcing a rescan from genesis.
In tracking down this, we realized that we should
actually be using the prune height, as this is
representative of the channel view loaded from disk.
The best height/hash are now only used during
startup to determine if we are out of sync.
In this commit we fix an existing bug within the ChannelRouter. Before
this commit, we would sync our graph prune state, *then* update the
cain filter. This is incorrect as the blocks we manually pruned may
have included channel closing transactions. As a result, we would miss
the pruning of a set of channels, and assume that they were still
active.
In this commit, we fix this by reversing the order: we first update the
chain filter and THEN sync the channel graph.
In this commit we add a new test to the set of unit tests for the
ChannelRouter: TestRouterChansClosedOfflinePruneGraph. This tests that
if channels are closed while the ChannelRouter is down, then upon
restart the channels are properly recognized as being closed.
In this commit, we add a Reset() method to the mockChainView struct.
With this new method tests are able to fully simulate a restart of the
ChannelRouter. This is necessary as the FilteredChainView instances are
assumed to be stateless, and don’t write their state to disk before a
restart.
This commit adds a test for the FilteredChainView interfaces,
making sure they notify about disconnected/connected blocks
in the correct order during a reorg.
This commit makes use of the blockEventQueue within the neutrino
implementation of FilteredChainView to ensure connected and
disconnected blocks are consumed in order by the reader.
It also specifies that neutrino is not to send disconnected blocks
notifications during rescans, making it consistent with the btcd
implementation.
This commit moves btcd view away from using the deprecated
callbacks onBlockConnected/Disconnected, and instead use
onFilteredBlockConnected/disconnected.
This commit also implements the sending of disconnected blocks
over the staleBlocks channel. To send these blocks, the
blockEventQueue is used to ensure the ordering of blocks are
correctly kept.
It also changes the way filter updates are handled. Since we
now load the tx filter to the rpc server itself, we can call
RescanBlocks instead of manually filtering blocks. These
rescanned blocks are also added to the blockEventQueue,
ensuring the ordering is kept.
blockEventQueue is an ordered queue for block events sent from a
FilteredChainView. The two types of possible block events are
connected/new blocks, and disconencted/stale blocks. The
blockEventQueue keeps the order of these events intact, while
still being non-blocking. This is important in order for the
chainView's call to onFilteredBlockConnected/Disconnected to not
get blocked, and for the consumer of the block events to always
get the events in the correct order.
Before this commit, we would expect that structurally we don’t pay any
fee for the first hop, but do for the final hop. After the latest
commit, this is now flipped as when we say fee, we mean the fee that we
need to pay to transit a link. For the final hop, there’s no additional
distance to be traveled, so the fee is nothing.
In this commit we fix an existing miscalculation in the fees that we
prescribe within the onion payloads for multi-hop routes. Before this
commit, if a route had more than 3 hops, then we would erroneously give
the second to last hop zero fees.
In this commit we correct this behavior, and also re-write the fee
calculation code fragment within newRoute for readability and clarity.
There are now only two cases: this is the last hop, and this is any
other hop. In the case of the last hop, simply send the exact amount
with no additional fee. In the case of an intermediate hop, we use the
_prior_ (closer to the destination) hop to calculate the amount of fees
we need, which allows us to compute the incoming flow. Using that
incoming flow, we then can compute the amount that the hop should
forward out.
Partially fixes#391.
In this commit we fix a slight bug within the existing SendPayment loop
which would cause the wrong error to be returned to users. Prior to
this commit, if we received an update identical to what we were already
aware of, then that error would be returned rather than the
ForwardingError that encapsulated this update.
In this commit with remedy this by properly returning the exact error.
Partially fixes#391.
In this commit we restore the in memory ChannelRouter as we’ll no
dynamically set the ChannelRouter’s pointer within he spec path finding
test example.
In this commit, we’ll now optionally allow the user to pass in the CLTV
delta value specified by the recipient a payment. If the value isn’t
specified, then we’ll use the current global default for the payment.
In this commit, we modify the FindRoutes method to pass in the CLTV
expiry for the final hop. If the value isn’t passed in, then we’ll use
the current global default value in place.
In this commit, we correct the fee calculation when converting from a
path to route. Previously we would apply the “no fee” case at the
_first_ hop, rather than the last hop. As a result, we needed to swap
the edges during path finding, otherwise, if the incoming and outgoing
edges had different fee rates, then we would create an invalid onion
payload.
In this commit we now properly switch fee calculation into three cases:
* a single hop route, so there’s no fee
* we’re at the first hop in a multi hop route, and we apply the fee
for the _next_ hop
* we’re at an intermediate hop and the fee calculation proceeds as
normal
In this commit we revert a commit which was added in the past as way to
allow the path -> route conversion code to remain the same, while
properly respecting the necessary time locks and fees. In an upcoming
change, this swap is no longer necessary as we’ll always use: the time
lock of the outgoing node and the fee of the incoming node.
In this commit, rather than reading the final CLTV delta from the
channel graph itself (which would require _both_ edges to be advertised
in order to route over), we now instead have moved to allowing the
receiving node to choose their own final CLTV delta.
In this commit, we’ve removed the selfNode attribute from memory, as
the set of new tests we’ll write, will depend on us being able to
switch the source node dynamically from the database itself.
In this commit, from the PoV of the SendPayment method we now delegate
all path finding+verification to missionControl. This change doesn’t
materially affect anything, it simply expands the abstraction to make
way for future features that more heavily utilize mission control.
In order to maintain the original essence of the test, we need to clear
the state of missionControl with each attempt, essentially advancing
time between each payment attempt.
In this commit we modify the SendPayment loop to optimize for
time-to-first-payment-success-or-failure. The prior logic would first
attempt to find at least 100 routes to the destination, then
iteratively prune them away as errors were encountered. In this commit,
we modify this approach to instead take a lazy approach: we first find
the current “best” path, attempt to send to that, and if an error
occurs we prune a section of the graph by reporting to missionControl,
then continue.
With this new approach, if the first known path has sufficient
capacity, and is available, then the payment speed is greatly improved
from the PoV of users. Additionally, we avoid the excessive computation
of crawling most of the graph in the k-shortest paths loop. With the
decay on missionControl, all routes will now feed information into the
central knowledge hung, allowing all payments to iteratively find out
the inactive portions of the payment graph.
This commit adds a new system within the ChannelRouter: missionControl.
The purpose of this system to is to act as a shared memory of sorts
between payment sending attempts, recording which edges/vertexes word
or didn’t work. Allowing execution attempts to pass on their iterative
knowledge of the graph to later attempts will reduce the number of
failures encountered, and generally lead to a better UX when sending
payments.
The current capabilities of missionControl are rather limited just to
introduce the new abstraction. Later follow up commits will also add
preferential treatment for reliable nodes, knowledge the impact that
target payments have on unbalancing the payment graph, etc.
This commit fixes a bug that could lead to a deadlock inside bolt db
itself. In a recent commit we allowed a db transaction to be passed
directly into findPath, however, the initial call to graph.ForEachNode
instead passed a _nil_ transaction causing the method itself to create
a _new_ transaction, leading to a deadlock.
We fix this issue by instead re-using the transaction pointer.
This commit modifies the path finding logic such that all path finding
is done inside a _single_ database transaction. With this change, we
ensure that we don’t end up possibly creating hundreds of database
transactions slowing down the path finding and payment sending process
all together.
This commit adds basic route pruning in response to HTLC onion errors.
With this new change, the router will now prune routes in response to
HTLC errors, which will reduce the time to payment success, and also
avoid a bunch of unnecessary network traffic.
We now respond to two errors lnwire.FailTemporaryChannelFailure and
lnwire.FailUnknownNextPeer. In response to the first error, we’ll prune
all routes that contain the channel which was unable to be routed over.
In response to the second error we’ll prune all routes that contain the
node which couldn’t be found.
In this commit we modify the newRoute function to also add the source
node to the nextHopMap index. With this addition the indexes will now
allow the router to react based on failures that occur during the
_first_ hop, meaning the channel directly attached to the source node.
This commit adds three new indexes to the Route struct. These indexes
allow a caller to check if a channel is in the route, check if a node
is in the route, query the next node after a target node, and query the
next channel after a target node. The combination of these new indexes
will allow the ChannelRouter to prune away routes from the available
set in response to any received errors.
This commit implements 2-week zombie channel pruning. This means that
every GraphPruneInterval (currently set to one hour), we’ll scan the
channel graph, marking any channels which haven’t had *both* edges
updated in 2 weeks as a “zombie”. During the second pass, all “zombie”
channel are removed from the channel graph all together.
Adding this functionality means we’ll ensure that we maintain a
“healthy” network view, which will cut down on the number of failed
HTLC routing attempts, and also reflect an active portion of the graph.
Use binary.Read/Write in functions to serialize and deserialize
channel close summary and HTLC boolean data, as well as in
methods to put and fetch channel funding info. Remove lnd
implementations of readBool and writeBool as they are no
longer needed. Also fix a few minor typos.
Use sort.Slice in FindRoutes function in routing/router.go, as part
of the move to use new language features. Remove sortableRoutes type
wrapper for slice of Routes since it is no longer needed to sort routes.
In this commit we modify the existing
TestSendPaymentRouteFailureFallback to use a non-critical error aside
from FailChannelDisabled. This is necessary as the behavior of the
current error handling can fail due to us sending in a nil error.
This commit modifies the way we currently interpret errors when sending
payments via the SendToSwitch method. We split the errors into two
broad sections: critical errors which cause us to abandon the payment
dispatch all together, and errors which are transient meaning we should
continue trying to remainder of the returned routes.
Note that we haven’t yet properly implemented all the necessary
measures such as filtering edges that are detected as being temporarily
inactive, etc.
This change should correct erroneous behavior such as continuing to try
all available routes in the face of an invalid payment hash error and
the like.
This commit modifies the way we do path caching. Rather than only
caching within SendPayment, we now cache routes within FindRoutes. This
is more natural as SendPayment eventually calls FindRoute. As a result
of this commit, queries to FindRoute are now properly cached, speeding
up applications which are focused on graph visualization or querying
rather than sending payments.
This commit reduces the neutrino.WaitForMoreCFHeaders parameter when
instantiating a neutrino instance as a lower value will allow the tests
to complete more quickly.
This commit fixes an oversight in the path finding code when converting
a path into a route. Currently, for the last hop, we’d emplace the
expiry delta of the last hop within the per-hop payload. This was left
over from a prior version of the specification.
To fix this, we’ll now emplace the _absolute_ final HTLC expiry with
the payload, such that, the final hop that verify that the HTLC has not
been tampered with in flight.
This commit fixes an lingering bug within the path finding logic of the
router. Previously we used the edge policy directly attached to the
outgoing channel of the node we were traversing to calculate the fees
and time lock information. This is incorrect, as we instead should be
using the policy of the *connecting* node as we’ll need to pay for
transit as they dictate.
To remedy this, we now grab the incoming+outgoing edges and use those
accordingly when building the initial path.
This commit makes a precautionary change in order to ensure that the
upper bound on the number of iteration’s within our version of Yen’s
algorithm is fixed.
This commit makes the routing cache invalidation a bit more aggressive.
We now invalidate the cache on each new block as the routes in the
cache are based on the current block height. Using the cached items may
cause our routes to fail due to them having time locks which have
already expired.
This commit implements some missing functionality, namely before all
time locks were calculated off of a base height of 0 essentially.
That’s incorrect as all time locks within HTLC’s would then be already
expired. We remedy this requesting the latest height when creating a
route to ensure that our time locks are set properly.
This commit introduces the requirement specified in BOLT#7,
where we ignore any node announcements for a specific node
if we yet haven't seen any channel announcements where this
node takes part. This is to prevent someone DoS-ing the
network with cheap node announcements. In the router this
is enforced by requiring a call to AddNode(node_id) to
be preceded by an AddEdge(edge_id) call, where node_id is
one of the nodes in edge_id.
Modifies the test cases in `TestEdgeUpdateNotification` and
`TestNodeUpdateNotification` to check for the possibility of notifications
being delivered out of order. This addresses some sporadic failures that
were observed when running the test suite.
I looked through some of the open issues but didn't see any addressing this
issue in particular, but if someone could point me to any relevant issues
that would be much appreciated!
Issue
-----
Currently the test suite validates notifications received in the order they
are submitted. The check fails because the verification of each
notification is statically linked to the order in which they are delivered,
seen
[here](1be4d67ce4/routing/notifications_test.go (L403))
and
[here](1be4d67ce4/routing/notifications_test.go (L499))
in `routing/notifications_test.go`. The notifications are typically
delivered in this order, but causes the test to fail otherwise.
Proposed Changes
-------------------
Construct an index that maps a public key to its corresponding edges and/or
nodes. When a notification is received, use its identifying public key and
the index to look up the edge/node to use for validation. Entries are
removed from the index after they are verified to ensure that the same
entry is validated twice. The logic to dynamically handle the verification
of incoming notifications rests can be found here
[here](https://github.com/cfromknecht/lnd/blob/order-invariant-ntfns/routing/notifications_test.go#L420)
and
[here](https://github.com/cfromknecht/lnd/blob/order-invariant-ntfns/routing/notifications_test.go#L539).
Encountered Errors
--------------------
* `TestEdgeUpdateNotification`: notifications_test.go:379: min HTLC of
edge doesn't match: expected 16.7401473 BTC, got 19.4852751 BTC
* `TestNodeUpdateNotification`: notifications_test.go:485: node identity
keys don't match: expected
027b139b2153ac5f3c83c2022e58b3219297d0fb3170739ee6391cddf2e06fe3e7, got
03921deafb61ee13d18e9d96c3ecd9e572e59c8dbd0bb922b5b6ac609d10fe4ee4
Recreating Failing Behavior
---------------------------
The failures can be somewhat difficult to recreate, I was able to reproduce
them by running the unit tests repeatedly until they showed up. I used the
following commands to bring them out of hiding:
```
./gotest.sh -i
go test -test.v ./routing && while [ $? -eq 0 ]; do go test -test.v ./routing; done
```
I was unable to recreate these errors, or any others in this package, after
making the proposed changes and leaving the script running continuously for
~30 minutes. Previously, I could consistently generate an error after ~20
seconds had elapsed on the latest commit in master at the time of writing:
78f6caf5d2e570fea0e5c05cc440cb7395a99c1d. Moar stability ftw!
Within the network, it's important that when an HTLC forwarding failure
occurs, the recipient is notified in a timely manner in order to ensure
that errors are graceful and not unknown. For that reason with
accordance to BOLT №4 onion failure obfuscation have been added.
The btclog package has been changed to defining its own logging
interface (rather than seelog's) and provides a default implementation
for callers to use.
There are two primary advantages to the new logger implementation.
First, all log messages are created before the call returns. Compared
to seelog, this prevents data races when mutable variables are logged.
Second, the new logger does not implement any kind of artifical rate
limiting (what seelog refers to as "adaptive logging"). Log messages
are outputted as soon as possible and the application will appear to
perform much better when watching standard output.
Because log rotation is not a feature of the btclog logging
implementation, it is handled by the main package by importing a file
rotation package that provides an io.Reader interface for creating
output to a rotating file output. The rotator has been configured
with the same defaults that btcd previously used in the seelog config
(10MB file limits with maximum of 3 rolls) but now compresses newly
created roll files. Due to the high compressibility of log text, the
compressed files typically reduce to around 15-30% of the original
10MB file.
This commit fixes a send on closed channel panic by adding additional
synchronization when cancelling the notifications for a particular
topology client. We now ensure that all goroutines belonging to a
particular topology client exit fully before we close the notification
channel in order to avoid a panic.
This commit adds a new method to the routing.Route struct:
ToHopPayloads. This function will converts a complete route into the
series of per-hop payloads that is to be encoded within each HTLC using
an opaque Sphinx packet.
We can now use this function when creating the sphinx packet to
properly encoded the hop payload for each hop in the route.
This commit inches towards fully validation+adherance of the per-hop
payloads within an HTLC’s route by properly calculating the outgoing
time lock value for each hop according to the current draft
specification.
This commit fixes a possible race condition wherein a call to
FilterBlock after a call to UpdateFilter would result in the call to
FilterBlock not yet using the updated filter. We fix this by ensuring
the internal chain filter is updated by the time the call to
FilterBlock returns.
This commit optimizes the neutrino implementation of FilterBlock method
of the ChainView interface. The old implementation would _always_ fetch
the entire block and manually scan through it. Instead, we can just
fetch the filter, and then if the items match, fetch the block itself.
This will save bandwidth during a lnd node’s pruning of the channel
graph after a period of dormancy.
This commit adds an initial rough implementation father ChainNotifier
interface for neutrino, our new light client implementation. This
implementation largely borrows from the existing BtcdNotifier
implementation. As a result, a follow up commit will perform two
refactoring in order to further consolidate code.
This commit adds a new implementation of the FilteredChainView
interface. This implementation speaks purely to the p2p network and is
backed by a new experimental light client implementation.
This commit replaces the hard-coded 5000 satoshi fees with calls to the
FeeEstimator interface. This should provide a way to cleanly plug in
additional fee calculation algorithms in the future. This change
affected quite a few tests. When possible, the tests were changed to
assert amounts sent rather than balances so that fees wouldn't need to
be taken into account. There were several tests for which this wasn't
possible, so calls to the static fee calculator were made.
This commit fixes a panic due to a send on a closed channel that could
possibly occur depending on the order of channel closes when a client
goes to cancel a topology notification client.
Previously we closed the ntfnChan first, this would possible result in
a panic as the goroutine may have succeeded on a send at the same time
the channel was closed. Instead, we now close the `exit` channel first
which is meant to be a signal to the goroutine that the client has been
canceled.
This commit modifies the processing in the routing package eo new
announcements. Previously, if we cgot a cnew channel announcement but
didn’t yet know of the verses that the chanell connected, the
cnnounacment would be accepted. This behavior was eronoues as if the
channel were to be queried for, the DB query would fail as we would be
unable to retrieve the two nodes involved int he channel.
To avoid such an error case, we will now _reject_ any channel
announcements in which we don’t yet have a valid node announcement for
the connected nodes. This case has been inserted into the handling of
channel announcement, a new test has been added, and finally older
tests have also been updated to ensure that nodes are added to the
database _before_ the edge is.
This commit modifies the routing package to no longer use the
ChainNotifier for pruning the channel graph. Instead, we now use the
FilteredChainView interface to more (from the ChannelRouter’s PoV)
efficiently maintain the channel graph.
Rather than scanning the _entire_ block manually, we now rely on the
FilteredChainView to provide us with FilteredBlocks which include
_only_ the relevant transactions that we care about.
This commit adds a new set of behavioral interface level tests to the
chain view package. This set of tests can now be used in order to check
proper conformity to this “specification” for all future
implementations of the chain view package.
This commit adds the first concrete implementation of the
chainview.FilteredChainView interface. The implementation of this
interface, BtcdFilteredChainView is backed by a web sockets connection
to an active btcd instance.
This commit creates a new package as sub-package within the routing
package: chainview. This package is centered around a single interface
definition: the FilteredChainView. This interface is to be used to
allow the routing package to watch a _subset_ of the UTXO set for any
modifications. In the case of LN, the subset of the UTXO set that we
care about is the set of currently opened channels.
In a future commit the routing package will be modified to remove the
current full block scanning with processing of FilteredBlock
notification, and proper updates to the filter as observed by the
FilteredChainView.
This commit fixes a pretty nasty unnoticed bug within the main
k-shortest paths algorithm loop. After a new candidate path is found,
the rootPath (the path up to the pivot node) and the spurPath (the
_new_ path after the pivot node) are to be combined into a new candiate
shortest path. The prior logic simply appended the spurPath onto the
end of the rootPath to create a slice. However, if the case that the
currnet rootPath is really a sub-path in a larger slice, then this will
mutate the underlying slice.
This bug would manifest when doing path finding and cause an infinite
loop as the slice kept growing with new spurPaths, causing the loop to
never terminate. We remedy this bug by properly create a new backing
slice, and adding the elements to them rather than incorrectly mutating
an underlying slice.
This commit fixes a bug within the k-shortest paths routine which could
result in a daemon panic when traversing a graph with particular
characteristics. Before referencing the path to create a sub-slice, we
we’re properly asserting that the length of the path was at least as
long as the current rootPath in question. We fix this by simply
ensuring the length of the slice is adequate before proceeding with the
operation.
This commit implements some minor coding style, commenting and naming
clean up after the recent major discovery service was merged into the
codebase.
Highlights of the naming changes:
* fundingManager.SendToDiscovery -> SendAnnouncement
* discovery.Discovery -> discovery.AuthenticatedGossiper
The rest of the changes consist primary of grammar fixes and proper
column wrapping.
Originally we adding the edge without proof in order to able to use it
for payments path constrcution. This method will allow us to populate
the announcement proof after the exchange of the half proofs and
constrcutrion of full proof finished.
Change the name of fields of messages which are belong to the discovery
subsystem in a such way so they were the same with the names that are
defined in the specification.
Add usage of the 'discovery' package in the lnd, now discovery service
will be handle all lnwire announcement messages and send them to the
remote party.
In this commit the routing package was divided on two separete one,
this was done because 'routing' package start take too much responsibily
on themself, so with following commit:
Routing pacakge:
Enitites:
* channeldb.ChannelEdge
* channeldb.ChannelPolicy
* channeldb.NodeLightning
Responsibilities:
* send topology notification
* find payment paths
* send payment
* apply topology changes to the graph
* prune graph
* validate that funding point exist and corresponds to given one
* to be the source of topology data
Discovery package:
Entities:
* lnwire.AnnounceSignature
* lnwire.ChannelAnnouncement
* lnwire.NodeAnnouncement
* lnwire.ChannelUpdateAnnouncement
Responsibilities:
* validate announcement signatures
* sync topology with newly connected peers
* handle the premature annoucement
* redirect topology changes to the router susbsystem
* broadcast announcement to the rest of the network
* exchange channel announcement proofs
Before that moment all that was in the 'routing' which is quite big for
one subsystem.
split
This commit modifies address handling in the NodeAnnouncement struct,
switching from net.TCPAddr to []net.Addr. This enables more flexible
address handling with multiple types and multiple addresses for each
node. This commit addresses the first part of issue #131 .
This commit fixes the issue of broken builds in versions other than go
1.7.5 by sorting according to the sort.Interface interface rather than
the newly available sort.Slice function.
This commit adds caching to our route finding. Caching is done on a
tuple-basis mapping a (dest, amt) pair to a previously calculated set
of shortest paths. The cache invalidated on two occasions: when a block
closes a set of transactions, or we received a new channel update or
channel announcement message.
With this change, payments are now snappier from the PoV of an
application developer as we no longer need to do a series of disk-seeks
before we dispatch each payment.
This commit adds payment route failure fallback to SendPayment. By
this, we mean that we now take all the possible routes found during
path finding and try them in series. Either a route fails and we move
onto the next one, or the route is successful and we terminate early.
With this commit, sending payments using lnd is now much more robust as
if there exists an eligible route with sufficient capacity, it will be
utilized.
This commit modifies the existing FindRoute method on the ChannelRouter
to now use the KSP implementation added in a prior commit.
This new method FindRoutes, is able to find all the possible paths
between a source and destination. The method takes all paths reported
by findPaths, and attempt to turn each of them into a route. A route
differs from a path in that is has complete time-lock and fee
information. Some paths may not be able to be turned into routes as
once fees are accounted for the have an insufficient flow. We then take
the routes, sort them by total fee (with time-lock being a
time-breaker), then return them in sorted order.
With this commit we make our routing more robust by looking for the
k-shortest paths rather than a single shortest path and using that
unconditionally. In a nut shell Yen’s algorithm does the following:
* Find the shortest path from the source to the destination
* From k=1…K, walk the kth shortest path and find possible
divergence from each node in the path
Our version of Yen’s implemented is slightly modified, rather than
actually increasing the edge weights or remove vertexes from the graph,
we instead use two black-lists: one for edges and the other for
vertexes. Our modified version of Djikstra’s algorithm is then made
aware of these black lists in order to more efficiently implement the
path iteration via spur and root node.
This commit modifies the findRoute method by first calling it findPath,
but also making the following modifications.
First, two new black-listing maps are now passed in. These two maps
contain vertexes but also edges to ignore while performing path
finding. These maps will be used in order to ensure that we don’t
duplicate paths or back-track when executing our KSP algorithm.
Next, we now ensure that the path returned from the findPath function
is ordered properly in the direction of source to target. Such a change
is required for our KSP algorithm to function properly.
This commit adds a new heap structure to heap.go which will be used for
storing candidate paths during the iterations of the k-shortest paths
algorithm.
This commit modifies the findRoute function to decouple the
validation+creation of a route, from the path finding algorithm itself.
When we say “route”, we mean the full payment route complete with
time-lock and fee information. When we say “path” we simple mean an
ordered set of channel edges from one node to another target node.
With this commit we can now perform path finding independent of route
creation which will be needed in the up coming refactor to implement a
new modified k-shortest paths algorithm.
This commit slightly modified findRoute to accept the node which should
be used as the starting point in our path finding algorithm. With this
change, as we move to a k-shortest paths algorithm this modification
will be needed as all of our path finding attempts won’t always
originate from a the same starting point.
In this commit we now utilize the node distance heap that was added in
a prior commit into our core path finding logic. With this new data
structure, we no longer linearly scan the distance of all vertexes from
the source node when deciding which one to greedily explore.
Instead, we now start with the source added to our distance heap, then
new vertexes are progressively added to our heap as their edges are
explored. With this change, we move the computational complexity of our
path finding algorithm closer to the theoretical limit.