This commit fixes an existing bug within the ChannelRouter. Prior to
this commit, if the chain view skipped blocks or for some reason we had
a gap in blocks delivered, then we would simply accept them. This had
the potential to cause us to miss on-chain channel closure events. To
remedy this, we won’t process any blocks whose heights aren’t
*strictly* increasing.
A longer term fix would be to have the ChainView take a block height,
and re-dispatch any notifications from that height to the current
height.
In this commit, we implement adherence of the disabled bit within a
ChannelUpdate during path finding. If a channel is marked as disabled,
then we won’t attempt to route through it. A test has been added to
exercise this new check.
In this commit, we update path finding to skip an edge if the amount
we’re trying to route through it is below the MinHTLC (in mSAT) value
for that node. We also add a new test to exercise this behavior. In
order for out test to work properly, we’ve modified the JSON to make
the edge to Goku have a higher min HTLC value.
In this commit, we modify the high value passed into UpdateFilter upon
restart. Before this commit, we would pass in the prune height, which
would cause a full rescan within the FilteredChainView if the best
height as > than the prune height. This was redundant as we would
shortly carry out a manual rescan in the method below. To fix this, we
now pass in the bestHeight, this isn’t an issue as the
syncGraphWithChain method will manually scan up to that best height.
In this commit, we add a new abstraction, the ValidationBarrier. This
struct will be used to allow parallel validation of announcements
within notes AuthenticatedGossiper as well as the ChannelRouter.
Naively validating the announcement in parallel would run into issues
as it would be possible for validate an update announcement, before
validating the channel announcement itself. We solve this by creating a
waiting dependance using the ValidationBarrier to ensure that the
defendant jobs wait until their parents have been full validated.
In this commit we ensure that if this is the first time that the
ChannelRouter is starting, then we set the pruned height+hash to the
current best height. Otherwise, it’s possible that we attempt to update
the filter with a 0 prune height, which will restart a historical
rescan unnecessarily.
In this commit we ensure that we only update the filter, if we have a
non-zero chain view. Otherwise, a mini rescan may be kicked off
unnecessarily if we don’t yet know of any channels yet in the greater
graph.
Run go fmt so config file is formatted correctly. Also rename
newVertex to NewVertex in pathfind_test and notifications_test
as it is now exported from the routing package.
For Part 1 of Issue #275. Create isolated private struct in
networkHandler goroutine that will de-duplicate
announcements added to the batch. The struct contains maps
for each of channel announcements, channel updates, and
node announcements to keep track of unique announcements.
The struct has a Reset method to reset stored announcements, an
AddMsg(lnwire.Message) method to add a new message to the current
batch, and a Batch method to return the set of de-duplicated
announcements.
Also fix a few minor typos.
This commit alters the behavior of the router's logic on
startup, ensuring that the chain view is filtered using
the router's latest prune height. Before, the chain was
filtered using the bestHeight variable, which was
uninitialized, benignly forcing a rescan from genesis.
In tracking down this, we realized that we should
actually be using the prune height, as this is
representative of the channel view loaded from disk.
The best height/hash are now only used during
startup to determine if we are out of sync.
In this commit we fix an existing bug within the ChannelRouter. Before
this commit, we would sync our graph prune state, *then* update the
cain filter. This is incorrect as the blocks we manually pruned may
have included channel closing transactions. As a result, we would miss
the pruning of a set of channels, and assume that they were still
active.
In this commit, we fix this by reversing the order: we first update the
chain filter and THEN sync the channel graph.
In this commit we add a new test to the set of unit tests for the
ChannelRouter: TestRouterChansClosedOfflinePruneGraph. This tests that
if channels are closed while the ChannelRouter is down, then upon
restart the channels are properly recognized as being closed.
In this commit, we add a Reset() method to the mockChainView struct.
With this new method tests are able to fully simulate a restart of the
ChannelRouter. This is necessary as the FilteredChainView instances are
assumed to be stateless, and don’t write their state to disk before a
restart.
This commit adds a test for the FilteredChainView interfaces,
making sure they notify about disconnected/connected blocks
in the correct order during a reorg.
This commit makes use of the blockEventQueue within the neutrino
implementation of FilteredChainView to ensure connected and
disconnected blocks are consumed in order by the reader.
It also specifies that neutrino is not to send disconnected blocks
notifications during rescans, making it consistent with the btcd
implementation.
This commit moves btcd view away from using the deprecated
callbacks onBlockConnected/Disconnected, and instead use
onFilteredBlockConnected/disconnected.
This commit also implements the sending of disconnected blocks
over the staleBlocks channel. To send these blocks, the
blockEventQueue is used to ensure the ordering of blocks are
correctly kept.
It also changes the way filter updates are handled. Since we
now load the tx filter to the rpc server itself, we can call
RescanBlocks instead of manually filtering blocks. These
rescanned blocks are also added to the blockEventQueue,
ensuring the ordering is kept.
blockEventQueue is an ordered queue for block events sent from a
FilteredChainView. The two types of possible block events are
connected/new blocks, and disconencted/stale blocks. The
blockEventQueue keeps the order of these events intact, while
still being non-blocking. This is important in order for the
chainView's call to onFilteredBlockConnected/Disconnected to not
get blocked, and for the consumer of the block events to always
get the events in the correct order.
Before this commit, we would expect that structurally we don’t pay any
fee for the first hop, but do for the final hop. After the latest
commit, this is now flipped as when we say fee, we mean the fee that we
need to pay to transit a link. For the final hop, there’s no additional
distance to be traveled, so the fee is nothing.
In this commit we fix an existing miscalculation in the fees that we
prescribe within the onion payloads for multi-hop routes. Before this
commit, if a route had more than 3 hops, then we would erroneously give
the second to last hop zero fees.
In this commit we correct this behavior, and also re-write the fee
calculation code fragment within newRoute for readability and clarity.
There are now only two cases: this is the last hop, and this is any
other hop. In the case of the last hop, simply send the exact amount
with no additional fee. In the case of an intermediate hop, we use the
_prior_ (closer to the destination) hop to calculate the amount of fees
we need, which allows us to compute the incoming flow. Using that
incoming flow, we then can compute the amount that the hop should
forward out.
Partially fixes#391.
In this commit we fix a slight bug within the existing SendPayment loop
which would cause the wrong error to be returned to users. Prior to
this commit, if we received an update identical to what we were already
aware of, then that error would be returned rather than the
ForwardingError that encapsulated this update.
In this commit with remedy this by properly returning the exact error.
Partially fixes#391.
In this commit we restore the in memory ChannelRouter as we’ll no
dynamically set the ChannelRouter’s pointer within he spec path finding
test example.
In this commit, we’ll now optionally allow the user to pass in the CLTV
delta value specified by the recipient a payment. If the value isn’t
specified, then we’ll use the current global default for the payment.
In this commit, we modify the FindRoutes method to pass in the CLTV
expiry for the final hop. If the value isn’t passed in, then we’ll use
the current global default value in place.
In this commit, we correct the fee calculation when converting from a
path to route. Previously we would apply the “no fee” case at the
_first_ hop, rather than the last hop. As a result, we needed to swap
the edges during path finding, otherwise, if the incoming and outgoing
edges had different fee rates, then we would create an invalid onion
payload.
In this commit we now properly switch fee calculation into three cases:
* a single hop route, so there’s no fee
* we’re at the first hop in a multi hop route, and we apply the fee
for the _next_ hop
* we’re at an intermediate hop and the fee calculation proceeds as
normal
In this commit we revert a commit which was added in the past as way to
allow the path -> route conversion code to remain the same, while
properly respecting the necessary time locks and fees. In an upcoming
change, this swap is no longer necessary as we’ll always use: the time
lock of the outgoing node and the fee of the incoming node.
In this commit, rather than reading the final CLTV delta from the
channel graph itself (which would require _both_ edges to be advertised
in order to route over), we now instead have moved to allowing the
receiving node to choose their own final CLTV delta.
In this commit, we’ve removed the selfNode attribute from memory, as
the set of new tests we’ll write, will depend on us being able to
switch the source node dynamically from the database itself.
In this commit, from the PoV of the SendPayment method we now delegate
all path finding+verification to missionControl. This change doesn’t
materially affect anything, it simply expands the abstraction to make
way for future features that more heavily utilize mission control.
In order to maintain the original essence of the test, we need to clear
the state of missionControl with each attempt, essentially advancing
time between each payment attempt.
In this commit we modify the SendPayment loop to optimize for
time-to-first-payment-success-or-failure. The prior logic would first
attempt to find at least 100 routes to the destination, then
iteratively prune them away as errors were encountered. In this commit,
we modify this approach to instead take a lazy approach: we first find
the current “best” path, attempt to send to that, and if an error
occurs we prune a section of the graph by reporting to missionControl,
then continue.
With this new approach, if the first known path has sufficient
capacity, and is available, then the payment speed is greatly improved
from the PoV of users. Additionally, we avoid the excessive computation
of crawling most of the graph in the k-shortest paths loop. With the
decay on missionControl, all routes will now feed information into the
central knowledge hung, allowing all payments to iteratively find out
the inactive portions of the payment graph.
This commit adds a new system within the ChannelRouter: missionControl.
The purpose of this system to is to act as a shared memory of sorts
between payment sending attempts, recording which edges/vertexes word
or didn’t work. Allowing execution attempts to pass on their iterative
knowledge of the graph to later attempts will reduce the number of
failures encountered, and generally lead to a better UX when sending
payments.
The current capabilities of missionControl are rather limited just to
introduce the new abstraction. Later follow up commits will also add
preferential treatment for reliable nodes, knowledge the impact that
target payments have on unbalancing the payment graph, etc.
This commit fixes a bug that could lead to a deadlock inside bolt db
itself. In a recent commit we allowed a db transaction to be passed
directly into findPath, however, the initial call to graph.ForEachNode
instead passed a _nil_ transaction causing the method itself to create
a _new_ transaction, leading to a deadlock.
We fix this issue by instead re-using the transaction pointer.
This commit modifies the path finding logic such that all path finding
is done inside a _single_ database transaction. With this change, we
ensure that we don’t end up possibly creating hundreds of database
transactions slowing down the path finding and payment sending process
all together.
This commit adds basic route pruning in response to HTLC onion errors.
With this new change, the router will now prune routes in response to
HTLC errors, which will reduce the time to payment success, and also
avoid a bunch of unnecessary network traffic.
We now respond to two errors lnwire.FailTemporaryChannelFailure and
lnwire.FailUnknownNextPeer. In response to the first error, we’ll prune
all routes that contain the channel which was unable to be routed over.
In response to the second error we’ll prune all routes that contain the
node which couldn’t be found.
In this commit we modify the newRoute function to also add the source
node to the nextHopMap index. With this addition the indexes will now
allow the router to react based on failures that occur during the
_first_ hop, meaning the channel directly attached to the source node.
This commit adds three new indexes to the Route struct. These indexes
allow a caller to check if a channel is in the route, check if a node
is in the route, query the next node after a target node, and query the
next channel after a target node. The combination of these new indexes
will allow the ChannelRouter to prune away routes from the available
set in response to any received errors.
This commit implements 2-week zombie channel pruning. This means that
every GraphPruneInterval (currently set to one hour), we’ll scan the
channel graph, marking any channels which haven’t had *both* edges
updated in 2 weeks as a “zombie”. During the second pass, all “zombie”
channel are removed from the channel graph all together.
Adding this functionality means we’ll ensure that we maintain a
“healthy” network view, which will cut down on the number of failed
HTLC routing attempts, and also reflect an active portion of the graph.
Use binary.Read/Write in functions to serialize and deserialize
channel close summary and HTLC boolean data, as well as in
methods to put and fetch channel funding info. Remove lnd
implementations of readBool and writeBool as they are no
longer needed. Also fix a few minor typos.
Use sort.Slice in FindRoutes function in routing/router.go, as part
of the move to use new language features. Remove sortableRoutes type
wrapper for slice of Routes since it is no longer needed to sort routes.
In this commit we modify the existing
TestSendPaymentRouteFailureFallback to use a non-critical error aside
from FailChannelDisabled. This is necessary as the behavior of the
current error handling can fail due to us sending in a nil error.
This commit modifies the way we currently interpret errors when sending
payments via the SendToSwitch method. We split the errors into two
broad sections: critical errors which cause us to abandon the payment
dispatch all together, and errors which are transient meaning we should
continue trying to remainder of the returned routes.
Note that we haven’t yet properly implemented all the necessary
measures such as filtering edges that are detected as being temporarily
inactive, etc.
This change should correct erroneous behavior such as continuing to try
all available routes in the face of an invalid payment hash error and
the like.
This commit modifies the way we do path caching. Rather than only
caching within SendPayment, we now cache routes within FindRoutes. This
is more natural as SendPayment eventually calls FindRoute. As a result
of this commit, queries to FindRoute are now properly cached, speeding
up applications which are focused on graph visualization or querying
rather than sending payments.
This commit reduces the neutrino.WaitForMoreCFHeaders parameter when
instantiating a neutrino instance as a lower value will allow the tests
to complete more quickly.
This commit fixes an oversight in the path finding code when converting
a path into a route. Currently, for the last hop, we’d emplace the
expiry delta of the last hop within the per-hop payload. This was left
over from a prior version of the specification.
To fix this, we’ll now emplace the _absolute_ final HTLC expiry with
the payload, such that, the final hop that verify that the HTLC has not
been tampered with in flight.
This commit fixes an lingering bug within the path finding logic of the
router. Previously we used the edge policy directly attached to the
outgoing channel of the node we were traversing to calculate the fees
and time lock information. This is incorrect, as we instead should be
using the policy of the *connecting* node as we’ll need to pay for
transit as they dictate.
To remedy this, we now grab the incoming+outgoing edges and use those
accordingly when building the initial path.
This commit makes a precautionary change in order to ensure that the
upper bound on the number of iteration’s within our version of Yen’s
algorithm is fixed.
This commit makes the routing cache invalidation a bit more aggressive.
We now invalidate the cache on each new block as the routes in the
cache are based on the current block height. Using the cached items may
cause our routes to fail due to them having time locks which have
already expired.
This commit implements some missing functionality, namely before all
time locks were calculated off of a base height of 0 essentially.
That’s incorrect as all time locks within HTLC’s would then be already
expired. We remedy this requesting the latest height when creating a
route to ensure that our time locks are set properly.
This commit introduces the requirement specified in BOLT#7,
where we ignore any node announcements for a specific node
if we yet haven't seen any channel announcements where this
node takes part. This is to prevent someone DoS-ing the
network with cheap node announcements. In the router this
is enforced by requiring a call to AddNode(node_id) to
be preceded by an AddEdge(edge_id) call, where node_id is
one of the nodes in edge_id.
Modifies the test cases in `TestEdgeUpdateNotification` and
`TestNodeUpdateNotification` to check for the possibility of notifications
being delivered out of order. This addresses some sporadic failures that
were observed when running the test suite.
I looked through some of the open issues but didn't see any addressing this
issue in particular, but if someone could point me to any relevant issues
that would be much appreciated!
Issue
-----
Currently the test suite validates notifications received in the order they
are submitted. The check fails because the verification of each
notification is statically linked to the order in which they are delivered,
seen
[here](1be4d67ce4/routing/notifications_test.go (L403))
and
[here](1be4d67ce4/routing/notifications_test.go (L499))
in `routing/notifications_test.go`. The notifications are typically
delivered in this order, but causes the test to fail otherwise.
Proposed Changes
-------------------
Construct an index that maps a public key to its corresponding edges and/or
nodes. When a notification is received, use its identifying public key and
the index to look up the edge/node to use for validation. Entries are
removed from the index after they are verified to ensure that the same
entry is validated twice. The logic to dynamically handle the verification
of incoming notifications rests can be found here
[here](https://github.com/cfromknecht/lnd/blob/order-invariant-ntfns/routing/notifications_test.go#L420)
and
[here](https://github.com/cfromknecht/lnd/blob/order-invariant-ntfns/routing/notifications_test.go#L539).
Encountered Errors
--------------------
* `TestEdgeUpdateNotification`: notifications_test.go:379: min HTLC of
edge doesn't match: expected 16.7401473 BTC, got 19.4852751 BTC
* `TestNodeUpdateNotification`: notifications_test.go:485: node identity
keys don't match: expected
027b139b2153ac5f3c83c2022e58b3219297d0fb3170739ee6391cddf2e06fe3e7, got
03921deafb61ee13d18e9d96c3ecd9e572e59c8dbd0bb922b5b6ac609d10fe4ee4
Recreating Failing Behavior
---------------------------
The failures can be somewhat difficult to recreate, I was able to reproduce
them by running the unit tests repeatedly until they showed up. I used the
following commands to bring them out of hiding:
```
./gotest.sh -i
go test -test.v ./routing && while [ $? -eq 0 ]; do go test -test.v ./routing; done
```
I was unable to recreate these errors, or any others in this package, after
making the proposed changes and leaving the script running continuously for
~30 minutes. Previously, I could consistently generate an error after ~20
seconds had elapsed on the latest commit in master at the time of writing:
78f6caf5d2e570fea0e5c05cc440cb7395a99c1d. Moar stability ftw!
Within the network, it's important that when an HTLC forwarding failure
occurs, the recipient is notified in a timely manner in order to ensure
that errors are graceful and not unknown. For that reason with
accordance to BOLT №4 onion failure obfuscation have been added.