786 Commits

Author SHA1 Message Date
Johan T. Halseth
e1399fb1ec
routing/router: use attempt's unique hash if set on restart 2021-04-27 09:45:13 +02:00
Johan T. Halseth
06f045fca3
channeldb/mp_payment: add Hash to individual HTLCs
For AMP payments the hash used for each HTLC will differ, and we will
need to retrieve it after a restart. We therefore persist it with each
attempt.
2021-04-27 09:44:19 +02:00
Johan T. Halseth
41ae3530a3
routing/payment_lifecycle: use ShardTracker to track shards
We'll let the payment's lifecycle register each shard it's sending with
the ShardTracker, canceling failed shards. This will be the foundation
for correct AMP derivation for each shard we'll send.
2021-04-27 09:43:40 +02:00
Johan T. Halseth
6474b253d6
routing/shards: add ShardTracker interface
We'll use this to keep track of the outstanding shards and which
preimages we are using for each. For now this is a simple map from
attempt ID to hash, but later we'll hide the AMP child derivation behind
this interface.
2021-04-27 08:27:33 +02:00
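
The "simple map from attempt ID to hash" described in 6474b253d6 could look roughly like the sketch below. This is an illustration only: the type and method names (`simpleShardTracker`, `RegisterShard`, `CancelShard`, `GetHash`) and the `Hash` stand-in type are assumptions, not the interface that commit actually adds.

```go
package main

import (
	"fmt"
	"sync"
)

// Hash is a 32-byte payment hash. It is a stand-in for illustration,
// not lnd's own hash type.
type Hash [32]byte

// simpleShardTracker is a hypothetical map-based tracker that records
// which hash each outstanding attempt (shard) is using.
type simpleShardTracker struct {
	mu     sync.Mutex
	shards map[uint64]Hash
}

func newSimpleShardTracker() *simpleShardTracker {
	return &simpleShardTracker{shards: make(map[uint64]Hash)}
}

// RegisterShard remembers the hash used by the given attempt.
func (t *simpleShardTracker) RegisterShard(attemptID uint64, h Hash) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.shards[attemptID] = h
}

// CancelShard forgets a shard, for example after its attempt failed.
func (t *simpleShardTracker) CancelShard(attemptID uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.shards, attemptID)
}

// GetHash returns the hash for an attempt and whether it is tracked.
func (t *simpleShardTracker) GetHash(attemptID uint64) (Hash, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	h, ok := t.shards[attemptID]
	return h, ok
}

func main() {
	tracker := newSimpleShardTracker()
	tracker.RegisterShard(1, Hash{0x01})
	tracker.RegisterShard(2, Hash{0x02})

	// Attempt 1 fails, so its shard is cancelled.
	tracker.CancelShard(1)

	_, ok := tracker.GetHash(1)
	fmt.Println("attempt 1 tracked:", ok)
	h, ok := tracker.GetHash(2)
	fmt.Println("attempt 2 tracked:", ok, "first byte:", h[0])
}
```

Keeping the lookup behind methods rather than exposing the map is what would let a later change swap in AMP child derivation without touching callers.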
Johan T. Halseth
a9f19b100b
router+switch: rename paymentID->attemptID
To distinguish the attempt's unique ID from the overall payment
identifier, we name it attemptID everywhere, and note that the
paymentHash argument won't be the actual payment hash for AMP payments.
2021-04-27 08:27:33 +02:00
Johan T. Halseth
7795353e9f
channeldb: return full payment for inflight payments
We might as well return all info, and we'll need the individual HTLCs
in later commits.
2021-04-27 08:27:32 +02:00
carla
b43ddfdb11
routing: label payment lifecycle loop 2021-04-23 08:51:07 +02:00
carla
a63640c488
routing: account for payment terminal errors
If we have processed a terminal state while we're pathfinding
for another shard, the payment loop should not error out on
ErrPaymentTerminal. Instead, it should wait for our shards to
complete, then exit cleanly.
2021-04-23 08:46:22 +02:00
carla
58d95be4dd
multi: change RegisterAttempt error checking order
Move our more generic terminal check forward so that we only
need to handle a single class of expected errors. This change
is mirrored in our mock, and our reproducing tests are updated
to assert that this move catches both classes of errors we get.
2021-04-23 08:39:45 +02:00
carla
12136a97a9
routing/test: add test for stuck payment with in-flight htlcs
Add an additional stuck-payment case, where our payment gets
a terminal error while it has other htlcs in-flight, and a
shard fails with ErrPaymentTerminal. This payment also falls in
our class of expected errors, but is not currently handled. The
mock is updated accordingly, using the same ordering as in our
real RegisterAttempt implementation.
2021-04-23 08:39:45 +02:00
carla
80451afe48
routing/test: add test to demonstrate stuck payment, single shard
This commit adds a test which demonstrates that payments can
get stuck if we receive a payment failure while we're pathfinding
for another shard, then try to dispatch a shard after we've
recorded a permanent failure. It also updates our mock to
only consider payments with no in-flight htlcs as in-flight,
to more closely represent our actual RegisterAttempt.
2021-04-23 08:39:44 +02:00
carla
125980afb7
routing/test: block on pathfinding in tests
This commit adds a step to our payment lifecycle test to add
control over when we find a path for our payment. This is
required for testing race conditions around pathfinding
completing and payment failures being reported.
2021-04-23 08:39:43 +02:00
carla
a68155545c
routing/test: use half shard for single success case
Update our single shard success case to use a route which
splits the payment amount in half. This change still tests
the case where reveal of the preimage counts as a success,
even if we don't have the full amount. This change is made
to cut down on potential races in this test case. While we
are waiting for collectResultAsync to report a success, the
payment lifecycle will continue trying to dispatch shards.
In the case where we send 1/4 of the payment amount, we
send 1 or 2 more shards, depending on how long collectResultAsync
takes. Reducing this test to send 1/2 of the payment amount
means that we will always only try one more shard before
waiting for our shard.
2021-04-23 08:39:42 +02:00
carla
198d567cb2
routing/test: assert error value for payment failures 2021-04-23 08:39:41 +02:00
carla
e0c52e4473
routing/test: close payment result channel on shutdown, mimicking switch
This commit updates our mock to more closely follow the behavior of the
switch for mocked calls to GetPaymentResult. As it stands, our tests
send a test-created error from the switch when we want to mock shutdown.
In reality, the switch will close its result channel, so we update this
test to follow that behavior. This matters for the commit that follows,
because we start checking the error our payments return. If we have an
error from the switch, our tests will fail with an error that we do
not encounter in practice.
2021-04-23 08:39:40 +02:00
carla
9a78e9da73
routing/tests: move lock acquisition after state driving channels
In our payment lifecycle tests, we have two goroutines that
compete for the lock in our mock control tower: the resumePayment
loop which tries to call RegisterAttempt, and the collectResult
handler which is launched in a goroutine by collectResultAsync
and is responsible for various settle/fail calls.

The order that the lock is acquired by these goroutines is
arbitrary, and can lead to flakes in our tests if the step
that we do not intend to execute first gets the lock (e.g.,
we want to fail a payment in collectResult, but RegisterAttempt
gets there first). This commit moves contention for this lock
after our mock's various "state driving" channels, so that the
lock will be acquired in the order that the test intends it.
2021-04-23 08:39:39 +02:00
carla
b2d941ebfb
routing/test: remove test channel buffers
Now that we run each test individually, we don't need to buffer
our mock's channels anymore. This helps to tighten our test loop,
which currently can move on from a step before it's actually
been processed by the mock. This removal ensures that our payment
loop processes each of the test's steps before moving on to the
next one.
2021-04-23 08:39:38 +02:00
carla
806c4cbd57
routing/test: run each test case individually, add names
Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
2021-04-23 08:39:37 +02:00
carla
cb927e89b0
routing/test: add check that sendpayment completes
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
2021-04-23 08:39:36 +02:00
Olaoluwa Osuntokun
7b589e5811
routing: add strict zombie pruning as a config level param
In this commit, we add strict zombie pruning as a config level param.
This allows us to add the option for those who want a tighter graph,
without changing the default composition of the channel graph for most
users overnight.

In addition, we expand the test case slightly by testing that the self
node won't be pruned, but also that if there's a node with only a single
known stale edge, then both variants will prune that edge.
2021-04-21 13:56:27 -05:00
Olaoluwa Osuntokun
916059da48
routing: update chan pruning test w/ new zombie logic 2021-04-21 13:56:24 -05:00
Conner Fromknecht
e3a8b3b0c4
routing/router: prune zombies when either end is stale 2021-04-21 13:56:21 -05:00
yyforyongyu
37aa49c7aa
routing: fix pathfind edge features being nil 2021-04-13 15:53:18 +08:00
Wilmer Paulino
82fe5d9bba
build: update btcwallet dependency introducing pruned bitcoind support
This is achieved by some recent work within the BitcoindClient enabling
it to retrieve pruned blocks from its server's peers.
2021-04-06 14:55:14 -07:00
Olaoluwa Osuntokun
ca96e66b43
Merge pull request #5116 from joostjager/mc-deadlock
routing: fix mission control deadlock
2021-04-05 20:02:57 -07:00
Wilmer Paulino
a620ce3682
build: update btcd and btcwallet dependencies 2021-04-05 15:41:04 -07:00
Olaoluwa Osuntokun
a329c80612
Merge pull request #5133 from wpaulino/routing-validation-cancel-deps
discovery+routing: cancel dependent jobs if parent validation fails
2021-04-01 18:32:58 -07:00
Johan T. Halseth
1231c90a19
routing: avoid open DB transaction if no zombies to prune
We add a simple length check to the channels to be pruned to avoid
opening the DB if there is nothing to be done.
2021-03-30 11:04:13 +02:00
Johan T. Halseth
a0f3624303
routing: delay initial zombie prune by 30 sec
Since zombie pruning can be very slow on some devices (e.g. mobile) it
would stall lnd startup. Since it is not essential for pruning to be
finished for lnd to be functional, we instead delay the initial prune by
30 seconds.

Note that we could also wait for the graphPruneInterval to tick, but
since this is by default 2 hours, it is unlikely that a mobile app will
ever be open that long.
2021-03-30 11:04:13 +02:00
Wilmer Paulino
393111cea9
discovery+routing: cancel dependent jobs if parent validation fails
Previously, we would always allow dependent jobs to be processed,
regardless of the result of their parent job's validation. This isn't
correct, as a parent job contains actions necessary to successfully
process a dependent job. A prime example of this can be found within the
AuthenticatedGossiper, where an incoming channel announcement and update
are both processed, but if the channel announcement job fails to
complete, then the gossiper is unable to properly validate the update.
This commit aims to address this by preventing the dependent jobs from
running.
2021-03-23 11:56:51 -07:00
carla
22491ad567
routing: add mission control import functionality 2021-03-18 10:46:45 +02:00
Joost Jager
56238ebc60
routing: remove unnecessary lock in mission control init
The init code is part of the instantiation, so there is no need for
locking yet.
2021-03-17 12:06:12 +01:00
Joost Jager
89751f869f
routing: fix mission control deadlock
This commit fixes the following potential deadlock situation:
* Pathfinding holds a database lock and tries to obtain a mission control lock
via GetProbability
* ReportPaymentSuccess/ReportPaymentFail holds a mission control lock
and tries to obtain a database lock to store the payment result.
2021-03-17 12:03:32 +01:00
yyforyongyu
541fbbb054
routing: check payment.DestFeatures against nil 2021-03-17 00:14:20 +08:00
Conner Fromknecht
250bc8560e
routing: avoid modifying AssumeChannelValid in unit tests
This produces a race condition when reading AssumeChannelValid from a
different goroutine. Instead we isolate the test cases and initialize
AssumeChannelValid properly.
2021-02-17 22:43:24 -08:00
Conner Fromknecht
f7c5236bf6
routing: dial back max concurrent block fetches
This commit reduces the number of concurrent validation operations the
router will perform when fully validating the channel graph. Reports
from several users indicate that GetInfo would hang for several minutes,
which is believed to be caused by attempting to validate massive amounts
of channels in parallel. This commit returns the limit back to its
original state before adding the batched gossip improvements.

We keep the 1000 concurrent validation request limit for
AssumeChannelValid, since we don't fetch blocks in that case. This
allows us to still keep the performance benefits on mobile/low-resource
devices.
2021-02-17 18:17:09 -08:00
Olaoluwa Osuntokun
b73a6e2c61
routing: if MaxShardAmt is set, then use that as a ceiling for our splits
In this commit, we thread through the necessary state to allow users to
set a max shard amount. If this value is set, then this'll effectively
serve as a ceiling for all our split attempts. If we need to split,
we'll first try to use `paymentAmt/2`; if that's bigger than
`MaxShardAmt`, then we'll use the latter instead.

Ideally in the future we have a dynamic way to automatically set both
the `MaxShardAmt` as well as `MaxParts` for users. Until then exposing
these two new fields will allow us to experiment with setting them
automatically using the RPC interface, and also give users a bit more
control over how we attempt to route payments, akin to coin control for
on-chain payments.

Fixes #4730
2021-02-15 19:31:52 -08:00
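
The ceiling rule from b73a6e2c61 (try `paymentAmt/2`, fall back to `MaxShardAmt` if the half is larger) amounts to the small sketch below. The function name `nextShardAmt`, the plain `uint64` amounts, and the convention that zero means "not set" are assumptions made for illustration.

```go
package main

import "fmt"

// nextShardAmt is a hypothetical helper that picks the amount for the
// next split attempt. It halves the amount and caps the result at
// maxShardAmt. A maxShardAmt of 0 is assumed to mean "no ceiling".
func nextShardAmt(paymentAmt, maxShardAmt uint64) uint64 {
	half := paymentAmt / 2
	if maxShardAmt != 0 && half > maxShardAmt {
		return maxShardAmt
	}
	return half
}

func main() {
	// No ceiling set: the shard is simply half of the amount.
	fmt.Println(nextShardAmt(1000, 0))

	// Half exceeds the ceiling: the ceiling is used instead.
	fmt.Println(nextShardAmt(1000, 300))

	// Half is already below the ceiling: the half is used.
	fmt.Println(nextShardAmt(1000, 800))
}
```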
Olaoluwa Osuntokun
7398e59927
lnrpc/routerrpc+routing: add new MaxShardAmt field to LightningPayment 2021-02-15 19:31:49 -08:00
Olaoluwa Osuntokun
7b0ea3c029
Merge pull request #4909 from carlaKC/mc-paramsapi
routing: allow runtime updates to mission control config
2021-02-10 18:51:53 -08:00
Johan T. Halseth
7e34132c53
routing: let graph methods take scheduler option 2021-02-10 23:54:03 +01:00
Olaoluwa Osuntokun
107f19a049
routing: add new TestPaymentAddrOnlyNoSplit test case
This test case ensures that we won't try to split a payment if the dest
has the payment addr bit, but NOT the MPP optional/required bit.
2021-02-03 17:53:40 -08:00
Olaoluwa Osuntokun
27c1779757
routing: allow custom dest feature bits in integratedRoutingContext.testPayment
This is a preparatory commit for a new test to ensure that if a node
only has the TLV and payment addr feature bits, we don't try to split a
payment.
2021-02-03 17:53:32 -08:00
Olaoluwa Osuntokun
301f1a870e
Merge pull request #4924 from champo/check_payreq_multipart
routerrpc,routing: limit max parts if the invoice doesn't declare MPP support
2021-02-03 16:53:49 -08:00
Oliver Gugger
02267565fe
multi: unify code blocks in READMEs 2021-01-22 09:14:11 +01:00
carla
64dad77e2e
multi: add get and set mission control to routerrpc 2021-01-19 10:57:15 +02:00
carla
edac5bb868
routing: add getter and setter for mission control config 2021-01-19 10:57:14 +02:00
carla
e10e8f11de
routing: extract probability estimator cfg and add validation
In preparation for allowing live update of mc config, we extract our
probability estimator cfg for easy update and add validation.
2021-01-19 10:57:13 +02:00
carla
7b24b586a0
routing: move locking for ReportPaymentSuccess and ReportPaymentFailure
All of the other mission control exported functions acquire their locks
immediately, and do not lock in the subsequent unexported functions.
This commit moves the lock up for the report payment functions so that
mission control's config values are covered by this lock, in preparation
for allowing config to be updated at runtime. Moving this lock means
that we will hold the lock for the additional time it takes to write a
single result to the store via AddResult.
2021-01-19 10:57:12 +02:00
carla
97442da8f7
routing: add string method for cfg 2021-01-19 10:57:11 +02:00
carla
0735d359b9
router: move self node out of config
We are going to use the config struct to allow getting and setting
of the mission control config in the commits that follow. Self node
is not something we want to change, so we move it out for better
separation.
2021-01-19 10:57:10 +02:00