Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
In this commit, we add strict zombie pruning as a config level param.
This allow us to add the option for those that want a tighter graph, and
not change the default composition of the channel graph for most users
over night.
In addition, we expand the test case slightly by testing that the self
node won't be pruned, but also that if there's a node with only a single
known stale edge, then both variants will prune that edge.
Since zombie pruning can be very slow on some devices (e.g. mobile) it
would stall lnd startup. Since it is not essential for pruning to be
finished for lnd to be functional, we instead delay the initial prune by
30 seconds.
Note that we could also wait for the graphPruneInterval to tick, but
since this is by default 2 hours, it is unlikely that a mobile app will
ever be open that long.
Previously, we would always allow dependent jobs to be processed,
regardless of the result of its parent job's validation. This isn't
correct, as a parent job contains actions necessary to successfully
process a dependent job. A prime example of this can be found within the
AuthenticatedGossiper, where an incoming channel announcement and update
are both processed, but if the channel announcement job fails to
complete, then the gossiper is unable to properly validate the update.
This commit aims to address this by preventing the dependent jobs to
run.
This commit fixes the following potential deadlock situation:
* Pathfinding holds a database lock and tries to obtain a mission control lock
via GetProbability
* ReportPaymentSuccess/ReportPaymentFail holds a mission control lock
and tries to obtain a database lock to store the payment result.
This produces a race condition when reading AssumeChannelValid from a
different goroutine. Instead we isolate the test cases and initial
AssumeChannelValid properly.
This commit reduces the number of concurrent validation operations the
router will perform when fully validating the channel graph. Reports
from several users indicate that GetInfo would hang for several minutes,
which is believed to be caused by attempting to validate massive amounts
of channels in parallel. This commit returns the limit back to its
original state before adding the batched gossip improvements.
We keep the 1000 concurrent validation request limit for
AssumeChannelValid, since we don't fetch blocks in that case. This
allows us to still keep the performance benefits on mobile/low-resource
devices.
In this commit, we thread through the necessary state to allow users to
set a max shard amount. If this value is set, then this'll effectively
serve as a ceiling for all our split attempts. If we need to split,
we'll first try to use `paymentAmt/2`, if that's bigger than
`MaxShardAmt, then we'll use the latter instead.
Ideally in the future we have a dynamic way to automatically set both
the `MaxShardAmt` as well as `MaxParts` for users. Until then exposing
these two new fields will allow us to experiment with setting them
automatically using the RPC interface, and also give users a bit more
control over how we attempt to route payments, akin to coin control for
on-chain payments.
Fixes#4730
All of the other mission control exported functions acquire their locks
immediately, and do not lock in the subsequent unexported functions.
This commit moves the lock up for the report payment functions so that
mission control's config values are covered by this lock, in preparation
for allowing config to be updated at runtime. Moving this lock means
that we will hold the lock for the additional time it takes to store a
single result, AddResult, to the store.
We are going to use the config struct to allow getting and setting
of the mission control config in the commits that follow. Self node
is not something we want to change, so we move it out for better
separation.
* mod: bump btcwallet version to accept db timeout
* btcwallet: add DBTimeOut in config
* kvdb: add database timeout option for bbolt
This commit adds a DBTimeout option in bbolt config. The relevant
functions walletdb.Open/Create are updated to use this config. In
addition, the bolt compacter also applies the new timeout option.
* channeldb: add DBTimeout in db options
This commit adds the DBTimeout option for channeldb. A new unit
test file is created to test the default options. In addition,
the params used in kvdb.Create inside channeldb_test is updated
with a DefaultDBTimeout value.
* contractcourt+routing: use DBTimeout in kvdb
This commit touches multiple test files in contractcourt and routing.
The call of function kvdb.Create and kvdb.Open are now updated with
the new param DBTimeout, using the default value kvdb.DefaultDBTimeout.
* lncfg: add DBTimeout option in db config
The DBTimeout option is added to db config. A new unit test is
added to check the default DB config is created as expected.
* migration: add DBTimeout param in kvdb.Create/kvdb.Open
* keychain: update tests to use DBTimeout param
* htlcswitch+chainreg: add DBTimeout option
* macaroons: support DBTimeout config in creation
This commit adds the DBTimeout during the creation of macaroons.db.
The usage of kvdb.Create and kvdb.Open in its tests are updated with
a timeout value using kvdb.DefaultDBTimeout.
* walletunlocker: add dbTimeout option in UnlockerService
This commit adds a new param, dbTimeout, during the creation of
UnlockerService. This param is then passed to wallet.NewLoader
inside various service calls, specifying a timeout value to be
used when opening the bbolt. In addition, the macaroonService
is also called with this dbTimeout param.
* watchtower/wtdb: add dbTimeout param during creation
This commit adds the dbTimeout param for the creation of both
watchtower.db and wtclient.db.
* multi: add db timeout param for walletdb.Create
This commit adds the db timeout param for the function call
walletdb.Create. It touches only the test files found in chainntnfs,
lnwallet, and routing.
* lnd: pass DBTimeout config to relevant services
This commit enables lnd to pass the DBTimeout config to the following
services/config/functions,
- chainControlConfig
- walletunlocker
- wallet.NewLoader
- macaroons
- watchtower
In addition, the usage of wallet.Create is updated too.
* sample-config: add dbtimeout option
In this commit, we extend the `BuildRoute` method and RPC on the router
sub-server to accept a raw payment address which will be included as
part of an MPP payload for the finla hop. This change actually also
allows users to craft their own MPP paths using BuildRoute+SendToRoute.
Our primary goal however, was to fix some broken itests since we now
require the payAddr to be present for ALL payments other than key send
payments.
This allows for a 1000 different persistent operations to proceed
concurrently. Now that we are batching operations at the db level, the
average number of outstanding requests will be higher since the commit
latency has increased. To compensate, we allow for more outstanding
requests to keep the router busy while batches are constructed.
Similarly as with kvdb.View this commits adds a reset closure to the
kvdb.Update call in order to be able to reset external state if the
underlying db backend needs to retry the transaction.
This commit adds a reset() closure to the kvdb.View function which will
be called before each retry (including the first) of the view
transaction. The reset() closure can be used to reset external state
(eg slices or maps) where the view closure puts intermediate results.
This commit clamps all user-chosen CLTVs in LND to be at least 18, which
is the new conservative value used in the sepc. This minimum is applied
uniformly to forwarding CLTV deltas (via channel updates) as well as
final CLTV deltas for new invoices.
This commit reverts cb4cd49dc8d3b0255afe9ff29af9c46c2dbb2c98 to bring
back the insufficient local balance failure.
Distinguishing betweeen this failure and a regular "no route" failure
prevents meaningless htlcs from being sent out.
It can happen that the receiver times out some htlcs of the set if it
took to long to complete. But because the sender's mission control is
now updated, it is worth to keep trying to send those shards again.
Modifies the payment session to launch additional pathfinding attempts
for lower amounts. If a single shot payment isn't possible, the goal is
to try to complete the payment using multiple htlcs. In previous
commits, the payment lifecycle has been prepared to deal with
partial-amount routes returned from the payment session. It will query
for additional shards if needed.
Additionally a new rpc payment parameter is added that controls the
maximum number of shards that will be used for the payment.
With mpp it isn't possible anymore for findPath to determine that there
isn't enough local bandwidth. The full payment amount isn't known at
that point.
In a follow-up, this payment outcome can be reintroduced on a higher
level (payment lifecycle).
This commit fixes the inconsistency between the payment state as
reported by routerrpc.SendPayment/routerrpc.TrackPayment and the main
rpc ListPayments call.
In addition to that, payment state changes are now sent out for every
state change. This opens the door to user interfaces giving more
feedback to the user about the payment process. This is especially
interesting for multi-part payments.
We whitelist a set of "expected" errors that can be returned from
RequestRoute, by converting them into a new type noRouteError. For any
other error returned by RequestRoute, we'll now exit immediately.
We add validation making sure we are not trying to register MPP shards
for non-MPP payments, and vice versa. We also add validtion of total
sent amount against payment value, and matching MPP options.
We also add methods for copying Route/Hop, since it is useful to use
for modifying the route amount in the test.
This commit enables MPP sends for SendToRoute, by allowing launching
another payment attempt if the hash is already registered with the
ControlTower.
We also set the total payment amount of of the payment from mpp record,
to indicate that the shard value might be different from the total
payment value.
We only mark non-MPP payments as failed in the database after
encountering a failure, since we might want to try more shards for MPP.
For now this means that MPP sendToRoute payments will be failed
only after a restart has happened.
This commit finally enables MP payments within the payment lifecycle
(used for SendPayment). This is done by letting the loop launch shards
as long as there is value remaining to send, inspecting the outcomes for
the sent shards when the full payment amount has been filled.
The method channeldb.MPPayment.SentAmt() is added to easily look up how
much value we have sent for the payment.
(almost) PURE CODE MOVE
The only code change is to change a few select cases from
case _ <- channel:
to
case <- channel:
to please the linter.
The test is testing the payment lifecycle, so move it to
payment_lifecycle_test.go
This commit redefines how the control tower handles shard and payment
level settles and failures. We now consider the payment in flight as
long it has active shards, or it has no active shards but has not
reached a terminal condition (settle of one of the shards, or a payment
level failure has been encountered).
We also make it possible to settle/fail shards regardless of the payment
level status (since we must allow late shards recording their status
even though we have already settled/failed the payment).
Finally, we make it possible to Fail the payment when it is already
failed. This is to allow multiple concurrent shards that reach terminal
errors to mark the payment failed, without havinng to synchronize.
In preparation for MPP we return the terminal errors recorded with the
control tower. The reason is that we cannot return immediately when a
shard fails for MPP, since there might be more shards in flight that we
must wait for. For that reason we instead mark the payment failed in the
control tower, then return this error when we inspect the payment,
seeing it has been failed and there are no shards in flight.
To move towards how we will handle existing attempt in case of MPP
(collecting their outcome will be done in separate goroutines separate
from the payment loop), we move to collect their outcome first.
To easily fetch HTLCs that are still not resolved, we add the utility
method InFlightHTLCs to channeldb.MPPayment.
Now that SendToRoute is no longer using the payment lifecycle, we move
the max hop check out of the payment shard's launch() method, and return
the error directly, such that it can be handled in SendToRoute.
Now that SendToRoute is no longer using the payment lifecycle, we
remove the error structs and vars used to cache the last encountered
error. For SendToRoute this will now be returned directly after a shard
has failed.
For SendPayment this means that the last error encountered durinng
pathfinding no longer will be returned. All errors encounterd can
instead be inspected from the HTLC list.
Instead of having SendToRoute pull routes from the payment session in
the payment lifecycle, we utilize the new methods on the paymentShard to
launch and collect the result for this single route.
This also let us remove the check for noRouteError, as we will always
have the result from the tried attempt returned. A result of this is
that we can finally remove lastError from the payment lifecycle (see
next commits).