Previously we used the a priori probability also for our own untried
channels. This led to local channels that had seen a success already
being prioritized over untried local channels. In some cases, depending
on the configured payment attempt cost, this could lead to the payment
taking a two hop route while a direct payment was also possible.
Probabilities are no longer returned for querymc calls. To still provide
some insight into the mission control internals, this commit adds a new
rpc that calculates a success probability estimate for a specific node
pair and amount.
This prepares for decoupling the result interpretation of a single
payment attempt from the information stored in mission control memory
on the history of a node pair. A planned follow-up where we store both
the last success and last failure requires this decoupling.
This commit changes mission control to partially base the estimated
probability for untried connections on historical results obtained in
previous payment attempts. This incentivizes routing nodes to keep all
of their channels in good shape.
Probability estimates are amount dependent. Previously we assumed an
amount, but that starts to make less sense when we make probability more
dependent on amounts in the future.
This commit changes the in-memory structure of the mission control
state. It prepares for calculation of a node probability. For this we
need to be able to efficiently look up the last results for all channels
of a node.
This commit modifies paymentLifecycle so that it not only feeds
failures into mission control, but successes as well.
This allows for more accurate probability estimates. Previously,
the success probability for a successful pair and a pair with
no history was equal. There was no force that pushed towards
previously successful routes.
This commit moves the payment outcome interpretation logic into a
separate file. Also, mission control isn't updated directly anymore, but
results are stored in an interpretedResult struct. This allows the
mission control state to be locked for a minimum amount of time and
makes it easier to unit test the result interpretation.
This commit converts several functions from returning a bool and a
failure reason to a nillable failure reason as return parameter. This
will take away confusion about the interpretation of the two separate
values.
Previously mission control tracked failures on a per node, per channel basis.
This commit changes this to tracking on the level of directed node pairs. The goal
of moving to this coarser-grained level is to reduce the number of required
payment attempts without compromising payment reliability.
Align naming better with the lightning spec. Not the full name of the
failure (FailIncorrectOrUnknownPaymentDetails) is used, because this
would cause too many long lines in the code.
If nodes return a channel policy related failure, they may get a second
chance. Our graph may not be up to date. Previously this logic was
contained in the payment session.
This commit moves that into global mission control and thereby removes
the last mission control state that was kept on the payment level.
Because mission control is not aware of the relation between payment
attempts and payments, the second chance logic is no longer based
tracking second chances given per payment.
Instead a time based approach is used. If a node reports a policy
failure that prevents forwarding to its peer, it will get a second
chance. But it will get it only if the previous second chance was
long enough ago.
Also those second chances are no longer dependent on whether an
associated channel update is valid. It will get the second chance
regardless, to prevent creating a dependency between mission control and
the graph. This would interfer with (future) replay of history, because
the graph may not be the same anymore at that point.
This commit exposes the three main parameters that influence mission
control and path finding to the user as command line or config file
flags. It allows for fine-tuning for optimal results.
Previously every payment had its own local mission control state which
was in effect only for that payment. In this commit most of the local
state is removed and payments all tap into the global mission control
probability estimator.
Furthermore the decay time of pruned edges and nodes is extended, so
that observations about the network can better benefit future payment
processes.
Last, the probability function is transformed from a binary output to a
gradual curve, allowing for a better trade off between candidate routes.
Currently public keys are represented either as a 33-byte array (Vertex) or as a
btcec.PublicKey struct. The latter isn't useable as index into maps and
cannot be used easily in compares. Therefore the 33-byte array
representation is used predominantly throughout the code base.
This commit converts the argument types of source and target nodes for
path finding to Vertex. Path finding executes no crypto operations and
using Vertex simplifies the code.
Additionally, it prepares for the path finding source parameter to be
exposed over rpc in a follow up commit without requiring conversion back
and forth between Vertex and btcec.PublicKey.
In this commit we introduce pruning of channel edges instead of channels.
Channel failures apply to a single direction and it is unnecessarily
restricting to prune both directions.
Fixes the following issues:
- If the channel update of FailFeeInsufficient contains an invalid channel
update, it is not possible to properly add to the failed channels set.
- FailAmountBelowMinimum may apply a channel update, but does not retry.
- FailIncorrectCltvExpiry immediately prunes the vertex without
trying one more time.
In this commit, the logic for all three policy related errors is
aligned.
In this commit, we fix an existing bug that could at times lead to a
panic if a user manually crafts a route via SendToRoute, and that route
results in a payment error. The fix is simple: create the map even
though it won't be used in the sessions since the user is feeding the
router manual routes.
In this commit, we modify the recent refactoring of the mission control
sub-system to overload the existing payment session, rather than create
a brand new one. This allows us to re-use more of the existing logic, and
also feedback into mission control the failures incurred by any user
selected routes.
In this commit, we introduce a new method to the channel router's config
struct: QueryBandwidth. This method allows the channel router to query
for the up-to-date available bandwidth of a particular link. In the case
that this link emanates from/to us, then we can query the switch to see
if the link is active (if not bandwidth is zero), and return the current
best estimate for the available bandwidth of the link. If the link,
isn't one of ours, then we can thread through the total maximal
capacity of the link.
In order to implement this, the missionControl struct will now query the
switch upon creation to obtain a fresh bandwidth snapshot. We take care
to do this in a distinct db transaction in order to now introduced a
circular waiting condition between the mutexes in bolt, and the channel
state machine.
The aim of this change is to reduce the number of unnecessary failures
during HTLC payment routing as we'll now skip any links that are
inactive, or just don't have enough bandwidth for the payment. Nodes
that have several hundred channels (all of which in various states of
activity and available bandwidth) should see a nice gain from this w.r.t
payment latency.