TestRouterPaymentStateMachine tests that the router interacts as
expected with the ControlTower during a payment lifecycle, such that it
payment attempts are not sent twice to the switch, and results are
handled after a restart.
This commit makes the router use the ControlTower to drive the payment
life cycle state machine, to keep track of active payments across
restarts. This lets the router resume payments on startup, such that
their final results can be handled and stored when ready.
This encapsulates all state needed to resume a payment from any point of
the payment flow, and that must be shared between the different stages
of the execution. This is done to prepare for breaking the send loop
into smaller parts, and being able to resume the payment from any point
from persistent state.
In this commit we move handing the deobfuscator from the router to the
switch from when the payment is initiated, to when the result is
queried.
We do this because only the router can recreate the deobfuscator after a
restart, and we are preparing for being able to handle results across
restarts.
Since the deobfuscator cannot be nil anymore, we can also get rid of
that special case.
This lets us distinguish an critical error from a actual payment result
(success or failure). This is important since we know that we can only
attempt another payment when a final result from the previous payment
attempt is received.
This commit moves the responsibility of generating a unique payment ID
from the switch to the router. This will make it easier for the router
to keep track of which HTLCs were successfully forwarded onto the
network, as it can query the switch for existing HTLCs as long as the
paymentIDs are kept.
The router is expected to maintain a map from paymentHash->paymentID,
such that they can be replayed on restart. This also lets the router
check the status of a sent payment after a restart, by querying the
switch for the paymentID in question.
This commit reevaluates the router's quit channel between each block
during the initial call to syncGraphWithChain, which, in the worst case,
may have to scan several thousand blocks on startup if the node has not
been active for some time. Without this, attempting to stop the daemon
will not exit until the rescan has completed, which for certain backends
could be several hours.
In this commit, we update the process that we use to generate a sphinx
packet to send our onion routed HTLC. Due to recent changes in the
`sphinx` package we use, we now need to use a new PaymentPath struct. As
a result, it no longer makes sense to split up the nodes in a route and
their per hop paylods as they're now in the same struct. All tests have
been updated accordingly.
In this commit, we make our findPath function use an edge's MaxHTLC as
its available bandwidth instead of its Capacity. We do this as it's
possible for the capacity of an edge to not exist when operating as a
light client. For channels that do not support the MaxHTLC optional
field, we'll fall back to using the edge's Capacity.
In this commit, we refactor DeleteChannelEdge to use ChannelIDs rather
than ChannelPoints. We do this as the only use of DeleteChannelEdge is
when we are pruning zombie channels from our graph. When running under a
light client, we are unable to obtain the ChannelPoint of each edge due
to the expensive operations required to do so. As a stop-gap, we'll
resort towards using an edge's ChannelID instead, which is already
gossiped between nodes.
Since light clients no longer have access to an edge's capacity, they
are unable to validate whether the max HTLC value for an updated edge
policy respects the capacity limit. As a stop-gap, we'll skip this
check.
This serves as a stop-gap for light clients as blocks need to be
downloaded from the P2P network, and even with caches, would be too
costly for them to verify. Doing this has two side effects however:
we'll no longer know of the channel capacity and outpoint, which are
essential for some of lnd's responsibilities.
In this commit, we disable attempting to determine when a channel has
been closed out on-chain whenever AssumeChannelValid is active. Since
the flag indicates that performing this operation is expensive, we do
this as a temporary optimization until we can include proofs of channels
being closed in the gossip protocol.
With this change, the only way for channels being removed from the graph
will be once they're considered zombies: which can happen when both
edges of a channel have their disabled bits set or when both edges
haven't had an update within the past two weeks.
To ensure we don't mark an edge as live again just because an update
with a fresh timestamp was received, we'll ensure that we reject any
new updates for zombie channels if they remain disabled when running
with AssumeChannelValid.