Previously we only penalized the outgoing connections of a failing node.
This turned out not to be sufficient, because the next route sometimes
went into the same failing node again to try a different outgoing
connection that wasn't yet known to mission control and therefore not
penalized before.
This commit adds success mission control
results for all hops along the route in
a mpp timeout and takes no action for
the final hop along the route. This is a
temporary measure to prevent the default
logic from penalizing the final node while
we decide how to penalize mpp timeouts.
An InvalidOnionPayload implies that the onion was successfully received
by the reporting node, but that they were unable to extract the
contents. Since we assume our own behavior is correct, this mostly
likely poins to an error in the reporter's implementation or that we
sent an unknown required type. Therefore we only penalize that single
hop, and consider the failure terminal if the receiver reported it.
This commit modifies the interpretation of node-level failures.
Previously only the failing node was marked. With this commit, also the
incoming and outgoing connections involved in the route are marked as
failed.
The change prepares for the removal of node-level failures in mission
control probability estimation.
This commit modifies paymentLifecycle so that it not only feeds
failures into mission control, but successes as well.
This allows for more accurate probability estimates. Previously,
the success probability for a successful pair and a pair with
no history was equal. There was no force that pushed towards
previously successful routes.
This commit overhauls the interpretation of failed payments. It changes
the interpretation rules so that we always apply the strongest possible
set of penalties, without making assumptions that would hurt good nodes.
Main changes are:
- Apply different rule sets for intermediate and final nodes. Both types
of nodes have different sets of failures that we expect. Penalize nodes
that send unexpected failure messages.
- Distinguish between direct payments and multi-hop payments. For direct
payments, we can infer more about the performance of our peer because we
trust ourselves.
- In many cases it is impossible for the sender to determine which of
the two nodes in a pair is responsible for the failure. In this
situation, we now penalize bidirectionally. This does not hurt the good
node of the pair, because only its connection to a bad node is
penalized.
- Previously we always penalized the outgoing connection of the
reporting node. This is incorrect for policy related failures. For
policy related failures, it could also be that the reporting node
received a wrongly crafted htlc from its predecessor. By penalizing the
incoming channel, we surely hit the responsible node.
- FailExpiryTooSoon is a failure that could have been caused by any node
up to the reporting node by delaying forwarding of the htlc. We don't
know which node is responsible, therefore we now penalize all node pairs
in the route.
When an undecryptable failure comes back for a payment attempt, we
previously only penalized our own outgoing connection. However,
any node could have caused this failure. It is therefore better to
penalize all node connections along the route. Then at least we know for
sure that we will hit the responsible node.
This commit moves the payment outcome interpretation logic into a
separate file. Also, mission control isn't updated directly anymore, but
results are stored in an interpretedResult struct. This allows the
mission control state to be locked for a minimum amount of time and
makes it easier to unit test the result interpretation.