This commit overhauls the interpretation of failed payments. It changes
the interpretation rules so that we always apply the strongest possible
set of penalties, without making assumptions that would hurt good nodes.
Main changes are:
- Apply different rule sets for intermediate and final nodes. Both types
of nodes have different sets of failures that we expect. Penalize nodes
that send unexpected failure messages.
- Distinguish between direct payments and multi-hop payments. For direct
payments, we can infer more about the performance of our peer because we
trust ourselves.
- In many cases it is impossible for the sender to determine which of
the two nodes in a pair is responsible for the failure. In this
situation, we now penalize bidirectionally. This does not hurt the good
node of the pair, because only its connection to a bad node is
penalized.
- Previously we always penalized the outgoing connection of the
reporting node. This is incorrect for policy related failures. For
policy related failures, it could also be that the reporting node
received a wrongly crafted htlc from its predecessor. By penalizing the
incoming channel, we surely hit the responsible node.
- FailExpiryTooSoon is a failure that could have been caused by any node
up to the reporting node by delaying forwarding of the htlc. We don't
know which node is responsible, therefore we now penalize all node pairs
in the route.
Previously a temporary channel failure was returning for unexpected
malformed htlc failures. This is not what we want to communicate to the
sender, because the sender may apply a penalty to us only.
Returning the temporary channel failure is especially problematic if we
ourselves are the sender and the malformed htlc failure comes from our
direct peer. When interpretating the failure, we aren't able to
distinguish anymore between our channel not having enough balance and
our peer sending an unexpected failure back.
When an undecryptable failure comes back for a payment attempt, we
previously only penalized our own outgoing connection. However,
any node could have caused this failure. It is therefore better to
penalize all node connections along the route. Then at least we know for
sure that we will hit the responsible node.
This commit updates existing tests to not rely on mission control for
pruning of local channels. Information about local channels should
already be up to date before path finding starts. If not, the problem
should be fixed where bandwidth hints are set up.
This commit moves the payment outcome interpretation logic into a
separate file. Also, mission control isn't updated directly anymore, but
results are stored in an interpretedResult struct. This allows the
mission control state to be locked for a minimum amount of time and
makes it easier to unit test the result interpretation.
This commit converts several functions from returning a bool and a
failure reason to a nillable failure reason as return parameter. This
will take away confusion about the interpretation of the two separate
values.
Previously mission control tracked failures on a per node, per channel basis.
This commit changes this to tracking on the level of directed node pairs. The goal
of moving to this coarser-grained level is to reduce the number of required
payment attempts without compromising payment reliability.
In order to prevent information leaks by nodes probing with a payment
hash, this commit changes exit hop processing so that it always returns
incorrect_or_unknown_payment_details and leaves the prober in the dark
about whether an invoice actually exists.
Align naming better with the lightning spec. Not the full name of the
failure (FailIncorrectOrUnknownPaymentDetails) is used, because this
would cause too many long lines in the code.
This commit fixes a bug in DTUint16 and DTUint32, which would cause them
to read too many bytes from the reader. This is due to the fact that
ReadFull was being called on a slice that could be greater than the
underlying type. This is not an issue for DTUint64, since the 8-byte
buffer corresponds to the maximum possible size of a uint64. The
solution is to clamp the buffer to 2 and 4 bytes respectively.
A series of tests are also added to exercise these cases.
This commit adds the truncated integer encodings used in the
variable-size onion payloads. The amount and cltv delta both use the
truncated encoding to shave bytes in the overall size, and will likely
be used in the future for additional extensions where size is a
constraint.
This commit adds concrete encoding methods for primitive integral types.
When external libs need to create custom encoders, this allows them to
do so without incurring an extra allocation on the heap. Previously, the
need to pass a pointer to the integer using an interface{} would cause
the argument to escape, which we avoid by having them copied directly.
This varint has the same serialization as the varint in btcd and
bitcoind, but has different behavior wrt returned errors. In order to
ensure the inner loop properly detects cleanly written records,
ReadVarInt will not only return EOF if it can't read the first byte, as
that means the reader has zero bytes left.
It also modifies the API to allow the caller to provided a static byte
array, which can be reused across all encoding and decoding and
increases performance.
Not all errors that occur when serving client requests in the gRPC
server are logged. As a result, at times, we can be lacking critic
information that can be used to debug issues that pop up. With this PR,
we create a basic streaming+unary interceptor that will log all errors
that occur when servicing calls.
The current format looks something like this in the logs when an error
occurs:
```
[ERR] RPCS: [/lnrpc.Lightning/SendCoins]: decoded address is of unknown format
```
This ensures that the graph synced status is marked true at some point
once a historical sync has completed. Before this commit, a stalled
historical sync could cause us to never mark the graph as synced.