This commit modifies sendMessage to break on the server's quit channel,
which allows synchronous callers of SendMessage or SendLazyMessage to
receive an error during server shutdown which can be independent of a
particular peer's shutdown.
As of https://github.com/lightningnetwork/lnd/pull/2916, all replies
made by gossip syncers were modified to be synchronous. In certain
cases, This would prevent the syncers from shutting down promptly, as
they would try to offload a batch a of messages that could not be
aborted. Now, an error will be propagated back to the caller, allowing
them to detect the error condition, and reevaluate their own quit
signals, releasing any waitgrouped goroutines and permitting a quick
shutdown.
This commit fixes a bug where it was impossible to run lnd in litecoin's
simnet mode because of code duplication. As a result `numNets > 1`
conditional was always true when running lnd with `cfg.Litecoin.SimNet`
flag.
In this commit, we update the itests to expect the correct message for
the sphinx replay test. Before the fixes in the prior commits, we
expected the wrong error since we were actually unable to decrypt these
converted malformed HTLC errors. Now, we'll properly return a parse able
error, so we assert against that error instead of the failure to decode
an error.
In this commit, we add a new test to ensure that we're able to properly
convert malformed HTLC errors that are sourced from multiple hops away,
or our direct channel peers. In order to test this effectively, we force
the onion decryptors of various peers to always fail which will trigger
the malformed HTLC logic.
In this commit, we now properly convert multi-hop malformed HTLC
failures. Before this commit, we wouldn't properly add a layer of
encryption to these errors meaning that the destination would fail to
decrypt the error as it was actually plaintext.
To remedy this, we'll now check if we need to convert an error, and if
so we'll encrypt it as if it we were the source of the error (the true
source is our direct channel peer).
In this commit, we fix a bug that caused us to be unable to properly
handle malformed HTLC failures from our direct link. Before this commit,
we would attempt to decrypt it and fail since it wasn't well formed. In
this commit, if its an error for a local payment, and it needed to be
converted, then we'll decode it w/o decrypting since it's already
plaintext.
In this commit, we add a new method to the ErrorEncrypter interface:
`EncryptMalformedError`. This takes a raw error (no encryption or MAC),
and encrypts it as if we were the originator of this error. This will be
used by the switch to convert malformed fail errors to regular fully
encrypted errors.
In this commit, we start the first phase of fixing an existing bug
within the switch. As is, we don't properly convert
`UpdateFailMalformedHTLC` to regular `UpdateFailHTLC` messages that are
fully encrypted. When we receive a `UpdateFailMalformedHTLC` today,
we'll convert it into a regular fail message by simply encoding the
failure message raw. This failure message doesn't have a MAC yet, so if
we sent it backwards, then the destination wouldn't be able to decrypt
it. We can recognize this type of failure as it'll be the same size as
the raw failure message max size, but it has 4 extra bytes for the
encoding. When we come across this message, we'll mark is as needing
conversion so the switch can take care of it.
In this commit, we update the process that we use to generate a sphinx
packet to send our onion routed HTLC. Due to recent changes in the
`sphinx` package we use, we now need to use a new PaymentPath struct. As
a result, it no longer makes sense to split up the nodes in a route and
their per hop paylods as they're now in the same struct. All tests have
been updated accordingly.
Prior to this change, the numQueryResponses that we calculated would be
one more than what we actually wanted since it didn't account for the
initial QueryChannelRange msg. This resulted in the test sending one
extra delayed query than was configured. This doesn't fundamentally
impact the test, but does make what happens in the test more reflective
of the configuration.
This commit makes all replies in the gossip syncer synchronous, meaning
that they will wait for each message to be successfully written to the
remote peer before attempting to send the next. This helps throttle
messages the remote peer has requested, preventing unintended
disconnects when the remote peer is slow to process messages. This
changes also helps out congestion in the peer by forcing the syncer to
buffer the messages instead of dumping them into the peer's queue.
As a prepatory step to making gossip replies synchronous, we will move
the ErrPeerExiting error into the lnpeer package so that it can be
imported by the discovery package. With synchronous sends, this error
can now be detected and handled by goroutines in the syncer, and cause
them to exit instead of continue to sending backlogs.
This commit creates a distinct replyHandler, completely isolating the
requesting state machine from the processing of queries from the remote
peer. Before the two were interlaced, and the syncer could only reply to
messages in certain states. Now the two will be complete separated,
which is preliminary step to make the replies synchronous (as otherwise
we would be blocking our own requesting state machine).
With this changes, the channelGraphSyncer of each peer will drive the
replyHanlder of the other. The two can now operate independently, or
even spun up conditionally depending on advertised support for gossip
queries, as shown below:
A B
channelGraphSyncer ---control-msg--->
replyHandler
channelGraphSyncer <--control-msg----
gossiper <--gossip-msgs----
<--control-msg---- channelGraphSyncer
replyHandler
---control-msg---> channelGraphSyncer
---gossip-msgs---> gossiper
In this commit, we modify the way we detect local force closes. Before
this commit, we would directly check the broadcast commitment's txid
against what we know to be our best local commitment. In the case of DLP
recovery from an SCB, it's possible that the user force closed, _then_
attempted to recover their channels. As a result, we need to check the
outputs directly in order to also handle this rare, but
possible recovery scenario.
The new detection method uses the outputs to detect if it's a local
commitment or not. Based on the state number, we'll re-derive the
expected scripts, and check to see if they're on the commitment. If not,
then we know it's a remote force close. A new test has been added to
exercise this new behavior, ensuring we catch local closes where we have
and don't have a direct output.