As a prepatory step to making gossip replies synchronous, we will move
the ErrPeerExiting error into the lnpeer package so that it can be
imported by the discovery package. With synchronous sends, this error
can now be detected and handled by goroutines in the syncer, and cause
them to exit instead of continue to sending backlogs.
This commit removes the write backoff, since subsequent retries no
longer need to access the write pool. Subsequent flushes will resume
writing any partial writes that occurred before the timeout until the
message is received or the idle write timer is triggered.
In the future, we can add callbacks that execute on write events
timeouts. This can be useful for mobile clients that might roam, as the
timeout could indicate the connection is dead even if the OS has not
reported it closed. The callback can be used then, for example, to
initiate another outbound connection to test whether or not the issue is
related to the connection.
This commit modifies the way the link writes messages to the wire, by
first buffering ciphertexts to the connection using WriteMessage, and
then calling Flush separately. Currently, the call to Write tries to do
both, which can result in a blocking operation for up to the duration of
the write timeout. Splitting these operations permits less blocking in
the write pool, since now we only need to use a write worker to
serialize and encrypt the plaintext.
After the write pool is released, the peer then attempts to flush the
message using the appropriate write timeout. If a timeout error occurs,
the peer will continue to flush the message w/o serializing or
encrypting the message again, until the message is fully written to the
wire or the write idle timer disconnects the peer.
As a preliminary step to integrating the separated WriteMessage and
Flush calls in the peer, we'll modify the peer to only set a timestamp
on Ping messages once. This makes sense for two reasons, 1) if the
message has already been partially written, we have already committed to
a ping time, and 2) a ciphertext containing the first ping time will
already be buffered in the connection, and we will only be attempting to
Flush on timeout errors.
This commits exposes the various parameters around going to chain and
accepting htlcs in a clear way.
In addition to this, it reverts those parameters to what they were
before the merge of commit d1076271456bdab1625ea6b52b93ca3e1bd9aed9.
In this commit, we replace the NoChanUpdates flag with a flag that
allows us to specify the number of peers we want to actively receive new
graph updates from. This will be required when integrating the new
gossiper SyncManager subsystem with the rest of lnd.
In this commit, we modify the filter we use to determine if we should
add a new channel to the switch to reflect the new channel restoration
state. For all other non-default states, we want to avoid loading in a
channel, but for the restoration state, we need to load the link in
order to ensure we initiate the data loss protection protocol once we
connect to the remote peer.
In this commit, we add a 5 minute idle timer to
the write handler. After catching the write
timeouts, it's been observed that some connections
have trouble reading a message for several hours.
This typically points to a deeper issue w/ the peer
or, e.g. the remote peer switched networks. This now
mirrors the idle timeout used in the read handler,
such that we will disconnect a peer if we are unable
to send or receive a message from the peer after 5
minutes.
We also modify the readHandler to drain its
idleTimer's channel in the even that the timer had
already fired, but we successfully sent the message.
This commit reduces the peer's write timeout to 5s.
Now that the peer catches write timeouts and doesn't
disconnect, this will ensure we spend less time blocking
in the write pool in case others also need to access the
workers concurrently. Slower peers will now only block
for 5s, after every reattempt w/ exponential backoff.
This commit modifies the writeHandler to catch timeout
errors, and retry writes to the socket after a small
backoff, which increases exponentially from 5s to 1m.
With the growing channel graph size, some lower-powered
devices can be slow to pull messages off the wire during
validation. The current behavior will cause us to
disconnect the peer, and resend all of the messages that
the remote peer is slow to validate. Catching the timeout
helps in preventing such expensive reconnection cycles,
especially as the network continues to grow.
This is also a preliminary step to reducing the
write timeout constant. This will allow concurrent usage
of the write pools w/out devoting excessive amounts of
time blocking the pool for slow peers.
This commit increase the expiry grace delta to a value above the
broadcast delta. This prevents htlcs from being accepted that would
immediately trigger a channel force close.
A correct delta is generated in server.go where there is access to
the broadcast delta and passed via the peer to the links.
Co-authored-by: Jim Posen <jim.posen@gmail.com>
In this commit, we set a default max HTLC in the forwarding
policies of newly open channels.
The ForwardingPolicy's MaxHTLC field (added in this commit)
will later be used to decide whether an HTLC satisfies our policy before
forwarding it.
To ensure the ForwardingPolicy's MaxHTLC default matches the max HTLC
advertised in the ChannelUpdate sent out for this channel, we also add
a MaxPendingAmount() function to the lnwallet.Channel.
In this commit, we add a prefix naming scheme to the ChannelStatus enum
variables. We do this as it enables outside callers to more easily
identify each individual enum variable as a part of the greater
enum-like type.
In this commit:
* we partition lnwire.ChanUpdateFlag into two (ChanUpdateChanFlags and
ChanUpdateMsgFlags), from a uint16 to a pair of uint8's
* we rename the ChannelUpdate.Flags to ChannelFlags and add an
additional MessageFlags field, which will be used to indicate the
presence of the optional field HtlcMaximumMsat within the ChannelUpdate.
* we partition ChannelEdgePolicy.Flags into message and channel flags.
This change corresponds to the partitioning of the ChannelUpdate's Flags
field into MessageFlags and ChannelFlags.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
In this commit, we modify the peer's writeMessage
method to properly handle errors returned from
encoding an lnwire message and from setting the
write deadline on the connection. Since an error
would likely result in an empty byte slice, the
worse case seems to be that we may have tried to
send an empty message on the wire.
Lastly, we correct the way we compute bytes sent
on the wire to properly count the number of bytes
*written*, and not just the length of the encoded
message.
In this commit, we remove the per channel `sigPool` within the
`lnwallet.LightningChannel` struct. With this change, we ensure that as
the number of channels grows, the number of gouroutines idling in the
sigPool stays constant. It's the case that currently on the daemon, most
channels are likely inactive, with only a hand full actually
consistently carrying out channel updates. As a result, this change
should reduce the amount of idle CPU usage, as we have less active
goroutines in select loops.
In order to make this change, the `SigPool` itself has been publicly
exported such that outside callers can make a `SigPool` and pass it into
newly created channels. Since the sig pool now lives outside the
channel, we were also able to do away with the Stop() method on the
channel all together.
Finally, the server is the sub-system that is currently responsible for
managing the `SigPool` within lnd.
This commit makes the AddNewChannel expect a OpenChannel instead of a
LightningChannel struct. This moves the responsibility for starting the
LightningChannel from the fundingmanager to the peer, and we can defer
the channel restoration until we know that the channel is not already
active.
In this commit, we also check ErrEdgeNotFound when attempting to send an
active/inactive channel update for a channel to the network. We do this
as it's possible that a channel has confirmed, but it still does not
meet the required number of confirmations to be publicly announced.
In this commit, we fix a small bug with regards to the persistent peer
connection pruning logic. Before this commit, it'd be the case that we'd
prune a persistent connection to a peer if all links happen to be
inactive. This isn't ideal, as the channels are still open, so we should
always be atttempting to connect to them. We fix this by looking at the
set of channels on-disk instead and prune the persistent connection if
there aren't any.
This commit moves the gossip sync dispatch
such that it is more tightly coupled to the
life cycle of the peer. In testing, I noticed
that the gossip syncer needs to be dispatched
before the first gossip messages come across
the wire.
The prior spawn location in the server happens
after starting all of the peer's goroutines,
which could permit an ordering where the
gossip syncer has not yet been registered.
The new location registers the gossip syncer
within the read handler such that the call is
blocks before any messages are read.
In this commit, we add a quit channel to the AddMsg method of the
msgStream struct. Before this commit, if the queue was full, the
readHandler would block and be unable to exit. We remedy this by
leveraging the existing quit channel of the peer as an additional select
case within the AddMsg method.
In this commit, we thread through the quit of the peer to the execution
of the apply function for a msgStream. This change ensures that if the
target is still processing the message, then the peer is able to exit
cleanly and not block insensately.
In this commit we move the atomic var increment that signals the
consumer goourtine has exited to the top of the method in a defer
statement. This cleans up some duplicate code and also adheres to the
pattern of using defers to signal cleaning up any dependent goroutine
state on exit.
In this commit, we raise the readHandler wait group done into a defer
statement at the top of the method. This fixes an existing but that
would cause the readHandler to declare it had exited, yet possibly still
be waiting on the chan message stream below to exit.
This commit fixes a bug that would cause us to fetch our peer's
ChannelUpdate in some cases, where we really wanted to fetch our own.
The reason this happened was that we passed the peer's pubkey to
fetchLastChanUpdate, making us match on their policy. This would lead to
ChannelUpdates being sent during routing which would have no effect on
the attempted path.
We fix this by always use our own pubkey in fetchLastChanUpdate, and
also uses the common methods within the server to be able to extract the
update even when only one policy is known.
This commit fixes a bug within the peer, where we would always use the
default forwarding policy when adding new links to the switch. Mostly
this wasn't a problem, as we most often are using default values for new
channels, but the min_htlc value is a value that is set by the remote,
not us.
If the remote specified a custom min_htlc during channel creation, we
would still use our DefaultMinHtlc value when first adding the link,
making us try to forward HTLCs that would be rejected by the remote.
We fix this by getting the required min_htlc value from the channel
state machine, and setting this for the link's forwarding policy.
Note that on restarts we will query the database for our latest
ChannelEdgePolicy, and use that to craft the forwardingPolicy. This
means that the value will be set correctly if the policy is found in the
database.
Sometimes when performing an initial sync, the remote
node isn't able to pull messages off the wire because
of long running tasks and queues are saturated. With
a shorter write timeout, we will give up trying to send
messages and teardown the connection, even though the
peer is still active.
This commit adds additional synchronization logic to
WaitForDisconnect, such that it can be spawned before
Start has been executed by the server. Without
modification, the current version will return
immediately since no goroutines will have been
spawned.
To solve this, we modify WaitForDisconnect to block until:
1) the peer is disconnected,
2) the peer is successfully started,
before watching the waitgroup.
In the first case, the waitgroup will block until all
(if any) spawned goroutines have exited. Otherwise, if
the Start is successful, we can switch to watching the
waitgroup, knowing that waitgroup counter is positive.
In this commit, we modify the existing message sending functionality
within the fundingmanager. Due to each mesage send requiring to hold the
server's lock to retrieve the peer, we might run into a case where the
lock is held for a larger than usual amount of time and would therefore
block on sending the message within the fundingmanager. We remedy this
by taking a similar approach to some recent changes within the gossiper.
We now keep track of each peer within the internal fundingmanager
messages and send messages directly to them.
In this commit, we add a timeout within the writeMessage method when we go to write to the socket. We do this as otherwise, if the other peer is blocked for some reason, we'll never actually unblock ourselves, which may cause issues in other sub-systems waiting on this write call. For now, we use a value of 10 seconds, and will adjust in the future if we deem this time period too short.
In this commit, we move the block height dependency from the links in
the switch to the switch itself. This is possible due to a recent change
on the links no longer depending on the block height to update their
commitment fees.
We'll now only have the switch be alerted of new blocks coming in and
links will retrieve the height from it atomically.