This commit removes the write backoff, since subsequent retries no
longer need to access the write pool. Subsequent flushes will resume
writing any partial writes that occurred before the timeout until the
message is received or the idle write timer is triggered.
In the future, we can add callbacks that execute on write events
timeouts. This can be useful for mobile clients that might roam, as the
timeout could indicate the connection is dead even if the OS has not
reported it closed. The callback can be used then, for example, to
initiate another outbound connection to test whether or not the issue is
related to the connection.
This commit modifies the way the link writes messages to the wire, by
first buffering ciphertexts to the connection using WriteMessage, and
then calling Flush separately. Currently, the call to Write tries to do
both, which can result in a blocking operation for up to the duration of
the write timeout. Splitting these operations permits less blocking in
the write pool, since now we only need to use a write worker to
serialize and encrypt the plaintext.
After the write pool is released, the peer then attempts to flush the
message using the appropriate write timeout. If a timeout error occurs,
the peer will continue to flush the message w/o serializing or
encrypting the message again, until the message is fully written to the
wire or the write idle timer disconnects the peer.
As a preliminary step to integrating the separated WriteMessage and
Flush calls in the peer, we'll modify the peer to only set a timestamp
on Ping messages once. This makes sense for two reasons, 1) if the
message has already been partially written, we have already committed to
a ping time, and 2) a ciphertext containing the first ping time will
already be buffered in the connection, and we will only be attempting to
Flush on timeout errors.
This commits exposes the various parameters around going to chain and
accepting htlcs in a clear way.
In addition to this, it reverts those parameters to what they were
before the merge of commit d1076271456bdab1625ea6b52b93ca3e1bd9aed9.
In this commit, we replace the NoChanUpdates flag with a flag that
allows us to specify the number of peers we want to actively receive new
graph updates from. This will be required when integrating the new
gossiper SyncManager subsystem with the rest of lnd.
In this commit, we modify the filter we use to determine if we should
add a new channel to the switch to reflect the new channel restoration
state. For all other non-default states, we want to avoid loading in a
channel, but for the restoration state, we need to load the link in
order to ensure we initiate the data loss protection protocol once we
connect to the remote peer.
In this commit, we add a 5 minute idle timer to
the write handler. After catching the write
timeouts, it's been observed that some connections
have trouble reading a message for several hours.
This typically points to a deeper issue w/ the peer
or, e.g. the remote peer switched networks. This now
mirrors the idle timeout used in the read handler,
such that we will disconnect a peer if we are unable
to send or receive a message from the peer after 5
minutes.
We also modify the readHandler to drain its
idleTimer's channel in the even that the timer had
already fired, but we successfully sent the message.
This commit reduces the peer's write timeout to 5s.
Now that the peer catches write timeouts and doesn't
disconnect, this will ensure we spend less time blocking
in the write pool in case others also need to access the
workers concurrently. Slower peers will now only block
for 5s, after every reattempt w/ exponential backoff.
This commit modifies the writeHandler to catch timeout
errors, and retry writes to the socket after a small
backoff, which increases exponentially from 5s to 1m.
With the growing channel graph size, some lower-powered
devices can be slow to pull messages off the wire during
validation. The current behavior will cause us to
disconnect the peer, and resend all of the messages that
the remote peer is slow to validate. Catching the timeout
helps in preventing such expensive reconnection cycles,
especially as the network continues to grow.
This is also a preliminary step to reducing the
write timeout constant. This will allow concurrent usage
of the write pools w/out devoting excessive amounts of
time blocking the pool for slow peers.
This commit increase the expiry grace delta to a value above the
broadcast delta. This prevents htlcs from being accepted that would
immediately trigger a channel force close.
A correct delta is generated in server.go where there is access to
the broadcast delta and passed via the peer to the links.
Co-authored-by: Jim Posen <jim.posen@gmail.com>
In this commit, we set a default max HTLC in the forwarding
policies of newly open channels.
The ForwardingPolicy's MaxHTLC field (added in this commit)
will later be used to decide whether an HTLC satisfies our policy before
forwarding it.
To ensure the ForwardingPolicy's MaxHTLC default matches the max HTLC
advertised in the ChannelUpdate sent out for this channel, we also add
a MaxPendingAmount() function to the lnwallet.Channel.
In this commit, we add a prefix naming scheme to the ChannelStatus enum
variables. We do this as it enables outside callers to more easily
identify each individual enum variable as a part of the greater
enum-like type.
In this commit:
* we partition lnwire.ChanUpdateFlag into two (ChanUpdateChanFlags and
ChanUpdateMsgFlags), from a uint16 to a pair of uint8's
* we rename the ChannelUpdate.Flags to ChannelFlags and add an
additional MessageFlags field, which will be used to indicate the
presence of the optional field HtlcMaximumMsat within the ChannelUpdate.
* we partition ChannelEdgePolicy.Flags into message and channel flags.
This change corresponds to the partitioning of the ChannelUpdate's Flags
field into MessageFlags and ChannelFlags.
Co-authored-by: Johan T. Halseth <johanth@gmail.com>
In this commit, we modify the peer's writeMessage
method to properly handle errors returned from
encoding an lnwire message and from setting the
write deadline on the connection. Since an error
would likely result in an empty byte slice, the
worse case seems to be that we may have tried to
send an empty message on the wire.
Lastly, we correct the way we compute bytes sent
on the wire to properly count the number of bytes
*written*, and not just the length of the encoded
message.
In this commit, we remove the per channel `sigPool` within the
`lnwallet.LightningChannel` struct. With this change, we ensure that as
the number of channels grows, the number of gouroutines idling in the
sigPool stays constant. It's the case that currently on the daemon, most
channels are likely inactive, with only a hand full actually
consistently carrying out channel updates. As a result, this change
should reduce the amount of idle CPU usage, as we have less active
goroutines in select loops.
In order to make this change, the `SigPool` itself has been publicly
exported such that outside callers can make a `SigPool` and pass it into
newly created channels. Since the sig pool now lives outside the
channel, we were also able to do away with the Stop() method on the
channel all together.
Finally, the server is the sub-system that is currently responsible for
managing the `SigPool` within lnd.
This commit makes the AddNewChannel expect a OpenChannel instead of a
LightningChannel struct. This moves the responsibility for starting the
LightningChannel from the fundingmanager to the peer, and we can defer
the channel restoration until we know that the channel is not already
active.
In this commit, we also check ErrEdgeNotFound when attempting to send an
active/inactive channel update for a channel to the network. We do this
as it's possible that a channel has confirmed, but it still does not
meet the required number of confirmations to be publicly announced.
In this commit, we fix a small bug with regards to the persistent peer
connection pruning logic. Before this commit, it'd be the case that we'd
prune a persistent connection to a peer if all links happen to be
inactive. This isn't ideal, as the channels are still open, so we should
always be atttempting to connect to them. We fix this by looking at the
set of channels on-disk instead and prune the persistent connection if
there aren't any.