Now that the write pool no longer executes blocking i/o operations, it
is safe to reduce this number considerably. The write pool now only
handles encoding and encryption of messages, making problem
computationally bound and thus dependent on available CPUs. The
descriptions of the workers configs is also updated to explain how users
should set these on their on machines.
This commit removes the write backoff, since subsequent retries no
longer need to access the write pool. Subsequent flushes will resume
writing any partial writes that occurred before the timeout until the
message is received or the idle write timer is triggered.
In the future, we can add callbacks that execute on write events
timeouts. This can be useful for mobile clients that might roam, as the
timeout could indicate the connection is dead even if the OS has not
reported it closed. The callback can be used then, for example, to
initiate another outbound connection to test whether or not the issue is
related to the connection.
This commit modifies the way the link writes messages to the wire, by
first buffering ciphertexts to the connection using WriteMessage, and
then calling Flush separately. Currently, the call to Write tries to do
both, which can result in a blocking operation for up to the duration of
the write timeout. Splitting these operations permits less blocking in
the write pool, since now we only need to use a write worker to
serialize and encrypt the plaintext.
After the write pool is released, the peer then attempts to flush the
message using the appropriate write timeout. If a timeout error occurs,
the peer will continue to flush the message w/o serializing or
encrypting the message again, until the message is fully written to the
wire or the write idle timer disconnects the peer.
As a preliminary step to integrating the separated WriteMessage and
Flush calls in the peer, we'll modify the peer to only set a timestamp
on Ping messages once. This makes sense for two reasons, 1) if the
message has already been partially written, we have already committed to
a ping time, and 2) a ciphertext containing the first ping time will
already be buffered in the connection, and we will only be attempting to
Flush on timeout errors.
This commit exposes the WriteMessage and Flush interfaces of the
underlying brontide.Machine, such that callers can have greater
flexibility in when blocking network operations take place.
This commit modifies WriteMessage to only perform encryption on the
passed plaintext, and buffer the ciphertext within the connection
object. We then modify internal uses of WriteMessage to follow with a
call to Flush, which actually writes the message to the wire.
Additionally, since WriteMessage does not actually perform the write
itself, the io.Writer argument is removed from the function signature
and all call sites.
This commit adds a Flush method to the brontide.Machine, which can write
out a buffered message to an io.Writer. This is a preliminary change
which will allow the encryption of the plaintext to be done in a
distinct method from actually writing the bytes to the wire.
In this commit, we make our findPath function use an edge's MaxHTLC as
its available bandwidth instead of its Capacity. We do this as it's
possible for the capacity of an edge to not exist when operating as a
light client. For channels that do not support the MaxHTLC optional
field, we'll fall back to using the edge's Capacity.
In this commit, we refactor DeleteChannelEdge to use ChannelIDs rather
than ChannelPoints. We do this as the only use of DeleteChannelEdge is
when we are pruning zombie channels from our graph. When running under a
light client, we are unable to obtain the ChannelPoint of each edge due
to the expensive operations required to do so. As a stop-gap, we'll
resort towards using an edge's ChannelID instead, which is already
gossiped between nodes.
This commit serves as another stop-gap for light clients since they are
unable to obtain the capacity and channel point of graph edges. Since
they're aware of these things for their own channels, they can populate
the information within the graph themselves once each channel has been
successfully added to the graph.
In this commit, we extend the gossiper with support for external callers
to provide optional fields that can serve as useful when processing a
specific network announcement. This will serve useful for light clients,
which are unable to obtain the channel point and capacity for a given
channel, but can provide them manually for their own set of channels.
Since light clients no longer have access to an edge's capacity, they
are unable to validate whether the max HTLC value for an updated edge
policy respects the capacity limit. As a stop-gap, we'll skip this
check.
This serves as a stop-gap for light clients as blocks need to be
downloaded from the P2P network, and even with caches, would be too
costly for them to verify. Doing this has two side effects however:
we'll no longer know of the channel capacity and outpoint, which are
essential for some of lnd's responsibilities.
In this commit, we disable attempting to determine when a channel has
been closed out on-chain whenever AssumeChannelValid is active. Since
the flag indicates that performing this operation is expensive, we do
this as a temporary optimization until we can include proofs of channels
being closed in the gossip protocol.
With this change, the only way for channels being removed from the graph
will be once they're considered zombies: which can happen when both
edges of a channel have their disabled bits set or when both edges
haven't had an update within the past two weeks.
To ensure we don't mark an edge as live again just because an update
with a fresh timestamp was received, we'll ensure that we reject any
new updates for zombie channels if they remain disabled when running
with AssumeChannelValid.
In this commit, we add an additional heuristic when running with
AssumeChannelValid. Since AssumeChannelValid being present assumes that
we're not able to quickly determine whether channels are valid, we also
assume that any channels with the disabled bit set on both sides are
considered zombie. This should be relatively safe to do, since the
disabled bits are usually set when the channel is closed on-chain. In
the case that they aren't, we'll have to wait until both edges haven't
had a new update within two weeks to prune them.