This commit ensures that the mock attachment
directives use unique keys, ensuring that they
aren't skipped due to already having pending
connection requests. The tests fail when
they're all the same since they collide
in the pendingConns map.
This commit modifies the autopilot agent to track
all pending connection requests, and forgo further
attempts if a connection is already present.
Previously, the agent would try and establish
hundreds of requests to a node, especially if the
connections were timing out and not returning.
This resulted in an OOM OMM when cranking up
maxchannels to 200, since there would be close
to 10k pending connections before the program was
terminated. The issue was compounded by periodic
batch timeouts, causing autopilot to try and
process thousands of triggers for failing
connections to the same peer.
With these fixes, autopilot will skip nodes that we
are trying to connect to during heuristic selection.
The CPU and memory utilization have been significantly
reduced as a result.
In this commit, we implement an optimization to the autopilot agent to
ensure that we don't spin and waste CPU when we either have a large
graph, or a high max channel target for the agent. Before this commit,
each time we went to read the state of a channel from disk, we would
decompress the EC Point each time. However, for the case of the instal
ChannlEdge struct to feed to the agent, we only actually need to obtain
the pubkey, and can save the potentially expensive point decompression
for each directional channel in the graph.
We do this to avoid a huge amount of goroutines piling up when autopilot
is trying to open many channels, as they will all block trying to send
the update on the stateUpdates channel. Now we instead send them on a
buffered channel, similar to what is done with the nodeUpdates.
We do this to avoid a huge amount of goroutines piling up on initial
graph sync, as they will all block trying to send the node update on the
stateUpdates channel. Now we instead make a new buffered channel
nodeUpdates, and just return immediately if there is already a signal in
the channel waiting to be processed.
In this commit, we modify the Node interface to return a set of raw
bytes, rather than the full pubkey struct. We do this as within the
package, commonly we only require the pubkey bytes for fingerprinting
purposes. Before this commit, we were forced to _always_ decompress the
pubkey which can be expensive done thousands of times a second.
In this commit, we modify the balanceUpdate autopilot signal to update
the balance according to what's returned to the WalletBalance callback
rather than explicitly tracking the balance. This gives the agent a
better sense of what the wallet's balance actually is.
Adds a new external signal alerting autopilot that
new nodes have been added to the channel graph or
an existing node has modified its channel
announcment. This allows autopilot to examine its
current state, and attempt to open channels if our
target state is not yet met.
In this commit, we alter our mock heuristic to also take in a quit chan.
It's possible that at the end of a test the agent is blocked on a
NeedMoreChans/Select call as their mock implementations use channels. To
prevent this, we use the agent's quit chan so that the heuristic can
safely exit once the agent does.
In this commit, we refactor the existing connection logic outside of the
ChanController's OpenChannel method. We do this as previously it was
possible for peers to stall us while attempting to connect to them. In
order to remedy this, we now attempt to connect the peer before tracking
them in our set of pending opens.
The commit ensures that for every channel, there will always
be two entries in the edges bucket. If the policy from one or
both ends of the channel is unknown, it is marked as such.
This allows efficient lookup of incoming edges. This is
required for backwards payment path finding.
In this commit, we fix an existing bug that would at times cause us to
spiral out of control and potentially created thousands of outbound
connections. Our fix is simple: limit the total number of outstanding
channel establishment attempts. Without this limit, we have no way to
bound the number of active goroutines.
Fixes#772.
In this commit, we fix a regression introduced by a recent change which
would allow the agent to detect a channel as failed, and blacklist the
node, promising faster convergence with the ideal state of the
heuristic.
The source of this bug is that we would use the set of blacklisted
nodes in order to compute how many additional channels we should open.
If 10 failures happened, then we would think that we had opened up 10
channels, and stop much earlier than we actually should.
To fix this, while ensuring we don’t retry to failed peers, the
NeedMoreChans method will now return *how* anymore channels to open,
and the Select method will take in how many channels it should try to
open *exactly*.
In this commit we modify the ConstrainedPrefAttachment.Select method to
first shuffle the set of potential candidates before selecting them.
This serves to remove the existing grouping between candidates which
may have influenced the selection.
This commit modifies the Select method for the
ConstrainedPrefAttachment attachment heuristic slightly. Previously, it
was possible for an autopilot.Agent to go over the allotted number of
channels as it would unconditionally attempt to establish channel with
all returned Attachment Directives. To remedy this, we now assume that
we already have active, or pending channels to each of the nodes in the
set of skipNodes. Therefore, we now use the size of the skipNodes map
as an upper limit within the primary selection loop.
In this commit, we ensure that we grab the mutex for the pending open
channel state when we attempt to merge the pending state with the
committed state.
This commit adds tracking of the pending channels state within the
autopilot.Agent. This fixes a class of bugs which was discovered during
the latest test net block storm wherein the Agent would attempt to
repeatedly attach to the same node due to rapid closure of other
channels.
In this commit we fix this issue by ensuring that we always factor in
the pending channel state when querying the heuristic w.r.t if we need
more channels, and if so to which nodes should be attached to.
This commit fixes a prior occasional test flake caused by the collision
of the randomly selected 64-bit integers. In order to get around this,
we now instead have a atomic monotonically increasing counter for each
channel ID used within the tests.