To free up build slots on Travis, we decided to run the non-flaky parts
of the CI pipeline in GitHub Workflows/Actions only. The integration
tests, on the other hand, are removed from GitHub because individual
actions cannot be restarted there, which forced us to restart the whole
workflow whenever a single test was flaky.
This split should give us the best of both worlds: fast runs of small
checks, linting and unit tests with an easy overview of what failed
directly in the PR, more free build slots on Travis for more advanced
integration tests on other architectures and/or operating systems, and
the option to restart a single flaky integration test on Travis.
The `restoreStateLogs` function now properly restores the
`addCommitHeightLocal` field of a settle or fail's parent add.
Previously, the parent add of any update in `unsignedAckedUpdates` would
have the field left at its default value of 0. This would cause a force
closure when receiving a commitment due to our belt-and-suspenders
checks for update logs during commitment validation.
The bug in question occurs because the `addCommitHeightLocal` field
is only populated for a restored add if the add is on the local
commitment. `TestChannelRestoreCommitHeight` is expanded in
`lnwallet/channel_test.go` to demonstrate restoration now works.
The faulty state transition:
```
<----fail----
<----sig-----
-----rev----> (add no longer on Alice's commitment)
*Alice restores* (addCommitHeightLocal of failed htlc is 0)
```
NOTE: Alice dies after sending a revocation but before signing a
commitment. This is possible because there is a select block in the link
that can potentially exit after sending over the revocation but before
signing the next commitment state for the counterparty.
This commit creates a new autopilot heuristic which simply returns
normalized betweenness centrality values for the current graph. This
new heuristic will make it possible to prefer nodes with large
centrality when we're trying to open channels. The heuristic is also
somewhat dumb, as it doesn't try to figure out the best nodes: that'd
require adding ghost edges to the graph and recalculating the
centrality once per node (minus the ones we already have channels
with).
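As a rough illustration of what "normalized betweenness centrality"
means, here is a minimal sketch using Brandes' algorithm on an
unweighted, undirected graph. The adjacency-list representation, the
function name `normalizedBetweenness` and the choice to normalize by
dividing by the largest score are all assumptions made for this example;
the actual heuristic may represent the graph and normalize differently.

```go
package main

// normalizedBetweenness returns betweenness centrality scores for an
// unweighted, undirected graph given as an adjacency list, scaled so
// that the largest score is 1. The graph representation and the
// max-scaling are assumptions made for this sketch.
func normalizedBetweenness(adj [][]int) []float64 {
	n := len(adj)
	scores := make([]float64, n)

	for s := 0; s < n; s++ {
		sigma := make([]float64, n) // number of shortest paths from s
		dist := make([]int, n)      // BFS distance from s, -1 if unseen
		preds := make([][]int, n)   // predecessors on shortest paths
		for i := range dist {
			dist[i] = -1
		}
		sigma[s], dist[s] = 1, 0

		// Breadth-first search from s, recording the visit order.
		queue := []int{s}
		var order []int
		for len(queue) > 0 {
			v := queue[0]
			queue = queue[1:]
			order = append(order, v)
			for _, w := range adj[v] {
				if dist[w] < 0 {
					dist[w] = dist[v] + 1
					queue = append(queue, w)
				}
				if dist[w] == dist[v]+1 {
					sigma[w] += sigma[v]
					preds[w] = append(preds[w], v)
				}
			}
		}

		// Accumulate pair dependencies in reverse BFS order.
		delta := make([]float64, n)
		for i := len(order) - 1; i >= 0; i-- {
			w := order[i]
			for _, v := range preds[w] {
				delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
			}
			if w != s {
				scores[w] += delta[w]
			}
		}
	}

	// Scale by the maximum so that all scores fall in [0, 1].
	maxScore := 0.0
	for _, c := range scores {
		if c > maxScore {
			maxScore = c
		}
	}
	if maxScore > 0 {
		for i := range scores {
			scores[i] /= maxScore
		}
	}
	return scores
}
```

For the path graph 0-1-2, only node 1 sits on a shortest path between
two other nodes, so it gets the top score and the endpoints get 0.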
This commit removes an extra filter on address availability which is not
needed as the scored nodes are an already prefiltered subset of the whole
graph where address availability has already been checked.
This reduces the flakiness of the CPFP test by asserting the wallet has
seen the unspent output before attempting to call the walletkit's
BumpFee method.
Previously the attempt to bump the fee of the target transaction could
be made before the wallet had had a chance to fully process the
transaction, causing a flaky error.
This switches a few call sites that used a different timeout when
opening channels to the correct openChannelTimeout, which better deals
with flakes in the CI.
This replaces an outstanding sleep with a check for a specific state
during the watchtower test: specifically, that the backup has been sent
to the watchtower prior to shutting down Dave.
This reduces flakiness in the test that could occur if Dave shut down
without the backup being committed to the watchtower, causing the rest
of the test to fail.
This changes the wait during node connection to check for both the
existence and the validity of the TLS cert and macaroon files.
This ensures that nodes in the process of starting up don't
inadvertently cause a connection error due to not yet having written
the entire file.
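To illustrate the difference between "the file exists" and "the file is
valid", here is a minimal sketch for the TLS cert only. It assumes a
PEM-encoded certificate; `certBytesValid` and `certFileReady` are
hypothetical helpers for this example, and the macaroon check is left
out because it depends on the macaroon library in use.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"os"
)

// certBytesValid reports whether data holds a complete, parseable
// PEM-encoded certificate. A partially written file fails this check.
func certBytesValid(data []byte) bool {
	block, _ := pem.Decode(data)
	if block == nil || block.Type != "CERTIFICATE" {
		return false
	}
	_, err := x509.ParseCertificate(block.Bytes)
	return err == nil
}

// certFileReady reports whether the file at path exists and already
// contains a valid certificate.
func certFileReady(path string) bool {
	data, err := os.ReadFile(path)
	if err != nil {
		return false
	}
	return certBytesValid(data)
}
```

A connection attempt would then be retried until `certFileReady` returns
true, instead of proceeding as soon as the file shows up on disk.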
During the channel_backup_restore/restore_during_unlock itest, the node
is restored from seed and immediately restarted. Depending on the
specific timing of the machine, the test harness might not have had the
graph subscription processed before the node shuts down, causing the
harness to trigger a panic.
Reducing this to a synchronous subscription attempt means node
initialization necessarily waits until the subscription is done before
attempting to restart, reducing flakiness and ensuring correct behavior.
This forces the Dial attempt to succeed or fail before proceeding with
node setup.
We also log a failure to establish the graph subscription on the node
before panicking, so that we can more easily find issues.
This improves the error reporting for the harness' CloseChannel so that
the exact step where closure fails can be better indicated.
This is to help debug some flaky failures in the CI.
In this commit, we add a String() method to the failure resolution
outcome. Without this, logs aren't very useful as the integer version of
the outcome is printed rather than the description.