This is a workaround to fix the windows build. Apparently there was a
change in go 1.16 in how the go.sum entries are calculated. Dependencies
that aren't directly depended on are stripped. Because we need this
indirect windows dependency for the integration tests, we add a
workaround that makes sure the entry is kept in go.sum.
If we have processed a terminal state while we're pathfinding
for another shard, the payment loop should not error out on
ErrPaymentTerminal. Instead, it would wait for our shards to
complete then cleanly exit.
Move our more generic terminal check forward so that we only
need to handle a single class of expected errors. This change
is mirrored in our mock, and our reproducing tests are updated
to assert that this move catches both classes of errors we get.
Add an additional stuck-payment case, where our payment gets
a terminal error while it has other htlcs in-flight, and a
shard fails with ErrTerminalPayment. This payment also falls in
our class of expected errors, but is not currently handled. The
mock is updated accordingly, using the same ordering as in our
real RegisterAttempt implementation.
This commit adds a test which demonstrates that payments can
get stuck if we receive a payment failure while we're pathfinding
for another shard, then try to dispatch a shard after we've
recorded a permanent failure. It also updates our mock to
only consider payments with no in-flight htlcs as in-flight,
to more closely represent our actual RegisterAttempt.
This commit adds a step to our payment lifecycle test to add
control over when we find a path for our payment, This is
required for testing race conditions around pathfinding
completing and payment failures being reported.
Update our single shard success case to use a route which
splits the payment amount in half. This change still tests
the case where reveal of the preimage counts as a success,
even if we don't have the full amount. This change is made
to cut down on potential races in this test case. While we
are waiting for collectResultAsync to report a success, the
payment lifecycle will continue trying to dispatch shards.
In the case where we send 1/4 of the payment amount, we
send 1 or 2 more shards, depending on how long collectAsync
takes. Reducing this test to send 1/2 of the payment amount
means that we will always only try one more shard before
waiting for our shard.
This commit updates our mock to more closely follow the behavior of the
switch for mocked calls to GetPaymentResult. As it stands, our tests
send a test-created error from the switch when we want to mock shutdown.
In reality, the switch will close its result channel, so we update this
test to follow that behavior. This matters for the commit that follows,
because we start checking the error our payments return. If we have an
error from the switch, our tests will fail with an error that we do
not encounter in practice.
In our payment lifecycle tests, we have two goroutines that
compete for the lock in our mock control tower: the resumePayment
loop which tries to call RegisterAttempt, and the collectResult
handler which is launched in a goroutine by collectResultAsync
and is responsible for various settle/fail calls.
The order that the lock is acquired by these goroutines is
arbitrary, and can lead to flakes in our tests if the step
that we do not intend to execute first gets the lock (eg,
we want to fail a payment in collectResult, but RegisterAttempt
gets there first). This commit moves contention for this lock
after our mock's various "state driving" channels, so that the
lock will be acquired in the order that the test intends it.
Now that we run each test individually, we don't need to buffer
our mock's channels anymore. This helps to tighten our test loop,
which currently can move on from a step before it's actually
been processed by the mock. This removal ensures that our payment
loop processes each of the test's steps before moving on to the
next once.
Update our payment lifecycle test to run each test case with
a fresh router. This prevents test cases from interacting with
each other. Names are also added for easy debugging.
As is, we don't check that our SendPayment call in
TestRouterPaymentStateMachine completes. This makes it easier
to create malformed tests that just run through steps but leave
the SendPayment call hanging. This commit adds a check that we
have completed our payment to help catch tests like this. We
also remove an unused quit channel.
Our aggregate htlc test depends on our previous behavior
where recipients would allow channels with pending hold
invoice htlcs to force close. Now that we have an expiry
watcher to prevent these force closes, we can't rely on
this for tests because the recipient will cancel the htlcs
back before they expire.
This commit adds a test for a hold invoice which is accepted
off-chain, and held by the recipient until it expired and
the payer force-closes the channel. With this test we
demonstrate two bugs in our handling of hold invoice state
in the invoice registry when we expire on chain:
- Htlcs not updated: even when we've timed out, we don't
update the htlc state accordingly.
- Invoice can be settled: the invoice can be settled even
though it's expired on chain.
Reproduce the case where we allow settling of invoices that have
htlcs that have actually timed out on chain. This bug can rarely
occur if a hodl invoice goes to chain and is manually settled
after it has timed out. Funds are SAFU, but this could be a
headache because the invoice says it's settled when no funds
were claimed.
This commit updates our multi-hop force close test to use a hodl
invoice so that we can reproduce some bugs which will require
the preimage for the invoice that is timed out on chain.
The previous behavior would allow updates to be overwritten in some
scenarios. Upon restart, this would lead to a missing settle/fail in
the update logs.