In this commit, we fix a long standing bug where at times a co-op
channel closure wouldn't be properly marked as fully closed in the
database. The culprit was a re-occurring code flaw we've seen many times
in the codebase: a closure variable that closes over a loop iterator
variable. Before this instance, I assumed that this could only pop up
when goroutines bind to the loop iterator within a closure. However,
this instance is the exact same issue, but within a regular closure that
has _delayed_ execution. As the closure doesn't execute until long after
the loop has finished executing, it may still be holding onto the _last_
item the loop iterator variable was assigned to.
The fix for this issue is very simple: re-assign the channel point
before creating the closure. Without this fix, we would go to call
db.MarkChanFullyClosed on a channel that may not have yet actually be in
the pending close state, causing all executions to fail.
Fixes#1054.
Fixes#1056.
Fixes#1075.
In this commit, we extend the closeChannelAndAssert testing utility
function to ensure that the channel is no longer marked as "pending
close" in the database. With this change, we hop to catch a recently
reported issue wherein users report that a co-op close channel has been
fully confirmed, yet it still pops up in the `pendingchannels` command.
This commit fixes an issue in funding manager startup,
where a goroutine reads from a range value. The method in
question could cause a channel to be announced at the
wrong time.
This may have been a cause for certain channels having
phantom HTLCs before they had even received the funding
locked message from the remote peer.
This is fixed simply by using the locally scoped
variable passed in as an argument to the goroutine.
In this commit, we modify the way we handle FeeInsufficientErrors to
more aggressively route around nodes that repeatedly return the same
error to us. This will ensure we skip older nodes on the network which
are running a buggier older version of lnd. Eventually most nodes will
upgrade to this new version, making this change less needed.
We also update the existing test to properly use a multi-hop route to
ensure that we route around the offending node.
In this commit, we update the testUpdateChannelPolicy to exercise the
recent set of changes within the switch. If one applies this test to a
fresh branch (without those new changes) it should fail. This is due to
the fact that before, Bob would attempt to apply the constraints of the
incoming link (which we updated) instead of the outgoing link. With the
recent set of changes, the test now properly passes.
In this commit, we fix the TestUpdateForwardingPolicy test case after
the recent modification in the way we handling validating constraints
within the link. After the recent set of changes, Bob will properly use
his outgoing link to validate the set of fee related constraints rather
than the incoming link. As a result, we need to modify the second
channel link, not the first for the test to still be applicable.
In this commit, we simplify the switch's code a bit. Rather than having
a set of channels we use to mutate or query for the set of current
links, we'll instead now just use a mutex to guard a set of link
indexes. This serves to simplify the ode, and also make it such that we
don't need to block forwarding in order to add/remove a link.
In this commit, we fix a very old, lingering bug within the link. When
accepting an HTLC we are meant to validate the fee against the
constraints of the *outgoing* link. This is due to the fact that we're
offering a payment transit service on our outgoing link. Before this
commit, we would use the policies of the *incoming* link. This would at
times lead to odd routing errors as we would go to route, get an error
update and then route again, repeating the process.
With this commit, we'll properly use the incoming link for timelock
related constraints, and the outgoing link for fee related constraints.
We do this by introducing a new HtlcSatisfiesPolicy method in the link.
This method should return a non-nil error if the link can carry the HTLC
as it satisfies its current forwarding policy. We'll use this method now
at *forwarding* time to ensure that we only forward to links that
actually accept the policy. This fixes a number of bugs that existed
before that could result in a link accepting an HTLC that actually
violated its policy. In the case that the policy is violated for *all*
links, we take care to return the error returned by the *target* link so
the caller can update their sending accordingly.
In this commit, we also remove the prior linkControl channel in the
channelLink. Instead, of sending a message to update the internal link
policy, we'll use a mutex in place. This simplifies the code, and also
adds some necessary refactoring in anticipation of the next follow up
commit.
In this commit, we fix an existing deadlock in the
processChanPolicyUpdate method. Before this commit, within
processChanPolicyUpdate, we would directly call updateChannel *within*
the ForEachChannel closure. This would at times result in a deadlock, as
updateChannel will itself attempt to create a write transaction in order
to persist the newly updated channel.
We fix this deadlock by simply performing another loop once we know the
set of channels that we wish to update. This second loop will actually
update the channels on disk.
In this commit, we might a very small change to the way writing messages
works in the peer, which should have large implications w.r.t reducing
memory usage amongst chatty nodes.
When profiling the heap on one of my nodes earlier, I noticed this
fragment:
```
Showing top 20 nodes out of 68
flat flat% sum% cum cum%
0 0% 0% 75.53MB 54.61% main.(*peer).writeHandler
75.53MB 54.61% 54.61% 75.53MB 54.61% main.(*peer).writeMessage
```
Which points to an inefficiency with the way we handle allocations when
writing new messages, drilling down further we see:
```
(pprof) list writeMessage
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeMessage in /root/go/src/github.com/lightningnetwork/lnd/peer.go
75.53MB 75.53MB (flat, cum) 54.61% of Total
. . 1104: p.logWireMessage(msg, false)
. . 1105:
. . 1106: // As the Lightning wire protocol is fully message oriented, we only
. . 1107: // allows one wire message per outer encapsulated crypto message. So
. . 1108: // we'll create a temporary buffer to write the message directly to.
75.53MB 75.53MB 1109: var msgPayload [lnwire.MaxMessagePayload]byte
. . 1110: b := bytes.NewBuffer(msgPayload[0:0:len(msgPayload)])
. . 1111:
. . 1112: // With the temp buffer created and sliced properly (length zero, full
. . 1113: // capacity), we'll now encode the message directly into this buffer.
. . 1114: n, err := lnwire.WriteMessage(b, msg, 0)
(pprof) list writeHandler
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeHandler in /root/go/src/github.com/lightningnetwork/lnd/peer.go
0 75.53MB (flat, cum) 54.61% of Total
. . 1148:
. . 1149: // Write out the message to the socket, closing the
. . 1150: // 'sentChan' if it's non-nil, The 'sentChan' allows
. . 1151: // callers to optionally synchronize sends with the
. . 1152: // writeHandler.
. 75.53MB 1153: err := p.writeMessage(outMsg.msg)
. . 1154: if outMsg.errChan != nil {
. . 1155: outMsg.errChan <- err
. . 1156: }
. . 1157:
. . 1158: if err != nil {
```
Ah hah! We create a _new_ buffer each time we want to write a message
out. This is unnecessary and _very_ wasteful (as seen by the profile).
The fix is simple: re-use a buffer unique to each peer when writing out
messages. Since we know what the max message size is, we just allocate
one of these 65KB buffers for each peer, and keep it around until the
peer is removed.
This commit fixes a bug within the funding manager, where we would use
the wrong min_htlc_value parameter. Instead of attributing the custom
passed value for MinHtlc to the remote's constraints, we would add it to
our own constraints.
This commit adds TestFundingManagerCustomChannelParameters, which checks
that custom channel parameters specified at channel creation is
preserved and recorded correctly on both sides of the channel.
This commit fixes a bug that would cause the local and remote commitment
to be incompatible when using custom remote CSV delay when opening a
channel. This would happen because we wouldn't store the CSV value
before we received the FundingAccept message, and here we would use the
default value.
This commit fixes this by making the csv value part of the
reservationWithCtx struct, such that it can be recorded for use when the
FundingAccept msg comes back.
This commit renames ForceCloseSummary to LocalForceCloseSummary, and
adds a new method NewLocalForceCloseSummary that can be used to derive a
LocalForceCloseSummary if our commitment transaction gets confirmed
in-chain. It is meant to accompany the NewUnilateralCloseSummary method,
which is used for the same purpose in the event of a remote commitment
being seen in-chain.
It is better to replace bash shell with potentially long-running
last script command. This way the running command will receive all
potential unix process signals directly.
A concrete example which motivated this change:
Exec of btcd is needed for graceful shutdown of btcd during
`docker-compose down`. Docker Compose properly sends this signal to our
start-btcd.sh bash shell but it is not further signalled to the running
btcd process. Docker Compose then kills whole container forcefully after
some timeout.
An alternative solution would be to trap SIGTERM in our bash script and
forward it to running btcd. Which would be IMO ugly and error prone.