This commit fixes a bug where all the HTLC rhash slices in a
ListChannelsResponse would be tied to the loop variable, making them all
take the hash of the last HTLC in the list. We fix this by making a copy
of the slice for each HTLC.
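In miniature, the bug and fix look like this (a self-contained sketch, not
the actual rpcserver code; Go 1.22 later gave range loops per-iteration
variables, so the bug only reproduces on the Go versions in use at the
time):
```
package main

import "fmt"

func main() {
    hashes := [][32]byte{{1}, {2}, {3}}

    // BUG: h[:] slices the loop variable itself, so every appended
    // slice aliases the same backing array and ends up showing the
    // last hash once the loop finishes.
    var buggy [][]byte
    for _, h := range hashes {
        buggy = append(buggy, h[:])
    }
    fmt.Println(buggy[0][0], buggy[1][0], buggy[2][0]) // 3 3 3

    // FIX: copy each hash into its own freshly allocated slice.
    var fixed [][]byte
    for _, h := range hashes {
        rHash := make([]byte, len(h))
        copy(rHash, h[:])
        fixed = append(fixed, rHash)
    }
    fmt.Println(fixed[0][0], fixed[1][0], fixed[2][0]) // 1 2 3
}
```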
In this commit, we fix an existing flake in the integration tests. If
Alice didn't yet know of all the channels, the payment attempt below
could fail, depending on other timing factors in the test. We fix this
flake by waiting for Alice to learn of all channels before proceeding to
the actual testing logic.
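The pattern is a simple predicate poll. A hedged sketch of the shape
(lnd's integration harness has its own predicate-style helpers):
```
// waitFor polls pred until it returns true or the timeout expires.
func waitFor(pred func() bool, timeout time.Duration) error {
    deadline := time.After(timeout)
    for {
        select {
        case <-deadline:
            return errors.New("predicate not satisfied before timeout")
        case <-time.After(100 * time.Millisecond):
            if pred() {
                return nil
            }
        }
    }
}
```
The test then calls waitFor with a predicate comparing Alice's
describegraph channel count against the expected total before firing off
the payment.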
This commit adds a missing return to the resolveContract method,
ensuring the goroutine exits if the ChannelArbitrator shuts down.
This fixes a potential deadlock during the integration tests.
We also promote some of the logs from Trace to Debug.
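A minimal sketch of the fixed shutdown path, assuming a quit channel and
WaitGroup as in typical lnd subsystems (names are illustrative, not the
actual method body):
```
type arbitrator struct {
    wg             sync.WaitGroup
    quit           chan struct{}
    resolutionMsgs chan func()
}

func (c *arbitrator) resolveContract() {
    defer c.wg.Done()

    for {
        select {
        case <-c.quit:
            // The missing return: merely observing the quit signal
            // isn't enough, the goroutine must also exit, otherwise
            // shutdown blocks forever on the WaitGroup.
            return

        case resolve := <-c.resolutionMsgs:
            resolve()
        }
    }
}
```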
In this commit, we fix a long-standing bug where at times a co-op
channel closure wouldn't be properly marked as fully closed in the
database. The culprit was a recurring code flaw we've seen many times
in the codebase: a closure that closes over a loop iterator variable.
Before this instance, I assumed that this could only pop up when
goroutines bind to the loop iterator within a closure. However, this
instance is the exact same issue, just within a regular closure that has
_delayed_ execution. As the closure doesn't execute until long after the
loop has finished, it may still be holding onto the _last_ item the loop
iterator variable was assigned.
The fix for this issue is very simple: re-assign the channel point
before creating the closure. Without this fix, we would go on to call
db.MarkChanFullyClosed on a channel that may not yet actually be in
the pending close state, causing all executions to fail.
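Distilled, the flaw and the one-line fix look like this (identifiers are
illustrative and signatures simplified; pre-Go 1.22 loop semantics
apply):
```
// The closures run long after the loop ends, so without the
// re-assignment each one would see whatever chanPoint was assigned last.
func registerCloses(db *channeldb.DB,
    pendingCloses []wire.OutPoint) []func() error {

    var deferredCloses []func() error
    for _, chanPoint := range pendingCloses {
        // FIX: shadow the loop variable with a per-iteration copy
        // before it's captured by the closure below.
        chanPoint := chanPoint

        deferredCloses = append(deferredCloses, func() error {
            return db.MarkChanFullyClosed(&chanPoint)
        })
    }
    return deferredCloses
}
```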
Fixes #1054.
Fixes #1056.
Fixes #1075.
In this commit, we extend the closeChannelAndAssert testing utility
function to ensure that the channel is no longer marked as "pending
close" in the database. With this change, we hope to catch a recently
reported issue wherein users report that a co-op closed channel has been
fully confirmed, yet it still pops up in the `pendingchannels` command.
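A hedged sketch of the new assertion (the real helper lives in lnd's
itest harness, and the exact RPC field names have shifted across
versions): after the closing transaction confirms, the channel must no
longer appear in the node's pending-close set.
```
pendingResp, err := node.PendingChannels(
    ctx, &lnrpc.PendingChannelsRequest{},
)
if err != nil {
    t.Fatalf("unable to query pending channels: %v", err)
}
for _, c := range pendingResp.PendingClosingChannels {
    if c.Channel.ChannelPoint == chanPointStr {
        t.Fatalf("channel %v still marked as pending close",
            chanPointStr)
    }
}
```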
This commit fixes an issue in funding manager startup, where a goroutine
spawned in a loop read directly from the range variable. The method in
question could cause a channel to be announced at the wrong time.
This may have been a cause for certain channels having phantom HTLCs
before they had even received the funding locked message from the remote
peer.
This is fixed simply by using the locally scoped variable passed in as
an argument to the goroutine.
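The shape of the fix, with illustrative names (resumeChannel stands in
for the real funding manager method):
```
for _, channel := range pendingChannels {
    // BUG (before): go func() { f.resumeChannel(channel) }()
    // Every goroutine would read whatever the shared loop variable
    // held at the moment it actually ran.
    go func(ch *channeldb.OpenChannel) {
        // Each goroutine now receives its own copy as an argument.
        f.resumeChannel(ch)
    }(channel)
}
```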
In this commit, we modify the way we handle FeeInsufficientErrors to
more aggressively route around nodes that repeatedly return the same
error to us. This ensures we skip older nodes on the network which are
running a buggier, older version of lnd. Eventually most nodes will
upgrade to this new version, making this change less necessary.
We also update the existing test to properly use a multi-hop route to
ensure that we route around the offending node.
In this commit, we update the testUpdateChannelPolicy test to exercise
the recent set of changes within the switch. If one applies this test to
a fresh branch (without those new changes), it should fail. This is
because, previously, Bob would attempt to apply the constraints of the
incoming link (which we updated) instead of the outgoing link. With the
recent set of changes, the test now properly passes.
In this commit, we fix the TestUpdateForwardingPolicy test case after
the recent modification to the way we handle validating constraints
within the link. After the recent set of changes, Bob will properly use
his outgoing link to validate the set of fee related constraints rather
than the incoming link. As a result, we need to modify the second
channel link, not the first, for the test to still be applicable.
In this commit, we simplify the switch's code a bit. Rather than having
a set of channels we use to mutate or query the set of current links,
we'll now just use a mutex to guard a set of link indexes. This serves
to simplify the code, and also makes it such that we don't need to block
forwarding in order to add/remove a link.
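Sketched with simplified fields, the new shape is a plain map guarded by
a read/write mutex:
```
type Switch struct {
    indexMtx  sync.RWMutex
    linkIndex map[lnwire.ChannelID]ChannelLink
}

func (s *Switch) GetLink(chanID lnwire.ChannelID) (ChannelLink, error) {
    s.indexMtx.RLock()
    defer s.indexMtx.RUnlock()

    link, ok := s.linkIndex[chanID]
    if !ok {
        return nil, ErrChannelLinkNotFound
    }
    return link, nil
}

func (s *Switch) AddLink(link ChannelLink) error {
    s.indexMtx.Lock()
    defer s.indexMtx.Unlock()

    s.linkIndex[link.ChanID()] = link
    return nil
}
```
Since lookups only take a read lock, concurrent forwarding no longer
serializes behind link management.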
In this commit, we fix a very old, lingering bug within the link. When
accepting an HTLC, we are meant to validate the fee against the
constraints of the *outgoing* link. This is because we're offering a
payment transit service on our outgoing link. Before this commit, we
would use the policies of the *incoming* link instead. This would at
times lead to odd routing errors, as we would go to route, get an error
update, and then route again, repeating the process.
With this commit, we'll properly use the incoming link for
timelock-related constraints, and the outgoing link for fee-related
constraints. We do this by introducing a new HtlcSatisfiesPolicy method
in the link. This method returns a non-nil error if the link *cannot*
carry the HTLC because it violates the current forwarding policy. We'll
now use this method at *forwarding* time to ensure that we only forward
to links that actually accept the policy. This fixes a number of bugs
that existed before, which could result in a link accepting an HTLC that
actually violated its policy. In the case that the policy is violated
for *all* links, we take care to return the error returned by the
*target* link so the caller can adjust their sending accordingly.
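A minimal sketch of the fee check, assuming a simplified policy struct
(the real HtlcSatisfiesPolicy takes more parameters and returns an
lnwire.FailureMessage rather than a plain error):
```
type ForwardingPolicy struct {
    MinHTLC lnwire.MilliSatoshi
    BaseFee lnwire.MilliSatoshi
    FeeRate lnwire.MilliSatoshi // proportional fee, in millionths
}

func htlcSatisfiesPolicy(p ForwardingPolicy,
    incomingAmt, outgoingAmt lnwire.MilliSatoshi) error {

    if outgoingAmt < p.MinHTLC {
        return fmt.Errorf("htlc below min: %v < %v",
            outgoingAmt, p.MinHTLC)
    }

    // The fee offered is the difference between what arrives on the
    // incoming link and what we forward on the outgoing link; it must
    // cover the *outgoing* link's advertised fee schedule.
    expectedFee := p.BaseFee + (p.FeeRate*outgoingAmt)/1000000
    if incomingAmt-outgoingAmt < expectedFee {
        return fmt.Errorf("fee insufficient: have %v, want %v",
            incomingAmt-outgoingAmt, expectedFee)
    }

    return nil
}
```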
In this commit, we also remove the prior linkControl channel in the
channelLink. Instead of sending a message to update the internal link
policy, we'll use a mutex in its place. This simplifies the code, and
also performs some necessary refactoring in anticipation of the next
follow-up commit.
In this commit, we fix an existing deadlock in the
processChanPolicyUpdate method. Before this commit, within
processChanPolicyUpdate, we would directly call updateChannel *within*
the ForEachChannel closure. This would at times result in a deadlock:
ForEachChannel holds a database transaction open for the duration of the
iteration, while updateChannel will itself attempt to create a write
transaction in order to persist the newly updated channel.
We fix this deadlock by simply performing another loop once we know the
set of channels that we wish to update. This second loop will actually
update the channels on disk.
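The shape of the fix (callback signature simplified; needsUpdate and
updateChannel stand in for the real filtering and persistence logic):
```
func processChanPolicyUpdate(graph *channeldb.ChannelGraph) error {
    // First pass: only *collect* while the iteration's transaction is
    // open, never write.
    var toUpdate []*channeldb.ChannelEdgeInfo
    err := graph.ForEachChannel(func(info *channeldb.ChannelEdgeInfo) error {
        if !needsUpdate(info) {
            return nil
        }
        toUpdate = append(toUpdate, info)
        return nil
    })
    if err != nil {
        return err
    }

    // The iteration's transaction has now closed, so it's safe for
    // updateChannel to open its own write transactions.
    for _, info := range toUpdate {
        if err := updateChannel(info); err != nil {
            return err
        }
    }
    return nil
}
```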
In this commit, we make a very small change to the way writing messages
works in the peer, which should have large implications w.r.t. reducing
memory usage amongst chatty nodes.
When profiling the heap on one of my nodes earlier, I noticed this
fragment:
```
Showing top 20 nodes out of 68
flat flat% sum% cum cum%
0 0% 0% 75.53MB 54.61% main.(*peer).writeHandler
75.53MB 54.61% 54.61% 75.53MB 54.61% main.(*peer).writeMessage
```
This points to an inefficiency with the way we handle allocations when
writing new messages. Drilling down further, we see:
```
(pprof) list writeMessage
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeMessage in /root/go/src/github.com/lightningnetwork/lnd/peer.go
75.53MB 75.53MB (flat, cum) 54.61% of Total
. . 1104: p.logWireMessage(msg, false)
. . 1105:
. . 1106: // As the Lightning wire protocol is fully message oriented, we only
. . 1107: // allows one wire message per outer encapsulated crypto message. So
. . 1108: // we'll create a temporary buffer to write the message directly to.
75.53MB 75.53MB 1109: var msgPayload [lnwire.MaxMessagePayload]byte
. . 1110: b := bytes.NewBuffer(msgPayload[0:0:len(msgPayload)])
. . 1111:
. . 1112: // With the temp buffer created and sliced properly (length zero, full
. . 1113: // capacity), we'll now encode the message directly into this buffer.
. . 1114: n, err := lnwire.WriteMessage(b, msg, 0)
(pprof) list writeHandler
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeHandler in /root/go/src/github.com/lightningnetwork/lnd/peer.go
0 75.53MB (flat, cum) 54.61% of Total
. . 1148:
. . 1149: // Write out the message to the socket, closing the
. . 1150: // 'sentChan' if it's non-nil, The 'sentChan' allows
. . 1151: // callers to optionally synchronize sends with the
. . 1152: // writeHandler.
. 75.53MB 1153: err := p.writeMessage(outMsg.msg)
. . 1154: if outMsg.errChan != nil {
. . 1155: outMsg.errChan <- err
. . 1156: }
. . 1157:
. . 1158: if err != nil {
```
Ah hah! We create a _new_ buffer each time we want to write a message
out. This is unnecessary and _very_ wasteful (as seen by the profile).
The fix is simple: re-use a buffer unique to each peer when writing out
messages. Since we know what the max message size is, we just allocate
one of these 65KB buffers for each peer, and keep it around until the
peer is removed.
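Sketched with illustrative field names (the actual peer struct differs),
the change looks like:
```
type peer struct {
    // writeBuf is re-used for every outgoing message, so writeMessage
    // no longer allocates a fresh ~65KB array per call.
    writeBuf [lnwire.MaxMessagePayload]byte
}

func (p *peer) writeMessage(msg lnwire.Message) error {
    // Slice the persistent buffer to zero length, full capacity, and
    // encode the message into it as before.
    b := bytes.NewBuffer(p.writeBuf[0:0:len(p.writeBuf)])
    if _, err := lnwire.WriteMessage(b, msg, 0); err != nil {
        return err
    }

    // ... encrypt b.Bytes() and write it out to the connection ...
    return nil
}
```
Because writeMessage only ever runs from the peer's single writeHandler
goroutine, one buffer per peer is race-free without any extra locking.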