Commit Graph

251 Commits

Author SHA1 Message Date
Olaoluwa Osuntokun
d72f28839d
Merge pull request #1104 from halseth/chainwatcher-handoff-race
Fix chainwatcher handoff race
2018-05-03 17:18:31 -07:00
Olaoluwa Osuntokun
ecfde2e85f
Merge pull request #1149 from cfromknecht/trim-pending-htlc-index
Trim Open Circuits Using HTLC Index of Pending Commitments
2018-05-03 16:45:29 -07:00
Olaoluwa Osuntokun
5f059e74cb
peer: ensure msgConsumer sets the shutdown variable on exit
In this commit, we fix a bug that could at times cause a deadlock when a
peer is attempting to disconnect. The issue was that when a peer goes to
disconnect, it needs to stop any active msgStream instances. The Stop()
method of the msgStream would block until an atomic variable was set to
indicate that the stream had fully exited. However, in the case that we
disconnected lower in the msgConsumer loop, we would never set the
streamShutdown variable, meaning that msgStream.Stop() would never
unblock.

The fix for this is simple: set the streamShutdown variable within the
quit case of the second select statement in the msgConsumer goroutine.
2018-05-03 15:45:22 -07:00
Conner Fromknecht
701d37725c
peer: extract hodl mask, remove htlchodl mode 2018-05-02 00:18:51 -07:00
Johan T. Halseth
08f1a3689d
peer: don't pass bool to SubscribeChannelEvents 2018-05-02 08:43:32 +02:00
Johan T. Halseth
bd4e717971
peer: don't load channels that have had commitment broadcasted 2018-04-25 09:37:24 +02:00
practicalswift
663c396235 multi: fix a-vs-an typos 2018-04-17 19:02:04 -07:00
Olaoluwa Osuntokun
ffabb17ce6
peer: use new fetchLastChanUpdate method to populate the ChannelLinkConfig 2018-04-06 14:52:01 -07:00
Olaoluwa Osuntokun
7cbe78eeee
peer: re-use a static writeBuf within writeMessage optimize memory usage
In this commit, we might a very small change to the way writing messages
works in the peer, which should have large implications w.r.t reducing
memory usage amongst chatty nodes.

When profiling the heap on one of my nodes earlier, I noticed this
fragment:
```
Showing top 20 nodes out of 68
      flat  flat%   sum%        cum   cum%
         0     0%     0%    75.53MB 54.61%  main.(*peer).writeHandler
   75.53MB 54.61% 54.61%    75.53MB 54.61%  main.(*peer).writeMessage
```

Which points to an inefficiency with the way we handle allocations when
writing new messages, drilling down further we see:
```
(pprof) list writeMessage
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeMessage in /root/go/src/github.com/lightningnetwork/lnd/peer.go
   75.53MB    75.53MB (flat, cum) 54.61% of Total
         .          .   1104:   p.logWireMessage(msg, false)
         .          .   1105:
         .          .   1106:   // As the Lightning wire protocol is fully message oriented, we only
         .          .   1107:   // allows one wire message per outer encapsulated crypto message. So
         .          .   1108:   // we'll create a temporary buffer to write the message directly to.
   75.53MB    75.53MB   1109:   var msgPayload [lnwire.MaxMessagePayload]byte
         .          .   1110:   b := bytes.NewBuffer(msgPayload[0:0:len(msgPayload)])
         .          .   1111:
         .          .   1112:   // With the temp buffer created and sliced properly (length zero, full
         .          .   1113:   // capacity), we'll now encode the message directly into this buffer.
         .          .   1114:   n, err := lnwire.WriteMessage(b, msg, 0)
(pprof) list writeHandler
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeHandler in /root/go/src/github.com/lightningnetwork/lnd/peer.go
         0    75.53MB (flat, cum) 54.61% of Total
         .          .   1148:
         .          .   1149:                   // Write out the message to the socket, closing the
         .          .   1150:                   // 'sentChan' if it's non-nil, The 'sentChan' allows
         .          .   1151:                   // callers to optionally synchronize sends with the
         .          .   1152:                   // writeHandler.
         .    75.53MB   1153:                   err := p.writeMessage(outMsg.msg)
         .          .   1154:                   if outMsg.errChan != nil {
         .          .   1155:                           outMsg.errChan <- err
         .          .   1156:                   }
         .          .   1157:
         .          .   1158:                   if err != nil {
```

Ah hah! We create a _new_ buffer each time we want to write a message
out. This is unnecessary and _very_ wasteful (as seen by the profile).
The fix is simple: re-use a buffer unique to each peer when writing out
messages. Since we know what the max message size is, we just allocate
one of these 65KB buffers for each peer, and keep it around until the
peer is removed.
2018-04-06 12:55:17 -07:00
Olaoluwa Osuntokun
ca9174e166
peer: extend SendMessage to allow callers to block until msg is sent 2018-04-04 17:43:57 -07:00
Olaoluwa Osuntokun
447a031435
peer: reject remote closes with active HTLCs
In this commit, we follow up to the prior commit by ensuring we won't
accept a co-op close request for a chennel with active HTLCs. When
creating a chanCloser for the first time, we'll check the set of HTLC's
and reject a request (by sending a wire error) if the target channel
still as active HTLC's.
2018-03-30 13:06:57 -07:00
Olaoluwa Osuntokun
e7d66e1dfd
peer: don't d/c peer if we encounter lnwire.ErrUnknownAddrType
In this commit, we fix a minor deviation in our implementation from the
specification. Before if we encountered an unknown error type, we would
disconnect the peer. Instead, we’ll now just continue along parsing the
remainder of the messages. This was flared up recently by some
c-lightning related incompatibilities that emerged on main net.
2018-03-23 15:49:33 -07:00
Olaoluwa Osuntokun
fda3b871c1
peer: ensure we stop the channel if error happens in loadActiveChannels
In this commit, we fix a goroutine leak that could occur if while we
were loading an error occurred in any of the steps after we created the
channel object, but before it was actually loaded in to the script. If
an error occurs at any step, we ensure that we’ll stop toe channel.
Otherwise, the sigPool goroutines would still be lingering and never be
stopped.
2018-03-19 19:14:55 -07:00
Conner Fromknecht
c1f0c4ffda
peer: DecodeOnionObfuscator -> ExractErrorEncrypter 2018-03-13 16:33:29 -07:00
Johan T. Halseth
a6c2550404
peer: track failed channels
This commit adds a set used to track channels we consider failed. This
is done to ensure we don't end up in a connect/disconnect loop when we
attempt to re-sync the channel state of a failed channel with a peer.
2018-03-13 11:11:17 +01:00
Olaoluwa Osuntokun
53045450ad
Merge pull request #823 from Roasbeef/chan-stream-buf-limit
peer: modify the msgStream to not buffer messages off the wire indefi…
2018-03-12 19:39:48 -07:00
Olaoluwa Osuntokun
c285bb5814 htlcswitch+peer: remove DecodeHopIterator from ChannelLinkConfig
In this commit, we remove the DecodeHopIterator method from the
ChannelLinkConfig struct. We do this as we no longer use this method,
since we only ever use the DecodeHopIterators method now.
2018-03-12 18:58:08 -07:00
Olaoluwa Osuntokun
60c8257c3c
peer: modify the msgStream to not buffer messages off the wire indefinitely
In this commit, we modify the msgStream struct to ensure that it has a
cap at which it’ll continue to buffer messages. Currently we have two
msgStream structs per peer: the first for the discovery messages, and
the second for any messages that modify channel state. Due to
inefficiencies in the current protocol for reconciling graph state upon
connection (just dump the entire damn thing), when a node first starts
up, this can lead to very high memory usage as all peers will
concurrently send their initial message dump which can be in the
thousands of messages on testate.

Our fix is simple: make the message stream into a _bounded_ message
stream. The newMsgStream function now has a new argument: bufSize.
Internally, we’ll take this bufSize and create more or less an internal
semaphore for the producer. Each time the producer gets a new message,
it’ll try and read an item from the channel. If the queue still has
size, then this will succeed immediately. If not, then we’ll block
until the consumer actually finishes processing a message and then
signals by sending a new item into the channel.

We choose an initial value of 1000. This was chosen as there’s already
a max limit of outstanding adds on the commitment, and a value of 1000
should allow any incoming messages to be safely flushed and processed
by the gossiper.
2018-03-12 16:34:59 -07:00
Johan T. Halseth
80ef16e853
peer: hand lnwire.Error to fndgMngr or link depending on chanID 2018-03-11 17:21:23 +01:00
Conner Fromknecht
972d238f04
peer: remove Switch from channel link config 2018-03-09 21:18:15 -08:00
Conner Fromknecht
d468a36d69
peer: pass unsafe-replay to link config 2018-03-09 21:18:15 -08:00
Conner Fromknecht
8d0e3dc467
peer: adds FwdPkgGCTicker to channel configs 2018-03-09 21:18:14 -08:00
Conner Fromknecht
0e4be6a04a
peer: init link with batched sphinx processing 2018-03-09 21:18:14 -08:00
Johan T. Halseth
b9d1eceda3
peer: use EstimateFeePerVSize 2018-02-26 22:42:26 +01:00
MeshCollider
2c2ed3c6a9 multi: Unify use of NodeKey in log messages 2018-02-19 17:48:39 -08:00
MeshCollider
915c4201b9 multi: remove internal peer_id usage 2018-02-19 17:48:39 -08:00
practicalswift
b8e1351cf3 multi: fix some recently introduced typos 2018-02-18 15:27:29 -08:00
Olaoluwa Osuntokun
2e090ee2ab
peer: only return a channel snapshot if the channel can route
In this commit, we fix a slight miscalculation within the GetInfo call.
Before this commit, we would list any channel that the peer knew of as
active, instead of those which are, well, actually *active*. We fix
this by skipping any channels that we don’t have the remote revocation
for.
2018-02-08 19:40:54 -08:00
Olaoluwa Osuntokun
22951cb364
lnd: account for new lnwire.Sig API and channeldb API changes 2018-02-06 20:14:33 -08:00
practicalswift
a93736d21e multi: comprehensive typo fixes across all packages 2018-02-06 19:11:11 -08:00
Johan T. Halseth
26a80f86b8
peer: set BatchTicker and BatchSize in channellink config 2018-02-02 21:16:37 -05:00
Olaoluwa Osuntokun
0a4de859a2
discovery+routing: reduce number of active validation barrier jobs
In order to reduce high CPU utilization during the initial network view
sync, we slash down the total number of active in-flight jobs that can
be launched.
2018-01-28 14:55:32 -08:00
Olaoluwa Osuntokun
d4e650c85d
peer: the chancloser no longer needs to notify the breach arb of settled transactions 2018-01-22 19:19:59 -08:00
Olaoluwa Osuntokun
5df6704a9c
contractcourt: make synchronous chain watcher notifications optional
In this commit, we modify the way that notifications are dispatched
within the chainWatcher. Before we would *always* wait for an ack back
before we started to clean up he database state. This would at times
lead to deadlocks. To remedy this, we now allow callers to decide if
they want notifications to be sync or not. The only current caller that
requires this is the breach arbiter.
2018-01-22 19:19:58 -08:00
Olaoluwa Osuntokun
3ec83cc82f
peer+contractcourt: delegate watching for co-op closes to the chainWatcher
In this commit, we modify the interaction between the chanCloser
sub-system and the chain notifier all together. This fixes a series of
bugs as before this commit, we wouldn’t be able to detect if the remote
party actually broadcasted *any* of the transactions that we signed off
upon. This would be rejected to the user by having a “zombie” channel
close that would never actually be resolved.

Rather than the chanCloser watching for on-chain closes, we’ll now open
up a co-op close context to the chainWatcher (via a layer of
indirection via the ChainArbitrator), and report to it all possible
closes that we’ve signed. The chainWatcher will then be able to launch
a goroutine to properly update the database state once any of the
possible closure transactions confirms.
2018-01-22 19:19:53 -08:00
Olaoluwa Osuntokun
69e6ec9954
peer+funding: remove unneeded channel handoff code with the breach arbiter
We no longer need to hand off new channels that come online as the
chainWatcher will be persistent, and always have an active signal for
the entire lifetime of the channel.
2018-01-22 19:19:50 -08:00
Olaoluwa Osuntokun
defa1bc3e3
peer: when creating new links, obtain an on-chain events subscription 2018-01-22 19:19:49 -08:00
Olaoluwa Osuntokun
24a16b4f49
lnd: properly initialize entities of new contractcourt package 2018-01-22 19:19:42 -08:00
Olaoluwa Osuntokun
b1fe0c12bf
peer: ensure that any active msgStreams properly exit upon peer D/C
In this commit, we modify the logic within the Stop() method for
msgStream to ensure that the main goroutine properly exits. It has been
observed on running nodes with tens of connections, that if a node is
very flappy, then the node can end up with hundreds of leaked
goroutines.

In order to fix this, we’ll continually signal the msgConsumer to wake
up after the quit channel has been closed. We do this until the
msgConsumer sets a bool indicating that it has exited atomically.
2018-01-08 19:50:22 -08:00
Conner Fromknecht
44805be8d9
peer: filter borked channels when loading active chans 2018-01-05 13:47:18 -08:00
Johan T. Halseth
25b77a0aee peer: add error chan to queueMsg 2017-12-19 13:01:59 -06:00
Matt Drollette
adf0d98194 multi: fix several typos in godoc comments 2017-12-17 18:40:05 -08:00
Olaoluwa Osuntokun
ecf58d64f7
peer: properly route UpdateFailMalformedHTLC messages to the switch
This commit adds an overlooked case into the main type switch statement
within the peer’s readHandler. Before this commit, we would fail to
process any UpdateFailMalformedHTLC messages, possibly leading to a
commitment desynchronization. To avoid this case, we’ll no properly
process the UpdateFailMalformedHTLC message by sending the message to
an active link registered to the switch.
2017-12-12 11:22:50 -08:00
Olaoluwa Osuntokun
ce6dee6ee4
peer: check LocalUnrevokedCommitPoint for nil-ness as it's optional
In this commit, we modify the logWireMessage function to ensure that we
don't attempt to nil out the LocalUnrevokedCommitPoint.Curve field
unless it's actually set. We need to do this as the field as actually
optional, and we may be reading a message from a node that doesn't
support the option.

Fixes #461.
2017-12-07 13:11:23 -08:00
Olaoluwa Osuntokun
4444ec39ea
funding: properly display our 1-byte error code messages 2017-12-06 18:43:58 -08:00
Olaoluwa Osuntokun
7960b5240f
peer: when processing a msg, skip he funding barrier if it's a ChanSync message
This commit is a follow up to the prior commit: as it’s possible for
the channel_reestablish message to be sent *before* the channel has
been fully confirmed, we’ll now ensure that we process it to the link
even if the channel isn’t yet open.
2017-12-06 16:43:00 -08:00
Olaoluwa Osuntokun
084d477ec3
peer: always load active channels upon connection reestablishment with peer
In this commit, we modify the logic within loadActiveChannels to
*always* load a channel, even if it isn’t yet fully confirmed. With
this change, we ensure that we’ll always send a channel_reestablish
message upon reconnection.

Fixes #458.
2017-12-06 16:43:00 -08:00
Olaoluwa Osuntokun
7b10f54216
peer: in logWireMessage also unset curve field for lnwire.ChannelReestablish
This will allow users to run in trace mode again, without having an
excessive amount of spam in their logs.
2017-12-06 16:43:00 -08:00
Olaoluwa Osuntokun
3bc248e01c
peer: properly process retransmitted FundingLocked message we've never processed
In this commit, we modify the logic within the channelManager to be
able to process any retransmitted FundingLocked messages. Before this
commit, we would simply ignore any new channels sent to us, iff, we
already had an active channel with the same channel point. With the
recent change to the loadActiveChannels method in the peer, this is now
incorrect.

When a peer retransmits the FundingLocked message, it goes through to
the fundingManager. The fundingMgr will then (if we haven’t already
processed it), send the channel to the breach arbiter and also to the
peer’s channelManager. In order to handle this case properly, if we
already have the channel, we’ll check if our current channel *doesn’t*
already have the RemoteNextRevocation field set. If it doesn’t, then
this means that we haven’t yet processed the FundingLcoked message, so
we’ll process it for the first time.

This new logic will properly:
  * ensure that the breachArbiter still has the most up to date channel
  * allow us to update the state of the link has been added to the
switch at this point
      * this link will now be eligible for forwarding after this
sequence
2017-12-06 16:42:59 -08:00
Olaoluwa Osuntokun
ddc5a0fc85
peer: unconditionally add a channel to the switch if it's open
In this commit we revert a prior change which was added after
FundingLocked retransmission was implemented. This prior change didn’t
factor in the fact that the FundingLocked message will *only* be
re-sent after both sides receive the ChannelReestablishment message.
With the prior code, as we never added the channel to the link, we’d
never re-send the ChannelReestablishment, meaning the other side would
never send the FundingLocked message.

By unconditionally adding the channel to the switch, we ensure that
we’ll always properly retransmit the FundingLocked message.
2017-12-06 16:42:59 -08:00