Commit Graph

260 Commits

Author SHA1 Message Date
Olaoluwa Osuntokun
bfcdbe2204 peer: when failing link, depend on server wg, not peer
In this commit, we fix a recently introduced bug. The issue is that
while we're failing the link, the peer we're attempting to force close
on may disconnect. As a result, if the peerTerminationWatcher exits
before we can add to the wait group (it's waiting on that), then we'll
run into a panic as we're attempting to increment the wait group while
another goroutine is calling wait.

The fix is to first check that the server isn't shutting down, and then
use the server's wait group rather than the peer to synchronize
goroutines.

Fixes #1285.
2018-05-25 20:36:40 -07:00
Johan T. Halseth
3b2fd32523
peer: populate OnChannelFailure in link config
This commit makes the peer aware of the LinkFailureErrors that can
happen during link operation, and making it start a goroutine to
properly remove the link and force close the channel.
2018-05-25 06:58:23 +02:00
Johan T. Halseth
4836a25c98
peer: move link creation into new method 2018-05-25 06:54:05 +02:00
Johan T. Halseth
bb611e065d
peer: don't pass closeCtx to chanCloser 2018-05-22 12:06:33 +02:00
Olaoluwa Osuntokun
f0050e7ffb
Merge pull request #1201 from lightningnetwork/peer-async-disconnect
server: Peer Async Disconnect w/ Scheduled peerConnected
2018-05-10 17:05:16 -07:00
Olaoluwa Osuntokun
b271ed5ffb
Merge pull request #1199 from cfromknecht/queue-msg-err-chans
peer/server: Remove Broadcast err chans
2018-05-10 16:55:57 -07:00
Conner Fromknecht
a2f7170ff0
peer: make Disconnect async, block on WaitForDisconnect 2018-05-08 16:37:14 -07:00
Conner Fromknecht
0691b21a30
peer: improves disconnect handling
This commit attempts to resolve some potential deadlock
scenarios during a peer disconnect.

Currently, writeMessage returns a nil error when disconnecting.
This should have minimal impact on the writeHanlder, as the
subsequent loop selects on the quit chan, and will cause it to
exit. However, if this happens when sending the init message,
the Start() method will attempt to proceed even though the peer
has been disconnected.

In addition, this commit changes the behavior of synchronous
write errors, by using a non-blocking select. Though unlikely,
this prevents any cases where multiple errors are returned, and
the errors are not being pulled from the other side of the errChan.
This removes any naked sends on the errChan from stalling the peer's
shutdown.
2018-05-08 16:35:49 -07:00
Olaoluwa Osuntokun
72f48b6abe
htlcswitch+server: ensure we always send an update w/ a TempChannelFailure
In this commit, we ensure that any time we send a TempChannelFailure
that's destined for a multi-hop source sender, then we'll always package
the latest channel update along with it.
2018-05-08 13:00:28 -04:00
Olaoluwa Osuntokun
d72f28839d
Merge pull request #1104 from halseth/chainwatcher-handoff-race
Fix chainwatcher handoff race
2018-05-03 17:18:31 -07:00
Olaoluwa Osuntokun
ecfde2e85f
Merge pull request #1149 from cfromknecht/trim-pending-htlc-index
Trim Open Circuits Using HTLC Index of Pending Commitments
2018-05-03 16:45:29 -07:00
Olaoluwa Osuntokun
5f059e74cb
peer: ensure msgConsumer sets the shutdown variable on exit
In this commit, we fix a bug that could at times cause a deadlock when a
peer is attempting to disconnect. The issue was that when a peer goes to
disconnect, it needs to stop any active msgStream instances. The Stop()
method of the msgStream would block until an atomic variable was set to
indicate that the stream had fully exited. However, in the case that we
disconnected lower in the msgConsumer loop, we would never set the
streamShutdown variable, meaning that msgStream.Stop() would never
unblock.

The fix for this is simple: set the streamShutdown variable within the
quit case of the second select statement in the msgConsumer goroutine.
2018-05-03 15:45:22 -07:00
Conner Fromknecht
701d37725c
peer: extract hodl mask, remove htlchodl mode 2018-05-02 00:18:51 -07:00
Johan T. Halseth
08f1a3689d
peer: don't pass bool to SubscribeChannelEvents 2018-05-02 08:43:32 +02:00
Johan T. Halseth
bd4e717971
peer: don't load channels that have had commitment broadcasted 2018-04-25 09:37:24 +02:00
practicalswift
663c396235 multi: fix a-vs-an typos 2018-04-17 19:02:04 -07:00
Olaoluwa Osuntokun
ffabb17ce6
peer: use new fetchLastChanUpdate method to populate the ChannelLinkConfig 2018-04-06 14:52:01 -07:00
Olaoluwa Osuntokun
7cbe78eeee
peer: re-use a static writeBuf within writeMessage optimize memory usage
In this commit, we might a very small change to the way writing messages
works in the peer, which should have large implications w.r.t reducing
memory usage amongst chatty nodes.

When profiling the heap on one of my nodes earlier, I noticed this
fragment:
```
Showing top 20 nodes out of 68
      flat  flat%   sum%        cum   cum%
         0     0%     0%    75.53MB 54.61%  main.(*peer).writeHandler
   75.53MB 54.61% 54.61%    75.53MB 54.61%  main.(*peer).writeMessage
```

Which points to an inefficiency with the way we handle allocations when
writing new messages, drilling down further we see:
```
(pprof) list writeMessage
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeMessage in /root/go/src/github.com/lightningnetwork/lnd/peer.go
   75.53MB    75.53MB (flat, cum) 54.61% of Total
         .          .   1104:   p.logWireMessage(msg, false)
         .          .   1105:
         .          .   1106:   // As the Lightning wire protocol is fully message oriented, we only
         .          .   1107:   // allows one wire message per outer encapsulated crypto message. So
         .          .   1108:   // we'll create a temporary buffer to write the message directly to.
   75.53MB    75.53MB   1109:   var msgPayload [lnwire.MaxMessagePayload]byte
         .          .   1110:   b := bytes.NewBuffer(msgPayload[0:0:len(msgPayload)])
         .          .   1111:
         .          .   1112:   // With the temp buffer created and sliced properly (length zero, full
         .          .   1113:   // capacity), we'll now encode the message directly into this buffer.
         .          .   1114:   n, err := lnwire.WriteMessage(b, msg, 0)
(pprof) list writeHandler
Total: 138.31MB
ROUTINE ======================== main.(*peer).writeHandler in /root/go/src/github.com/lightningnetwork/lnd/peer.go
         0    75.53MB (flat, cum) 54.61% of Total
         .          .   1148:
         .          .   1149:                   // Write out the message to the socket, closing the
         .          .   1150:                   // 'sentChan' if it's non-nil, The 'sentChan' allows
         .          .   1151:                   // callers to optionally synchronize sends with the
         .          .   1152:                   // writeHandler.
         .    75.53MB   1153:                   err := p.writeMessage(outMsg.msg)
         .          .   1154:                   if outMsg.errChan != nil {
         .          .   1155:                           outMsg.errChan <- err
         .          .   1156:                   }
         .          .   1157:
         .          .   1158:                   if err != nil {
```

Ah hah! We create a _new_ buffer each time we want to write a message
out. This is unnecessary and _very_ wasteful (as seen by the profile).
The fix is simple: re-use a buffer unique to each peer when writing out
messages. Since we know what the max message size is, we just allocate
one of these 65KB buffers for each peer, and keep it around until the
peer is removed.
2018-04-06 12:55:17 -07:00
Olaoluwa Osuntokun
ca9174e166
peer: extend SendMessage to allow callers to block until msg is sent 2018-04-04 17:43:57 -07:00
Olaoluwa Osuntokun
447a031435
peer: reject remote closes with active HTLCs
In this commit, we follow up to the prior commit by ensuring we won't
accept a co-op close request for a chennel with active HTLCs. When
creating a chanCloser for the first time, we'll check the set of HTLC's
and reject a request (by sending a wire error) if the target channel
still as active HTLC's.
2018-03-30 13:06:57 -07:00
Olaoluwa Osuntokun
e7d66e1dfd
peer: don't d/c peer if we encounter lnwire.ErrUnknownAddrType
In this commit, we fix a minor deviation in our implementation from the
specification. Before if we encountered an unknown error type, we would
disconnect the peer. Instead, we’ll now just continue along parsing the
remainder of the messages. This was flared up recently by some
c-lightning related incompatibilities that emerged on main net.
2018-03-23 15:49:33 -07:00
Olaoluwa Osuntokun
fda3b871c1
peer: ensure we stop the channel if error happens in loadActiveChannels
In this commit, we fix a goroutine leak that could occur if while we
were loading an error occurred in any of the steps after we created the
channel object, but before it was actually loaded in to the script. If
an error occurs at any step, we ensure that we’ll stop toe channel.
Otherwise, the sigPool goroutines would still be lingering and never be
stopped.
2018-03-19 19:14:55 -07:00
Conner Fromknecht
c1f0c4ffda
peer: DecodeOnionObfuscator -> ExractErrorEncrypter 2018-03-13 16:33:29 -07:00
Johan T. Halseth
a6c2550404
peer: track failed channels
This commit adds a set used to track channels we consider failed. This
is done to ensure we don't end up in a connect/disconnect loop when we
attempt to re-sync the channel state of a failed channel with a peer.
2018-03-13 11:11:17 +01:00
Olaoluwa Osuntokun
53045450ad
Merge pull request #823 from Roasbeef/chan-stream-buf-limit
peer: modify the msgStream to not buffer messages off the wire indefi…
2018-03-12 19:39:48 -07:00
Olaoluwa Osuntokun
c285bb5814 htlcswitch+peer: remove DecodeHopIterator from ChannelLinkConfig
In this commit, we remove the DecodeHopIterator method from the
ChannelLinkConfig struct. We do this as we no longer use this method,
since we only ever use the DecodeHopIterators method now.
2018-03-12 18:58:08 -07:00
Olaoluwa Osuntokun
60c8257c3c
peer: modify the msgStream to not buffer messages off the wire indefinitely
In this commit, we modify the msgStream struct to ensure that it has a
cap at which it’ll continue to buffer messages. Currently we have two
msgStream structs per peer: the first for the discovery messages, and
the second for any messages that modify channel state. Due to
inefficiencies in the current protocol for reconciling graph state upon
connection (just dump the entire damn thing), when a node first starts
up, this can lead to very high memory usage as all peers will
concurrently send their initial message dump which can be in the
thousands of messages on testate.

Our fix is simple: make the message stream into a _bounded_ message
stream. The newMsgStream function now has a new argument: bufSize.
Internally, we’ll take this bufSize and create more or less an internal
semaphore for the producer. Each time the producer gets a new message,
it’ll try and read an item from the channel. If the queue still has
size, then this will succeed immediately. If not, then we’ll block
until the consumer actually finishes processing a message and then
signals by sending a new item into the channel.

We choose an initial value of 1000. This was chosen as there’s already
a max limit of outstanding adds on the commitment, and a value of 1000
should allow any incoming messages to be safely flushed and processed
by the gossiper.
2018-03-12 16:34:59 -07:00
Johan T. Halseth
80ef16e853
peer: hand lnwire.Error to fndgMngr or link depending on chanID 2018-03-11 17:21:23 +01:00
Conner Fromknecht
972d238f04
peer: remove Switch from channel link config 2018-03-09 21:18:15 -08:00
Conner Fromknecht
d468a36d69
peer: pass unsafe-replay to link config 2018-03-09 21:18:15 -08:00
Conner Fromknecht
8d0e3dc467
peer: adds FwdPkgGCTicker to channel configs 2018-03-09 21:18:14 -08:00
Conner Fromknecht
0e4be6a04a
peer: init link with batched sphinx processing 2018-03-09 21:18:14 -08:00
Johan T. Halseth
b9d1eceda3
peer: use EstimateFeePerVSize 2018-02-26 22:42:26 +01:00
MeshCollider
2c2ed3c6a9 multi: Unify use of NodeKey in log messages 2018-02-19 17:48:39 -08:00
MeshCollider
915c4201b9 multi: remove internal peer_id usage 2018-02-19 17:48:39 -08:00
practicalswift
b8e1351cf3 multi: fix some recently introduced typos 2018-02-18 15:27:29 -08:00
Olaoluwa Osuntokun
2e090ee2ab
peer: only return a channel snapshot if the channel can route
In this commit, we fix a slight miscalculation within the GetInfo call.
Before this commit, we would list any channel that the peer knew of as
active, instead of those which are, well, actually *active*. We fix
this by skipping any channels that we don’t have the remote revocation
for.
2018-02-08 19:40:54 -08:00
Olaoluwa Osuntokun
22951cb364
lnd: account for new lnwire.Sig API and channeldb API changes 2018-02-06 20:14:33 -08:00
practicalswift
a93736d21e multi: comprehensive typo fixes across all packages 2018-02-06 19:11:11 -08:00
Johan T. Halseth
26a80f86b8
peer: set BatchTicker and BatchSize in channellink config 2018-02-02 21:16:37 -05:00
Olaoluwa Osuntokun
0a4de859a2
discovery+routing: reduce number of active validation barrier jobs
In order to reduce high CPU utilization during the initial network view
sync, we slash down the total number of active in-flight jobs that can
be launched.
2018-01-28 14:55:32 -08:00
Olaoluwa Osuntokun
d4e650c85d
peer: the chancloser no longer needs to notify the breach arb of settled transactions 2018-01-22 19:19:59 -08:00
Olaoluwa Osuntokun
5df6704a9c
contractcourt: make synchronous chain watcher notifications optional
In this commit, we modify the way that notifications are dispatched
within the chainWatcher. Before we would *always* wait for an ack back
before we started to clean up he database state. This would at times
lead to deadlocks. To remedy this, we now allow callers to decide if
they want notifications to be sync or not. The only current caller that
requires this is the breach arbiter.
2018-01-22 19:19:58 -08:00
Olaoluwa Osuntokun
3ec83cc82f
peer+contractcourt: delegate watching for co-op closes to the chainWatcher
In this commit, we modify the interaction between the chanCloser
sub-system and the chain notifier all together. This fixes a series of
bugs as before this commit, we wouldn’t be able to detect if the remote
party actually broadcasted *any* of the transactions that we signed off
upon. This would be rejected to the user by having a “zombie” channel
close that would never actually be resolved.

Rather than the chanCloser watching for on-chain closes, we’ll now open
up a co-op close context to the chainWatcher (via a layer of
indirection via the ChainArbitrator), and report to it all possible
closes that we’ve signed. The chainWatcher will then be able to launch
a goroutine to properly update the database state once any of the
possible closure transactions confirms.
2018-01-22 19:19:53 -08:00
Olaoluwa Osuntokun
69e6ec9954
peer+funding: remove unneeded channel handoff code with the breach arbiter
We no longer need to hand off new channels that come online as the
chainWatcher will be persistent, and always have an active signal for
the entire lifetime of the channel.
2018-01-22 19:19:50 -08:00
Olaoluwa Osuntokun
defa1bc3e3
peer: when creating new links, obtain an on-chain events subscription 2018-01-22 19:19:49 -08:00
Olaoluwa Osuntokun
24a16b4f49
lnd: properly initialize entities of new contractcourt package 2018-01-22 19:19:42 -08:00
Olaoluwa Osuntokun
b1fe0c12bf
peer: ensure that any active msgStreams properly exit upon peer D/C
In this commit, we modify the logic within the Stop() method for
msgStream to ensure that the main goroutine properly exits. It has been
observed on running nodes with tens of connections, that if a node is
very flappy, then the node can end up with hundreds of leaked
goroutines.

In order to fix this, we’ll continually signal the msgConsumer to wake
up after the quit channel has been closed. We do this until the
msgConsumer sets a bool indicating that it has exited atomically.
2018-01-08 19:50:22 -08:00
Conner Fromknecht
44805be8d9
peer: filter borked channels when loading active chans 2018-01-05 13:47:18 -08:00
Johan T. Halseth
25b77a0aee peer: add error chan to queueMsg 2017-12-19 13:01:59 -06:00