routing: prune edges instead of vertexes in response to a FailUnknownNextPeer error

In this commit we fix a lingering bug in the Mission Control logic that we execute in response to the FailUnknownNextPeer error. Historically, we treated this error as meaning that the _next_ node was not online, and so pruned that vertex from the current reachable graph altogether. It was recently realized that this can be _too_ aggressive when the channel we attempted to route over was faulty or down, or when the incoming node had connectivity issues with the outgoing node. In light of this realization, we'll now prune only the _edge_ that we attempted to route over. This ensures that we continue to explore the other possible edges. Additionally, it guards against failure modes where nodes report FailUnknownNextPeer against other nodes in an attempt to more closely control our retry logic. This change is a stopgap on the path to a more intelligent set of autopilot heuristics. Fixes #1114.
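The difference between the two pruning strategies can be illustrated with a small, self-contained sketch. It is not the router's actual code: the `edge` and `session` types, `demoGraph`, and `hasPath` are hypothetical names made up for this example, and the toy graph assumes two parallel channels between `bob` and `carol`.

```go
package main

import "fmt"

// edge is a hypothetical stand-in for a directed channel between two nodes.
type edge struct {
	from, to string
	chanID   uint64
}

// session is a toy payment session (not the real paySession type): it
// records what has been pruned while retrying a single payment.
type session struct {
	edges          []edge
	prunedVertexes map[string]bool
	prunedEdges    map[uint64]bool
}

func newSession(edges []edge) *session {
	return &session{
		edges:          edges,
		prunedVertexes: make(map[string]bool),
		prunedEdges:    make(map[uint64]bool),
	}
}

// demoGraph returns an assumed topology in which bob and carol share two
// channels (IDs 2 and 3).
func demoGraph() []edge {
	return []edge{
		{"alice", "bob", 1},
		{"bob", "carol", 2},
		{"bob", "carol", 3},
		{"carol", "dave", 4},
	}
}

// hasPath runs a breadth-first search from src to dst, skipping every
// pruned edge and every edge that leads into a pruned vertex.
func (s *session) hasPath(src, dst string) bool {
	visited := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if node == dst {
			return true
		}
		for _, e := range s.edges {
			if e.from != node || s.prunedEdges[e.chanID] ||
				s.prunedVertexes[e.to] || visited[e.to] {
				continue
			}
			visited[e.to] = true
			queue = append(queue, e.to)
		}
	}
	return false
}

func main() {
	// Old behaviour: bob reports that carol is unknown, so carol is
	// pruned as a whole vertex.
	old := newSession(demoGraph())
	old.prunedVertexes["carol"] = true
	fmt.Println("vertex pruned, path exists:", old.hasPath("alice", "dave"))

	// New behaviour: only the channel that was attempted is pruned.
	cur := newSession(demoGraph())
	cur.prunedEdges[2] = true
	fmt.Println("edge pruned, path exists:", cur.hasPath("alice", "dave"))
}
```

In this toy topology, pruning the vertex leaves no route to `dave`, whereas pruning only channel 2 leaves the route over channel 3 available for the next attempt.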
2018-04-23 17:50:56 -07:00 · 2018-04-23 17:50:56 -07:00 · bd9f1b597e
commit bd9f1b597e
parent 8af80bfc5c
1 changed file with 7 additions and 5 deletions
--- a/routing/router.go
+++ b/routing/router.go
@@ -1819,12 +1819,14 @@ func (r *ChannelRouter) SendPayment(payment *LightningPayment) ([32]byte, *Route
 				continue

 			// If the next hop in the route wasn't known or
-			// offline, we'll prune the _next_ hop from the set of
-			// routes and retry.
+			// offline, we'll only prune the channel which we attempted
+			// to route over. This is conservative, and it can
+			// handle faulty channels between nodes properly.
+			// Additionally, this guards against routing nodes
+			// returning errors in order to attempt to black list
+			// another node.
 			case *lnwire.FailUnknownNextPeer:
-				pruneVertexFailure(
-					paySession, route, errSource, true,
-				)
+				pruneEdgeFailure(paySession, route, errSource)
 				continue

 			// If the node wasn't able to forward for whichever