routing: prune edges instead of vertexes in response to a FailUnknownNextPeer error

In this commit we fix a lingering bug in the Mission Control logic we
execute in response to the FailUnknownNextPeer error. Historically, we
would treat this error as a sign that the _next_ node was offline. As a
result, we would then prune that vertex from the current reachable
graph altogether. We recently realized that this can at times be a bit
_too_ aggressive: the channel we attempted to route over may have been
faulty or down, or the incoming node may simply have had connectivity
issues with the outgoing node, while the next node itself was still
reachable over its other channels.

In light of this realization, we'll now instead only prune the _edge_
that we attempted to route over. This ensures that we'll continue to
explore the remaining edges that lead to the next node. Additionally,
this guards us against failure modes where nodes report
FailUnknownNextPeer in an attempt to more closely control our retry
logic, for example to get us to blacklist another node.
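The difference between the two strategies can be illustrated with a toy model. The Go program below is a minimal sketch, not lnd's Mission Control code: the `paySession` type, its `pruneVertex`/`pruneEdge` methods, the graph, and the node and channel names are all hypothetical. It models a path me -> alice -> bob -> dave in which alice has two parallel channels to bob, and alice reports FailUnknownNextPeer for channel 1.

```go
package main

import "fmt"

// edge is a directed channel in a toy channel graph. All names here are
// hypothetical and only illustrate the idea; this is not lnd's API.
type edge struct {
	chanID   uint64
	from, to string
}

// paySession tracks what has been pruned during one payment attempt
// (a hypothetical stand-in for per-payment Mission Control state).
type paySession struct {
	prunedVertexes map[string]bool
	prunedEdges    map[uint64]bool
}

func newPaySession() *paySession {
	return &paySession{
		prunedVertexes: make(map[string]bool),
		prunedEdges:    make(map[uint64]bool),
	}
}

// pruneVertex removes a node, and therefore every channel touching it.
func (p *paySession) pruneVertex(node string) { p.prunedVertexes[node] = true }

// pruneEdge removes only the single channel that failed.
func (p *paySession) pruneEdge(chanID uint64) { p.prunedEdges[chanID] = true }

// usable reports whether an edge may still be used for path finding.
func (p *paySession) usable(e edge) bool {
	return !p.prunedEdges[e.chanID] &&
		!p.prunedVertexes[e.from] && !p.prunedVertexes[e.to]
}

// reachable runs a breadth-first search from src over usable edges.
func reachable(edges []edge, p *paySession, src, dst string) bool {
	seen := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if n == dst {
			return true
		}
		for _, e := range edges {
			if e.from == n && p.usable(e) && !seen[e.to] {
				seen[e.to] = true
				queue = append(queue, e.to)
			}
		}
	}
	return false
}

// testGraph: me -> alice -> bob -> dave, with two parallel alice -> bob channels.
func testGraph() []edge {
	return []edge{
		{chanID: 10, from: "me", to: "alice"},
		{chanID: 1, from: "alice", to: "bob"},
		{chanID: 2, from: "alice", to: "bob"},
		{chanID: 3, from: "bob", to: "dave"},
	}
}

func main() {
	edges := testGraph()

	// Old behaviour: alice reports FailUnknownNextPeer for channel 1
	// and we prune bob entirely, cutting off dave.
	old := newPaySession()
	old.pruneVertex("bob")
	fmt.Println("vertex pruning, dave reachable:", reachable(edges, old, "me", "dave"))

	// New behaviour: prune only channel 1; channel 2 still leads to bob.
	cur := newPaySession()
	cur.pruneEdge(1)
	fmt.Println("edge pruning, dave reachable:", reachable(edges, cur, "me", "dave"))
}
```

Running it prints `false` for vertex pruning and `true` for edge pruning: removing only the failed channel keeps the parallel channel to bob, and therefore dave, available for the next attempt. A node misreporting FailUnknownNextPeer in this model can only knock out its own outgoing channel, not its peer.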

This change is a stopgap on the path to a more intelligent set of
autopilot heuristics.

Fixes #1114.
Olaoluwa Osuntokun 2018-04-23 17:50:56 -07:00
parent 8af80bfc5c
commit bd9f1b597e
GPG Key ID: 964EA263DD637C21

@@ -1819,12 +1819,14 @@ func (r *ChannelRouter) SendPayment(payment *LightningPayment) ([32]byte, *Route
 				continue
 			// If the next hop in the route wasn't known or
-			// offline, we'll prune the _next_ hop from the set of
-			// routes and retry.
+			// offline, we'll only prune the channel which we
+			// attempted to route over. This is conservative, and
+			// it can handle faulty channels between nodes
+			// properly. Additionally, this guards against routing
+			// nodes returning errors in order to attempt to
+			// blacklist another node.
 			case *lnwire.FailUnknownNextPeer:
-				pruneVertexFailure(
-					paySession, route, errSource, true,
-				)
+				pruneEdgeFailure(paySession, route, errSource)
 				continue
 			// If the node wasn't able to forward for which ever
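The new code calls `pruneEdgeFailure(paySession, route, errSource)`, but its body is not part of this excerpt. One plausible way to pick the channel to prune is to walk the route and select the hop whose predecessor is `errSource`, since that is the channel the erring node tried to forward over. The sketch below assumes a hypothetical `hop` type and a hypothetical `failedOutgoingChannel` helper; lnd's actual route types and helper may work differently.

```go
package main

import "fmt"

// hop is a hypothetical, simplified route hop: the channel used to reach
// a node, plus that node's identity. It is not lnd's actual hop type.
type hop struct {
	chanID uint64
	node   string
}

// failedOutgoingChannel returns the channel leaving errSource along the
// route, i.e. the edge the erring node attempted to forward over.
func failedOutgoingChannel(source string, hops []hop, errSource string) (uint64, bool) {
	prev := source
	for _, h := range hops {
		if prev == errSource {
			return h.chanID, true
		}
		prev = h.node
	}
	// errSource is the final hop or is not on the route at all, so there
	// is no outgoing channel to blame.
	return 0, false
}

func main() {
	hops := []hop{
		{chanID: 10, node: "alice"},
		{chanID: 1, node: "bob"},
		{chanID: 3, node: "dave"},
	}
	// alice returned FailUnknownNextPeer, so the channel she tried to
	// forward over (alice -> bob, channel 1) is the one to prune.
	chanID, ok := failedOutgoingChannel("me", hops, "alice")
	fmt.Println(chanID, ok) // 1 true
}
```

Blaming the erring node's outgoing channel, rather than the next node, matches the reasoning in the commit message: only the edge that actually failed is removed from consideration.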