docs: leader election docs

2021-06-29 00:45:11 +02:00 · 2021-06-29 00:45:11 +02:00 · f41771ce54
commit f41771ce54
parent 1287328678
1 changed files with 112 additions and 0 deletions
--- a/docs/leader_election.md
+++ b/docs/leader_election.md
@ -0,0 +1,112 @@
+# Increasing LND reliablity by clustering
+
+Normally LND nodes use the embedded bbolt database to store all important states.
+This method of running has been proven to work well in a variety of environments,
+from mobile clients to large nodes serving hundreds of channels. With scale however
+it is desirable to be able to replicate LND's state to quickly and reliably move nodes,
+do updates and be more resilient to datacenter failures.
+
+It is now possible to store all essential state in a replicated etcd DB and to
+run multiple LND nodes on different machines where only one of them (the leader) 
+is able to read and mutate the database. In such setup if the leader node fails
+or decomissioned, a follower node will be elected as the new leader and will
+quickly come online to minimize downtime.
+
+The leader election feature currently relies on etcd to work both for the election
+itself and for the replicated data store.
+
+## Building LND with leader election support
+
+To create a dev build of LND with leader election support use the following command:
+
+```shell
+⛰  make tags="kvdb_etcd"
+```
+
+## Running a local etcd instance for testing
+
+To start your local etcd instance for testing run:
+
+```shell
+⛰  ./etcd \
+    --auto-tls \
+    --advertise-client-urls=https://127.0.0.1:2379 \
+    --listen-client-urls=https://0.0.0.0:2379 \
+    --max-txn-ops=16384 \
+    --max-request-bytes=104857600
+```
+
+The large `max-txn-ops` and `max-request-bytes` values are currently recommended
+but may not be required in the future.
+
+## Configuring LND to run on etcd and participate in leader election
+
+To run LND with etcd, additional configuration is needed, specified either
+through command line flags or in `lnd.conf`.
+
+Sample command line:
+
+```shell
+⛰  ./lnd-debug \
+    --db.backend=etcd \
+    --db.etcd.host=127.0.0.1:2379 \
+    --db.etcd.certfile=/home/user/etcd/bin/default.etcd/fixtures/client/cert.pem \
+    --db.etcd.keyfile=/home/user/etcd/bin/default.etcd/fixtures/client/key.pem \
+    --db.etcd.insecure_skip_verify \
+    --cluster.enable-leader-election \
+    --cluster.leader-elector=etcd \
+    --cluster.etcd-election-prefix=cluster-leader \
+    --cluster.id=lnd-1
+```
+The `cluster.etcd-election-prefix` option sets the election's etcd key prefix. 
+The `cluster.id` is used to identify the individual nodes in the cluster
+and should be set to a different value for each node.
+
+Optionally users can specify `db.etcd.user` and `db.etcd.pass` for db user
+authentication. If the database is shared, it is possible to separate our data
+from other users by setting `db.etcd.namespace` to an (already existing) etcd
+namespace. In order to test without TLS, we can set `db.etcd.disabletls`
+flag to `true`.
+
+Once the node is up and running we can start more nodes with the same command line.
+
+## Identifying the leader node
+
+The above setup is useful for testing but is not viable when running in a production
+environment. For users relying on containers and orchestration services, it is
+essential to know which node is the leader to be able to automatically route
+network traffic to the right instance. For example in Kubernetes, the load balancer
+will route traffic to all "ready" nodes. This readiness may be monitored by a
+readiness probe.
+
+For readiness probing we can simply use LND's state RPC service where a special state
+`WAITING_TO_START` indicates that the node is waiting to become the leader and is
+not started yet. To test this we can simply curl the REST endpoint of the state RPC:
+
+```
+readinessProbe:
+    exec:
+      command: [
+        "/bin/sh",
+        "-c",
+        "set -e; set -o pipefail; curl -s -k -o - https://localhost:8080/v1/state | jq .'State' | grep -E 'NON_EXISTING|LOCKED|UNLOCKED|RPC_ACTIVE'",
+      ]
+    periodSeconds: 1
+```
+
+## Replication of non-critical data
+
+All critical data is written to the replicated database, including LND's wallet
+data which contains the key material and node identity. Some less critical data
+however is currently not written to that same database for performance reasons
+and is instead still kept in local `bbolt` files.
+
+For example the graph data is kept locally to improve path finding. Other examples
+are the macaroon database or watchtower client database. To make sure a node can
+become active and take over quickly if the leader fails, it is therefore still
+recommended to have the LND data directory on a shared volume that all active and
+passive nodes can access. Otherwise the node that is taking over might first need
+to sync its graph.
+
+As we evolve our cluster support we'll provide more solutions to make replication
+and clustering even more seamless.