Merge pull request #5447 from bhandras/etcd_le_docs
docs: leader election docs
This commit is contained in:
commit
bb5c3f3b51
112
docs/leader_election.md
Normal file
112
docs/leader_election.md
Normal file
@ -0,0 +1,112 @@
|
||||
# Increasing LND reliablity by clustering
|
||||
|
||||
Normally LND nodes use the embedded bbolt database to store all important states.
|
||||
This method of running has been proven to work well in a variety of environments,
|
||||
from mobile clients to large nodes serving hundreds of channels. With scale however
|
||||
it is desirable to be able to replicate LND's state to quickly and reliably move nodes,
|
||||
do updates and be more resilient to datacenter failures.
|
||||
|
||||
It is now possible to store all essential state in a replicated etcd DB and to
|
||||
run multiple LND nodes on different machines where only one of them (the leader)
|
||||
is able to read and mutate the database. In such setup if the leader node fails
|
||||
or decomissioned, a follower node will be elected as the new leader and will
|
||||
quickly come online to minimize downtime.
|
||||
|
||||
The leader election feature currently relies on etcd to work both for the election
|
||||
itself and for the replicated data store.
|
||||
|
||||
## Building LND with leader election support
|
||||
|
||||
To create a dev build of LND with leader election support use the following command:
|
||||
|
||||
```shell
|
||||
⛰ make tags="kvdb_etcd"
|
||||
```
|
||||
|
||||
## Running a local etcd instance for testing
|
||||
|
||||
To start your local etcd instance for testing run:
|
||||
|
||||
```shell
|
||||
⛰ ./etcd \
|
||||
--auto-tls \
|
||||
--advertise-client-urls=https://127.0.0.1:2379 \
|
||||
--listen-client-urls=https://0.0.0.0:2379 \
|
||||
--max-txn-ops=16384 \
|
||||
--max-request-bytes=104857600
|
||||
```
|
||||
|
||||
The large `max-txn-ops` and `max-request-bytes` values are currently recommended
|
||||
but may not be required in the future.
|
||||
|
||||
## Configuring LND to run on etcd and participate in leader election
|
||||
|
||||
To run LND with etcd, additional configuration is needed, specified either
|
||||
through command line flags or in `lnd.conf`.
|
||||
|
||||
Sample command line:
|
||||
|
||||
```shell
|
||||
⛰ ./lnd-debug \
|
||||
--db.backend=etcd \
|
||||
--db.etcd.host=127.0.0.1:2379 \
|
||||
--db.etcd.certfile=/home/user/etcd/bin/default.etcd/fixtures/client/cert.pem \
|
||||
--db.etcd.keyfile=/home/user/etcd/bin/default.etcd/fixtures/client/key.pem \
|
||||
--db.etcd.insecure_skip_verify \
|
||||
--cluster.enable-leader-election \
|
||||
--cluster.leader-elector=etcd \
|
||||
--cluster.etcd-election-prefix=cluster-leader \
|
||||
--cluster.id=lnd-1
|
||||
```
|
||||
The `cluster.etcd-election-prefix` option sets the election's etcd key prefix.
|
||||
The `cluster.id` is used to identify the individual nodes in the cluster
|
||||
and should be set to a different value for each node.
|
||||
|
||||
Optionally users can specify `db.etcd.user` and `db.etcd.pass` for db user
|
||||
authentication. If the database is shared, it is possible to separate our data
|
||||
from other users by setting `db.etcd.namespace` to an (already existing) etcd
|
||||
namespace. In order to test without TLS, we can set `db.etcd.disabletls`
|
||||
flag to `true`.
|
||||
|
||||
Once the node is up and running we can start more nodes with the same command line.
|
||||
|
||||
## Identifying the leader node
|
||||
|
||||
The above setup is useful for testing but is not viable when running in a production
|
||||
environment. For users relying on containers and orchestration services, it is
|
||||
essential to know which node is the leader to be able to automatically route
|
||||
network traffic to the right instance. For example in Kubernetes, the load balancer
|
||||
will route traffic to all "ready" nodes. This readiness may be monitored by a
|
||||
readiness probe.
|
||||
|
||||
For readiness probing we can simply use LND's state RPC service where a special state
|
||||
`WAITING_TO_START` indicates that the node is waiting to become the leader and is
|
||||
not started yet. To test this we can simply curl the REST endpoint of the state RPC:
|
||||
|
||||
```
|
||||
readinessProbe:
|
||||
exec:
|
||||
command: [
|
||||
"/bin/sh",
|
||||
"-c",
|
||||
"set -e; set -o pipefail; curl -s -k -o - https://localhost:8080/v1/state | jq .'State' | grep -E 'NON_EXISTING|LOCKED|UNLOCKED|RPC_ACTIVE'",
|
||||
]
|
||||
periodSeconds: 1
|
||||
```
|
||||
|
||||
## Replication of non-critical data
|
||||
|
||||
All critical data is written to the replicated database, including LND's wallet
|
||||
data which contains the key material and node identity. Some less critical data
|
||||
however is currently not written to that same database for performance reasons
|
||||
and is instead still kept in local `bbolt` files.
|
||||
|
||||
For example the graph data is kept locally to improve path finding. Other examples
|
||||
are the macaroon database or watchtower client database. To make sure a node can
|
||||
become active and take over quickly if the leader fails, it is therefore still
|
||||
recommended to have the LND data directory on a shared volume that all active and
|
||||
passive nodes can access. Otherwise the node that is taking over might first need
|
||||
to sync its graph.
|
||||
|
||||
As we evolve our cluster support we'll provide more solutions to make replication
|
||||
and clustering even more seamless.
|
Loading…
Reference in New Issue
Block a user