Add docs about redundancy (#2142)

## Issue Addressed

- Resolves #2140

## Proposed Changes

Adds some documentation on the topic of "redundancy".

## Additional Info

NA
2021-01-12 00:26:22 +00:00 · 2021-01-12 00:26:22 +00:00 · 1d535659d6
commit 1d535659d6
parent 423dea169c
3 changed files with 123 additions and 0 deletions
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -33,6 +33,7 @@
 	* [Local Testnets](./local-testnets.md)
    * [Advanced Networking](./advanced_networking.md)
    * [Running a Slasher](./slasher.md)
+    * [Redundancy](./redundancy.md)
 * [Contributing](./contributing.md)
 	* [Development Environment](./setup.md)
 * [FAQs](./faq.md)
--- a/book/src/faq.md
+++ b/book/src/faq.md
@@ -174,3 +174,11 @@ or after being off for more than several minutes.

 If this log continues appearing sporadically during operation, there may be an
 issue with your eth1 endpoint.
+
+### Can I use redundancy in my staking setup?
+
+You should **never** use duplicate/redundant validator keypairs or validator clients (i.e., don't
+duplicate your JSON keystores and don't run `lighthouse vc` twice). This will lead to slashing.
+
+However, there are some components which can be configured with redundancy. See the
+[Redundancy](./redundancy.md) guide for more information.
--- a/book/src/redundancy.md
+++ b/book/src/redundancy.md
@@ -0,0 +1,114 @@
+# Redundancy
+
+[subscribe-api]: https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet
+
+There are three places in Lighthouse where redundancy is notable:
+
+1. ✅ GOOD: Using a redundant Beacon node in `lighthouse vc --beacon-nodes`
+1. ✅ GOOD: Using a redundant Eth1 node in `lighthouse bn --eth1-endpoints`
+1. ☠️ BAD: Running redundant `lighthouse vc` instances with overlapping keypairs.
+
+Item (3) is listed because it is unsafe and should not be confused with the other two
+uses of redundancy. **Running the same validator keypair in more than one
+validator client (Lighthouse, or otherwise) will eventually lead to slashing.**
+See [Slashing Protection](./slashing-protection.md) for more information.
+
+From this point on, this document refers *only* to the first two items (1, 2). We
+*never* recommend that users implement redundancy for validator keypairs.
+
+## Redundant Beacon Nodes
+
+The `lighthouse vc --beacon-nodes` flag accepts one or more comma-separated values:
+
+1. `lighthouse vc --beacon-nodes http://localhost:5052`
+1. `lighthouse vc --beacon-nodes http://localhost:5052,http://192.168.1.1:5052`
+
+In the first example, the validator client will attempt to contact
+`http://localhost:5052` to perform duties. If that node is not contactable, not
+synced or unable to serve the request then the validator client may fail to
+perform some duty (e.g., produce a block or attest).
+
+However, in the second example, any failure on `http://localhost:5052` will be
+followed by a second attempt using `http://192.168.1.1:5052`. This
+achieves *redundancy*, allowing the validator client to continue to perform its
+duties as long as *at least one* of the beacon nodes is available.
+
+There are a few interesting properties about the list of `--beacon-nodes`:
+
+- *Ordering matters*: the validator client prefers a beacon node that is
+	earlier in the list.
+- *Synced is preferred*: the validator client prefers a synced beacon node over
+	one that is still syncing.
+- *Failure is sticky*: if a beacon node fails, it will be flagged as offline
+    and won't be retried for the rest of the slot (12 seconds). This helps limit the impact
+    of time-outs and other lengthy errors.
+
+> Note: When supplying multiple beacon nodes the `http://localhost:5052` address must be explicitly
+> provided (if it is desired). It will only be used as default if no `--beacon-nodes` flag is
+> provided at all.
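
The "ordering matters" property above can be sketched in a few lines of shell. This is an illustration only, not Lighthouse's implementation: `first_available` and `demo_probe` are hypothetical names, the probe stands in for "send the request to this beacon node", and neither the preference for synced nodes nor the per-slot sticky failure is modelled.

```bash
#!/bin/sh
# Illustrative sketch only; this is NOT how Lighthouse is implemented.
# Prints the first URL, in list order, for which the probe command succeeds.
#   $1: name of a probe command (hypothetical stand-in for a real request)
#   $2: comma-separated list of beacon node URLs, as given to --beacon-nodes
# The function body runs in a subshell so the IFS and globbing changes do not
# leak into the caller.
first_available() (
  probe=$1
  IFS=,     # split the list on commas
  set -f    # disable globbing while splitting
  for url in $2; do
    if "$probe" "$url"; then
      printf '%s\n' "$url"
      exit 0
    fi
  done
  exit 1    # every node in the list failed
)

# Hypothetical probe for the demo: pretend only the second node is reachable.
demo_probe() { [ "$1" = "http://192.168.1.1:5052" ]; }

# Prints http://192.168.1.1:5052 because the first node "fails".
first_available demo_probe "http://localhost:5052,http://192.168.1.1:5052"
```

If both nodes were healthy, the same call would print `http://localhost:5052`, since it appears first in the list.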
+
+### Configuring a redundant Beacon Node
+
+In our previous example we listed `http://192.168.1.1:5052` as a redundant
+node. Apart from having sufficient resources, the backup node should have the
+following flags:
+
+- `--staking`: starts the HTTP API server and ensures the Eth1 chain is synced.
+- `--http-address 0.0.0.0`: this allows *any* external IP address to access the
+	HTTP server (a firewall should be configured to deny unauthorized access to port
+	`5052`). This is only required if your backup node is on a different host.
+- `--subscribe-all-subnets`: ensures that the beacon node subscribes to *all*
+	subnets, not just on-demand requests from validators.
+- `--process-all-attestations`: ensures that the beacon node performs
+	aggregation on all seen attestations.
+
+Putting these flags together, one could use the following command to provide a backup beacon
+node:
+
+```bash
+lighthouse bn \
+  --staking \
+  --http-address 0.0.0.0 \
+  --subscribe-all-subnets \
+  --process-all-attestations
+```
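
Before relying on the backup node, it can be useful to confirm from the validator client's host that its HTTP API is reachable and reports itself as synced. The sketch below assumes the node serves the standard beacon API `GET /eth/v1/node/syncing` endpoint on port `5052` and that `curl` is installed. The function names `reports_synced` and `check_backup_node` are hypothetical helpers, not part of Lighthouse.

```bash
#!/bin/sh
# Illustrative sketch; assumes the standard beacon API response shape, e.g.
#   {"data":{"head_slot":"100","sync_distance":"0","is_syncing":false}}

# Succeeds if the JSON on stdin contains `"is_syncing": false`.
reports_synced() {
  grep -Eq '"is_syncing"[[:space:]]*:[[:space:]]*false'
}

# Queries a beacon node and succeeds only if it answers and is not syncing.
# Requires curl and network access to the node.
#   $1: base URL of the beacon node, e.g. http://192.168.1.1:5052
check_backup_node() {
  curl --silent --fail --max-time 5 "$1/eth/v1/node/syncing" | reports_synced
}

# Offline demo with a sample response; prints "synced".
printf '%s\n' '{"data":{"head_slot":"100","sync_distance":"0","is_syncing":false}}' \
  | reports_synced && echo "synced"

# Against a real node (not run here):
#   check_backup_node http://192.168.1.1:5052 && echo "backup node is synced"
```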
+
+### Resource usage of redundant Beacon Nodes
+
+The `--subscribe-all-subnets` and `--process-all-attestations` flags typically
+cause a significant increase in resource consumption. A doubling in CPU
+utilization and RAM consumption is expected.
+
+The increase in resource consumption is due to the fact that the beacon node is
+now processing, validating, aggregating and forwarding *all* attestations,
+whereas previously it was likely only doing a fraction of this work. Without
+these flags, subscription to attestation subnets and aggregation of
+attestations is only performed for validators which [explicitly request
+subscriptions][subscribe-api].
+
+There are 64 subnets and each validator will result in a subscription to *at
+least* one subnet. So, using the two aforementioned flags will result in
+resource consumption akin to running 64+ validators.
+
+## Redundant Eth1 nodes
+
+Compared to redundancy in beacon nodes (see above), using redundant Eth1 nodes
+is straightforward:
+
+1. `lighthouse bn --eth1-endpoints http://localhost:8545`
+1. `lighthouse bn --eth1-endpoints http://localhost:8545,http://192.168.0.1:8545`
+
+In the case of (1), any failure on `http://localhost:8545` will result in a
+failure to update the Eth1 cache in the beacon node. Consistent failure over a
+period of hours may result in a failure in block production.
+
+However, in the case of (2), the `http://192.168.0.1:8545` Eth1 endpoint will
+be tried each time the first fails. Eth1 endpoints will be tried from first to
+last in the list, until a successful response is obtained.
+
+There is no need for special configuration on the Eth1 endpoints; all endpoints can (and probably should)
+be configured identically.
+
+> Note: When supplying multiple endpoints the `http://localhost:8545` address must be explicitly
+> provided (if it is desired). It will only be used as default if no `--eth1-endpoints` flag is
+> provided at all.
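
Before listing several Eth1 endpoints, each one can be checked by hand. The sketch below is a manual check that is independent of Lighthouse and is not how Lighthouse itself tests endpoints. It assumes each endpoint speaks standard Ethereum JSON-RPC and answers `eth_blockNumber`, and that `curl` is installed. `block_number_from_response` and `probe_eth1` are hypothetical helper names.

```bash
#!/bin/sh
# Illustrative sketch for manually checking an Eth1 JSON-RPC endpoint.

# Reads an eth_blockNumber JSON-RPC response on stdin and prints the block
# number in decimal. Fails (and prints nothing) if there is no hex "result".
block_number_from_response() {
  hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]\{1,\}\)".*/\1/p')
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"
}

# Asks an endpoint for its latest block number.
# Requires curl and network access to the endpoint.
#   $1: endpoint URL, e.g. http://localhost:8545
probe_eth1() {
  curl --silent --fail --max-time 5 \
    -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | block_number_from_response
}

# Offline demo with a sample response; prints 12345678.
printf '%s\n' '{"jsonrpc":"2.0","id":1,"result":"0xbc614e"}' | block_number_from_response

# Against real endpoints (not run here):
#   probe_eth1 http://localhost:8545
#   probe_eth1 http://192.168.0.1:8545
```

Two healthy endpoints should report block numbers that are close to each other. An endpoint that fails or lags far behind is a poor candidate for the list.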