From cb275e746dce455d473bef4d653fe85748477718 Mon Sep 17 00:00:00 2001 From: chonghe Date: Mon, 31 Jul 2023 23:51:38 +0000 Subject: [PATCH] Update Lighthouse book FAQ (#4510) Some updates in the FAQ based on issues seen on Discord. Additionally, corrected the disk usage on the default SPRP as the previously provided value is not correct. Co-authored-by: chonghe <44791194+chong-he@users.noreply.github.com> --- book/src/advanced_database.md | 4 +-- book/src/advanced_networking.md | 2 +- book/src/builders.md | 2 ++ book/src/faq.md | 51 +++++++++++++++++++++++++++++++-- 4 files changed, 53 insertions(+), 6 deletions(-) diff --git a/book/src/advanced_database.md b/book/src/advanced_database.md index d95110405..20c5d7443 100644 --- a/book/src/advanced_database.md +++ b/book/src/advanced_database.md @@ -28,8 +28,8 @@ some example values. | Research | 32 | 3.4 TB | 155 ms | | Block explorer/analysis | 128 | 851 GB | 620 ms | | Enthusiast (prev. default) | 2048 | 53.6 GB | 10.2 s | -| Hobbyist | 4096 | 26.8 GB | 20.5 s | -| Validator only (default) | 8192 | 8.1 GB | 41 s | +| Hobbyist | 4096 | 26.8 GB | 20.5 s | +| Validator only (default) | 8192 | 12.7 GB | 41 s | *Last update: May 2023. diff --git a/book/src/advanced_networking.md b/book/src/advanced_networking.md index 586503cb9..ba07a6f87 100644 --- a/book/src/advanced_networking.md +++ b/book/src/advanced_networking.md @@ -172,7 +172,7 @@ In order to do so, lighthouse provides the following CLI options/parameters. advertises some address, must be reachable both over UDP and TCP. -In the general case, an user will not require to set these explicitly. Update +In the general case, a user will not require to set these explicitly. Update these options only if you can guarantee your node is reachable with these values. diff --git a/book/src/builders.md b/book/src/builders.md index 6db360d70..2be4841dd 100644 --- a/book/src/builders.md +++ b/book/src/builders.md @@ -201,6 +201,8 @@ else: use local payload ``` +If you would like to always use the builder payload, you can add the flag `--always-prefer-builder-payload` to the beacon node. + ## Checking your builder config You can check that your builder is configured correctly by looking for these log messages. diff --git a/book/src/faq.md b/book/src/faq.md index d3e25438a..15c575706 100644 --- a/book/src/faq.md +++ b/book/src/faq.md @@ -10,6 +10,8 @@ - [My beacon node logs `WARN BlockProcessingFailure outcome: MissingBeaconBlock`, what should I do?](#bn-missing-beacon) - [After checkpoint sync, the progress of `downloading historical blocks` is slow. Why?](#bn-download-slow) - [My beacon node logs `WARN Error processing HTTP API request`, what should I do?](#bn-http) +- [My beacon node logs `WARN Error signalling fork choice waiter`, what should I do?](#bn-fork-choice) +- [My beacon node logs `ERRO Aggregate attestation queue full`, what should I do?](#bn-queue-full) ## [Validator](#validator-1) - [Why does it take so long for a validator to be activated?](#vc-activation) @@ -30,7 +32,7 @@ - [My beacon node and validator client are on different servers. How can I point the validator client to the beacon node?](#net-bn-vc) - [Should I do anything to the beacon node or validator client settings if I have a relocation of the node / change of IP address?](#net-ip) - [How to change the TCP/UDP port 9000 that Lighthouse listens on?](#net-port) - +- [Lighthouse `v4.3.0` introduces a change where a node will subscribe to only 2 subnets in total. I am worried that this will impact my validators return.](#net-subnet) ## [Miscellaneous](#miscellaneous-1) - [What should I do if I lose my slashing protection database?](#misc-slashing) @@ -74,7 +76,7 @@ The `WARN Execution engine called failed` log is shown when the beacon node cann `error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8551), path: "/", query: None, fragment: None }, source: TimedOut }), service: exec` -which says `TimedOut` at the end of the message. This means that the execution engine has not responded in time to the beacon node. One option is to add the flag `--execution-timeout-multiplier 3` to the beacon node. However, if the error persists, it is worth digging further to find out the cause. There are a few reasons why this can occur: +which says `TimedOut` at the end of the message. This means that the execution engine has not responded in time to the beacon node. One option is to add the flags `--execution-timeout-multiplier 3` and `--disable-lock-timeouts` to the beacon node. However, if the error persists, it is worth digging further to find out the cause. There are a few reasons why this can occur: 1. The execution engine is not synced. Check the log of the execution engine to make sure that it is synced. If it is syncing, wait until it is synced and the error will disappear. You will see the beacon node logs `INFO Execution engine online` when it is synced. 1. The computer is overloaded. Check the CPU and RAM usage to see if it has overloaded. You can use `htop` to check for CPU and RAM usage. 1. Your SSD is slow. Check if your SSD is in "The Bad" list [here](https://gist.github.com/yorickdowne/f3a3e79a573bf35767cd002cc977b038). If your SSD is in "The Bad" list, it means it cannot keep in sync to the network and you may want to consider upgrading to a better SSD. @@ -170,6 +172,27 @@ ERRO Failed to download attester duties err: FailedToDownloadAttesters("Som This means that the validator client is sending requests to the beacon node. However, as the beacon node is still syncing, it is therefore unable to fulfil the request. The error will disappear once the beacon node is synced. +### My beacon node logs `WARN Error signalling fork choice waiter`, what should I do? + +An example of the full log is shown below: + +``` +WARN Error signalling fork choice waiter slot: 6763073, error: ForkChoiceSignalOutOfOrder { current: Slot(6763074), latest: Slot(6763073) }, service: state_advance +``` + +This suggests that the computer resources are being overwhelmed. It could be due to high CPU usage or high disk I/O usage. This can happen, e.g., when the beacon node is downloading historical blocks, or when the execution client is syncing. The error will disappear when the resources used return to normal or when the node is synced. + + +### My beacon node logs `ERRO Aggregate attestation queue full`, what should I do? + +An example of the full log is shown below: +``` +ERRO Aggregate attestation queue full, queue_len: 4096, msg: the system has insufficient resources for load, module: network::beacon_processor:1542 +``` + +This suggests that the computer resources are being overwhelmed. It could be due to high CPU usage or high disk I/O usage. This can happen, e.g., when the beacon node is downloading historical blocks, or when the execution client is syncing. The error will disappear when the resources used return to normal or when the node is synced. + + ## Validator ### Why does it take so long for a validator to be activated? @@ -279,12 +302,26 @@ The first thing is to ensure both consensus and execution clients are synced wit - the internet is working well - you have sufficient peers -You can see more information on the [Ethstaker KB](https://ethstaker.gitbook.io/ethstaker-knowledge-base/help/missed-attestations). Once the above points are good, missing attestation should be a rare occurrence. +You can see more information on the [Ethstaker KB](https://ethstaker.gitbook.io/ethstaker-knowledge-base/help/missed-attestations). + +Another cause for missing attestations is delays during block processing. When this happens, the debug logs will show (debug logs can be found under `$datadir/beacon/logs`): + +``` +DEBG Delayed head block set_as_head_delay: Some(93.579425ms), imported_delay: Some(1.460405278s), observed_delay: Some(2.540811921s), block_delay: 4.094796624s, slot: 6837344, proposer_index: 211108, block_root: 0x2c52231c0a5a117401f5231585de8aa5dd963bc7cbc00c544e681342eedd1700, service: beacon +``` + +The fields to look for are `imported_delay > 1s` and `observed_delay < 3s`. The `imported_delay` is how long the node took to process the block. The `imported_delay` of larger than 1 second suggests that there is slowness in processing the block. It could be due to high CPU usage, high I/O disk usage or the clients are doing some background maintenance processes. The `observed_delay` is determined mostly by the proposer and partly by your networking setup (e.g., how long it took for the node to receive the block). The `observed_delay` of less than 3 seconds means that the block is not arriving late from the block proposer. Combining the above, this implies that the validator should have been able to attest to the block, but failed due to slowness in the node processing the block. + ### Sometimes I miss the attestation head vote, resulting in penalty. Is this normal? In general, it is unavoidable to have some penalties occasionally. This is particularly the case when you are assigned to attest on the first slot of an epoch and if the proposer of that slot releases the block late, then you will get penalised for missing the target and head votes. Your attestation performance does not only depend on your own setup, but also on everyone elses performance. +You could also check for the sync aggregate participation percentage on block explorers such as [beaconcha.in](https://beaconcha.in/). A low sync aggregate participation percentage (e.g., 60-70%) indicates that the block that you are assigned to attest to may be published late. As a result, your validator fails to correctly attest to the block. + +Another possible reason for missing the head vote is due to a chain "reorg". A reorg can happen if the proposer publishes block `n` late, and the proposer of block `n+1` builds upon block `n-1` instead of `n`. This is called a "reorg". Due to the reorg, block `n` was never included in the chain. If you are assigned to attest at slot `n`, it is possible you may still attest to block `n` despite most of the network recognizing the block as being late. In this case you will miss the head reward. + + ### Can I submit a voluntary exit message without running a beacon node? Yes. Beaconcha.in provides the tool to broadcast the message. You can create the voluntary exit message file with [ethdo](https://github.com/wealdtech/ethdo/releases/tag/v1.30.0) and submit the message via the [beaconcha.in](https://beaconcha.in/tools/broadcast) website. A guide on how to use `ethdo` to perform voluntary exit can be found [here](https://github.com/eth-educators/ethstaker-guides/blob/main/voluntary-exit.md). @@ -425,6 +462,14 @@ No. Lighthouse will auto-detect the change and update your Ethereum Node Record ### How to change the TCP/UDP port 9000 that Lighthouse listens on? Use the flag ```--port ``` in the beacon node. This flag can be useful when you are running two beacon nodes at the same time. You can leave one beacon node as the default port 9000, and configure the second beacon node to listen on, e.g., ```--port 9001```. +### Lighthouse `v4.3.0` introduces a change where a node will subscribe to only 2 subnets in total. I am worried that this will impact my validators return. + +Previously, having more validators means subscribing to more subnets. Since the change, a node will now only subscribe to 2 subnets in total. This will bring about significant reductions in bandwidth for nodes with multiple validators. + +While subscribing to more subnets can ensure you have peers on a wider range of subnets, these subscriptions consume resources and bandwidth. This does not significantly increase the performance of the node, however it does benefit other nodes on the network. + +If you would still like to subscribe to all subnets, you can use the flag `subscribe-all-subnets`. This may improve the block rewards by 1-5%, though it comes at the cost of a much higher bandwidth requirement. + ## Miscellaneous ### What should I do if I lose my slashing protection database?