## Limit Backfill Sync
This PR transitions Lighthouse from syncing all the way back to genesis to only syncing back to the weak subjectivity point (~ 5 months) when syncing via a checkpoint sync.
There are a number of important points to note with this PR:
- Firstly and most importantly, this PR fundamentally shifts the default security guarantees of checkpoint syncing in Lighthouse. Prior to this PR, Lighthouse could verify the checkpoint of any given chain by ensuring the chain eventually terminates at the corresponding genesis. This guarantee can still be employed via the new CLI flag --genesis-backfill which will prompt lighthouse to the old behaviour of downloading all blocks back to genesis. The new behaviour only checks the proposer signatures for the last 5 months of blocks but cannot guarantee the chain matches the genesis chain.
- I have not modified any of the peer scoring or RPC responses. Clients syncing from gensis, will downscore new Lighthouse peers that do not possess blocks prior to the WSP. This is by design, as Lighthouse nodes of this form, need a mechanism to sort through peers in order to find useful peers in order to complete their genesis sync. We therefore do not discriminate between empty/error responses for blocks prior or post the local WSP. If we request a block that a peer does not posses, then fundamentally that peer is less useful to us than other peers.
- This will make a radical shift in that the majority of nodes will no longer store the full history of the chain. In the future we could add a pruning mechanism to remove old blocks from the db also.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
#3873
## Proposed Changes
add a cache to optimise historical state lookup.
## Additional Info
N/A
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Update Lighthouse book to include latest information especially after Capella upgrade
## Proposed Changes
Notable changes:
- Combine Sec 4.1 & 6.1 into Sec 4, because Sec 6.1 is importing validator key which is a required step when want to run a validator
- Combine Sec 5.1 & 5.2 with Sec 5, and move Sec 5 to under Sec 9
- Added partial withdrawals in Sec 6
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: chonghe <tanck2005@gmail.com>
## Issue Addressed
This PR un-deprecates some commonly used test util functions, e.g. `extend_chain`. Most of these were deprecated in 2020 but some of us still found them quite convenient and they're still being used a lot. If there's no issue with using them, I think we should remove the "Deprecated" comment to avoid confusion.
## Proposed Changes
- Allow Docker images to be built with different profiles via e.g. `--build-arg PROFILE=maxperf`.
- Include the build profile in `lighthouse --version`.
## Additional Info
This only affects Docker images built from source. Our published Docker images use `cross`-compiled binaries that get copied into place.
## Issue Addressed
#4150
## Proposed Changes
Maintain trusted peers in the pruning logic. ~~In principle the changes here are not necessary as a trusted peer has a max score (100) and all other peers can have at most 0 (because we don't implement positive scores). This means that we should never prune trusted peers unless we have more trusted peers than the target peer count.~~
This change shifts this logic to explicitly never prune trusted peers which I expect is the intuitive behaviour.
~~I suspect the issue in #4150 arises when a trusted peer disconnects from us for one reason or another and then we remove that peer from our peerdb as it becomes stale. When it re-connects at some large time later, it is no longer a trusted peer.~~
Currently we do disconnect trusted peers, and this PR corrects this to maintain trusted peers in the pruning logic.
As suggested in #4150 we maintain trusted peers in the db and thus we remember them even if they disconnect from us.
## Issue Addressed
[#4162](https://github.com/sigp/lighthouse/issues/4162)
## Proposed Changes
update `--logfile-no-restricted-perms` flag help text to indicate that, for Windows users, the file permissions are inherited from the parent folder
## Additional Info
N/A
## Proposed Changes
Added page explanation for authentication under Siren UI book.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
It is a well-known fact that IP addresses for beacon nodes used by specific validators can be de-anonymized. There is an assumed risk that a malicious user may attempt to DOS validators when producing blocks to prevent chain growth/liveness.
Although there are a number of ideas put forward to address this, there a few simple approaches we can take to mitigate this risk.
Currently, a Lighthouse user is able to set a number of beacon-nodes that their validator client can connect to. If one beacon node is taken offline, it can fallback to another. Different beacon nodes can use VPNs or rotate IPs in order to mask their IPs.
This PR provides an additional setup option which further mitigates attacks of this kind.
This PR introduces a CLI flag --proposer-only to the beacon node. Setting this flag will configure the beacon node to run with minimal peers and crucially will not subscribe to subnets or sync committees. Therefore nodes of this kind should not be identified as nodes connected to validators of any kind.
It also introduces a CLI flag --proposer-nodes to the validator client. Users can then provide a number of beacon nodes (which may or may not run the --proposer-only flag) that the Validator client will use for block production and propagation only. If these nodes fail, the validator client will fallback to the default list of beacon nodes.
Users are then able to set up a number of beacon nodes dedicated to block proposals (which are unlikely to be identified as validator nodes) and point their validator clients to produce blocks on these nodes and attest on other beacon nodes. An attack attempting to prevent liveness on the eth2 network would then need to preemptively find and attack the proposer nodes which is significantly more difficult than the default setup.
This is a follow on from: #3328
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
The latest stable version (1.69.0) of Rust was released on 20 April and contains this change:
- [Update the minimum external LLVM to 14.](https://github.com/rust-lang/rust/pull/107573/)
This impacts some of our CI workflows (build and release-test-windows) that uses LLVM 13.0. This PR updates the workflows to install LLVM 15.0.
**UPDATE**: Also updated `h2` to address [this issue](https://github.com/advisories/GHSA-f8vr-r385-rh5r)
## Issue Addressed
NA
## Proposed Changes
Avoids reprocessing loops introduced in #4179. (Also somewhat related to #4192).
Breaks the re-queue loop by only re-queuing when an RPC block is received before the attestation creation deadline.
I've put `proposal_is_known` behind a closure to avoid interacting with the `observed_proposers` lock unnecessarily.
## Additional Info
NA
## Issue Addressed
There was a [`VecDeque` bug](https://github.com/rust-lang/rust/issues/108453) in some recent versions of the Rust standard library (1.67.0 & 1.67.1) that could cause Lighthouse to panic (reported by `@Sea Monkey` on discord). See full logs below.
The issue was likely introduced in Rust 1.67.0 and [fixed](https://github.com/rust-lang/rust/pull/108475) in 1.68, and we were able to reproduce the panic ourselves using [@michaelsproul's fuzz tests](https://github.com/michaelsproul/lighthouse/blob/fuzz-lru-time-cache/beacon_node/lighthouse_network/src/peer_manager/fuzz.rs#L111) on both Rust 1.67.0 and 1.67.1.
Users that uses our Docker images or binaries are unlikely affected, as our Docker images were built with `1.66`, and latest binaries were built with latest stable (`1.68.2`). It likely impacts user that builds from source using Rust versions 1.67.x.
## Proposed Changes
Bump Rust version (MSRV) to latest stable `1.68.2`.
## Additional Info
From `@Sea Monkey` on Lighthouse Discord:
> Crash on goerli using `unstable` `dd124b2d6804d02e4e221f29387a56775acccd08`
```
thread 'tokio-runtime-worker' panicked at 'Key must exist', /mnt/goerli/goerli/lighthouse/common/lru_cache/src/time.rs:68:28
stack backtrace:
Apr 15 09:37:36.993 WARN Peer sent invalid block in single block lookup, peer_id: 16Uiu2HAm6ZuyJpVpR6y51X4Enbp8EhRBqGycQsDMPX7e5XfPYznG, error: WouldRevertFinalizedSlot { block_slot: Slot(5420212), finalized_slot: Slot(5420224) }, root: 0x10f6…3165, service: sync
0: rust_begin_unwind
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:64:14
2: core::panicking::panic_display
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:135:5
3: core::panicking::panic_str
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:119:5
4: core::option::expect_failed
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/option.rs:1879:5
5: lru_cache::time::LRUTimeCache<Key>::raw_remove
6: lighthouse_network::peer_manager::PeerManager<TSpec>::handle_ban_operation
7: lighthouse_network::peer_manager::PeerManager<TSpec>::handle_score_action
8: lighthouse_network::peer_manager::PeerManager<TSpec>::report_peer
9: network::service::NetworkService<T>::spawn_service::{{closure}}
10: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
11: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
12: <futures_util::future::future::flatten::Flatten<Fut,<Fut as core::future::future::Future>::Output> as core::future::future::Future>::poll
13: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
14: tokio::runtime::task::core::Core<T,S>::poll
15: tokio::runtime::task::harness::Harness<T,S>::poll
16: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
17: tokio::runtime::scheduler::multi_thread::worker::Context::run
18: tokio::macros::scoped_tls::ScopedKey<T>::set
19: tokio::runtime::scheduler::multi_thread::worker::run
20: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
21: tokio::runtime::task::core::Core<T,S>::poll
22: tokio::runtime::task::harness::Harness<T,S>::poll
23: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Apr 15 09:37:37.069 INFO Saved DHT state service: network
Apr 15 09:37:37.070 INFO Network service shutdown service: network
Apr 15 09:37:37.132 CRIT Task panic. This is a bug! advice: Please check above for a backtrace and notify the developers, message: <none>, task_name: network
Apr 15 09:37:37.132 INFO Internal shutdown received reason: Panic (fatal error)
Apr 15 09:37:37.133 INFO Shutting down.. reason: Failure("Panic (fatal error)")
Apr 15 09:37:37.135 WARN Unable to free worker error: channel closed, msg: did not free worker, shutdown may be underway
Apr 15 09:37:39.350 INFO Saved beacon chain to disk service: beacon
Panic (fatal error)
```
## Issue Addressed
Closes#4185
## Proposed Changes
- Set user agent to `Lighthouse/vX.Y.Z-<commit hash>` by default
- Allow tweaking user agent via `--builder-user-agent "agent"`
## Proposed Changes
Builds on #4028 to use the new payload bodies methods in the HTTP API as well.
## Caveats
The payloads by range method only works for the finalized chain, so it can't be used in the execution engine integration tests because we try to reconstruct unfinalized payloads there.
## Issue Addressed
NA
## Proposed Changes
Apply two changes to code introduced in #4179:
1. Remove the `ERRO` log for when we error on `proposer_has_been_observed()`. We were seeing a lot of this in our logs for finalized blocks and it's a bit noisy.
1. Use `false` rather than `true` for `proposal_already_known` when there is an error. If a block raises an error in `proposer_has_been_observed()` then the block must be invalid, so we should process (and reject) it now rather than queuing it.
For reference, here is one of the offending `ERRO` logs:
```
ERRO Failed to check observed proposers block_root: 0x5845…878e, source: rpc, error: FinalizedBlock { slot: Slot(5410983), finalized_slot: Slot(5411232) }
```
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Similar to #4181 but without the version bump and a more nuanced fix.
Patches the high CPU usage seen after the Capella fork which was caused by processing exits when there are skip slots.
## Additional Info
~~This is an imperfect solution that will cause us to drop some exits at the fork boundary. This is tracked at #4184.~~
## Issue Addressed
Updated Lighthouse book on Section 2 and added some FAQs
## Proposed Changes
All changes are made in the book/src .md files.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: chonghe <tanck2005@gmail.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Proposed Changes
We already make some attempts to avoid processing RPC blocks when a block from the same proposer is already being processed through gossip. This PR strengthens that guarantee by using the existing cache for `observed_block_producers` to inform whether an RPC block's processing should be delayed.
## Proposed Changes
This change attempts to prevent failed re-orgs by:
1. Lowering the re-org cutoff from 2s to 1s. This is informed by a failed re-org attempted by @yorickdowne's node. The failed block was requested in the 1.5-2s window due to a Vouch failure, and failed to propagate to the majority of the network before the attestation deadline at 4s.
2. Allow users to adjust their re-org cutoff depending on observed network conditions and their risk profile. The static 2 second cutoff was too rigid.
3. Add a `--proposer-reorg-disallowed-offsets` flag which can be used to prohibit reorgs at certain slots. This is intended to help workaround an issue whereby reorging blocks at slot 1 are currently taking ~1.6s to propagate on gossip rather than ~500ms. This is suspected to be due to a cache miss in current versions of Prysm, which should be fixed in their next release.
## Additional Info
I'm of two minds about removing the `shuffling_stable` check which checks for blocks at slot 0 in the epoch. If we removed it users would be able to configure Lighthouse to try reorging at slot 0, which likely wouldn't work very well due to interactions with the proposer index cache. I think we could leave it for now and revisit it later.
## Issue Addressed
#4146
## Proposed Changes
Removes the `ExecutionOptimisticForkVersionedResponse` type and the associated Beacon API endpoint which is now deprecated. Also removes the test associated with the endpoint.
## Issue Addressed
N/A
## Proposed Changes
Adds a flag for disabling peer scoring. This is useful for local testing and testing small networks for new features.
## Issue Addressed
Attempt to fix#3812
## Proposed Changes
Move web3signer binary download script out of build script to avoid downloading unless necessary. If this works, it should also reduce the build time for all jobs that runs compilation.
> This is currently a WIP and all features are subject to alteration or removal at any time.
## Overview
The successor to #2873.
Contains the backbone of `beacon.watch` including syncing code, the initial API, and several core database tables.
See `watch/README.md` for more information, requirements and usage.
It is possible that when we go to ban a peer, there is already an unbanned message in the queue. It could lead to the case that we ban and immediately unban a peer leaving us in a state where a should-be banned peer is unbanned.
If this banned peer connects to us in this faulty state, we currently do not attempt to re-ban it. This PR does correct this also, so if we do see this error, it will now self-correct (although we shouldn't see the error in the first place).
I have also incremented the severity of not supporting protocols as I see peers ultimately get banned in a few steps and it seems to make sense to just ban them outright, rather than have them linger.
## Issue Addressed
Addresses #4117
## Proposed Changes
See https://github.com/ethereum/keymanager-APIs/pull/58 for proposed API specification.
## TODO
- [x] ~~Add submission to BN~~
- removed, see discussion in [keymanager API](https://github.com/ethereum/keymanager-APIs/pull/58)
- [x] ~~Add flag to allow voluntary exit via the API~~
- no longer needed now the VC doesn't submit exit directly
- [x] ~~Additional verification / checks, e.g. if validator on same network as BN~~
- to be done on client side
- [x] ~~Potentially wait for the message to propagate and return some exit information in the response~~
- not required
- [x] Update http tests
- [x] ~~Update lighthouse book~~
- not required if this endpoint makes it to the standard keymanager API
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
## Issue Addressed
#3212
## Proposed Changes
- Introduce a new `rate_limiting_backfill_queue` - any new inbound backfill work events gets immediately sent to this FIFO queue **without any processing**
- Spawn a `backfill_scheduler` routine that pops a backfill event from the FIFO queue at specified intervals (currently halfway through a slot, or at 6s after slot start for 12s slots) and sends the event to `BeaconProcessor` via a `scheduled_backfill_work_tx` channel
- This channel gets polled last in the `InboundEvents`, and work event received is wrapped in a `InboundEvent::ScheduledBackfillWork` enum variant, which gets processed immediately or queued by the `BeaconProcessor` (existing logic applies from here)
Diagram comparing backfill processing with / without rate-limiting:
https://github.com/sigp/lighthouse/issues/3212#issuecomment-1386249922
See this comment for @paulhauner's explanation and solution: https://github.com/sigp/lighthouse/issues/3212#issuecomment-1384674956
## Additional Info
I've compared this branch (with backfill processing rate limited to to 1 and 3 batches per slot) against the latest stable version. The CPU usage during backfill sync is reduced by ~5% - 20%, more details on this page:
https://hackmd.io/@jimmygchen/SJuVpJL3j
The above testing is done on Goerli (as I don't currently have hardware for Mainnet), I'm guessing the differences are likely to be bigger on mainnet due to block size.
### TODOs
- [x] Experiment with processing multiple batches per slot. (need to think about how to do this for different slot durations)
- [x] Add option to disable rate-limiting, enabed by default.
- [x] (No longer required now we're reusing the reprocessing queue) Complete the `backfill_scheduler` task when backfill sync is completed or not required
## Issue Addressed
Update the database-migrations to include v4.0.1 for database version v16:
## Proposed Changes
Update the table by adding a row
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
#4127. PR to test port conflicts in CI tests .
## Proposed Changes
See issue for more details, potential solution could be adding a cache bound by time to the `unused_port` function.
## Issue Addressed
#3708
## Proposed Changes
- Add `is_finalized_block` method to `BeaconChain` in `beacon_node/beacon_chain/src/beacon_chain.rs`.
- Add `is_finalized_state` method to `BeaconChain` in `beacon_node/beacon_chain/src/beacon_chain.rs`.
- Add `fork_and_execution_optimistic_and_finalized` in `beacon_node/http_api/src/state_id.rs`.
- Add `ExecutionOptimisticFinalizedForkVersionedResponse` type in `consensus/types/src/fork_versioned_response.rs`.
- Add `execution_optimistic_finalized_fork_versioned_response`function in `beacon_node/http_api/src/version.rs`.
- Add `ExecutionOptimisticFinalizedResponse` type in `common/eth2/src/types.rs`.
- Add `add_execution_optimistic_finalized` method in `common/eth2/src/types.rs`.
- Update API response methods to include finalized.
- Remove `execution_optimistic_fork_versioned_response`
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Title
Optimise `update_validators` by decrypting key cache only when necessary
## Issue Addressed
Resolves [#3968: Slow performance of validator client PATCH API with hundreds of keys](https://github.com/sigp/lighthouse/issues/3968)
## Proposed Changes
1. Add a check to determine if there is at least one local definition before decrypting the key cache.
2. Assign an empty `KeyCache` when all definitions are of the `Web3Signer` type.
3. Perform cache-related operations (e.g., saving the modified key cache) only if there are local definitions.
## Additional Info
This PR addresses the excessive CPU usage and slow performance experienced when using the `PATCH lighthouse/validators/{pubkey}` request with a large number of keys. The issue was caused by the key cache using cryptography to decipher and cipher the cache entities every time the request was made. This operation called `scrypt`, which was very slow and required a lot of memory when there were many concurrent requests.
These changes have no impact on the overall functionality but can lead to significant performance improvements when working with remote signers. Importantly, the key cache is never used when there are only `Web3Signer` definitions, avoiding the expensive operation of decrypting the key cache in such cases.
Co-authored-by: Maksim Shcherbo <max.shcherbo@consensys.net>
## Issue Addressed
Which issue # does this PR address?
https://github.com/sigp/lighthouse/issues/3669
## Proposed Changes
Please list or describe the changes introduced by this PR.
- A new API to fetch fork choice data, as specified [here](https://github.com/ethereum/beacon-APIs/pull/232)
- A new integration test to test the new API
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
- `extra_data` field specified in the beacon-API spec is not implemented, please let me know if I should instead.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
The minimum supported Rust version has been set to 1.66 as of Lighthouse v4.0.0. This PR updates Rust to 1.66 in lcli Dockerfile.
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
## Proposed Changes
To prevent breakages from `cargo update`, this updates the `arbitrary` crate to a new commit from my fork. Unfortunately we still need to use my fork (even though my `bound` change was merged) because of this issue: https://github.com/rust-lang/rust-clippy/issues/10185.
In a couple of Rust versions it should be resolved upstream.
## Issue Addressed
NA
## Proposed Changes
- Bump versions.
- Bump openssl version to resolve various `cargo audit` notices.
## Additional Info
- Requires further testing