Initial work towards v0.2.0 (#924)
* Remove ping protocol
* Initial renaming of network services
* Correct rebasing relative to latest master
* Start updating types
* Adds HashMapDelay struct to utils
* Initial network restructure
* Network restructure. Adds new types for v0.2.0
* Removes build artefacts
* Shift validation to beacon chain
* Temporarily remove gossip validation
This is to be updated to match current optimisation efforts.
* Adds AggregateAndProof
* Begin rebuilding pubsub encoding/decoding
* Signature hacking
* Shift gossipsup decoding into eth2_libp2p
* Existing EF tests passing with fake_crypto
* Shifts block encoding/decoding into RPC
* Delete outdated API spec
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Fast aggregate verify test
* Update REST API docs
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Progress on attestation service
* Address review comments; remove unused dependency
* Initial work on removing libp2p lock
* Add LRU caches to store (rollup)
* Update attestation validation for DB changes (WIP)
* Initial version of should_forward_block
* Scaffold
* Progress on attestation validation
Also, consolidate prod+testing slot clocks so that they share much
of the same implementation and can both handle sub-slot time changes.
* Removes lock from libp2p service
* Completed network lock removal
* Finish(?) attestation processing
* Correct network termination future
* Add slot check to block check
* Correct fmt issues
* Remove Drop implementation for network service
* Add first attempt at attestation proc. re-write
* Add version 2 of attestation processing
* Minor fixes
* Add validator pubkey cache
* Make get_indexed_attestation take a committee
* Link signature processing into new attn verification
* First working version
* Ensure pubkey cache is updated
* Add more metrics, slight optimizations
* Clone committee cache during attestation processing
* Update shuffling cache during block processing
* Remove old commented-out code
* Fix shuffling cache insert bug
* Used indexed attestation in fork choice
* Restructure attn processing, add metrics
* Add more detailed metrics
* Tidy, fix failing tests
* Fix failing tests, tidy
* Address reviewers suggestions
* Disable/delete two outdated tests
* Modification of validator for subscriptions
* Add slot signing to validator client
* Further progress on validation subscription
* Adds necessary validator subscription functionality
* Add new Pubkeys struct to signature_sets
* Refactor with functional approach
* Update beacon chain
* Clean up validator <-> beacon node http types
* Add aggregator status to ValidatorDuty
* Impl Clone for manual slot clock
* Fix minor errors
* Further progress validator client subscription
* Initial subscription and aggregation handling
* Remove decompressed member from pubkey bytes
* Progress to modifying val client for attestation aggregation
* First draft of validator client upgrade for aggregate attestations
* Add hashmap for indices lookup
* Add state cache, remove store cache
* Only build the head committee cache
* Removes lock on a network channel
* Partially implement beacon node subscription http api
* Correct compilation issues
* Change `get_attesting_indices` to use Vec
* Fix failing test
* Partial implementation of timer
* Adds timer, removes exit_future, http api to op pool
* Partial multiple aggregate attestation handling
* Permits bulk messages accross gossipsub network channel
* Correct compile issues
* Improve gosispsub messaging and correct rest api helpers
* Added global gossipsub subscriptions
* Update validator subscriptions data structs
* Tidy
* Re-structure validator subscriptions
* Initial handling of subscriptions
* Re-structure network service
* Add pubkey cache persistence file
* Add more comments
* Integrate persistence file into builder
* Add pubkey cache tests
* Add HashSetDelay and introduce into attestation service
* Handles validator subscriptions
* Add data_dir to beacon chain builder
* Remove Option in pubkey cache persistence file
* Ensure consistency between datadir/data_dir
* Fix failing network test
* Peer subnet discovery gets queued for future subscriptions
* Reorganise attestation service functions
* Initial wiring of attestation service
* First draft of attestation service timing logic
* Correct minor typos
* Tidy
* Fix todos
* Improve tests
* Add PeerInfo to connected peers mapping
* Fix compile error
* Fix compile error from merge
* Split up block processing metrics
* Tidy
* Refactor get_pubkey_from_state
* Remove commented-out code
* Rename state_cache -> checkpoint_cache
* Rename Checkpoint -> Snapshot
* Tidy, add comments
* Tidy up find_head function
* Change some checkpoint -> snapshot
* Add tests
* Expose max_len
* Remove dead code
* Tidy
* Fix bug
* Add sync-speed metric
* Add first attempt at VerifiableBlock
* Start integrating into beacon chain
* Integrate VerifiableBlock
* Rename VerifableBlock -> PartialBlockVerification
* Add start of typed methods
* Add progress
* Add further progress
* Rename structs
* Add full block verification to block_processing.rs
* Further beacon chain integration
* Update checks for gossip
* Add todo
* Start adding segement verification
* Add passing chain segement test
* Initial integration with batch sync
* Minor changes
* Tidy, add more error checking
* Start adding chain_segment tests
* Finish invalid signature tests
* Include single and gossip verified blocks in tests
* Add gossip verification tests
* Start adding docs
* Finish adding comments to block_processing.rs
* Rename block_processing.rs -> block_verification
* Start removing old block processing code
* Fixes beacon_chain compilation
* Fix project-wide compile errors
* Remove old code
* Correct code to pass all tests
* Fix bug with beacon proposer index
* Fix shim for BlockProcessingError
* Only process one epoch at a time
* Fix loop in chain segment processing
* Correct tests from master merge
* Add caching for state.eth1_data_votes
* Add BeaconChain::validator_pubkey
* Revert "Add caching for state.eth1_data_votes"
This reverts commit cd73dcd6434fb8d8e6bf30c5356355598ea7b78e.
Co-authored-by: Grant Wuerker <gwuerker@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-03-17 06:24:44 +00:00
|
|
|
use crate::metrics;
|
2022-07-20 23:16:54 +00:00
|
|
|
use beacon_chain::{
|
2023-01-31 17:26:23 +00:00
|
|
|
capella_readiness::CapellaReadiness,
|
2022-07-20 23:16:54 +00:00
|
|
|
merge_readiness::{MergeConfig, MergeReadiness},
|
|
|
|
BeaconChain, BeaconChainTypes, ExecutionStatus,
|
|
|
|
};
|
Rename eth2_libp2p to lighthouse_network (#2702)
## Description
The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration.
As of this writing it currently houses the following high-level lighthouse-specific logic:
- Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings
- Integration and handling of ENRs with respect to libp2p and eth2
- Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers.
- Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc.
- Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api.
- Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks.
- Lighthouse specific types for managing gossipsub topics, sync status and ENR fields
- Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring
- Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p.
Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network`
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-10-19 00:30:39 +00:00
|
|
|
use lighthouse_network::{types::SyncState, NetworkGlobals};
|
2021-10-07 11:24:57 +00:00
|
|
|
use slog::{crit, debug, error, info, warn, Logger};
|
2019-12-06 07:44:38 +00:00
|
|
|
use slot_clock::SlotClock;
|
|
|
|
use std::sync::Arc;
|
|
|
|
use std::time::{Duration, Instant};
|
2022-07-20 23:16:54 +00:00
|
|
|
use tokio::sync::Mutex;
|
2020-11-28 05:30:57 +00:00
|
|
|
use tokio::time::sleep;
|
2022-07-20 23:16:54 +00:00
|
|
|
use types::*;
|
2019-12-06 07:44:38 +00:00
|
|
|
|
|
|
|
/// Create a warning log whenever the peer count is at or below this value.
|
|
|
|
pub const WARN_PEER_COUNT: usize = 1;
|
|
|
|
|
2020-05-26 03:25:52 +00:00
|
|
|
const DAYS_PER_WEEK: i64 = 7;
|
|
|
|
const HOURS_PER_DAY: i64 = 24;
|
|
|
|
const MINUTES_PER_HOUR: i64 = 60;
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2019-12-09 06:23:43 +00:00
|
|
|
/// The number of historical observations that should be used to determine the average sync time.
|
|
|
|
const SPEEDO_OBSERVATIONS: usize = 4;
|
|
|
|
|
2021-09-22 00:37:28 +00:00
|
|
|
/// The number of slots between logs that give detail about backfill process.
|
|
|
|
const BACKFILL_LOG_INTERVAL: u64 = 5;
|
|
|
|
|
2019-12-06 07:44:38 +00:00
|
|
|
/// Spawns a notifier service which periodically logs information about the node.
|
|
|
|
pub fn spawn_notifier<T: BeaconChainTypes>(
|
2020-10-05 07:45:54 +00:00
|
|
|
executor: task_executor::TaskExecutor,
|
2019-12-06 07:44:38 +00:00
|
|
|
beacon_chain: Arc<BeaconChain<T>>,
|
Initial work towards v0.2.0 (#924)
* Remove ping protocol
* Initial renaming of network services
* Correct rebasing relative to latest master
* Start updating types
* Adds HashMapDelay struct to utils
* Initial network restructure
* Network restructure. Adds new types for v0.2.0
* Removes build artefacts
* Shift validation to beacon chain
* Temporarily remove gossip validation
This is to be updated to match current optimisation efforts.
* Adds AggregateAndProof
* Begin rebuilding pubsub encoding/decoding
* Signature hacking
* Shift gossipsup decoding into eth2_libp2p
* Existing EF tests passing with fake_crypto
* Shifts block encoding/decoding into RPC
* Delete outdated API spec
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Fast aggregate verify test
* Update REST API docs
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Progress on attestation service
* Address review comments; remove unused dependency
* Initial work on removing libp2p lock
* Add LRU caches to store (rollup)
* Update attestation validation for DB changes (WIP)
* Initial version of should_forward_block
* Scaffold
* Progress on attestation validation
Also, consolidate prod+testing slot clocks so that they share much
of the same implementation and can both handle sub-slot time changes.
* Removes lock from libp2p service
* Completed network lock removal
* Finish(?) attestation processing
* Correct network termination future
* Add slot check to block check
* Correct fmt issues
* Remove Drop implementation for network service
* Add first attempt at attestation proc. re-write
* Add version 2 of attestation processing
* Minor fixes
* Add validator pubkey cache
* Make get_indexed_attestation take a committee
* Link signature processing into new attn verification
* First working version
* Ensure pubkey cache is updated
* Add more metrics, slight optimizations
* Clone committee cache during attestation processing
* Update shuffling cache during block processing
* Remove old commented-out code
* Fix shuffling cache insert bug
* Used indexed attestation in fork choice
* Restructure attn processing, add metrics
* Add more detailed metrics
* Tidy, fix failing tests
* Fix failing tests, tidy
* Address reviewers suggestions
* Disable/delete two outdated tests
* Modification of validator for subscriptions
* Add slot signing to validator client
* Further progress on validation subscription
* Adds necessary validator subscription functionality
* Add new Pubkeys struct to signature_sets
* Refactor with functional approach
* Update beacon chain
* Clean up validator <-> beacon node http types
* Add aggregator status to ValidatorDuty
* Impl Clone for manual slot clock
* Fix minor errors
* Further progress validator client subscription
* Initial subscription and aggregation handling
* Remove decompressed member from pubkey bytes
* Progress to modifying val client for attestation aggregation
* First draft of validator client upgrade for aggregate attestations
* Add hashmap for indices lookup
* Add state cache, remove store cache
* Only build the head committee cache
* Removes lock on a network channel
* Partially implement beacon node subscription http api
* Correct compilation issues
* Change `get_attesting_indices` to use Vec
* Fix failing test
* Partial implementation of timer
* Adds timer, removes exit_future, http api to op pool
* Partial multiple aggregate attestation handling
* Permits bulk messages accross gossipsub network channel
* Correct compile issues
* Improve gosispsub messaging and correct rest api helpers
* Added global gossipsub subscriptions
* Update validator subscriptions data structs
* Tidy
* Re-structure validator subscriptions
* Initial handling of subscriptions
* Re-structure network service
* Add pubkey cache persistence file
* Add more comments
* Integrate persistence file into builder
* Add pubkey cache tests
* Add HashSetDelay and introduce into attestation service
* Handles validator subscriptions
* Add data_dir to beacon chain builder
* Remove Option in pubkey cache persistence file
* Ensure consistency between datadir/data_dir
* Fix failing network test
* Peer subnet discovery gets queued for future subscriptions
* Reorganise attestation service functions
* Initial wiring of attestation service
* First draft of attestation service timing logic
* Correct minor typos
* Tidy
* Fix todos
* Improve tests
* Add PeerInfo to connected peers mapping
* Fix compile error
* Fix compile error from merge
* Split up block processing metrics
* Tidy
* Refactor get_pubkey_from_state
* Remove commented-out code
* Rename state_cache -> checkpoint_cache
* Rename Checkpoint -> Snapshot
* Tidy, add comments
* Tidy up find_head function
* Change some checkpoint -> snapshot
* Add tests
* Expose max_len
* Remove dead code
* Tidy
* Fix bug
* Add sync-speed metric
* Add first attempt at VerifiableBlock
* Start integrating into beacon chain
* Integrate VerifiableBlock
* Rename VerifableBlock -> PartialBlockVerification
* Add start of typed methods
* Add progress
* Add further progress
* Rename structs
* Add full block verification to block_processing.rs
* Further beacon chain integration
* Update checks for gossip
* Add todo
* Start adding segement verification
* Add passing chain segement test
* Initial integration with batch sync
* Minor changes
* Tidy, add more error checking
* Start adding chain_segment tests
* Finish invalid signature tests
* Include single and gossip verified blocks in tests
* Add gossip verification tests
* Start adding docs
* Finish adding comments to block_processing.rs
* Rename block_processing.rs -> block_verification
* Start removing old block processing code
* Fixes beacon_chain compilation
* Fix project-wide compile errors
* Remove old code
* Correct code to pass all tests
* Fix bug with beacon proposer index
* Fix shim for BlockProcessingError
* Only process one epoch at a time
* Fix loop in chain segment processing
* Correct tests from master merge
* Add caching for state.eth1_data_votes
* Add BeaconChain::validator_pubkey
* Revert "Add caching for state.eth1_data_votes"
This reverts commit cd73dcd6434fb8d8e6bf30c5356355598ea7b78e.
Co-authored-by: Grant Wuerker <gwuerker@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-03-17 06:24:44 +00:00
|
|
|
network: Arc<NetworkGlobals<T::EthSpec>>,
|
2021-01-19 09:39:51 +00:00
|
|
|
seconds_per_slot: u64,
|
2020-06-04 11:48:05 +00:00
|
|
|
) -> Result<(), String> {
|
2021-01-19 09:39:51 +00:00
|
|
|
let slot_duration = Duration::from_secs(seconds_per_slot);
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2019-12-09 06:23:43 +00:00
|
|
|
let speedo = Mutex::new(Speedo::default());
|
2020-06-04 11:48:05 +00:00
|
|
|
let log = executor.log().clone();
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2021-09-22 00:37:28 +00:00
|
|
|
// Keep track of sync state and reset the speedo on specific sync state changes.
|
|
|
|
// Specifically, if we switch between a sync and a backfill sync, reset the speedo.
|
|
|
|
let mut current_sync_state = network.sync_state();
|
|
|
|
|
|
|
|
// Store info if we are required to do a backfill sync.
|
|
|
|
let original_anchor_slot = beacon_chain
|
|
|
|
.store
|
|
|
|
.get_anchor_info()
|
|
|
|
.map(|ai| ai.oldest_block_slot);
|
|
|
|
|
2020-05-17 11:16:48 +00:00
|
|
|
let interval_future = async move {
|
2020-05-26 03:25:52 +00:00
|
|
|
// Perform pre-genesis logging.
|
|
|
|
loop {
|
|
|
|
match beacon_chain.slot_clock.duration_to_next_slot() {
|
|
|
|
// If the duration to the next slot is greater than the slot duration, then we are
|
|
|
|
// waiting for genesis.
|
|
|
|
Some(next_slot) if next_slot > slot_duration => {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Waiting for genesis";
|
|
|
|
"peers" => peer_count_pretty(network.connected_peers()),
|
|
|
|
"wait_time" => estimated_time_pretty(Some(next_slot.as_secs() as f64)),
|
|
|
|
);
|
2020-11-30 20:29:17 +00:00
|
|
|
eth1_logging(&beacon_chain, &log);
|
2020-11-28 05:30:57 +00:00
|
|
|
sleep(slot_duration).await;
|
2020-05-26 03:25:52 +00:00
|
|
|
}
|
|
|
|
_ => break,
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Perform post-genesis logging.
|
2021-09-22 00:37:28 +00:00
|
|
|
let mut last_backfill_log_slot = None;
|
2022-07-20 23:16:54 +00:00
|
|
|
|
2021-02-10 23:29:49 +00:00
|
|
|
loop {
|
2022-08-29 14:34:43 +00:00
|
|
|
// Run the notifier half way through each slot.
|
|
|
|
//
|
|
|
|
// Keep remeasuring the offset rather than using an interval, so that we can correct
|
|
|
|
// for system time clock adjustments.
|
|
|
|
let wait = match beacon_chain.slot_clock.duration_to_next_slot() {
|
|
|
|
Some(duration) => duration + slot_duration / 2,
|
|
|
|
None => {
|
|
|
|
warn!(log, "Unable to read current slot");
|
|
|
|
sleep(slot_duration).await;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
};
|
|
|
|
sleep(wait).await;
|
|
|
|
|
2020-02-19 11:12:25 +00:00
|
|
|
let connected_peer_count = network.connected_peers();
|
2020-04-19 10:45:25 +00:00
|
|
|
let sync_state = network.sync_state();
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2021-09-22 00:37:28 +00:00
|
|
|
// Determine if we have switched syncing chains
|
|
|
|
if sync_state != current_sync_state {
|
|
|
|
match (current_sync_state, &sync_state) {
|
|
|
|
(_, SyncState::BackFillSyncing { .. }) => {
|
|
|
|
// We have transitioned to a backfill sync. Reset the speedo.
|
2022-07-20 23:16:54 +00:00
|
|
|
let mut speedo = speedo.lock().await;
|
2021-09-22 00:37:28 +00:00
|
|
|
speedo.clear();
|
|
|
|
}
|
|
|
|
(SyncState::BackFillSyncing { .. }, _) => {
|
|
|
|
// We have transitioned from a backfill sync, reset the speedo
|
2022-07-20 23:16:54 +00:00
|
|
|
let mut speedo = speedo.lock().await;
|
2021-09-22 00:37:28 +00:00
|
|
|
speedo.clear();
|
|
|
|
}
|
|
|
|
(_, _) => {}
|
|
|
|
}
|
|
|
|
current_sync_state = sync_state;
|
|
|
|
}
|
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
let cached_head = beacon_chain.canonical_head.cached_head();
|
|
|
|
let head_slot = cached_head.head_slot();
|
|
|
|
let head_root = cached_head.head_block_root();
|
|
|
|
let finalized_checkpoint = cached_head.finalized_checkpoint();
|
2021-05-26 05:58:41 +00:00
|
|
|
|
|
|
|
metrics::set_gauge(&metrics::NOTIFIER_HEAD_SLOT, head_slot.as_u64() as i64);
|
|
|
|
|
2021-02-10 23:29:49 +00:00
|
|
|
let current_slot = match beacon_chain.slot() {
|
|
|
|
Ok(slot) => slot,
|
|
|
|
Err(e) => {
|
|
|
|
error!(
|
|
|
|
log,
|
|
|
|
"Unable to read current slot";
|
|
|
|
"error" => format!("{:?}", e)
|
|
|
|
);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
};
|
|
|
|
|
2019-12-06 07:44:38 +00:00
|
|
|
let current_epoch = current_slot.epoch(T::EthSpec::slots_per_epoch());
|
|
|
|
|
2021-09-22 00:37:28 +00:00
|
|
|
// The default is for regular sync but this gets modified if backfill sync is in
|
|
|
|
// progress.
|
|
|
|
let mut sync_distance = current_slot - head_slot;
|
|
|
|
|
2022-07-20 23:16:54 +00:00
|
|
|
let mut speedo = speedo.lock().await;
|
2021-09-22 00:37:28 +00:00
|
|
|
match current_sync_state {
|
|
|
|
SyncState::BackFillSyncing { .. } => {
|
|
|
|
// Observe backfilling sync info.
|
|
|
|
if let Some(oldest_slot) = original_anchor_slot {
|
|
|
|
if let Some(current_anchor_slot) = beacon_chain
|
|
|
|
.store
|
|
|
|
.get_anchor_info()
|
|
|
|
.map(|ai| ai.oldest_block_slot)
|
|
|
|
{
|
2023-05-05 03:49:23 +00:00
|
|
|
sync_distance = current_anchor_slot
|
|
|
|
.saturating_sub(beacon_chain.genesis_backfill_slot);
|
2021-09-22 00:37:28 +00:00
|
|
|
speedo
|
|
|
|
// For backfill sync use a fake slot which is the distance we've progressed from the starting `oldest_block_slot`.
|
|
|
|
.observe(
|
|
|
|
oldest_slot.saturating_sub(current_anchor_slot),
|
|
|
|
Instant::now(),
|
|
|
|
);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
SyncState::SyncingFinalized { .. }
|
|
|
|
| SyncState::SyncingHead { .. }
|
|
|
|
| SyncState::SyncTransition => {
|
|
|
|
speedo.observe(head_slot, Instant::now());
|
|
|
|
}
|
|
|
|
SyncState::Stalled | SyncState::Synced => {}
|
|
|
|
}
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2021-09-22 00:37:28 +00:00
|
|
|
// NOTE: This is going to change based on which sync we are currently performing. A
|
|
|
|
// backfill sync should process slots significantly faster than the other sync
|
|
|
|
// processes.
|
2020-05-17 11:16:48 +00:00
|
|
|
metrics::set_gauge(
|
|
|
|
&metrics::SYNC_SLOTS_PER_SECOND,
|
2020-12-03 01:10:26 +00:00
|
|
|
speedo.slots_per_second().unwrap_or(0_f64) as i64,
|
2020-05-17 11:16:48 +00:00
|
|
|
);
|
Initial work towards v0.2.0 (#924)
* Remove ping protocol
* Initial renaming of network services
* Correct rebasing relative to latest master
* Start updating types
* Adds HashMapDelay struct to utils
* Initial network restructure
* Network restructure. Adds new types for v0.2.0
* Removes build artefacts
* Shift validation to beacon chain
* Temporarily remove gossip validation
This is to be updated to match current optimisation efforts.
* Adds AggregateAndProof
* Begin rebuilding pubsub encoding/decoding
* Signature hacking
* Shift gossipsup decoding into eth2_libp2p
* Existing EF tests passing with fake_crypto
* Shifts block encoding/decoding into RPC
* Delete outdated API spec
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Fast aggregate verify test
* Update REST API docs
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Progress on attestation service
* Address review comments; remove unused dependency
* Initial work on removing libp2p lock
* Add LRU caches to store (rollup)
* Update attestation validation for DB changes (WIP)
* Initial version of should_forward_block
* Scaffold
* Progress on attestation validation
Also, consolidate prod+testing slot clocks so that they share much
of the same implementation and can both handle sub-slot time changes.
* Removes lock from libp2p service
* Completed network lock removal
* Finish(?) attestation processing
* Correct network termination future
* Add slot check to block check
* Correct fmt issues
* Remove Drop implementation for network service
* Add first attempt at attestation proc. re-write
* Add version 2 of attestation processing
* Minor fixes
* Add validator pubkey cache
* Make get_indexed_attestation take a committee
* Link signature processing into new attn verification
* First working version
* Ensure pubkey cache is updated
* Add more metrics, slight optimizations
* Clone committee cache during attestation processing
* Update shuffling cache during block processing
* Remove old commented-out code
* Fix shuffling cache insert bug
* Used indexed attestation in fork choice
* Restructure attn processing, add metrics
* Add more detailed metrics
* Tidy, fix failing tests
* Fix failing tests, tidy
* Address reviewers suggestions
* Disable/delete two outdated tests
* Modification of validator for subscriptions
* Add slot signing to validator client
* Further progress on validation subscription
* Adds necessary validator subscription functionality
* Add new Pubkeys struct to signature_sets
* Refactor with functional approach
* Update beacon chain
* Clean up validator <-> beacon node http types
* Add aggregator status to ValidatorDuty
* Impl Clone for manual slot clock
* Fix minor errors
* Further progress validator client subscription
* Initial subscription and aggregation handling
* Remove decompressed member from pubkey bytes
* Progress to modifying val client for attestation aggregation
* First draft of validator client upgrade for aggregate attestations
* Add hashmap for indices lookup
* Add state cache, remove store cache
* Only build the head committee cache
* Removes lock on a network channel
* Partially implement beacon node subscription http api
* Correct compilation issues
* Change `get_attesting_indices` to use Vec
* Fix failing test
* Partial implementation of timer
* Adds timer, removes exit_future, http api to op pool
* Partial multiple aggregate attestation handling
* Permits bulk messages accross gossipsub network channel
* Correct compile issues
* Improve gosispsub messaging and correct rest api helpers
* Added global gossipsub subscriptions
* Update validator subscriptions data structs
* Tidy
* Re-structure validator subscriptions
* Initial handling of subscriptions
* Re-structure network service
* Add pubkey cache persistence file
* Add more comments
* Integrate persistence file into builder
* Add pubkey cache tests
* Add HashSetDelay and introduce into attestation service
* Handles validator subscriptions
* Add data_dir to beacon chain builder
* Remove Option in pubkey cache persistence file
* Ensure consistency between datadir/data_dir
* Fix failing network test
* Peer subnet discovery gets queued for future subscriptions
* Reorganise attestation service functions
* Initial wiring of attestation service
* First draft of attestation service timing logic
* Correct minor typos
* Tidy
* Fix todos
* Improve tests
* Add PeerInfo to connected peers mapping
* Fix compile error
* Fix compile error from merge
* Split up block processing metrics
* Tidy
* Refactor get_pubkey_from_state
* Remove commented-out code
* Rename state_cache -> checkpoint_cache
* Rename Checkpoint -> Snapshot
* Tidy, add comments
* Tidy up find_head function
* Change some checkpoint -> snapshot
* Add tests
* Expose max_len
* Remove dead code
* Tidy
* Fix bug
* Add sync-speed metric
* Add first attempt at VerifiableBlock
* Start integrating into beacon chain
* Integrate VerifiableBlock
* Rename VerifableBlock -> PartialBlockVerification
* Add start of typed methods
* Add progress
* Add further progress
* Rename structs
* Add full block verification to block_processing.rs
* Further beacon chain integration
* Update checks for gossip
* Add todo
* Start adding segement verification
* Add passing chain segement test
* Initial integration with batch sync
* Minor changes
* Tidy, add more error checking
* Start adding chain_segment tests
* Finish invalid signature tests
* Include single and gossip verified blocks in tests
* Add gossip verification tests
* Start adding docs
* Finish adding comments to block_processing.rs
* Rename block_processing.rs -> block_verification
* Start removing old block processing code
* Fixes beacon_chain compilation
* Fix project-wide compile errors
* Remove old code
* Correct code to pass all tests
* Fix bug with beacon proposer index
* Fix shim for BlockProcessingError
* Only process one epoch at a time
* Fix loop in chain segment processing
* Correct tests from master merge
* Add caching for state.eth1_data_votes
* Add BeaconChain::validator_pubkey
* Revert "Add caching for state.eth1_data_votes"
This reverts commit cd73dcd6434fb8d8e6bf30c5356355598ea7b78e.
Co-authored-by: Grant Wuerker <gwuerker@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-03-17 06:24:44 +00:00
|
|
|
|
2019-12-06 07:44:38 +00:00
|
|
|
if connected_peer_count <= WARN_PEER_COUNT {
|
2019-12-06 23:11:31 +00:00
|
|
|
warn!(log, "Low peer count"; "peer_count" => peer_count_pretty(connected_peer_count));
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
debug!(
|
|
|
|
log,
|
|
|
|
"Slot timer";
|
2019-12-06 23:11:31 +00:00
|
|
|
"peers" => peer_count_pretty(connected_peer_count),
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
"finalized_root" => format!("{}", finalized_checkpoint.root),
|
|
|
|
"finalized_epoch" => finalized_checkpoint.epoch,
|
2019-12-06 07:44:38 +00:00
|
|
|
"head_block" => format!("{}", head_root),
|
|
|
|
"head_slot" => head_slot,
|
|
|
|
"current_slot" => current_slot,
|
2021-09-22 00:37:28 +00:00
|
|
|
"sync_state" =>format!("{}", current_sync_state)
|
2019-12-06 07:44:38 +00:00
|
|
|
);
|
|
|
|
|
2021-09-22 00:37:28 +00:00
|
|
|
// Log if we are backfilling.
|
|
|
|
let is_backfilling = matches!(current_sync_state, SyncState::BackFillSyncing { .. });
|
|
|
|
if is_backfilling
|
|
|
|
&& last_backfill_log_slot
|
|
|
|
.map_or(true, |slot| slot + BACKFILL_LOG_INTERVAL <= current_slot)
|
|
|
|
{
|
|
|
|
last_backfill_log_slot = Some(current_slot);
|
|
|
|
|
|
|
|
let distance = format!(
|
|
|
|
"{} slots ({})",
|
|
|
|
sync_distance.as_u64(),
|
|
|
|
slot_distance_pretty(sync_distance, slot_duration)
|
|
|
|
);
|
|
|
|
|
|
|
|
let speed = speedo.slots_per_second();
|
|
|
|
let display_speed = speed.map_or(false, |speed| speed != 0.0);
|
|
|
|
|
|
|
|
if display_speed {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Downloading historical blocks";
|
|
|
|
"distance" => distance,
|
|
|
|
"speed" => sync_speed_pretty(speed),
|
2023-05-05 03:49:23 +00:00
|
|
|
"est_time" => estimated_time_pretty(speedo.estimated_time_till_slot(original_anchor_slot.unwrap_or(current_slot).saturating_sub(beacon_chain.genesis_backfill_slot))),
|
2021-09-22 00:37:28 +00:00
|
|
|
);
|
|
|
|
} else {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Downloading historical blocks";
|
|
|
|
"distance" => distance,
|
2023-05-05 03:49:23 +00:00
|
|
|
"est_time" => estimated_time_pretty(speedo.estimated_time_till_slot(original_anchor_slot.unwrap_or(current_slot).saturating_sub(beacon_chain.genesis_backfill_slot))),
|
2021-09-22 00:37:28 +00:00
|
|
|
);
|
|
|
|
}
|
|
|
|
} else if !is_backfilling && last_backfill_log_slot.is_some() {
|
|
|
|
last_backfill_log_slot = None;
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Historical block download complete";
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
2020-04-19 10:45:25 +00:00
|
|
|
// Log if we are syncing
|
2021-09-22 00:37:28 +00:00
|
|
|
if current_sync_state.is_syncing() {
|
2021-05-26 05:58:41 +00:00
|
|
|
metrics::set_gauge(&metrics::IS_SYNCED, 0);
|
2019-12-06 07:44:38 +00:00
|
|
|
let distance = format!(
|
|
|
|
"{} slots ({})",
|
2021-09-22 00:37:28 +00:00
|
|
|
sync_distance.as_u64(),
|
|
|
|
slot_distance_pretty(sync_distance, slot_duration)
|
2019-12-06 07:44:38 +00:00
|
|
|
);
|
2020-08-18 08:11:39 +00:00
|
|
|
|
|
|
|
let speed = speedo.slots_per_second();
|
|
|
|
let display_speed = speed.map_or(false, |speed| speed != 0.0);
|
|
|
|
|
|
|
|
if display_speed {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Syncing";
|
|
|
|
"peers" => peer_count_pretty(connected_peer_count),
|
|
|
|
"distance" => distance,
|
|
|
|
"speed" => sync_speed_pretty(speed),
|
|
|
|
"est_time" => estimated_time_pretty(speedo.estimated_time_till_slot(current_slot)),
|
|
|
|
);
|
|
|
|
} else {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Syncing";
|
|
|
|
"peers" => peer_count_pretty(connected_peer_count),
|
|
|
|
"distance" => distance,
|
|
|
|
"est_time" => estimated_time_pretty(speedo.estimated_time_till_slot(current_slot)),
|
|
|
|
);
|
|
|
|
}
|
2021-09-22 00:37:28 +00:00
|
|
|
} else if current_sync_state.is_synced() {
|
2021-05-26 05:58:41 +00:00
|
|
|
metrics::set_gauge(&metrics::IS_SYNCED, 1);
|
2020-07-23 14:18:00 +00:00
|
|
|
let block_info = if current_slot > head_slot {
|
|
|
|
" … empty".to_string()
|
2020-04-19 10:45:25 +00:00
|
|
|
} else {
|
2020-07-23 14:18:00 +00:00
|
|
|
head_root.to_string()
|
|
|
|
};
|
2021-10-07 11:24:57 +00:00
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
let block_hash = match beacon_chain.canonical_head.head_execution_status() {
|
|
|
|
Ok(ExecutionStatus::Irrelevant(_)) => "n/a".to_string(),
|
|
|
|
Ok(ExecutionStatus::Valid(hash)) => format!("{} (verified)", hash),
|
|
|
|
Ok(ExecutionStatus::Optimistic(hash)) => {
|
2021-10-07 11:24:57 +00:00
|
|
|
warn!(
|
|
|
|
log,
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
"Head is optimistic";
|
|
|
|
"info" => "chain not fully verified, \
|
|
|
|
block and attestation production disabled until execution engine syncs",
|
|
|
|
"execution_block_hash" => ?hash,
|
2021-10-07 11:24:57 +00:00
|
|
|
);
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
format!("{} (unverified)", hash)
|
2021-10-07 11:24:57 +00:00
|
|
|
}
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
Ok(ExecutionStatus::Invalid(hash)) => {
|
2021-10-07 11:24:57 +00:00
|
|
|
crit!(
|
|
|
|
log,
|
|
|
|
"Head execution payload is invalid";
|
|
|
|
"msg" => "this scenario may be unrecoverable",
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
"execution_block_hash" => ?hash,
|
2021-10-07 11:24:57 +00:00
|
|
|
);
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
format!("{} (invalid)", hash)
|
2021-10-07 11:24:57 +00:00
|
|
|
}
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
Err(_) => "unknown".to_string(),
|
2021-10-07 11:24:57 +00:00
|
|
|
};
|
|
|
|
|
2020-07-23 14:18:00 +00:00
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Synced";
|
|
|
|
"peers" => peer_count_pretty(connected_peer_count),
|
2021-10-07 11:24:57 +00:00
|
|
|
"exec_hash" => block_hash,
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
"finalized_root" => format!("{}", finalized_checkpoint.root),
|
|
|
|
"finalized_epoch" => finalized_checkpoint.epoch,
|
2020-07-23 14:18:00 +00:00
|
|
|
"epoch" => current_epoch,
|
|
|
|
"block" => block_info,
|
|
|
|
"slot" => current_slot,
|
|
|
|
);
|
|
|
|
} else {
|
2021-05-26 05:58:41 +00:00
|
|
|
metrics::set_gauge(&metrics::IS_SYNCED, 0);
|
2020-07-23 14:18:00 +00:00
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Searching for peers";
|
|
|
|
"peers" => peer_count_pretty(connected_peer_count),
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
"finalized_root" => format!("{}", finalized_checkpoint.root),
|
|
|
|
"finalized_epoch" => finalized_checkpoint.epoch,
|
2020-07-23 14:18:00 +00:00
|
|
|
"head_slot" => head_slot,
|
|
|
|
"current_slot" => current_slot,
|
|
|
|
);
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|
2020-11-21 00:26:15 +00:00
|
|
|
|
2020-11-30 20:29:17 +00:00
|
|
|
eth1_logging(&beacon_chain, &log);
|
2022-07-20 23:16:54 +00:00
|
|
|
merge_readiness_logging(current_slot, &beacon_chain, &log).await;
|
2023-01-31 17:26:23 +00:00
|
|
|
capella_readiness_logging(current_slot, &beacon_chain, &log).await;
|
2020-05-17 11:16:48 +00:00
|
|
|
}
|
|
|
|
};
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2020-05-17 11:16:48 +00:00
|
|
|
// run the notifier on the current executor
|
2021-02-10 23:29:49 +00:00
|
|
|
executor.spawn(interval_future, "notifier");
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2020-06-04 11:48:05 +00:00
|
|
|
Ok(())
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|
|
|
|
|
2022-07-20 23:16:54 +00:00
|
|
|
/// Provides some helpful logging to users to indicate if their node is ready for the Bellatrix
|
|
|
|
/// fork and subsequent merge transition.
|
|
|
|
async fn merge_readiness_logging<T: BeaconChainTypes>(
|
|
|
|
current_slot: Slot,
|
|
|
|
beacon_chain: &BeaconChain<T>,
|
|
|
|
log: &Logger,
|
|
|
|
) {
|
|
|
|
let merge_completed = beacon_chain
|
|
|
|
.canonical_head
|
|
|
|
.cached_head()
|
|
|
|
.snapshot
|
|
|
|
.beacon_block
|
|
|
|
.message()
|
|
|
|
.body()
|
|
|
|
.execution_payload()
|
|
|
|
.map_or(false, |payload| {
|
|
|
|
payload.parent_hash() != ExecutionBlockHash::zero()
|
|
|
|
});
|
|
|
|
|
2022-08-15 01:31:02 +00:00
|
|
|
let has_execution_layer = beacon_chain.execution_layer.is_some();
|
|
|
|
|
|
|
|
if merge_completed && has_execution_layer
|
|
|
|
|| !beacon_chain.is_time_to_prepare_for_bellatrix(current_slot)
|
|
|
|
{
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if merge_completed && !has_execution_layer {
|
2023-01-31 17:26:23 +00:00
|
|
|
if !beacon_chain.is_time_to_prepare_for_capella(current_slot) {
|
|
|
|
// logging of the EE being offline is handled in `capella_readiness_logging()`
|
|
|
|
error!(
|
|
|
|
log,
|
|
|
|
"Execution endpoint required";
|
|
|
|
"info" => "you need an execution engine to validate blocks, see: \
|
|
|
|
https://lighthouse-book.sigmaprime.io/merge-migration.html"
|
|
|
|
);
|
|
|
|
}
|
2022-07-20 23:16:54 +00:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
match beacon_chain.check_merge_readiness().await {
|
|
|
|
MergeReadiness::Ready {
|
|
|
|
config,
|
|
|
|
current_difficulty,
|
|
|
|
} => match config {
|
|
|
|
MergeConfig {
|
|
|
|
terminal_total_difficulty: Some(ttd),
|
|
|
|
terminal_block_hash: None,
|
|
|
|
terminal_block_hash_epoch: None,
|
|
|
|
} => {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Ready for the merge";
|
|
|
|
"terminal_total_difficulty" => %ttd,
|
|
|
|
"current_difficulty" => current_difficulty
|
|
|
|
.map(|d| d.to_string())
|
2022-07-21 05:45:39 +00:00
|
|
|
.unwrap_or_else(|| "??".into()),
|
2022-07-20 23:16:54 +00:00
|
|
|
)
|
|
|
|
}
|
|
|
|
MergeConfig {
|
|
|
|
terminal_total_difficulty: _,
|
|
|
|
terminal_block_hash: Some(terminal_block_hash),
|
|
|
|
terminal_block_hash_epoch: Some(terminal_block_hash_epoch),
|
|
|
|
} => {
|
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Ready for the merge";
|
|
|
|
"info" => "you are using override parameters, please ensure that you \
|
|
|
|
understand these parameters and their implications.",
|
|
|
|
"terminal_block_hash" => ?terminal_block_hash,
|
|
|
|
"terminal_block_hash_epoch" => ?terminal_block_hash_epoch,
|
|
|
|
)
|
|
|
|
}
|
|
|
|
other => error!(
|
|
|
|
log,
|
|
|
|
"Inconsistent merge configuration";
|
|
|
|
"config" => ?other
|
|
|
|
),
|
|
|
|
},
|
|
|
|
readiness @ MergeReadiness::NotSynced => warn!(
|
|
|
|
log,
|
|
|
|
"Not ready for merge";
|
|
|
|
"info" => %readiness,
|
|
|
|
),
|
|
|
|
readiness @ MergeReadiness::NoExecutionEndpoint => warn!(
|
|
|
|
log,
|
|
|
|
"Not ready for merge";
|
|
|
|
"info" => %readiness,
|
|
|
|
),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-01-31 17:26:23 +00:00
|
|
|
/// Provides some helpful logging to users to indicate if their node is ready for Capella
|
|
|
|
async fn capella_readiness_logging<T: BeaconChainTypes>(
|
|
|
|
current_slot: Slot,
|
|
|
|
beacon_chain: &BeaconChain<T>,
|
|
|
|
log: &Logger,
|
|
|
|
) {
|
|
|
|
let capella_completed = beacon_chain
|
|
|
|
.canonical_head
|
|
|
|
.cached_head()
|
|
|
|
.snapshot
|
|
|
|
.beacon_block
|
|
|
|
.message()
|
|
|
|
.body()
|
|
|
|
.execution_payload()
|
|
|
|
.map_or(false, |payload| payload.withdrawals_root().is_ok());
|
|
|
|
|
|
|
|
let has_execution_layer = beacon_chain.execution_layer.is_some();
|
|
|
|
|
|
|
|
if capella_completed && has_execution_layer
|
|
|
|
|| !beacon_chain.is_time_to_prepare_for_capella(current_slot)
|
|
|
|
{
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if capella_completed && !has_execution_layer {
|
|
|
|
error!(
|
|
|
|
log,
|
|
|
|
"Execution endpoint required";
|
|
|
|
"info" => "you need a Capella enabled execution engine to validate blocks, see: \
|
|
|
|
https://lighthouse-book.sigmaprime.io/merge-migration.html"
|
|
|
|
);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
match beacon_chain.check_capella_readiness().await {
|
|
|
|
CapellaReadiness::Ready => {
|
2023-03-17 00:44:04 +00:00
|
|
|
info!(
|
|
|
|
log,
|
|
|
|
"Ready for Capella";
|
|
|
|
"info" => "ensure the execution endpoint is updated to the latest Capella/Shanghai release"
|
|
|
|
)
|
2023-01-31 17:26:23 +00:00
|
|
|
}
|
|
|
|
readiness @ CapellaReadiness::ExchangeCapabilitiesFailed { error: _ } => {
|
|
|
|
error!(
|
|
|
|
log,
|
|
|
|
"Not ready for Capella";
|
2023-02-21 00:05:36 +00:00
|
|
|
"hint" => "the execution endpoint may be offline",
|
2023-01-31 17:26:23 +00:00
|
|
|
"info" => %readiness,
|
|
|
|
)
|
|
|
|
}
|
|
|
|
readiness => warn!(
|
|
|
|
log,
|
|
|
|
"Not ready for Capella";
|
2023-02-21 00:05:36 +00:00
|
|
|
"hint" => "try updating the execution endpoint",
|
2023-01-31 17:26:23 +00:00
|
|
|
"info" => %readiness,
|
|
|
|
),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-11-30 20:29:17 +00:00
|
|
|
fn eth1_logging<T: BeaconChainTypes>(beacon_chain: &BeaconChain<T>, log: &Logger) {
|
|
|
|
let current_slot_opt = beacon_chain.slot().ok();
|
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
// Perform some logging about the eth1 chain
|
|
|
|
if let Some(eth1_chain) = beacon_chain.eth1_chain.as_ref() {
|
|
|
|
// No need to do logging if using the dummy backend.
|
|
|
|
if eth1_chain.is_dummy_backend() {
|
|
|
|
return;
|
|
|
|
}
|
2021-10-06 08:40:55 +00:00
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
if let Some(status) = eth1_chain.sync_status(
|
|
|
|
beacon_chain.genesis_time,
|
|
|
|
current_slot_opt,
|
|
|
|
&beacon_chain.spec,
|
|
|
|
) {
|
|
|
|
debug!(
|
|
|
|
log,
|
|
|
|
"Eth1 cache sync status";
|
|
|
|
"eth1_head_block" => status.head_block_number,
|
|
|
|
"latest_cached_block_number" => status.latest_cached_block_number,
|
|
|
|
"latest_cached_timestamp" => status.latest_cached_block_timestamp,
|
|
|
|
"voting_target_timestamp" => status.voting_target_timestamp,
|
|
|
|
"ready" => status.lighthouse_is_cached_and_ready
|
|
|
|
);
|
2020-11-30 20:29:17 +00:00
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
if !status.lighthouse_is_cached_and_ready {
|
|
|
|
let voting_target_timestamp = status.voting_target_timestamp;
|
2020-11-30 20:29:17 +00:00
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
let distance = status
|
|
|
|
.latest_cached_block_timestamp
|
|
|
|
.map(|latest| {
|
|
|
|
voting_target_timestamp.saturating_sub(latest)
|
|
|
|
/ beacon_chain.spec.seconds_per_eth1_block
|
|
|
|
})
|
|
|
|
.map(|distance| distance.to_string())
|
|
|
|
.unwrap_or_else(|| "initializing deposits".to_string());
|
2020-11-30 20:29:17 +00:00
|
|
|
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
warn!(
|
2020-11-30 20:29:17 +00:00
|
|
|
log,
|
2022-08-01 07:20:43 +00:00
|
|
|
"Syncing deposit contract block cache";
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
"est_blocks_remaining" => distance,
|
2020-11-30 20:29:17 +00:00
|
|
|
);
|
|
|
|
}
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
} else {
|
|
|
|
error!(
|
|
|
|
log,
|
2022-08-01 07:20:43 +00:00
|
|
|
"Unable to determine deposit contract sync status";
|
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
|
|
|
);
|
2020-11-30 20:29:17 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-12-06 23:11:31 +00:00
|
|
|
/// Returns the peer count, returning something helpful if it's `usize::max_value` (effectively a
|
|
|
|
/// `None` value).
|
|
|
|
fn peer_count_pretty(peer_count: usize) -> String {
|
|
|
|
if peer_count == usize::max_value() {
|
|
|
|
String::from("--")
|
|
|
|
} else {
|
|
|
|
format!("{}", peer_count)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-12-09 06:23:43 +00:00
|
|
|
/// Returns a nicely formatted string describing the rate of slot imports per second.
|
|
|
|
fn sync_speed_pretty(slots_per_second: Option<f64>) -> String {
|
|
|
|
if let Some(slots_per_second) = slots_per_second {
|
|
|
|
format!("{:.2} slots/sec", slots_per_second)
|
|
|
|
} else {
|
|
|
|
"--".into()
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|
2019-12-09 06:23:43 +00:00
|
|
|
}
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2019-12-09 06:23:43 +00:00
|
|
|
/// Returns a nicely formatted string how long will we reach the target slot.
|
|
|
|
fn estimated_time_pretty(seconds_till_slot: Option<f64>) -> String {
|
|
|
|
if let Some(seconds_till_slot) = seconds_till_slot {
|
|
|
|
seconds_pretty(seconds_till_slot)
|
2019-12-06 07:44:38 +00:00
|
|
|
} else {
|
2019-12-09 06:23:43 +00:00
|
|
|
"--".into()
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Returns a nicely formatted string describing the `slot_span` in terms of weeks, days, hours
|
|
|
|
/// and/or minutes.
|
|
|
|
fn slot_distance_pretty(slot_span: Slot, slot_duration: Duration) -> String {
|
|
|
|
if slot_duration == Duration::from_secs(0) {
|
|
|
|
return String::from("Unknown");
|
|
|
|
}
|
|
|
|
|
|
|
|
let secs = (slot_duration * slot_span.as_u64() as u32).as_secs();
|
2019-12-09 06:23:43 +00:00
|
|
|
seconds_pretty(secs as f64)
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Returns a nicely formatted string describing the `slot_span` in terms of weeks, days, hours
|
|
|
|
/// and/or minutes.
|
|
|
|
fn seconds_pretty(secs: f64) -> String {
|
|
|
|
if secs <= 0.0 {
|
|
|
|
return "--".into();
|
|
|
|
}
|
2019-12-06 07:44:38 +00:00
|
|
|
|
2020-05-26 03:25:52 +00:00
|
|
|
let d = time::Duration::seconds_f64(secs);
|
|
|
|
|
|
|
|
let weeks = d.whole_weeks();
|
|
|
|
let days = d.whole_days();
|
|
|
|
let hours = d.whole_hours();
|
|
|
|
let minutes = d.whole_minutes();
|
|
|
|
|
2020-08-18 08:11:39 +00:00
|
|
|
let week_string = if weeks == 1 { "week" } else { "weeks" };
|
|
|
|
let day_string = if days == 1 { "day" } else { "days" };
|
|
|
|
let hour_string = if hours == 1 { "hr" } else { "hrs" };
|
|
|
|
let min_string = if minutes == 1 { "min" } else { "mins" };
|
|
|
|
|
2020-05-26 03:25:52 +00:00
|
|
|
if weeks > 0 {
|
2020-08-18 08:11:39 +00:00
|
|
|
format!(
|
|
|
|
"{:.0} {} {:.0} {}",
|
|
|
|
weeks,
|
|
|
|
week_string,
|
|
|
|
days % DAYS_PER_WEEK,
|
|
|
|
day_string
|
|
|
|
)
|
2020-05-26 03:25:52 +00:00
|
|
|
} else if days > 0 {
|
2020-08-18 08:11:39 +00:00
|
|
|
format!(
|
|
|
|
"{:.0} {} {:.0} {}",
|
|
|
|
days,
|
|
|
|
day_string,
|
|
|
|
hours % HOURS_PER_DAY,
|
|
|
|
hour_string
|
|
|
|
)
|
2020-05-26 03:25:52 +00:00
|
|
|
} else if hours > 0 {
|
2020-08-18 08:11:39 +00:00
|
|
|
format!(
|
|
|
|
"{:.0} {} {:.0} {}",
|
|
|
|
hours,
|
|
|
|
hour_string,
|
|
|
|
minutes % MINUTES_PER_HOUR,
|
|
|
|
min_string
|
|
|
|
)
|
2019-12-06 07:44:38 +00:00
|
|
|
} else {
|
2020-08-18 08:11:39 +00:00
|
|
|
format!("{:.0} {}", minutes, min_string)
|
2019-12-09 06:23:43 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/// "Speedo" is Australian for speedometer. This struct observes syncing times.
|
|
|
|
#[derive(Default)]
|
|
|
|
pub struct Speedo(Vec<(Slot, Instant)>);
|
|
|
|
|
|
|
|
impl Speedo {
|
|
|
|
/// Observe that we were at some `slot` at the given `instant`.
|
|
|
|
pub fn observe(&mut self, slot: Slot, instant: Instant) {
|
|
|
|
if self.0.len() > SPEEDO_OBSERVATIONS {
|
|
|
|
self.0.remove(0);
|
|
|
|
}
|
|
|
|
|
|
|
|
self.0.push((slot, instant));
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Returns the average of the speeds between each observation.
|
|
|
|
///
|
|
|
|
/// Does not gracefully handle slots that are above `u32::max_value()`.
|
|
|
|
pub fn slots_per_second(&self) -> Option<f64> {
|
|
|
|
let speeds = self
|
|
|
|
.0
|
|
|
|
.windows(2)
|
|
|
|
.filter_map(|windows| {
|
|
|
|
let (slot_a, instant_a) = windows[0];
|
|
|
|
let (slot_b, instant_b) = windows[1];
|
|
|
|
|
|
|
|
// Taking advantage of saturating subtraction on `Slot`.
|
|
|
|
let distance = f64::from((slot_b - slot_a).as_u64() as u32);
|
|
|
|
|
|
|
|
let seconds = f64::from((instant_b - instant_a).as_millis() as u32) / 1_000.0;
|
|
|
|
|
|
|
|
if seconds > 0.0 {
|
|
|
|
Some(distance / seconds)
|
|
|
|
} else {
|
|
|
|
None
|
|
|
|
}
|
|
|
|
})
|
|
|
|
.collect::<Vec<f64>>();
|
|
|
|
|
|
|
|
let count = speeds.len();
|
|
|
|
let sum: f64 = speeds.iter().sum();
|
|
|
|
|
|
|
|
if count > 0 {
|
|
|
|
Some(sum / f64::from(count as u32))
|
|
|
|
} else {
|
|
|
|
None
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Returns the time we should reach the given `slot`, judging by the latest observation and
|
|
|
|
/// historical average syncing time.
|
|
|
|
///
|
|
|
|
/// Returns `None` if the slot is prior to our latest observed slot or we have not made any
|
|
|
|
/// observations.
|
|
|
|
pub fn estimated_time_till_slot(&self, target_slot: Slot) -> Option<f64> {
|
|
|
|
let (prev_slot, _) = self.0.last()?;
|
|
|
|
let slots_per_second = self.slots_per_second()?;
|
|
|
|
|
2019-12-11 00:02:54 +00:00
|
|
|
if target_slot > *prev_slot && slots_per_second > 0.0 {
|
2019-12-09 06:23:43 +00:00
|
|
|
let distance = (target_slot - *prev_slot).as_u64() as f64;
|
|
|
|
Some(distance / slots_per_second)
|
|
|
|
} else {
|
|
|
|
None
|
|
|
|
}
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|
2021-09-22 00:37:28 +00:00
|
|
|
|
|
|
|
/// Clears all past observations to be used for an alternative sync (i.e backfill sync).
|
|
|
|
pub fn clear(&mut self) {
|
|
|
|
self.0.clear()
|
|
|
|
}
|
2019-12-06 07:44:38 +00:00
|
|
|
}
|