lighthouse/beacon_node/beacon_chain/src/state_advance_timer.rs

472 lines
17 KiB
Rust
Raw Normal View History

Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
//! Provides a timer which runs in the tail-end of each slot and maybe advances the state of the
//! head block forward a single slot.
//!
//! This provides an optimization with the following benefits:
//!
//! 1. Removes the burden of a single, mandatory `per_slot_processing` call from the leading-edge of
//! block processing. This helps import blocks faster.
//! 2. Allows the node to learn of the shuffling for the next epoch, before the first block from
//! that epoch has arrived. This helps reduce gossip block propagation times.
//!
//! The downsides to this optimization are:
//!
//! 1. We are required to store an additional `BeaconState` for the head block. This consumes
//! memory.
//! 2. There's a possibility that the head block is never built upon, causing wasted CPU cycles.
use crate::validator_monitor::HISTORIC_EPOCHS as VALIDATOR_MONITOR_HISTORIC_EPOCHS;
use crate::{
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
beacon_chain::{ATTESTATION_CACHE_LOCK_TIMEOUT, BLOCK_PROCESSING_CACHE_LOCK_TIMEOUT},
snapshot_cache::StateAdvance,
BeaconChain, BeaconChainError, BeaconChainTypes,
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
};
use slog::{debug, error, warn, Logger};
use slot_clock::SlotClock;
use state_processing::per_slot_processing;
use std::sync::{
atomic::{AtomicBool, Ordering},
Arc,
};
use task_executor::TaskExecutor;
Run fork choice before block proposal (#3168) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues :cry: ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
use tokio::time::{sleep, sleep_until, Instant};
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
use types::{AttestationShufflingId, EthSpec, Hash256, RelativeEpoch, Slot};
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
/// If the head slot is more than `MAX_ADVANCE_DISTANCE` from the current slot, then don't perform
/// the state advancement.
///
/// This avoids doing unnecessary work whilst the node is syncing or has perhaps been put to sleep
/// for some period of time.
const MAX_ADVANCE_DISTANCE: u64 = 4;
Avoid parallel fork choice runs during sync (#3217) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ :flushed:. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
2022-05-25 03:27:30 +00:00
/// Similarly for fork choice: avoid the fork choice lookahead during sync.
///
/// The value is set to 256 since this would be just over one slot (12.8s) when syncing at
/// 20 slots/second. Having a single fork-choice run interrupt syncing would have very little
/// impact whilst having 8 epochs without a block is a comfortable grace period.
const MAX_FORK_CHOICE_DISTANCE: u64 = 256;
Avoid parallel fork choice runs during sync (#3217) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ :flushed:. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
2022-05-25 03:27:30 +00:00
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
#[derive(Debug)]
enum Error {
BeaconChain(BeaconChainError),
HeadMissingFromSnapshotCache(Hash256),
MaxDistanceExceeded {
current_slot: Slot,
head_slot: Slot,
},
StateAlreadyAdvanced {
block_root: Hash256,
},
BadStateSlot {
_state_slot: Slot,
_block_slot: Slot,
},
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
}
impl From<BeaconChainError> for Error {
fn from(e: BeaconChainError) -> Self {
Self::BeaconChain(e)
}
}
/// Provides a simple thread-safe lock to be used for task co-ordination. Practically equivalent to
/// `Mutex<()>`.
#[derive(Clone)]
struct Lock(Arc<AtomicBool>);
impl Lock {
/// Instantiate an unlocked self.
pub fn new() -> Self {
Self(Arc::new(AtomicBool::new(false)))
}
/// Lock self, returning `true` if the lock was already set.
pub fn lock(&self) -> bool {
self.0.fetch_or(true, Ordering::SeqCst)
}
/// Unlock self.
pub fn unlock(&self) {
self.0.store(false, Ordering::SeqCst);
}
}
/// Spawns the timer described in the module-level documentation.
pub fn spawn_state_advance_timer<T: BeaconChainTypes>(
executor: TaskExecutor,
beacon_chain: Arc<BeaconChain<T>>,
log: Logger,
) {
executor.spawn(
state_advance_timer(executor.clone(), beacon_chain, log),
"state_advance_timer",
);
}
/// Provides the timer described in the module-level documentation.
async fn state_advance_timer<T: BeaconChainTypes>(
executor: TaskExecutor,
beacon_chain: Arc<BeaconChain<T>>,
log: Logger,
) {
let is_running = Lock::new();
let slot_clock = &beacon_chain.slot_clock;
let slot_duration = slot_clock.slot_duration();
loop {
Run fork choice before block proposal (#3168) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues :cry: ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
let duration_to_next_slot = match beacon_chain.slot_clock.duration_to_next_slot() {
Some(duration) => duration,
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
None => {
error!(log, "Failed to read slot clock");
// If we can't read the slot clock, just wait another slot.
sleep(slot_duration).await;
continue;
}
};
Run fork choice before block proposal (#3168) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues :cry: ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
// Run the state advance 3/4 of the way through the slot (9s on mainnet).
let state_advance_offset = slot_duration / 4;
let state_advance_instant = if duration_to_next_slot > state_advance_offset {
Instant::now() + duration_to_next_slot - state_advance_offset
} else {
// Skip the state advance for the current slot and wait until the next one.
Instant::now() + duration_to_next_slot + slot_duration - state_advance_offset
};
// Run fork choice 23/24s of the way through the slot (11.5s on mainnet).
// We need to run after the state advance, so use the same condition as above.
let fork_choice_offset = slot_duration / 24;
let fork_choice_instant = if duration_to_next_slot > state_advance_offset {
Instant::now() + duration_to_next_slot - fork_choice_offset
} else {
Instant::now() + duration_to_next_slot + slot_duration - fork_choice_offset
};
// Wait for the state advance.
sleep_until(state_advance_instant).await;
// Compute the current slot here at approx 3/4 through the slot. Even though this slot is
// only used by fork choice we need to calculate it here rather than after the state
// advance, in case the state advance flows over into the next slot.
let current_slot = match beacon_chain.slot() {
Ok(slot) => slot,
Err(e) => {
warn!(
log,
"Unable to determine slot in state advance timer";
"error" => ?e
);
// If we can't read the slot clock, just wait another slot.
sleep(slot_duration).await;
continue;
}
};
// Only spawn the state advance task if the lock was previously free.
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
if !is_running.lock() {
let log = log.clone();
let beacon_chain = beacon_chain.clone();
let is_running = is_running.clone();
executor.spawn_blocking(
move || {
match advance_head(&beacon_chain, &log) {
Ok(()) => (),
Err(Error::BeaconChain(e)) => error!(
log,
"Failed to advance head state";
"error" => ?e
),
Err(Error::StateAlreadyAdvanced { block_root }) => debug!(
log,
"State already advanced on slot";
"block_root" => ?block_root
),
Err(Error::MaxDistanceExceeded {
current_slot,
head_slot,
}) => debug!(
log,
"Refused to advance head state";
"head_slot" => head_slot,
"current_slot" => current_slot,
),
other => warn!(
log,
"Did not advance head state";
"reason" => ?other
),
};
// Permit this blocking task to spawn again, next time the timer fires.
is_running.unlock();
},
"state_advance_blocking",
);
} else {
warn!(
log,
"State advance routine overloaded";
"msg" => "system resources may be overloaded"
)
}
Run fork choice before block proposal (#3168) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues :cry: ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
// Run fork choice pre-emptively for the next slot. This processes most of the attestations
// from this slot off the hot path of block verification and production.
// Wait for the fork choice instant (which may already be past).
sleep_until(fork_choice_instant).await;
let log = log.clone();
let beacon_chain = beacon_chain.clone();
let next_slot = current_slot + 1;
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
executor.spawn(
async move {
Avoid parallel fork choice runs during sync (#3217) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ :flushed:. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
2022-05-25 03:27:30 +00:00
// Don't run fork choice during sync.
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
if beacon_chain.best_slot() + MAX_FORK_CHOICE_DISTANCE < current_slot {
Avoid parallel fork choice runs during sync (#3217) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ :flushed:. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
2022-05-25 03:27:30 +00:00
return;
}
beacon_chain.recompute_head_at_slot(next_slot).await;
Run fork choice before block proposal (#3168) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues :cry: ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
// Use a blocking task to avoid blocking the core executor whilst waiting for locks
// in `ForkChoiceSignalTx`.
beacon_chain.task_executor.clone().spawn_blocking(
move || {
// Signal block proposal for the next slot (if it happens to be waiting).
if let Some(tx) = &beacon_chain.fork_choice_signal_tx {
if let Err(e) = tx.notify_fork_choice_complete(next_slot) {
warn!(
log,
"Error signalling fork choice waiter";
"error" => ?e,
"slot" => next_slot,
);
}
}
},
"fork_choice_advance_signal_tx",
);
Run fork choice before block proposal (#3168) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues :cry: ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
},
"fork_choice_advance",
);
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
}
}
/// Reads the `snapshot_cache` from the `beacon_chain` and attempts to take a clone of the
/// `BeaconState` of the head block. If it obtains this clone, the state will be advanced a single
/// slot then placed back in the `snapshot_cache` to be used for block verification.
///
/// See the module-level documentation for rationale.
fn advance_head<T: BeaconChainTypes>(
beacon_chain: &Arc<BeaconChain<T>>,
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
log: &Logger,
) -> Result<(), Error> {
let current_slot = beacon_chain.slot()?;
// These brackets ensure that the `head_slot` value is dropped before we run fork choice and
// potentially invalidate it.
//
// Fork-choice is not run *before* this function to avoid unnecessary calls whilst syncing.
{
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
let head_slot = beacon_chain.best_slot();
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
// Don't run this when syncing or if lagging too far behind.
if head_slot + MAX_ADVANCE_DISTANCE < current_slot {
return Err(Error::MaxDistanceExceeded {
current_slot,
head_slot,
});
}
}
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
let head_root = beacon_chain.head_beacon_block_root();
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
let (head_slot, head_state_root, mut state) = match beacon_chain
.snapshot_cache
.try_write_for(BLOCK_PROCESSING_CACHE_LOCK_TIMEOUT)
.ok_or(BeaconChainError::SnapshotCacheLockTimeout)?
.get_for_state_advance(head_root)
{
StateAdvance::AlreadyAdvanced => {
return Err(Error::StateAlreadyAdvanced {
block_root: head_root,
})
}
StateAdvance::BlockNotFound => return Err(Error::HeadMissingFromSnapshotCache(head_root)),
StateAdvance::State {
state,
state_root,
block_slot,
} => (block_slot, state_root, *state),
};
let initial_slot = state.slot();
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
let initial_epoch = state.current_epoch();
let state_root = if state.slot() == head_slot {
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
Some(head_state_root)
} else {
// Protect against advancing a state more than a single slot.
//
// Advancing more than one slot without storing the intermediate state would corrupt the
// database. Future works might store temporary, intermediate states inside this function.
return Err(Error::BadStateSlot {
_block_slot: head_slot,
_state_slot: state.slot(),
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
});
};
// Advance the state a single slot.
if let Some(summary) = per_slot_processing(&mut state, state_root, &beacon_chain.spec)
.map_err(BeaconChainError::from)?
{
Cache participating indices for Altair epoch processing (#2416) ## Issue Addressed NA ## Proposed Changes This PR addresses two things: 1. Allows the `ValidatorMonitor` to work with Altair states. 1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description) ## Breaking Changes The breaking changes in this PR revolve around one premise: *After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.* Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification. For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations. The following section details where we've removed fields and provides replacement values. ### Breaking Changes: Prometheus Metrics Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana). The following metrics were added as replacements: - `beacon_participation_prev_epoch_head_attesting_gwei_total` - `beacon_participation_prev_epoch_target_attesting_gwei_total` - `beacon_participation_prev_epoch_source_attesting_gwei_total` - `beacon_participation_prev_epoch_active_gwei_total` The following metrics were removed: - `beacon_participation_prev_epoch_attester` - instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_target_attester` - instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_head_attester` - instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`. ### Breaking Changes: HTTP API The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields: - `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead) - `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead) The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields: - `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead) - `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead) - `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`. - `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`. ## Additional Info NA ## TODO - [x] Deal with total balances - [x] Update validator_inclusion API - [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-27 07:01:01 +00:00
// Expose Prometheus metrics.
if let Err(e) = summary.observe_metrics() {
error!(
log,
"Failed to observe epoch summary metrics";
"src" => "state_advance_timer",
"error" => ?e
);
}
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
// Only notify the validator monitor for recent blocks.
if state.current_epoch() + VALIDATOR_MONITOR_HISTORIC_EPOCHS as u64
>= current_slot.epoch(T::EthSpec::slots_per_epoch())
{
// Potentially create logs/metrics for locally monitored validators.
Cache participating indices for Altair epoch processing (#2416) ## Issue Addressed NA ## Proposed Changes This PR addresses two things: 1. Allows the `ValidatorMonitor` to work with Altair states. 1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description) ## Breaking Changes The breaking changes in this PR revolve around one premise: *After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.* Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification. For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations. The following section details where we've removed fields and provides replacement values. ### Breaking Changes: Prometheus Metrics Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana). The following metrics were added as replacements: - `beacon_participation_prev_epoch_head_attesting_gwei_total` - `beacon_participation_prev_epoch_target_attesting_gwei_total` - `beacon_participation_prev_epoch_source_attesting_gwei_total` - `beacon_participation_prev_epoch_active_gwei_total` The following metrics were removed: - `beacon_participation_prev_epoch_attester` - instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_target_attester` - instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_head_attester` - instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`. ### Breaking Changes: HTTP API The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields: - `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead) - `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead) The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields: - `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead) - `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead) - `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`. - `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`. ## Additional Info NA ## TODO - [x] Deal with total balances - [x] Update validator_inclusion API - [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-27 07:01:01 +00:00
if let Err(e) = beacon_chain
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
.validator_monitor
.read()
Cache participating indices for Altair epoch processing (#2416) ## Issue Addressed NA ## Proposed Changes This PR addresses two things: 1. Allows the `ValidatorMonitor` to work with Altair states. 1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description) ## Breaking Changes The breaking changes in this PR revolve around one premise: *After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.* Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification. For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations. The following section details where we've removed fields and provides replacement values. ### Breaking Changes: Prometheus Metrics Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana). The following metrics were added as replacements: - `beacon_participation_prev_epoch_head_attesting_gwei_total` - `beacon_participation_prev_epoch_target_attesting_gwei_total` - `beacon_participation_prev_epoch_source_attesting_gwei_total` - `beacon_participation_prev_epoch_active_gwei_total` The following metrics were removed: - `beacon_participation_prev_epoch_attester` - instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_target_attester` - instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. - `beacon_participation_prev_epoch_head_attester` - instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`. The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`. ### Breaking Changes: HTTP API The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields: - `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead) - `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead) The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields: - `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead) - `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead) - `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`. - `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`. ## Additional Info NA ## TODO - [x] Deal with total balances - [x] Update validator_inclusion API - [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-27 07:01:01 +00:00
.process_validator_statuses(state.current_epoch(), &summary, &beacon_chain.spec)
{
error!(
log,
"Unable to process validator statuses";
"error" => ?e
);
}
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
}
}
debug!(
log,
"Advanced head state one slot";
"head_root" => ?head_root,
"state_slot" => state.slot(),
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
"current_slot" => current_slot,
);
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
// Build the current epoch cache, to prepare to compute proposer duties.
state
.build_committee_cache(RelativeEpoch::Current, &beacon_chain.spec)
.map_err(BeaconChainError::from)?;
// Build the next epoch cache, to prepare to compute attester duties.
state
.build_committee_cache(RelativeEpoch::Next, &beacon_chain.spec)
.map_err(BeaconChainError::from)?;
// If the `pre_state` is in a later epoch than `state`, pre-emptively add the proposer shuffling
// for the state's current epoch and the committee cache for the state's next epoch.
if initial_epoch < state.current_epoch() {
// Update the proposer cache.
//
// We supply the `head_root` as the decision block since the prior `if` statement guarantees
// the head root is the latest block from the prior epoch.
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
beacon_chain
.beacon_proposer_cache
.lock()
.insert(
state.current_epoch(),
head_root,
state
.get_beacon_proposer_indices(&beacon_chain.spec)
.map_err(BeaconChainError::from)?,
state.fork(),
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
)
.map_err(BeaconChainError::from)?;
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
// Update the attester cache.
let shuffling_id = AttestationShufflingId::new(head_root, &state, RelativeEpoch::Next)
.map_err(BeaconChainError::from)?;
let committee_cache = state
.committee_cache(RelativeEpoch::Next)
.map_err(BeaconChainError::from)?;
beacon_chain
.shuffling_cache
.try_write_for(ATTESTATION_CACHE_LOCK_TIMEOUT)
.ok_or(BeaconChainError::AttestationCacheLockTimeout)?
Avoid duplicate committee cache loads (#3574) ## Issue Addressed NA ## Proposed Changes I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM. This PR modifies the shuffling cache so that each entry can be either: - A committee - A promise for a committee (i.e., a `crossbeam_channel::Receiver`) Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following: 1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock. 1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise. 1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation. 1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation. In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated. ## Additional Info NA
2022-09-16 08:54:03 +00:00
.insert_committee_cache(shuffling_id.clone(), committee_cache);
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
debug!(
log,
"Primed proposer and attester caches";
"head_root" => ?head_root,
"next_epoch_shuffling_root" => ?shuffling_id.shuffling_decision_block,
"state_epoch" => state.current_epoch(),
"current_epoch" => current_slot.epoch(T::EthSpec::slots_per_epoch()),
);
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
}
// Apply the state to the attester cache, if the cache deems it interesting.
beacon_chain
.attester_cache
.maybe_cache_state(&state, head_root, &beacon_chain.spec)
.map_err(BeaconChainError::from)?;
let final_slot = state.slot();
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
// Insert the advanced state back into the snapshot cache.
beacon_chain
.snapshot_cache
.try_write_for(BLOCK_PROCESSING_CACHE_LOCK_TIMEOUT)
.ok_or(BeaconChainError::SnapshotCacheLockTimeout)?
.update_pre_state(head_root, state)
.ok_or(Error::HeadMissingFromSnapshotCache(head_root))?;
// If we have moved into the next slot whilst processing the state then this function is going
// to become ineffective and likely become a hindrance as we're stealing the tree hash cache
// from the snapshot cache (which may force the next block to rebuild a new one).
//
// If this warning occurs very frequently on well-resourced machines then we should consider
// starting it earlier in the slot. Otherwise, it's a good indication that the machine is too
// slow/overloaded and will be useful information for the user.
let starting_slot = current_slot;
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
let current_slot = beacon_chain.slot()?;
if starting_slot < current_slot {
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
warn!(
log,
"State advance too slow";
"head_root" => %head_root,
"advanced_slot" => final_slot,
"current_slot" => current_slot,
"starting_slot" => starting_slot,
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
"msg" => "system resources may be overloaded",
);
}
debug!(
log,
"Completed state advance";
"head_root" => ?head_root,
"advanced_slot" => final_slot,
"initial_slot" => initial_slot,
);
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn lock() {
let lock = Lock::new();
assert!(!lock.lock());
assert!(lock.lock());
assert!(lock.lock());
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
lock.unlock();
assert!(!lock.lock());
assert!(lock.lock());
Advance state to next slot after importing block (#2174) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
}
}