lighthouse/beacon_node/beacon_chain/tests/attestation_production.rs

249 lines
8.2 KiB
Rust
Raw Normal View History

Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
#![cfg(not(debug_assertions))]
use beacon_chain::block_verification_types::RpcBlock;
Rework block processing (#4092) * introduce availability pending block * add intoavailableblock trait * small fixes * add 'gossip blob cache' and start to clean up processing and transition types * shard memory blob cache * Initial commit * Fix after rebase * Add gossip verification conditions * cache cleanup * general chaos * extended chaos * cargo fmt * more progress * more progress * tons of changes, just tryna compile * everything, everywhere, all at once * Reprocess an ExecutedBlock on unavailable blobs * Add sus gossip verification for blobs * Merge stuff * Remove reprocessing cache stuff * lint * Add a wrapper to allow construction of only valid `AvailableBlock`s * rename blob arc list to blob list * merge cleanuo * Revert "merge cleanuo" This reverts commit 5e98326878c77528d0c4668c5a4db4a4b0fbaeaa. * Revert "Revert "merge cleanuo"" This reverts commit 3a4009443a5812b3028abe855079307436dc5419. * fix rpc methods * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * Partial processing (#4) * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * cargo update (#6) --------- Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: realbigsean <seananderson33@gmail.com>
2023-03-24 21:30:41 +00:00
use beacon_chain::test_utils::{AttestationStrategy, BeaconChainHarness, BlockStrategy};
use beacon_chain::{StateSkipConfig, WhenSlotSkipped};
use lazy_static::lazy_static;
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
use std::sync::Arc;
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
use tree_hash::TreeHash;
2020-03-04 23:35:39 +00:00
use types::{AggregateSignature, EthSpec, Keypair, MainnetEthSpec, RelativeEpoch, Slot};
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
pub const VALIDATOR_COUNT: usize = 16;
lazy_static! {
/// A cached set of keys.
static ref KEYPAIRS: Vec<Keypair> = types::test_utils::generate_deterministic_keypairs(VALIDATOR_COUNT);
}
/// This test builds a chain that is just long enough to finalize an epoch then it produces an
/// attestation at each slot from genesis through to three epochs past the head.
///
/// It checks the produced attestation against some locally computed values.
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
#[tokio::test]
async fn produces_attestations() {
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
let num_blocks_produced = MainnetEthSpec::slots_per_epoch() * 4;
Disallow attestation production earlier than head (#2130) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
let additional_slots_tested = MainnetEthSpec::slots_per_epoch() * 3;
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
let harness = BeaconChainHarness::builder(MainnetEthSpec)
.default_spec()
.keypairs(KEYPAIRS[..].to_vec())
.fresh_ephemeral_store()
.mock_execution_layer()
.build();
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
let chain = &harness.chain;
Disallow attestation production earlier than head (#2130) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
// Test all valid committee indices for all slots in the chain.
// for slot in 0..=current_slot.as_u64() + MainnetEthSpec::slots_per_epoch() * 3 {
for slot in 0..=num_blocks_produced + additional_slots_tested {
if slot > 0 && slot <= num_blocks_produced {
harness.advance_slot();
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
harness
.extend_chain(
1,
BlockStrategy::OnCanonicalHead,
AttestationStrategy::AllValidators,
)
.await;
Disallow attestation production earlier than head (#2130) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
}
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
let slot = Slot::from(slot);
Fix head tracker concurrency bugs (#1771) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.
2020-10-19 05:58:39 +00:00
let mut state = chain
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
.state_at_slot(slot, StateSkipConfig::WithStateRoots)
.expect("should get state");
Disallow attestation production earlier than head (#2130) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
let block_slot = if slot <= num_blocks_produced {
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
slot
Disallow attestation production earlier than head (#2130) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
} else {
Slot::from(num_blocks_produced)
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
};
Disallow attestation production earlier than head (#2130) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
Separate execution payloads in the DB (#3157) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. :warning: **This is achieved in a backwards-incompatible way for networks that have already merged** :warning:. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-05-12 00:42:17 +00:00
let blinded_block = chain
Use the forwards iterator more often (#2376) ## Issue Addressed NA ## Primary Change When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) which seemed to be ever-lasting. After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched the `BeaconChain::block_root_at_slot` to use the forwards iterator, instead of the reverse one. I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function. Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before it felt like to took 1-2s, now it feels instant). ## Additional Changes In the process I also noticed that we have two functions for getting block roots: - `BeaconChain::block_root_at_slot`: returns `None` for a skip slot. - `BeaconChain::root_at_slot`: returns the previous root for a skip slot. I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way. Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it :wastebasket:. ## Additional Info I haven't also done the same for state roots here. Whilst it's possible and a good idea, it's more work since the fwds iterators are presently block-roots-specific. Whilst there's a few places a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're no where near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here https://github.com/sigp/lighthouse/issues/2377.
2021-05-31 04:18:20 +00:00
.block_at_slot(block_slot, WhenSlotSkipped::Prev)
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
.expect("should get block")
.expect("block should not be skipped");
Separate execution payloads in the DB (#3157) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. :warning: **This is achieved in a backwards-incompatible way for networks that have already merged** :warning:. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-05-12 00:42:17 +00:00
let block_root = blinded_block.message().tree_hash_root();
let block = chain
.store
.make_full_block(&block_root, blinded_block)
.unwrap();
Devnet6 (#4404) * some blob reprocessing work * remove ForceBlockLookup * reorder enum match arms in sync manager * a lot more reprocessing work * impl logic for triggerng blob lookups along with block lookups * deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker. * make single block lookup generic * more work * add delayed processing logic and combine some requests * start fixing some compile errors * fix compilation in main block lookup mod * much work * get things compiling * parent blob lookups * fix compile * revert red/stevie changes * fix up sync manager delay message logic * add peer usefulness enum * should remove lookup refactor * consolidate retry error handling * improve peer scoring during certain failures in parent lookups * improve retry code * drop parent lookup if either req has a peer disconnect during download * refactor single block processed method * processing peer refactor * smol bugfix * fix some todos * fix lints * fix lints * fix compile in lookup tests * fix lints * fix lints * fix existing block lookup tests * renamings * fix after merge * cargo fmt * compilation fix in beacon chain tests * fix * refactor lookup tests to work with multiple forks and response types * make tests into macros * wrap availability check error * fix compile after merge * add random blobs * start fixing up lookup verify error handling * some bug fixes and the start of deneb only tests * make tests work for all forks * track information about peer source * error refactoring * improve peer scoring * fix test compilation * make sure blobs are sent for processing after stream termination, delete copied tests * add some tests and fix a bug * smol bugfixes and moar tests * add tests and fix some things * compile after merge * lots of refactoring * retry on invalid block/blob * merge unknown parent messages before current slot lookup * get tests compiling * penalize blob peer on invalid blobs * Check disk on in-memory cache miss * Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs * Update beacon_node/network/src/sync/network_context.rs Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com> * fix bug in matching blocks and blobs in range sync * pr feedback * fix conflicts * upgrade logs from warn to crit when we receive incorrect response in range * synced_and_connected_within_tolerance -> should_search_for_block * remove todo * add data gas used and update excess data gas to u64 * Fix Broken Overflow Tests * payload verification with commitments * fix merge conflicts * restore payload file * Restore payload file * remove todo * add max blob commitments per block * c-kzg lib update * Fix ef tests * Abstract over minimal/mainnet spec in kzg crate * Start integrating new KZG * checkpoint sync without alignment * checkpoint sync without alignment * add import * add import * query for checkpoint state by slot rather than state root (teku doesn't serve by state root) * query for checkpoint state by slot rather than state root (teku doesn't serve by state root) * loosen check * get state first and query by most recent block root * Revert "loosen check" This reverts commit 069d13dd63aa794a3505db9f17bd1a6b73f0be81. * get state first and query by most recent block root * merge max blobs change * simplify delay logic * rename unknown parent sync message variants * rename parameter, block_slot -> slot * add some docs to the lookup module * use interval instead of sleep * drop request if blocks and blobs requests both return `None` for `Id` * clean up `find_single_lookup` logic * add lookup source enum * clean up `find_single_lookup` logic * add docs to find_single_lookup_request * move LookupSource our of param where unnecessary * remove unnecessary todo * query for block by `state.latest_block_header.slot` * fix lint * fix merge transition ef tests * fix test * fix test * fix observed blob sidecars test * Add some metrics (#33) * fix protocol limits for blobs by root * Update Engine API for 1:1 Structure Method * make beacon chain tests to fix devnet 6 changes * get ckzg working and fix some tests * fix remaining tests * fix lints * Fix KZG linking issues * remove unused dep * lockfile * test fixes * remove dbgs * remove unwrap * cleanup tx generator * small fixes * fixing fixes * more self reivew * more self review * refactor genesis header initialization * refactor mock el instantiations * fix compile * fix network test, make sure they run for each fork * pr feedback * fix last test (hopefully) --------- Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> Co-authored-by: Mark Mackey <mark@sigmaprime.io> Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-06-29 19:35:43 +00:00
let blobs = chain.get_blobs(&block_root).unwrap();
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
let epoch_boundary_slot = state
.current_epoch()
.start_slot(MainnetEthSpec::slots_per_epoch());
let target_root = if state.slot() == epoch_boundary_slot {
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
block_root
} else {
*state
.get_block_root(epoch_boundary_slot)
.expect("should get target block root")
};
Fix head tracker concurrency bugs (#1771) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.
2020-10-19 05:58:39 +00:00
state
.build_committee_cache(RelativeEpoch::Current, &harness.chain.spec)
.unwrap();
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
let committee_cache = state
.committee_cache(RelativeEpoch::Current)
.expect("should get committee_cache");
let committee_count = committee_cache.committees_per_slot();
for index in 0..committee_count {
let committee_len = committee_cache
.get_beacon_committee(slot, index)
.expect("should get committee for slot")
.committee
.len();
let attestation = chain
Add attestation gossip pre-verification (#983) * Add PH & MS slot clock changes * Account for genesis time * Add progress on duties refactor * Add simple is_aggregator bool to val subscription * Start work on attestation_verification.rs * Add progress on ObservedAttestations * Progress with ObservedAttestations * Fix tests * Add observed attestations to the beacon chain * Add attestation observation to processing code * Add progress on attestation verification * Add first draft of ObservedAttesters * Add more tests * Add observed attesters to beacon chain * Add observers to attestation processing * Add more attestation verification * Create ObservedAggregators map * Remove commented-out code * Add observed aggregators into chain * Add progress * Finish adding features to attestation verification * Ensure beacon chain compiles * Link attn verification into chain * Integrate new attn verification in chain * Remove old attestation processing code * Start trying to fix beacon_chain tests * Split adding into pools into two functions * Add aggregation to harness * Get test harness working again * Adjust the number of aggregators for test harness * Fix edge-case in harness * Integrate new attn processing in network * Fix compile bug in validator_client * Update validator API endpoints * Fix aggreagation in test harness * Fix enum thing * Fix attestation observation bug: * Patch failing API tests * Start adding comments to attestation verification * Remove unused attestation field * Unify "is block known" logic * Update comments * Supress fork choice errors for network processing * Add todos * Tidy * Add gossip attn tests * Disallow test harness to produce old attns * Comment out in-progress tests * Partially address pruning tests * Fix failing store test * Add aggregate tests * Add comments about which spec conditions we check * Dont re-aggregate * Split apart test harness attn production * Fix compile error in network * Make progress on commented-out test * Fix skipping attestation test * Add fork choice verification tests * Tidy attn tests, remove dead code * Remove some accidentally added code * Fix clippy lint * Rename test file * Add block tests, add cheap block proposer check * Rename block testing file * Add observed_block_producers * Tidy * Switch around block signature verification * Finish block testing * Remove gossip from signature tests * First pass of self review * Fix deviation in spec * Update test spec tags * Start moving over to hashset * Finish moving observed attesters to hashmap * Move aggregation pool over to hashmap * Make fc attn borrow again * Fix rest_api compile error * Fix missing comments * Fix monster test * Uncomment increasing slots test * Address remaining comments * Remove unsafe, use cfg test * Remove cfg test flag * Fix dodgy comment * Ignore aggregates that are already known. * Unify aggregator modulo logic * Fix typo in logs * Refactor validator subscription logic * Avoid reproducing selection proof * Skip HTTP call if no subscriptions * Rename DutyAndState -> DutyAndProof * Tidy logs * Print root as dbg * Fix compile errors in tests * Fix compile error in test
2020-05-06 11:42:56 +00:00
.produce_unaggregated_attestation(slot, index)
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
.expect("should produce attestation");
let data = &attestation.data;
assert_eq!(
attestation.aggregation_bits.len(),
committee_len,
"bad committee len"
);
assert!(
attestation.aggregation_bits.is_zero(),
"some committee bits are set"
);
assert_eq!(
attestation.signature,
AggregateSignature::empty(),
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
"bad signature"
);
assert_eq!(data.index, index, "bad index");
assert_eq!(data.slot, slot, "bad slot");
assert_eq!(data.beacon_block_root, block_root, "bad block root");
assert_eq!(
data.source,
state.current_justified_checkpoint(),
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
"bad source"
);
assert_eq!(
data.source,
state.current_justified_checkpoint(),
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
"bad source"
);
assert_eq!(data.target.epoch, state.current_epoch(), "bad target epoch");
assert_eq!(data.target.root, target_root, "bad target root");
Add early attester cache (#2872) ## Issue Addressed NA ## Proposed Changes Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database). Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to: 1. Verify that attestation. 1. Respond to RPC requests for the `beacon_block_root`. Attestation verification (1) is *partially* covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error. Providing the block via RPC (2) is also partially covered, since we'll check the database *and* the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I *think* this is fine, since there's no guarantee that we return all blocks for those responses. Another important consideration is whether or not the *parent* of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that *we will always have the parent of the early attester block in the database.* This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.
2022-01-11 01:35:55 +00:00
let rpc_block =
2023-07-28 08:27:38 +00:00
RpcBlock::<MainnetEthSpec>::new(Arc::new(block.clone()), Some(blobs.clone()))
.unwrap();
let beacon_chain::data_availability_checker::MaybeAvailableBlock::Available(available_block) = chain
Rework block processing (#4092) * introduce availability pending block * add intoavailableblock trait * small fixes * add 'gossip blob cache' and start to clean up processing and transition types * shard memory blob cache * Initial commit * Fix after rebase * Add gossip verification conditions * cache cleanup * general chaos * extended chaos * cargo fmt * more progress * more progress * tons of changes, just tryna compile * everything, everywhere, all at once * Reprocess an ExecutedBlock on unavailable blobs * Add sus gossip verification for blobs * Merge stuff * Remove reprocessing cache stuff * lint * Add a wrapper to allow construction of only valid `AvailableBlock`s * rename blob arc list to blob list * merge cleanuo * Revert "merge cleanuo" This reverts commit 5e98326878c77528d0c4668c5a4db4a4b0fbaeaa. * Revert "Revert "merge cleanuo"" This reverts commit 3a4009443a5812b3028abe855079307436dc5419. * fix rpc methods * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * Partial processing (#4) * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * cargo update (#6) --------- Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: realbigsean <seananderson33@gmail.com>
2023-03-24 21:30:41 +00:00
.data_availability_checker
.check_rpc_block_availability(rpc_block)
Single blob lookups (#4152) * some blob reprocessing work * remove ForceBlockLookup * reorder enum match arms in sync manager * a lot more reprocessing work * impl logic for triggerng blob lookups along with block lookups * deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker. * make single block lookup generic * more work * add delayed processing logic and combine some requests * start fixing some compile errors * fix compilation in main block lookup mod * much work * get things compiling * parent blob lookups * fix compile * revert red/stevie changes * fix up sync manager delay message logic * add peer usefulness enum * should remove lookup refactor * consolidate retry error handling * improve peer scoring during certain failures in parent lookups * improve retry code * drop parent lookup if either req has a peer disconnect during download * refactor single block processed method * processing peer refactor * smol bugfix * fix some todos * fix lints * fix lints * fix compile in lookup tests * fix lints * fix lints * fix existing block lookup tests * renamings * fix after merge * cargo fmt * compilation fix in beacon chain tests * fix * refactor lookup tests to work with multiple forks and response types * make tests into macros * wrap availability check error * fix compile after merge * add random blobs * start fixing up lookup verify error handling * some bug fixes and the start of deneb only tests * make tests work for all forks * track information about peer source * error refactoring * improve peer scoring * fix test compilation * make sure blobs are sent for processing after stream termination, delete copied tests * add some tests and fix a bug * smol bugfixes and moar tests * add tests and fix some things * compile after merge * lots of refactoring * retry on invalid block/blob * merge unknown parent messages before current slot lookup * get tests compiling * penalize blob peer on invalid blobs * Check disk on in-memory cache miss * Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs * Update beacon_node/network/src/sync/network_context.rs Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com> * fix bug in matching blocks and blobs in range sync * pr feedback * fix conflicts * upgrade logs from warn to crit when we receive incorrect response in range * synced_and_connected_within_tolerance -> should_search_for_block * remove todo * Fix Broken Overflow Tests * fix merge conflicts * checkpoint sync without alignment * add import * query for checkpoint state by slot rather than state root (teku doesn't serve by state root) * get state first and query by most recent block root * simplify delay logic * rename unknown parent sync message variants * rename parameter, block_slot -> slot * add some docs to the lookup module * use interval instead of sleep * drop request if blocks and blobs requests both return `None` for `Id` * clean up `find_single_lookup` logic * add lookup source enum * clean up `find_single_lookup` logic * add docs to find_single_lookup_request * move LookupSource our of param where unnecessary * remove unnecessary todo * query for block by `state.latest_block_header.slot` * fix lint * fix test * fix test * fix observed blob sidecars test * PR updates * use optional params instead of a closure * create lookup and trigger request in separate method calls * remove `LookupSource` * make sure duplicate lookups are not dropped --------- Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> Co-authored-by: Mark Mackey <mark@sigmaprime.io> Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2023-06-15 16:59:10 +00:00
.unwrap()
else {
panic!("block should be available")
};
2023-01-26 21:18:29 +00:00
Add early attester cache (#2872) ## Issue Addressed NA ## Proposed Changes Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database). Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to: 1. Verify that attestation. 1. Respond to RPC requests for the `beacon_block_root`. Attestation verification (1) is *partially* covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error. Providing the block via RPC (2) is also partially covered, since we'll check the database *and* the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I *think* this is fine, since there's no guarantee that we return all blocks for those responses. Another important consideration is whether or not the *parent* of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that *we will always have the parent of the early attester block in the database.* This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.
2022-01-11 01:35:55 +00:00
let early_attestation = {
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
let proto_block = chain
.canonical_head
.fork_choice_read_lock()
.get_block(&block_root)
.unwrap();
Add early attester cache (#2872) ## Issue Addressed NA ## Proposed Changes Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database). Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to: 1. Verify that attestation. 1. Respond to RPC requests for the `beacon_block_root`. Attestation verification (1) is *partially* covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error. Providing the block via RPC (2) is also partially covered, since we'll check the database *and* the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I *think* this is fine, since there's no guarantee that we return all blocks for those responses. Another important consideration is whether or not the *parent* of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that *we will always have the parent of the early attester block in the database.* This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.
2022-01-11 01:35:55 +00:00
chain
.early_attester_cache
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
.add_head_block(
block_root,
Rework block processing (#4092) * introduce availability pending block * add intoavailableblock trait * small fixes * add 'gossip blob cache' and start to clean up processing and transition types * shard memory blob cache * Initial commit * Fix after rebase * Add gossip verification conditions * cache cleanup * general chaos * extended chaos * cargo fmt * more progress * more progress * tons of changes, just tryna compile * everything, everywhere, all at once * Reprocess an ExecutedBlock on unavailable blobs * Add sus gossip verification for blobs * Merge stuff * Remove reprocessing cache stuff * lint * Add a wrapper to allow construction of only valid `AvailableBlock`s * rename blob arc list to blob list * merge cleanuo * Revert "merge cleanuo" This reverts commit 5e98326878c77528d0c4668c5a4db4a4b0fbaeaa. * Revert "Revert "merge cleanuo"" This reverts commit 3a4009443a5812b3028abe855079307436dc5419. * fix rpc methods * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * Partial processing (#4) * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * cargo update (#6) --------- Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: realbigsean <seananderson33@gmail.com>
2023-03-24 21:30:41 +00:00
available_block,
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
proto_block,
&state,
&chain.spec,
)
Add early attester cache (#2872) ## Issue Addressed NA ## Proposed Changes Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database). Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to: 1. Verify that attestation. 1. Respond to RPC requests for the `beacon_block_root`. Attestation verification (1) is *partially* covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error. Providing the block via RPC (2) is also partially covered, since we'll check the database *and* the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I *think* this is fine, since there's no guarantee that we return all blocks for those responses. Another important consideration is whether or not the *parent* of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that *we will always have the parent of the early attester block in the database.* This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.
2022-01-11 01:35:55 +00:00
.unwrap();
chain
.early_attester_cache
.try_attest(slot, index, &chain.spec)
.unwrap()
.unwrap()
};
assert_eq!(
attestation, early_attestation,
"early attester cache inconsistent"
);
Faster attestation production (#838) * Start adding interop genesis state to lcli * Use more efficient method to generate genesis state * Remove duplicate int_to_bytes32 * Add lcli command to change state genesis time * Add option to allow VC to start with unsynced BN * Set VC to do parallel key loading * Don't default to dummy eth1 backend * Add endpoint to dump operation pool * Add metrics for op pool * Remove state clone for slot notifier * Add mem size approximation for tree hash cache * Avoid cloning tree hash when getting head * Avoid cloning tree hash when getting head * Add working arena-based cached tree hash * Add another benchmark * Add pre-allocation for caches * Make cache nullable * Fix bugs in cache tree hash * Add validator tree hash optimization * Optimize hash_concat * Make hash32_concat return fixed-len array * Fix failing API tests * Add new beacon state cache struct * Add validator-specific cache * Separate list and values arenas * Add parallel validator registry hashing * Remove MultiTreeHashCache * Remove cached tree hash macro * Fix failing tree hash test * Address Michael's comments * Add CachedTreeHash impl for ef tests * Fix messy merge conflict * Optimize attestation production * Add first basic optimizations * Fix SlotOutOfBounds error * Resolved missed merge conflicts * Fix another missed merge conflict * Fix more merge conflict issues * Add `StateSkipConfig` * Fix test compile errors * Add failing test * Fix bug, make tests pass * Add comment * Delete unused function * Replace deleted comment
2020-03-04 06:10:22 +00:00
}
}
}
/// Ensures that the early attester cache wont create an attestation to a block in a later slot than
/// the one requested.
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
#[tokio::test]
async fn early_attester_cache_old_request() {
let harness = BeaconChainHarness::builder(MainnetEthSpec)
.default_spec()
.keypairs(KEYPAIRS[..].to_vec())
.fresh_ephemeral_store()
.mock_execution_layer()
.build();
harness.advance_slot();
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
harness
.extend_chain(
2,
BlockStrategy::OnCanonicalHead,
AttestationStrategy::AllValidators,
)
.await;
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
let head = harness.chain.head_snapshot();
assert_eq!(head.beacon_block.slot(), 2);
let head_proto_block = harness
.chain
Use async code when interacting with EL (#3244) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning *all the blocks* in *every chain segment* during sync. - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
.canonical_head
.fork_choice_read_lock()
.get_block(&head.beacon_block_root)
.unwrap();
Devnet6 (#4404) * some blob reprocessing work * remove ForceBlockLookup * reorder enum match arms in sync manager * a lot more reprocessing work * impl logic for triggerng blob lookups along with block lookups * deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker. * make single block lookup generic * more work * add delayed processing logic and combine some requests * start fixing some compile errors * fix compilation in main block lookup mod * much work * get things compiling * parent blob lookups * fix compile * revert red/stevie changes * fix up sync manager delay message logic * add peer usefulness enum * should remove lookup refactor * consolidate retry error handling * improve peer scoring during certain failures in parent lookups * improve retry code * drop parent lookup if either req has a peer disconnect during download * refactor single block processed method * processing peer refactor * smol bugfix * fix some todos * fix lints * fix lints * fix compile in lookup tests * fix lints * fix lints * fix existing block lookup tests * renamings * fix after merge * cargo fmt * compilation fix in beacon chain tests * fix * refactor lookup tests to work with multiple forks and response types * make tests into macros * wrap availability check error * fix compile after merge * add random blobs * start fixing up lookup verify error handling * some bug fixes and the start of deneb only tests * make tests work for all forks * track information about peer source * error refactoring * improve peer scoring * fix test compilation * make sure blobs are sent for processing after stream termination, delete copied tests * add some tests and fix a bug * smol bugfixes and moar tests * add tests and fix some things * compile after merge * lots of refactoring * retry on invalid block/blob * merge unknown parent messages before current slot lookup * get tests compiling * penalize blob peer on invalid blobs * Check disk on in-memory cache miss * Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs * Update beacon_node/network/src/sync/network_context.rs Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com> * fix bug in matching blocks and blobs in range sync * pr feedback * fix conflicts * upgrade logs from warn to crit when we receive incorrect response in range * synced_and_connected_within_tolerance -> should_search_for_block * remove todo * add data gas used and update excess data gas to u64 * Fix Broken Overflow Tests * payload verification with commitments * fix merge conflicts * restore payload file * Restore payload file * remove todo * add max blob commitments per block * c-kzg lib update * Fix ef tests * Abstract over minimal/mainnet spec in kzg crate * Start integrating new KZG * checkpoint sync without alignment * checkpoint sync without alignment * add import * add import * query for checkpoint state by slot rather than state root (teku doesn't serve by state root) * query for checkpoint state by slot rather than state root (teku doesn't serve by state root) * loosen check * get state first and query by most recent block root * Revert "loosen check" This reverts commit 069d13dd63aa794a3505db9f17bd1a6b73f0be81. * get state first and query by most recent block root * merge max blobs change * simplify delay logic * rename unknown parent sync message variants * rename parameter, block_slot -> slot * add some docs to the lookup module * use interval instead of sleep * drop request if blocks and blobs requests both return `None` for `Id` * clean up `find_single_lookup` logic * add lookup source enum * clean up `find_single_lookup` logic * add docs to find_single_lookup_request * move LookupSource our of param where unnecessary * remove unnecessary todo * query for block by `state.latest_block_header.slot` * fix lint * fix merge transition ef tests * fix test * fix test * fix observed blob sidecars test * Add some metrics (#33) * fix protocol limits for blobs by root * Update Engine API for 1:1 Structure Method * make beacon chain tests to fix devnet 6 changes * get ckzg working and fix some tests * fix remaining tests * fix lints * Fix KZG linking issues * remove unused dep * lockfile * test fixes * remove dbgs * remove unwrap * cleanup tx generator * small fixes * fixing fixes * more self reivew * more self review * refactor genesis header initialization * refactor mock el instantiations * fix compile * fix network test, make sure they run for each fork * pr feedback * fix last test (hopefully) --------- Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> Co-authored-by: Mark Mackey <mark@sigmaprime.io> Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-06-29 19:35:43 +00:00
let head_blobs = harness
.chain
.get_blobs(&head.beacon_block_root)
.expect("should get blobs");
2023-07-28 08:27:38 +00:00
let rpc_block =
RpcBlock::<MainnetEthSpec>::new(head.beacon_block.clone(), Some(head_blobs)).unwrap();
let beacon_chain::data_availability_checker::MaybeAvailableBlock::Available(available_block) = harness.chain
Rework block processing (#4092) * introduce availability pending block * add intoavailableblock trait * small fixes * add 'gossip blob cache' and start to clean up processing and transition types * shard memory blob cache * Initial commit * Fix after rebase * Add gossip verification conditions * cache cleanup * general chaos * extended chaos * cargo fmt * more progress * more progress * tons of changes, just tryna compile * everything, everywhere, all at once * Reprocess an ExecutedBlock on unavailable blobs * Add sus gossip verification for blobs * Merge stuff * Remove reprocessing cache stuff * lint * Add a wrapper to allow construction of only valid `AvailableBlock`s * rename blob arc list to blob list * merge cleanuo * Revert "merge cleanuo" This reverts commit 5e98326878c77528d0c4668c5a4db4a4b0fbaeaa. * Revert "Revert "merge cleanuo"" This reverts commit 3a4009443a5812b3028abe855079307436dc5419. * fix rpc methods * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * Partial processing (#4) * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * cargo update (#6) --------- Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: realbigsean <seananderson33@gmail.com>
2023-03-24 21:30:41 +00:00
.data_availability_checker
.check_rpc_block_availability(rpc_block)
Single blob lookups (#4152) * some blob reprocessing work * remove ForceBlockLookup * reorder enum match arms in sync manager * a lot more reprocessing work * impl logic for triggerng blob lookups along with block lookups * deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker. * make single block lookup generic * more work * add delayed processing logic and combine some requests * start fixing some compile errors * fix compilation in main block lookup mod * much work * get things compiling * parent blob lookups * fix compile * revert red/stevie changes * fix up sync manager delay message logic * add peer usefulness enum * should remove lookup refactor * consolidate retry error handling * improve peer scoring during certain failures in parent lookups * improve retry code * drop parent lookup if either req has a peer disconnect during download * refactor single block processed method * processing peer refactor * smol bugfix * fix some todos * fix lints * fix lints * fix compile in lookup tests * fix lints * fix lints * fix existing block lookup tests * renamings * fix after merge * cargo fmt * compilation fix in beacon chain tests * fix * refactor lookup tests to work with multiple forks and response types * make tests into macros * wrap availability check error * fix compile after merge * add random blobs * start fixing up lookup verify error handling * some bug fixes and the start of deneb only tests * make tests work for all forks * track information about peer source * error refactoring * improve peer scoring * fix test compilation * make sure blobs are sent for processing after stream termination, delete copied tests * add some tests and fix a bug * smol bugfixes and moar tests * add tests and fix some things * compile after merge * lots of refactoring * retry on invalid block/blob * merge unknown parent messages before current slot lookup * get tests compiling * penalize blob peer on invalid blobs * Check disk on in-memory cache miss * Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs * Update beacon_node/network/src/sync/network_context.rs Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com> * fix bug in matching blocks and blobs in range sync * pr feedback * fix conflicts * upgrade logs from warn to crit when we receive incorrect response in range * synced_and_connected_within_tolerance -> should_search_for_block * remove todo * Fix Broken Overflow Tests * fix merge conflicts * checkpoint sync without alignment * add import * query for checkpoint state by slot rather than state root (teku doesn't serve by state root) * get state first and query by most recent block root * simplify delay logic * rename unknown parent sync message variants * rename parameter, block_slot -> slot * add some docs to the lookup module * use interval instead of sleep * drop request if blocks and blobs requests both return `None` for `Id` * clean up `find_single_lookup` logic * add lookup source enum * clean up `find_single_lookup` logic * add docs to find_single_lookup_request * move LookupSource our of param where unnecessary * remove unnecessary todo * query for block by `state.latest_block_header.slot` * fix lint * fix test * fix test * fix observed blob sidecars test * PR updates * use optional params instead of a closure * create lookup and trigger request in separate method calls * remove `LookupSource` * make sure duplicate lookups are not dropped --------- Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> Co-authored-by: Mark Mackey <mark@sigmaprime.io> Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2023-06-15 16:59:10 +00:00
.unwrap()
else {
panic!("block should be available")
};
2023-01-26 21:18:29 +00:00
harness
.chain
.early_attester_cache
.add_head_block(
head.beacon_block_root,
Rework block processing (#4092) * introduce availability pending block * add intoavailableblock trait * small fixes * add 'gossip blob cache' and start to clean up processing and transition types * shard memory blob cache * Initial commit * Fix after rebase * Add gossip verification conditions * cache cleanup * general chaos * extended chaos * cargo fmt * more progress * more progress * tons of changes, just tryna compile * everything, everywhere, all at once * Reprocess an ExecutedBlock on unavailable blobs * Add sus gossip verification for blobs * Merge stuff * Remove reprocessing cache stuff * lint * Add a wrapper to allow construction of only valid `AvailableBlock`s * rename blob arc list to blob list * merge cleanuo * Revert "merge cleanuo" This reverts commit 5e98326878c77528d0c4668c5a4db4a4b0fbaeaa. * Revert "Revert "merge cleanuo"" This reverts commit 3a4009443a5812b3028abe855079307436dc5419. * fix rpc methods * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * Partial processing (#4) * move beacon block and blob to eth2/types * rename gossip blob cache to data availability checker * lots of changes * fix some compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * fix compilation issues * cargo fmt * use a common data structure for block import types * fix availability check on proposal import * refactor the blob cache and split the block wrapper into two types * add type conversion for signed block and block wrapper * fix beacon chain tests and do some renaming, add some comments * cargo update (#6) --------- Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: realbigsean <seananderson33@gmail.com>
2023-03-24 21:30:41 +00:00
available_block,
head_proto_block,
&head.beacon_state,
&harness.chain.spec,
)
.unwrap();
let attest_slot = head.beacon_block.slot() - 1;
let attestation = harness
.chain
.produce_unaggregated_attestation(attest_slot, 0)
.unwrap();
assert_eq!(attestation.data.slot, attest_slot);
let attested_block = harness
.chain
.get_blinded_block(&attestation.data.beacon_block_root)
.unwrap()
.unwrap();
assert_eq!(attested_block.slot(), attest_slot);
}