b06559ae97
## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was partly caused by all of the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, causing the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce at ~10-30 minute intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes triggering a tree hash cache allocation. These allocations were approximately the size of the host's physical memory and were still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy, so it is reasonable to assume it caused the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, so it is the path causing the excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`); see the sketch at the end of this description.

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error instead. My concerns with this catch-up behaviour are that it:

- Is not specified as "honest validator" attesting behaviour.
- Is risky with respect to slashing (although all validator clients *should* have slashing protection and will eventually fail if they do not).
- Disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) with some sort of caching mechanism, but that would be a multi-week task and this PR patches the issue in the short term. I haven't created an issue to restore path (2); instead, I think we should implement it if there is user demand.
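For illustration, here is a minimal sketch of the new control flow. It is not the actual Lighthouse implementation: `AttestingPriorToHead` is the error named above, but the types and method names below are simplified stand-ins.

```rust
// Sketch only: `Slot` is simplified to a u64 and `BeaconChain` is reduced to
// the single field this check needs; the real code holds a full head snapshot.
type Slot = u64;

#[derive(Debug)]
enum BeaconChainError {
    AttestingPriorToHead, // the static error introduced by this PR
}

struct BeaconChain {
    head_slot: Slot,
}

impl BeaconChain {
    /// Guard at the top of attestation production.
    fn check_attestation_slot(&self, request_slot: Slot) -> Result<(), BeaconChainError> {
        if request_slot < self.head_slot {
            // Formerly path (2): load a full state from disk via
            // `store::beacon_state::get_full_state`, potentially rebuilding a
            // tree hash cache the size of physical memory. Now it fails fast.
            Err(BeaconChainError::AttestingPriorToHead)
        } else {
            // Path (1) continues: attest using the in-memory head state.
            Ok(())
        }
    }
}
```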
#![cfg(not(debug_assertions))]

#[macro_use]
extern crate lazy_static;

use beacon_chain::{
    test_utils::{AttestationStrategy, BeaconChainHarness, BlockStrategy},
    StateSkipConfig,
};
use store::config::StoreConfig;
use tree_hash::TreeHash;
use types::{AggregateSignature, EthSpec, Keypair, MainnetEthSpec, RelativeEpoch, Slot};

pub const VALIDATOR_COUNT: usize = 16;

lazy_static! {
    /// A cached set of keys.
    static ref KEYPAIRS: Vec<Keypair> =
        types::test_utils::generate_deterministic_keypairs(VALIDATOR_COUNT);
}

/// This test builds a chain that is just long enough to finalize an epoch, then produces an
/// attestation at each slot from genesis through to three epochs past the head.
///
/// It checks the produced attestations against some locally computed values.
#[test]
fn produces_attestations() {
    let num_blocks_produced = MainnetEthSpec::slots_per_epoch() * 4;
    let additional_slots_tested = MainnetEthSpec::slots_per_epoch() * 3;

    let harness = BeaconChainHarness::new_with_store_config(
        MainnetEthSpec,
        KEYPAIRS[..].to_vec(),
        StoreConfig::default(),
    );

    let chain = &harness.chain;

    // Test all valid committee indices for all slots in the chain.
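    // Blocks are only produced for the first `num_blocks_produced` slots; the
    // remaining `additional_slots_tested` iterations request attestations for
    // slots at or beyond the head, exercising the fast path described above.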
    for slot in 0..=num_blocks_produced + additional_slots_tested {
        if slot > 0 && slot <= num_blocks_produced {
            harness.advance_slot();

            harness.extend_chain(
                1,
                BlockStrategy::OnCanonicalHead,
                AttestationStrategy::AllValidators,
            );
        }

        let slot = Slot::from(slot);
        let mut state = chain
            .state_at_slot(slot, StateSkipConfig::WithStateRoots)
            .expect("should get state");

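        // For slots after the last produced block, the head does not advance,
        // so attestations are expected to reference the existing head block.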
        let block_slot = if slot <= num_blocks_produced {
            slot
        } else {
            Slot::from(num_blocks_produced)
        };

        let block = chain
            .block_at_slot(block_slot)
            .expect("should get block")
            .expect("block should not be skipped");
        let block_root = block.message.tree_hash_root();

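        // The attestation target is the block root at the first slot of the
        // current epoch; when the state sits exactly on the epoch boundary,
        // that is the head block itself.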
        let epoch_boundary_slot = state
            .current_epoch()
            .start_slot(MainnetEthSpec::slots_per_epoch());
        let target_root = if state.slot == epoch_boundary_slot {
            block_root
        } else {
            *state
                .get_block_root(epoch_boundary_slot)
                .expect("should get target block root")
        };

        state
            .build_committee_cache(RelativeEpoch::Current, &harness.chain.spec)
            .unwrap();
        let committee_cache = state
            .committee_cache(RelativeEpoch::Current)
            .expect("should get committee_cache");

        let committee_count = committee_cache.committees_per_slot();

        for index in 0..committee_count {
            let committee_len = committee_cache
                .get_beacon_committee(slot, index)
                .expect("should get committee for slot")
                .committee
                .len();

            let attestation = chain
                .produce_unaggregated_attestation(slot, index)
                .expect("should produce attestation");

            let data = &attestation.data;

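            // Unaggregated attestations are produced as unsigned templates:
            // no aggregation bits set and an empty signature. The validator
            // client fills these in before publishing.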
            assert_eq!(
                attestation.aggregation_bits.len(),
                committee_len,
                "bad committee len"
            );
            assert!(
                attestation.aggregation_bits.is_zero(),
                "some committee bits are set"
            );
            assert_eq!(
                attestation.signature,
                AggregateSignature::empty(),
                "bad signature"
            );
            assert_eq!(data.index, index, "bad index");
            assert_eq!(data.slot, slot, "bad slot");
            assert_eq!(data.beacon_block_root, block_root, "bad block root");
            assert_eq!(
                data.source, state.current_justified_checkpoint,
                "bad source"
            );
            assert_eq!(data.target.epoch, state.current_epoch(), "bad target epoch");
            assert_eq!(data.target.root, target_root, "bad target root");
        }
    }
}