lighthouse/beacon_node/beacon_chain/tests/attestation_production.rs
Paul Hauner b06559ae97 Disallow attestation production earlier than head (#2130)
## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS.

There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage.

This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`).

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is:

- Not specified as "honest validator" attesting behaviour.
- Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not).
- It disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00

131 lines
4.3 KiB
Rust

#![cfg(not(debug_assertions))]
#[macro_use]
extern crate lazy_static;
use beacon_chain::{
test_utils::{AttestationStrategy, BeaconChainHarness, BlockStrategy},
StateSkipConfig,
};
use store::config::StoreConfig;
use tree_hash::TreeHash;
use types::{AggregateSignature, EthSpec, Keypair, MainnetEthSpec, RelativeEpoch, Slot};
pub const VALIDATOR_COUNT: usize = 16;
lazy_static! {
/// A cached set of keys.
static ref KEYPAIRS: Vec<Keypair> = types::test_utils::generate_deterministic_keypairs(VALIDATOR_COUNT);
}
/// This test builds a chain that is just long enough to finalize an epoch then it produces an
/// attestation at each slot from genesis through to three epochs past the head.
///
/// It checks the produced attestation against some locally computed values.
#[test]
fn produces_attestations() {
let num_blocks_produced = MainnetEthSpec::slots_per_epoch() * 4;
let additional_slots_tested = MainnetEthSpec::slots_per_epoch() * 3;
let harness = BeaconChainHarness::new_with_store_config(
MainnetEthSpec,
KEYPAIRS[..].to_vec(),
StoreConfig::default(),
);
let chain = &harness.chain;
// Test all valid committee indices for all slots in the chain.
// for slot in 0..=current_slot.as_u64() + MainnetEthSpec::slots_per_epoch() * 3 {
for slot in 0..=num_blocks_produced + additional_slots_tested {
if slot > 0 && slot <= num_blocks_produced {
harness.advance_slot();
harness.extend_chain(
1,
BlockStrategy::OnCanonicalHead,
AttestationStrategy::AllValidators,
);
}
let slot = Slot::from(slot);
let mut state = chain
.state_at_slot(slot, StateSkipConfig::WithStateRoots)
.expect("should get state");
let block_slot = if slot <= num_blocks_produced {
slot
} else {
Slot::from(num_blocks_produced)
};
let block = chain
.block_at_slot(block_slot)
.expect("should get block")
.expect("block should not be skipped");
let block_root = block.message.tree_hash_root();
let epoch_boundary_slot = state
.current_epoch()
.start_slot(MainnetEthSpec::slots_per_epoch());
let target_root = if state.slot == epoch_boundary_slot {
block_root
} else {
*state
.get_block_root(epoch_boundary_slot)
.expect("should get target block root")
};
state
.build_committee_cache(RelativeEpoch::Current, &harness.chain.spec)
.unwrap();
let committee_cache = state
.committee_cache(RelativeEpoch::Current)
.expect("should get committee_cache");
let committee_count = committee_cache.committees_per_slot();
for index in 0..committee_count {
let committee_len = committee_cache
.get_beacon_committee(slot, index)
.expect("should get committee for slot")
.committee
.len();
let attestation = chain
.produce_unaggregated_attestation(slot, index)
.expect("should produce attestation");
let data = &attestation.data;
assert_eq!(
attestation.aggregation_bits.len(),
committee_len,
"bad committee len"
);
assert!(
attestation.aggregation_bits.is_zero(),
"some committee bits are set"
);
assert_eq!(
attestation.signature,
AggregateSignature::empty(),
"bad signature"
);
assert_eq!(data.index, index, "bad index");
assert_eq!(data.slot, slot, "bad slot");
assert_eq!(data.beacon_block_root, block_root, "bad block root");
assert_eq!(
data.source, state.current_justified_checkpoint,
"bad source"
);
assert_eq!(
data.source, state.current_justified_checkpoint,
"bad source"
);
assert_eq!(data.target.epoch, state.current_epoch(), "bad target epoch");
assert_eq!(data.target.root, target_root, "bad target root");
}
}
}