lighthouse

Paul Hauner b06559ae97 Disallow attestation production earlier than head (#2130)

## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, which led the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce at ~10-30m intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state`, which sometimes resulted in a tree hash cache allocation. These allocations were approximately the size of the host's physical memory and were still allocated when `lighthouse bn` was killed by the OS.

There was no CPU analysis (e.g., `perf`), but `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy, so it is reasonable to assume it is the cause of the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage.

This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`).

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour are that it:

- Is not specified as "honest validator" attesting behaviour.
- Is risky for slashing (although all validator clients should have slashing protection and will eventually fail if they do not).
- Disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task, and this PR gets the issue patched in the short term. I haven't created an issue to add path (2); instead, I think we should implement it if we get user demand.

2021-01-20 06:52:37 +00:00
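The slot check described under "Proposed Changes" can be illustrated with a minimal sketch. This is not the Lighthouse implementation: the types below are hypothetical stand-ins, slots are modelled as plain `u64` values, and the function takes the head slot as an argument instead of reading it from a `BeaconChain`. Only the name `AttestingPriorToHead` comes from the commit message; whether the real variant carries fields is not modelled here.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are
// NOT the definitions used in the Lighthouse codebase.
#[derive(Debug, PartialEq)]
pub enum BeaconChainError {
    // Name taken from the commit message; modelled here as a unit variant.
    AttestingPriorToHead,
}

// Assumed placeholder for the attestation data a validator would sign.
#[derive(Debug, PartialEq)]
pub struct AttestationData {
    pub slot: u64,
}

// Sketch of the post-change behaviour: only the fast path remains.
// `head_slot` and `request_slot` are assumed parameters for this example.
pub fn produce_unaggregated_attestation(
    head_slot: u64,
    request_slot: u64,
) -> Result<AttestationData, BeaconChainError> {
    if request_slot >= head_slot {
        // Fast path: attesting to the head slot or later, so no
        // historical state needs to be loaded from the store.
        Ok(AttestationData { slot: request_slot })
    } else {
        // Former slow path: attesting to a slot earlier than the head.
        // Return a static error instead of loading a full state.
        Err(BeaconChainError::AttestingPriorToHead)
    }
}
```

Under these assumptions, a request for slot 9 against a head at slot 10 yields `Err(BeaconChainError::AttestingPriorToHead)`, while requests for slot 10 or later succeed.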
..
beacon_chain	Disallow attestation production earlier than head (#2130)	2021-01-20 06:52:37 +00:00
client	Represent slots in secs instead of millisecs (#2163)	2021-01-19 09:39:51 +00:00
eth1	Represent slots in secs instead of millisecs (#2163)	2021-01-19 09:39:51 +00:00
eth2_libp2p	Represent slots in secs instead of millisecs (#2163)	2021-01-19 09:39:51 +00:00
genesis	Improve compile time (#1989)	2020-12-09 01:34:58 +00:00
http_api	Ssz state api endpoint (#2111)	2021-01-06 03:01:46 +00:00
http_metrics	update dependencies (#2032)	2020-12-07 08:20:33 +00:00
network	Represent slots in secs instead of millisecs (#2163)	2021-01-19 09:39:51 +00:00
operation_pool	Update pool/attestations and committees endpoints (#1899)	2020-11-18 23:31:39 +00:00
src	Clippy 1.49.0 updates and dht persistence test fix (#2156)	2021-01-19 00:34:28 +00:00
store	Clippy 1.49.0 updates and dht persistence test fix (#2156)	2021-01-19 00:34:28 +00:00
tests	Upgrade to tokio 0.3 (#1839)	2020-11-28 05:30:57 +00:00
timer	Represent slots in secs instead of millisecs (#2163)	2021-01-19 09:39:51 +00:00
websocket_server	Server sent events (#1920)	2020-12-04 00:18:58 +00:00
Cargo.toml	Version v1.0.6 (#2126)	2020-12-28 23:38:02 +00:00