lighthouse

History

Michael Sproul e5bf2576f1 Optimise tree hash caching for block production (#2106)	2020-12-21 06:29:39 +00:00

## Proposed Changes

`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms just to produce a block before broadcasting it.

I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. Block production was using the head state, which deliberately lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators).

This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.

## Alternatives

I experimented with 2 alternatives to this approach before deciding on it:

* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.

In contrast, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉

## Additional Info

This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.

In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
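The idea behind the change can be illustrated with a minimal, self-contained sketch. It is not the Lighthouse implementation: every type and function here (`TreeHashCache`, `BeaconState`, `SnapshotCache`, `Head`, `get_cloned`, `state_for_block_production`, `toy_hash`) is a simplified, hypothetical stand-in, and the fallback to the head state when the snapshot is missing is an assumption made for the example. It only shows why a state cloned from the snapshot cache avoids the rebuild that a cache-less head state would trigger.

```rust
use std::collections::HashMap;

// Toy mixing function used in place of a real hash; NOT cryptographic.
fn toy_hash(x: u64) -> u64 {
    x.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(17)
}

// Hypothetical stand-in for a tree hash cache: one memoised leaf per validator.
#[derive(Clone, Debug)]
struct TreeHashCache {
    leaves: Vec<u64>,
}

impl TreeHashCache {
    // The expensive path: hash every leaf from scratch.
    fn build(balances: &[u64]) -> Self {
        TreeHashCache {
            leaves: balances.iter().copied().map(toy_hash).collect(),
        }
    }

    // Combine the memoised leaves into a single root.
    fn root(&self) -> u64 {
        self.leaves.iter().fold(0u64, |acc, leaf| toy_hash(acc ^ *leaf))
    }
}

// Hypothetical, heavily simplified beacon state.
#[derive(Clone, Debug)]
struct BeaconState {
    balances: Vec<u64>,
    tree_hash_cache: Option<TreeHashCache>,
}

impl BeaconState {
    // Cheap clone that drops the cache, as assumed here for the head state.
    fn clone_without_cache(&self) -> Self {
        BeaconState {
            balances: self.balances.clone(),
            tree_hash_cache: None,
        }
    }

    // Returns (root, rebuilt); `rebuilt` is true if the cache had to be built.
    fn state_root(&mut self) -> (u64, bool) {
        let rebuilt = self.tree_hash_cache.is_none();
        let balances = &self.balances;
        let cache = self
            .tree_hash_cache
            .get_or_insert_with(|| TreeHashCache::build(balances));
        (cache.root(), rebuilt)
    }
}

// Hypothetical snapshot cache keyed by block root; its states keep their caches.
struct SnapshotCache {
    snapshots: HashMap<u64, BeaconState>,
}

impl SnapshotCache {
    // Clone rather than remove: the snapshot stays available for later use.
    fn get_cloned(&self, block_root: u64) -> Option<BeaconState> {
        self.snapshots.get(&block_root).cloned()
    }
}

// Hypothetical head: its state carries no tree hash cache.
struct Head {
    block_root: u64,
    state: BeaconState,
}

// Prefer the snapshot (cache included); fall back to the head state (assumption).
fn state_for_block_production(head: &Head, cache: &SnapshotCache) -> BeaconState {
    cache
        .get_cloned(head.block_root)
        .unwrap_or_else(|| head.state.clone())
}
```

In this sketch, calling `state_root` on a state obtained through `state_for_block_production` reports `rebuilt == false` when the snapshot is present and `rebuilt == true` when it falls back to the head, while the root is the same in both cases. The clone still copies the cache, which corresponds to the roughly 30ms cost the commit message accepts in exchange for skipping the rebuild.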
beacon_chain	Optimise tree hash caching for block production (#2106)	2020-12-21 06:29:39 +00:00
client	Add slasher broadcast (#2079)	2020-12-16 03:44:01 +00:00
eth1	Improve eth1 fallback logging (#2096)	2020-12-16 02:39:09 +00:00
eth2_libp2p	Subnet discovery fixes (#2095)	2020-12-17 00:39:15 +00:00
genesis	Improve compile time (#1989)	2020-12-09 01:34:58 +00:00
http_api	BN Fallback v2 (#2080)	2020-12-18 09:17:03 +00:00
http_metrics	update dependencies (#2032)	2020-12-07 08:20:33 +00:00
network	Subnet discovery fixes (#2095)	2020-12-17 00:39:15 +00:00
operation_pool	Update pool/attestations and committees endpoints (#1899)	2020-11-18 23:31:39 +00:00
src	BN Fallback v2 (#2080)	2020-12-18 09:17:03 +00:00
store	update dependencies (#2032)	2020-12-07 08:20:33 +00:00
tests	Upgrade to tokio 0.3 (#1839)	2020-11-28 05:30:57 +00:00
timer	Fix new clippy lints (#2036)	2020-12-03 01:10:26 +00:00
websocket_server	Server sent events (#1920)	2020-12-04 00:18:58 +00:00
Cargo.toml	Version v1.0.4 (#2073)	2020-12-10 04:01:40 +00:00