lighthouse/beacon_node/beacon_chain/src
Michael Sproul e5bf2576f1 Optimise tree hash caching for block production (#2106)
## Proposed Changes

`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms just to produce a block, before even broadcasting it.

I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which avoids the tree hash cache rebuild and saves 200-400ms. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is still needed when we re-process the block after the validator client signs it.
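To make the change concrete, here is a minimal sketch under stated assumptions: `BeaconState`, `TreeHashCache`, `SnapshotCache` and `state_for_block_production` are hypothetical stand-ins, not Lighthouse's real types or API, and the "tree hash" is a toy fold over numbers rather than SSZ Merkleization. It illustrates the two points above: the state is cloned from the snapshot cache, so its tree hash cache comes along and need not be rebuilt, and the snapshot itself stays in the cache for re-processing the signed block.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a tree hash cache: precomputed per-leaf hashes.
// Building it from scratch is the expensive step block production wants to skip.
#[derive(Clone, Debug)]
struct TreeHashCache {
    leaves: Vec<u64>,
}

// Hypothetical, heavily simplified beacon state.
#[derive(Clone, Debug)]
struct BeaconState {
    balances: Vec<u64>,
    tree_hash_cache: Option<TreeHashCache>,
}

impl BeaconState {
    // Slow path: rebuild the cache by "hashing" every leaf (toy hash here).
    fn build_tree_hash_cache(&mut self) {
        let leaves = self.balances.iter().map(|b| b.wrapping_mul(31)).collect();
        self.tree_hash_cache = Some(TreeHashCache { leaves });
    }

    // Make sure a cache exists; returns true if it had to be rebuilt.
    fn ensure_tree_hash_cache(&mut self) -> bool {
        let rebuilt = self.tree_hash_cache.is_none();
        if rebuilt {
            self.build_tree_hash_cache();
        }
        rebuilt
    }

    // Toy "root": folds the cached leaves together (not real SSZ Merkleization).
    fn tree_hash_root(&mut self) -> u64 {
        self.ensure_tree_hash_cache();
        let cache = self.tree_hash_cache.as_ref().expect("cache was just ensured");
        cache.leaves.iter().fold(0u64, |acc, leaf| acc.wrapping_add(*leaf))
    }
}

// Hypothetical snapshot cache: recently imported states, which keep their
// tree hash caches, keyed by block root (a u64 here for brevity).
struct SnapshotCache {
    snapshots: HashMap<u64, BeaconState>,
}

impl SnapshotCache {
    // Clone rather than remove: the snapshot must stay in the cache so the
    // block can be re-processed after the validator client signs it.
    fn clone_state(&self, block_root: u64) -> Option<BeaconState> {
        self.snapshots.get(&block_root).cloned()
    }
}

// Pick the state to build a block on. Returns the state and whether its
// tree hash cache had to be rebuilt.
fn state_for_block_production(
    head_state: &BeaconState,
    cache: &SnapshotCache,
    head_root: u64,
) -> (BeaconState, bool) {
    // Prefer the snapshot cache copy (it has a tree hash cache); fall back to
    // the head state (which lacks one, to keep fork choice fast).
    let mut state = cache
        .clone_state(head_root)
        .unwrap_or_else(|| head_state.clone());
    let rebuilt = state.ensure_tree_hash_cache();
    (state, rebuilt)
}

fn main() {
    let head_root = 0xabcd;
    let head_state = BeaconState { balances: vec![32; 4], tree_hash_cache: None };

    let mut warm = head_state.clone();
    warm.build_tree_hash_cache();
    let cache = SnapshotCache { snapshots: HashMap::from([(head_root, warm)]) };

    let (mut state, rebuilt) = state_for_block_production(&head_state, &cache, head_root);
    println!("rebuilt: {rebuilt}, root: {}", state.tree_hash_root());
    // The snapshot is still available for re-processing the signed block.
    assert!(cache.snapshots.contains_key(&head_root));
}
```

Cloning still pays for copying the cache (the ~30ms mentioned above); that is the price of keeping the snapshot available for re-processing.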

## Alternatives

I experimented with two alternatives before settling on this approach:

* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms) and block production as fast as in this PR, but it hurt block processing enough that I don't think it's worth it. It ended up being necessary to clone the full state from the snapshot cache during block processing, imposing the +30ms penalty there _as well_ as in block production.
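Why Alternative 2 moved cost into block processing can be illustrated with `std::sync::Arc`. This is a minimal sketch, assuming a toy `BeaconSnapshot` and a hypothetical `process_block` helper (neither is Lighthouse's actual code): sharing a snapshot between the cache and the head is only a reference-count bump, but mutating a shared snapshot through `Arc::make_mut` forces a deep copy of the whole thing.

```rust
use std::sync::Arc;

// Hypothetical, simplified snapshot: `state` stands in for a large
// BeaconState (including its tree hash cache).
#[derive(Clone, Debug)]
struct BeaconSnapshot {
    block_root: u64,
    state: Vec<u64>,
}

// Hypothetical helper: build a child snapshot on top of a shared parent.
// `parent` is borrowed, so the refcount is at least 2 once we clone it, and
// `Arc::make_mut` therefore deep-copies the whole snapshot before mutating.
// Assumes `state` is non-empty.
fn process_block(parent: &Arc<BeaconSnapshot>, new_root: u64) -> Arc<BeaconSnapshot> {
    let mut working = Arc::clone(parent);
    let snap = Arc::make_mut(&mut working); // full clone happens here
    snap.block_root = new_root;
    snap.state[0] += 1; // stand-in for a real state transition
    working
}

fn main() {
    let snapshot = Arc::new(BeaconSnapshot { block_root: 1, state: vec![32; 1_000] });

    // Alternative 2: the snapshot cache and the head share one allocation,
    // so pointing the head at a snapshot is only a reference-count bump.
    let head = Arc::clone(&snapshot);
    assert!(Arc::ptr_eq(&snapshot, &head));
    assert_eq!(Arc::strong_count(&snapshot), 2);

    // Processing a child block needs a mutable state, so it pays for a full copy.
    let child = process_block(&snapshot, 2);
    assert!(!Arc::ptr_eq(&child, &head)); // an independent allocation
    assert_eq!(head.state[0], 32); // the shared snapshot is untouched
    println!("child root {}, state[0] = {}", child.block_root, child.state[0]);
}
```

`Arc::make_mut` only copies when the allocation is shared; in this scenario it always is, because both the cache and the head hold a reference.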

In contrast, the approach in this PR should only affect block production, and it improves it! Yay for Pareto improvements 🎉

## Additional Info

This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.

In future work, we should optimise attestation packing, which takes around 30-60ms and is now a substantial contributor to the total.
2020-12-21 06:29:39 +00:00
| File | Last commit | Date |
|---|---|---|
| attestation_verification.rs | Fix new clippy lints (#2036) | 2020-12-03 01:10:26 +00:00 |
| beacon_chain.rs | Optimise tree hash caching for block production (#2106) | 2020-12-21 06:29:39 +00:00 |
| beacon_fork_choice_store.rs | Fix new clippy lints (#2036) | 2020-12-03 01:10:26 +00:00 |
| beacon_snapshot.rs | Optimise tree hash caching for block production (#2106) | 2020-12-21 06:29:39 +00:00 |
| block_verification.rs | Pass failed gossip blocks to the slasher (#2047) | 2020-12-04 05:03:30 +00:00 |
| builder.rs | Revert fork choice if disk write fails (#2068) | 2020-12-09 05:10:34 +00:00 |
| chain_config.rs | Implement database temp states to reduce memory usage (#1798) | 2020-10-23 01:27:51 +00:00 |
| errors.rs | Optimise tree hash caching for block production (#2106) | 2020-12-21 06:29:39 +00:00 |
| eth1_chain.rs | Minor fixes (#2038) | 2020-12-03 01:10:28 +00:00 |
| events.rs | Server sent events (#1920) | 2020-12-04 00:18:58 +00:00 |
| head_tracker.rs | Fix head tracker concurrency bugs (#1771) | 2020-10-19 05:58:39 +00:00 |
| lib.rs | Server sent events (#1920) | 2020-12-04 00:18:58 +00:00 |
| metrics.rs | Optimise tree hash caching for block production (#2106) | 2020-12-21 06:29:39 +00:00 |
| migrate.rs | Address queue congestion in migrator (#1923) | 2020-11-17 23:11:26 +00:00 |
| naive_aggregation_pool.rs | Fix new clippy lints (#2036) | 2020-12-03 01:10:26 +00:00 |
| observed_attestations.rs | Fix new clippy lints (#2036) | 2020-12-03 01:10:26 +00:00 |
| observed_attesters.rs | Fix race condition in seen caches (#1937) | 2020-11-22 23:02:51 +00:00 |
| observed_block_producers.rs | Fix race condition in seen caches (#1937) | 2020-11-22 23:02:51 +00:00 |
| observed_operations.rs | Fix race condition in seen caches (#1937) | 2020-11-22 23:02:51 +00:00 |
| persisted_beacon_chain.rs | Fix head tracker concurrency bugs (#1771) | 2020-10-19 05:58:39 +00:00 |
| persisted_fork_choice.rs | v0.12 fork choice update (#1229) | 2020-06-17 11:10:22 +10:00 |
| shuffling_cache.rs | Implement standard eth2.0 API (#1569) | 2020-10-01 11:12:36 +10:00 |
| snapshot_cache.rs | Optimise tree hash caching for block production (#2106) | 2020-12-21 06:29:39 +00:00 |
| test_utils.rs | Server sent events (#1920) | 2020-12-04 00:18:58 +00:00 |
| timeout_rw_lock.rs | Add timeouts to canonical head rwlock (#759) | 2020-01-06 17:30:37 +11:00 |
| validator_pubkey_cache.rs | Allow truncation of pubkey cache on creation (#1686) | 2020-09-30 04:42:52 +00:00 |