## Issue Addressed

NA

## Proposed Changes

Missed head votes on attestations are a well-known issue. The primary cause is a block getting set as the head *after* the attestation deadline. This PR aims to shorten the overall time between "block received" and "block set as head" by:

1. Persisting the head and fork choice *after* setting the canonical head.
    - Informal measurements show this takes ~200ms.
1. Pruning the op pool *after* setting the canonical head.
1. No longer persisting the op pool to disk during `BeaconChain::fork_choice`.
    - Informal measurements show this can take up to 1.2s.

I also add some metrics to help measure the effect of these changes. Rough sketches of the reordering, the `Drop`-based persistence and the metrics are included at the end of this description.

Persistence changes like this run the risk of breaking assumptions downstream. However, I have considered these risks and I think we're fine here. I will describe my reasoning for each change.

## Reasoning

### Change 1: Persisting the head and fork choice *after* setting the canonical head

Although the function is called `persist_head_and_fork_choice`, it only persists:

- Fork choice
- Head tracker
- Genesis block root

Since `BeaconChain::fork_choice_internal` does not modify these values between the original time we were persisting them and the current time, I assert that the change I've made is non-substantial in terms of what ends up on-disk. There's the possibility that some *other* thread has modified fork choice in the extra time we've given it, but that's totally fine. Since the only time we *read* those values from disk is during startup, I assert that this has no impact during runtime.

### Change 2: Pruning the op pool after setting the canonical head

Similar to the argument above, we don't modify the op pool during `BeaconChain::fork_choice_internal`, so it shouldn't matter when we prune. This change should be non-substantial.

### Change 3: No longer persisting the op pool to disk during `BeaconChain::fork_choice`

This change *is* substantial. With the proposed changes, we'll only be persisting the op pool to disk when we shut down cleanly (i.e., when the `BeaconChain` gets dropped). This means we'll save disk IO and time during usual operation, but a `kill -9` or similar "crash" will probably result in an out-of-date op pool when we reboot. An out-of-date op pool can only have an impact when producing blocks or aggregate attestations/sync committees. I think it's pretty reasonable that a crash might result in an out-of-date op pool, since:

- Crashes are fairly rare. Practically the only time I see LH suffer a full crash is when the OOM killer shows up, and that's a very serious event.
- It's generally quite rare to produce a block/aggregate immediately after a reboot. Just a few slots of runtime is probably enough to have a decent-enough op pool again.

## Additional Info

Credits to @macladson for the timings referenced here.
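To make the reordering in changes 1 and 2 concrete, here is a minimal sketch of the new shape of `fork_choice_internal`. All the types and method bodies below are simplified stand-ins rather than the real Lighthouse implementations; the point is only that the cheap in-memory head update now happens *before* the slow disk work.

```rust
/// Hypothetical stand-in for the real `BeaconChain`; every method body here
/// is a placeholder, not Lighthouse's actual implementation.
struct BeaconChain;

impl BeaconChain {
    fn update_canonical_head(&self) {
        // Make the new block the head so attestation production sees it.
    }

    fn persist_head_and_fork_choice(&self) {
        // Write fork choice, the head tracker and the genesis block root to
        // disk. Informal measurements put this at ~200ms.
    }

    fn prune_op_pool(&self) {
        // Drop operations that no longer apply on top of the new head.
    }

    /// Sketch of the reordered tail of `fork_choice_internal`: the in-memory
    /// head update runs first, the slower persistence/pruning work afterwards.
    fn fork_choice_internal(&self) {
        self.update_canonical_head();

        // These used to run *before* the head update, pushing "block set as
        // head" past the attestation deadline on slow iterations.
        self.persist_head_and_fork_choice();
        self.prune_op_pool();
    }
}

fn main() {
    BeaconChain.fork_choice_internal();
}
```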
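Change 3 can be sketched the same way. Assuming a hypothetical `persist_to_disk` helper (the real persistence goes through the `store` crate), the op pool is now written out only when the `BeaconChain` is dropped during a clean shutdown:

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the real operation pool.
struct OperationPool;

impl OperationPool {
    fn persist_to_disk(&self) {
        // Serialise the pool and write it to the database. Informal
        // measurements show this can take up to 1.2s, which is why it no
        // longer runs on every `fork_choice` call.
    }
}

struct BeaconChain {
    op_pool: Mutex<OperationPool>,
}

impl Drop for BeaconChain {
    /// With this PR the op pool is persisted only on a clean shutdown, when
    /// the `BeaconChain` is dropped. A `kill -9` skips this, leaving a stale
    /// pool on disk for the next boot.
    fn drop(&mut self) {
        if let Ok(op_pool) = self.op_pool.lock() {
            op_pool.persist_to_disk();
        }
    }
}

fn main() {
    let chain = BeaconChain { op_pool: Mutex::new(OperationPool) };
    drop(chain); // Clean shutdown: the pool is written exactly once, here.
}
```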
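For the new metrics, Lighthouse has its own metrics wrappers, but the idea can be sketched with the underlying `prometheus` crate directly. The metric name below is hypothetical and does not match the names added in this PR:

```rust
use prometheus::{register_histogram, Histogram};

fn main() {
    // Hypothetical histogram for timing the head-update path.
    let set_head_seconds: Histogram = register_histogram!(
        "beacon_fork_choice_set_head_seconds",
        "Time between fork choice finding a head and it being set"
    )
    .expect("metric can be registered");

    // The timer records the elapsed time into the histogram when observed,
    // so wrapping a section like this measures its duration.
    let timer = set_head_seconds.start_timer();
    // ... the work being measured, e.g. persist_head_and_fork_choice ...
    timer.observe_duration();
}
```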