lighthouse

Author	SHA1	Message	Date
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
Paul Hauner	e5212f1320	Avoid growing Vec for sync committee indices (#3301 ) ## Issue Addressed NA ## Proposed Changes This is a fairly simple micro-optimization to avoid using `Vec::grow`. I don't believe this will have a substantial effect on block processing times, however it was showing up in flamegraphs. I think it's worth making this change for general memory-hygiene. ## Additional Info NA	2022-07-01 03:44:37 +00:00
realbigsean	a7da0677d5	Remove builder redundancy (#3294 ) ## Issue Addressed This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets ## Proposed Changes - Removes redundancy in "builders" (servers implementing the builder spec) - Renames `payload-builder` flag to `builder` - Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194) Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-01 01:15:19 +00:00
Divma	d40c76e667	Fix clippy lints for rust 1.62 (#3300 ) ## Issue Addressed Fixes some new clippy lints after the last rust release ### Lints fixed for the curious: - [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned) - [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) - [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value) - [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) - [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes) - [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)	2022-06-30 22:51:49 +00:00
realbigsean	f6ec44f0dd	Register validator api (#3194 ) ## Issue Addressed Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint ## Proposed Changes - Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN. - TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17 - BN only sends VC registration data to builders on demand, but VC registration data does update the BN's prepare proposer cache and send an updated fcU to a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback. Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense. - Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884 - I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo. - Should application mask appear in the api? Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-06-30 00:49:21 +00:00
Pawan Dhananjay	5de00b7ee8	Unify execution layer endpoints (#3214 ) ## Issue Addressed Resolves #3069 ## Proposed Changes Unify the `eth1-endpoints` and `execution-endpoints` flags in a backwards compatible way as described in https://github.com/sigp/lighthouse/issues/3069#issuecomment-1134219221 Users have 2 options: 1. Use multiple non auth execution endpoints for deposit processing pre-merge 2. Use a single jwt authenticated execution endpoint for both execution layer and deposit processing post merge Related https://github.com/sigp/lighthouse/issues/3118 To enable jwt authenticated deposit processing, this PR removes the calls to `net_version` as the `net` namespace is not exposed in the auth server in execution clients. Moving away from using `networkId` is a good step in my opinion as it doesn't provide us with any added guarantees over `chainId`. See https://github.com/ethereum/consensus-specs/issues/2163 and https://github.com/sigp/lighthouse/issues/2115 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-06-29 09:07:09 +00:00
Michael Sproul	53b2b500db	Extend block reward APIs (#3290 ) ## Proposed Changes Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them. Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.	2022-06-29 04:50:37 +00:00
Michael Sproul	36453929d5	Update Cross config for v0.2.2 (#3286 ) ## Proposed Changes Update `Cross.toml` for the recently released Cross v0.2.2. This allows us to remove the dependency on my fork of the Cross Docker image, which was a maintenance burden and prone to bit-rot. This PR puts us back in sync with upstream Cross. ## Additional Info Due to some bindgen errors on the default Cross images we seemingly need a full `clang-3.9` install. The `libclang-3.9-dev` package was found to be insufficient due to `stdarg.h` being missing. In order to continue building locally all Lighthouse devs should update their local cross version with `cargo install cross`.	2022-06-29 04:50:36 +00:00
Paul Hauner	45b2eb18bc	v2.3.2-rc.0 (#3289 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-28 03:03:30 +00:00
Paul Hauner	f3a1b5da31	Update Sepolia TTD (#3288 ) ## Issue Addressed NA ## Proposed Changes Update Sepolia TTD as per https://github.com/eth-clients/merge-testnets/pull/21 ## Additional Info NA	2022-06-27 22:50:27 +00:00
Pawan Dhananjay	7acfbd89ee	Recover from NonConsecutive eth1 errors (#3273 ) ## Issue Addressed Fixes #1864 and a bunch of other closed but unresolved issues. ## Proposed Changes Allows the deposit caching to recover from `NonConsecutive` deposit errors by resetting the last processed block to the last valid deposit's block number. Still not sure of the underlying cause of this error, but this should recover the cache so we don't need `--eth1-purge-cache` anymore 🎉 A huge thanks to @one-three-three-seven for reproducing the error and providing the data that helped testing out the fix 🙌 Still needs a few more tests.	2022-06-26 23:10:58 +00:00
Akihito Nakano	082ed35bdc	Test the pruning of excess peers using randomly generated input (#3248 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3092 ## Proposed Changes Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores. ## Additional Info I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969). Co-authored-by: Diva M <divma@protonmail.com>	2022-06-25 22:22:34 +00:00
Michael Sproul	d21f083777	Add more paths to HTTP API metrics (#3282 ) ## Proposed Changes Expand the set of paths tracked by the HTTP API metrics to include all paths hit by the validator client. These paths were only partially updated for Altair, so we were missing some of the sync committee and v2 APIs.	2022-06-23 05:19:21 +00:00
Paul Hauner	748658e32c	Add some debug logs for checkpoint sync (#3281 ) ## Issue Addressed NA ## Proposed Changes I used these logs when debugging a spurious failure with Infura and thought they might be nice to have around permanently. There's no changes to functionality in this PR, just some additional `debug!` logs. ## Additional Info NA	2022-06-23 05:19:20 +00:00
Divma	7af5742081	Deprecate step param in BlocksByRange RPC request (#3275 ) ## Issue Addressed Deprecates the step parameter in the blocks by range request ## Proposed Changes - Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before - Adds a new type to still handle coding and decoding of requests that use the parameter ## Additional Info I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward	2022-06-22 16:23:34 +00:00
Divma	2063c0fa0d	Initial work to remove engines fallback from the `execution_layer` crate (#3257 ) ## Issue Addressed Part of #3160 ## Proposed Changes Use only the first url given in the execution engine, if more than one is provided log it. This change only moves having multiple engines to one. The amount of code cleanup that can and should be done forward is not small and would interfere with ongoing PRs. I'm keeping the changes intentionally very very minimal. ## Additional Info Future works: - In [ `EngineError` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) the id is not needed since it now has no meaning. - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function can return a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function can return a single error. - After the redundancy is removed for the builders, we can probably make the [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carry a single error. - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Fix the associated configurations and cli params. Not sure if both are done in https://github.com/sigp/lighthouse/pull/3214 In general I think those changes can be done incrementally and in individual pull requests.	2022-06-22 14:27:16 +00:00
Michael Sproul	8faaa35b58	Enable malloc metrics for the VC (#3279 ) ## Issue Addressed Following up from https://github.com/sigp/lighthouse/pull/3223#issuecomment-1158718102, it has been observed that the validator client uses vastly more memory in some compilation configurations than others. Compiling with Cross and then putting the binary into an Ubuntu 22.04 image seems to use 3x more memory than compiling with Cargo directly on Debian bullseye. ## Proposed Changes Enable malloc metrics for the validator client. This will hopefully allow us to see the difference between the two compilation configs and compare heap fragmentation. This PR doesn't enable malloc tuning for the VC because it was found to perform significantly worse. The `--disable-malloc-tuning` flag is repurposed to just disable the metrics.	2022-06-20 23:20:30 +00:00
Michael Sproul	efebf712dd	Avoid cloning snapshots during sync (#3271 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2944 ## Proposed Changes Remove snapshots from the cache during sync rather than cloning them. This reduces unnecessary cloning and memory fragmentation during sync. ## Additional Info This PR relies on the fact that the `block_delay` cache is not populated for blocks from sync. Relying on block delay may have the side effect that a change in `block_delay` calculation could lead to: a) more clones, if block delays are added for syncing blocks or b) less clones, if blocks near the head are erroneously provided without a `block_delay`. Case (a) would be a regression to the current status quo, and (b) is low-risk given we know that the snapshot cache is current susceptible to misses (hence `tree-states`).	2022-06-20 23:20:29 +00:00
eklm	a9e158663b	Fix validator_monitor_prev_epoch_ metrics (#2911 ) ## Issue Addressed #2820 ## Proposed Changes The problem is that validator_monitor_prev_epoch metrics are updated only if there is EpochSummary present in summaries map for the previous epoch and it is not the case for the offline validator. Ensure that EpochSummary is inserted into summaries map also for the offline validators.	2022-06-20 04:06:30 +00:00
Pawan Dhananjay	f428719761	Do not penalize peers on execution layer offline errors (#3258 ) ## Issue Addressed Partly resolves https://github.com/sigp/lighthouse/issues/3032 ## Proposed Changes Extracts some of the functionality of #3094 into a separate PR as the original PR requires a bit more work. Do not unnecessarily penalize peers when we fail to validate received execution payloads because our execution layer is offline.	2022-06-19 23:13:40 +00:00
Divma	21b3425a12	Update cargo lockfile to fix RUSTSEC-2022-0025, RUSTSEC-2022-0026 and RUSTSEC-2022-0027 (#3278 ) ## Issue Addressed Fixes [RUSTSEC-2022-0025](https://rustsec.org/advisories/RUSTSEC-2022-0025), [RUSTSEC-2022-0026](https://rustsec.org/advisories/RUSTSEC-2022-0026) and [RUSTSEC-2022-0027](https://rustsec.org/advisories/RUSTSEC-2022-0027) raised in [this test run](https://github.com/sigp/lighthouse/runs/6943343852?check_suite_focus=true) ## Proposed Changes a `cargo update` was enough ## Additional Info	2022-06-18 23:59:43 +00:00
Pawan Dhananjay	7aeb9f9ecd	Add sepolia config (#3268 ) ## Issue Addressed N/A ## Proposed Changes Add network config for sepolia from https://github.com/eth-clients/merge-testnets/pull/14	2022-06-17 03:10:52 +00:00
Paul Hauner	564d7da656	v2.3.1 (#3262 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-14 05:25:38 +00:00
Divma	3dd50bda11	Improve substream management (#3261 ) ## Issue Addressed Which issue # does this PR address? ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-06-10 06:58:50 +00:00
Paul Hauner	11d80a6a38	Optimise `per_epoch_processing` low-hanging-fruit (#3254 ) ## Issue Addressed NA ## Proposed Changes - Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage. - Cache the result of `integer_sqrt` rather than recomputing for each validator. - Cache `state.previous_epoch` rather than recomputing it for each validator. ### Benchmarks Benchmarks on a recent mainnet state using #3252 to get timing. #### Without this PR ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:21:02Z INFO lcli::skip_slots] Using mainnet spec [2022-06-09T08:21:02Z INFO lcli::skip_slots] Advancing 32 slots [2022-06-09T08:21:02Z INFO lcli::skip_slots] Doing 10 runs [2022-06-09T08:21:02Z INFO lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz" SSZ decoding /tmp/state-0x3cdc.ssz: 43ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 0: 245.718794ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 1: 245.364782ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 2: 255.866179ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 3: 243.838909ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 4: 250.431425ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 5: 248.68765ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 6: 262.051113ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 7: 264.293967ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 8: 293.202007ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 9: 264.552017ms ``` #### With this PR: ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 0: 73.898678ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 1: 75.536978ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 2: 75.176104ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 3: 76.460828ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 4: 75.904195ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 5: 75.53077ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 6: 74.745572ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 7: 75.823489ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 8: 74.892055ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 9: 76.333569ms ``` ## Additional Info NA	2022-06-10 04:29:28 +00:00
Michael Sproul	1d016a83f2	Lint against panicky calls in async functions (#3250 ) ## Description Add a new lint to CI that attempts to detect calls to functions like `block_on` from async execution contexts. This lint was written from scratch exactly for this purpose, on my fork of Clippy: https://github.com/michaelsproul/rust-clippy/tree/disallow-from-async ## Additional Info - I've successfully detected the previous two issues we had with `block_on` by running the linter on the commits prior to each of these PRs: https://github.com/sigp/lighthouse/pull/3165, https://github.com/sigp/lighthouse/pull/3199. - The lint runs on CI with `continue-on-error: true` so that if it fails spuriously it won't block CI. - I think it would be good to merge this PR before https://github.com/sigp/lighthouse/pull/3244 so that we can lint the extensive executor-related changes in that PR. - I aim to upstream the lint to Clippy, at which point building a custom version of Clippy from my fork will no longer be necessary. I imagine this will take several weeks or months though, because the code is currently a bit hacky and will need some renovations to pass review.	2022-06-10 04:29:27 +00:00
Michael Sproul	452b46a7af	Pin MDBX at last version with Win/Mac support (#3246 ) ## Issue Addressed Newer versions of MDBX have removed Windows and macOS support, so this PR pins MDBX at the last working version to prevent an accidental regression via `cargo update`. ## Additional Info This is a short-term solution, if our pinned version of MDBX turns out to be buggy we will need to consider backporting patches from upstream to our own fork.	2022-06-10 04:29:26 +00:00
Divma	56b4cd88ca	minor libp2p upgrade (#3259 ) ## Issue Addressed Upgrades libp2p	2022-06-09 23:48:51 +00:00
Mac L	9c429d0764	Only use authenticated endpoints during EE integration testing (#3253 ) ## Issue Addressed Failures in our CI integration tests for Geth. ## Proposed Changes Only connect to the authenticated execution endpoints during execution tests. This is necessary now that it is impossible to connect to the `engine` api on an unauthenticated endpoint. See https://github.com/ethereum/go-ethereum/pull/24997 ## Additional Info As these tests break semi-regularly, I have kept logs enabled to ease future debugging. I've also updated the Nethermind tests, although these weren't broken. This should future-proof us if Nethermind decides to follow suit with Geth	2022-06-09 10:47:03 +00:00
Divma	cfd26d25e0	do not count sync batch attempts when peer is not at fault (#3245 ) ## Issue Addressed currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 ## Proposed Changes If a batch fails but the peer is not at fault, do not count the attempt Also removes some annoying logs ## Additional Info We still get a counter on ignored attempts.. just in case	2022-06-07 02:35:56 +00:00
Divma	58e223e429	update libp2p (#3233 ) ## Issue Addressed na ## Proposed Changes Updates libp2p to https://github.com/libp2p/rust-libp2p/pull/2662 ## Additional Info From comments on the relevant PRs listed, we should pay attention at peer management consistency, but I don't think anything weird will happen. This is running in prater tok and sin	2022-06-07 02:35:55 +00:00
Michael Sproul	54cf94ea59	Fix per-slot timer in presence of clock changes (#3243 ) ## Issue Addressed Fixes a timing issue that results in spurious fork choice notifier failures: ``` WARN Error signalling fork choice waiter slot: 3962270, error: ForkChoiceSignalOutOfOrder { current: Slot(3962271), latest: Slot(3962270) }, service: beacon ``` There’s a fork choice run that is scheduled to run at the start of every slot by the `timer`, which creates a 12s interval timer when the beacon node starts up. The problem is that if there’s a bit of clock drift that gets corrected via NTP (or a leap second for that matter) then these 12s intervals will cease to line up with the start of the slot. This then creates the mismatch in slot number that we see above. Lighthouse also runs fork choice 500ms before the slot begins, and these runs are what is conflicting with the start-of-slot runs. This means that the warning in current versions of Lighthouse is mostly cosmetic because fork choice is up to date with all but the most recent 500ms of attestations (which usually isn’t many). ## Proposed Changes Fix the per-slot timer so that it continually re-calculates the duration to the start of the next slot and waits for that. A side-effect of this change is that we may skip slots if the per-slot task takes >12s to run, but I think this is an unlikely scenario and an acceptable compromise.	2022-06-06 23:52:32 +00:00
Divma	493c2c037c	reduce reprocess queue/channel sizes (#3239 ) ## Issue Addressed Reduces the effect of late blocks on overall node buildup ## Proposed Changes change the capacity of the channels used to send work for reprocessing in the beacon processor, and to send back to the main processor task, to be 75% of the capacity of the channel for receiving new events ## Additional Info The issues we've seen suggest we should still evaluate node performance under stress, with late blocks being a big factor. Other changes that could help: 1. right now we have a cap for queued attestations for reprocessing that applies to the sum of aggregated and unaggregated attestations. We could consider adding a separate cap that favors aggregated ones. 2. solving #2848	2022-06-06 23:52:31 +00:00
Akihito Nakano	a6d2ed6119	Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236 ) ## Issue Addressed This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`. I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, `outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`. - [outbound only](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018)`) - [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047)`) As a result, PeerManager doesn't remove "outbound" peers which should be pruned. Note: [`subnet_to_peer`](`e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)`) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.	2022-06-06 05:51:10 +00:00
Paul Hauner	3d51f24717	Update Ropsten TTD (#3240 ) ## Issue Addressed NA ## Proposed Changes Updates the Ropsten TTD as per: https://blog.ethereum.org/2022/06/03/ropsten-merge-ttd/ ## Additional Info NA	2022-06-04 21:24:39 +00:00
Michael Sproul	47d57a290b	Improve eth1 block cache sync (for Ropsten) (#3234 ) ## Issue Addressed Fix for the eth1 cache sync issue observed on Ropsten. ## Proposed Changes Ropsten blocks are so infrequent that they broke our algorithm for downloading eth1 blocks. We currently try to download forwards from the last block in our cache to the block with block number [`remote_highest_block - FOLLOW_DISTANCE + FOLLOW_DISTANCE / ETH1_BLOCK_TIME_TOLERANCE_FACTOR`](`6f732986f1/beacon_node/eth1/src/service.rs (L489-L492)`). With the tolerance set to 4 this is insufficient because we lag by 1536 blocks, which is more like ~14 hours on Ropsten. This results in us having an incomplete eth1 cache, because we should cache all blocks between -16h and -8h. Even if we were to set the tolerance to 2 for the largest allowance, we would only look back 1024 blocks which is still more than 8 hours. For example consider this block https://ropsten.etherscan.io/block/12321390. The block from 1536 blocks earlier is 14 hours and 20 minutes before it: https://ropsten.etherscan.io/block/12319854. The block from 1024 blocks earlier is https://ropsten.etherscan.io/block/12320366, 8 hours and 48 minutes before. - This PR introduces a new CLI flag called `--eth1-cache-follow-distance` which can be used to set the distance manually. - A new dynamic catchup mechanism is added which detects when the cache is lagging the true eth1 chain and tries to download more blocks within the follow distance in order to catch up.	2022-06-03 06:05:03 +00:00
Mac L	20071975c7	Switch Nethermind integration tests to use `master` branch (#3228 ) ## Issue Addressed N/A ## Proposed Changes Preemptively switch Nethermind integration tests to use the `master` branch along with the baked in `kiln` config. ## Additional Info There have been some spurious timeouts across CI so this also increases the timeout to 20s.	2022-06-03 03:22:55 +00:00
Mac L	55ac423872	Emit log when fee recipient values are inconsistent (#3202 ) ## Issue Addressed #3156 ## Proposed Changes Emit a `WARN` log whenever the value of `fee_recipient` as returned from the EE is different from the value of `suggested_fee_recipient` as set on the BN, for example by the `--suggested-fee-recipient` CLI flag. ## Additional Info I have set the log level to `WARN` since it is legal behaviour (meaning it isn't really an error but is important to know when it is occurring). If we feel like this behaviour is almost always undesired (caused by a misconfiguration or malicious EE) then an `ERRO` log would be more appropriate. Happy to change it in that case.	2022-06-03 03:22:54 +00:00
Pawan Dhananjay	8e1305a3d2	Use a stable tag for ubuntu in dockerfile (#3231 ) ## Issue Addressed N/A ## Proposed Changes Use stable version of ubuntu base image in dockerfile instead of using latest. This will help in narrowing down issues with docker images.	2022-05-31 06:09:12 +00:00
Michael Sproul	cc4b778b1f	Inline `safe_arith` methods (#3229 ) ## Proposed Changes Speed up epoch processing by around 10% by inlining methods from the `safe_arith` crate. The Rust standard library uses `#[inline]` for the `checked_` functions that we're wrapping, so it makes sense for us to inline them too. ## Additional Info I conducted a brief statistical test on the block at slot [3858336](https://beaconcha.in/block/3858336) applied to the state at slot 3858335, which requires an epoch transition. The command used for testing was: ``` lcli transition-blocks --testnet-dir ./common/eth2_network_config/built_in_network_configs/mainnet --no-signature-verification state.ssz block.ssz output.ssz ``` The testing found that inlining reduced the epoch transition time from 398ms to 359ms, a reduction of 9.77%, which was found to be statistically significant with a two-tailed t-test (p < 0.01). Data and intermediate calculations can be found here: https://docs.google.com/spreadsheets/d/1tlf3eFjz3dcXeb9XVOn21953uYpc9RdQapPtcHGH1PY	2022-05-31 06:09:12 +00:00
Paul Hauner	16e49af8e1	Use genesis slot for node/syncing (#3226 ) ## Issue Addressed NA ## Proposed Changes Resolves this error log emitted from the VC prior to genesis: ``` WARN Unable connect to beacon node error: ServerMessage(ErrorMessage { code: 500, message: "UNHANDLED_ERROR: UnableToReadSlot", stacktraces: [] }) ``` ## Additional Info NA	2022-05-31 06:09:11 +00:00
Michael Sproul	98c8ac1a87	Fix typo in peer state transition log (#3224 ) ## Issue Addressed We were logging `out_finalized_epoch` instead of `our_finalized_epoch`. I noticed this ages ago but only just got around to fixing it. ## Additional Info I also reformatted the log line to respect the line length limit (`rustfmt` won't do it because it gets confused by the `;` in slog's log macros).	2022-05-31 06:09:10 +00:00
Michael Sproul	ee18f6a9f7	Add `lcli indexed-attestations` (#3221 ) ## Proposed Changes It's reasonably often that we want to manually convert an attestation to indexed form. This PR adds an `lcli` command for doing this, using an SSZ state and a list of JSON attestations (as extracted from a JSON block) as input.	2022-05-31 06:09:08 +00:00
Pawan Dhananjay	af5da1244e	Fix links in docs (#3219 ) ## Issue Addressed N/A ## Proposed Changes Fix the link for `advanced-release-candidates.md` in the lighthouse book and add it to the summary page.	2022-05-31 06:09:07 +00:00
Paul Hauner	6f732986f1	v2.3.0 (#3222 ) ## Issue Addressed NA ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info - Pending testing on our infra. Please do not merge	2022-05-30 01:35:10 +00:00
Paul Hauner	f675c865e2	Set Ropsten TTD to unrealistically high value (#3225 ) ## Issue Addressed NA ## Proposed Changes Updates Ropsten TTD as per https://github.com/eth-clients/merge-testnets/pull/11. ## Additional Info NA	2022-05-27 04:29:46 +00:00
Divma	a7896a58cc	move backfill sync jobs from highest priority to lowest (#3215 ) ## Issue Addressed #3212 ## Proposed Changes Move chain segments coming from back-fill syncing from highest priority to lowest ## Additional Info If this does not solve the issue, next steps would be lowering the batch size for back-fill sync, and as last resort throttling the processing of these chain segments	2022-05-26 02:05:17 +00:00
Mac L	fd55373b88	Add new VC metrics for beacon node availability (#3193 ) ## Issue Addressed #3154 ## Proposed Changes Add three new metrics for the VC: 1. `vc_beacon_nodes_synced_count` 2. `vc_beacon_nodes_available_count` 3. `vc_beacon_nodes_total_count` Their values mirror the values present in the following log line: ``` Apr 08 17:25:17.000 INFO Connected to beacon node(s) synced: 4, available: 4, total: 4, service: notifier ```	2022-05-26 02:05:16 +00:00
Paul Hauner	f4aa17ef85	v2.3.0-rc.0 (#3218 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-05-25 05:29:26 +00:00
Michael Sproul	229f883968	Avoid parallel fork choice runs during sync (#3217 ) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ 😳. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).	2022-05-25 03:27:30 +00:00

1 2 3 4 5 ...

4501 Commits