lighthouse

Author	SHA1	Message	Date
ethDreamer	f7354abe0f	Fix Block Cache Range Math for Faster Syncing (#3358 ) ## Issue Addressed While messing with the deposit snapshot stuff, I had my proxy running and noticed the beacon node wasn't syncing the block cache continuously. There were long periods where it did nothing. I believe this was caused by a logical error introduced in #3234 that dealt with an issue that arose while syncing the block cache on Ropsten. The problem is that when the block cache is initially syncing, it will trigger the logic that detects the cache is far behind the execution chain in time. This will trigger a batch syncing mechanism which is intended to sync further ahead than the chain would normally. But the batch syncing is actually slower than the range this function usually estimates (in this scenario). ## Proposed Changes I believe I've fixed this function by taking the end of the range to be the maximum of (batch syncing range, usual range). I've also renamed and restructured some things a bit. It's equivalent logic but I think it's more clear what's going on.	2022-07-26 02:17:21 +00:00
realbigsean	20ebf1f3c1	Realized unrealized experimentation (#3322 ) ## Issue Addressed Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style. Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: Mac L <mjladson@pm.me>	2022-07-25 23:53:26 +00:00
Mac L	bb5a6d2cca	Add `execution_optimistic` flag to HTTP responses (#3070 ) ## Issue Addressed #3031 ## Proposed Changes Updates the following API endpoints to conform with https://github.com/ethereum/beacon-APIs/pull/190 and https://github.com/ethereum/beacon-APIs/pull/196 - [x] `beacon/states/{state_id}/root` - [x] `beacon/states/{state_id}/fork` - [x] `beacon/states/{state_id}/finality_checkpoints` - [x] `beacon/states/{state_id}/validators` - [x] `beacon/states/{state_id}/validators/{validator_id}` - [x] `beacon/states/{state_id}/validator_balances` - [x] `beacon/states/{state_id}/committees` - [x] `beacon/states/{state_id}/sync_committees` - [x] `beacon/headers` - [x] `beacon/headers/{block_id}` - [x] `beacon/blocks/{block_id}` - [x] `beacon/blocks/{block_id}/root` - [x] `beacon/blocks/{block_id}/attestations` - [x] `debug/beacon/states/{state_id}` - [x] `debug/beacon/heads` - [x] `validator/duties/attester/{epoch}` - [x] `validator/duties/proposer/{epoch}` - [x] `validator/duties/sync/{epoch}` Updates the following Server-Sent Events: - [x] `events?topics=head` - [x] `events?topics=block` - [x] `events?topics=finalized_checkpoint` - [x] `events?topics=chain_reorg` ## Backwards Incompatible There is a very minor breaking change with the way the API now handles requests to `beacon/blocks/{block_id}/root` and `beacon/states/{state_id}/root` when `block_id` or `state_id` is the `Root` variant of `BlockId` and `StateId` respectively. Previously a request to a non-existent root would simply echo the root back to the requester: ``` curl "http://localhost:5052/eth/v1/beacon/states/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root" {"data":{"root":"0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}} ``` Now it will return a `404`: ``` curl "http://localhost:5052/eth/v1/beacon/blocks/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root" {"code":404,"message":"NOT_FOUND: beacon block with root 0xaaaa…aaaa","stacktraces":[]} ``` In addition to this is the block root `0x0000000000000000000000000000000000000000000000000000000000000000` previously would return the genesis block. It will now return a `404`: ``` curl "http://localhost:5052/eth/v1/beacon/blocks/0x0000000000000000000000000000000000000000000000000000000000000000" {"code":404,"message":"NOT_FOUND: beacon block with root 0x0000…0000","stacktraces":[]} ``` ## Additional Info - `execution_optimistic` is always set, and will return `false` pre-Bellatrix. I am also open to the idea of doing something like `#[serde(skip_serializing_if = "Option::is_none")]`. - The value of `execution_optimistic` is set to `false` where possible. Any computation that is reliant on the `head` will simply use the `ExecutionStatus` of the head (unless the head block is pre-Bellatrix). Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-07-25 08:23:00 +00:00
Paul Hauner	21dec6f603	v2.4.0 (#3360 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v2.4.0 ## Additional Info Blocked on: - ~~#3349~~ - ~~#3347~~	2022-07-21 22:02:36 +00:00
Pawan Dhananjay	612cdb7092	Merge readiness endpoint (#3349 ) ## Issue Addressed Resolves final task in https://github.com/sigp/lighthouse/issues/3260 ## Proposed Changes Adds a lighthouse http endpoint to indicate merge readiness. Blocked on #3339	2022-07-21 05:45:39 +00:00
Michael Sproul	e32868458f	Set safe block hash to justified (#3347 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3189. ## Proposed Changes - Always supply the justified block hash as the `safe_block_hash` when calling `forkchoiceUpdated` on the execution engine. - Refactor the `get_payload` routine to use the new `ForkchoiceUpdateParameters` struct rather than just the `finalized_block_hash`. I think this is a nice simplification and that the old way of computing the `finalized_block_hash` was unnecessary, but if anyone sees reason to keep that approach LMK.	2022-07-21 05:45:37 +00:00
Pawan Dhananjay	5b5cf9cfaa	Log ttd (#3339 ) ## Issue Addressed Resolves #3249 ## Proposed Changes Log merge related parameters and EE status in the beacon notifier before the merge. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-07-20 23:16:54 +00:00
ethDreamer	7c3ff903ca	Fix Gossip Penalties During Optimistic Sync Window (#3350 ) ## Issue Addressed * #3344 ## Proposed Changes There are a number of cases during block processing where we might get an `ExecutionPayloadError` but we shouldn't penalize peers. We were forgetting to enumerate all of the non-penalizing errors in every single match statement where we are making that decision. I created a function to make it explicit when we should and should not penalize peers and I used that function in all places where this logic is needed. This way we won't make the same mistake if we add another variant of `ExecutionPayloadError` in the future.	2022-07-20 20:59:38 +00:00
Pawan Dhananjay	e5e4e62758	Don't create a execution payload with same timestamp as terminal block (#3331 ) ## Issue Addressed Resolves #3316 ## Proposed Changes This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.	2022-07-18 23:15:41 +00:00
Pawan Dhananjay	f9b9658711	Add merge support to simulator (#3292 ) ## Issue Addressed N/A ## Proposed Changes Make simulator merge compatible. Adds a `--post_merge` flag to the eth1 simulator that enables a ttd and simulates the merge transition. Uses the `MockServer` in the execution layer test utils to simulate a dummy execution node. Adds the merge transition simulation to CI.	2022-07-18 23:15:40 +00:00
Age Manning	2ed51c364d	Improve block-lookup functionality (#3287 ) Improves some of the functionality around single and parent block lookup. Gives extra information about whether failures for lookups are related to processing or downloading. This is entirely untested. Co-authored-by: Diva M <divma@protonmail.com>	2022-07-17 23:26:58 +00:00
Pawan Dhananjay	5243cc6c30	Add a u256_hex_be module to encode/decode U256 types (#3321 ) ## Issue Addressed Resolves #3314 ## Proposed Changes Add a module to encode/decode u256 types according to the execution layer encoding/decoding standards https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#structures Updates `JsonExecutionPayloadV1.base_fee_per_gas`, `JsonExecutionPayloadHeaderV1.base_fee_per_gas` and `TransitionConfigurationV1.terminal_total_difficulty` to encode/decode according to standards Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-07-15 07:31:21 +00:00
Pawan Dhananjay	28b0ff27ff	Ignored sync jobs 2 (#3317 ) ## Issue Addressed Duplicate of #3269. Making this since @divagant-martian opened the previous PR and she can't approve her own PR 😄 Co-authored-by: Diva M <divma@protonmail.com>	2022-07-15 07:31:20 +00:00
Akihito Nakano	98a9626ef5	Bump the MSRV to 1.62 and using `#[derive(Default)]` on enums (#3304 ) ## Issue Addressed N/A ## Proposed Changes Since Rust 1.62, we can use `#[derive(Default)]` on enums. ✨ https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.	2022-07-15 07:31:19 +00:00
Paul Hauner	1f54e10b7b	Do not interpret "latest valid hash" as identifying a valid hash (#3327 ) ## Issue Addressed NA ## Proposed Changes After some discussion in Discord with @mkalinin it was raised that it was not the intention of the engine API to have CLs validate the `latest_valid_hash` (LVH) and all ancestors. Whilst I believe the engine API is being updated such that the LVH must identify a valid hash or be set to some junk value, I'm not confident that we can rely upon the LVH as being valid (at least for now) due to the confusion surrounding it. Being able to validate blocks via the LVH is a relatively minor optimisation; if the LVH value ends up becoming our head we'll send an fcU and get the VALID status there. Falsely marking a block as valid has serious consequences and since it's a minor optimisation to use LVH I think that we don't take the risk. For clarity, we will still invalidate the descendants of the LVH, we just wont validate the ancestors. ## Additional Info NA	2022-07-13 23:07:49 +00:00
Paul Hauner	7a6e6928a3	Further remove EE redundancy (#3324 ) ## Issue Addressed Resolves #3176 ## Proposed Changes Continues from PRs by @divagant-martian to gradually remove EL redundancy (see #3284, #3257). This PR achieves: - Removes the `broadcast` and `first_success` methods. The functional impact is that every request to the EE will always be tried immediately, regardless of the cached `EngineState` (this resolves #3176). Previously we would check the engine state before issuing requests, this doesn't make sense in a single-EE world; there's only one EE so we might as well try it for every request. - Runs the upcheck/watchdog routine once per slot rather than thrice. When we had multiple EEs frequent polling was useful to try and detect when the primary EE had come back online and we could switch to it. That's not as relevant now. - Always creates logs in the `Engines::upcheck` function. Previously we would mute some logs since they could get really noisy when one EE was down but others were functioning fine. Now we only have one EE and are upcheck-ing it less, it makes sense to always produce logs. This PR purposefully does not achieve: - Updating all occurances of "engines" to "engine". I'm trying to keep the diff small and manageable. We can come back for this. ## Additional Info NA	2022-07-13 20:31:39 +00:00
Divma	6d42a09ff8	Merge Engines and Engine struct in one in the `execution_layer` crate (#3284 ) ## Issue Addressed Part of https://github.com/sigp/lighthouse/issues/3118, continuation of https://github.com/sigp/lighthouse/pull/3257 and https://github.com/sigp/lighthouse/pull/3283 ## Proposed Changes - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Remove unnecessary generics ## Additional Info There is more cleanup to do that will come in subsequent PRs	2022-07-11 01:44:41 +00:00
Michael Sproul	748475be1d	Ensure caches are built for block_rewards POST API (#3305 ) ## Issue Addressed Follow up to https://github.com/sigp/lighthouse/pull/3290 that fixes a caching bug ## Proposed Changes Build the committee cache for the new `POST /lighthouse/analysis/block_rewards` API. Due to an unusual quirk of the total active balance cache the API endpoint would sometimes fail after loading a state from disk which had a current epoch cache _but not_ a total active balance cache. This PR adds calls to build the caches immediately before they're required, and has been running smoothly with `blockdreamer` the last few days.	2022-07-04 02:56:15 +00:00
Akihito Nakano	1cc8a97d4e	Remove unused method in HandlerNetworkContext (#3299 ) ## Issue Addressed N/A ## Proposed Changes Removed unused method in `HandlerNetworkContext`.	2022-07-04 02:56:14 +00:00
Divma	1219da9a45	Simplify error handling after engines fallback removal (#3283 ) ## Issue Addressed Part of #3118, continuation of #3257 ## Proposed Changes - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function returns a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function returns a single error. - [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carries a single error. - [`EngineError`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) now does not need to carry an Id - [`process_multiple_payload_statuses`](`9c429d0764/beacon_node/execution_layer/src/payload_status.rs (L46-L50)`) now doesn't need to receive an iterator of statuses and weight in different errors ## Additional Info This is built on top of #3294	2022-07-04 02:56:13 +00:00
Michael Sproul	61ed5f0ec6	Optimize historic committee calculation for the HTTP API (#3272 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3270 ## Proposed Changes Optimize the calculation of historic beacon committees in the HTTP API. This is achieved by allowing committee caches to be constructed for historic epochs, and constructing these committee caches on the fly in the API. This is much faster than reconstructing the state at the requested epoch, which usually takes upwards of 20s, and sometimes minutes with SPRP=8192. The depth of the `randao_mixes` array allows us to look back 64K epochs/0.8 years from a single state, which is pretty awesome! We always use the `state_id` provided by the caller, but will return a nice 400 error if the epoch requested is out of range for the state requested, e.g. ```bash # Prater curl "http://localhost:5052/eth/v1/beacon/states/3170304/committees?epoch=33538" ``` ```json {"code":400,"message":"BAD_REQUEST: epoch out of bounds, try state at slot 1081344","stacktraces":[]} ``` Queries will be fastest when aligned to `slot % SPRP == 0`, so the hint suggests a slot that is 0 mod 8192.	2022-07-04 02:56:11 +00:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
realbigsean	a7da0677d5	Remove builder redundancy (#3294 ) ## Issue Addressed This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets ## Proposed Changes - Removes redundancy in "builders" (servers implementing the builder spec) - Renames `payload-builder` flag to `builder` - Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194) Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-01 01:15:19 +00:00
Divma	d40c76e667	Fix clippy lints for rust 1.62 (#3300 ) ## Issue Addressed Fixes some new clippy lints after the last rust release ### Lints fixed for the curious: - [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned) - [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) - [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value) - [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) - [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes) - [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)	2022-06-30 22:51:49 +00:00
realbigsean	f6ec44f0dd	Register validator api (#3194 ) ## Issue Addressed Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint ## Proposed Changes - Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN. - TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17 - BN only sends VC registration data to builders on demand, but VC registration data does update the BN's prepare proposer cache and send an updated fcU to a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback. Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense. - Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884 - I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo. - Should application mask appear in the api? Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-06-30 00:49:21 +00:00
Pawan Dhananjay	5de00b7ee8	Unify execution layer endpoints (#3214 ) ## Issue Addressed Resolves #3069 ## Proposed Changes Unify the `eth1-endpoints` and `execution-endpoints` flags in a backwards compatible way as described in https://github.com/sigp/lighthouse/issues/3069#issuecomment-1134219221 Users have 2 options: 1. Use multiple non auth execution endpoints for deposit processing pre-merge 2. Use a single jwt authenticated execution endpoint for both execution layer and deposit processing post merge Related https://github.com/sigp/lighthouse/issues/3118 To enable jwt authenticated deposit processing, this PR removes the calls to `net_version` as the `net` namespace is not exposed in the auth server in execution clients. Moving away from using `networkId` is a good step in my opinion as it doesn't provide us with any added guarantees over `chainId`. See https://github.com/ethereum/consensus-specs/issues/2163 and https://github.com/sigp/lighthouse/issues/2115 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-06-29 09:07:09 +00:00
Michael Sproul	53b2b500db	Extend block reward APIs (#3290 ) ## Proposed Changes Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them. Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.	2022-06-29 04:50:37 +00:00
Paul Hauner	45b2eb18bc	v2.3.2-rc.0 (#3289 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-28 03:03:30 +00:00
Pawan Dhananjay	7acfbd89ee	Recover from NonConsecutive eth1 errors (#3273 ) ## Issue Addressed Fixes #1864 and a bunch of other closed but unresolved issues. ## Proposed Changes Allows the deposit caching to recover from `NonConsecutive` deposit errors by resetting the last processed block to the last valid deposit's block number. Still not sure of the underlying cause of this error, but this should recover the cache so we don't need `--eth1-purge-cache` anymore 🎉 A huge thanks to @one-three-three-seven for reproducing the error and providing the data that helped testing out the fix 🙌 Still needs a few more tests.	2022-06-26 23:10:58 +00:00
Akihito Nakano	082ed35bdc	Test the pruning of excess peers using randomly generated input (#3248 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3092 ## Proposed Changes Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores. ## Additional Info I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969). Co-authored-by: Diva M <divma@protonmail.com>	2022-06-25 22:22:34 +00:00
Michael Sproul	d21f083777	Add more paths to HTTP API metrics (#3282 ) ## Proposed Changes Expand the set of paths tracked by the HTTP API metrics to include all paths hit by the validator client. These paths were only partially updated for Altair, so we were missing some of the sync committee and v2 APIs.	2022-06-23 05:19:21 +00:00
Paul Hauner	748658e32c	Add some debug logs for checkpoint sync (#3281 ) ## Issue Addressed NA ## Proposed Changes I used these logs when debugging a spurious failure with Infura and thought they might be nice to have around permanently. There's no changes to functionality in this PR, just some additional `debug!` logs. ## Additional Info NA	2022-06-23 05:19:20 +00:00
Divma	7af5742081	Deprecate step param in BlocksByRange RPC request (#3275 ) ## Issue Addressed Deprecates the step parameter in the blocks by range request ## Proposed Changes - Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before - Adds a new type to still handle coding and decoding of requests that use the parameter ## Additional Info I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward	2022-06-22 16:23:34 +00:00
Divma	2063c0fa0d	Initial work to remove engines fallback from the `execution_layer` crate (#3257 ) ## Issue Addressed Part of #3160 ## Proposed Changes Use only the first url given in the execution engine, if more than one is provided log it. This change only moves having multiple engines to one. The amount of code cleanup that can and should be done forward is not small and would interfere with ongoing PRs. I'm keeping the changes intentionally very very minimal. ## Additional Info Future works: - In [ `EngineError` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) the id is not needed since it now has no meaning. - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function can return a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function can return a single error. - After the redundancy is removed for the builders, we can probably make the [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carry a single error. - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Fix the associated configurations and cli params. Not sure if both are done in https://github.com/sigp/lighthouse/pull/3214 In general I think those changes can be done incrementally and in individual pull requests.	2022-06-22 14:27:16 +00:00
Michael Sproul	efebf712dd	Avoid cloning snapshots during sync (#3271 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2944 ## Proposed Changes Remove snapshots from the cache during sync rather than cloning them. This reduces unnecessary cloning and memory fragmentation during sync. ## Additional Info This PR relies on the fact that the `block_delay` cache is not populated for blocks from sync. Relying on block delay may have the side effect that a change in `block_delay` calculation could lead to: a) more clones, if block delays are added for syncing blocks or b) less clones, if blocks near the head are erroneously provided without a `block_delay`. Case (a) would be a regression to the current status quo, and (b) is low-risk given we know that the snapshot cache is current susceptible to misses (hence `tree-states`).	2022-06-20 23:20:29 +00:00
eklm	a9e158663b	Fix validator_monitor_prev_epoch_ metrics (#2911 ) ## Issue Addressed #2820 ## Proposed Changes The problem is that validator_monitor_prev_epoch metrics are updated only if there is EpochSummary present in summaries map for the previous epoch and it is not the case for the offline validator. Ensure that EpochSummary is inserted into summaries map also for the offline validators.	2022-06-20 04:06:30 +00:00
Pawan Dhananjay	f428719761	Do not penalize peers on execution layer offline errors (#3258 ) ## Issue Addressed Partly resolves https://github.com/sigp/lighthouse/issues/3032 ## Proposed Changes Extracts some of the functionality of #3094 into a separate PR as the original PR requires a bit more work. Do not unnecessarily penalize peers when we fail to validate received execution payloads because our execution layer is offline.	2022-06-19 23:13:40 +00:00
Paul Hauner	564d7da656	v2.3.1 (#3262 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-14 05:25:38 +00:00
Divma	3dd50bda11	Improve substream management (#3261 ) ## Issue Addressed Which issue # does this PR address? ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-06-10 06:58:50 +00:00
Paul Hauner	11d80a6a38	Optimise `per_epoch_processing` low-hanging-fruit (#3254 ) ## Issue Addressed NA ## Proposed Changes - Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage. - Cache the result of `integer_sqrt` rather than recomputing for each validator. - Cache `state.previous_epoch` rather than recomputing it for each validator. ### Benchmarks Benchmarks on a recent mainnet state using #3252 to get timing. #### Without this PR ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:21:02Z INFO lcli::skip_slots] Using mainnet spec [2022-06-09T08:21:02Z INFO lcli::skip_slots] Advancing 32 slots [2022-06-09T08:21:02Z INFO lcli::skip_slots] Doing 10 runs [2022-06-09T08:21:02Z INFO lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz" SSZ decoding /tmp/state-0x3cdc.ssz: 43ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 0: 245.718794ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 1: 245.364782ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 2: 255.866179ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 3: 243.838909ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 4: 250.431425ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 5: 248.68765ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 6: 262.051113ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 7: 264.293967ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 8: 293.202007ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 9: 264.552017ms ``` #### With this PR: ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 0: 73.898678ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 1: 75.536978ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 2: 75.176104ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 3: 76.460828ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 4: 75.904195ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 5: 75.53077ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 6: 74.745572ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 7: 75.823489ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 8: 74.892055ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 9: 76.333569ms ``` ## Additional Info NA	2022-06-10 04:29:28 +00:00
Divma	56b4cd88ca	minor libp2p upgrade (#3259 ) ## Issue Addressed Upgrades libp2p	2022-06-09 23:48:51 +00:00
Divma	cfd26d25e0	do not count sync batch attempts when peer is not at fault (#3245 ) ## Issue Addressed currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 ## Proposed Changes If a batch fails but the peer is not at fault, do not count the attempt Also removes some annoying logs ## Additional Info We still get a counter on ignored attempts.. just in case	2022-06-07 02:35:56 +00:00
Divma	58e223e429	update libp2p (#3233 ) ## Issue Addressed na ## Proposed Changes Updates libp2p to https://github.com/libp2p/rust-libp2p/pull/2662 ## Additional Info From comments on the relevant PRs listed, we should pay attention at peer management consistency, but I don't think anything weird will happen. This is running in prater tok and sin	2022-06-07 02:35:55 +00:00
Michael Sproul	54cf94ea59	Fix per-slot timer in presence of clock changes (#3243 ) ## Issue Addressed Fixes a timing issue that results in spurious fork choice notifier failures: ``` WARN Error signalling fork choice waiter slot: 3962270, error: ForkChoiceSignalOutOfOrder { current: Slot(3962271), latest: Slot(3962270) }, service: beacon ``` There’s a fork choice run that is scheduled to run at the start of every slot by the `timer`, which creates a 12s interval timer when the beacon node starts up. The problem is that if there’s a bit of clock drift that gets corrected via NTP (or a leap second for that matter) then these 12s intervals will cease to line up with the start of the slot. This then creates the mismatch in slot number that we see above. Lighthouse also runs fork choice 500ms before the slot begins, and these runs are what is conflicting with the start-of-slot runs. This means that the warning in current versions of Lighthouse is mostly cosmetic because fork choice is up to date with all but the most recent 500ms of attestations (which usually isn’t many). ## Proposed Changes Fix the per-slot timer so that it continually re-calculates the duration to the start of the next slot and waits for that. A side-effect of this change is that we may skip slots if the per-slot task takes >12s to run, but I think this is an unlikely scenario and an acceptable compromise.	2022-06-06 23:52:32 +00:00
Divma	493c2c037c	reduce reprocess queue/channel sizes (#3239 ) ## Issue Addressed Reduces the effect of late blocks on overall node buildup ## Proposed Changes change the capacity of the channels used to send work for reprocessing in the beacon processor, and to send back to the main processor task, to be 75% of the capacity of the channel for receiving new events ## Additional Info The issues we've seen suggest we should still evaluate node performance under stress, with late blocks being a big factor. Other changes that could help: 1. right now we have a cap for queued attestations for reprocessing that applies to the sum of aggregated and unaggregated attestations. We could consider adding a separate cap that favors aggregated ones. 2. solving #2848	2022-06-06 23:52:31 +00:00
Akihito Nakano	a6d2ed6119	Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236 ) ## Issue Addressed This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`. I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, `outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`. - [outbound only](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018)`) - [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047)`) As a result, PeerManager doesn't remove "outbound" peers which should be pruned. Note: [`subnet_to_peer`](`e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)`) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.	2022-06-06 05:51:10 +00:00
Michael Sproul	47d57a290b	Improve eth1 block cache sync (for Ropsten) (#3234 ) ## Issue Addressed Fix for the eth1 cache sync issue observed on Ropsten. ## Proposed Changes Ropsten blocks are so infrequent that they broke our algorithm for downloading eth1 blocks. We currently try to download forwards from the last block in our cache to the block with block number [`remote_highest_block - FOLLOW_DISTANCE + FOLLOW_DISTANCE / ETH1_BLOCK_TIME_TOLERANCE_FACTOR`](`6f732986f1/beacon_node/eth1/src/service.rs (L489-L492)`). With the tolerance set to 4 this is insufficient because we lag by 1536 blocks, which is more like ~14 hours on Ropsten. This results in us having an incomplete eth1 cache, because we should cache all blocks between -16h and -8h. Even if we were to set the tolerance to 2 for the largest allowance, we would only look back 1024 blocks which is still more than 8 hours. For example consider this block https://ropsten.etherscan.io/block/12321390. The block from 1536 blocks earlier is 14 hours and 20 minutes before it: https://ropsten.etherscan.io/block/12319854. The block from 1024 blocks earlier is https://ropsten.etherscan.io/block/12320366, 8 hours and 48 minutes before. - This PR introduces a new CLI flag called `--eth1-cache-follow-distance` which can be used to set the distance manually. - A new dynamic catchup mechanism is added which detects when the cache is lagging the true eth1 chain and tries to download more blocks within the follow distance in order to catch up.	2022-06-03 06:05:03 +00:00
Mac L	55ac423872	Emit log when fee recipient values are inconsistent (#3202 ) ## Issue Addressed #3156 ## Proposed Changes Emit a `WARN` log whenever the value of `fee_recipient` as returned from the EE is different from the value of `suggested_fee_recipient` as set on the BN, for example by the `--suggested-fee-recipient` CLI flag. ## Additional Info I have set the log level to `WARN` since it is legal behaviour (meaning it isn't really an error but is important to know when it is occurring). If we feel like this behaviour is almost always undesired (caused by a misconfiguration or malicious EE) then an `ERRO` log would be more appropriate. Happy to change it in that case.	2022-06-03 03:22:54 +00:00
Paul Hauner	16e49af8e1	Use genesis slot for node/syncing (#3226 ) ## Issue Addressed NA ## Proposed Changes Resolves this error log emitted from the VC prior to genesis: ``` WARN Unable connect to beacon node error: ServerMessage(ErrorMessage { code: 500, message: "UNHANDLED_ERROR: UnableToReadSlot", stacktraces: [] }) ``` ## Additional Info NA	2022-05-31 06:09:11 +00:00
Michael Sproul	98c8ac1a87	Fix typo in peer state transition log (#3224 ) ## Issue Addressed We were logging `out_finalized_epoch` instead of `our_finalized_epoch`. I noticed this ages ago but only just got around to fixing it. ## Additional Info I also reformatted the log line to respect the line length limit (`rustfmt` won't do it because it gets confused by the `;` in slog's log macros).	2022-05-31 06:09:10 +00:00

1 2 3 4 5 ...

1903 Commits