lighthouse

Author	SHA1	Message	Date
Michael Sproul	77eabc5401	Revert "Optimise HTTP validator lookups" (#3658 ) ## Issue Addressed This reverts commit `ca9dc8e094` (PR #3559) with some modifications. ## Proposed Changes Unfortunately that PR introduced a performance regression in fork choice. The optimisation _intended_ to build the exit and pubkey caches on the head state _only if_ they were not already built. However, due to the head state always being cloned without these caches, we ended up building them every time the head changed, leading to a ~70ms+ penalty on mainnet. `fcfd02aeec/beacon_node/beacon_chain/src/canonical_head.rs (L633-L636)` I believe this is a severe enough regression to justify immediately releasing v3.2.1 with this change. ## Additional Info I didn't fully revert #3559, because there were some unrelated deletions of dead code in that PR which I figured we may as well keep. An alternative would be to clone the extra caches, but this likely still imposes some cost, so in the interest of applying a conservative fix quickly, I think reversion is the best approach. The optimisation from #3559 was not even optimising a particularly significant path, it was mostly for VCs running larger numbers of inactive keys. We can re-do it in the `tree-states` world where cache clones are cheap.	2022-10-26 06:50:04 +00:00
Paul Hauner	fcfd02aeec	Release v3.2.0 (#3647 ) ## Issue Addressed NA ## Proposed Changes Bump version to `v3.2.0` ## Additional Info - ~~Blocked on #3597~~ - ~~Blocked on #3645~~ - ~~Blocked on #3653~~ - ~~Requires additional testing~~	2022-10-25 06:36:51 +00:00
Divma	3a5888e53d	Ban and unban peers at the swarm level (#3653 ) ## Issue Addressed I missed this from https://github.com/sigp/lighthouse/pull/3491. peers were being banned at the behaviour level only. The identify errors are explained by this as well ## Proposed Changes Add banning and unbanning ## Additional Info Befor,e having tests that catch this was hard because the swarm was outside the behaviour. We could now have tests that prevent something like this in the future	2022-10-24 21:39:30 +00:00
pinkiebell	d0efb6b18a	beacon_node: add --disable-deposit-contract-sync flag (#3597 ) Overrides any previous option that enables the eth1 service. Useful for operating a `light` beacon node. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-10-19 22:55:49 +00:00
GeemoCandama	c5cd0d9b3f	add execution-timeout-multiplier flag to optionally increase timeouts (#3631 ) ## Issue Addressed Add flag to lengthen execution layer timeouts Which issue # does this PR address? #3607 ## Proposed Changes Added execution-timeout-multiplier flag and a cli test to ensure the execution layer config has the multiplier set correctly. Please list or describe the changes introduced by this PR. Add execution_timeout_multiplier to the execution layer config as Option<u32> and pass the u32 to HttpJsonRpc. ## Additional Info Not certain that this is the best way to implement it so I'd appreciate any feedback. Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-10-18 04:02:07 +00:00
Michael Sproul	edf23bb40e	Fix attestation shuffling filter (#3629 ) ## Issue Addressed Fix a bug in block production that results in blocks with 0 attestations during the first slot of an epoch. The bug is marked by debug logs of the form: > DEBG Discarding attestation because of missing ancestor, block_root: 0x3cc00d9c9e0883b2d0db8606278f2b8423d4902f9a1ee619258b5b60590e64f8, pivot_slot: 4042591 It occurs when trying to look up the shuffling decision root for an attestation from a slot which is prior to fork choice's finalized block. This happens frequently when proposing in the first slot of the epoch where we have: - `current_epoch == n` - `attestation.data.target.epoch == n - 1` - attestation shuffling epoch `== n - 3` (decision block being the last block of `n - 3`) - `state.finalized_checkpoint.epoch == n - 2` (first block of `n - 2` is finalized) Hence the shuffling decision slot is out of range of the fork choice backwards iterator _by a single slot_. Unfortunately this bug was hidden when we weren't pruning fork choice, and then reintroduced in v2.5.1 when we fixed the pruning (https://github.com/sigp/lighthouse/releases/tag/v2.5.1). There's no way to turn that off or disable the filtering in our current release, so we need a new release to fix this issue. Fortunately, it also does not occur on every epoch boundary because of the gradual pruning of fork choice every 256 blocks (~8 epochs): `01e84b71f5/consensus/proto_array/src/proto_array_fork_choice.rs (L16)` `01e84b71f5/consensus/proto_array/src/proto_array.rs (L713-L716)` So the probability of proposing a 0-attestation block given a proposal assignment is approximately `1/32 * 1/8 = 0.39%`. ## Proposed Changes - Load the block's shuffling ID from fork choice and verify it against the expected shuffling ID of the head state. This code was initially written before we had settled on a representation of shuffling IDs, so I think it's a nice simplification to make use of them here rather than more ad-hoc logic that fundamentally does the same thing. ## Additional Info Thanks to @moshe-blox for noticing this issue and bringing it to our attention.	2022-10-18 04:02:06 +00:00
Michael Sproul	59ec6b71b8	Consensus context with proposer index caching (#3604 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2371 ## Proposed Changes Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`. With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required. ## Additional Info In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards. There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).	2022-10-15 22:25:54 +00:00
Michael Sproul	e4cbdc1c77	Optimistic sync spec tests (v1.2.0) (#3564 ) ## Issue Addressed Implements new optimistic sync test format from https://github.com/ethereum/consensus-specs/pull/2982. ## Proposed Changes - Add parsing and runner support for the new test format. - Extend the mock EL with a set of canned responses keyed by block hash. Although this doubles up on some of the existing functionality I think it's really nice to use compared to the `preloaded_responses` or static responses. I think we could write novel new opt sync tests using these primtives much more easily than the previous ones. Forks are natively supported, and different responses to `forkchoiceUpdated` and `newPayload` are also straight-forward. ## Additional Info Blocked on merge of the spec PR and release of new test vectors.	2022-10-15 22:25:52 +00:00
Michael Sproul	ca9dc8e094	Optimise HTTP validator lookups (#3559 ) ## Issue Addressed While digging around in some logs I noticed that queries for validators by pubkey were taking 10ms+, which seemed too long. This was due to a loop through the entire validator registry for each lookup. ## Proposed Changes Rather than using a loop through the register, this PR utilises the pubkey cache which is usually initialised at the head. In case the cache isn't built, we fall back to the previous loop logic. In the vast majority of cases I expect the cache will be built, as the validator client queries at the `head` where all caches should be built. ## Additional Info I had to modify the cache build that runs after fork choice to build the pubkey cache. I think it had been optimised out, perhaps accidentally. I think it's preferable to have the exit cache and the pubkey cache built on the head state, as they are required for verifying deposits and exits respectively, and we may as well build them off the hot path of block processing. Previously they'd get built the first time a deposit or exit needed to be verified. I've deleted the unused `map_state` function which was obsoleted by `map_state_and_execution_optimistic`.	2022-10-15 22:25:51 +00:00
ethDreamer	221c433d62	Fixed a ton of state_processing stuff (#3642 ) FIXME's: * consensus/fork_choice/src/fork_choice.rs * consensus/state_processing/src/per_epoch_processing/capella.rs * consensus/types/src/execution_payload_header.rs TODO's: * consensus/state_processing/src/per_epoch_processing/capella/partial_withdrawals.rs * consensus/state_processing/src/per_epoch_processing/capella/full_withdrawals.rs	2022-10-14 17:35:10 -05:00
ethDreamer	255fdf0724	Added Capella Data Structures to consensus/types (#3637 ) * Ran Cargo fmt * Added Capella Data Structures to consensus/types	2022-10-13 09:37:20 -05:00
Pawan Dhananjay	1430b561c3	Add more gossip verification conditions	2022-10-06 21:16:59 -05:00
realbigsean	44515b8cbe	cargo fix	2022-10-05 17:20:54 -04:00
realbigsean	b5b4ce9509	blob production	2022-10-05 17:14:45 -04:00
Pawan Dhananjay	91efb9d4c7	Add todos	2022-10-05 02:56:57 -05:00
Pawan Dhananjay	21bf3d37cd	Reprocess blob sidecar messages	2022-10-05 02:52:26 -05:00
Pawan Dhananjay	c55b28bf10	Minor fixes	2022-10-04 19:18:06 -05:00
Pawan Dhananjay	12fe514550	Add more gossip verification functions for blobs	2022-10-04 19:17:53 -05:00
Pawan Dhananjay	9d99c784ea	Add gossip verification stub	2022-10-04 17:54:14 -05:00
realbigsean	7527c2b455	fix RPC limit add blob signing domain	2022-10-04 14:57:29 -04:00
realbigsean	ba16a037a3	cleanup	2022-10-04 09:34:05 -04:00
mariuspod	242ae21e5d	Pass EL JWT secret key via cli flag (#3568 ) ## Proposed Changes In this change I've added a new beacon_node cli flag `--execution-jwt-secret-key` for passing the JWT secret directly as string. Without this flag, it was non-trivial to pass a secrets file containing a JWT secret key without compromising its contents into some management repo or fiddling around with manual file mounts for cloud-based deployments. When used in combination with environment variables, the secret can be injected into container-based systems like docker & friends quite easily. It's both possible to either specify the file_path to the JWT secret or pass the JWT secret directly. I've modified the docs and attached a test as well. ## Additional Info The logic has been adapted a bit so that either one of `--execution-jwt` or `--execution-jwt-secret-key` must be set when specifying `--execution-endpoint` so that it's still compatible with the semantics before this change and there's at least one secret provided.	2022-10-04 12:41:03 +00:00
realbigsean	c0dc42ea07	cargo fmt	2022-10-04 08:21:46 -04:00
Divma	4926e3967f	[DEV FEATURE] Deterministic long lived subnets (#3453 ) ## Issue Addressed #2847 ## Proposed Changes Add under a feature flag the required changes to subscribe to long lived subnets in a deterministic way ## Additional Info There is an additional required change that is actually searching for peers using the prefix, but I find that it's best to make this change in the future	2022-10-04 10:37:48 +00:00
GeemoCandama	6a92bf70e4	CLI tests for logging flags (#3609 ) ## Issue Addressed Adding CLI tests for logging flags: log-color and disable-log-timestamp Which issue # does this PR address? #3588 ## Proposed Changes Add CLI tests for logging flags as described in #3588 Please list or describe the changes introduced by this PR. Added logger_config to client::Config as suggested. Implemented Default for LoggerConfig based on what was being done elsewhere in the repo. Created 2 tests for each flag addressed. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-10-04 08:33:40 +00:00
Pawan Dhananjay	8728c40102	Remove fallback support from eth1 service (#3594 ) ## Issue Addressed N/A ## Proposed Changes With https://github.com/sigp/lighthouse/pull/3214 we made it such that you can either have 1 auth endpoint or multiple non auth endpoints. Now that we are post merge on all networks (testnets and mainnet), we cannot progress a chain without a dedicated auth execution layer connection so there is no point in having a non-auth eth1-endpoint for syncing deposit cache. This code removes all fallback related code in the eth1 service. We still keep the single non-auth endpoint since it's useful for testing. ## Additional Info This removes all eth1 fallback related metrics that were relevant for the monitoring service, so we might need to change the api upstream.	2022-10-04 08:33:39 +00:00
realbigsean	8d45e48775	cargo fix	2022-10-03 21:52:16 -04:00
realbigsean	e81dbbfea4	compile	2022-10-03 21:48:02 -04:00
realbigsean	88006735c4	compile	2022-10-03 10:06:04 -04:00
realbigsean	7520651515	cargo fix and some test fixes	2022-09-29 12:43:35 -04:00
realbigsean	fe6fc55449	fix compilation errors, rename capella -> shanghai, cleanup some rebase issues	2022-09-29 12:43:13 -04:00
realbigsean	809b52715e	some block building updates	2022-09-29 12:38:00 -04:00
realbigsean	acaa340b41	add new beacon state variant for shanghai	2022-09-29 12:37:14 -04:00
realbigsean	203418ffc9	add `engine_getBlobV1`	2022-09-29 12:35:55 -04:00
realbigsean	3f1e5cee78	Some gossip work	2022-09-29 12:35:53 -04:00
realbigsean	ebc0ccd02a	some more sync boilerplate	2022-09-29 12:34:09 -04:00
realbigsean	4008da6c60	sync tx blobs	2022-09-29 12:32:55 -04:00
realbigsean	4cdf1b546d	add shanghai fork version and epoch	2022-09-29 12:28:58 -04:00
realbigsean	de44b300c0	add/update types	2022-09-29 12:25:56 -04:00
Age Manning	27bb9ff07d	Handle Lodestar's new agent string (#3620 ) ## Issue Addressed #3561 ## Proposed Changes Recognize Lodestars new agent string and appropriately count these peers as lodestar peers.	2022-09-29 01:50:13 +00:00
Age Manning	01b6bf7a2d	Improve logging a little (#3619 ) Some of the logs in combination with others could be improved. It will save some time debugging by improving the wording slightly.	2022-09-29 01:50:12 +00:00
Divma	b1d2510d1b	Libp2p v0.48.0 upgrade (#3547 ) ## Issue Addressed Upgrades libp2p to v.0.47.0. This is the compilation of - [x] #3495 - [x] #3497 - [x] #3491 - [x] #3546 - [x] #3553 Co-authored-by: Age Manning <Age@AgeManning.com>	2022-09-29 01:50:11 +00:00
Paul Hauner	01e84b71f5	v3.1.2 (#3603 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v3.1.2 ## Additional Info - ~~Blocked on several PRs.~~ - ~~Requires further testing.~~	2022-09-26 01:17:36 +00:00
Divma	bd873e7162	New rust lints for rustc 1.64.0 (#3602 ) ## Issue Addressed fixes lints from the last rust release ## Proposed Changes Fix the lints, most of the lints by `clippy::question-mark` are false positives in the form of https://github.com/rust-lang/rust-clippy/issues/9518 so it's allowed for now ## Additional Info	2022-09-23 03:52:46 +00:00
Divma	9bd384a573	send attnet unsubscription event on random subnet expiry (#3600 ) ## Issue Addressed 🐞 in which we don't actually unsubscribe from a random long lived subnet when it expires ## Proposed Changes Remove code addressing a specific case in which we are subscribed to all subnets and handle the removal of the long lived subnet. I don't think the special case code is particularly important as, if someone is running with that many validators to be subscribed to all subnets, it should use `--subscribe-all-subnets` instead ## Additional Info Noticed on some test nodes climbing bandwidth usage periodically (around 27hours, the time of subnet expirations) I'm running this code to test this does not happen anymore, but I think it should be good now	2022-09-23 03:52:45 +00:00
Paul Hauner	9246a92d76	Make garbage collection test less failure prone (#3599 ) ## Issue Addressed NA ## Proposed Changes This PR attempts to fix the following spurious CI failure: ``` ---- store_tests::garbage_collect_temp_states_from_failed_block stdout ---- thread 'store_tests::garbage_collect_temp_states_from_failed_block' panicked at 'disk store should initialize: DBError { message: "Error { message: \"IO error: lock /tmp/.tmp6DcBQ9/cold_db/LOCK: already held by process\" }" }', beacon_node/beacon_chain/tests/store_tests.rs:59:10 ``` I believe that some async task is taking a clone of the store and holding it in some other thread for a short time. This creates a race-condition when we try to open a new instance of the store. ## Additional Info NA	2022-09-23 03:52:44 +00:00
Paul Hauner	fa6ad1a11a	Deduplicate block root computation (#3590 ) ## Issue Addressed NA ## Proposed Changes This PR removes duplicated block root computation. Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`. Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations after the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times. ## Additional Info NA	2022-09-23 03:52:42 +00:00
Paul Hauner	3128b5b430	v3.1.1 (#3585 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info - ~~Requires additional testing~~ - ~~Blocked on:~~ - ~~#3589~~ - ~~#3540~~ - ~~#3587~~	2022-09-22 06:08:52 +00:00
Paul Hauner	96692b8e43	Impl `oneshot_broadcast` for committee promises (#3595 ) ## Issue Addressed NA ## Proposed Changes Fixes an issue introduced in #3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a broadcast queue. This is incorrect, each message will be received by only one receiver. The effect of this mistake is these logs: ``` Sep 20 06:56:17.001 INFO Synced slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier Sep 20 06:56:23.237 ERRO Unable to validate attestation error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1 ``` ## Additional Info NA	2022-09-21 01:01:50 +00:00
Paul Hauner	a95bcba2ab	Avoid holding write-lock whilst waiting on shuffling cache promise (#3589 ) ## Issue Addressed NA ## Proposed Changes Fixes a bug which hogged the write-lock for the `shuffling_cache`. ## Additional Info NA	2022-09-19 07:58:50 +00:00
Michael Sproul	507bb9dad4	Refined payload pruning (#3587 ) ## Proposed Changes Improve the payload pruning feature in several ways: - Payload pruning is now entirely optional. It is enabled by default but can be disabled with `--prune-payloads false`. The previous `--prune-payloads-on-startup` flag from #3565 is removed. - Initial payload pruning on startup now runs in a background thread. This thread will always load the split state, which is a small fraction of its total work (up to ~300ms) and then backtrack from that state. This pruning process ran in 2m5s on one Prater node with good I/O and 16m on a node with slower I/O. - To work with the optional payload pruning the database function `try_load_full_block` will now attempt to load execution payloads for finalized slots _if_ pruning is currently disabled. This gives users an opt-out for the extensive traffic between the CL and EL for reconstructing payloads. ## Additional Info If the `prune-payloads` flag is toggled on and off then the on-startup check may not see any payloads to delete and fail to clean them up. In this case the `lighthouse db prune_payloads` command should be used to force a manual sweep of the database.	2022-09-19 07:58:49 +00:00
Michael Sproul	f2ac0738d8	Implement `skip_randao_verification` and blinded block rewards API (#3540 ) ## Issue Addressed https://github.com/ethereum/beacon-APIs/pull/222 ## Proposed Changes Update Lighthouse's randao verification API to match the `beacon-APIs` spec. We implemented the API before spec stabilisation, and it changed slightly in the course of review. Rather than a flag `verify_randao` taking a boolean value, the new API uses a `skip_randao_verification` flag which takes no argument. The new spec also requires the randao reveal to be present and equal to the point-at-infinity when `skip_randao_verification` is set. I've also updated the `POST /lighthouse/analysis/block_rewards` API to take blinded blocks as input, as the execution payload is irrelevant and we may want to assess blocks produced by builders. ## Additional Info This is technically a breaking change, but seeing as I suspect I'm the only one using these parameters/APIs, I think we're OK to include this in a patch release.	2022-09-19 07:58:48 +00:00
Marius van der Wijden	6f7d21c542	enable 4844 at epoch 3	2022-09-18 12:13:03 +02:00
Marius van der Wijden	285dbf43ed	hacky hacks	2022-09-18 11:34:46 +02:00
Marius van der Wijden	8b71b978e0	new round of hacks (config etc)	2022-09-17 23:42:49 +02:00
Daniel Knopik	750c594f5f	forgor something	2022-09-17 21:38:57 +02:00
Daniel Knopik	eab1fce0e5	Merge branch 'eip4844' of github.com:dknopik/lighthouse into eip4844	2022-09-17 20:55:36 +02:00
Daniel Knopik	76572db9d5	add network config	2022-09-17 20:55:21 +02:00
Marius van der Wijden	f43532d3de	implement handle blobs by range req	2022-09-17 20:05:51 +02:00
Marius van der Wijden	f9209e2d08	more network stuff	2022-09-17 16:39:40 +02:00
Marius van der Wijden	aeb52ff186	network stuff	2022-09-17 16:10:42 +02:00
Daniel Knopik	d4d40be870	storable blobs	2022-09-17 15:58:52 +02:00
Marius van der Wijden	36a0add0cd	network stuff	2022-09-17 15:23:28 +02:00
Daniel Knopik	0518665949	Merge remote-tracking branch 'fork/eip4844' into eip4844	2022-09-17 14:58:33 +02:00
Daniel Knopik	292a16a6eb	gossip boilerplate	2022-09-17 14:58:27 +02:00
Marius van der Wijden	acace8ab31	network: blobs by range message	2022-09-17 14:55:18 +02:00
Daniel Knopik	bcc738cb9d	progress on gossip stuff	2022-09-17 14:31:57 +02:00
Marius van der Wijden	8473f08d10	beacon: consensus: implement engine api getBlobs	2022-09-17 14:10:15 +02:00
Daniel Knopik	dcfae6c5cf	implement From<FullPayload> for Payload	2022-09-17 13:29:20 +02:00
Marius van der Wijden	fe6be28e6b	beacon: consensus: implement engine api getBlobs	2022-09-17 13:20:18 +02:00
Daniel Knopik	ca1e17b386	it compiles!	2022-09-17 12:23:03 +02:00
Daniel Knopik	95203c51d4	fix some bugx, adjust stucts	2022-09-17 11:26:18 +02:00
Michael Sproul	ca42ef2e5a	Prune finalized execution payloads (#3565 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3556 ## Proposed Changes Delete finalized execution payloads from the database in two places: 1. When running the finalization migration in `migrate_database`. We delete the finalized payloads between the last split point and the new updated split point. _If_ payloads are already pruned prior to this then this is sufficient to prune _all_ payloads as non-canonical payloads are already deleted by the head pruner, and all canonical payloads prior to the previous split will already have been pruned. 2. To address the fact that users will update to this code _after_ the merge on mainnet (and testnets), we need a one-off scan to delete the finalized payloads from the canonical chain. This is implemented in `try_prune_execution_payloads` which runs on startup and scans the chain back to the Bellatrix fork or the anchor slot (if checkpoint synced after Bellatrix). In the case where payloads are already pruned this check only imposes a single state load for the split state, which shouldn't be _too slow_. Even so, a flag `--prepare-payloads-on-startup=false` is provided to turn this off after it has run the first time, which provides faster start-up times. There is also a new `lighthouse db prune_payloads` subcommand for users who prefer to run the pruning manually. ## Additional Info The tests have been updated to not rely on finalized payloads in the database, instead using the `MockExecutionLayer` to reconstruct them. Additionally a check was added to `check_chain_dump` which asserts the non-existence or existence of payloads on disk depending on their slot.	2022-09-17 02:27:01 +00:00
Paul Hauner	2cd3e3a768	Avoid duplicate committee cache loads (#3574 ) ## Issue Addressed NA ## Proposed Changes I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM. This PR modifies the shuffling cache so that each entry can be either: - A committee - A promise for a committee (i.e., a `crossbeam_channel::Receiver`) Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following: 1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock. 1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise. 1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation. 1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation. In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated. ## Additional Info NA	2022-09-16 08:54:03 +00:00
Paul Hauner	7d3948c8fe	Add metric for re-org distance (#3566 ) ## Issue Addressed NA ## Proposed Changes Add a metric to track the re-org distance. ## Additional Info NA	2022-09-13 17:19:27 +00:00
tim gretler	98815516a1	Support histogram buckets (#3391 ) ## Issue Addressed #3285 ## Proposed Changes Adds support for specifying histogram with buckets and adds new metric buckets for metrics mentioned in issue. ## Additional Info Need some help for the buckets. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-09-13 01:57:44 +00:00
Nils Effinghausen	f682df51a1	fix description for BALANCES_CACHE_MISSES metric (#3545 ) ## Issue Addressed fixes metric description Co-authored-by: Nils Effinghausen <nils.effinghausen@t-systems.com>	2022-09-10 01:35:10 +00:00
realbigsean	d1a8d6cf91	Pin mev rs deps (#3557 ) ## Issue Addressed We were unable to update lighthouse by running `cargo update` because some of the `mev-build-rs` deps weren't pinned. But `mev-build-rs` is now pinned here and includes it's own pinned commits for `ssz-rs` and `etheruem-consensus` Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-09-08 23:46:03 +00:00
Michael Sproul	9a7f7f1c1e	Configurable monitoring endpoint frequency (#3530 ) ## Issue Addressed Closes #3514 ## Proposed Changes - Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit. - Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.	2022-09-05 08:29:00 +00:00
realbigsean	177aef8f1e	Builder profit threshold flag (#3534 ) ## Issue Addressed Resolves https://github.com/sigp/lighthouse/issues/3517 ## Proposed Changes Adds a `--builder-profit-threshold <wei value>` flag to the BN. If an external payload's value field is less than this value, the local payload will be used. The value of the local payload will not be checked (it can't really be checked until the engine API is updated to support this). Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-09-05 04:50:49 +00:00
realbigsean	cae40731a2	Strict count unrealized (#3522 ) ## Issue Addressed Add a flag that can increase count unrealized strictness, defaults to false ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: sean <seananderson33@gmail.com>	2022-09-05 04:50:47 +00:00
Mac L	80359d8ddb	Fix attestation performance API `InvalidValidatorIndex` error (#3503 ) ## Issue Addressed When requesting an index which is not active during `start_epoch`, Lighthouse returns: ``` curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000" ``` ```json { "code": 500, "message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))", "stacktraces": [] } ``` This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing. ## Proposed Changes In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`: ``` -> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000" ``` ```json [ { "index": 999999999, "epochs": { "100000": { "active": false, "head": false, "target": false, "source": false } } } ] ``` By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`. ## Additional Info Note that this error only occurs for epochs after the Altair hard fork.	2022-09-05 04:50:45 +00:00
Divma	473abc14ca	Subscribe to subnets only when needed (#3419 ) ## Issue Addressed We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately. ## Proposed Changes - Schedule subscriptions to subnets for future slots. - Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this. - Subscriptions for past slots are rejected, before we would subscribe for one slot. - Add a new test for subscriptions that are not consecutive. ## Additional Info This is also an effort in making the code easier to understand	2022-09-05 00:22:48 +00:00
Paul Hauner	aa022f4685	v3.1.0 (#3525 ) ## Issue Addressed NA ## Proposed Changes - Bump versions ## Additional Info - ~~Blocked on #3508~~ - ~~Blocked on #3526~~ - ~~Requires additional testing.~~ - Expected release date is 2022-09-01	2022-08-31 22:21:55 +00:00
Paul Hauner	661307dce1	Separate committee subscriptions queue (#3508 ) ## Issue Addressed NA ## Proposed Changes As we've seen on Prater, there seems to be a correlation between these messages ``` WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service ``` ... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators. After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first. ~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~ ## Additional Info - Currently being tested on Prater	2022-08-30 05:47:31 +00:00
Michael Sproul	7a50684741	Harden slot notifier against clock drift (#3519 ) ## Issue Addressed Partly resolves #3518 ## Proposed Changes Change the slot notifier to use `duration_to_next_slot` rather than an interval timer. This makes it robust against underlying clock changes.	2022-08-29 14:34:43 +00:00
Paul Hauner	1a833ecc17	Add more logging for invalid payloads (#3515 ) ## Issue Addressed NA ## Proposed Changes Adds more `debug` logging to help troubleshoot invalid execution payload blocks. I was doing some of this recently and found it to be challenging. With this PR we should be able to grep `Invalid execution payload` and get one-liners that will show the block, slot and details about the proposer. I also changed the log in `process_invalid_execution_payload` since it was a little misleading; the `block_root` wasn't necessary the block which had an invalid payload. ## Additional Info NA	2022-08-29 14:34:42 +00:00
Paul Hauner	8609cced0e	Reset payload statuses when resuming fork choice (#3498 ) ## Issue Addressed NA ## Proposed Changes This PR is motivated by a recent consensus failure in Geth where it returned `INVALID` for an `VALID` block. Without this PR, the only way to recover is by re-syncing Lighthouse. Whilst ELs "shouldn't have consensus failures", in reality it's something that we can expect from time to time due to the complex nature of Ethereum. Being able to recover easily will help the network recover and EL devs to troubleshoot. The risk introduced with this PR is that genuinely INVALID payloads get a "second chance" at being imported. I believe the DoS risk here is negligible since LH needs to be restarted in order to re-process the payload. Furthermore, there's no reason to think that a well-performing EL will accept a truly invalid payload the second-time-around. ## Additional Info This implementation has the following intricacies: 1. Instead of just resetting invalid payloads to optimistic, we'll also reset valid payloads. This is an artifact of our existing implementation. 1. We will only reset payload statuses when we detect an invalid payload present in `proto_array` - This helps save us from forgetting that all our blocks are valid in the "best case scenario" where there are no invalid blocks. 1. If we fail to revert the payload statuses we'll log a `CRIT` and just continue with a `proto_array` that does not have reverted payload statuses. - The code to revert statuses needs to deal with balances and proposer-boost, so it's a failure point. This is a defensive measure to avoid introducing new show-stopping bugs to LH.	2022-08-29 14:34:41 +00:00
Michael Sproul	66eca1a882	Refactor op pool for speed and correctness (#3312 ) ## Proposed Changes This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12). Attestation packing is sped up by removing several inefficiencies: - No more recalculation of `attesting_indices` during packing. - No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`. - No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra). - No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release). So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms. Verification of attester slashings, proposer slashings and voluntary exits is fixed by: - Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message. - Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot. This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f. ## Additional Info The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s. This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency. Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.	2022-08-29 09:10:26 +00:00
realbigsean	cb132c622d	don't register exited or slashed validators with the builder api (#3473 ) ## Issue Addressed #3465 ## Proposed Changes Filter out any validator registrations for validators that are not `active` or `pending`. I'm adding this filtering the beacon node because all the information is readily available there. In other parts of the VC we are usually sending per-validator requests based on duties from the BN. And duties will only be provided for active validators so we don't have this type of filtering elsewhere in the VC. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-24 23:34:58 +00:00
Divma	8c69d57c2c	Pause sync when EE is offline (#3428 ) ## Issue Addressed #3032 ## Proposed Changes Pause sync when ee is offline. Changes include three main parts: - Online/offline notification system - Pause sync - Resume sync #### Online/offline notification system - The engine state is now guarded behind a new struct `State` that ensures every change is correctly notified. Notifications are only sent if the state changes. The new `State` is behind a `RwLock` (as before) as the synchronization mechanism. - The actual notification channel is a [tokio::sync::watch](https://docs.rs/tokio/latest/tokio/sync/watch/index.html) which ensures only the last value is in the receiver channel. This way we don't need to worry about message order etc. - Sync waits for state changes concurrently with normal messages. #### Pause Sync Sync has four components, pausing is done differently in each: - Block lookups: Disabled while in this state. We drop current requests and don't search for new blocks. Block lookups are infrequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it. - Parent lookups: Disabled while in this state. We drop current requests and don't search for new parents. Parent lookups are even less frequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it. - Range: Chains don't send batches for processing to the beacon processor. This is easily done by guarding the channel to the beacon processor and giving it access only if the ee is responsive. I find this the simplest and most powerful approach since we don't need to deal with new sync states and chain segments that are added while the ee is offline will follow the same logic without needing to synchronize a shared state among those. Another advantage of passive pause vs active pause is that we can still keep track of active advertised chain segments so that on resume we don't need to re-evaluate all our peers. - Backfill: Not affected by ee states, we don't pause. #### Resume Sync - Block lookups: Enabled again. - Parent lookups: Enabled again. - Range: Active resume. Since the only real pause range does is not sending batches for processing, resume makes all chains that are holding read-for-processing batches send them. - Backfill: Not affected by ee states, no need to resume. ## Additional Info QUESTION: Originally I made this to notify and change on synced state, but @pawanjay176 on talks with @paulhauner concluded we only need to check online/offline states. The upcheck function mentions extra checks to have a very up to date sync status to aid the networking stack. However, the only need the networking stack would have is this one. I added a TODO to review if the extra check can be removed Next gen of #3094 Will work best with #3439 Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>	2022-08-24 23:34:56 +00:00
Michael Sproul	aab4a8d2f2	Update docs for mainnet merge release (#3494 ) ## Proposed Changes Update the merge migration docs to encourage updating mainnet configs _now_! The docs are also updated to recommend _against_ `--suggested-fee-recipient` on the beacon node (https://github.com/sigp/lighthouse/issues/3432). Additionally the `--help` for the CLI is updated to match with a few small semantic changes: - `--execution-jwt` is no longer allowed without `--execution-endpoint`. We've ended up without a default for `--execution-endpoint`, so I think that's fine. - The flags related to the JWT are only allowed if `--execution-jwt` is provided.	2022-08-23 03:50:58 +00:00
Paul Hauner	18c61a5e8b	v3.0.0 (#3464 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v3.0.0 ## Additional Info - ~~Blocked on #3439~~ - ~~Blocked on #3459~~ - ~~Blocked on #3463~~ - ~~Blocked on #3462~~ - ~~Requires further testing~~ Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2022-08-22 03:43:08 +00:00
Paul Hauner	931153885c	Run per-slot fork choice at a further distance from the head (#3487 ) ## Issue Addressed NA ## Proposed Changes Run fork choice when the head is 256 slots from the wall-clock slot, rather than 4. The reason we don't always run FC is so that it doesn't slow us down during sync. As the comments state, setting the value to 256 means that we'd only have one interrupting fork-choice call if we were syncing at 20 slots/sec. ## Additional Info NA	2022-08-19 04:27:24 +00:00
Paul Hauner	df358b864d	Add metrics for EE `PayloadStatus` returns (#3486 ) ## Issue Addressed NA ## Proposed Changes Adds some metrics so we can track payload status responses from the EE. I think this will be useful for troubleshooting and alerting. I also bumped the `BecaonChain::per_slot_task` to `debug` since it doesn't seem too noisy and would have helped us with some things we were debugging in the past. ## Additional Info NA	2022-08-19 04:27:23 +00:00
Paul Hauner	043fa2153e	Revise EE peer penalites (#3485 ) ## Issue Addressed NA ## Proposed Changes Don't penalize peers for errors that might be caused by an honest optimistic node. ## Additional Info NA	2022-08-19 04:27:22 +00:00
Michael Sproul	8255c8682e	Align engine API timeouts with spec (#3470 ) ## Proposed Changes Match the timeouts from the `execution-apis` spec. Our existing values were already quite close so I don't imagine this change to be very disruptive. The spec sets the timeout for `engine_getPayloadV1` to only 1 second, but we were already using a longer value of 2 seconds. I've kept the 2 second timeout as I don't think there's any need to fail faster when producing a payload. There's no timeout specified for `eth_syncing` so I've matched it to the shortest timeout from the spec (1 second). I think the previous value of 250ms was likely too low and could have been contributing to spurious timeouts, particularly for remote ELs. ## Additional Info The timeouts are defined on each endpoint in this document: https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md	2022-08-17 02:36:39 +00:00
Michael Sproul	e5fc9f26bc	Log if no execution endpoint is configured (#3467 ) ## Issue Addressed Fixes an issue whereby syncing a post-merge network without an execution endpoint would silently stall. Sync swallows the errors from block verification so previously there was no indication in the logs for why the node couldn't sync. ## Proposed Changes Add an error log to the merge-readiness notifier for the case where the merge has already completed but no execution endpoint is configured.	2022-08-15 01:31:02 +00:00
Michael Sproul	25e3dc9300	Fix block verification and checkpoint sync caches (#3466 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2962 ## Proposed Changes Build all caches on the checkpoint state before storing it in the database. Additionally, fix a bug in `signature_verify_chain_segment` which prevented block verification from succeeding unless the previous epoch cache was already built. The previous epoch cache is required to verify the signatures of attestations included from previous epochs, even when all the blocks in the segment are from the same epoch. The comments around `signature_verify_chain_segment` have also been updated to reflect the fact that it should only be used on a chain of blocks from a single epoch. I believe this restriction had already been added at some point in the past and that the current comments were just outdated (and I think because the proposer shuffling can change in the next epoch based on the blocks applied in the current epoch that this limitation is essential).	2022-08-15 01:31:00 +00:00
Paul Hauner	f03f9ba680	Increase merge-readiness lookhead (#3463 ) ## Issue Addressed NA ## Proposed Changes Start issuing merge-readiness logs 2 weeks before the Bellatrix fork epoch. Additionally, if the Bellatrix epoch is specified and the use has configured an EL, always log merge readiness logs, this should benefit pro-active users. ### Lookahead Reasoning - Bellatrix fork is: - epoch 144896 - slot 4636672 - Unix timestamp: `1606824023 + (4636672 * 12) = 1662464087` - GMT: Tue Sep 06 2022 11:34:47 GMT+0000 - Warning start time is: - Unix timestamp: `1662464087 - 604800 * 2 = 1661254487` - GMT: Tue Aug 23 2022 11:34:47 GMT+0000 The [current expectation](https://discord.com/channels/595666850260713488/745077610685661265/1007445305198911569) is that EL and CL clients will releases out by Aug 22nd at the latest, then an EF announcement will go out on the 23rd. If all goes well, LH will start alerting users about merge-readiness just after the announcement. ## Additional Info NA	2022-08-15 01:30:59 +00:00
Michael Sproul	92d597ad23	Modularise slasher backend (#3443 ) ## Proposed Changes Enable multiple database backends for the slasher, either MDBX (default) or LMDB. The backend can be selected using `--slasher-backend={lmdb,mdbx}`. ## Additional Info In order to abstract over the two library's different handling of database lifetimes I've used `Box::leak` to give the `Environment` type a `'static` lifetime. This was the only way I could think of using 100% safe code to construct a self-referential struct `SlasherDB`, where the `OpenDatabases` refers to the `Environment`. I think this is OK, as the `Environment` is expected to live for the life of the program, and both database engines leave the database in a consistent state after each write. The memory claimed for memory-mapping will be freed by the OS and appropriately flushed regardless of whether the `Environment` is actually dropped. We are depending on two `sigp` forks of `libmdbx-rs` and `lmdb-rs`, to give us greater control over MDBX OS support and LMDB's version.	2022-08-15 01:30:56 +00:00
Pawan Dhananjay	71fd0b42f2	Fix lints for Rust 1.63 (#3459 ) ## Issue Addressed N/A ## Proposed Changes Fix clippy lints for latest rust version 1.63. I have allowed the [derive_partial_eq_without_eq](https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq) lint as satisfying this lint would result in more code that we might not want and I feel it's not required. Happy to fix this lint across lighthouse if required though.	2022-08-12 00:56:39 +00:00
Divma	f4ffa9e0b4	Handle processing results of non faulty batches (#3439 ) ## Issue Addressed Solves #3390 So after checking some logs @pawanjay176 got, we conclude that this happened because we blacklisted a chain after trying it "too much". Now here, in all occurrences it seems that "too much" means we got too many download failures. This happened very slowly, exactly because the batch is allowed to stay alive for very long times after not counting penalties when the ee is offline. The error here then was not that the batch failed because of offline ee errors, but that we blacklisted a chain because of download errors, which we can't pin on the chain but on the peer. This PR fixes that. ## Proposed Changes Adds a missing piece of logic so that if a chain fails for errors that can't be attributed to an objectively bad behavior from the peer, it is not blacklisted. The issue at hand occurred when new peers arrived claiming a head that had wrongfully blacklisted, even if the original peers participating in the chain were not penalized. Another notable change is that we need to consider a batch invalid if it processed correctly but its next non empty batch fails processing. Now since a batch can fail processing in non empty ways, there is no need to mark as invalid previous batches. Improves some logging as well. ## Additional Info We should do this regardless of pausing sync on ee offline/unsynced state. This is because I think it's almost impossible to ensure a processing result will reach in a predictable order with a synced notification from the ee. Doing this handles what I think are inevitable data races when we actually pause sync This also fixes a return that reports which batch failed and caused us some confusion checking the logs	2022-08-12 00:56:38 +00:00
Paul Hauner	4fc0cb121c	Remove some "wontfix" TODOs for the merge (#3449 ) ## Issue Addressed NA ## Proposed Changes Removes three types of TODOs: 1. `execution_layer/src/lib.rs`: It was [determined](https://github.com/ethereum/consensus-specs/issues/2636#issuecomment-988688742) that there is no action required here. 2. `beacon_processor/worker/gossip_methods.rs`: Removed TODOs relating to peer scoring that have already been addressed via `epe.penalize_peer()`. - It seems `cargo fmt` wanted to adjust some things here as well 🤷 3. `proto_array_fork_choice.rs`: it would be nice to remove that useless `bool` for cleanliness, but I don't think it's something we need to do and the TODO just makes things look messier IMO. ## Additional Info There should be no functional changes to the code in this PR. There are still some TODOs lingering, those ones require actual changes or more thought.	2022-08-10 13:06:46 +00:00
Michael Sproul	4e05f19fb5	Serve Bellatrix preset in BN API (#3425 ) ## Issue Addressed Resolves #3388 Resolves #2638 ## Proposed Changes - Return the `BellatrixPreset` on `/eth/v1/config/spec` by default. - Allow users to opt out of this by providing `--http-spec-fork=altair` (unless there's a Bellatrix fork epoch set). - Add the Altair constants from #2638 and make serving the constants non-optional (the `http-disable-legacy-spec` flag is deprecated). - Modify the VC to only read the `Config` and not to log extra fields. This prevents it from having to muck around parsing the `ConfigAndPreset` fields it doesn't need. ## Additional Info This change is backwards-compatible for the VC and the BN, but is marked as a breaking change for the removal of `--http-disable-legacy-spec`. I tried making `Config` a `superstruct` too, but getting the automatic decoding to work was a huge pain and was going to require a lot of hacks, so I gave up in favour of keeping the default-based approach we have now.	2022-08-10 07:52:59 +00:00
Pawan Dhananjay	c25934956b	Remove INVALID_TERMINAL_BLOCK (#3385 ) ## Issue Addressed Resolves #3379 ## Proposed Changes Remove instances of `InvalidTerminalBlock` in lighthouse and use `Invalid {latest_valid_hash: "0x0000000000000000000000000000000000000000000000000000000000000000"}` to represent that status.	2022-08-10 07:52:58 +00:00
Paul Hauner	2de26b20f8	Don't return errors on HTTP API for already-known messages (#3341 ) ## Issue Addressed - Resolves #3266 ## Proposed Changes Return 200 OK rather than an error when a block, attestation or sync message is already known. Presently, we will log return an error which causes a BN to go "offline" from the VCs perspective which causes the fallback mechanism to do work to try and avoid and upcheck offline nodes. This can be observed as instability in the `vc_beacon_nodes_available_count` metric. The current behaviour also causes scary logs for the user. There's nothing to actually be concerned about when we see duplicate messages, this can happen on fallback systems (see code comments). ## Additional Info NA	2022-08-10 07:52:57 +00:00
realbigsean	6f13727fbe	Don't use the builder network if the head is optimistic (#3412 ) ## Issue Addressed Resolves https://github.com/sigp/lighthouse/issues/3394 Adds a check in `is_healthy` about whether the head is optimistic when choosing whether to use the builder network. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-09 06:05:16 +00:00
Paul Hauner	a688621919	Add support for beaconAPI in `lcli` functions (#3252 ) ## Issue Addressed NA ## Proposed Changes Modifies `lcli skip-slots` and `lcli transition-blocks` allow them to source blocks/states from a beaconAPI and also gives them some more features to assist with benchmarking. ## Additional Info Breaks the current `lcli skip-slots` and `lcli transition-blocks` APIs by changing some flag names. It should be simple enough to figure out the changes via `--help`. Currently blocked on #3263.	2022-08-09 06:05:13 +00:00
Michael Sproul	6bc4a2cc91	Update invalid head tests (#3400 ) ## Proposed Changes Update the invalid head tests so that they work with the current default fork choice configuration. Thanks @realbigsean for fixing the persistence test and the EF tests. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-05 23:41:09 +00:00
Mac L	5d317779bb	Ensure `validator/blinded_blocks/{slot}` endpoint conforms to spec (#3429 ) ## Issue Addressed #3418 ## Proposed Changes - Remove `eth/v2/validator/blinded_blocks/{slot}` as this endpoint does not exist in the spec. - Return `version` in the `eth/v1/validator/blinded_blocks/{slot}` endpoint. ## Additional Info Since this removes the `v2` endpoint, this is technically a breaking change, but as this does not exist in the spec users may or may not be relying on this. Depending on what we feel is appropriate, I'm happy to edit this so we keep the `v2` endpoint for now but simply bring the `v1` endpoint in line with `v2`.	2022-08-05 06:46:58 +00:00
realbigsean	43ce0de73f	Downgrade log for 204 from builder (#3411 ) ## Issue Addressed A 204 from the connected builder just indicates there's no payload available from the builder, not that there's an issue. So I don't actually think this should be a warn. During the merge transition when we are pre-finalization a 204 will actually be expected. And maybe even longer if the relay chooses to delay providing payloads for a longer period post-merge. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-03 17:13:15 +00:00
Michael Sproul	df51a73272	Release v2.5.1 (#3406 ) ## Issue Addressed Patch release to address fork choice issues in the presence of clock drift: https://github.com/sigp/lighthouse/pull/3402	2022-08-03 04:23:09 +00:00
Mac L	e24552d61a	Restore backwards compatibility when using older BNs (#3410 ) ## Issue Addressed https://github.com/status-im/nimbus-eth2/issues/3930 ## Proposed Changes We can trivially support beacon nodes which do not provide the `is_optimistic` field by wrapping the field in an `Option`.	2022-08-02 23:20:51 +00:00
Paul Hauner	d0beecca20	Make fork choice prune again (#3408 ) ## Issue Addressed NA ## Proposed Changes There was a regression in #3244 (released in v2.4.0) which stopped pruning fork choice (see [here](https://github.com/sigp/lighthouse/pull/3244#discussion_r935187485)). This would form a very slow memory leak, using ~100mb per month. The release has been out for ~11 days, so users should not be seeing a dangerous increase in memory, yet. Credits to @michaelsproul for noticing this 🎉 ## Additional Info NA	2022-08-02 07:58:42 +00:00
Justin Traglia	807bc8b0b3	Fix a few typos in option help strings (#3401 ) ## Proposed Changes Fixes a typo I noticed while looking at options.	2022-08-02 00:58:24 +00:00
Michael Sproul	18383a63b2	Tidy eth1/deposit contract logging (#3397 ) ## Issue Addressed Fixes an issue identified by @remyroy whereby we were logging a recommendation to use `--eth1-endpoints` on merge-ready setups (when the execution layer was out of sync). ## Proposed Changes I took the opportunity to clean up the other eth1-related logs, replacing "eth1" by "deposit contract" or "execution" as appropriate. I've downgraded the severity of the `CRIT` log to `ERRO` and removed most of the recommendation text. The reason being that users lacking an execution endpoint will be informed by the new `WARN Not merge ready` log pre-Bellatrix, or the regular errors from block verification post-Bellatrix.	2022-08-01 07:20:43 +00:00
Paul Hauner	2983235650	v2.5.0 (#3392 ) ## Issue Addressed NA ## Proposed Changes Bump versions. ## Additional Info - ~~Blocked on #3383~~ - ~~Awaiting further testing.~~	2022-08-01 03:41:08 +00:00
Paul Hauner	bcfde6e7df	Indicate that invalid blocks are optimistic (#3383 ) ## Issue Addressed NA ## Proposed Changes This PR will make Lighthouse return blocks with invalid payloads via the API with `execution_optimistic = true`. This seems a bit awkward, however I think it's better than returning a 404 or some other error. Let's consider the case where the only possible head is invalid (#3370 deals with this). In such a scenario all of the duties endpoints will start failing because the head is invalid. I think it would be better if the duties endpoints continue to work, because it's likely that even though the head is invalid the duties are still based upon valid blocks and we want the VC to have them cached. There's no risk to the VC here because we won't actually produce an attestation pointing to an invalid head. Ultimately, I don't think it's particularly important for us to distinguish between optimistic and invalid blocks on the API. Neither should be trusted and the only real reason that we track this is so we can try and fork around the invalid blocks. ## Additional Info - ~~Blocked on #3370~~	2022-07-30 05:08:57 +00:00
Michael Sproul	fdfdb9b57c	Enable `count-unrealized` by default (#3389 ) ## Issue Addressed Enable https://github.com/sigp/lighthouse/pull/3322 by default on all networks. The feature can be opted out of using `--count-unrealized=false` (the CLI flag is updated to take a parameter).	2022-07-30 00:22:41 +00:00
Pawan Dhananjay	b3ce8d0de9	Fix penalties in sync methods (#3384 ) ## Issue Addressed N/A ## Proposed Changes Uses the `penalize_peer` function added in #3350 in sync methods as well. The existing code in sync methods missed the `ExecutionPayloadError::UnverifiedNonOptimisticCandidate` case.	2022-07-30 00:22:39 +00:00
ethDreamer	034260bd99	Initial Commit of Retrospective OTB Verification (#3372 ) ## Issue Addressed * #2983 ## Proposed Changes Basically followed the [instructions laid out here](https://github.com/sigp/lighthouse/issues/2983#issuecomment-1062494947) Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>	2022-07-30 00:22:38 +00:00
realbigsean	6c2d8b2262	Builder Specs v0.2.0 (#3134 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3091 Extends https://github.com/sigp/lighthouse/pull/3062, adding pre-bellatrix block support on blinded endpoints and allowing the normal proposal flow (local payload construction) on blinded endpoints. This resulted in better fallback logic because the VC will not have to switch endpoints on failure in the BN <> Builder API, the BN can just fallback immediately and without repeating block processing that it shouldn't need to. We can also keep VC fallback from the VC<>BN API's blinded endpoint to full endpoint. ## Proposed Changes - Pre-bellatrix blocks on blinded endpoints - Add a new `PayloadCache` to the execution layer - Better fallback-from-builder logic ## Todos - [x] Remove VC transition logic - [x] Add logic to only enable builder flow after Merge transition finalization - [x] Tests - [x] Fix metrics - [x] Rustdocs Co-authored-by: Mac L <mjladson@pm.me> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-30 00:22:37 +00:00
Paul Hauner	25f0e261cb	Don't return errors when fork choice fails (#3370 ) ## Issue Addressed NA ## Proposed Changes There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network. This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload. ## Reviewer Notes The following changes are included: 1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process. 1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run. - This should have practically no effect, since most callers were still continuing if recomputing the head failed. - The outlier is that the API will return 200 rather than a 500 when fork choice fails. 1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.	2022-07-28 13:57:09 +00:00
Michael Sproul	d04fde3ba9	Remove equivocating validators from fork choice (#3371 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3241 Closes https://github.com/sigp/lighthouse/issues/3242 ## Proposed Changes * [x] Implement logic to remove equivocating validators from fork choice per https://github.com/ethereum/consensus-specs/pull/2845 * [x] Update tests to v1.2.0-rc.1. The new test which exercises `equivocating_indices` is passing. * [x] Pull in some SSZ abstractions from the `tree-states` branch that make implementing Vec-compatible encoding for types like `BTreeSet` and `BTreeMap`. * [x] Implement schema upgrades and downgrades for the database (new schema version is V11). * [x] Apply attester slashings from blocks to fork choice ## Additional Info * This PR doesn't need the `BTreeMap` impl, but `tree-states` does, and I don't think there's any harm in keeping it. But I could also be convinced to drop it. Blocked on #3322.	2022-07-28 09:43:41 +00:00
Pawan Dhananjay	f3439116da	Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365 ) ## Issue Addressed Resolves #3351 ## Proposed Changes Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced. ## Additional Info This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned. cc @divagant-martian please let me know if i'm missing something here.	2022-07-27 03:20:00 +00:00
Justin Traglia	0f62d900fe	Fix some typos (#3376 ) ## Proposed Changes This PR fixes various minor typos in the project.	2022-07-27 00:51:06 +00:00
Mac L	44fae52cd7	Refuse to sign sync committee messages when head is optimistic (#3191 ) ## Issue Addressed Resolves #3151 ## Proposed Changes When fetching duties for sync committee contributions, check the value of `execution_optimistic` of the head block from the BN and refuse to sign any sync committee messages `if execution_optimistic == true`. ## Additional Info - Is backwards compatible with older BNs - Finding a way to add test coverage for this would be prudent. Open to suggestions.	2022-07-27 00:51:05 +00:00
Mac L	d316305411	Add `is_optimistic` to `eth/v1/node/syncing` response (#3374 ) ## Issue Addressed As specified in the [Beacon Chain API specs](https://github.com/ethereum/beacon-APIs/blob/master/apis/node/syncing.yaml#L32-L35) we should return `is_optimistic` as part of the response to a query for the `eth/v1/node/syncing` endpoint. ## Proposed Changes Compute the optimistic status of the head and add it to the `SyncingData` response.	2022-07-26 08:50:16 +00:00
realbigsean	904dd62524	Strict fee recipient (#3363 ) ## Issue Addressed Resolves #3267 Resolves #3156 ## Proposed Changes - Move the log for fee recipient checks from proposer cache insertion into block proposal so we are directly checking what we get from the EE - Only log when there is a discrepancy with the local EE, not when using the builder API. In the `builder-api` branch there is an `info` log when there is a discrepancy, I think it is more likely there will be a difference in fee recipient with the builder api because proposer payments might be made via a transaction in the block. Not really sure what patterns will become commong. - Upgrade the log from a `warn` to an `error` - not actually sure which we want, but I think this is worth an error because the local EE with default transaction ordering I think should pretty much always use the provided fee recipient - add a `strict-fee-recipient` flag to the VC so we only sign blocks with matching fee recipients. Falls back from the builder API to the local API if there is a discrepancy . Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-26 02:17:24 +00:00
ethDreamer	f7354abe0f	Fix Block Cache Range Math for Faster Syncing (#3358 ) ## Issue Addressed While messing with the deposit snapshot stuff, I had my proxy running and noticed the beacon node wasn't syncing the block cache continuously. There were long periods where it did nothing. I believe this was caused by a logical error introduced in #3234 that dealt with an issue that arose while syncing the block cache on Ropsten. The problem is that when the block cache is initially syncing, it will trigger the logic that detects the cache is far behind the execution chain in time. This will trigger a batch syncing mechanism which is intended to sync further ahead than the chain would normally. But the batch syncing is actually slower than the range this function usually estimates (in this scenario). ## Proposed Changes I believe I've fixed this function by taking the end of the range to be the maximum of (batch syncing range, usual range). I've also renamed and restructured some things a bit. It's equivalent logic but I think it's more clear what's going on.	2022-07-26 02:17:21 +00:00
realbigsean	20ebf1f3c1	Realized unrealized experimentation (#3322 ) ## Issue Addressed Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style. Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: Mac L <mjladson@pm.me>	2022-07-25 23:53:26 +00:00
Mac L	bb5a6d2cca	Add `execution_optimistic` flag to HTTP responses (#3070 ) ## Issue Addressed #3031 ## Proposed Changes Updates the following API endpoints to conform with https://github.com/ethereum/beacon-APIs/pull/190 and https://github.com/ethereum/beacon-APIs/pull/196 - [x] `beacon/states/{state_id}/root` - [x] `beacon/states/{state_id}/fork` - [x] `beacon/states/{state_id}/finality_checkpoints` - [x] `beacon/states/{state_id}/validators` - [x] `beacon/states/{state_id}/validators/{validator_id}` - [x] `beacon/states/{state_id}/validator_balances` - [x] `beacon/states/{state_id}/committees` - [x] `beacon/states/{state_id}/sync_committees` - [x] `beacon/headers` - [x] `beacon/headers/{block_id}` - [x] `beacon/blocks/{block_id}` - [x] `beacon/blocks/{block_id}/root` - [x] `beacon/blocks/{block_id}/attestations` - [x] `debug/beacon/states/{state_id}` - [x] `debug/beacon/heads` - [x] `validator/duties/attester/{epoch}` - [x] `validator/duties/proposer/{epoch}` - [x] `validator/duties/sync/{epoch}` Updates the following Server-Sent Events: - [x] `events?topics=head` - [x] `events?topics=block` - [x] `events?topics=finalized_checkpoint` - [x] `events?topics=chain_reorg` ## Backwards Incompatible There is a very minor breaking change with the way the API now handles requests to `beacon/blocks/{block_id}/root` and `beacon/states/{state_id}/root` when `block_id` or `state_id` is the `Root` variant of `BlockId` and `StateId` respectively. Previously a request to a non-existent root would simply echo the root back to the requester: ``` curl "http://localhost:5052/eth/v1/beacon/states/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root" {"data":{"root":"0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}} ``` Now it will return a `404`: ``` curl "http://localhost:5052/eth/v1/beacon/blocks/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root" {"code":404,"message":"NOT_FOUND: beacon block with root 0xaaaa…aaaa","stacktraces":[]} ``` In addition to this is the block root `0x0000000000000000000000000000000000000000000000000000000000000000` previously would return the genesis block. It will now return a `404`: ``` curl "http://localhost:5052/eth/v1/beacon/blocks/0x0000000000000000000000000000000000000000000000000000000000000000" {"code":404,"message":"NOT_FOUND: beacon block with root 0x0000…0000","stacktraces":[]} ``` ## Additional Info - `execution_optimistic` is always set, and will return `false` pre-Bellatrix. I am also open to the idea of doing something like `#[serde(skip_serializing_if = "Option::is_none")]`. - The value of `execution_optimistic` is set to `false` where possible. Any computation that is reliant on the `head` will simply use the `ExecutionStatus` of the head (unless the head block is pre-Bellatrix). Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-07-25 08:23:00 +00:00
Paul Hauner	21dec6f603	v2.4.0 (#3360 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v2.4.0 ## Additional Info Blocked on: - ~~#3349~~ - ~~#3347~~	2022-07-21 22:02:36 +00:00
Pawan Dhananjay	612cdb7092	Merge readiness endpoint (#3349 ) ## Issue Addressed Resolves final task in https://github.com/sigp/lighthouse/issues/3260 ## Proposed Changes Adds a lighthouse http endpoint to indicate merge readiness. Blocked on #3339	2022-07-21 05:45:39 +00:00
Michael Sproul	e32868458f	Set safe block hash to justified (#3347 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3189. ## Proposed Changes - Always supply the justified block hash as the `safe_block_hash` when calling `forkchoiceUpdated` on the execution engine. - Refactor the `get_payload` routine to use the new `ForkchoiceUpdateParameters` struct rather than just the `finalized_block_hash`. I think this is a nice simplification and that the old way of computing the `finalized_block_hash` was unnecessary, but if anyone sees reason to keep that approach LMK.	2022-07-21 05:45:37 +00:00
Pawan Dhananjay	5b5cf9cfaa	Log ttd (#3339 ) ## Issue Addressed Resolves #3249 ## Proposed Changes Log merge related parameters and EE status in the beacon notifier before the merge. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-07-20 23:16:54 +00:00
ethDreamer	7c3ff903ca	Fix Gossip Penalties During Optimistic Sync Window (#3350 ) ## Issue Addressed * #3344 ## Proposed Changes There are a number of cases during block processing where we might get an `ExecutionPayloadError` but we shouldn't penalize peers. We were forgetting to enumerate all of the non-penalizing errors in every single match statement where we are making that decision. I created a function to make it explicit when we should and should not penalize peers and I used that function in all places where this logic is needed. This way we won't make the same mistake if we add another variant of `ExecutionPayloadError` in the future.	2022-07-20 20:59:38 +00:00
Pawan Dhananjay	e5e4e62758	Don't create a execution payload with same timestamp as terminal block (#3331 ) ## Issue Addressed Resolves #3316 ## Proposed Changes This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.	2022-07-18 23:15:41 +00:00
Pawan Dhananjay	f9b9658711	Add merge support to simulator (#3292 ) ## Issue Addressed N/A ## Proposed Changes Make simulator merge compatible. Adds a `--post_merge` flag to the eth1 simulator that enables a ttd and simulates the merge transition. Uses the `MockServer` in the execution layer test utils to simulate a dummy execution node. Adds the merge transition simulation to CI.	2022-07-18 23:15:40 +00:00
Age Manning	2ed51c364d	Improve block-lookup functionality (#3287 ) Improves some of the functionality around single and parent block lookup. Gives extra information about whether failures for lookups are related to processing or downloading. This is entirely untested. Co-authored-by: Diva M <divma@protonmail.com>	2022-07-17 23:26:58 +00:00
Pawan Dhananjay	5243cc6c30	Add a u256_hex_be module to encode/decode U256 types (#3321 ) ## Issue Addressed Resolves #3314 ## Proposed Changes Add a module to encode/decode u256 types according to the execution layer encoding/decoding standards https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#structures Updates `JsonExecutionPayloadV1.base_fee_per_gas`, `JsonExecutionPayloadHeaderV1.base_fee_per_gas` and `TransitionConfigurationV1.terminal_total_difficulty` to encode/decode according to standards Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-07-15 07:31:21 +00:00
Pawan Dhananjay	28b0ff27ff	Ignored sync jobs 2 (#3317 ) ## Issue Addressed Duplicate of #3269. Making this since @divagant-martian opened the previous PR and she can't approve her own PR 😄 Co-authored-by: Diva M <divma@protonmail.com>	2022-07-15 07:31:20 +00:00
Akihito Nakano	98a9626ef5	Bump the MSRV to 1.62 and using `#[derive(Default)]` on enums (#3304 ) ## Issue Addressed N/A ## Proposed Changes Since Rust 1.62, we can use `#[derive(Default)]` on enums. ✨ https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.	2022-07-15 07:31:19 +00:00
Paul Hauner	1f54e10b7b	Do not interpret "latest valid hash" as identifying a valid hash (#3327 ) ## Issue Addressed NA ## Proposed Changes After some discussion in Discord with @mkalinin it was raised that it was not the intention of the engine API to have CLs validate the `latest_valid_hash` (LVH) and all ancestors. Whilst I believe the engine API is being updated such that the LVH must identify a valid hash or be set to some junk value, I'm not confident that we can rely upon the LVH as being valid (at least for now) due to the confusion surrounding it. Being able to validate blocks via the LVH is a relatively minor optimisation; if the LVH value ends up becoming our head we'll send an fcU and get the VALID status there. Falsely marking a block as valid has serious consequences and since it's a minor optimisation to use LVH I think that we don't take the risk. For clarity, we will still invalidate the descendants of the LVH, we just wont validate the ancestors. ## Additional Info NA	2022-07-13 23:07:49 +00:00
Paul Hauner	7a6e6928a3	Further remove EE redundancy (#3324 ) ## Issue Addressed Resolves #3176 ## Proposed Changes Continues from PRs by @divagant-martian to gradually remove EL redundancy (see #3284, #3257). This PR achieves: - Removes the `broadcast` and `first_success` methods. The functional impact is that every request to the EE will always be tried immediately, regardless of the cached `EngineState` (this resolves #3176). Previously we would check the engine state before issuing requests, this doesn't make sense in a single-EE world; there's only one EE so we might as well try it for every request. - Runs the upcheck/watchdog routine once per slot rather than thrice. When we had multiple EEs frequent polling was useful to try and detect when the primary EE had come back online and we could switch to it. That's not as relevant now. - Always creates logs in the `Engines::upcheck` function. Previously we would mute some logs since they could get really noisy when one EE was down but others were functioning fine. Now we only have one EE and are upcheck-ing it less, it makes sense to always produce logs. This PR purposefully does not achieve: - Updating all occurances of "engines" to "engine". I'm trying to keep the diff small and manageable. We can come back for this. ## Additional Info NA	2022-07-13 20:31:39 +00:00
Divma	6d42a09ff8	Merge Engines and Engine struct in one in the `execution_layer` crate (#3284 ) ## Issue Addressed Part of https://github.com/sigp/lighthouse/issues/3118, continuation of https://github.com/sigp/lighthouse/pull/3257 and https://github.com/sigp/lighthouse/pull/3283 ## Proposed Changes - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Remove unnecessary generics ## Additional Info There is more cleanup to do that will come in subsequent PRs	2022-07-11 01:44:41 +00:00
Michael Sproul	748475be1d	Ensure caches are built for block_rewards POST API (#3305 ) ## Issue Addressed Follow up to https://github.com/sigp/lighthouse/pull/3290 that fixes a caching bug ## Proposed Changes Build the committee cache for the new `POST /lighthouse/analysis/block_rewards` API. Due to an unusual quirk of the total active balance cache the API endpoint would sometimes fail after loading a state from disk which had a current epoch cache _but not_ a total active balance cache. This PR adds calls to build the caches immediately before they're required, and has been running smoothly with `blockdreamer` the last few days.	2022-07-04 02:56:15 +00:00
Akihito Nakano	1cc8a97d4e	Remove unused method in HandlerNetworkContext (#3299 ) ## Issue Addressed N/A ## Proposed Changes Removed unused method in `HandlerNetworkContext`.	2022-07-04 02:56:14 +00:00
Divma	1219da9a45	Simplify error handling after engines fallback removal (#3283 ) ## Issue Addressed Part of #3118, continuation of #3257 ## Proposed Changes - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function returns a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function returns a single error. - [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carries a single error. - [`EngineError`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) now does not need to carry an Id - [`process_multiple_payload_statuses`](`9c429d0764/beacon_node/execution_layer/src/payload_status.rs (L46-L50)`) now doesn't need to receive an iterator of statuses and weight in different errors ## Additional Info This is built on top of #3294	2022-07-04 02:56:13 +00:00
Michael Sproul	61ed5f0ec6	Optimize historic committee calculation for the HTTP API (#3272 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3270 ## Proposed Changes Optimize the calculation of historic beacon committees in the HTTP API. This is achieved by allowing committee caches to be constructed for historic epochs, and constructing these committee caches on the fly in the API. This is much faster than reconstructing the state at the requested epoch, which usually takes upwards of 20s, and sometimes minutes with SPRP=8192. The depth of the `randao_mixes` array allows us to look back 64K epochs/0.8 years from a single state, which is pretty awesome! We always use the `state_id` provided by the caller, but will return a nice 400 error if the epoch requested is out of range for the state requested, e.g. ```bash # Prater curl "http://localhost:5052/eth/v1/beacon/states/3170304/committees?epoch=33538" ``` ```json {"code":400,"message":"BAD_REQUEST: epoch out of bounds, try state at slot 1081344","stacktraces":[]} ``` Queries will be fastest when aligned to `slot % SPRP == 0`, so the hint suggests a slot that is 0 mod 8192.	2022-07-04 02:56:11 +00:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
realbigsean	a7da0677d5	Remove builder redundancy (#3294 ) ## Issue Addressed This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets ## Proposed Changes - Removes redundancy in "builders" (servers implementing the builder spec) - Renames `payload-builder` flag to `builder` - Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194) Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-01 01:15:19 +00:00
Divma	d40c76e667	Fix clippy lints for rust 1.62 (#3300 ) ## Issue Addressed Fixes some new clippy lints after the last rust release ### Lints fixed for the curious: - [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned) - [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) - [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value) - [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) - [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes) - [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)	2022-06-30 22:51:49 +00:00
realbigsean	f6ec44f0dd	Register validator api (#3194 ) ## Issue Addressed Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint ## Proposed Changes - Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN. - TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17 - BN only sends VC registration data to builders on demand, but VC registration data does update the BN's prepare proposer cache and send an updated fcU to a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback. Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense. - Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884 - I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo. - Should application mask appear in the api? Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-06-30 00:49:21 +00:00
Pawan Dhananjay	5de00b7ee8	Unify execution layer endpoints (#3214 ) ## Issue Addressed Resolves #3069 ## Proposed Changes Unify the `eth1-endpoints` and `execution-endpoints` flags in a backwards compatible way as described in https://github.com/sigp/lighthouse/issues/3069#issuecomment-1134219221 Users have 2 options: 1. Use multiple non auth execution endpoints for deposit processing pre-merge 2. Use a single jwt authenticated execution endpoint for both execution layer and deposit processing post merge Related https://github.com/sigp/lighthouse/issues/3118 To enable jwt authenticated deposit processing, this PR removes the calls to `net_version` as the `net` namespace is not exposed in the auth server in execution clients. Moving away from using `networkId` is a good step in my opinion as it doesn't provide us with any added guarantees over `chainId`. See https://github.com/ethereum/consensus-specs/issues/2163 and https://github.com/sigp/lighthouse/issues/2115 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-06-29 09:07:09 +00:00
Michael Sproul	53b2b500db	Extend block reward APIs (#3290 ) ## Proposed Changes Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them. Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.	2022-06-29 04:50:37 +00:00
Paul Hauner	45b2eb18bc	v2.3.2-rc.0 (#3289 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-28 03:03:30 +00:00
Pawan Dhananjay	7acfbd89ee	Recover from NonConsecutive eth1 errors (#3273 ) ## Issue Addressed Fixes #1864 and a bunch of other closed but unresolved issues. ## Proposed Changes Allows the deposit caching to recover from `NonConsecutive` deposit errors by resetting the last processed block to the last valid deposit's block number. Still not sure of the underlying cause of this error, but this should recover the cache so we don't need `--eth1-purge-cache` anymore 🎉 A huge thanks to @one-three-three-seven for reproducing the error and providing the data that helped testing out the fix 🙌 Still needs a few more tests.	2022-06-26 23:10:58 +00:00
Akihito Nakano	082ed35bdc	Test the pruning of excess peers using randomly generated input (#3248 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3092 ## Proposed Changes Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores. ## Additional Info I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969). Co-authored-by: Diva M <divma@protonmail.com>	2022-06-25 22:22:34 +00:00
Michael Sproul	d21f083777	Add more paths to HTTP API metrics (#3282 ) ## Proposed Changes Expand the set of paths tracked by the HTTP API metrics to include all paths hit by the validator client. These paths were only partially updated for Altair, so we were missing some of the sync committee and v2 APIs.	2022-06-23 05:19:21 +00:00
Paul Hauner	748658e32c	Add some debug logs for checkpoint sync (#3281 ) ## Issue Addressed NA ## Proposed Changes I used these logs when debugging a spurious failure with Infura and thought they might be nice to have around permanently. There's no changes to functionality in this PR, just some additional `debug!` logs. ## Additional Info NA	2022-06-23 05:19:20 +00:00
Divma	7af5742081	Deprecate step param in BlocksByRange RPC request (#3275 ) ## Issue Addressed Deprecates the step parameter in the blocks by range request ## Proposed Changes - Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before - Adds a new type to still handle coding and decoding of requests that use the parameter ## Additional Info I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward	2022-06-22 16:23:34 +00:00
Divma	2063c0fa0d	Initial work to remove engines fallback from the `execution_layer` crate (#3257 ) ## Issue Addressed Part of #3160 ## Proposed Changes Use only the first url given in the execution engine, if more than one is provided log it. This change only moves having multiple engines to one. The amount of code cleanup that can and should be done forward is not small and would interfere with ongoing PRs. I'm keeping the changes intentionally very very minimal. ## Additional Info Future works: - In [ `EngineError` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) the id is not needed since it now has no meaning. - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function can return a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function can return a single error. - After the redundancy is removed for the builders, we can probably make the [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carry a single error. - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Fix the associated configurations and cli params. Not sure if both are done in https://github.com/sigp/lighthouse/pull/3214 In general I think those changes can be done incrementally and in individual pull requests.	2022-06-22 14:27:16 +00:00
Michael Sproul	efebf712dd	Avoid cloning snapshots during sync (#3271 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2944 ## Proposed Changes Remove snapshots from the cache during sync rather than cloning them. This reduces unnecessary cloning and memory fragmentation during sync. ## Additional Info This PR relies on the fact that the `block_delay` cache is not populated for blocks from sync. Relying on block delay may have the side effect that a change in `block_delay` calculation could lead to: a) more clones, if block delays are added for syncing blocks or b) less clones, if blocks near the head are erroneously provided without a `block_delay`. Case (a) would be a regression to the current status quo, and (b) is low-risk given we know that the snapshot cache is current susceptible to misses (hence `tree-states`).	2022-06-20 23:20:29 +00:00
eklm	a9e158663b	Fix validator_monitor_prev_epoch_ metrics (#2911 ) ## Issue Addressed #2820 ## Proposed Changes The problem is that validator_monitor_prev_epoch metrics are updated only if there is EpochSummary present in summaries map for the previous epoch and it is not the case for the offline validator. Ensure that EpochSummary is inserted into summaries map also for the offline validators.	2022-06-20 04:06:30 +00:00
Pawan Dhananjay	f428719761	Do not penalize peers on execution layer offline errors (#3258 ) ## Issue Addressed Partly resolves https://github.com/sigp/lighthouse/issues/3032 ## Proposed Changes Extracts some of the functionality of #3094 into a separate PR as the original PR requires a bit more work. Do not unnecessarily penalize peers when we fail to validate received execution payloads because our execution layer is offline.	2022-06-19 23:13:40 +00:00
Paul Hauner	564d7da656	v2.3.1 (#3262 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-14 05:25:38 +00:00
Divma	3dd50bda11	Improve substream management (#3261 ) ## Issue Addressed Which issue # does this PR address? ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-06-10 06:58:50 +00:00
Paul Hauner	11d80a6a38	Optimise `per_epoch_processing` low-hanging-fruit (#3254 ) ## Issue Addressed NA ## Proposed Changes - Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage. - Cache the result of `integer_sqrt` rather than recomputing for each validator. - Cache `state.previous_epoch` rather than recomputing it for each validator. ### Benchmarks Benchmarks on a recent mainnet state using #3252 to get timing. #### Without this PR ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:21:02Z INFO lcli::skip_slots] Using mainnet spec [2022-06-09T08:21:02Z INFO lcli::skip_slots] Advancing 32 slots [2022-06-09T08:21:02Z INFO lcli::skip_slots] Doing 10 runs [2022-06-09T08:21:02Z INFO lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz" SSZ decoding /tmp/state-0x3cdc.ssz: 43ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 0: 245.718794ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 1: 245.364782ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 2: 255.866179ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 3: 243.838909ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 4: 250.431425ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 5: 248.68765ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 6: 262.051113ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 7: 264.293967ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 8: 293.202007ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 9: 264.552017ms ``` #### With this PR: ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 0: 73.898678ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 1: 75.536978ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 2: 75.176104ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 3: 76.460828ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 4: 75.904195ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 5: 75.53077ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 6: 74.745572ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 7: 75.823489ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 8: 74.892055ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 9: 76.333569ms ``` ## Additional Info NA	2022-06-10 04:29:28 +00:00
Divma	56b4cd88ca	minor libp2p upgrade (#3259 ) ## Issue Addressed Upgrades libp2p	2022-06-09 23:48:51 +00:00
Divma	cfd26d25e0	do not count sync batch attempts when peer is not at fault (#3245 ) ## Issue Addressed currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 ## Proposed Changes If a batch fails but the peer is not at fault, do not count the attempt Also removes some annoying logs ## Additional Info We still get a counter on ignored attempts.. just in case	2022-06-07 02:35:56 +00:00
Divma	58e223e429	update libp2p (#3233 ) ## Issue Addressed na ## Proposed Changes Updates libp2p to https://github.com/libp2p/rust-libp2p/pull/2662 ## Additional Info From comments on the relevant PRs listed, we should pay attention at peer management consistency, but I don't think anything weird will happen. This is running in prater tok and sin	2022-06-07 02:35:55 +00:00
Michael Sproul	54cf94ea59	Fix per-slot timer in presence of clock changes (#3243 ) ## Issue Addressed Fixes a timing issue that results in spurious fork choice notifier failures: ``` WARN Error signalling fork choice waiter slot: 3962270, error: ForkChoiceSignalOutOfOrder { current: Slot(3962271), latest: Slot(3962270) }, service: beacon ``` There’s a fork choice run that is scheduled to run at the start of every slot by the `timer`, which creates a 12s interval timer when the beacon node starts up. The problem is that if there’s a bit of clock drift that gets corrected via NTP (or a leap second for that matter) then these 12s intervals will cease to line up with the start of the slot. This then creates the mismatch in slot number that we see above. Lighthouse also runs fork choice 500ms before the slot begins, and these runs are what is conflicting with the start-of-slot runs. This means that the warning in current versions of Lighthouse is mostly cosmetic because fork choice is up to date with all but the most recent 500ms of attestations (which usually isn’t many). ## Proposed Changes Fix the per-slot timer so that it continually re-calculates the duration to the start of the next slot and waits for that. A side-effect of this change is that we may skip slots if the per-slot task takes >12s to run, but I think this is an unlikely scenario and an acceptable compromise.	2022-06-06 23:52:32 +00:00
Divma	493c2c037c	reduce reprocess queue/channel sizes (#3239 ) ## Issue Addressed Reduces the effect of late blocks on overall node buildup ## Proposed Changes change the capacity of the channels used to send work for reprocessing in the beacon processor, and to send back to the main processor task, to be 75% of the capacity of the channel for receiving new events ## Additional Info The issues we've seen suggest we should still evaluate node performance under stress, with late blocks being a big factor. Other changes that could help: 1. right now we have a cap for queued attestations for reprocessing that applies to the sum of aggregated and unaggregated attestations. We could consider adding a separate cap that favors aggregated ones. 2. solving #2848	2022-06-06 23:52:31 +00:00
Akihito Nakano	a6d2ed6119	Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236 ) ## Issue Addressed This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`. I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, `outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`. - [outbound only](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018)`) - [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047)`) As a result, PeerManager doesn't remove "outbound" peers which should be pruned. Note: [`subnet_to_peer`](`e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)`) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.	2022-06-06 05:51:10 +00:00
Michael Sproul	47d57a290b	Improve eth1 block cache sync (for Ropsten) (#3234 ) ## Issue Addressed Fix for the eth1 cache sync issue observed on Ropsten. ## Proposed Changes Ropsten blocks are so infrequent that they broke our algorithm for downloading eth1 blocks. We currently try to download forwards from the last block in our cache to the block with block number [`remote_highest_block - FOLLOW_DISTANCE + FOLLOW_DISTANCE / ETH1_BLOCK_TIME_TOLERANCE_FACTOR`](`6f732986f1/beacon_node/eth1/src/service.rs (L489-L492)`). With the tolerance set to 4 this is insufficient because we lag by 1536 blocks, which is more like ~14 hours on Ropsten. This results in us having an incomplete eth1 cache, because we should cache all blocks between -16h and -8h. Even if we were to set the tolerance to 2 for the largest allowance, we would only look back 1024 blocks which is still more than 8 hours. For example consider this block https://ropsten.etherscan.io/block/12321390. The block from 1536 blocks earlier is 14 hours and 20 minutes before it: https://ropsten.etherscan.io/block/12319854. The block from 1024 blocks earlier is https://ropsten.etherscan.io/block/12320366, 8 hours and 48 minutes before. - This PR introduces a new CLI flag called `--eth1-cache-follow-distance` which can be used to set the distance manually. - A new dynamic catchup mechanism is added which detects when the cache is lagging the true eth1 chain and tries to download more blocks within the follow distance in order to catch up.	2022-06-03 06:05:03 +00:00
Mac L	55ac423872	Emit log when fee recipient values are inconsistent (#3202 ) ## Issue Addressed #3156 ## Proposed Changes Emit a `WARN` log whenever the value of `fee_recipient` as returned from the EE is different from the value of `suggested_fee_recipient` as set on the BN, for example by the `--suggested-fee-recipient` CLI flag. ## Additional Info I have set the log level to `WARN` since it is legal behaviour (meaning it isn't really an error but is important to know when it is occurring). If we feel like this behaviour is almost always undesired (caused by a misconfiguration or malicious EE) then an `ERRO` log would be more appropriate. Happy to change it in that case.	2022-06-03 03:22:54 +00:00
Paul Hauner	16e49af8e1	Use genesis slot for node/syncing (#3226 ) ## Issue Addressed NA ## Proposed Changes Resolves this error log emitted from the VC prior to genesis: ``` WARN Unable connect to beacon node error: ServerMessage(ErrorMessage { code: 500, message: "UNHANDLED_ERROR: UnableToReadSlot", stacktraces: [] }) ``` ## Additional Info NA	2022-05-31 06:09:11 +00:00
Michael Sproul	98c8ac1a87	Fix typo in peer state transition log (#3224 ) ## Issue Addressed We were logging `out_finalized_epoch` instead of `our_finalized_epoch`. I noticed this ages ago but only just got around to fixing it. ## Additional Info I also reformatted the log line to respect the line length limit (`rustfmt` won't do it because it gets confused by the `;` in slog's log macros).	2022-05-31 06:09:10 +00:00
Paul Hauner	6f732986f1	v2.3.0 (#3222 ) ## Issue Addressed NA ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info - Pending testing on our infra. Please do not merge	2022-05-30 01:35:10 +00:00
Divma	a7896a58cc	move backfill sync jobs from highest priority to lowest (#3215 ) ## Issue Addressed #3212 ## Proposed Changes Move chain segments coming from back-fill syncing from highest priority to lowest ## Additional Info If this does not solve the issue, next steps would be lowering the batch size for back-fill sync, and as last resort throttling the processing of these chain segments	2022-05-26 02:05:17 +00:00
Paul Hauner	f4aa17ef85	v2.3.0-rc.0 (#3218 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-05-25 05:29:26 +00:00
Michael Sproul	229f883968	Avoid parallel fork choice runs during sync (#3217 ) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ 😳. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).	2022-05-25 03:27:30 +00:00
Paul Hauner	7a64994283	Call per_slot_task from a blocking thread (v2) (#3199 ) This PR was adapted from @pawanjay176's work in #3197. ## Issue Addressed Fixes a regression in https://github.com/sigp/lighthouse/pull/3168 ## Proposed Changes https://github.com/sigp/lighthouse/pull/3168 added calls to `fork_choice` in `BeaconChain::per_slot_task` function. This leads to a panic as `per_slot_task` is called from an async context which calls fork choice, which then calls `block_on`. This PR changes the timer to call the `per_slot_task` function in a blocking thread. Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>	2022-05-20 23:05:07 +00:00
Michael Sproul	6eaeaa542f	Fix Rust 1.61 clippy lints (#3192 ) ## Issue Addressed This fixes the low-hanging Clippy lints introduced in Rust 1.61 (due any hour now). It _ignores_ one lint, because fixing it requires a structural refactor of the validator client that needs to be done delicately. I've started on that refactor and will create another PR that can be reviewed in more depth in the coming days. I think we should merge this PR in the meantime to unblock CI.	2022-05-20 05:02:13 +00:00
Michael Sproul	8fa032c8ae	Run fork choice before block proposal (#3168 ) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.	2022-05-20 05:02:11 +00:00
realbigsean	54b58fdc01	Log out response status when we hit `PayloadIdUnavailable` (#3190 ) ## Issue Addressed @z3n-chada is currently getting a `PayloadIdUnavailable` error when connecting lighthouse to Erigon and it's difficult to discern why so this just logs out the response status from the EE when we hit an `PayloadIdUnavailable` error Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-05-19 06:00:48 +00:00
Akihito Nakano	695f415590	Tiny improvement: PeerManager and maximum discovery query (#3182 ) ## Issue Addressed As [`Discovery` bounds the maximum discovery query](`e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)`), `PeerManager` no need to handle it. `e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)`	2022-05-19 06:00:46 +00:00
Mac L	def9bc660e	Remove DB migrations for legacy database schemas (#3181 ) ## Proposed Changes Remove support for DB migrations that support upgrading from schema's below version 5. This is mostly for cosmetic/code quality reasons as in most circumstances upgrading from versions of Lighthouse this old will almost always require a re-sync. ## Additional Info The minimum supported database schema is now version 5.	2022-05-17 04:54:39 +00:00
Paul Hauner	db8a6f81ea	Prevent attestation to future blocks from early attester cache (#3183 ) ## Issue Addressed N/A ## Proposed Changes Prevents the early attester cache from producing attestations to future blocks. This bug could result in a missed head vote if the BN was requested to produce an attestation for an earlier slot than the head block during the (usually) short window of time between verifying a block and setting it as the head. This bug was noticed in an [Antithesis](https://andreagrieser.com/) test and diagnosed by @realbigsean. ## Additional Info NA	2022-05-17 01:51:25 +00:00
Paul Hauner	38050fa460	Allow `TaskExecutor` to be used in `async` tests (#3178 ) # Description Since the `TaskExecutor` currently requires a `Weak<Runtime>`, it's impossible to use it in an async test where the `Runtime` is created outside our scope. Whilst we could create a new `Runtime` instance inside the async test, dropping that `Runtime` would cause a panic (you can't drop a `Runtime` in an async context). To address this issue, this PR creates the `enum Handle`, which supports either: - A `Weak<Runtime>` (for use in our production code) - A `Handle` to a runtime (for use in testing) In theory, there should be no change to the behaviour of our production code (beyond some slightly different descriptions in HTTP 500 errors), or even our tests. If there is no change, you might ask "why bother?". There are two PRs (#3070 and #3175) that are waiting on these fixes to introduce some new tests. Since we've added the EL to the `BeaconChain` (for the merge), we are now doing more async stuff in tests. I've also added a `RuntimeExecutor` to the `BeaconChainTestHarness`. Whilst that's not immediately useful, it will become useful in the near future with all the new async testing.	2022-05-16 08:35:59 +00:00
François Garillot	3f9e83e840	[refactor] Refactor Option/Result combinators (#3180 ) Code simplifications using `Option`/`Result` combinators to make pattern-matches a tad simpler. Opinions on these loosely held, happy to adjust in review. Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).	2022-05-16 01:59:47 +00:00
Michael Sproul	bcdd960ab1	Separate execution payloads in the DB (#3157 ) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. ⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-05-12 00:42:17 +00:00
Michael Sproul	ae47a93c42	Don't panic in forkchoiceUpdated handler (#3165 ) ## Issue Addressed Fix a panic due to misuse of the Tokio executor when processing a forkchoiceUpdated response. We were previously calling `process_invalid_execution_payload` from the async function `update_execution_engine_forkchoice_async`, which resulted in a panic because `process_invalid_execution_payload` contains a call to fork choice, which ultimately calls `block_on`. An example backtrace can be found here: https://gist.github.com/michaelsproul/ac5da03e203d6ffac672423eaf52fb20 ## Proposed Changes Wrap the call to `process_invalid_execution_payload` in a `spawn_blocking` so that `block_on` is no longer called from an async context. ## Additional Info - I've been thinking about how to catch bugs like this with static analysis (a new Clippy lint). - The payload validation tests have been re-worked to support distinct responses from the mock EE for newPayload and forkchoiceUpdated. Three new tests have been added covering the `Invalid`, `InvalidBlockHash` and `InvalidTerminalBlock` cases. - I think we need a bunch more tests of different legal and illegal variations	2022-05-04 23:30:34 +00:00
Pawan Dhananjay	db0beb5178	Poll shutdown timeout in rpc handler (#3153 ) ## Issue Addressed N/A ## Proposed Changes Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.	2022-04-13 03:54:44 +00:00
Divma	580d2f7873	log upgrades + prevent dialing of disconnecting peers (#3148 ) ## Issue Addressed We still ping peers that are considered in a disconnecting state ## Proposed Changes Do not ping peers once we decide they are disconnecting Upgrade logs about ignored rpc messages ## Additional Info --	2022-04-13 03:54:43 +00:00
Paul Hauner	b49b4291a3	Disallow attesting to optimistic head (#3140 ) ## Issue Addressed NA ## Proposed Changes Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end. I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid). ## Additional Info - ~~Blocked on #3126~~	2022-04-13 03:54:42 +00:00
Divma	7366266bd1	keep failed finalized chains to avoid retries (#3142 ) ## Issue Addressed In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2022-04-13 01:10:55 +00:00
Michael Sproul	aa72088f8f	v2.2.1 (#3149 ) ## Issue Addressed Addresses sync stalls on v2.2.0 (i.e. https://github.com/sigp/lighthouse/issues/3147). ## Additional Info I've avoided doing a full `cargo update` because I noticed there's a new patch version of libp2p and thought it could do with some more testing. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-04-12 02:52:12 +00:00

... 2 3 4 5 6 ...

2183 Commits