lighthouse

Author	SHA1	Message	Date
Paul Hauner	2cd3e3a768	Avoid duplicate committee cache loads (#3574 ) ## Issue Addressed NA ## Proposed Changes I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM. This PR modifies the shuffling cache so that each entry can be either: - A committee - A promise for a committee (i.e., a `crossbeam_channel::Receiver`) Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following: 1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock. 1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise. 1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation. 1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation. In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated. ## Additional Info NA	2022-09-16 08:54:03 +00:00
Paul Hauner	7d3948c8fe	Add metric for re-org distance (#3566 ) ## Issue Addressed NA ## Proposed Changes Add a metric to track the re-org distance. ## Additional Info NA	2022-09-13 17:19:27 +00:00
tim gretler	98815516a1	Support histogram buckets (#3391 ) ## Issue Addressed #3285 ## Proposed Changes Adds support for specifying histogram with buckets and adds new metric buckets for metrics mentioned in issue. ## Additional Info Need some help for the buckets. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-09-13 01:57:44 +00:00
Nils Effinghausen	f682df51a1	fix description for BALANCES_CACHE_MISSES metric (#3545 ) ## Issue Addressed fixes metric description Co-authored-by: Nils Effinghausen <nils.effinghausen@t-systems.com>	2022-09-10 01:35:10 +00:00
realbigsean	d1a8d6cf91	Pin mev rs deps (#3557 ) ## Issue Addressed We were unable to update lighthouse by running `cargo update` because some of the `mev-build-rs` deps weren't pinned. But `mev-build-rs` is now pinned here and includes it's own pinned commits for `ssz-rs` and `etheruem-consensus` Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-09-08 23:46:03 +00:00
Michael Sproul	9a7f7f1c1e	Configurable monitoring endpoint frequency (#3530 ) ## Issue Addressed Closes #3514 ## Proposed Changes - Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit. - Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.	2022-09-05 08:29:00 +00:00
realbigsean	177aef8f1e	Builder profit threshold flag (#3534 ) ## Issue Addressed Resolves https://github.com/sigp/lighthouse/issues/3517 ## Proposed Changes Adds a `--builder-profit-threshold <wei value>` flag to the BN. If an external payload's value field is less than this value, the local payload will be used. The value of the local payload will not be checked (it can't really be checked until the engine API is updated to support this). Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-09-05 04:50:49 +00:00
realbigsean	cae40731a2	Strict count unrealized (#3522 ) ## Issue Addressed Add a flag that can increase count unrealized strictness, defaults to false ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: sean <seananderson33@gmail.com>	2022-09-05 04:50:47 +00:00
Mac L	80359d8ddb	Fix attestation performance API `InvalidValidatorIndex` error (#3503 ) ## Issue Addressed When requesting an index which is not active during `start_epoch`, Lighthouse returns: ``` curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000" ``` ```json { "code": 500, "message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))", "stacktraces": [] } ``` This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing. ## Proposed Changes In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`: ``` -> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000" ``` ```json [ { "index": 999999999, "epochs": { "100000": { "active": false, "head": false, "target": false, "source": false } } } ] ``` By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`. ## Additional Info Note that this error only occurs for epochs after the Altair hard fork.	2022-09-05 04:50:45 +00:00
Divma	473abc14ca	Subscribe to subnets only when needed (#3419 ) ## Issue Addressed We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately. ## Proposed Changes - Schedule subscriptions to subnets for future slots. - Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this. - Subscriptions for past slots are rejected, before we would subscribe for one slot. - Add a new test for subscriptions that are not consecutive. ## Additional Info This is also an effort in making the code easier to understand	2022-09-05 00:22:48 +00:00
Paul Hauner	aa022f4685	v3.1.0 (#3525 ) ## Issue Addressed NA ## Proposed Changes - Bump versions ## Additional Info - ~~Blocked on #3508~~ - ~~Blocked on #3526~~ - ~~Requires additional testing.~~ - Expected release date is 2022-09-01	2022-08-31 22:21:55 +00:00
Paul Hauner	661307dce1	Separate committee subscriptions queue (#3508 ) ## Issue Addressed NA ## Proposed Changes As we've seen on Prater, there seems to be a correlation between these messages ``` WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service ``` ... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators. After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first. ~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~ ## Additional Info - Currently being tested on Prater	2022-08-30 05:47:31 +00:00
Michael Sproul	7a50684741	Harden slot notifier against clock drift (#3519 ) ## Issue Addressed Partly resolves #3518 ## Proposed Changes Change the slot notifier to use `duration_to_next_slot` rather than an interval timer. This makes it robust against underlying clock changes.	2022-08-29 14:34:43 +00:00
Paul Hauner	1a833ecc17	Add more logging for invalid payloads (#3515 ) ## Issue Addressed NA ## Proposed Changes Adds more `debug` logging to help troubleshoot invalid execution payload blocks. I was doing some of this recently and found it to be challenging. With this PR we should be able to grep `Invalid execution payload` and get one-liners that will show the block, slot and details about the proposer. I also changed the log in `process_invalid_execution_payload` since it was a little misleading; the `block_root` wasn't necessary the block which had an invalid payload. ## Additional Info NA	2022-08-29 14:34:42 +00:00
Paul Hauner	8609cced0e	Reset payload statuses when resuming fork choice (#3498 ) ## Issue Addressed NA ## Proposed Changes This PR is motivated by a recent consensus failure in Geth where it returned `INVALID` for an `VALID` block. Without this PR, the only way to recover is by re-syncing Lighthouse. Whilst ELs "shouldn't have consensus failures", in reality it's something that we can expect from time to time due to the complex nature of Ethereum. Being able to recover easily will help the network recover and EL devs to troubleshoot. The risk introduced with this PR is that genuinely INVALID payloads get a "second chance" at being imported. I believe the DoS risk here is negligible since LH needs to be restarted in order to re-process the payload. Furthermore, there's no reason to think that a well-performing EL will accept a truly invalid payload the second-time-around. ## Additional Info This implementation has the following intricacies: 1. Instead of just resetting invalid payloads to optimistic, we'll also reset valid payloads. This is an artifact of our existing implementation. 1. We will only reset payload statuses when we detect an invalid payload present in `proto_array` - This helps save us from forgetting that all our blocks are valid in the "best case scenario" where there are no invalid blocks. 1. If we fail to revert the payload statuses we'll log a `CRIT` and just continue with a `proto_array` that does not have reverted payload statuses. - The code to revert statuses needs to deal with balances and proposer-boost, so it's a failure point. This is a defensive measure to avoid introducing new show-stopping bugs to LH.	2022-08-29 14:34:41 +00:00
Michael Sproul	66eca1a882	Refactor op pool for speed and correctness (#3312 ) ## Proposed Changes This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12). Attestation packing is sped up by removing several inefficiencies: - No more recalculation of `attesting_indices` during packing. - No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`. - No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra). - No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release). So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms. Verification of attester slashings, proposer slashings and voluntary exits is fixed by: - Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message. - Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot. This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f. ## Additional Info The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s. This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency. Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.	2022-08-29 09:10:26 +00:00
realbigsean	cb132c622d	don't register exited or slashed validators with the builder api (#3473 ) ## Issue Addressed #3465 ## Proposed Changes Filter out any validator registrations for validators that are not `active` or `pending`. I'm adding this filtering the beacon node because all the information is readily available there. In other parts of the VC we are usually sending per-validator requests based on duties from the BN. And duties will only be provided for active validators so we don't have this type of filtering elsewhere in the VC. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-24 23:34:58 +00:00
Divma	8c69d57c2c	Pause sync when EE is offline (#3428 ) ## Issue Addressed #3032 ## Proposed Changes Pause sync when ee is offline. Changes include three main parts: - Online/offline notification system - Pause sync - Resume sync #### Online/offline notification system - The engine state is now guarded behind a new struct `State` that ensures every change is correctly notified. Notifications are only sent if the state changes. The new `State` is behind a `RwLock` (as before) as the synchronization mechanism. - The actual notification channel is a [tokio::sync::watch](https://docs.rs/tokio/latest/tokio/sync/watch/index.html) which ensures only the last value is in the receiver channel. This way we don't need to worry about message order etc. - Sync waits for state changes concurrently with normal messages. #### Pause Sync Sync has four components, pausing is done differently in each: - Block lookups: Disabled while in this state. We drop current requests and don't search for new blocks. Block lookups are infrequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it. - Parent lookups: Disabled while in this state. We drop current requests and don't search for new parents. Parent lookups are even less frequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it. - Range: Chains don't send batches for processing to the beacon processor. This is easily done by guarding the channel to the beacon processor and giving it access only if the ee is responsive. I find this the simplest and most powerful approach since we don't need to deal with new sync states and chain segments that are added while the ee is offline will follow the same logic without needing to synchronize a shared state among those. Another advantage of passive pause vs active pause is that we can still keep track of active advertised chain segments so that on resume we don't need to re-evaluate all our peers. - Backfill: Not affected by ee states, we don't pause. #### Resume Sync - Block lookups: Enabled again. - Parent lookups: Enabled again. - Range: Active resume. Since the only real pause range does is not sending batches for processing, resume makes all chains that are holding read-for-processing batches send them. - Backfill: Not affected by ee states, no need to resume. ## Additional Info QUESTION: Originally I made this to notify and change on synced state, but @pawanjay176 on talks with @paulhauner concluded we only need to check online/offline states. The upcheck function mentions extra checks to have a very up to date sync status to aid the networking stack. However, the only need the networking stack would have is this one. I added a TODO to review if the extra check can be removed Next gen of #3094 Will work best with #3439 Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>	2022-08-24 23:34:56 +00:00
Michael Sproul	aab4a8d2f2	Update docs for mainnet merge release (#3494 ) ## Proposed Changes Update the merge migration docs to encourage updating mainnet configs _now_! The docs are also updated to recommend _against_ `--suggested-fee-recipient` on the beacon node (https://github.com/sigp/lighthouse/issues/3432). Additionally the `--help` for the CLI is updated to match with a few small semantic changes: - `--execution-jwt` is no longer allowed without `--execution-endpoint`. We've ended up without a default for `--execution-endpoint`, so I think that's fine. - The flags related to the JWT are only allowed if `--execution-jwt` is provided.	2022-08-23 03:50:58 +00:00
Paul Hauner	18c61a5e8b	v3.0.0 (#3464 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v3.0.0 ## Additional Info - ~~Blocked on #3439~~ - ~~Blocked on #3459~~ - ~~Blocked on #3463~~ - ~~Blocked on #3462~~ - ~~Requires further testing~~ Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2022-08-22 03:43:08 +00:00
Paul Hauner	931153885c	Run per-slot fork choice at a further distance from the head (#3487 ) ## Issue Addressed NA ## Proposed Changes Run fork choice when the head is 256 slots from the wall-clock slot, rather than 4. The reason we don't always run FC is so that it doesn't slow us down during sync. As the comments state, setting the value to 256 means that we'd only have one interrupting fork-choice call if we were syncing at 20 slots/sec. ## Additional Info NA	2022-08-19 04:27:24 +00:00
Paul Hauner	df358b864d	Add metrics for EE `PayloadStatus` returns (#3486 ) ## Issue Addressed NA ## Proposed Changes Adds some metrics so we can track payload status responses from the EE. I think this will be useful for troubleshooting and alerting. I also bumped the `BecaonChain::per_slot_task` to `debug` since it doesn't seem too noisy and would have helped us with some things we were debugging in the past. ## Additional Info NA	2022-08-19 04:27:23 +00:00
Paul Hauner	043fa2153e	Revise EE peer penalites (#3485 ) ## Issue Addressed NA ## Proposed Changes Don't penalize peers for errors that might be caused by an honest optimistic node. ## Additional Info NA	2022-08-19 04:27:22 +00:00
Michael Sproul	8255c8682e	Align engine API timeouts with spec (#3470 ) ## Proposed Changes Match the timeouts from the `execution-apis` spec. Our existing values were already quite close so I don't imagine this change to be very disruptive. The spec sets the timeout for `engine_getPayloadV1` to only 1 second, but we were already using a longer value of 2 seconds. I've kept the 2 second timeout as I don't think there's any need to fail faster when producing a payload. There's no timeout specified for `eth_syncing` so I've matched it to the shortest timeout from the spec (1 second). I think the previous value of 250ms was likely too low and could have been contributing to spurious timeouts, particularly for remote ELs. ## Additional Info The timeouts are defined on each endpoint in this document: https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md	2022-08-17 02:36:39 +00:00
Michael Sproul	e5fc9f26bc	Log if no execution endpoint is configured (#3467 ) ## Issue Addressed Fixes an issue whereby syncing a post-merge network without an execution endpoint would silently stall. Sync swallows the errors from block verification so previously there was no indication in the logs for why the node couldn't sync. ## Proposed Changes Add an error log to the merge-readiness notifier for the case where the merge has already completed but no execution endpoint is configured.	2022-08-15 01:31:02 +00:00
Michael Sproul	25e3dc9300	Fix block verification and checkpoint sync caches (#3466 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2962 ## Proposed Changes Build all caches on the checkpoint state before storing it in the database. Additionally, fix a bug in `signature_verify_chain_segment` which prevented block verification from succeeding unless the previous epoch cache was already built. The previous epoch cache is required to verify the signatures of attestations included from previous epochs, even when all the blocks in the segment are from the same epoch. The comments around `signature_verify_chain_segment` have also been updated to reflect the fact that it should only be used on a chain of blocks from a single epoch. I believe this restriction had already been added at some point in the past and that the current comments were just outdated (and I think because the proposer shuffling can change in the next epoch based on the blocks applied in the current epoch that this limitation is essential).	2022-08-15 01:31:00 +00:00
Paul Hauner	f03f9ba680	Increase merge-readiness lookhead (#3463 ) ## Issue Addressed NA ## Proposed Changes Start issuing merge-readiness logs 2 weeks before the Bellatrix fork epoch. Additionally, if the Bellatrix epoch is specified and the use has configured an EL, always log merge readiness logs, this should benefit pro-active users. ### Lookahead Reasoning - Bellatrix fork is: - epoch 144896 - slot 4636672 - Unix timestamp: `1606824023 + (4636672 * 12) = 1662464087` - GMT: Tue Sep 06 2022 11:34:47 GMT+0000 - Warning start time is: - Unix timestamp: `1662464087 - 604800 * 2 = 1661254487` - GMT: Tue Aug 23 2022 11:34:47 GMT+0000 The [current expectation](https://discord.com/channels/595666850260713488/745077610685661265/1007445305198911569) is that EL and CL clients will releases out by Aug 22nd at the latest, then an EF announcement will go out on the 23rd. If all goes well, LH will start alerting users about merge-readiness just after the announcement. ## Additional Info NA	2022-08-15 01:30:59 +00:00
Michael Sproul	92d597ad23	Modularise slasher backend (#3443 ) ## Proposed Changes Enable multiple database backends for the slasher, either MDBX (default) or LMDB. The backend can be selected using `--slasher-backend={lmdb,mdbx}`. ## Additional Info In order to abstract over the two library's different handling of database lifetimes I've used `Box::leak` to give the `Environment` type a `'static` lifetime. This was the only way I could think of using 100% safe code to construct a self-referential struct `SlasherDB`, where the `OpenDatabases` refers to the `Environment`. I think this is OK, as the `Environment` is expected to live for the life of the program, and both database engines leave the database in a consistent state after each write. The memory claimed for memory-mapping will be freed by the OS and appropriately flushed regardless of whether the `Environment` is actually dropped. We are depending on two `sigp` forks of `libmdbx-rs` and `lmdb-rs`, to give us greater control over MDBX OS support and LMDB's version.	2022-08-15 01:30:56 +00:00
Pawan Dhananjay	71fd0b42f2	Fix lints for Rust 1.63 (#3459 ) ## Issue Addressed N/A ## Proposed Changes Fix clippy lints for latest rust version 1.63. I have allowed the [derive_partial_eq_without_eq](https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq) lint as satisfying this lint would result in more code that we might not want and I feel it's not required. Happy to fix this lint across lighthouse if required though.	2022-08-12 00:56:39 +00:00
Divma	f4ffa9e0b4	Handle processing results of non faulty batches (#3439 ) ## Issue Addressed Solves #3390 So after checking some logs @pawanjay176 got, we conclude that this happened because we blacklisted a chain after trying it "too much". Now here, in all occurrences it seems that "too much" means we got too many download failures. This happened very slowly, exactly because the batch is allowed to stay alive for very long times after not counting penalties when the ee is offline. The error here then was not that the batch failed because of offline ee errors, but that we blacklisted a chain because of download errors, which we can't pin on the chain but on the peer. This PR fixes that. ## Proposed Changes Adds a missing piece of logic so that if a chain fails for errors that can't be attributed to an objectively bad behavior from the peer, it is not blacklisted. The issue at hand occurred when new peers arrived claiming a head that had wrongfully blacklisted, even if the original peers participating in the chain were not penalized. Another notable change is that we need to consider a batch invalid if it processed correctly but its next non empty batch fails processing. Now since a batch can fail processing in non empty ways, there is no need to mark as invalid previous batches. Improves some logging as well. ## Additional Info We should do this regardless of pausing sync on ee offline/unsynced state. This is because I think it's almost impossible to ensure a processing result will reach in a predictable order with a synced notification from the ee. Doing this handles what I think are inevitable data races when we actually pause sync This also fixes a return that reports which batch failed and caused us some confusion checking the logs	2022-08-12 00:56:38 +00:00
Paul Hauner	4fc0cb121c	Remove some "wontfix" TODOs for the merge (#3449 ) ## Issue Addressed NA ## Proposed Changes Removes three types of TODOs: 1. `execution_layer/src/lib.rs`: It was [determined](https://github.com/ethereum/consensus-specs/issues/2636#issuecomment-988688742) that there is no action required here. 2. `beacon_processor/worker/gossip_methods.rs`: Removed TODOs relating to peer scoring that have already been addressed via `epe.penalize_peer()`. - It seems `cargo fmt` wanted to adjust some things here as well 🤷 3. `proto_array_fork_choice.rs`: it would be nice to remove that useless `bool` for cleanliness, but I don't think it's something we need to do and the TODO just makes things look messier IMO. ## Additional Info There should be no functional changes to the code in this PR. There are still some TODOs lingering, those ones require actual changes or more thought.	2022-08-10 13:06:46 +00:00
Michael Sproul	4e05f19fb5	Serve Bellatrix preset in BN API (#3425 ) ## Issue Addressed Resolves #3388 Resolves #2638 ## Proposed Changes - Return the `BellatrixPreset` on `/eth/v1/config/spec` by default. - Allow users to opt out of this by providing `--http-spec-fork=altair` (unless there's a Bellatrix fork epoch set). - Add the Altair constants from #2638 and make serving the constants non-optional (the `http-disable-legacy-spec` flag is deprecated). - Modify the VC to only read the `Config` and not to log extra fields. This prevents it from having to muck around parsing the `ConfigAndPreset` fields it doesn't need. ## Additional Info This change is backwards-compatible for the VC and the BN, but is marked as a breaking change for the removal of `--http-disable-legacy-spec`. I tried making `Config` a `superstruct` too, but getting the automatic decoding to work was a huge pain and was going to require a lot of hacks, so I gave up in favour of keeping the default-based approach we have now.	2022-08-10 07:52:59 +00:00
Pawan Dhananjay	c25934956b	Remove INVALID_TERMINAL_BLOCK (#3385 ) ## Issue Addressed Resolves #3379 ## Proposed Changes Remove instances of `InvalidTerminalBlock` in lighthouse and use `Invalid {latest_valid_hash: "0x0000000000000000000000000000000000000000000000000000000000000000"}` to represent that status.	2022-08-10 07:52:58 +00:00
Paul Hauner	2de26b20f8	Don't return errors on HTTP API for already-known messages (#3341 ) ## Issue Addressed - Resolves #3266 ## Proposed Changes Return 200 OK rather than an error when a block, attestation or sync message is already known. Presently, we will log return an error which causes a BN to go "offline" from the VCs perspective which causes the fallback mechanism to do work to try and avoid and upcheck offline nodes. This can be observed as instability in the `vc_beacon_nodes_available_count` metric. The current behaviour also causes scary logs for the user. There's nothing to actually be concerned about when we see duplicate messages, this can happen on fallback systems (see code comments). ## Additional Info NA	2022-08-10 07:52:57 +00:00
realbigsean	6f13727fbe	Don't use the builder network if the head is optimistic (#3412 ) ## Issue Addressed Resolves https://github.com/sigp/lighthouse/issues/3394 Adds a check in `is_healthy` about whether the head is optimistic when choosing whether to use the builder network. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-09 06:05:16 +00:00
Paul Hauner	a688621919	Add support for beaconAPI in `lcli` functions (#3252 ) ## Issue Addressed NA ## Proposed Changes Modifies `lcli skip-slots` and `lcli transition-blocks` allow them to source blocks/states from a beaconAPI and also gives them some more features to assist with benchmarking. ## Additional Info Breaks the current `lcli skip-slots` and `lcli transition-blocks` APIs by changing some flag names. It should be simple enough to figure out the changes via `--help`. Currently blocked on #3263.	2022-08-09 06:05:13 +00:00
Michael Sproul	6bc4a2cc91	Update invalid head tests (#3400 ) ## Proposed Changes Update the invalid head tests so that they work with the current default fork choice configuration. Thanks @realbigsean for fixing the persistence test and the EF tests. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-05 23:41:09 +00:00
Mac L	5d317779bb	Ensure `validator/blinded_blocks/{slot}` endpoint conforms to spec (#3429 ) ## Issue Addressed #3418 ## Proposed Changes - Remove `eth/v2/validator/blinded_blocks/{slot}` as this endpoint does not exist in the spec. - Return `version` in the `eth/v1/validator/blinded_blocks/{slot}` endpoint. ## Additional Info Since this removes the `v2` endpoint, this is technically a breaking change, but as this does not exist in the spec users may or may not be relying on this. Depending on what we feel is appropriate, I'm happy to edit this so we keep the `v2` endpoint for now but simply bring the `v1` endpoint in line with `v2`.	2022-08-05 06:46:58 +00:00
realbigsean	43ce0de73f	Downgrade log for 204 from builder (#3411 ) ## Issue Addressed A 204 from the connected builder just indicates there's no payload available from the builder, not that there's an issue. So I don't actually think this should be a warn. During the merge transition when we are pre-finalization a 204 will actually be expected. And maybe even longer if the relay chooses to delay providing payloads for a longer period post-merge. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-03 17:13:15 +00:00
Michael Sproul	df51a73272	Release v2.5.1 (#3406 ) ## Issue Addressed Patch release to address fork choice issues in the presence of clock drift: https://github.com/sigp/lighthouse/pull/3402	2022-08-03 04:23:09 +00:00
Mac L	e24552d61a	Restore backwards compatibility when using older BNs (#3410 ) ## Issue Addressed https://github.com/status-im/nimbus-eth2/issues/3930 ## Proposed Changes We can trivially support beacon nodes which do not provide the `is_optimistic` field by wrapping the field in an `Option`.	2022-08-02 23:20:51 +00:00
Paul Hauner	d0beecca20	Make fork choice prune again (#3408 ) ## Issue Addressed NA ## Proposed Changes There was a regression in #3244 (released in v2.4.0) which stopped pruning fork choice (see [here](https://github.com/sigp/lighthouse/pull/3244#discussion_r935187485)). This would form a very slow memory leak, using ~100mb per month. The release has been out for ~11 days, so users should not be seeing a dangerous increase in memory, yet. Credits to @michaelsproul for noticing this 🎉 ## Additional Info NA	2022-08-02 07:58:42 +00:00
Justin Traglia	807bc8b0b3	Fix a few typos in option help strings (#3401 ) ## Proposed Changes Fixes a typo I noticed while looking at options.	2022-08-02 00:58:24 +00:00
Michael Sproul	18383a63b2	Tidy eth1/deposit contract logging (#3397 ) ## Issue Addressed Fixes an issue identified by @remyroy whereby we were logging a recommendation to use `--eth1-endpoints` on merge-ready setups (when the execution layer was out of sync). ## Proposed Changes I took the opportunity to clean up the other eth1-related logs, replacing "eth1" by "deposit contract" or "execution" as appropriate. I've downgraded the severity of the `CRIT` log to `ERRO` and removed most of the recommendation text. The reason being that users lacking an execution endpoint will be informed by the new `WARN Not merge ready` log pre-Bellatrix, or the regular errors from block verification post-Bellatrix.	2022-08-01 07:20:43 +00:00
Paul Hauner	2983235650	v2.5.0 (#3392 ) ## Issue Addressed NA ## Proposed Changes Bump versions. ## Additional Info - ~~Blocked on #3383~~ - ~~Awaiting further testing.~~	2022-08-01 03:41:08 +00:00
Paul Hauner	bcfde6e7df	Indicate that invalid blocks are optimistic (#3383 ) ## Issue Addressed NA ## Proposed Changes This PR will make Lighthouse return blocks with invalid payloads via the API with `execution_optimistic = true`. This seems a bit awkward, however I think it's better than returning a 404 or some other error. Let's consider the case where the only possible head is invalid (#3370 deals with this). In such a scenario all of the duties endpoints will start failing because the head is invalid. I think it would be better if the duties endpoints continue to work, because it's likely that even though the head is invalid the duties are still based upon valid blocks and we want the VC to have them cached. There's no risk to the VC here because we won't actually produce an attestation pointing to an invalid head. Ultimately, I don't think it's particularly important for us to distinguish between optimistic and invalid blocks on the API. Neither should be trusted and the only real reason that we track this is so we can try and fork around the invalid blocks. ## Additional Info - ~~Blocked on #3370~~	2022-07-30 05:08:57 +00:00
Michael Sproul	fdfdb9b57c	Enable `count-unrealized` by default (#3389 ) ## Issue Addressed Enable https://github.com/sigp/lighthouse/pull/3322 by default on all networks. The feature can be opted out of using `--count-unrealized=false` (the CLI flag is updated to take a parameter).	2022-07-30 00:22:41 +00:00
Pawan Dhananjay	b3ce8d0de9	Fix penalties in sync methods (#3384 ) ## Issue Addressed N/A ## Proposed Changes Uses the `penalize_peer` function added in #3350 in sync methods as well. The existing code in sync methods missed the `ExecutionPayloadError::UnverifiedNonOptimisticCandidate` case.	2022-07-30 00:22:39 +00:00
ethDreamer	034260bd99	Initial Commit of Retrospective OTB Verification (#3372 ) ## Issue Addressed * #2983 ## Proposed Changes Basically followed the [instructions laid out here](https://github.com/sigp/lighthouse/issues/2983#issuecomment-1062494947) Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>	2022-07-30 00:22:38 +00:00
realbigsean	6c2d8b2262	Builder Specs v0.2.0 (#3134 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3091 Extends https://github.com/sigp/lighthouse/pull/3062, adding pre-bellatrix block support on blinded endpoints and allowing the normal proposal flow (local payload construction) on blinded endpoints. This resulted in better fallback logic because the VC will not have to switch endpoints on failure in the BN <> Builder API, the BN can just fallback immediately and without repeating block processing that it shouldn't need to. We can also keep VC fallback from the VC<>BN API's blinded endpoint to full endpoint. ## Proposed Changes - Pre-bellatrix blocks on blinded endpoints - Add a new `PayloadCache` to the execution layer - Better fallback-from-builder logic ## Todos - [x] Remove VC transition logic - [x] Add logic to only enable builder flow after Merge transition finalization - [x] Tests - [x] Fix metrics - [x] Rustdocs Co-authored-by: Mac L <mjladson@pm.me> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-30 00:22:37 +00:00
Paul Hauner	25f0e261cb	Don't return errors when fork choice fails (#3370 ) ## Issue Addressed NA ## Proposed Changes There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network. This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload. ## Reviewer Notes The following changes are included: 1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process. 1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run. - This should have practically no effect, since most callers were still continuing if recomputing the head failed. - The outlier is that the API will return 200 rather than a 500 when fork choice fails. 1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.	2022-07-28 13:57:09 +00:00
Michael Sproul	d04fde3ba9	Remove equivocating validators from fork choice (#3371 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3241 Closes https://github.com/sigp/lighthouse/issues/3242 ## Proposed Changes * [x] Implement logic to remove equivocating validators from fork choice per https://github.com/ethereum/consensus-specs/pull/2845 * [x] Update tests to v1.2.0-rc.1. The new test which exercises `equivocating_indices` is passing. * [x] Pull in some SSZ abstractions from the `tree-states` branch that make implementing Vec-compatible encoding for types like `BTreeSet` and `BTreeMap`. * [x] Implement schema upgrades and downgrades for the database (new schema version is V11). * [x] Apply attester slashings from blocks to fork choice ## Additional Info * This PR doesn't need the `BTreeMap` impl, but `tree-states` does, and I don't think there's any harm in keeping it. But I could also be convinced to drop it. Blocked on #3322.	2022-07-28 09:43:41 +00:00
Pawan Dhananjay	f3439116da	Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365 ) ## Issue Addressed Resolves #3351 ## Proposed Changes Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced. ## Additional Info This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned. cc @divagant-martian please let me know if i'm missing something here.	2022-07-27 03:20:00 +00:00
Justin Traglia	0f62d900fe	Fix some typos (#3376 ) ## Proposed Changes This PR fixes various minor typos in the project.	2022-07-27 00:51:06 +00:00
Mac L	44fae52cd7	Refuse to sign sync committee messages when head is optimistic (#3191 ) ## Issue Addressed Resolves #3151 ## Proposed Changes When fetching duties for sync committee contributions, check the value of `execution_optimistic` of the head block from the BN and refuse to sign any sync committee messages `if execution_optimistic == true`. ## Additional Info - Is backwards compatible with older BNs - Finding a way to add test coverage for this would be prudent. Open to suggestions.	2022-07-27 00:51:05 +00:00
Mac L	d316305411	Add `is_optimistic` to `eth/v1/node/syncing` response (#3374 ) ## Issue Addressed As specified in the [Beacon Chain API specs](https://github.com/ethereum/beacon-APIs/blob/master/apis/node/syncing.yaml#L32-L35) we should return `is_optimistic` as part of the response to a query for the `eth/v1/node/syncing` endpoint. ## Proposed Changes Compute the optimistic status of the head and add it to the `SyncingData` response.	2022-07-26 08:50:16 +00:00
realbigsean	904dd62524	Strict fee recipient (#3363 ) ## Issue Addressed Resolves #3267 Resolves #3156 ## Proposed Changes - Move the log for fee recipient checks from proposer cache insertion into block proposal so we are directly checking what we get from the EE - Only log when there is a discrepancy with the local EE, not when using the builder API. In the `builder-api` branch there is an `info` log when there is a discrepancy, I think it is more likely there will be a difference in fee recipient with the builder api because proposer payments might be made via a transaction in the block. Not really sure what patterns will become commong. - Upgrade the log from a `warn` to an `error` - not actually sure which we want, but I think this is worth an error because the local EE with default transaction ordering I think should pretty much always use the provided fee recipient - add a `strict-fee-recipient` flag to the VC so we only sign blocks with matching fee recipients. Falls back from the builder API to the local API if there is a discrepancy . Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-26 02:17:24 +00:00
ethDreamer	f7354abe0f	Fix Block Cache Range Math for Faster Syncing (#3358 ) ## Issue Addressed While messing with the deposit snapshot stuff, I had my proxy running and noticed the beacon node wasn't syncing the block cache continuously. There were long periods where it did nothing. I believe this was caused by a logical error introduced in #3234 that dealt with an issue that arose while syncing the block cache on Ropsten. The problem is that when the block cache is initially syncing, it will trigger the logic that detects the cache is far behind the execution chain in time. This will trigger a batch syncing mechanism which is intended to sync further ahead than the chain would normally. But the batch syncing is actually slower than the range this function usually estimates (in this scenario). ## Proposed Changes I believe I've fixed this function by taking the end of the range to be the maximum of (batch syncing range, usual range). I've also renamed and restructured some things a bit. It's equivalent logic but I think it's more clear what's going on.	2022-07-26 02:17:21 +00:00
realbigsean	20ebf1f3c1	Realized unrealized experimentation (#3322 ) ## Issue Addressed Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style. Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: Mac L <mjladson@pm.me>	2022-07-25 23:53:26 +00:00
Mac L	bb5a6d2cca	Add `execution_optimistic` flag to HTTP responses (#3070 ) ## Issue Addressed #3031 ## Proposed Changes Updates the following API endpoints to conform with https://github.com/ethereum/beacon-APIs/pull/190 and https://github.com/ethereum/beacon-APIs/pull/196 - [x] `beacon/states/{state_id}/root` - [x] `beacon/states/{state_id}/fork` - [x] `beacon/states/{state_id}/finality_checkpoints` - [x] `beacon/states/{state_id}/validators` - [x] `beacon/states/{state_id}/validators/{validator_id}` - [x] `beacon/states/{state_id}/validator_balances` - [x] `beacon/states/{state_id}/committees` - [x] `beacon/states/{state_id}/sync_committees` - [x] `beacon/headers` - [x] `beacon/headers/{block_id}` - [x] `beacon/blocks/{block_id}` - [x] `beacon/blocks/{block_id}/root` - [x] `beacon/blocks/{block_id}/attestations` - [x] `debug/beacon/states/{state_id}` - [x] `debug/beacon/heads` - [x] `validator/duties/attester/{epoch}` - [x] `validator/duties/proposer/{epoch}` - [x] `validator/duties/sync/{epoch}` Updates the following Server-Sent Events: - [x] `events?topics=head` - [x] `events?topics=block` - [x] `events?topics=finalized_checkpoint` - [x] `events?topics=chain_reorg` ## Backwards Incompatible There is a very minor breaking change with the way the API now handles requests to `beacon/blocks/{block_id}/root` and `beacon/states/{state_id}/root` when `block_id` or `state_id` is the `Root` variant of `BlockId` and `StateId` respectively. Previously a request to a non-existent root would simply echo the root back to the requester: ``` curl "http://localhost:5052/eth/v1/beacon/states/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root" {"data":{"root":"0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}} ``` Now it will return a `404`: ``` curl "http://localhost:5052/eth/v1/beacon/blocks/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root" {"code":404,"message":"NOT_FOUND: beacon block with root 0xaaaa…aaaa","stacktraces":[]} ``` In addition to this is the block root `0x0000000000000000000000000000000000000000000000000000000000000000` previously would return the genesis block. It will now return a `404`: ``` curl "http://localhost:5052/eth/v1/beacon/blocks/0x0000000000000000000000000000000000000000000000000000000000000000" {"code":404,"message":"NOT_FOUND: beacon block with root 0x0000…0000","stacktraces":[]} ``` ## Additional Info - `execution_optimistic` is always set, and will return `false` pre-Bellatrix. I am also open to the idea of doing something like `#[serde(skip_serializing_if = "Option::is_none")]`. - The value of `execution_optimistic` is set to `false` where possible. Any computation that is reliant on the `head` will simply use the `ExecutionStatus` of the head (unless the head block is pre-Bellatrix). Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-07-25 08:23:00 +00:00
Paul Hauner	21dec6f603	v2.4.0 (#3360 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v2.4.0 ## Additional Info Blocked on: - ~~#3349~~ - ~~#3347~~	2022-07-21 22:02:36 +00:00
Pawan Dhananjay	612cdb7092	Merge readiness endpoint (#3349 ) ## Issue Addressed Resolves final task in https://github.com/sigp/lighthouse/issues/3260 ## Proposed Changes Adds a lighthouse http endpoint to indicate merge readiness. Blocked on #3339	2022-07-21 05:45:39 +00:00
Michael Sproul	e32868458f	Set safe block hash to justified (#3347 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3189. ## Proposed Changes - Always supply the justified block hash as the `safe_block_hash` when calling `forkchoiceUpdated` on the execution engine. - Refactor the `get_payload` routine to use the new `ForkchoiceUpdateParameters` struct rather than just the `finalized_block_hash`. I think this is a nice simplification and that the old way of computing the `finalized_block_hash` was unnecessary, but if anyone sees reason to keep that approach LMK.	2022-07-21 05:45:37 +00:00
Pawan Dhananjay	5b5cf9cfaa	Log ttd (#3339 ) ## Issue Addressed Resolves #3249 ## Proposed Changes Log merge related parameters and EE status in the beacon notifier before the merge. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-07-20 23:16:54 +00:00
ethDreamer	7c3ff903ca	Fix Gossip Penalties During Optimistic Sync Window (#3350 ) ## Issue Addressed * #3344 ## Proposed Changes There are a number of cases during block processing where we might get an `ExecutionPayloadError` but we shouldn't penalize peers. We were forgetting to enumerate all of the non-penalizing errors in every single match statement where we are making that decision. I created a function to make it explicit when we should and should not penalize peers and I used that function in all places where this logic is needed. This way we won't make the same mistake if we add another variant of `ExecutionPayloadError` in the future.	2022-07-20 20:59:38 +00:00
Pawan Dhananjay	e5e4e62758	Don't create a execution payload with same timestamp as terminal block (#3331 ) ## Issue Addressed Resolves #3316 ## Proposed Changes This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.	2022-07-18 23:15:41 +00:00
Pawan Dhananjay	f9b9658711	Add merge support to simulator (#3292 ) ## Issue Addressed N/A ## Proposed Changes Make simulator merge compatible. Adds a `--post_merge` flag to the eth1 simulator that enables a ttd and simulates the merge transition. Uses the `MockServer` in the execution layer test utils to simulate a dummy execution node. Adds the merge transition simulation to CI.	2022-07-18 23:15:40 +00:00
Age Manning	2ed51c364d	Improve block-lookup functionality (#3287 ) Improves some of the functionality around single and parent block lookup. Gives extra information about whether failures for lookups are related to processing or downloading. This is entirely untested. Co-authored-by: Diva M <divma@protonmail.com>	2022-07-17 23:26:58 +00:00
Pawan Dhananjay	5243cc6c30	Add a u256_hex_be module to encode/decode U256 types (#3321 ) ## Issue Addressed Resolves #3314 ## Proposed Changes Add a module to encode/decode u256 types according to the execution layer encoding/decoding standards https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#structures Updates `JsonExecutionPayloadV1.base_fee_per_gas`, `JsonExecutionPayloadHeaderV1.base_fee_per_gas` and `TransitionConfigurationV1.terminal_total_difficulty` to encode/decode according to standards Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-07-15 07:31:21 +00:00
Pawan Dhananjay	28b0ff27ff	Ignored sync jobs 2 (#3317 ) ## Issue Addressed Duplicate of #3269. Making this since @divagant-martian opened the previous PR and she can't approve her own PR 😄 Co-authored-by: Diva M <divma@protonmail.com>	2022-07-15 07:31:20 +00:00
Akihito Nakano	98a9626ef5	Bump the MSRV to 1.62 and using `#[derive(Default)]` on enums (#3304 ) ## Issue Addressed N/A ## Proposed Changes Since Rust 1.62, we can use `#[derive(Default)]` on enums. ✨ https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.	2022-07-15 07:31:19 +00:00
Paul Hauner	1f54e10b7b	Do not interpret "latest valid hash" as identifying a valid hash (#3327 ) ## Issue Addressed NA ## Proposed Changes After some discussion in Discord with @mkalinin it was raised that it was not the intention of the engine API to have CLs validate the `latest_valid_hash` (LVH) and all ancestors. Whilst I believe the engine API is being updated such that the LVH must identify a valid hash or be set to some junk value, I'm not confident that we can rely upon the LVH as being valid (at least for now) due to the confusion surrounding it. Being able to validate blocks via the LVH is a relatively minor optimisation; if the LVH value ends up becoming our head we'll send an fcU and get the VALID status there. Falsely marking a block as valid has serious consequences and since it's a minor optimisation to use LVH I think that we don't take the risk. For clarity, we will still invalidate the descendants of the LVH, we just wont validate the ancestors. ## Additional Info NA	2022-07-13 23:07:49 +00:00
Paul Hauner	7a6e6928a3	Further remove EE redundancy (#3324 ) ## Issue Addressed Resolves #3176 ## Proposed Changes Continues from PRs by @divagant-martian to gradually remove EL redundancy (see #3284, #3257). This PR achieves: - Removes the `broadcast` and `first_success` methods. The functional impact is that every request to the EE will always be tried immediately, regardless of the cached `EngineState` (this resolves #3176). Previously we would check the engine state before issuing requests, this doesn't make sense in a single-EE world; there's only one EE so we might as well try it for every request. - Runs the upcheck/watchdog routine once per slot rather than thrice. When we had multiple EEs frequent polling was useful to try and detect when the primary EE had come back online and we could switch to it. That's not as relevant now. - Always creates logs in the `Engines::upcheck` function. Previously we would mute some logs since they could get really noisy when one EE was down but others were functioning fine. Now we only have one EE and are upcheck-ing it less, it makes sense to always produce logs. This PR purposefully does not achieve: - Updating all occurances of "engines" to "engine". I'm trying to keep the diff small and manageable. We can come back for this. ## Additional Info NA	2022-07-13 20:31:39 +00:00
Divma	6d42a09ff8	Merge Engines and Engine struct in one in the `execution_layer` crate (#3284 ) ## Issue Addressed Part of https://github.com/sigp/lighthouse/issues/3118, continuation of https://github.com/sigp/lighthouse/pull/3257 and https://github.com/sigp/lighthouse/pull/3283 ## Proposed Changes - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Remove unnecessary generics ## Additional Info There is more cleanup to do that will come in subsequent PRs	2022-07-11 01:44:41 +00:00
Michael Sproul	748475be1d	Ensure caches are built for block_rewards POST API (#3305 ) ## Issue Addressed Follow up to https://github.com/sigp/lighthouse/pull/3290 that fixes a caching bug ## Proposed Changes Build the committee cache for the new `POST /lighthouse/analysis/block_rewards` API. Due to an unusual quirk of the total active balance cache the API endpoint would sometimes fail after loading a state from disk which had a current epoch cache _but not_ a total active balance cache. This PR adds calls to build the caches immediately before they're required, and has been running smoothly with `blockdreamer` the last few days.	2022-07-04 02:56:15 +00:00
Akihito Nakano	1cc8a97d4e	Remove unused method in HandlerNetworkContext (#3299 ) ## Issue Addressed N/A ## Proposed Changes Removed unused method in `HandlerNetworkContext`.	2022-07-04 02:56:14 +00:00
Divma	1219da9a45	Simplify error handling after engines fallback removal (#3283 ) ## Issue Addressed Part of #3118, continuation of #3257 ## Proposed Changes - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function returns a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function returns a single error. - [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carries a single error. - [`EngineError`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) now does not need to carry an Id - [`process_multiple_payload_statuses`](`9c429d0764/beacon_node/execution_layer/src/payload_status.rs (L46-L50)`) now doesn't need to receive an iterator of statuses and weight in different errors ## Additional Info This is built on top of #3294	2022-07-04 02:56:13 +00:00
Michael Sproul	61ed5f0ec6	Optimize historic committee calculation for the HTTP API (#3272 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3270 ## Proposed Changes Optimize the calculation of historic beacon committees in the HTTP API. This is achieved by allowing committee caches to be constructed for historic epochs, and constructing these committee caches on the fly in the API. This is much faster than reconstructing the state at the requested epoch, which usually takes upwards of 20s, and sometimes minutes with SPRP=8192. The depth of the `randao_mixes` array allows us to look back 64K epochs/0.8 years from a single state, which is pretty awesome! We always use the `state_id` provided by the caller, but will return a nice 400 error if the epoch requested is out of range for the state requested, e.g. ```bash # Prater curl "http://localhost:5052/eth/v1/beacon/states/3170304/committees?epoch=33538" ``` ```json {"code":400,"message":"BAD_REQUEST: epoch out of bounds, try state at slot 1081344","stacktraces":[]} ``` Queries will be fastest when aligned to `slot % SPRP == 0`, so the hint suggests a slot that is 0 mod 8192.	2022-07-04 02:56:11 +00:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
realbigsean	a7da0677d5	Remove builder redundancy (#3294 ) ## Issue Addressed This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets ## Proposed Changes - Removes redundancy in "builders" (servers implementing the builder spec) - Renames `payload-builder` flag to `builder` - Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194) Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-01 01:15:19 +00:00
Divma	d40c76e667	Fix clippy lints for rust 1.62 (#3300 ) ## Issue Addressed Fixes some new clippy lints after the last rust release ### Lints fixed for the curious: - [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned) - [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) - [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value) - [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) - [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes) - [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)	2022-06-30 22:51:49 +00:00
realbigsean	f6ec44f0dd	Register validator api (#3194 ) ## Issue Addressed Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint ## Proposed Changes - Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN. - TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17 - BN only sends VC registration data to builders on demand, but VC registration data does update the BN's prepare proposer cache and send an updated fcU to a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback. Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense. - Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884 - I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo. - Should application mask appear in the api? Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-06-30 00:49:21 +00:00
Pawan Dhananjay	5de00b7ee8	Unify execution layer endpoints (#3214 ) ## Issue Addressed Resolves #3069 ## Proposed Changes Unify the `eth1-endpoints` and `execution-endpoints` flags in a backwards compatible way as described in https://github.com/sigp/lighthouse/issues/3069#issuecomment-1134219221 Users have 2 options: 1. Use multiple non auth execution endpoints for deposit processing pre-merge 2. Use a single jwt authenticated execution endpoint for both execution layer and deposit processing post merge Related https://github.com/sigp/lighthouse/issues/3118 To enable jwt authenticated deposit processing, this PR removes the calls to `net_version` as the `net` namespace is not exposed in the auth server in execution clients. Moving away from using `networkId` is a good step in my opinion as it doesn't provide us with any added guarantees over `chainId`. See https://github.com/ethereum/consensus-specs/issues/2163 and https://github.com/sigp/lighthouse/issues/2115 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-06-29 09:07:09 +00:00
Michael Sproul	53b2b500db	Extend block reward APIs (#3290 ) ## Proposed Changes Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them. Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.	2022-06-29 04:50:37 +00:00
Paul Hauner	45b2eb18bc	v2.3.2-rc.0 (#3289 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-28 03:03:30 +00:00
Pawan Dhananjay	7acfbd89ee	Recover from NonConsecutive eth1 errors (#3273 ) ## Issue Addressed Fixes #1864 and a bunch of other closed but unresolved issues. ## Proposed Changes Allows the deposit caching to recover from `NonConsecutive` deposit errors by resetting the last processed block to the last valid deposit's block number. Still not sure of the underlying cause of this error, but this should recover the cache so we don't need `--eth1-purge-cache` anymore 🎉 A huge thanks to @one-three-three-seven for reproducing the error and providing the data that helped testing out the fix 🙌 Still needs a few more tests.	2022-06-26 23:10:58 +00:00
Akihito Nakano	082ed35bdc	Test the pruning of excess peers using randomly generated input (#3248 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3092 ## Proposed Changes Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores. ## Additional Info I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969). Co-authored-by: Diva M <divma@protonmail.com>	2022-06-25 22:22:34 +00:00
Michael Sproul	d21f083777	Add more paths to HTTP API metrics (#3282 ) ## Proposed Changes Expand the set of paths tracked by the HTTP API metrics to include all paths hit by the validator client. These paths were only partially updated for Altair, so we were missing some of the sync committee and v2 APIs.	2022-06-23 05:19:21 +00:00
Paul Hauner	748658e32c	Add some debug logs for checkpoint sync (#3281 ) ## Issue Addressed NA ## Proposed Changes I used these logs when debugging a spurious failure with Infura and thought they might be nice to have around permanently. There's no changes to functionality in this PR, just some additional `debug!` logs. ## Additional Info NA	2022-06-23 05:19:20 +00:00
Divma	7af5742081	Deprecate step param in BlocksByRange RPC request (#3275 ) ## Issue Addressed Deprecates the step parameter in the blocks by range request ## Proposed Changes - Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before - Adds a new type to still handle coding and decoding of requests that use the parameter ## Additional Info I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward	2022-06-22 16:23:34 +00:00
Divma	2063c0fa0d	Initial work to remove engines fallback from the `execution_layer` crate (#3257 ) ## Issue Addressed Part of #3160 ## Proposed Changes Use only the first url given in the execution engine, if more than one is provided log it. This change only moves having multiple engines to one. The amount of code cleanup that can and should be done forward is not small and would interfere with ongoing PRs. I'm keeping the changes intentionally very very minimal. ## Additional Info Future works: - In [ `EngineError` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)`) the id is not needed since it now has no meaning. - the [ `first_success_without_retry` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)`) function can return a single error. - the [`first_success`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)`) function can return a single error. - After the redundancy is removed for the builders, we can probably make the [ `EngineErrors` ](`9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)`) carry a single error. - Merge the [`Engines`](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)`) struct and [`Engine` ](`9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67)`) - Fix the associated configurations and cli params. Not sure if both are done in https://github.com/sigp/lighthouse/pull/3214 In general I think those changes can be done incrementally and in individual pull requests.	2022-06-22 14:27:16 +00:00
Michael Sproul	efebf712dd	Avoid cloning snapshots during sync (#3271 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2944 ## Proposed Changes Remove snapshots from the cache during sync rather than cloning them. This reduces unnecessary cloning and memory fragmentation during sync. ## Additional Info This PR relies on the fact that the `block_delay` cache is not populated for blocks from sync. Relying on block delay may have the side effect that a change in `block_delay` calculation could lead to: a) more clones, if block delays are added for syncing blocks or b) less clones, if blocks near the head are erroneously provided without a `block_delay`. Case (a) would be a regression to the current status quo, and (b) is low-risk given we know that the snapshot cache is current susceptible to misses (hence `tree-states`).	2022-06-20 23:20:29 +00:00
eklm	a9e158663b	Fix validator_monitor_prev_epoch_ metrics (#2911 ) ## Issue Addressed #2820 ## Proposed Changes The problem is that validator_monitor_prev_epoch metrics are updated only if there is EpochSummary present in summaries map for the previous epoch and it is not the case for the offline validator. Ensure that EpochSummary is inserted into summaries map also for the offline validators.	2022-06-20 04:06:30 +00:00
Pawan Dhananjay	f428719761	Do not penalize peers on execution layer offline errors (#3258 ) ## Issue Addressed Partly resolves https://github.com/sigp/lighthouse/issues/3032 ## Proposed Changes Extracts some of the functionality of #3094 into a separate PR as the original PR requires a bit more work. Do not unnecessarily penalize peers when we fail to validate received execution payloads because our execution layer is offline.	2022-06-19 23:13:40 +00:00
Paul Hauner	564d7da656	v2.3.1 (#3262 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-06-14 05:25:38 +00:00
Divma	3dd50bda11	Improve substream management (#3261 ) ## Issue Addressed Which issue # does this PR address? ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-06-10 06:58:50 +00:00
Paul Hauner	11d80a6a38	Optimise `per_epoch_processing` low-hanging-fruit (#3254 ) ## Issue Addressed NA ## Proposed Changes - Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage. - Cache the result of `integer_sqrt` rather than recomputing for each validator. - Cache `state.previous_epoch` rather than recomputing it for each validator. ### Benchmarks Benchmarks on a recent mainnet state using #3252 to get timing. #### Without this PR ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:21:02Z INFO lcli::skip_slots] Using mainnet spec [2022-06-09T08:21:02Z INFO lcli::skip_slots] Advancing 32 slots [2022-06-09T08:21:02Z INFO lcli::skip_slots] Doing 10 runs [2022-06-09T08:21:02Z INFO lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz" SSZ decoding /tmp/state-0x3cdc.ssz: 43ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 0: 245.718794ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 1: 245.364782ms [2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 2: 255.866179ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 3: 243.838909ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 4: 250.431425ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 5: 248.68765ms [2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 6: 262.051113ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 7: 264.293967ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 8: 293.202007ms [2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 9: 264.552017ms ``` #### With this PR: ``` lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10 [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 0: 73.898678ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 1: 75.536978ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 2: 75.176104ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 3: 76.460828ms [2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 4: 75.904195ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 5: 75.53077ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 6: 74.745572ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 7: 75.823489ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 8: 74.892055ms [2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 9: 76.333569ms ``` ## Additional Info NA	2022-06-10 04:29:28 +00:00
Divma	56b4cd88ca	minor libp2p upgrade (#3259 ) ## Issue Addressed Upgrades libp2p	2022-06-09 23:48:51 +00:00
Divma	cfd26d25e0	do not count sync batch attempts when peer is not at fault (#3245 ) ## Issue Addressed currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 ## Proposed Changes If a batch fails but the peer is not at fault, do not count the attempt Also removes some annoying logs ## Additional Info We still get a counter on ignored attempts.. just in case	2022-06-07 02:35:56 +00:00
Divma	58e223e429	update libp2p (#3233 ) ## Issue Addressed na ## Proposed Changes Updates libp2p to https://github.com/libp2p/rust-libp2p/pull/2662 ## Additional Info From comments on the relevant PRs listed, we should pay attention at peer management consistency, but I don't think anything weird will happen. This is running in prater tok and sin	2022-06-07 02:35:55 +00:00
Michael Sproul	54cf94ea59	Fix per-slot timer in presence of clock changes (#3243 ) ## Issue Addressed Fixes a timing issue that results in spurious fork choice notifier failures: ``` WARN Error signalling fork choice waiter slot: 3962270, error: ForkChoiceSignalOutOfOrder { current: Slot(3962271), latest: Slot(3962270) }, service: beacon ``` There’s a fork choice run that is scheduled to run at the start of every slot by the `timer`, which creates a 12s interval timer when the beacon node starts up. The problem is that if there’s a bit of clock drift that gets corrected via NTP (or a leap second for that matter) then these 12s intervals will cease to line up with the start of the slot. This then creates the mismatch in slot number that we see above. Lighthouse also runs fork choice 500ms before the slot begins, and these runs are what is conflicting with the start-of-slot runs. This means that the warning in current versions of Lighthouse is mostly cosmetic because fork choice is up to date with all but the most recent 500ms of attestations (which usually isn’t many). ## Proposed Changes Fix the per-slot timer so that it continually re-calculates the duration to the start of the next slot and waits for that. A side-effect of this change is that we may skip slots if the per-slot task takes >12s to run, but I think this is an unlikely scenario and an acceptable compromise.	2022-06-06 23:52:32 +00:00
Divma	493c2c037c	reduce reprocess queue/channel sizes (#3239 ) ## Issue Addressed Reduces the effect of late blocks on overall node buildup ## Proposed Changes change the capacity of the channels used to send work for reprocessing in the beacon processor, and to send back to the main processor task, to be 75% of the capacity of the channel for receiving new events ## Additional Info The issues we've seen suggest we should still evaluate node performance under stress, with late blocks being a big factor. Other changes that could help: 1. right now we have a cap for queued attestations for reprocessing that applies to the sum of aggregated and unaggregated attestations. We could consider adding a separate cap that favors aggregated ones. 2. solving #2848	2022-06-06 23:52:31 +00:00
Akihito Nakano	a6d2ed6119	Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236 ) ## Issue Addressed This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`. I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, `outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`. - [outbound only](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018)`) - [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](`1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047)`) As a result, PeerManager doesn't remove "outbound" peers which should be pruned. Note: [`subnet_to_peer`](`e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)`) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.	2022-06-06 05:51:10 +00:00
Michael Sproul	47d57a290b	Improve eth1 block cache sync (for Ropsten) (#3234 ) ## Issue Addressed Fix for the eth1 cache sync issue observed on Ropsten. ## Proposed Changes Ropsten blocks are so infrequent that they broke our algorithm for downloading eth1 blocks. We currently try to download forwards from the last block in our cache to the block with block number [`remote_highest_block - FOLLOW_DISTANCE + FOLLOW_DISTANCE / ETH1_BLOCK_TIME_TOLERANCE_FACTOR`](`6f732986f1/beacon_node/eth1/src/service.rs (L489-L492)`). With the tolerance set to 4 this is insufficient because we lag by 1536 blocks, which is more like ~14 hours on Ropsten. This results in us having an incomplete eth1 cache, because we should cache all blocks between -16h and -8h. Even if we were to set the tolerance to 2 for the largest allowance, we would only look back 1024 blocks which is still more than 8 hours. For example consider this block https://ropsten.etherscan.io/block/12321390. The block from 1536 blocks earlier is 14 hours and 20 minutes before it: https://ropsten.etherscan.io/block/12319854. The block from 1024 blocks earlier is https://ropsten.etherscan.io/block/12320366, 8 hours and 48 minutes before. - This PR introduces a new CLI flag called `--eth1-cache-follow-distance` which can be used to set the distance manually. - A new dynamic catchup mechanism is added which detects when the cache is lagging the true eth1 chain and tries to download more blocks within the follow distance in order to catch up.	2022-06-03 06:05:03 +00:00
Mac L	55ac423872	Emit log when fee recipient values are inconsistent (#3202 ) ## Issue Addressed #3156 ## Proposed Changes Emit a `WARN` log whenever the value of `fee_recipient` as returned from the EE is different from the value of `suggested_fee_recipient` as set on the BN, for example by the `--suggested-fee-recipient` CLI flag. ## Additional Info I have set the log level to `WARN` since it is legal behaviour (meaning it isn't really an error but is important to know when it is occurring). If we feel like this behaviour is almost always undesired (caused by a misconfiguration or malicious EE) then an `ERRO` log would be more appropriate. Happy to change it in that case.	2022-06-03 03:22:54 +00:00
Paul Hauner	16e49af8e1	Use genesis slot for node/syncing (#3226 ) ## Issue Addressed NA ## Proposed Changes Resolves this error log emitted from the VC prior to genesis: ``` WARN Unable connect to beacon node error: ServerMessage(ErrorMessage { code: 500, message: "UNHANDLED_ERROR: UnableToReadSlot", stacktraces: [] }) ``` ## Additional Info NA	2022-05-31 06:09:11 +00:00
Michael Sproul	98c8ac1a87	Fix typo in peer state transition log (#3224 ) ## Issue Addressed We were logging `out_finalized_epoch` instead of `our_finalized_epoch`. I noticed this ages ago but only just got around to fixing it. ## Additional Info I also reformatted the log line to respect the line length limit (`rustfmt` won't do it because it gets confused by the `;` in slog's log macros).	2022-05-31 06:09:10 +00:00
Paul Hauner	6f732986f1	v2.3.0 (#3222 ) ## Issue Addressed NA ## Proposed Changes Please list or describe the changes introduced by this PR. ## Additional Info - Pending testing on our infra. Please do not merge	2022-05-30 01:35:10 +00:00
Divma	a7896a58cc	move backfill sync jobs from highest priority to lowest (#3215 ) ## Issue Addressed #3212 ## Proposed Changes Move chain segments coming from back-fill syncing from highest priority to lowest ## Additional Info If this does not solve the issue, next steps would be lowering the batch size for back-fill sync, and as last resort throttling the processing of these chain segments	2022-05-26 02:05:17 +00:00
Paul Hauner	f4aa17ef85	v2.3.0-rc.0 (#3218 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-05-25 05:29:26 +00:00
Michael Sproul	229f883968	Avoid parallel fork choice runs during sync (#3217 ) ## Issue Addressed Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync: ``` May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon ``` This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync. Additionally, these parallel fork choice runs were causing issues in the database: ``` May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon ``` In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ 😳. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175). ## Proposed Changes * Don't run per-slot fork choice during sync (if head is older than 4 slots) * Don't run state-advance fork choice during sync (if head is older than 4 slots) * Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).	2022-05-25 03:27:30 +00:00
Paul Hauner	7a64994283	Call per_slot_task from a blocking thread (v2) (#3199 ) This PR was adapted from @pawanjay176's work in #3197. ## Issue Addressed Fixes a regression in https://github.com/sigp/lighthouse/pull/3168 ## Proposed Changes https://github.com/sigp/lighthouse/pull/3168 added calls to `fork_choice` in `BeaconChain::per_slot_task` function. This leads to a panic as `per_slot_task` is called from an async context which calls fork choice, which then calls `block_on`. This PR changes the timer to call the `per_slot_task` function in a blocking thread. Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>	2022-05-20 23:05:07 +00:00
Michael Sproul	6eaeaa542f	Fix Rust 1.61 clippy lints (#3192 ) ## Issue Addressed This fixes the low-hanging Clippy lints introduced in Rust 1.61 (due any hour now). It _ignores_ one lint, because fixing it requires a structural refactor of the validator client that needs to be done delicately. I've started on that refactor and will create another PR that can be reviewed in more depth in the coming days. I think we should merge this PR in the meantime to unblock CI.	2022-05-20 05:02:13 +00:00
Michael Sproul	8fa032c8ae	Run fork choice before block proposal (#3168 ) ## Issue Addressed Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878 ## Proposed Changes 1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block. 2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_. 3. Remove the fork choice run from the state advance timer that occurred before advancing the state. ## Additional Info ### Block Proposal Accuracy This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue. ### Attestation Accuracy This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B. ### Why remove the call before the state advance? If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea). ### Performance Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes. ### Implementation Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.	2022-05-20 05:02:11 +00:00
realbigsean	54b58fdc01	Log out response status when we hit `PayloadIdUnavailable` (#3190 ) ## Issue Addressed @z3n-chada is currently getting a `PayloadIdUnavailable` error when connecting lighthouse to Erigon and it's difficult to discern why so this just logs out the response status from the EE when we hit an `PayloadIdUnavailable` error Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-05-19 06:00:48 +00:00
Akihito Nakano	695f415590	Tiny improvement: PeerManager and maximum discovery query (#3182 ) ## Issue Addressed As [`Discovery` bounds the maximum discovery query](`e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)`), `PeerManager` no need to handle it. `e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)`	2022-05-19 06:00:46 +00:00
Mac L	def9bc660e	Remove DB migrations for legacy database schemas (#3181 ) ## Proposed Changes Remove support for DB migrations that support upgrading from schema's below version 5. This is mostly for cosmetic/code quality reasons as in most circumstances upgrading from versions of Lighthouse this old will almost always require a re-sync. ## Additional Info The minimum supported database schema is now version 5.	2022-05-17 04:54:39 +00:00
Paul Hauner	db8a6f81ea	Prevent attestation to future blocks from early attester cache (#3183 ) ## Issue Addressed N/A ## Proposed Changes Prevents the early attester cache from producing attestations to future blocks. This bug could result in a missed head vote if the BN was requested to produce an attestation for an earlier slot than the head block during the (usually) short window of time between verifying a block and setting it as the head. This bug was noticed in an [Antithesis](https://andreagrieser.com/) test and diagnosed by @realbigsean. ## Additional Info NA	2022-05-17 01:51:25 +00:00
Paul Hauner	38050fa460	Allow `TaskExecutor` to be used in `async` tests (#3178 ) # Description Since the `TaskExecutor` currently requires a `Weak<Runtime>`, it's impossible to use it in an async test where the `Runtime` is created outside our scope. Whilst we could create a new `Runtime` instance inside the async test, dropping that `Runtime` would cause a panic (you can't drop a `Runtime` in an async context). To address this issue, this PR creates the `enum Handle`, which supports either: - A `Weak<Runtime>` (for use in our production code) - A `Handle` to a runtime (for use in testing) In theory, there should be no change to the behaviour of our production code (beyond some slightly different descriptions in HTTP 500 errors), or even our tests. If there is no change, you might ask "why bother?". There are two PRs (#3070 and #3175) that are waiting on these fixes to introduce some new tests. Since we've added the EL to the `BeaconChain` (for the merge), we are now doing more async stuff in tests. I've also added a `RuntimeExecutor` to the `BeaconChainTestHarness`. Whilst that's not immediately useful, it will become useful in the near future with all the new async testing.	2022-05-16 08:35:59 +00:00
François Garillot	3f9e83e840	[refactor] Refactor Option/Result combinators (#3180 ) Code simplifications using `Option`/`Result` combinators to make pattern-matches a tad simpler. Opinions on these loosely held, happy to adjust in review. Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).	2022-05-16 01:59:47 +00:00
Michael Sproul	bcdd960ab1	Separate execution payloads in the DB (#3157 ) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. ⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-05-12 00:42:17 +00:00
Michael Sproul	ae47a93c42	Don't panic in forkchoiceUpdated handler (#3165 ) ## Issue Addressed Fix a panic due to misuse of the Tokio executor when processing a forkchoiceUpdated response. We were previously calling `process_invalid_execution_payload` from the async function `update_execution_engine_forkchoice_async`, which resulted in a panic because `process_invalid_execution_payload` contains a call to fork choice, which ultimately calls `block_on`. An example backtrace can be found here: https://gist.github.com/michaelsproul/ac5da03e203d6ffac672423eaf52fb20 ## Proposed Changes Wrap the call to `process_invalid_execution_payload` in a `spawn_blocking` so that `block_on` is no longer called from an async context. ## Additional Info - I've been thinking about how to catch bugs like this with static analysis (a new Clippy lint). - The payload validation tests have been re-worked to support distinct responses from the mock EE for newPayload and forkchoiceUpdated. Three new tests have been added covering the `Invalid`, `InvalidBlockHash` and `InvalidTerminalBlock` cases. - I think we need a bunch more tests of different legal and illegal variations	2022-05-04 23:30:34 +00:00
Pawan Dhananjay	db0beb5178	Poll shutdown timeout in rpc handler (#3153 ) ## Issue Addressed N/A ## Proposed Changes Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.	2022-04-13 03:54:44 +00:00
Divma	580d2f7873	log upgrades + prevent dialing of disconnecting peers (#3148 ) ## Issue Addressed We still ping peers that are considered in a disconnecting state ## Proposed Changes Do not ping peers once we decide they are disconnecting Upgrade logs about ignored rpc messages ## Additional Info --	2022-04-13 03:54:43 +00:00
Paul Hauner	b49b4291a3	Disallow attesting to optimistic head (#3140 ) ## Issue Addressed NA ## Proposed Changes Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end. I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid). ## Additional Info - ~~Blocked on #3126~~	2022-04-13 03:54:42 +00:00
Divma	7366266bd1	keep failed finalized chains to avoid retries (#3142 ) ## Issue Addressed In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2022-04-13 01:10:55 +00:00
Michael Sproul	aa72088f8f	v2.2.1 (#3149 ) ## Issue Addressed Addresses sync stalls on v2.2.0 (i.e. https://github.com/sigp/lighthouse/issues/3147). ## Additional Info I've avoided doing a full `cargo update` because I noticed there's a new patch version of libp2p and thought it could do with some more testing. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-04-12 02:52:12 +00:00
Paul Hauner	c8edeaff29	Don't log crits for missing EE before Bellatrix (#3150 ) ## Issue Addressed NA ## Proposed Changes Fixes an issue introduced in #3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled. ## Additional Info NA	2022-04-11 23:14:47 +00:00
Pawan Dhananjay	fff4dd6311	Fix rpc limits version 2 (#3146 ) ## Issue Addressed N/A ## Proposed Changes https://github.com/sigp/lighthouse/pull/3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check. This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`. This PR fixes the issue by also accounting for forks earlier than current_fork in the rpc limits calculation in the `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and can the limits can be easily checked in a review. Adds unit tests and modifies the syncing simulator to check the syncing from across fork boundaries. The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).	2022-04-07 23:45:38 +00:00
ethDreamer	22002a4e68	Transition Block Proposer Preparation (#3088 ) ## Issue Addressed - #3058 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-04-07 14:03:34 +00:00
Aren49	5ff4013263	Fix SPRP default value in cli (#3145 ) Changed SPRP to the correct default value of 8192.	2022-04-07 04:04:11 +00:00
Paul Hauner	8a40763183	Ensure VALID response from fcU updates protoarray (#3126 ) ## Issue Addressed NA ## Proposed Changes Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`. I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value. ## Additional Info NA	2022-04-05 20:58:17 +00:00
Paul Hauner	42cdaf5840	Add tests for importing blocks on invalid parents (#3123 ) ## Issue Addressed NA ## Proposed Changes - Adds more checks to prevent importing blocks atop parent with invalid execution payloads. - Adds a test for these conditions. ## Additional Info NA	2022-04-05 20:58:16 +00:00
Michael Sproul	bac7c3fa54	v2.2.0 (#3139 ) ## Proposed Changes Cut release v2.2.0 including proposer boost. ## Additional Info I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).	2022-04-05 02:53:09 +00:00
Michael Sproul	4d0122444b	Update and consolidate dependencies (#3136 ) ## Proposed Changes I did some gardening 🌳 in our dependency tree: - Remove duplicate versions of `warp` (git vs patch) - Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`. - Update MDBX (should resolve #3028). I tested and Lighthouse compiles on Windows 11 now. - Restore `psutil` back to upstream - Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7. Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly. ## Additional Info Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see https://github.com/rust-lang/cargo/issues/6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version. I've also opened a PR to `blst` related to zeroize: https://github.com/supranational/blst/pull/111	2022-04-04 00:26:16 +00:00
Pawan Dhananjay	ab434bc075	Fix merge rpc length limits (#3133 ) ## Issue Addressed N/A ## Proposed Changes Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair. Further make the rpc response limits fork aware.	2022-04-04 00:26:15 +00:00
Michael Sproul	375e2b49b3	Conserve disk space by raising default SPRP (#3137 ) ## Proposed Changes Increase the default `--slots-per-restore-point` to 8192 for a 4x reduction in freezer DB disk usage. Existing nodes that use the previous default of 2048 will be left unchanged. Newly synced nodes (with or without checkpoint sync) will use the new 8192 default. Long-term we could do away with the freezer DB entirely for validator-only nodes, but this change is much simpler and grants us some extra space in the short term. We can also roll it out gradually across our nodes by purging databases one by one, while keeping the Ansible config the same. ## Additional Info We ignore a change from 2048 to 8192 if the user hasn't set the 8192 explicitly. We fire a debug log in the case where we do ignore: ``` DEBG Ignoring slots-per-restore-point config in favour of on-disk value, on_disk: 2048, config: 8192 ```	2022-04-01 07:16:25 +00:00
Pawan Dhananjay	9ec072ff3b	Strip newline from jwt secrets (#3132 ) ## Issue Addressed Resolves #3128 ## Proposed Changes Strip trailing newlines from jwt secret files.	2022-04-01 00:59:00 +00:00
Michael Sproul	41e7a07c51	Add `lighthouse db` command (#3129 ) ## Proposed Changes Add a `lighthouse db` command with three initial subcommands: - `lighthouse db version`: print the database schema version. - `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version. - `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`. This PR lays the groundwork for other changes, namely: - Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8). - My `tree-states` work, which already implements a downgrade (v10 to v8). - Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824. ## Additional Info I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods. Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).	2022-04-01 00:58:59 +00:00
realbigsean	ea783360d3	Kiln mev boost (#3062 ) ## Issue Addressed MEV boost compatibility ## Proposed Changes See #2987 ## Additional Info This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20). Additional TODO's and outstanding questions - [ ] MEV boost JWT Auth - [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate - [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-03-31 07:52:23 +00:00
realbigsean	83234ee4ce	json rpc id to value (#3110 ) ## Issue Addressed N/A ## Proposed Changes - Update the JSON-RPC id field for both our request and response objects to be a `serde_json::Value` rather than a `u32`. This field could be a string or a number according to the JSON-RPC 2.0 spec. We only ever set it to a number, but if, for example, we get a response that wraps this number in quotes, we would fail to deserialize it. I think because we're not doing any validation around this id otherwise, we should be less strict with it in this regard. ## Additional Info Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-03-29 22:59:55 +00:00
Paul Hauner	26e5281c68	Increase timeouts for EEs (#3125 ) ## Issue Addressed NA ## Proposed Changes In the first Goerli shadow-fork, Lighthouse was getting timeouts from Geth which prevented the LH+Geth pair from progressing. There's not a whole lot of information I can use to set these timeouts. The most interesting pieces of information I have are quotes from Marius from Geth: - "Fcu also needs to construct the block which can take 2sec" ([Discord](https://discord.com/channels/595666850260713488/910910348922589184/958006487052066836)) - "2 sec should be ok for new payload, weird that it times out" ([Discord](https://discord.com/channels/595666850260713488/910910348922589184/958006487052066836)) I don't think we should be so worried about getting these timeouts correct now. No one really knows how long the various EEs are going to take, it's a bit too early in development. With these changes I'm giving some headroom so that we don't fail just because EEs are quite optimized enough. I've set the value to 6s (half a mainnet slot), since I think anything beyond 6s is an interesting problem that we want to know about sooner rather than later. ## Additional Info NA	2022-03-28 23:32:12 +00:00
Pawan Dhananjay	a42cb69f6e	Update engine state in broadcast (#3071 ) ## Issue Addressed N/A ## Proposed Changes Set the engine state to `EngineState::Offline` if the engine api call fails during broadcast. This caused issues while pausing sync when the execution engine is offline because `EngineState` always returned `Synced`.	2022-03-28 23:32:11 +00:00
Michael Sproul	6efd95496b	Optionally skip RANDAO verification during block production (#3116 ) ## Proposed Changes Allow Lighthouse to speculatively create blocks via the `/eth/v1/validators/blocks` endpoint by optionally skipping the RANDAO verification that we introduced in #2740. When `verify_randao=false` is passed as a query parameter the `randao_reveal` is not required to be present, and if present will only be lightly checked (must be a valid BLS sig). If `verify_randao` is omitted it defaults to true and Lighthouse behaves exactly as it did previously, hence this PR is backwards-compatible. I'd like to get this change into `unstable` pretty soon as I've got 3 projects building on top of it: - [`blockdreamer`](https://github.com/michaelsproul/blockdreamer), which mocks block production every slot in order to fingerprint clients - analysis of Lighthouse's block packing _optimality_, which uses `blockdreamer` to extract interesting instances of the attestation packing problem - analysis of Lighthouse's block packing _performance_ (as in speed) on the `tree-states` branch ## Additional Info Having tested `blockdreamer` with Prysm, Nimbus and Teku I noticed that none of them verify the randao signature on `/eth/v1/validator/blocks`. I plan to open a PR to the `beacon-APIs` repo anyway so that this parameter can be standardised in case the other clients add RANDAO verification by default in future.	2022-03-28 07:14:13 +00:00
Lucas Manuel	adca0efc64	feat: Update ASCII art (#3113 ) ## Issue Addressed No issue, just updating merge ASCII art. ## Proposed Changes Updating ASCII art for merge. ## Additional Info Please provide any additional information. For example, future considerations or information useful for reviewers.	2022-03-24 00:04:50 +00:00
Mac L	41b5af9b16	Support IPv6 in BN and VC HTTP APIs (#3104 ) ## Issue Addressed #3103 ## Proposed Changes Parse `http-address` and `metrics-address` as `IpAddr` for both the beacon node and validator client to support IPv6 addresses. Also adjusts parsing of CORS origins to allow for IPv6 addresses. ## Usage You can now set `http-address` and/or `metrics-address` flags to IPv6 addresses. For example, the following: `lighthouse bn --http --http-address :: --metrics --metrics-address ::1` will expose the beacon node HTTP server on `[::]` (equivalent of `0.0.0.0` in IPv4) and the metrics HTTP server on `localhost` (the equivalent of `127.0.0.1` in IPv4) The beacon node API can then be accessed by: `curl "http://[server-ipv6-address]:5052/eth/v1/some_endpoint"` And the metrics server api can be accessed by: `curl "http://localhost:5054/metrics"` or by `curl "http://[::1]:5054/metrics"` ## Additional Info On most Linux distributions the `v6only` flag is set to `false` by default (see the section for the `IPV6_V6ONLY` flag in https://www.man7.org/linux/man-pages/man7/ipv6.7.html) which means IPv4 connections will continue to function on a IPv6 address (providing it is appropriately mapped). This means that even if the Lighthouse API is running on `::` it is also possible to accept IPv4 connections. However on Windows, this is not the case. The `v6only` flag is set to `true` so binding to `::` will only allow IPv6 connections.	2022-03-24 00:04:49 +00:00
Divma	788b6af3c4	Remove sync await points (#3036 ) ## Issue Addressed Removes the await points in sync waiting for a processor response for rpc block processing. Built on top of #3029 This also handles a couple of bugs in the previous code and adds a relatively comprehensive test suite.	2022-03-23 01:09:39 +00:00
ethDreamer	af50130e21	Add Proposer Cache Pruning & POS Activated Banner (#3109 ) ## Issue Addressed The proposers cache wasn't being pruned. Also didn't have a celebratory banner for the merge 😄 ## Banner ![pos_log_panda](https://user-images.githubusercontent.com/37123614/159528545-3aa54cbd-9362-49b1-830c-f4402f6ac341.png)	2022-03-22 21:33:38 +00:00
realbigsean	ae5b141dc4	Updates to tests and local testnet for Ganache 7 (#3056 ) ## Issue Addressed #2961 ## Proposed Changes -- update `--chainId` -> `--chain.chainId` -- remove `--keepAliveTimeout` -- fix log to listen for -- rename `ganache-cli` to `ganache` everywhere Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-03-20 22:48:14 +00:00
Michael Sproul	9bc9527998	v2.1.5 (#3096 ) ## Issue Addressed New release to address openssl vuln fixed in #3095 Closes #3093	2022-03-17 23:13:46 +00:00
Paul Hauner	28aceaa213	v2.1.4 (#3076 ) ## Issue Addressed NA ## Proposed Changes - Bump version to `v2.1.4` - Run `cargo update` ## Additional Info I think this release should be published around the 15th of March. Presently `blocked` for testing on our infrastructure.	2022-03-14 23:11:40 +00:00
Paul Hauner	e4fa7d906f	Fix post-merge checkpoint sync (#3065 ) ## Issue Addressed This address an issue which was preventing checkpoint-sync. When the node starts from checkpoint sync, the head block and the finalized block are the same value. We did not respect this when sending a `forkchoiceUpdated` (fcU) call to the EL and were expecting fork choice to hold the finalized ancestor of the head and returning an error when it didn't. This PR uses only fork choice for sending fcU updates. This is actually quite nice and avoids some atomicity issues between `chain.canonical_head` and `chain.fork_choice`. Now, whenever `chain.fork_choice.get_head` returns a value we also cache the values required for the next fcU call. ## TODO - [x] ~~Blocked on #3043~~ - [x] Ensure there isn't a warn message at startup.	2022-03-10 06:05:24 +00:00
Paul Hauner	c475499dfe	Fix `UnableToReadSlot` at startup (#3066 ) ## Issue Addressed Don't send an fcU message at startup if it's pre-genesis. The startup fcU message is not critical, not required by the spec, so it's fine to avoid it for networks that start post-Bellatrix fork.	2022-03-09 23:04:19 +00:00
Paul Hauner	267d8babc8	Prepare proposer (#3043 ) ## Issue Addressed Resolves #2936 ## Proposed Changes Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance. There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information. Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code. ## Additional Info - ~~Blocked on #2837~~ Co-authored-by: realbigsean <seananderson33@GMAIL.com>	2022-03-09 00:42:05 +00:00
Divma	527dfa4893	cargo audit updates (#3063 ) ## Issue Addressed Closes #3008 and updates `regex` to solve https://rustsec.org/advisories/RUSTSEC-2022-0013	2022-03-08 19:48:12 +00:00
Pawan Dhananjay	381d0ece3c	auth for engine api (#3046 ) ## Issue Addressed Resolves #3015 ## Proposed Changes Add JWT token based authentication to engine api requests. The jwt secret key is read from the provided file and is used to sign tokens that are used for authenticated communication with the EL node. - [x] Interop with geth (synced `merge-devnet-4` with the `merge-kiln-v2` branch on geth) - [x] Interop with other EL clients (nethermind on `merge-devnet-4`) - [x] ~Implement `zeroize` for jwt secrets~ - [x] Add auth server tests with `mock_execution_layer` - [x] Get auth working with the `execution_engine_integration` tests Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-03-08 06:46:24 +00:00
Paul Hauner	3b4865c3ae	Poll the `engine_exchangeTransitionConfigurationV1` endpoint (#3047 ) ## Issue Addressed There has been an [`engine_exchangetransitionconfigurationv1`](https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#engine_exchangetransitionconfigurationv1) method added to the execution API specs. The `engine_exchangetransitionconfigurationv1` will be polled every 60s as per this PR: https://github.com/ethereum/execution-apis/pull/189. If that PR is merged as-is, then we will be matching the spec. If that PR is not merged, we are still fully compatible with the spec, but just doing more than we are required. ## Additional Info - [x] ~~Blocked on #2837~~ - [x] Add method to EE integration tests	2022-03-08 04:40:42 +00:00
Akihito Nakano	4186d117af	Replace `OpenOptions::new` with `File::options` to be readable (#3059 ) ## Issue Addressed Closes #3049 This PR updates widely but this replace is safe as `File::options()` is equivelent to `OpenOptions::new()`. ref: https://doc.rust-lang.org/stable/src/std/fs.rs.html#378-380	2022-03-07 06:30:18 +00:00
tim gretler	cbda0a2f0a	Add log debounce to work processor (#3045 ) ## Issue Addressed #3010 ## Proposed Changes - move log debounce time latch to `./common/logging` - add timelatch to limit logging for `attestations_delay_queue` and `queued_block_roots` ## Additional Info - Is a separate crate for the time latch preferred? - `elapsed()` could take `LOG_DEBOUNCE_INTERVAL ` as an argument to allow for different granularity.	2022-03-07 06:30:17 +00:00
Michael Sproul	1829250ee4	Ignore attestations to finalized blocks (don't reject) (#3052 ) ## Issue Addressed Addresses spec changes from v1.1.0: - https://github.com/ethereum/consensus-specs/pull/2830 - https://github.com/ethereum/consensus-specs/pull/2846 ## Proposed Changes * Downgrade the REJECT for `HeadBlockFinalized` to an IGNORE. This applies to both unaggregated and aggregated attestations. ## Additional Info I thought about also changing the penalty for `UnknownTargetRoot` but I don't think it's reachable in practice.	2022-03-04 00:41:22 +00:00
Paul Hauner	09d2187198	Lower `debug!` logs to `trace!` (#3053 ) ## Issue Addressed These logs were very loud during sync.	2022-03-03 22:37:42 +00:00
Paul Hauner	aea43b626b	Rename random to prev_randao (#3040 ) ## Issue Addressed As discussed on last-night's consensus call, the testnets next week will target the [Kiln Spec v2](https://hackmd.io/@n0ble/kiln-spec). Presently, we support Kiln V1. V2 is backwards compatible, except for renaming `random` to `prev_randao` in: - https://github.com/ethereum/execution-apis/pull/180 - https://github.com/ethereum/consensus-specs/pull/2835 With this PR we'll no longer be compatible with the existing Kintsugi and Kiln testnets, however we'll be ready for the testnets next week. I raised this breaking change in the call last night, we are all keen to move forward and break things. We now target the [`merge-kiln-v2`](https://github.com/MariusVanDerWijden/go-ethereum/tree/merge-kiln-v2) branch for interop with Geth. This required adding the `--http.aauthport` to the tester to avoid a port conflict at startup. ### Changes to exec integration tests There's some change in the `merge-kiln-v2` version of Geth that means it can't compile on a vanilla Github runner. Bumping the `go` version on the runner solved this issue. Whilst addressing this, I refactored the `testing/execution_integration` crate to be a binary rather than a library with tests. This means that we don't need to run the `build.rs` and build Geth whenever someone runs `make lint` or `make test-release`. This is nice for everyday users, but it's also nice for CI so that we can have a specific runner for these tests and we don't need to ensure all runners support everything required to build all execution clients. ## More Info - [x] ~~EF tests are failing since the rename has broken some tests that reference the old field name. I have been told there will be new tests released in the coming days (25/02/22 or 26/02/22).~~	2022-03-03 02:10:57 +00:00
Divma	4bf1af4e85	Custom RPC request management for sync (#3029 ) ## Proposed Changes Make `lighthouse_network` generic over request ids, now usable by sync	2022-03-02 22:07:17 +00:00
Age Manning	e88b18be09	Update libp2p (#3039 ) Update libp2p. This corrects some gossipsub metrics.	2022-03-02 05:09:52 +00:00
Age Manning	f3c1dde898	Filter non global ips from discovery (#3023 ) ## Issue Addressed #3006 ## Proposed Changes This PR changes the default behaviour of lighthouse to ignore discovered IPs that are not globally routable. It adds a CLI flag, --enable-local-discovery to permit the non-global IPs in discovery. NOTE: We should take care in merging this as I will break current set-ups that rely on local IP discovery. I made this the non-default behaviour because we dont really want to be wasting resources attempting to connect to non-routable addresses and we dont want to propagate these to others (on the chance we can connect to one of these local nodes), improving discoveries efficiency.	2022-03-02 03:14:27 +00:00
Age Manning	e34524be75	Increase default target-peer count to 80 (#3005 ) Increase the default peer count from 50 to 80	2022-03-02 01:05:07 +00:00
Paul Hauner	b6493d5e24	Enforce Optimistic Sync Conditions & CLI Tests (v2) (#3050 ) ## Description This PR adds a single, trivial commit (f5d2b27d78349d5a675a2615eba42cc9ae708094) atop #2986 to resolve a tests compile error. The original author (@ethDreamer) is AFK so I'm getting this one merged ☺️ Please see #2986 for more information about the other, significant changes in this PR. Co-authored-by: Mark Mackey <mark@sigmaprime.io> Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>	2022-03-01 22:56:47 +00:00
Age Manning	a1b730c043	Cleanup small issues (#3027 ) Downgrades some excessive networking logs and corrects some metrics.	2022-03-01 01:49:22 +00:00
Paul Hauner	27e83b888c	Retrospective invalidation of exec. payloads for opt. sync (#2837 ) ## Issue Addressed NA ## Proposed Changes Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means: - Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates. - Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct. - Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs. - Adding testing framework for optimistic imports. - Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between beacon block roots and execution payload hashes. - Add `merge` to [`FORKS`](`c3a793fd73/Makefile (L17)`) in the `Makefile` to ensure we test the beacon chain with merge settings. - Fix some tests here that were failing due to a missing execution layer. ## TODO - [ ] Balance tests Co-authored-by: Mark Mackey <mark@sigmaprime.io>	2022-02-28 22:07:48 +00:00
Michael Sproul	5e1f8a8480	Update to Rust 1.59 and 2021 edition (#3038 ) ## Proposed Changes Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions. ## Additional Info We need this PR to unblock CI.	2022-02-25 00:10:17 +00:00
Mac L	104e3104f9	Add API to compute block packing efficiency data (#2879 ) ## Issue Addressed N/A ## Proposed Changes Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs. ## Usage ### Request ``` curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732" ``` ### Response ``` [ { "slot": "1847360", "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f", "proposer_info": { "validator_index": 1686, "graffiti": "" }, "available_attestations": 7096, "included_attestations": 6459, "prior_skip_slots": 0 }, ... ] ``` ## Additional Info This is notably different to the existing lcli code: - Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method. - Corrects the off-by-one #2878 - Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 - Includes `prior_skip_slots`.	2022-02-21 23:21:02 +00:00
eklm	56b2ec6b29	Allow proposer duties request for the next epoch (#2963 ) ## Issue Addressed Closes #2880 ## Proposed Changes Support requests to the next epoch in proposer_duties api. ## Additional Info Implemented with skipping proposer cache for this case because the cache for the future epoch will be missed every new slot as dependent_root is changed and we don't want to "wash it out" by saving additional values.	2022-02-18 05:32:00 +00:00
Age Manning	3ebb8b0244	Improved peer management (#2993 ) ## Issue Addressed I noticed in some logs some excess and unecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots. This PR corrects this issue and advances the pruning mechanism to favour subnet peers. An overview the new logic added is: - We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count. - We only search for more peers if there is room to do so, and we are below the minimum outbound peer count not the target. So this gives us some buffer for peers to disconnect. The buffer is currently 10% The modified pruning logic is documented in the code but for reference it should do the following: - Prune peers with bad scores first - If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet - If we still need to prune peers, the prune peers that we have a higher density of on any given subnet which should drive for uniform peers across all subnets. This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.	2022-02-18 02:36:43 +00:00
Paul Hauner	0a6a8ea3b0	Engine API v1.0.0.alpha.6 + interop tests (#3024 ) ## Issue Addressed NA ## Proposed Changes This PR extends #3018 to address my review comments there and add automated integration tests with Geth (and other implementations, in the future). I've also de-duplicated the "unused port" logic by creating an `common/unused_port` crate. ## Additional Info I'm not sure if we want to merge this PR, or update #3018 and merge that. I don't mind, I'm primarily opening this PR to make sure CI works. Co-authored-by: Mark Mackey <mark@sigmaprime.io>	2022-02-17 21:47:06 +00:00
Paul Hauner	2f8531dc60	Update to consensus-specs v1.1.9 (#3016 ) ## Issue Addressed Closes #3014 ## Proposed Changes - Rename `receipt_root` to `receipts_root` - Rename `execute_payload` to `notify_new_payload` - This is slightly weird since we modify everything except the actual HTTP call to the engine API. That change is expected to be implemented in #2985 (cc @ethDreamer) - Enable "random" tests for Bellatrix. ## Notes This will break partially compatibility with Kintusgi testnets in order to gain compatibility with [Kiln](https://hackmd.io/@n0ble/kiln-spec) testnets. I think it will only break the BN APIs due to the `receipts_root` change, however it might have some other effects too. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2022-02-14 23:57:23 +00:00
Paul Hauner	c3a793fd73	v2.1.3 (#3017 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2022-02-11 01:54:33 +00:00
Divma	1306b2db96	libp2p upgrade + gossipsub interval fix (#3012 ) ## Issue Addressed Lighthouse gossiping late messages ## Proposed Changes Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected Upgrade libp2p ## Additional Info https://github.com/libp2p/rust-libp2p/issues/2497	2022-02-10 04:12:03 +00:00
Philipp K	5388183884	Allow per validator fee recipient via flag or file in validator client (similar to graffiti / graffiti-file) (#2924 ) ## Issue Addressed #2883 ## Proposed Changes * Added `suggested-fee-recipient` & `suggested-fee-recipient-file` flags to validator client (similar to graffiti / graffiti-file implementation). * Added proposer preparation service to VC, which sends the fee-recipient of all known validators to the BN via [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api once per slot * Added [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api endpoint and preparation data caching * Added cleanup routine to remove cached proposer preparations when not updated for 2 epochs ## Additional Info Changed the Implementation following the discussion in #2883. Co-authored-by: pk910 <philipp@pk910.de> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: Philipp K <philipp@pk910.de>	2022-02-08 19:52:20 +00:00
Divma	36fc887a40	Gossip cache timeout adjustments (#2997 ) ## Proposed Changes - Do not retry to publish sync committee messages. - Give a more lenient timeout to slashings and exits	2022-02-07 23:25:06 +00:00
Age Manning	675c7b7e26	Correct a dial race condition (#2992 ) ## Issue Addressed On a network with few nodes, it is possible that the same node can be found from a subnet discovery and a normal peer discovery at the same time. The network behaviour loads these peers into events and processes them when it has the chance. It can happen that the same peer can enter the event queue more than once and then attempt to be dialed twice. This PR shifts the registration of nodes in the peerdb as being dialed before they enter the NetworkBehaviour queue, preventing multiple attempts of the same peer being entered into the queue and avoiding the race condition.	2022-02-07 23:25:05 +00:00
Divma	48b7c8685b	upgrade libp2p (#2933 ) ## Issue Addressed Upgrades libp2p to v.0.42.0 pre release (https://github.com/libp2p/rust-libp2p/pull/2440)	2022-02-07 23:25:03 +00:00
Divma	615695776e	Retry gossipsub messages when insufficient peers (#2964 ) ## Issue Addressed #2947 ## Proposed Changes Store messages that fail to be published due to insufficient peers for retry later. Messages expire after half an epoch and are retried if gossipsub informs us that an useful peer has connected. Currently running in Atlanta ## Additional Info If on retry sending the messages fails they will not be tried again	2022-02-03 01:12:30 +00:00
Paul Hauner	0177b9286e	v2.1.2 (#2980 ) ## Issue Addressed NA ## Proposed Changes - Bump version to `v2.1.2` - Run `cargo update` ## Additional Info NA	2022-02-01 23:53:53 +00:00
Paul Hauner	fc37d51e10	Add checks to prevent fwding old messages (#2978 ) ## Issue Addressed NA ## Proposed Changes Checks to see if attestations or sync messages are still valid before "accepting" them for propagation. ## Additional Info NA	2022-02-01 01:04:24 +00:00
Paul Hauner	a6da87066b	Add strict penalties const bool (#2976 ) ## Issue Addressed NA ## Proposed Changes Adds `STRICT_LATE_MESSAGE_PENALTIES: bool` which allows for toggling penalties for late sync/attn messages. `STRICT_LATE_MESSAGE_PENALTIES` is set to `false`, since we're seeing a lot of late messages on the network which are causing peer drops. We can toggle the bool during testing to try and figure out what/who is the cause of these late messages. In effect, this PR relaxes peer downscoring for late attns and sync committee messages. ## Additional Info - ~~Blocked on #2974~~	2022-02-01 01:04:22 +00:00
Mac L	286996b090	Fix small typo in error log (#2975 ) ## Proposed Changes Fixes a small typo I came across.	2022-01-31 22:55:07 +00:00
Age Manning	bdd70d7aef	Reduce gossip history (#2969 ) The gossipsub history was increased to a good portion of a slot from 2.1 seconds in the last release. Although it shouldn't cause too much issue, it could be related to recieving later messages than usual and interacting with our scoring system penalizing peers. For consistency, this PR reduces the time we gossip messages back to the same values of the previous release. It also adjusts the gossipsub heartbeat time for testing purposes with a developer flag but this should not effect end users.	2022-01-31 07:29:41 +00:00
Michael Sproul	99d2c33387	Avoid looking up pre-finalization blocks (#2909 ) ## Issue Addressed This PR fixes the unnecessary `WARN Single block lookup failed` messages described here: https://github.com/sigp/lighthouse/pull/2866#issuecomment-1008442640 ## Proposed Changes Add a new cache to the `BeaconChain` that tracks the block roots of blocks from before finalization. These could be blocks from the canonical chain (which might need to be read from disk), or old pre-finalization blocks that have been forked out. The cache also stores a set of block roots for in-progress single block lookups, which duplicates some of the information from sync's `single_block_lookups` hashmap: `a836e180f9/beacon_node/network/src/sync/manager.rs (L192-L196)` On a live node you can confirm that the cache is working by grepping logs for the message: `Rejected attestation to finalized block`.	2022-01-27 22:58:32 +00:00
Mac L	e05142b798	Add API to compute discrete validator attestation performance (#2874 ) ## Issue Addressed N/A ## Proposed Changes Add a HTTP API which can be used to compute the attestation performances of a validator (or all validators) over a discrete range of epochs. Performances can be computed for a single validator, or for the global validator set. ## Usage ### Request The API can be used as follows: ``` curl "http://localhost:5052/lighthouse/analysis/attestation_performance/{validator_index}?start_epoch=57730&end_epoch=57732" ``` Alternatively, to compute performances for the global validator set: ``` curl "http://localhost:5052/lighthouse/analysis/attestation_performance/global?start_epoch=57730&end_epoch=57732" ``` ### Response The response is JSON formatted as follows: ``` [ { "index": 72, "epochs": { "57730": { "active": true, "head": false, "target": false, "source": false }, "57731": { "active": true, "head": true, "target": true, "source": true, "delay": 1 }, "57732": { "active": true, "head": true, "target": true, "source": true, "delay": 1 }, } } ] ``` > Note that the `"epochs"` are not guaranteed to be in ascending order. ## Additional Info - This API is intended to be used in our upcoming validator analysis tooling (#2873) and will likely not be very useful for regular users. Some advanced users or block explorers may find this API useful however. - The request range is limited to 100 epochs (since the range is inclusive and it also computes the `end_epoch` it's actually 101 epochs) to prevent Lighthouse using exceptionally large amounts of memory.	2022-01-27 22:58:31 +00:00
Michael Sproul	e70daaa3b6	Implement API for block rewards (#2628 ) ## Proposed Changes Add an API endpoint for retrieving detailed information about block rewards. For information on usage see [the docs](https://github.com/sigp/lighthouse/blob/block-rewards-api/book/src/api-lighthouse.md#lighthouseblock_rewards), and the source.	2022-01-27 01:06:02 +00:00
Divma	f2b1e096b2	Code quality improvents to the network service (#2932 ) Checking how to priorize the polling of the network I moved most of the service code to functions. This change I think it's worth on it's own for code quality since inside the `tokio::select` many tools don't work (cargo fmt, sometimes clippy, and sometimes even the compiler's errors get wack). This is functionally equivalent to the previous code, just better organized	2022-01-26 23:14:23 +00:00
Divma	9964f5afe5	Document why we hash downloaded blocks for both sync algs (#2927 ) ## Proposed Changes Initially the idea was to remove hashing of blocks in backfill sync. After considering it more, we conclude that we need to do it in both (forward and backfill) anyway. But since we forgot why we were doing it in the first place, this PR documents this logic. Future us should find it useful Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2022-01-26 23:14:22 +00:00
Paul Hauner	5f628a71d4	v2.1.1 (#2951 ) ## Issue Addressed NA ## Proposed Changes - Bump Lighthouse version to v2.1.1 - Update `thread_local` from v1.1.3 to v1.1.4 to address https://rustsec.org/advisories/RUSTSEC-2022-0006 ## Additional Info - ~~Blocked on #2950~~ - ~~Blocked on #2952~~	2022-01-25 00:46:24 +00:00
Age Manning	ca29b580a2	Increase target subnet peers (#2948 ) In the latest release we decreased the target number of subnet peers. It appears this could be causing issues in some cases and so reverting it back to the previous number it wise. A larger PR that follows this will address some other related discovery issues and peer management around subnet peer discovery.	2022-01-24 12:08:00 +00:00
Rishi Kumar Ray	f0f327af0c	Removed all disable_forks (#2925 ) #2923 Which issue # does this PR address? There's a redundant field on the BeaconChain called disabled_forks that was once part of our fork-aware networking (#953) but which is no longer used and could be deleted. so Removed all references to disabled_forks so that the code compiles and git grep disabled_forks returns no results. ## Proposed Changes Please list or describe the changes introduced by this PR. Removed all references of disabled_forks Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2022-01-20 09:14:26 +00:00
Age Manning	fc7a1a7dc7	Allow disconnected states to introduce new peers without warning (#2922 ) ## Issue Addressed We emit a warning to verify that all peer connection state information is consistent. A warning is given under one edge case; We try to dial a peer with peer-id X and multiaddr Y. The peer responds to multiaddr Y with a different peer-id, Z. The dialing to the peer fails, but libp2p injects the failed attempt as peer-id Z. In this instance, our PeerDB tries to add a new peer in the disconnected state under a previously unknown peer-id. This is harmless and so this PR permits this behaviour without logging a warning.	2022-01-20 09:14:25 +00:00
Mac L	d06f87486a	Support duplicate keys in HTTP API query strings (#2908 ) ## Issues Addressed Closes #2739 Closes #2812 ## Proposed Changes Support the deserialization of query strings containing duplicate keys into their corresponding types. As `warp` does not support this feature natively (as discussed in #2739), it relies on the external library [`serde_array_query`](https://github.com/sigp/serde_array_query) (written by @michaelsproul) This is backwards compatible meaning that both of the following requests will produce the same output: ``` curl "http://localhost:5052/eth/v1/events?topics=head,block" ``` ``` curl "http://localhost:5052/eth/v1/events?topics=head&topics=block" ``` ## Additional Info Certain error messages have changed slightly. This only affects endpoints which accept multiple values. For example: ``` {"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]} ``` is now ``` {"code":400,"message":"BAD_REQUEST: unable to parse query","stacktraces":[]} ``` The serve order of the endpoints `get_beacon_state_validators` and `get_beacon_state_validators_id` have flipped: ```rust .or(get_beacon_state_validators_id.boxed()) .or(get_beacon_state_validators.boxed()) ``` This is to ensure proper error messages when filter fallback occurs due to the use of the `and_then` filter. ## Future Work - Cleanup / remove filter fallback behaviour by substituting `and_then` with `then` where appropriate. - Add regression tests for HTTP API error messages. ## Credits - @mooori for doing the ground work of investigating possible solutions within the existing Rust ecosystem. - @michaelsproul for writing [`serde_array_query`](https://github.com/sigp/serde_array_query) and for helping debug the behaviour of the `warp` filter fallback leading to incorrect error messages.	2022-01-20 09:14:19 +00:00
Paul Hauner	79db2d4deb	v2.1.0 (#2928 ) ## Issue Addressed NA ## Proposed Changes Bump to `v2.1.0`. ## Additional Info NA	2022-01-20 03:39:41 +00:00
Michael Sproul	ef7351ddfe	Update to spec v1.1.8 (#2893 ) ## Proposed Changes Change the canonical fork name for the merge to Bellatrix. Keep other merge naming the same to avoid churn. I've also fixed and enabled the `fork` and `transition` tests for Bellatrix, and the v1.1.7 fork choice tests. Additionally, the `BellatrixPreset` has been added with tests. It gets served via the `/config/spec` API endpoint along with the other presets.	2022-01-19 00:24:19 +00:00
Michael Sproul	a836e180f9	Release v2.1.0-rc.1 (#2921 ) ## Proposed Changes New release candidate to address Windows build failure for rc.0	2022-01-17 03:25:30 +00:00

... 2 3 4 5 6 ...

2110 Commits