lighthouse

Author	SHA1	Message	Date
Michael Sproul	c76a1971cc	Merge remote-tracking branch 'origin/unstable' into capella	2023-01-25 14:20:16 +11:00
ethDreamer	3d4dd6af75	Use eth1_withdrawal_credentials in Test States (#3898 ) * Use eth1_withdrawal_credential in Some Test States * Update beacon_node/genesis/src/interop.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/genesis/src/interop.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Increase validator sizes * Pick next sync committee message Co-authored-by: Michael Sproul <micsproul@gmail.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2023-01-24 16:22:51 +01:00
naviechan	2802bc9a9c	Implement sync_committee_rewards API (per-validator reward) (#3903 ) ## Issue Addressed [#3661](https://github.com/sigp/lighthouse/issues/3661) ## Proposed Changes `/eth/v1/beacon/rewards/sync_committee/{block_id}` ``` { "execution_optimistic": false, "finalized": false, "data": [ { "validator_index": "0", "reward": "2000" } ] } ``` The issue contains the implementation of three per-validator reward APIs: * `sync_committee_rewards` * [`attestation_rewards`](https://github.com/sigp/lighthouse/pull/3822) * `block_rewards` This PR only implements the `sync_committe_rewards `. The endpoints can be viewed in the Ethereum Beacon nodes API browser: [https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Rewards](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Rewards) ## Additional Info The implementation of [consensus client reward APIs](https://github.com/eth-protocol-fellows/cohort-three/blob/master/projects/project-ideas.md#consensus-client-reward-apis) is part of the [EPF](https://github.com/eth-protocol-fellows/cohort-three). Co-authored-by: navie <naviechan@gmail.com> Co-authored-by: kevinbogner <kevbogner@gmail.com>	2023-01-24 02:06:42 +00:00
realbigsean	8f8b7211d0	execution API related fixes	2023-01-19 09:32:08 -05:00
ethDreamer	26787412cd	Update engine_api to Latest spec (#3893 ) * Update engine_api to Latest spec * Small Test Fix * Fix Test Deserialization Issue	2023-01-19 22:42:17 +11:00
realbigsean	06f71e8cce	merge capella	2023-01-12 12:51:09 -05:00
Michael Sproul	aa896decc1	Fix some beacon_chain tests	2023-01-12 19:13:01 +11:00
Michael Sproul	2af8110529	Merge remote-tracking branch 'origin/unstable' into capella Fixing the conflicts involved patching up some of the `block_hash` verification, the rest will be done as part of https://github.com/sigp/lighthouse/issues/3870	2023-01-12 16:22:00 +11:00
realbigsean	438126f19a	merge upstream, fix compile errors	2023-01-11 13:52:58 -05:00
Paul Hauner	830efdb5c2	Improve validator monitor experience for high validator counts (#3728 ) ## Issue Addressed NA ## Proposed Changes Myself and others (#3678) have observed that when running with lots of validators (e.g., 1000s) the cardinality is too much for Prometheus. I've seen Prometheus instances just grind to a halt when we turn the validator monitor on for our testnet validators (we have 10,000s of Goerli validators). Additionally, the debug log volume can get very high with one log per validator, per attestation. To address this, the `bn --validator-monitor-individual-tracking-threshold <INTEGER>` flag has been added to disable per-validator (i.e., non-aggregated) metrics/logging once the validator monitor exceeds the threshold of validators. The default value is `64`, which is a finger-to-the-wind value. I don't actually know the value at which Prometheus starts to become overwhelmed, but I've seen it work with ~64 validators and I've seen it not work with 1000s of validators. A default of `64` seems like it will result in a breaking change to users who are running millions of dollars worth of validators whilst resulting in a no-op for low-validator-count users. I'm open to changing this number, though. Additionally, this PR starts collecting aggregated Prometheus metrics (e.g., total count of head hits across all validators), so that high-validator-count validators still have some interesting metrics. We already had logging for aggregated values, so nothing has been added there. I've opted to make this a breaking change since it can be rather damaging to your Prometheus instance to accidentally enable the validator monitor with large numbers of validators. I've crashed a Prometheus instance myself and had a report from another user who's done the same thing. ## Additional Info NA ## Breaking Changes Note A new label has been added to the validator monitor Prometheus metrics: `total`. This label tracks the aggregated metrics of all validators in the validator monitor (as opposed to each validator being tracking individually using its pubkey as the label). Additionally, a new flag has been added to the Beacon Node: `--validator-monitor-individual-tracking-threshold`. The default value is `64`, which means that when the validator monitor is tracking more than 64 validators then it will stop tracking per-validator metrics and only track the `all_validators` metric. It will also stop logging per-validator logs and only emit aggregated logs (the exception being that exit and slashing logs are always emitted). These changes were introduced in #3728 to address issues with untenable Prometheus cardinality and log volume when using the validator monitor with high validator counts (e.g., 1000s of validators). Users with less than 65 validators will see no change in behavior (apart from the added `all_validators` metric). Users with more than 65 validators who wish to maintain the previous behavior can set something like `--validator-monitor-individual-tracking-threshold 999999`.	2023-01-09 08:18:55 +00:00
Mark Mackey	be232c4587	Update Execution Layer Tests for Capella	2023-01-03 16:58:15 -06:00
realbigsean	d893706e0e	merge with capella	2022-12-15 09:33:18 -05:00
Michael Sproul	991e4094f8	Merge remote-tracking branch 'origin/unstable' into capella-update	2022-12-14 13:00:41 +11:00
Michael Sproul	775d222299	Enable proposer boost re-orging (#2860 ) ## Proposed Changes With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. ## Additional Info For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004 There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034 Co-authored-by: Michael Sproul <micsproul@gmail.com> Co-authored-by: pawan <pawandhananjay@gmail.com>	2022-12-13 09:57:26 +00:00
realbigsean	8102a01085	merge with upstream	2022-12-01 11:13:07 -05:00
Mark Mackey	8a04c3428e	Merged with `unstable`	2022-11-30 17:29:10 -06:00
Mark Mackey	f5e6a54f05	Refactored Execution Layer & Fixed Some Tests	2022-11-29 18:18:33 -06:00
Mark Mackey	36170ec428	Fixed some BeaconChain Tests	2022-11-29 18:18:18 -06:00
GeemoCandama	3534c85e30	Optimize finalized chain sync by skipping newPayload messages (#3738 ) ## Issue Addressed #3704 ## Proposed Changes Adds is_syncing_finalized: bool parameter for block verification functions. Sets the payload_verification_status to Optimistic if is_syncing_finalized is true. Uses SyncState in NetworkGlobals in BeaconProcessor to retrieve the syncing status. ## Additional Info I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer.	2022-11-29 08:19:27 +00:00
realbigsean	48b2efce9f	merge with upstream	2022-11-22 18:38:30 -05:00
ethDreamer	24e5252a55	Massive Update to Engine API (#3740 ) * Massive Update to Engine API * Update beacon_node/execution_layer/src/engine_api/json_structures.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/execution_layer/src/engine_api/json_structures.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/beacon_chain/src/execution_payload.rs Co-authored-by: realbigsean <seananderson33@GMAIL.com> * Update beacon_node/execution_layer/src/engine_api.rs Co-authored-by: realbigsean <seananderson33@GMAIL.com> Co-authored-by: Michael Sproul <micsproul@gmail.com> Co-authored-by: realbigsean <seananderson33@GMAIL.com>	2022-11-22 13:27:48 -05:00
realbigsean	dc87156641	block and blob handling progress	2022-11-19 16:53:34 -05:00
Michael Sproul	edf23bb40e	Fix attestation shuffling filter (#3629 ) ## Issue Addressed Fix a bug in block production that results in blocks with 0 attestations during the first slot of an epoch. The bug is marked by debug logs of the form: > DEBG Discarding attestation because of missing ancestor, block_root: 0x3cc00d9c9e0883b2d0db8606278f2b8423d4902f9a1ee619258b5b60590e64f8, pivot_slot: 4042591 It occurs when trying to look up the shuffling decision root for an attestation from a slot which is prior to fork choice's finalized block. This happens frequently when proposing in the first slot of the epoch where we have: - `current_epoch == n` - `attestation.data.target.epoch == n - 1` - attestation shuffling epoch `== n - 3` (decision block being the last block of `n - 3`) - `state.finalized_checkpoint.epoch == n - 2` (first block of `n - 2` is finalized) Hence the shuffling decision slot is out of range of the fork choice backwards iterator _by a single slot_. Unfortunately this bug was hidden when we weren't pruning fork choice, and then reintroduced in v2.5.1 when we fixed the pruning (https://github.com/sigp/lighthouse/releases/tag/v2.5.1). There's no way to turn that off or disable the filtering in our current release, so we need a new release to fix this issue. Fortunately, it also does not occur on every epoch boundary because of the gradual pruning of fork choice every 256 blocks (~8 epochs): `01e84b71f5/consensus/proto_array/src/proto_array_fork_choice.rs (L16)` `01e84b71f5/consensus/proto_array/src/proto_array.rs (L713-L716)` So the probability of proposing a 0-attestation block given a proposal assignment is approximately `1/32 * 1/8 = 0.39%`. ## Proposed Changes - Load the block's shuffling ID from fork choice and verify it against the expected shuffling ID of the head state. This code was initially written before we had settled on a representation of shuffling IDs, so I think it's a nice simplification to make use of them here rather than more ad-hoc logic that fundamentally does the same thing. ## Additional Info Thanks to @moshe-blox for noticing this issue and bringing it to our attention.	2022-10-18 04:02:06 +00:00
Michael Sproul	59ec6b71b8	Consensus context with proposer index caching (#3604 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2371 ## Proposed Changes Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`. With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required. ## Additional Info In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards. There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).	2022-10-15 22:25:54 +00:00
Paul Hauner	9246a92d76	Make garbage collection test less failure prone (#3599 ) ## Issue Addressed NA ## Proposed Changes This PR attempts to fix the following spurious CI failure: ``` ---- store_tests::garbage_collect_temp_states_from_failed_block stdout ---- thread 'store_tests::garbage_collect_temp_states_from_failed_block' panicked at 'disk store should initialize: DBError { message: "Error { message: \"IO error: lock /tmp/.tmp6DcBQ9/cold_db/LOCK: already held by process\" }" }', beacon_node/beacon_chain/tests/store_tests.rs:59:10 ``` I believe that some async task is taking a clone of the store and holding it in some other thread for a short time. This creates a race-condition when we try to open a new instance of the store. ## Additional Info NA	2022-09-23 03:52:44 +00:00
Paul Hauner	fa6ad1a11a	Deduplicate block root computation (#3590 ) ## Issue Addressed NA ## Proposed Changes This PR removes duplicated block root computation. Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`. Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations after the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times. ## Additional Info NA	2022-09-23 03:52:42 +00:00
Michael Sproul	ca42ef2e5a	Prune finalized execution payloads (#3565 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/3556 ## Proposed Changes Delete finalized execution payloads from the database in two places: 1. When running the finalization migration in `migrate_database`. We delete the finalized payloads between the last split point and the new updated split point. _If_ payloads are already pruned prior to this then this is sufficient to prune _all_ payloads as non-canonical payloads are already deleted by the head pruner, and all canonical payloads prior to the previous split will already have been pruned. 2. To address the fact that users will update to this code _after_ the merge on mainnet (and testnets), we need a one-off scan to delete the finalized payloads from the canonical chain. This is implemented in `try_prune_execution_payloads` which runs on startup and scans the chain back to the Bellatrix fork or the anchor slot (if checkpoint synced after Bellatrix). In the case where payloads are already pruned this check only imposes a single state load for the split state, which shouldn't be _too slow_. Even so, a flag `--prepare-payloads-on-startup=false` is provided to turn this off after it has run the first time, which provides faster start-up times. There is also a new `lighthouse db prune_payloads` subcommand for users who prefer to run the pruning manually. ## Additional Info The tests have been updated to not rely on finalized payloads in the database, instead using the `MockExecutionLayer` to reconstruct them. Additionally a check was added to `check_chain_dump` which asserts the non-existence or existence of payloads on disk depending on their slot.	2022-09-17 02:27:01 +00:00
Michael Sproul	66eca1a882	Refactor op pool for speed and correctness (#3312 ) ## Proposed Changes This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12). Attestation packing is sped up by removing several inefficiencies: - No more recalculation of `attesting_indices` during packing. - No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`. - No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra). - No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release). So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms. Verification of attester slashings, proposer slashings and voluntary exits is fixed by: - Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message. - Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot. This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f. ## Additional Info The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s. This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency. Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.	2022-08-29 09:10:26 +00:00
Pawan Dhananjay	c25934956b	Remove INVALID_TERMINAL_BLOCK (#3385 ) ## Issue Addressed Resolves #3379 ## Proposed Changes Remove instances of `InvalidTerminalBlock` in lighthouse and use `Invalid {latest_valid_hash: "0x0000000000000000000000000000000000000000000000000000000000000000"}` to represent that status.	2022-08-10 07:52:58 +00:00
Michael Sproul	6bc4a2cc91	Update invalid head tests (#3400 ) ## Proposed Changes Update the invalid head tests so that they work with the current default fork choice configuration. Thanks @realbigsean for fixing the persistence test and the EF tests. Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-08-05 23:41:09 +00:00
Paul Hauner	bcfde6e7df	Indicate that invalid blocks are optimistic (#3383 ) ## Issue Addressed NA ## Proposed Changes This PR will make Lighthouse return blocks with invalid payloads via the API with `execution_optimistic = true`. This seems a bit awkward, however I think it's better than returning a 404 or some other error. Let's consider the case where the only possible head is invalid (#3370 deals with this). In such a scenario all of the duties endpoints will start failing because the head is invalid. I think it would be better if the duties endpoints continue to work, because it's likely that even though the head is invalid the duties are still based upon valid blocks and we want the VC to have them cached. There's no risk to the VC here because we won't actually produce an attestation pointing to an invalid head. Ultimately, I don't think it's particularly important for us to distinguish between optimistic and invalid blocks on the API. Neither should be trusted and the only real reason that we track this is so we can try and fork around the invalid blocks. ## Additional Info - ~~Blocked on #3370~~	2022-07-30 05:08:57 +00:00
ethDreamer	034260bd99	Initial Commit of Retrospective OTB Verification (#3372 ) ## Issue Addressed * #2983 ## Proposed Changes Basically followed the [instructions laid out here](https://github.com/sigp/lighthouse/issues/2983#issuecomment-1062494947) Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>	2022-07-30 00:22:38 +00:00
Paul Hauner	25f0e261cb	Don't return errors when fork choice fails (#3370 ) ## Issue Addressed NA ## Proposed Changes There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network. This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload. ## Reviewer Notes The following changes are included: 1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process. 1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run. - This should have practically no effect, since most callers were still continuing if recomputing the head failed. - The outlier is that the API will return 200 rather than a 500 when fork choice fails. 1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.	2022-07-28 13:57:09 +00:00
realbigsean	20ebf1f3c1	Realized unrealized experimentation (#3322 ) ## Issue Addressed Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style. Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: Mac L <mjladson@pm.me>	2022-07-25 23:53:26 +00:00
Pawan Dhananjay	e5e4e62758	Don't create a execution payload with same timestamp as terminal block (#3331 ) ## Issue Addressed Resolves #3316 ## Proposed Changes This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.	2022-07-18 23:15:41 +00:00
Paul Hauner	1f54e10b7b	Do not interpret "latest valid hash" as identifying a valid hash (#3327 ) ## Issue Addressed NA ## Proposed Changes After some discussion in Discord with @mkalinin it was raised that it was not the intention of the engine API to have CLs validate the `latest_valid_hash` (LVH) and all ancestors. Whilst I believe the engine API is being updated such that the LVH must identify a valid hash or be set to some junk value, I'm not confident that we can rely upon the LVH as being valid (at least for now) due to the confusion surrounding it. Being able to validate blocks via the LVH is a relatively minor optimisation; if the LVH value ends up becoming our head we'll send an fcU and get the VALID status there. Falsely marking a block as valid has serious consequences and since it's a minor optimisation to use LVH I think that we don't take the risk. For clarity, we will still invalidate the descendants of the LVH, we just wont validate the ancestors. ## Additional Info NA	2022-07-13 23:07:49 +00:00
Paul Hauner	7a6e6928a3	Further remove EE redundancy (#3324 ) ## Issue Addressed Resolves #3176 ## Proposed Changes Continues from PRs by @divagant-martian to gradually remove EL redundancy (see #3284, #3257). This PR achieves: - Removes the `broadcast` and `first_success` methods. The functional impact is that every request to the EE will always be tried immediately, regardless of the cached `EngineState` (this resolves #3176). Previously we would check the engine state before issuing requests, this doesn't make sense in a single-EE world; there's only one EE so we might as well try it for every request. - Runs the upcheck/watchdog routine once per slot rather than thrice. When we had multiple EEs frequent polling was useful to try and detect when the primary EE had come back online and we could switch to it. That's not as relevant now. - Always creates logs in the `Engines::upcheck` function. Previously we would mute some logs since they could get really noisy when one EE was down but others were functioning fine. Now we only have one EE and are upcheck-ing it less, it makes sense to always produce logs. This PR purposefully does not achieve: - Updating all occurances of "engines" to "engine". I'm trying to keep the diff small and manageable. We can come back for this. ## Additional Info NA	2022-07-13 20:31:39 +00:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
realbigsean	a7da0677d5	Remove builder redundancy (#3294 ) ## Issue Addressed This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets ## Proposed Changes - Removes redundancy in "builders" (servers implementing the builder spec) - Renames `payload-builder` flag to `builder` - Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194) Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-07-01 01:15:19 +00:00
Paul Hauner	db8a6f81ea	Prevent attestation to future blocks from early attester cache (#3183 ) ## Issue Addressed N/A ## Proposed Changes Prevents the early attester cache from producing attestations to future blocks. This bug could result in a missed head vote if the BN was requested to produce an attestation for an earlier slot than the head block during the (usually) short window of time between verifying a block and setting it as the head. This bug was noticed in an [Antithesis](https://andreagrieser.com/) test and diagnosed by @realbigsean. ## Additional Info NA	2022-05-17 01:51:25 +00:00
Michael Sproul	bcdd960ab1	Separate execution payloads in the DB (#3157 ) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. ⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-05-12 00:42:17 +00:00
Michael Sproul	ae47a93c42	Don't panic in forkchoiceUpdated handler (#3165 ) ## Issue Addressed Fix a panic due to misuse of the Tokio executor when processing a forkchoiceUpdated response. We were previously calling `process_invalid_execution_payload` from the async function `update_execution_engine_forkchoice_async`, which resulted in a panic because `process_invalid_execution_payload` contains a call to fork choice, which ultimately calls `block_on`. An example backtrace can be found here: https://gist.github.com/michaelsproul/ac5da03e203d6ffac672423eaf52fb20 ## Proposed Changes Wrap the call to `process_invalid_execution_payload` in a `spawn_blocking` so that `block_on` is no longer called from an async context. ## Additional Info - I've been thinking about how to catch bugs like this with static analysis (a new Clippy lint). - The payload validation tests have been re-worked to support distinct responses from the mock EE for newPayload and forkchoiceUpdated. Three new tests have been added covering the `Invalid`, `InvalidBlockHash` and `InvalidTerminalBlock` cases. - I think we need a bunch more tests of different legal and illegal variations	2022-05-04 23:30:34 +00:00
Paul Hauner	b49b4291a3	Disallow attesting to optimistic head (#3140 ) ## Issue Addressed NA ## Proposed Changes Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end. I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid). ## Additional Info - ~~Blocked on #3126~~	2022-04-13 03:54:42 +00:00
ethDreamer	22002a4e68	Transition Block Proposer Preparation (#3088 ) ## Issue Addressed - #3058 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-04-07 14:03:34 +00:00
Paul Hauner	8a40763183	Ensure VALID response from fcU updates protoarray (#3126 ) ## Issue Addressed NA ## Proposed Changes Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`. I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value. ## Additional Info NA	2022-04-05 20:58:17 +00:00
Paul Hauner	42cdaf5840	Add tests for importing blocks on invalid parents (#3123 ) ## Issue Addressed NA ## Proposed Changes - Adds more checks to prevent importing blocks atop parent with invalid execution payloads. - Adds a test for these conditions. ## Additional Info NA	2022-04-05 20:58:16 +00:00
realbigsean	ea783360d3	Kiln mev boost (#3062 ) ## Issue Addressed MEV boost compatibility ## Proposed Changes See #2987 ## Additional Info This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20). Additional TODO's and outstanding questions - [ ] MEV boost JWT Auth - [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate - [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: realbigsean <sean@sigmaprime.io>	2022-03-31 07:52:23 +00:00
Paul Hauner	267d8babc8	Prepare proposer (#3043 ) ## Issue Addressed Resolves #2936 ## Proposed Changes Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance. There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information. Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code. ## Additional Info - ~~Blocked on #2837~~ Co-authored-by: realbigsean <seananderson33@GMAIL.com>	2022-03-09 00:42:05 +00:00
Paul Hauner	aea43b626b	Rename random to prev_randao (#3040 ) ## Issue Addressed As discussed on last-night's consensus call, the testnets next week will target the [Kiln Spec v2](https://hackmd.io/@n0ble/kiln-spec). Presently, we support Kiln V1. V2 is backwards compatible, except for renaming `random` to `prev_randao` in: - https://github.com/ethereum/execution-apis/pull/180 - https://github.com/ethereum/consensus-specs/pull/2835 With this PR we'll no longer be compatible with the existing Kintsugi and Kiln testnets, however we'll be ready for the testnets next week. I raised this breaking change in the call last night, we are all keen to move forward and break things. We now target the [`merge-kiln-v2`](https://github.com/MariusVanDerWijden/go-ethereum/tree/merge-kiln-v2) branch for interop with Geth. This required adding the `--http.aauthport` to the tester to avoid a port conflict at startup. ### Changes to exec integration tests There's some change in the `merge-kiln-v2` version of Geth that means it can't compile on a vanilla Github runner. Bumping the `go` version on the runner solved this issue. Whilst addressing this, I refactored the `testing/execution_integration` crate to be a binary rather than a library with tests. This means that we don't need to run the `build.rs` and build Geth whenever someone runs `make lint` or `make test-release`. This is nice for everyday users, but it's also nice for CI so that we can have a specific runner for these tests and we don't need to ensure all runners support everything required to build all execution clients. ## More Info - [x] ~~EF tests are failing since the rename has broken some tests that reference the old field name. I have been told there will be new tests released in the coming days (25/02/22 or 26/02/22).~~	2022-03-03 02:10:57 +00:00
Paul Hauner	27e83b888c	Retrospective invalidation of exec. payloads for opt. sync (#2837 ) ## Issue Addressed NA ## Proposed Changes Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means: - Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates. - Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct. - Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs. - Adding testing framework for optimistic imports. - Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between beacon block roots and execution payload hashes. - Add `merge` to [`FORKS`](`c3a793fd73/Makefile (L17)`) in the `Makefile` to ensure we test the beacon chain with merge settings. - Fix some tests here that were failing due to a missing execution layer. ## TODO - [ ] Balance tests Co-authored-by: Mark Mackey <mark@sigmaprime.io>	2022-02-28 22:07:48 +00:00
Michael Sproul	5e1f8a8480	Update to Rust 1.59 and 2021 edition (#3038 ) ## Proposed Changes Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions. ## Additional Info We need this PR to unblock CI.	2022-02-25 00:10:17 +00:00
Michael Sproul	99d2c33387	Avoid looking up pre-finalization blocks (#2909 ) ## Issue Addressed This PR fixes the unnecessary `WARN Single block lookup failed` messages described here: https://github.com/sigp/lighthouse/pull/2866#issuecomment-1008442640 ## Proposed Changes Add a new cache to the `BeaconChain` that tracks the block roots of blocks from before finalization. These could be blocks from the canonical chain (which might need to be read from disk), or old pre-finalization blocks that have been forked out. The cache also stores a set of block roots for in-progress single block lookups, which duplicates some of the information from sync's `single_block_lookups` hashmap: `a836e180f9/beacon_node/network/src/sync/manager.rs (L192-L196)` On a live node you can confirm that the cache is working by grepping logs for the message: `Rejected attestation to finalized block`.	2022-01-27 22:58:32 +00:00
Michael Sproul	ef7351ddfe	Update to spec v1.1.8 (#2893 ) ## Proposed Changes Change the canonical fork name for the merge to Bellatrix. Keep other merge naming the same to avoid churn. I've also fixed and enabled the `fork` and `transition` tests for Bellatrix, and the v1.1.7 fork choice tests. Additionally, the `BellatrixPreset` has been added with tests. It gets served via the `/config/spec` API endpoint along with the other presets.	2022-01-19 00:24:19 +00:00
Paul Hauner	02e2fd2fb8	Add early attester cache (#2872 ) ## Issue Addressed NA ## Proposed Changes Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database). Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to: 1. Verify that attestation. 1. Respond to RPC requests for the `beacon_block_root`. Attestation verification (1) is partially covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error. Providing the block via RPC (2) is also partially covered, since we'll check the database and the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I think this is fine, since there's no guarantee that we return all blocks for those responses. Another important consideration is whether or not the parent of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that we will always have the parent of the early attester block in the database. This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.	2022-01-11 01:35:55 +00:00
Michael Sproul	3b61ac9cbf	Optimise slasher DB layout and switch to MDBX (#2776 ) ## Issue Addressed Closes #2286 Closes #2538 Closes #2342 ## Proposed Changes Part II of major slasher optimisations after #2767 These changes will be backwards-incompatible due to the move to MDBX (and the schema change) 😱 * [x] Shrink attester keys from 16 bytes to 7 bytes. * [x] Shrink attester records from 64 bytes to 6 bytes. * [x] Separate `DiskConfig` from regular `Config`. * [x] Add configuration for the LRU cache size. * [x] Add a "migration" that deletes any legacy LMDB database.	2021-12-21 08:23:17 +00:00
Michael Sproul	a290a3c537	Add configurable block replayer (#2863 ) ## Issue Addressed Successor to #2431 ## Proposed Changes * Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing. * Add a variant of the forwards state root iterator that does not require an `end_state`. * Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states. * Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431). Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup). ## Additional Info Required by https://github.com/sigp/lighthouse/pull/2628	2021-12-21 06:30:52 +00:00
realbigsean	b22ac95d7f	v1.1.6 Fork Choice changes (#2822 ) ## Issue Addressed Resolves: https://github.com/sigp/lighthouse/issues/2741 Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller ## Proposed Changes - Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration. - Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db. - updates related to https://github.com/ethereum/consensus-specs/pull/2727 - We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one. - AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that - I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting. - This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730 ## Additional Info I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727 It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch. Todo: - [x] Fix fork choice tests - [x] Self review - [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727 - [x] Rebase onto Kintusgi - [x] Fix `num_active_validators` calculation as @michaelsproul pointed out - [x] Clean up db migrations Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-12-13 20:43:22 +00:00
Paul Hauner	a1033a9247	Add `BeaconChainHarness` tests for The Merge (#2661 ) * Start adding merge tests * Expose MockExecutionLayer * Add mock_execution_layer to BeaconChainHarness * Progress with merge test * Return more detailed errors with gas limit issues * Use a better gas limit in block gen * Ensure TTD is met in block gen * Fix basic_merge tests * Start geth testing * Fix conflicts after rebase * Remove geth tests * Improve merge test * Address clippy lints * Make pow block gen a pure function * Add working new test, breaking existing test * Fix test names * Add should_panic * Don't run merge tests in debug * Detect a tokio runtime when starting MockServer * Fix clippy lint, include merge tests	2021-12-02 14:26:52 +11:00
Paul Hauner	801f6f7425	Disable autotests for beacon_chain (#2658 )	2021-12-02 14:26:52 +11:00
Paul Hauner	931daa40d7	Add fork choice EF tests (#2737 ) ## Issue Addressed Resolves #2545 ## Proposed Changes Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too. During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset. ## Failing Cases There is one failing case which is presently marked as `SkippedKnownFailure`: ``` case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual: head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } \| Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 } ``` This failure is due to #2741. It's not a particularly high priority issue at the moment, so we fix it after merging this PR.	2021-11-08 07:29:04 +00:00
Paul Hauner	e2d09bb8ac	Add `BeaconChainHarness::builder` (#2707 ) ## Issue Addressed NA ## Proposed Changes This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning. I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs. ## Additional Info NA	2021-10-14 02:58:10 +00:00
Wink Saville	58870fc6d3	Add test_logger as feature to logging (#2586 ) ## Issue Addressed Fix #2585 ## Proposed Changes Provide a canonical version of test_logger that can be used throughout lighthouse. ## Additional Info This allows tests to conditionally emit logging data by adding test_logger as the default logger. And then when executing `cargo test --features logging/test_logger` log output will be visible: wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging) $ cargo test --features logging/test_logger Finished test [unoptimized + debuginfo] target(s) in 0.02s Running unittests (target/debug/deps/test_logger-e20115db6a5e3714) running 1 test Sep 10 12:53:45.212 INFO hi, module: test_logger:8 test tests::test_fn_with_logging ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Doc-tests test-logger running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Or, in normal scenarios where logging isn't needed, executing `cargo test` the log output will not be visible: wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging) $ cargo test Finished test [unoptimized + debuginfo] target(s) in 0.02s Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a) running 1 test test tests::test_fn_with_logging ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Doc-tests test-logger running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s	2021-10-06 00:46:07 +00:00
Paul Hauner	be11437c27	Batch BLS verification for attestations (#2399 ) ## Issue Addressed NA ## Proposed Changes Adds the ability to verify batches of aggregated/unaggregated attestations from the network. When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue: - `== 1` verify the attestation individually. - `>= 2` take up to 64 of those attestations and verify them in a batch. Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try and force them into batches. ### Batching Details To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations: - Indexed attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`. - Verified attestations: those attestations which were indexed and also passed signature verification. These are well-formed, interesting messages which were signed by validators. The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required. When we batch verify attestations, we first try to map all those attestations to indexed attestations. If any of those attestations were able to be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into verified attestations, disabling individual signature checking. If the batch fails, we convert to verified attestations with individual signature checking enabled. Ultimately, we optimistically try to do a batch verification of attestation signatures and fall-back to individual verification if it fails. This opens an attach vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and the typical-case gains massively outweigh the worst-case losses. ## Additional Info Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference. I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to actual production code would pass under the existing test suite.	2021-09-22 08:49:41 +00:00
Michael Sproul	9667dc2f03	Implement checkpoint sync (#2244 ) ## Issue Addressed Closes #1891 Closes #1784 ## Proposed Changes Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint. ## Additional Info - [x] Return unavailable status for out-of-range blocks requested by peers (#2561) - [x] Implement sync daemon for fetching historical blocks (#2561) - [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module) - [x] Consistency check for initial block + state - [x] Fetch the initial state and block from a beacon node HTTP endpoint - [x] Don't crash fetching beacon states by slot from the API - [x] Background service for state reconstruction, triggered by CLI flag or API call. Considered out of scope for this PR: - Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification) Co-authored-by: Diva M <divma@protonmail.com>	2021-09-22 00:37:28 +00:00
Michael Sproul	10945e0619	Revert bad blocks on missed fork (#2529 ) ## Issue Addressed Closes #2526 ## Proposed Changes If the head block fails to decode on start up, do two things: 1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`). 2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint. ## Additional Info I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞	2021-08-30 06:41:31 +00:00
Paul Hauner	ceda27371d	Ensure doppelganger detects attestations in blocks (#2495 ) ## Issue Addressed NA ## Proposed Changes When testing our (not-yet-released) Doppelganger implementation, I noticed that we aren't detecting attestations included in blocks (only those on the gossip network). This is because during [block processing](`e8c0d1f19b/beacon_node/beacon_chain/src/beacon_chain.rs (L2168)`) we only update the `observed_attestations` cache with each attestation, but not the `observed_attesters` cache. This is the correct behaviour when we consider the [p2p spec](https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md): > [IGNORE] There has been no other valid attestation seen on an attestation subnet that has an identical attestation.data.target.epoch and participating validator index. We're doing the right thing here and still allowing attestations on gossip that we've seen in a block. However, this doesn't work so nicely for Doppelganger. To resolve this, I've taken the following steps: - Add a `observed_block_attesters` cache. - Rename `observed_attesters` to `observed_gossip_attesters`. ## TODO - [x] Add a test to ensure a validator that's been seen in a block attestation (but not a gossip attestation) returns `true` for `BeaconChain::validator_seen_at_epoch`. - [x] Add a test to ensure `observed_block_attesters` isn't polluted via gossip attestations and vice versa. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-08-09 02:43:03 +00:00
Michael Sproul	17a2c778e3	Altair validator client and HTTP API (#2404 ) ## Proposed Changes * Implement the validator client and HTTP API changes necessary to support Altair Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-08-06 00:47:31 +00:00
Michael Sproul	f5bdca09ff	Update to spec v1.1.0-beta.1 (#2460 ) ## Proposed Changes Update to the latest version of the Altair spec, which includes new tests and a tweak to the target sync aggregators. ## Additional Info This change is _not_ required for the imminent Altair devnet, and is waiting on the merge of #2321 to unstable. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-07-27 05:43:35 +00:00
realbigsean	a3a7f39b0d	[Altair] Sync committee pools (#2321 ) Add pools supporting sync committees: - naive sync aggregation pool - observed sync contributions pool - observed sync contributors pool - observed sync aggregators pool Add SSZ types and tests related to sync committee signatures. Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-07-15 00:52:02 +00:00
divma	304fb05e44	Maintain attestations that reference unknown blocks (#2319 ) ## Issue Addressed #635 ## Proposed Changes - Keep attestations that reference a block we have not seen for 30secs before being re processed - If we do import the block before that time elapses, it is reprocessed in that moment - The first time it fails, do nothing wrt to gossipsub propagation or peer downscoring. If after being re processed it fails, downscore with a `LowToleranceError` and ignore the message.	2021-07-14 05:24:08 +00:00
Michael Sproul	b4689e20c6	Altair consensus changes and refactors (#2279 ) ## Proposed Changes Implement the consensus changes necessary for the upcoming Altair hard fork. ## Additional Info This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-07-09 06:15:32 +00:00
Mac L	406e3921d9	Use forwards iterator for state root lookups (#2422 ) ## Issue Addressed #2377 ## Proposed Changes Implement the same code used for block root lookups (from #2376) to state root lookups in order to improve performance and reduce associated memory spikes (e.g. from certain HTTP API requests). ## Additional Changes - Tests using `rev_iter_state_roots` and `rev_iter_block_roots` have been refactored to use their `forwards` versions instead. - The `rev_iter_state_roots` and `rev_iter_block_roots` functions are now unused and have been removed. - The `state_at_slot` function has been changed to use the `forwards` iterator. ## Additional Info - Some tests still need to be refactored to use their `forwards_iter` versions. These tests start their iteration from a specific beacon state and thus use the `rev_iter_state_roots_from` and `rev_iter_block_roots_from` functions. If they can be refactored, those functions can also be removed.	2021-07-06 02:38:53 +00:00
realbigsean	b1657a60e9	Reorg events (#2090 ) ## Issue Addressed Resolves #2088 ## Proposed Changes Add the `chain_reorg` SSE event topic ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-06-17 02:10:46 +00:00
Paul Hauner	4c7bb4984c	Use the forwards iterator more often (#2376 ) ## Issue Addressed NA ## Primary Change When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) which seemed to be ever-lasting. After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched the `BeaconChain::block_root_at_slot` to use the forwards iterator, instead of the reverse one. I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function. Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before it felt like to took 1-2s, now it feels instant). ## Additional Changes In the process I also noticed that we have two functions for getting block roots: - `BeaconChain::block_root_at_slot`: returns `None` for a skip slot. - `BeaconChain::root_at_slot`: returns the previous root for a skip slot. I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way. Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it 🗑️. ## Additional Info I haven't also done the same for state roots here. Whilst it's possible and a good idea, it's more work since the fwds iterators are presently block-roots-specific. Whilst there's a few places a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're no where near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here https://github.com/sigp/lighthouse/issues/2377.	2021-05-31 04:18:20 +00:00
ethDreamer	ba55e140ae	Enable Compatibility with Windows (#2333 ) ## Issue Addressed Windows incompatibility. ## Proposed Changes On windows, lighthouse needs to default to STDIN as tty doesn't exist. Also Windows uses ACLs for file permissions. So to mirror chmod 600, we will remove every entry in a file's ACL and add only a single SID that is an alias for the file owner. Beyond that, there were several changes made to different unit tests because windows has slightly different error messages as well as frustrating nuances around killing a process :/ ## Additional Info Tested on my Windows VM and it appears to work, also compiled & tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running full validator client on windows. Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com> Co-authored-by: Michael Sproul <micsproul@gmail.com>	2021-05-19 23:05:16 +00:00
Paul Hauner	0df7be1814	Add check for aggregate target (#2306 ) ## Issue Addressed NA ## Proposed Changes - Ensure that the [target consistency check](`b356f52c5c`) is always performed on aggregates. - Add a regression test. ## Additional Info NA	2021-04-13 00:24:39 +00:00
Paul Hauner	015ab7d0a7	Optimize validator duties (#2243 ) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](`072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)`). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR reduces the size of the codebase 🎉~~ It used to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](`072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)`) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-03-17 05:09:57 +00:00
Michael Sproul	363f15f362	Use the database to persist the pubkey cache (#2234 ) ## Issue Addressed Closes #1787 ## Proposed Changes * Abstract the `ValidatorPubkeyCache` over a "backing" which is either a file (legacy), or the database. * Implement a migration from schema v2 to schema v3, whereby the contents of the cache file are copied to the DB, and then the file is deleted. The next release to include this change must be a minor version bump, and we will need to warn users of the inability to downgrade (this is our first DB schema change since mainnet genesis). * Move the schema migration code from the `store` crate into the `beacon_chain` crate so that it can access the datadir and the `ValidatorPubkeyCache`, etc. It gets injected back into the `store` via a closure (similar to what we do in fork choice).	2021-03-04 01:25:12 +00:00
Paul Hauner	88cc222204	Advance state to next slot after importing block (#2174 ) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the leading-edge of block processing to the trailing-edge. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~	2021-02-15 07:17:52 +00:00
Paul Hauner	b06559ae97	Disallow attestation production earlier than head (#2130 ) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is: - Not specified as "honest validator" attesting behaviour. - Is behaviour that is risky for slashing (although, all validator clients should have slashing protection and will eventually fail if they do not). - It disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.	2021-01-20 06:52:37 +00:00
Michael Sproul	c1ec386d18	Pass failed gossip blocks to the slasher (#2047 ) ## Issue Addressed Closes #2042 ## Proposed Changes Pass blocks that fail gossip verification to the slasher. Blocks that are successfully verified are not passed immediately, but will be passed as part of full block verification.	2020-12-04 05:03:30 +00:00
Michael Sproul	acd49d988d	Implement database temp states to reduce memory usage (#1798 ) ## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.	2020-10-23 01:27:51 +00:00
Michael Sproul	703c33bdc7	Fix head tracker concurrency bugs (#1771 ) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.	2020-10-19 05:58:39 +00:00
Paul Hauner	c4bd9c86e6	Add check for head/target consistency (#1702 ) ## Issue Addressed NA ## Proposed Changes Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot. ## Additional Info NA	2020-10-03 10:08:06 +10:00
Michael Sproul	22aedda1be	Add database schema versioning (#1688 ) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673	2020-10-01 11:12:36 +10:00
Paul Hauner	cdec3cec18	Implement standard eth2.0 API (#1569 ) - Resolves #1550 - Resolves #824 - Resolves #825 - Resolves #1131 - Resolves #1411 - Resolves #1256 - Resolve #1177 - Includes the `ShufflingId` struct initially defined in #1492. That PR is now closed and the changes are included here, with significant bug fixes. - Implement the https://github.com/ethereum/eth2.0-APIs in a new `http_api` crate using `warp`. This replaces the `rest_api` crate. - Add a new `common/eth2` crate which provides a wrapper around `reqwest`, providing the HTTP client that is used by the validator client and for testing. This replaces the `common/remote_beacon_node` crate. - Create a `http_metrics` crate which is a dedicated server for Prometheus metrics (they are no longer served on the same port as the REST API). We now have flags for `--metrics`, `--metrics-address`, etc. - Allow the `subnet_id` to be an optional parameter for `VerifiedUnaggregatedAttestation::verify`. This means it does not need to be provided unnecessarily by the validator client. - Move `fn map_attestation_committee` in `mod beacon_chain::attestation_verification` to a new `fn with_committee_cache` on the `BeaconChain` so the same cache can be used for obtaining validator duties. - Add some other helpers to `BeaconChain` to assist with common API duties (e.g., `block_root_at_slot`, `head_beacon_block_root`). - Change the `NaiveAggregationPool` so it can index attestations by `hash_tree_root(attestation.data)`. This is a requirement of the API. - Add functions to `BeaconChainHarness` to allow it to create slashings and exits. - Allow for `eth1::Eth1NetworkId` to go to/from a `String`. - Add functions to the `OperationPool` to allow getting all objects in the pool. - Add function to `BeaconState` to check if a committee cache is initialized. - Fix bug where `seconds_per_eth1_block` was not transferring over from `YamlConfig` to `ChainSpec`. - Add the `deposit_contract_address` to `YamlConfig` and `ChainSpec`. We needed to be able to return it in an API response. - Change some uses of serde `serialize_with` and `deserialize_with` to a single use of `with` (code quality). - Impl `Display` and `FromStr` for several BLS fields. - Check for clock discrepancy when VC polls BN for sync state (with +/- 1 slot tolerance). This is not intended to be comprehensive, it was just easy to do. - See #1434 for a per-endpoint overview. - Seeking clarity here: https://github.com/ethereum/eth2.0-APIs/issues/75 - [x] Add docs for prom port to close #1256 - [x] Follow up on this #1177 - [x] ~~Follow up with #1424~~ Will fix in future PR. - [x] Follow up with #1411 - [x] ~~Follow up with #1260~~ Will fix in future PR. - [x] Add quotes to all integers. - [x] Remove `rest_types` - [x] Address missing beacon block error. (#1629) - [x] ~~Add tests for lighthouse/peers endpoints~~ Wontfix - [x] ~~Follow up with validator status proposal~~ Tracked in #1434 - [x] Unify graffiti structs - [x] ~~Start server when waiting for genesis?~~ Will fix in future PR. - [x] TODO in http_api tests - [x] Move lighthouse endpoints off /eth/v1 - [x] Update docs to link to standard - ~~Blocked on #1586~~ Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-10-01 11:12:36 +10:00
Paul Hauner	1ef4f0ea12	Add gossip conditions from spec v0.12.3 (#1667 ) ## Issue Addressed NA ## Proposed Changes There are four new conditions introduced in v0.12.3: 1. _[REJECT]_ The attestation's epoch matches its target -- i.e. `attestation.data.target.epoch == compute_epoch_at_slot(attestation.data.slot)` 1. _[REJECT]_ The attestation's target block is an ancestor of the block named in the LMD vote -- i.e. `get_ancestor(store, attestation.data.beacon_block_root, compute_start_slot_at_epoch(attestation.data.target.epoch)) == attestation.data.target.root` 1. _[REJECT]_ The committee index is within the expected range -- i.e. `data.index < get_committee_count_per_slot(state, data.target.epoch)`. 1. _[REJECT]_ The number of aggregation bits matches the committee size -- i.e. `len(attestation.aggregation_bits) == len(get_beacon_committee(state, data.slot, data.index))`. This PR implements new logic to suit (1) and (2). Tests are added for (3) and (4), although they were already implicitly enforced. ## Additional Info - There's a bit of edge-case with target root verification that I raised here: https://github.com/ethereum/eth2.0-specs/pull/2001#issuecomment-699246659 - I've had to add an `--ignore` to `cargo audit` to get CI to pass. See https://github.com/sigp/lighthouse/issues/1669	2020-09-27 20:59:40 +00:00
Paul Hauner	bd39cc8e26	Apply hotfix for inconsistent head (#1639 ) ## Issue Addressed - Resolves #1616 ## Proposed Changes If we look at the function which persists fork choice and the canonical head to disk: `1db8daae0c/beacon_node/beacon_chain/src/beacon_chain.rs (L234-L280)` There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync. I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch. These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice. ## Additional Info This is hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions.	2020-09-22 02:06:10 +00:00
Adam Szkoda	d9f4819fe0	Alternative (to BeaconChainHarness) BeaconChain testing API (#1380 ) The PR: * Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots): ![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png) * New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots. * Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner. * Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework * Rewrites existing fork pruning tests to use the new API * Adds a tests that triggers finalization at a non epoch boundary slot * Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing * Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-26 09:24:55 +00:00
Michael Sproul	4763f03dcc	Fix bug in database pruning (#1564 ) ## Issue Addressed Closes #1488 ## Proposed Changes * Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain. * Add `debug` logging to the pruning algorithm so we have so better chance of debugging future issues from logs. * Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell). * Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization` * Delete useless and unused `max_finality_distance` * Add tests that exercise pruning with shared states at skip slots * Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that) * Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it? ## Additional Info Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉	2020-08-26 00:01:06 +00:00
Paul Hauner	619ad106cf	Restrict fork choice getters to finalized blocks (#1475 ) ## Issue Addressed - Resolves #1451 ## Proposed Changes - Restricts the `contains_block` and `contains_block` so they only indicate a block is present if it descends from the finalized root. This helps to ensure that fork choice never points to a block that has been pruned from the database. - Resolves #1451 - Before importing a block, double-check that its parent is known and a descendant of the finalized root. - Split a big, monolithic block verification test into smaller tests. ## Additional Notes I suspect there would be a craftier way to do the `is_descendant_of_finalized` check, but we're a bit tight on time now and we can optimize later if it starts showing in benches. ## TODO - [x] Tests	2020-08-14 06:36:38 +00:00
Adam Szkoda	8a1a4051cf	Fix a bug in fork pruning (#1507 ) Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-12 07:00:00 +00:00
Paul Hauner	b73c497be2	Support multiple BLS implementations (#1335 ) ## Issue Addressed NA ## Proposed Changes - Refactor the `bls` crate to support multiple BLS "backends" (e.g., milagro, blst, etc). - Removes some duplicate, unused code in `common/rest_types/src/validator.rs`. - Removes the old "upgrade legacy keypairs" functionality (these were unencrypted keys that haven't been supported for a few testnets, no one should be using them anymore). ## Additional Info Most of the files changed are just inconsequential changes to function names. ## TODO - [x] Optimization levels - [x] Infinity point: https://github.com/supranational/blst/issues/11 - [x] Ensure milagro and blst are tested via CI - [x] What to do with unsafe code? - [x] Test infinity point in signature sets	2020-07-25 02:03:18 +00:00
blacktemplar	23a8f31f83	Fix clippy warnings (#1385 ) ## Issue Addressed NA ## Proposed Changes Fixes most clippy warnings and ignores the rest of them, see issue #1388.	2020-07-23 14:18:00 +00:00
Adam Szkoda	c7f47af9fb	Harden the freezing procedure against failures (#1323 ) * Enable logging in tests * Migrate states to the freezer atomically	2020-07-03 09:47:31 +10:00
Michael Sproul	305724770d	Bump all spec tags to v0.12.1 (#1275 )	2020-06-19 11:18:27 +10:00
Michael Sproul	bcb6afa0aa	Process exits and slashings off the network (#1253 ) * Process exits and slashings off the network * Fix rest_api tests * Add op verification tests * Add tests for pruning of slashings in the op pool * Address Paul's review comments	2020-06-18 21:06:34 +10:00
Pawan Dhananjay	3199b1a6f2	Use all attestation subnets (#1257 ) * Update `milagro_bls` to new release (#1183) * Update milagro_bls to new release Signed-off-by: Kirk Baird <baird.k@outlook.com> * Tidy up fake cryptos Signed-off-by: Kirk Baird <baird.k@outlook.com> * move SecretHash to bls and put plaintext back Signed-off-by: Kirk Baird <baird.k@outlook.com> * Update v0.12.0 to v0.12.1 * Add compute_subnet_for_attestation * Replace CommitteeIndex topic with Attestation * Fix warnings * Fix attestation service tests * fmt * Appease clippy * return error from validator_subscriptions * move state out of loop * Fix early break on error * Get state from slot clock * Fix beacon state in attestation tests * Add failing test for lookahead > 1 * Minor change * Address some review comments * Add subnet verification to beacon chain * Move subnet verification to processor * Pass committee_count_at_slot to ValidatorDuty and ValidatorSubscription * Pass subnet id for publishing attestations * Fix attestation service tests * Fix more tests * Fix fork choice test * Remove unused code * Remove more unused and expensive code Co-authored-by: Kirk Baird <baird.k@outlook.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: Age Manning <Age@AgeManning.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-06-18 19:11:03 +10:00
Michael Sproul	e6f97bf466	Merge remote-tracking branch 'origin/master' into spec-v0.12	2020-06-17 12:34:11 +10:00
Paul Hauner	764cb2d32a	v0.12 fork choice update (#1229 ) * Incomplete scraps * Add progress on new fork choice impl * Further progress * First complete compiling version * Remove chain reference * Add new lmd_ghost crate * Start integrating into beacon chain * Update `milagro_bls` to new release (#1183) * Update milagro_bls to new release Signed-off-by: Kirk Baird <baird.k@outlook.com> * Tidy up fake cryptos Signed-off-by: Kirk Baird <baird.k@outlook.com> * move SecretHash to bls and put plaintext back Signed-off-by: Kirk Baird <baird.k@outlook.com> * Update state processing for v0.12 * Fix EF test runners for v0.12 * Fix some tests * Fix broken attestation verification test * More test fixes * Rough beacon chain impl working * Remove fork_choice_2 * Remove checkpoint manager * Half finished ssz impl * Add missed file * Add persistence * Tidy, fix some compile errors * Remove RwLock from ProtoArrayForkChoice * Fix store-based compile errors * Add comments, tidy * Move function out of ForkChoice struct * Start testing * More testing * Fix compile error * Tidy beacon_chain::fork_choice * Queue attestations from the current slot * Allow fork choice to handle prior-to-genesis start * Improve error granularity * Test attestation dequeuing * Process attestations during block * Store target root in fork choice * Move fork choice verification into new crate * Update tests * Consensus updates for v0.12 (#1228) * Update state processing for v0.12 * Fix EF test runners for v0.12 * Fix some tests * Fix broken attestation verification test * More test fixes * Fix typo found in review * Add `Block` struct to ProtoArray * Start fixing get_ancestor * Add rough progress on testing * Get fork choice tests working * Progress with testing * Fix partialeq impl * Move slot clock from fc_store * Improve testing * Add testing for best justified * Add clone back to SystemTimeSlotClock * Add balances test * Start adding balances cache again * Wire-in balances cache * Improve tests * Remove commented-out tests * Remove beacon_chain::ForkChoice * Rename crates * Update wider codebase to new fork_choice layout * Move advance_slot in test harness * Tidy ForkChoice::update_time * Fix verification tests * Fix compile error with iter::once * Fix fork choice tests * Ensure block attestations are processed * Fix failing beacon_chain tests * Add first invalid block check * Add finalized block check * Progress with testing, new store builder * Add fixes to get_ancestor * Fix old genesis justification test * Fix remaining fork choice tests * Change root iteration method * Move on_verified_block * Remove unused method * Start adding attestation verification tests * Add invalid ffg target test * Add target epoch test * Add queued attestation test * Remove old fork choice verification tests * Tidy, add test * Move fork choice lock drop * Rename BeaconForkChoiceStore * Add comments, tidy BeaconForkChoiceStore * Update metrics, rename fork_choice_store.rs * Remove genesis_block_root from ForkChoice * Tidy * Update fork_choice comments * Tidy, add comments * Tidy, simplify ForkChoice, fix compile issue * Tidy, removed dead file * Increase http request timeout * Fix failing rest_api test * Set HTTP timeout back to 5s * Apply fix to get_ancestor * Address Michael's comments * Fix typo * Revert "Fix broken attestation verification test" This reverts commit 722cdc903b12611de27916a57eeecfa3224f2279. Co-authored-by: Kirk Baird <baird.k@outlook.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-06-17 11:10:22 +10:00

1 2 3 4 5

206 Commits