lighthouse

Author	SHA1	Message	Date
Michael Sproul	affea585f4	Remove `CountUnrealized` (#4357 ) ## Issue Addressed Closes #4332 ## Proposed Changes Remove the `CountUnrealized` type, defaulting unrealized justification to _on_. This fixes the #4332 issue by ensuring that importing the same block to fork choice always results in the same outcome. Finalized sync speed may be slightly impacted by this change, but that is deemed an acceptable trade-off until the optimisation from #4118 is implemented. TODO: - [x] Also check that the block isn't a duplicate before importing	2023-06-16 06:44:31 +00:00
Paul Hauner	1f8c17b530	Fork choice modifications and cleanup (#3962 ) ## Issue Addressed NA ## Proposed Changes - Implements https://github.com/ethereum/consensus-specs/pull/3290/ - Bumps `ef-tests` to [v1.3.0-rc.4](https://github.com/ethereum/consensus-spec-tests/releases/tag/v1.3.0-rc.4). The `CountRealizedFull` concept has been removed and the `--count-unrealized-full` and `--count-unrealized` BN flags now do nothing but log a `WARN` when used. ## Database Migration Debt This PR removes the `best_justified_checkpoint` from fork choice. This field is persisted on-disk and the correct way to go about this would be to make a DB migration to remove the field. However, in this PR I've simply stubbed out the value with a junk value. I've taken this approach because if we're going to do a DB migration I'd love to remove the `Option`s around the justified and finalized checkpoints on `ProtoNode` whilst we're at it. Those options were added in #2822 which was included in Lighthouse v2.1.0. The options were only put there to handle the migration and they've been set to `Some` ever since v2.1.0. There's no reason to keep them as options anymore. I started adding the DB migration to this branch but I started to feel like I was bloating this rather critical PR with nice-to-haves. I've kept the partially-complete migration [over in my repo](https://github.com/paulhauner/lighthouse/tree/fc-pr-18-migration) so we can pick it up after this PR is merged.	2023-03-21 07:34:41 +00:00
Michael Sproul	775d222299	Enable proposer boost re-orging (#2860 ) ## Proposed Changes With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks. This PR adds three flags to the BN to control this behaviour: * `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default). * `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled. * `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally. For safety Lighthouse will only attempt a re-org under very specific conditions: 1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`. 2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes. 3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense. 4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost. ## Additional Info For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004 There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034 Co-authored-by: Michael Sproul <micsproul@gmail.com> Co-authored-by: pawan <pawandhananjay@gmail.com>	2022-12-13 09:57:26 +00:00
realbigsean	20ebf1f3c1	Realized unrealized experimentation (#3322 ) ## Issue Addressed Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style. Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: Mac L <mjladson@pm.me>	2022-07-25 23:53:26 +00:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
Michael Sproul	bcdd960ab1	Separate execution payloads in the DB (#3157 ) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. ⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-05-12 00:42:17 +00:00
Paul Hauner	27e83b888c	Retrospective invalidation of exec. payloads for opt. sync (#2837 ) ## Issue Addressed NA ## Proposed Changes Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means: - Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates. - Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct. - Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs. - Adding testing framework for optimistic imports. - Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between beacon block roots and execution payload hashes. - Add `merge` to [`FORKS`](`c3a793fd73/Makefile (L17)`) in the `Makefile` to ensure we test the beacon chain with merge settings. - Fix some tests here that were failing due to a missing execution layer. ## TODO - [ ] Balance tests Co-authored-by: Mark Mackey <mark@sigmaprime.io>	2022-02-28 22:07:48 +00:00
Michael Sproul	5e1f8a8480	Update to Rust 1.59 and 2021 edition (#3038 ) ## Proposed Changes Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions. ## Additional Info We need this PR to unblock CI.	2022-02-25 00:10:17 +00:00
realbigsean	b22ac95d7f	v1.1.6 Fork Choice changes (#2822 ) ## Issue Addressed Resolves: https://github.com/sigp/lighthouse/issues/2741 Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller ## Proposed Changes - Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration. - Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db. - updates related to https://github.com/ethereum/consensus-specs/pull/2727 - We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one. - AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that - I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting. - This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730 ## Additional Info I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727 It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch. Todo: - [x] Fix fork choice tests - [x] Self review - [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727 - [x] Rebase onto Kintusgi - [x] Fix `num_active_validators` calculation as @michaelsproul pointed out - [x] Clean up db migrations Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-12-13 20:43:22 +00:00
Paul Hauner	cbd2201164	Fixes after rebasing Kintsugi onto unstable (#2799 ) * Fix fork choice after rebase * Remove paulhauner warp dep * Fix fork choice test compile errors * Assume fork choice payloads are valid * Add comment * Ignore new tests * Fix error in test skipping	2021-12-02 14:26:55 +11:00
Paul Hauner	6dde12f311	[Merge] Optimistic Sync: Stage 1 (#2686 ) * Add payload verification status to fork choice * Pass payload verification status to import_block * Add valid back-propagation * Add head safety status latch to API * Remove ExecutionLayerStatus * Add execution info to client notifier * Update notifier logs * Change use of "hash" to refer to beacon block * Shutdown on invalid finalized block * Tidy, add comments * Fix failing FC tests * Allow blocks with unsafe head * Fix forkchoiceUpdate call on startup	2021-12-02 14:26:54 +11:00
Paul Hauner	d8623cfc4f	[Merge] Implement `execution_layer` (#2635 ) * Checkout serde_utils from rayonism * Make eth1::http functions pub * Add bones of execution_layer * Modify decoding * Expose Transaction, cargo fmt * Add executePayload * Add all minimal spec endpoints * Start adding json rpc wrapper * Finish custom JSON response handler * Switch to new rpc sending method * Add first test * Fix camelCase * Finish adding tests * Begin threading execution layer into BeaconChain * Fix clippy lints * Fix clippy lints * Thread execution layer into ClientBuilder * Add CLI flags * Add block processing methods to ExecutionLayer * Add block_on to execution_layer * Integrate execute_payload * Add extra_data field * Begin implementing payload handle * Send consensus valid/invalid messages * Fix minor type in task_executor * Call forkchoiceUpdated * Add search for TTD block * Thread TTD into execution layer * Allow producing block with execution payload * Add LRU cache for execution blocks * Remove duplicate 0x on ssz_types serialization * Add tests for block getter methods * Add basic block generator impl * Add is_valid_terminal_block to EL * Verify merge block in block_verification * Partially implement --terminal-block-hash-override * Add terminal_block_hash to ChainSpec * Remove Option from terminal_block_hash in EL * Revert merge changes to consensus/fork_choice * Remove commented-out code * Add bones for handling RPC methods on test server * Add first ExecutionLayer tests * Add testing for finding terminal block * Prevent infinite loops * Add insert_merge_block to block gen * Add block gen test for pos blocks * Start adding payloads to block gen * Fix clippy lints * Add execution payload to block gen * Add execute_payload to block_gen * Refactor block gen * Add all routes to mock server * Use Uint256 for base_fee_per_gas * Add working execution chain build * Remove unused var * Revert "Use Uint256 for base_fee_per_gas" This reverts commit 6c88f19ac45db834dd4dbf7a3c6e7242c1c0f735. * Fix base_fee_for_gas Uint256 * Update execute payload handle * Improve testing, fix bugs * Fix default fee-recipient * Fix fee-recipient address (again) * Add check for terminal block, add comments, tidy * Apply suggestions from code review Co-authored-by: realbigsean <seananderson33@GMAIL.com> * Fix is_none on handle Drop * Remove commented-out tests Co-authored-by: realbigsean <seananderson33@GMAIL.com>	2021-12-02 14:26:51 +11:00
Paul Hauner	931daa40d7	Add fork choice EF tests (#2737 ) ## Issue Addressed Resolves #2545 ## Proposed Changes Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too. During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset. ## Failing Cases There is one failing case which is presently marked as `SkippedKnownFailure`: ``` case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual: head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } \| Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 } ``` This failure is due to #2741. It's not a particularly high priority issue at the moment, so we fix it after merging this PR.	2021-11-08 07:29:04 +00:00
Michael Sproul	bf1667a904	Fix test warnings on Rust 1.56.0 (#2743 ) ## Issue Addressed Continuation of #2728, fix the fork choice tests for Rust 1.56.0 so that `unstable` is free of warnings. CI will be broken until this PR merges, because we strictly enforce the absence of warnings (even for tests)	2021-10-22 04:49:51 +00:00
Paul Hauner	e2d09bb8ac	Add `BeaconChainHarness::builder` (#2707 ) ## Issue Addressed NA ## Proposed Changes This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning. I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs. ## Additional Info NA	2021-10-14 02:58:10 +00:00
Paul Hauner	be11437c27	Batch BLS verification for attestations (#2399 ) ## Issue Addressed NA ## Proposed Changes Adds the ability to verify batches of aggregated/unaggregated attestations from the network. When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue: - `== 1` verify the attestation individually. - `>= 2` take up to 64 of those attestations and verify them in a batch. Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try and force them into batches. ### Batching Details To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations: - Indexed attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`. - Verified attestations: those attestations which were indexed and also passed signature verification. These are well-formed, interesting messages which were signed by validators. The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required. When we batch verify attestations, we first try to map all those attestations to indexed attestations. If any of those attestations were able to be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into verified attestations, disabling individual signature checking. If the batch fails, we convert to verified attestations with individual signature checking enabled. Ultimately, we optimistically try to do a batch verification of attestation signatures and fall-back to individual verification if it fails. This opens an attach vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and the typical-case gains massively outweigh the worst-case losses. ## Additional Info Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference. I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to actual production code would pass under the existing test suite.	2021-09-22 08:49:41 +00:00
Michael Sproul	9667dc2f03	Implement checkpoint sync (#2244 ) ## Issue Addressed Closes #1891 Closes #1784 ## Proposed Changes Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint. ## Additional Info - [x] Return unavailable status for out-of-range blocks requested by peers (#2561) - [x] Implement sync daemon for fetching historical blocks (#2561) - [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module) - [x] Consistency check for initial block + state - [x] Fetch the initial state and block from a beacon node HTTP endpoint - [x] Don't crash fetching beacon states by slot from the API - [x] Background service for state reconstruction, triggered by CLI flag or API call. Considered out of scope for this PR: - Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification) Co-authored-by: Diva M <divma@protonmail.com>	2021-09-22 00:37:28 +00:00
Michael Sproul	10945e0619	Revert bad blocks on missed fork (#2529 ) ## Issue Addressed Closes #2526 ## Proposed Changes If the head block fails to decode on start up, do two things: 1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`). 2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint. ## Additional Info I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞	2021-08-30 06:41:31 +00:00
Michael Sproul	b4689e20c6	Altair consensus changes and refactors (#2279 ) ## Proposed Changes Implement the consensus changes necessary for the upcoming Altair hard fork. ## Additional Info This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-07-09 06:15:32 +00:00
Paul Hauner	4c7bb4984c	Use the forwards iterator more often (#2376 ) ## Issue Addressed NA ## Primary Change When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) which seemed to be ever-lasting. After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched the `BeaconChain::block_root_at_slot` to use the forwards iterator, instead of the reverse one. I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function. Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before it felt like to took 1-2s, now it feels instant). ## Additional Changes In the process I also noticed that we have two functions for getting block roots: - `BeaconChain::block_root_at_slot`: returns `None` for a skip slot. - `BeaconChain::root_at_slot`: returns the previous root for a skip slot. I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way. Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it 🗑️. ## Additional Info I haven't also done the same for state roots here. Whilst it's possible and a good idea, it's more work since the fwds iterators are presently block-roots-specific. Whilst there's a few places a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're no where near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here https://github.com/sigp/lighthouse/issues/2377.	2021-05-31 04:18:20 +00:00
Paul Hauner	015ab7d0a7	Optimize validator duties (#2243 ) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](`072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)`). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR reduces the size of the codebase 🎉~~ It used to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](`072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)`) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-03-17 05:09:57 +00:00
realbigsean	e20f64b21a	Update to tokio 1.1 (#2172 ) ## Issue Addressed resolves #2129 resolves #2099 addresses some of #1712 unblocks #2076 unblocks #2153 ## Proposed Changes - Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. - Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1. - We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. Edit: I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue` --> PR in discv5: https://github.com/sigp/discv5/pull/58 ## Additional Info tokio 1.0 changes that required some changes in lighthouse: - `interval.next().await.is_some()` -> `interval.tick().await` - `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028 - `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350 - stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870 - I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384 Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-02-10 23:29:49 +00:00
Michael Sproul	703c33bdc7	Fix head tracker concurrency bugs (#1771 ) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.	2020-10-19 05:58:39 +00:00
realbigsean	255cc25623	Weak subjectivity start from genesis (#1675 ) This commit was edited by Paul H when rebasing from master to v0.3.0-staging. Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639 - Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch` - Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-03 10:00:28 +10:00
Paul Hauner	cdec3cec18	Implement standard eth2.0 API (#1569 ) - Resolves #1550 - Resolves #824 - Resolves #825 - Resolves #1131 - Resolves #1411 - Resolves #1256 - Resolve #1177 - Includes the `ShufflingId` struct initially defined in #1492. That PR is now closed and the changes are included here, with significant bug fixes. - Implement the https://github.com/ethereum/eth2.0-APIs in a new `http_api` crate using `warp`. This replaces the `rest_api` crate. - Add a new `common/eth2` crate which provides a wrapper around `reqwest`, providing the HTTP client that is used by the validator client and for testing. This replaces the `common/remote_beacon_node` crate. - Create a `http_metrics` crate which is a dedicated server for Prometheus metrics (they are no longer served on the same port as the REST API). We now have flags for `--metrics`, `--metrics-address`, etc. - Allow the `subnet_id` to be an optional parameter for `VerifiedUnaggregatedAttestation::verify`. This means it does not need to be provided unnecessarily by the validator client. - Move `fn map_attestation_committee` in `mod beacon_chain::attestation_verification` to a new `fn with_committee_cache` on the `BeaconChain` so the same cache can be used for obtaining validator duties. - Add some other helpers to `BeaconChain` to assist with common API duties (e.g., `block_root_at_slot`, `head_beacon_block_root`). - Change the `NaiveAggregationPool` so it can index attestations by `hash_tree_root(attestation.data)`. This is a requirement of the API. - Add functions to `BeaconChainHarness` to allow it to create slashings and exits. - Allow for `eth1::Eth1NetworkId` to go to/from a `String`. - Add functions to the `OperationPool` to allow getting all objects in the pool. - Add function to `BeaconState` to check if a committee cache is initialized. - Fix bug where `seconds_per_eth1_block` was not transferring over from `YamlConfig` to `ChainSpec`. - Add the `deposit_contract_address` to `YamlConfig` and `ChainSpec`. We needed to be able to return it in an API response. - Change some uses of serde `serialize_with` and `deserialize_with` to a single use of `with` (code quality). - Impl `Display` and `FromStr` for several BLS fields. - Check for clock discrepancy when VC polls BN for sync state (with +/- 1 slot tolerance). This is not intended to be comprehensive, it was just easy to do. - See #1434 for a per-endpoint overview. - Seeking clarity here: https://github.com/ethereum/eth2.0-APIs/issues/75 - [x] Add docs for prom port to close #1256 - [x] Follow up on this #1177 - [x] ~~Follow up with #1424~~ Will fix in future PR. - [x] Follow up with #1411 - [x] ~~Follow up with #1260~~ Will fix in future PR. - [x] Add quotes to all integers. - [x] Remove `rest_types` - [x] Address missing beacon block error. (#1629) - [x] ~~Add tests for lighthouse/peers endpoints~~ Wontfix - [x] ~~Follow up with validator status proposal~~ Tracked in #1434 - [x] Unify graffiti structs - [x] ~~Start server when waiting for genesis?~~ Will fix in future PR. - [x] TODO in http_api tests - [x] Move lighthouse endpoints off /eth/v1 - [x] Update docs to link to standard - ~~Blocked on #1586~~ Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-10-01 11:12:36 +10:00
Paul Hauner	a17f74896a	Fix bad assumption when checking finalized descendant (#1629 ) ## Issue Addressed - Resolves #1616 ## Proposed Changes Fixes a bug where we are unable to read the finalized block from fork choice. ## Detail I had made an assumption that the finalized block always has a parent root of `None`: `e5fc6bab48/consensus/fork_choice/src/fork_choice.rs (L749-L752)` This was a faulty assumption, we don't set parent roots to `None`. Instead we sometimes set parent indices to `None`, depending if this pruning condition is satisfied: `e5fc6bab48/consensus/proto_array/src/proto_array.rs (L229-L232)` The bug manifested itself like this: 1. We attempt to get the finalized block from fork choice 1. We try to check that the block is descendant of the finalized block (note: they're the same block). 1. We expect the parent root to be `None`, but it's actually the parent root of the finalized root. 1. We therefore end up checking if the parent of the finalized root is a descendant of itself. (note: it's an ancestor not a descendant). 1. We therefore declare that the finalized block is not a descendant of (or eq to) the finalized block. Bad. ## Additional Info In reflection, I made a poor assumption in the quest to obtain a probably negligible performance gain. The performance gain wasn't worth the risk and we got burnt.	2020-09-18 05:14:31 +00:00
Adam Szkoda	d9f4819fe0	Alternative (to BeaconChainHarness) BeaconChain testing API (#1380 ) The PR: * Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots): ![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png) * New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots. * Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner. * Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework * Rewrites existing fork pruning tests to use the new API * Adds a tests that triggers finalization at a non epoch boundary slot * Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing * Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-26 09:24:55 +00:00
Paul Hauner	ac89bb190a	Fix invalid attestation verification condition (#1321 ) * Fix bug with attestation target * Change comment wording	2020-07-01 12:45:34 +10:00
Pawan Dhananjay	3199b1a6f2	Use all attestation subnets (#1257 ) * Update `milagro_bls` to new release (#1183) * Update milagro_bls to new release Signed-off-by: Kirk Baird <baird.k@outlook.com> * Tidy up fake cryptos Signed-off-by: Kirk Baird <baird.k@outlook.com> * move SecretHash to bls and put plaintext back Signed-off-by: Kirk Baird <baird.k@outlook.com> * Update v0.12.0 to v0.12.1 * Add compute_subnet_for_attestation * Replace CommitteeIndex topic with Attestation * Fix warnings * Fix attestation service tests * fmt * Appease clippy * return error from validator_subscriptions * move state out of loop * Fix early break on error * Get state from slot clock * Fix beacon state in attestation tests * Add failing test for lookahead > 1 * Minor change * Address some review comments * Add subnet verification to beacon chain * Move subnet verification to processor * Pass committee_count_at_slot to ValidatorDuty and ValidatorSubscription * Pass subnet id for publishing attestations * Fix attestation service tests * Fix more tests * Fix fork choice test * Remove unused code * Remove more unused and expensive code Co-authored-by: Kirk Baird <baird.k@outlook.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: Age Manning <Age@AgeManning.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-06-18 19:11:03 +10:00
Michael Sproul	81c9fe3817	Apply store refactor to new fork choice	2020-06-17 15:20:44 +10:00
Paul Hauner	764cb2d32a	v0.12 fork choice update (#1229 ) * Incomplete scraps * Add progress on new fork choice impl * Further progress * First complete compiling version * Remove chain reference * Add new lmd_ghost crate * Start integrating into beacon chain * Update `milagro_bls` to new release (#1183) * Update milagro_bls to new release Signed-off-by: Kirk Baird <baird.k@outlook.com> * Tidy up fake cryptos Signed-off-by: Kirk Baird <baird.k@outlook.com> * move SecretHash to bls and put plaintext back Signed-off-by: Kirk Baird <baird.k@outlook.com> * Update state processing for v0.12 * Fix EF test runners for v0.12 * Fix some tests * Fix broken attestation verification test * More test fixes * Rough beacon chain impl working * Remove fork_choice_2 * Remove checkpoint manager * Half finished ssz impl * Add missed file * Add persistence * Tidy, fix some compile errors * Remove RwLock from ProtoArrayForkChoice * Fix store-based compile errors * Add comments, tidy * Move function out of ForkChoice struct * Start testing * More testing * Fix compile error * Tidy beacon_chain::fork_choice * Queue attestations from the current slot * Allow fork choice to handle prior-to-genesis start * Improve error granularity * Test attestation dequeuing * Process attestations during block * Store target root in fork choice * Move fork choice verification into new crate * Update tests * Consensus updates for v0.12 (#1228) * Update state processing for v0.12 * Fix EF test runners for v0.12 * Fix some tests * Fix broken attestation verification test * More test fixes * Fix typo found in review * Add `Block` struct to ProtoArray * Start fixing get_ancestor * Add rough progress on testing * Get fork choice tests working * Progress with testing * Fix partialeq impl * Move slot clock from fc_store * Improve testing * Add testing for best justified * Add clone back to SystemTimeSlotClock * Add balances test * Start adding balances cache again * Wire-in balances cache * Improve tests * Remove commented-out tests * Remove beacon_chain::ForkChoice * Rename crates * Update wider codebase to new fork_choice layout * Move advance_slot in test harness * Tidy ForkChoice::update_time * Fix verification tests * Fix compile error with iter::once * Fix fork choice tests * Ensure block attestations are processed * Fix failing beacon_chain tests * Add first invalid block check * Add finalized block check * Progress with testing, new store builder * Add fixes to get_ancestor * Fix old genesis justification test * Fix remaining fork choice tests * Change root iteration method * Move on_verified_block * Remove unused method * Start adding attestation verification tests * Add invalid ffg target test * Add target epoch test * Add queued attestation test * Remove old fork choice verification tests * Tidy, add test * Move fork choice lock drop * Rename BeaconForkChoiceStore * Add comments, tidy BeaconForkChoiceStore * Update metrics, rename fork_choice_store.rs * Remove genesis_block_root from ForkChoice * Tidy * Update fork_choice comments * Tidy, add comments * Tidy, simplify ForkChoice, fix compile issue * Tidy, removed dead file * Increase http request timeout * Fix failing rest_api test * Set HTTP timeout back to 5s * Apply fix to get_ancestor * Address Michael's comments * Fix typo * Revert "Fix broken attestation verification test" This reverts commit 722cdc903b12611de27916a57eeecfa3224f2279. Co-authored-by: Kirk Baird <baird.k@outlook.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-06-17 11:10:22 +10:00

31 Commits