## Proposed Changes
With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks.
This PR adds three flags to the BN to control this behaviour:
* `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default).
* `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled.
* `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally.
For safety Lighthouse will only attempt a re-org under very specific conditions:
1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`.
2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes.
3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense.
4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost.
## Additional Info
For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004
There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
## Issue Addressed
Implementing the light_client_gossip topics but I'm not there yet.
Which issue # does this PR address?
Partially #3651
## Proposed Changes
Add light client gossip topics.
Please list or describe the changes introduced by this PR.
I'm going to Implement light_client_finality_update and light_client_optimistic_update gossip topics. Currently I've attempted the former and I'm seeking feedback.
## Additional Info
I've only implemented the light_client_finality_update topic because I wanted to make sure I was on the correct path. Also checking that the gossiped LightClientFinalityUpdate is the same as the locally constructed one is not implemented because caching the updates will make this much easier. Could someone give me some feedback on this please?
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>
## Proposed Changes
Now that the Gnosis merge is scheduled, all users should have upgraded beyond Lighthouse v3.0.0. Accordingly we can delete schema migrations for versions prior to v3.0.0.
## Additional Info
I also deleted the state cache stuff I added in #3714 as it turned out to be useless for the light client proofs due to the one-slot offset.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2327
## Proposed Changes
This is an extension of some ideas I implemented while working on `tree-states`:
- Cache the indexed attestations from blocks in the `ConsensusContext`. Previously we were re-computing them 3-4 times over.
- Clean up `import_block` by splitting each part into `import_block_XXX`.
- Move some stuff off hot paths, specifically:
- Relocate non-essential tasks that were running between receiving the payload verification status and priming the early attester cache. These tasks are moved after the cache priming:
- Attestation observation
- Validator monitor updates
- Slasher updates
- Updating the shuffling cache
- Fork choice attestation observation now happens at the end of block verification in parallel with payload verification (this seems to save 5-10ms).
- Payload verification now happens _before_ advancing the pre-state and writing it to disk! States were previously being written eagerly and adding ~20-30ms in front of verifying the execution payload. State catchup also sometimes takes ~500ms if we get a cache miss and need to rebuild the tree hash cache.
The remaining task that's taking substantial time (~20ms) is importing the block to fork choice. I _think_ this is because of pull-tips, and we should be able to optimise it out with a clever total active balance cache in the state (which would be computed in parallel with payload verification). I've decided to leave that for future work though. For now it can be observed via the new `beacon_block_processing_post_exec_pre_attestable_seconds` metric.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
#3704
## Proposed Changes
Adds is_syncing_finalized: bool parameter for block verification functions. Sets the payload_verification_status to Optimistic if is_syncing_finalized is true. Uses SyncState in NetworkGlobals in BeaconProcessor to retrieve the syncing status.
## Additional Info
I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer.
## Issue Addressed
Part of https://github.com/sigp/lighthouse/issues/3651.
## Proposed Changes
Add a flag for enabling the light client server, which should be checked before gossip/RPC traffic is processed (e.g. https://github.com/sigp/lighthouse/pull/3693, https://github.com/sigp/lighthouse/pull/3711). The flag is available at runtime from `beacon_chain.config.enable_light_client_server`.
Additionally, a new method `BeaconChain::with_mutable_state_for_block` is added which I envisage being used for computing light client updates. Unfortunately its performance will be quite poor on average because it will only run quickly with access to the tree hash cache. Each slot the tree hash cache is only available for a brief window of time between the head block being processed and the state advance at 9s in the slot. When the state advance happens the cache is moved and mutated to get ready for the next slot, which makes it no longer useful for merkle proofs related to the head block. Rather than spend more time trying to optimise this I think we should continue prototyping with this code, and I'll make sure `tree-states` is ready to ship before we enable the light client server in prod (cf. https://github.com/sigp/lighthouse/pull/3206).
## Additional Info
I also fixed a bug in the implementation of `BeaconState::compute_merkle_proof` whereby the tree hash cache was moved with `.take()` but never put back with `.restore()`.
## Summary
The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.
This has three benefits:
1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.
## More Details
Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting `DepositTreeSnapshot`
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)
## Other Considerations
I've changed the structure of the `SszDepositCache` so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
## Issue Addressed
Fix a bug in block production that results in blocks with 0 attestations during the first slot of an epoch.
The bug is marked by debug logs of the form:
> DEBG Discarding attestation because of missing ancestor, block_root: 0x3cc00d9c9e0883b2d0db8606278f2b8423d4902f9a1ee619258b5b60590e64f8, pivot_slot: 4042591
It occurs when trying to look up the shuffling decision root for an attestation from a slot which is prior to fork choice's finalized block. This happens frequently when proposing in the first slot of the epoch where we have:
- `current_epoch == n`
- `attestation.data.target.epoch == n - 1`
- attestation shuffling epoch `== n - 3` (decision block being the last block of `n - 3`)
- `state.finalized_checkpoint.epoch == n - 2` (first block of `n - 2` is finalized)
Hence the shuffling decision slot is out of range of the fork choice backwards iterator _by a single slot_.
Unfortunately this bug was hidden when we weren't pruning fork choice, and then reintroduced in v2.5.1 when we fixed the pruning (https://github.com/sigp/lighthouse/releases/tag/v2.5.1). There's no way to turn that off or disable the filtering in our current release, so we need a new release to fix this issue.
Fortunately, it also does not occur on every epoch boundary because of the gradual pruning of fork choice every 256 blocks (~8 epochs):
01e84b71f5/consensus/proto_array/src/proto_array_fork_choice.rs (L16)01e84b71f5/consensus/proto_array/src/proto_array.rs (L713-L716)
So the probability of proposing a 0-attestation block given a proposal assignment is approximately `1/32 * 1/8 = 0.39%`.
## Proposed Changes
- Load the block's shuffling ID from fork choice and verify it against the expected shuffling ID of the head state. This code was initially written before we had settled on a representation of shuffling IDs, so I think it's a nice simplification to make use of them here rather than more ad-hoc logic that fundamentally does the same thing.
## Additional Info
Thanks to @moshe-blox for noticing this issue and bringing it to our attention.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2371
## Proposed Changes
Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`.
With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required.
## Additional Info
In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards.
There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).
## Issue Addressed
fixes lints from the last rust release
## Proposed Changes
Fix the lints, most of the lints by `clippy::question-mark` are false positives in the form of https://github.com/rust-lang/rust-clippy/issues/9518 so it's allowed for now
## Additional Info
## Issue Addressed
NA
## Proposed Changes
This PR removes duplicated block root computation.
Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`.
Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations *after* the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Fixes an issue introduced in #3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a *broadcast* queue. This is incorrect, each message will be received by *only one* receiver. The effect of this mistake is these logs:
```
Sep 20 06:56:17.001 INFO Synced slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1
```
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.
This PR modifies the shuffling cache so that each entry can be either:
- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)
Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:
1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.
In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.
## Additional Info
NA
## Issue Addressed
Add a flag that can increase count unrealized strictness, defaults to false
## Proposed Changes
Please list or describe the changes introduced by this PR.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: sean <seananderson33@gmail.com>
## Issue Addressed
NA
## Proposed Changes
Adds more `debug` logging to help troubleshoot invalid execution payload blocks. I was doing some of this recently and found it to be challenging.
With this PR we should be able to grep `Invalid execution payload` and get one-liners that will show the block, slot and details about the proposer.
I also changed the log in `process_invalid_execution_payload` since it was a little misleading; the `block_root` wasn't necessary the block which had an invalid payload.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
This PR is motivated by a recent consensus failure in Geth where it returned `INVALID` for an `VALID` block. Without this PR, the only way to recover is by re-syncing Lighthouse. Whilst ELs "shouldn't have consensus failures", in reality it's something that we can expect from time to time due to the complex nature of Ethereum. Being able to recover easily will help the network recover and EL devs to troubleshoot.
The risk introduced with this PR is that genuinely INVALID payloads get a "second chance" at being imported. I believe the DoS risk here is negligible since LH needs to be restarted in order to re-process the payload. Furthermore, there's no reason to think that a well-performing EL will accept a truly invalid payload the second-time-around.
## Additional Info
This implementation has the following intricacies:
1. Instead of just resetting *invalid* payloads to optimistic, we'll also reset *valid* payloads. This is an artifact of our existing implementation.
1. We will only reset payload statuses when we detect an invalid payload present in `proto_array`
- This helps save us from forgetting that all our blocks are valid in the "best case scenario" where there are no invalid blocks.
1. If we fail to revert the payload statuses we'll log a `CRIT` and just continue with a `proto_array` that *does not* have reverted payload statuses.
- The code to revert statuses needs to deal with balances and proposer-boost, so it's a failure point. This is a defensive measure to avoid introducing new show-stopping bugs to LH.
## Proposed Changes
This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12).
Attestation packing is sped up by removing several inefficiencies:
- No more recalculation of `attesting_indices` during packing.
- No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`.
- No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra).
- No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release).
So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms.
Verification of attester slashings, proposer slashings and voluntary exits is fixed by:
- Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message.
- Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot.
This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f.
## Additional Info
The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s.
This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency.
Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.
## Issue Addressed
NA
## Proposed Changes
Run fork choice when the head is 256 slots from the wall-clock slot, rather than 4.
The reason we don't *always* run FC is so that it doesn't slow us down during sync. As the comments state, setting the value to 256 means that we'd only have one interrupting fork-choice call if we were syncing at 20 slots/sec.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Adds some metrics so we can track payload status responses from the EE. I think this will be useful for troubleshooting and alerting.
I also bumped the `BecaonChain::per_slot_task` to `debug` since it doesn't seem too noisy and would have helped us with some things we were debugging in the past.
## Additional Info
NA
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2962
## Proposed Changes
Build all caches on the checkpoint state before storing it in the database.
Additionally, fix a bug in `signature_verify_chain_segment` which prevented block verification from succeeding unless the previous epoch cache was already built. The previous epoch cache is required to verify the signatures of attestations included from previous epochs, even when all the blocks in the segment are from the same epoch.
The comments around `signature_verify_chain_segment` have also been updated to reflect the fact that it should only be used on a chain of blocks from a single epoch. I believe this restriction had already been added at some point in the past and that the current comments were just outdated (and I think because the proposer shuffling can change in the next epoch based on the blocks applied in the current epoch that this limitation is essential).
## Issue Addressed
Resolves#3379
## Proposed Changes
Remove instances of `InvalidTerminalBlock` in lighthouse and use
`Invalid {latest_valid_hash: "0x0000000000000000000000000000000000000000000000000000000000000000"}`
to represent that status.
## Issue Addressed
Resolves https://github.com/sigp/lighthouse/issues/3394
Adds a check in `is_healthy` about whether the head is optimistic when choosing whether to use the builder network.
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
This PR will make Lighthouse return blocks with invalid payloads via the API with `execution_optimistic = true`. This seems a bit awkward, however I think it's better than returning a 404 or some other error.
Let's consider the case where the only possible head is invalid (#3370 deals with this). In such a scenario all of the duties endpoints will start failing because the head is invalid. I think it would be better if the duties endpoints continue to work, because it's likely that even though the head is invalid the duties are still based upon valid blocks and we want the VC to have them cached. There's no risk to the VC here because we won't actually produce an attestation pointing to an invalid head.
Ultimately, I don't think it's particularly important for us to distinguish between optimistic and invalid blocks on the API. Neither should be trusted and the only *real* reason that we track this is so we can try and fork around the invalid blocks.
## Additional Info
- ~~Blocked on #3370~~
## Issue Addressed
https://github.com/sigp/lighthouse/issues/3091
Extends https://github.com/sigp/lighthouse/pull/3062, adding pre-bellatrix block support on blinded endpoints and allowing the normal proposal flow (local payload construction) on blinded endpoints. This resulted in better fallback logic because the VC will not have to switch endpoints on failure in the BN <> Builder API, the BN can just fallback immediately and without repeating block processing that it shouldn't need to. We can also keep VC fallback from the VC<>BN API's blinded endpoint to full endpoint.
## Proposed Changes
- Pre-bellatrix blocks on blinded endpoints
- Add a new `PayloadCache` to the execution layer
- Better fallback-from-builder logic
## Todos
- [x] Remove VC transition logic
- [x] Add logic to only enable builder flow after Merge transition finalization
- [x] Tests
- [x] Fix metrics
- [x] Rustdocs
Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network.
This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload.
## Reviewer Notes
The following changes are included:
1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process.
1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run.
- This should have practically no effect, since most callers were still continuing if recomputing the head failed.
- The outlier is that the API will return 200 rather than a 500 when fork choice fails.
1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/3241
Closes https://github.com/sigp/lighthouse/issues/3242
## Proposed Changes
* [x] Implement logic to remove equivocating validators from fork choice per https://github.com/ethereum/consensus-specs/pull/2845
* [x] Update tests to v1.2.0-rc.1. The new test which exercises `equivocating_indices` is passing.
* [x] Pull in some SSZ abstractions from the `tree-states` branch that make implementing Vec-compatible encoding for types like `BTreeSet` and `BTreeMap`.
* [x] Implement schema upgrades and downgrades for the database (new schema version is V11).
* [x] Apply attester slashings from blocks to fork choice
## Additional Info
* This PR doesn't need the `BTreeMap` impl, but `tree-states` does, and I don't think there's any harm in keeping it. But I could also be convinced to drop it.
Blocked on #3322.
## Issue Addressed
Resolves#3151
## Proposed Changes
When fetching duties for sync committee contributions, check the value of `execution_optimistic` of the head block from the BN and refuse to sign any sync committee messages `if execution_optimistic == true`.
## Additional Info
- Is backwards compatible with older BNs
- Finding a way to add test coverage for this would be prudent. Open to suggestions.
## Issue Addressed
Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style.
Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: Mac L <mjladson@pm.me>
## Issue Addressed
#3031
## Proposed Changes
Updates the following API endpoints to conform with https://github.com/ethereum/beacon-APIs/pull/190 and https://github.com/ethereum/beacon-APIs/pull/196
- [x] `beacon/states/{state_id}/root`
- [x] `beacon/states/{state_id}/fork`
- [x] `beacon/states/{state_id}/finality_checkpoints`
- [x] `beacon/states/{state_id}/validators`
- [x] `beacon/states/{state_id}/validators/{validator_id}`
- [x] `beacon/states/{state_id}/validator_balances`
- [x] `beacon/states/{state_id}/committees`
- [x] `beacon/states/{state_id}/sync_committees`
- [x] `beacon/headers`
- [x] `beacon/headers/{block_id}`
- [x] `beacon/blocks/{block_id}`
- [x] `beacon/blocks/{block_id}/root`
- [x] `beacon/blocks/{block_id}/attestations`
- [x] `debug/beacon/states/{state_id}`
- [x] `debug/beacon/heads`
- [x] `validator/duties/attester/{epoch}`
- [x] `validator/duties/proposer/{epoch}`
- [x] `validator/duties/sync/{epoch}`
Updates the following Server-Sent Events:
- [x] `events?topics=head`
- [x] `events?topics=block`
- [x] `events?topics=finalized_checkpoint`
- [x] `events?topics=chain_reorg`
## Backwards Incompatible
There is a very minor breaking change with the way the API now handles requests to `beacon/blocks/{block_id}/root` and `beacon/states/{state_id}/root` when `block_id` or `state_id` is the `Root` variant of `BlockId` and `StateId` respectively.
Previously a request to a non-existent root would simply echo the root back to the requester:
```
curl "http://localhost:5052/eth/v1/beacon/states/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root"
{"data":{"root":"0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}}
```
Now it will return a `404`:
```
curl "http://localhost:5052/eth/v1/beacon/blocks/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root"
{"code":404,"message":"NOT_FOUND: beacon block with root 0xaaaa…aaaa","stacktraces":[]}
```
In addition to this is the block root `0x0000000000000000000000000000000000000000000000000000000000000000` previously would return the genesis block. It will now return a `404`:
```
curl "http://localhost:5052/eth/v1/beacon/blocks/0x0000000000000000000000000000000000000000000000000000000000000000"
{"code":404,"message":"NOT_FOUND: beacon block with root 0x0000…0000","stacktraces":[]}
```
## Additional Info
- `execution_optimistic` is always set, and will return `false` pre-Bellatrix. I am also open to the idea of doing something like `#[serde(skip_serializing_if = "Option::is_none")]`.
- The value of `execution_optimistic` is set to `false` where possible. Any computation that is reliant on the `head` will simply use the `ExecutionStatus` of the head (unless the head block is pre-Bellatrix).
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/3189.
## Proposed Changes
- Always supply the justified block hash as the `safe_block_hash` when calling `forkchoiceUpdated` on the execution engine.
- Refactor the `get_payload` routine to use the new `ForkchoiceUpdateParameters` struct rather than just the `finalized_block_hash`. I think this is a nice simplification and that the old way of computing the `finalized_block_hash` was unnecessary, but if anyone sees reason to keep that approach LMK.
## Issue Addressed
Resolves#3316
## Proposed Changes
This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
## Issue Addressed
This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets
## Proposed Changes
- Removes redundancy in "builders" (servers implementing the builder spec)
- Renames `payload-builder` flag to `builder`
- Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194)
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync:
```
May 24 23:06:40.542 WARN Error signalling fork choice waiter slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon
```
This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync.
Additionally, these parallel fork choice runs were causing issues in the database:
```
May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon
May 24 07:49:05.101 WARN Database migration failed error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon
```
In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ 😳. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175).
## Proposed Changes
* Don't run per-slot fork choice during sync (if head is older than 4 slots)
* Don't run state-advance fork choice during sync (if head is older than 4 slots)
* Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
*This PR was adapted from @pawanjay176's work in #3197.*
## Issue Addressed
Fixes a regression in https://github.com/sigp/lighthouse/pull/3168
## Proposed Changes
https://github.com/sigp/lighthouse/pull/3168 added calls to `fork_choice` in `BeaconChain::per_slot_task` function. This leads to a panic as `per_slot_task` is called from an async context which calls fork choice, which then calls `block_on`.
This PR changes the timer to call the `per_slot_task` function in a blocking thread.
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
## Issue Addressed
Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878
## Proposed Changes
1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.
## Additional Info
### Block Proposal Accuracy
This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.
### Attestation Accuracy
This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.
### Why remove the call before the state advance?
If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).
### Performance
Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.
### Implementation
Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
## Proposed Changes
Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.
⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.
The main changes are:
- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
- `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
- `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
- I've tested manually that it works on Kiln, using Geth and Nethermind.
- This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
- We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).
## Additional Info
- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Fix a panic due to misuse of the Tokio executor when processing a forkchoiceUpdated response. We were previously calling `process_invalid_execution_payload` from the async function `update_execution_engine_forkchoice_async`, which resulted in a panic because `process_invalid_execution_payload` contains a call to fork choice, which ultimately calls `block_on`.
An example backtrace can be found here: https://gist.github.com/michaelsproul/ac5da03e203d6ffac672423eaf52fb20
## Proposed Changes
Wrap the call to `process_invalid_execution_payload` in a `spawn_blocking` so that `block_on` is no longer called from an async context.
## Additional Info
- I've been thinking about how to catch bugs like this with static analysis (a new Clippy lint).
- The payload validation tests have been re-worked to support distinct responses from the mock EE for newPayload and forkchoiceUpdated. Three new tests have been added covering the `Invalid`, `InvalidBlockHash` and `InvalidTerminalBlock` cases.
- I think we need a bunch more tests of different legal and illegal variations
## Issue Addressed
NA
## Proposed Changes
Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end.
I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid).
## Additional Info
- ~~Blocked on #3126~~
## Issue Addressed
NA
## Proposed Changes
Fixes an issue introduced in #3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`.
I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value.
## Additional Info
NA
## Issue Addressed
MEV boost compatibility
## Proposed Changes
See #2987
## Additional Info
This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20).
Additional TODO's and outstanding questions
- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Proposed Changes
Allow Lighthouse to speculatively create blocks via the `/eth/v1/validators/blocks` endpoint by optionally skipping the RANDAO verification that we introduced in #2740. When `verify_randao=false` is passed as a query parameter the `randao_reveal` is not required to be present, and if present will only be lightly checked (must be a valid BLS sig). If `verify_randao` is omitted it defaults to true and Lighthouse behaves exactly as it did previously, hence this PR is backwards-compatible.
I'd like to get this change into `unstable` pretty soon as I've got 3 projects building on top of it:
- [`blockdreamer`](https://github.com/michaelsproul/blockdreamer), which mocks block production every slot in order to fingerprint clients
- analysis of Lighthouse's block packing _optimality_, which uses `blockdreamer` to extract interesting instances of the attestation packing problem
- analysis of Lighthouse's block packing _performance_ (as in speed) on the `tree-states` branch
## Additional Info
Having tested `blockdreamer` with Prysm, Nimbus and Teku I noticed that none of them verify the randao signature on `/eth/v1/validator/blocks`. I plan to open a PR to the `beacon-APIs` repo anyway so that this parameter can be standardised in case the other clients add RANDAO verification by default in future.
## Issue Addressed
This address an issue which was preventing checkpoint-sync.
When the node starts from checkpoint sync, the head block and the finalized block are the same value. We did not respect this when sending a `forkchoiceUpdated` (fcU) call to the EL and were expecting fork choice to hold the *finalized ancestor of the head* and returning an error when it didn't.
This PR uses *only fork choice* for sending fcU updates. This is actually quite nice and avoids some atomicity issues between `chain.canonical_head` and `chain.fork_choice`. Now, whenever `chain.fork_choice.get_head` returns a value we also cache the values required for the next fcU call.
## TODO
- [x] ~~Blocked on #3043~~
- [x] Ensure there isn't a warn message at startup.
## Issue Addressed
Resolves#2936
## Proposed Changes
Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance.
There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information.
Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code.
## Additional Info
- ~~Blocked on #2837~~
Co-authored-by: realbigsean <seananderson33@GMAIL.com>