* add historical summaries
* fix tree hash caching, disable the sanity slots test with fake crypto
* add ssz static HistoricalSummary
* only store historical summaries after capella
* Teach `UpdatePattern` about Capella
* Tidy EF tests
* Clippy
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in #3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.
## Proposed Changes
I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.
I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).
Block hash verification is pretty quick, about 500us per block on my machine (mainnet).
The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.
## Additional Info
This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.
I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.
## Proposed Changes
With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks.
This PR adds three flags to the BN to control this behaviour:
* `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default).
* `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled.
* `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally.
For safety Lighthouse will only attempt a re-org under very specific conditions:
1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`.
2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes.
3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense.
4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost.
## Additional Info
For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004
There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
## Issue Addressed
Implementing the light_client_gossip topics but I'm not there yet.
Which issue # does this PR address?
Partially #3651
## Proposed Changes
Add light client gossip topics.
Please list or describe the changes introduced by this PR.
I'm going to Implement light_client_finality_update and light_client_optimistic_update gossip topics. Currently I've attempted the former and I'm seeking feedback.
## Additional Info
I've only implemented the light_client_finality_update topic because I wanted to make sure I was on the correct path. Also checking that the gossiped LightClientFinalityUpdate is the same as the locally constructed one is not implemented because caching the updates will make this much easier. Could someone give me some feedback on this please?
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2327
## Proposed Changes
This is an extension of some ideas I implemented while working on `tree-states`:
- Cache the indexed attestations from blocks in the `ConsensusContext`. Previously we were re-computing them 3-4 times over.
- Clean up `import_block` by splitting each part into `import_block_XXX`.
- Move some stuff off hot paths, specifically:
- Relocate non-essential tasks that were running between receiving the payload verification status and priming the early attester cache. These tasks are moved after the cache priming:
- Attestation observation
- Validator monitor updates
- Slasher updates
- Updating the shuffling cache
- Fork choice attestation observation now happens at the end of block verification in parallel with payload verification (this seems to save 5-10ms).
- Payload verification now happens _before_ advancing the pre-state and writing it to disk! States were previously being written eagerly and adding ~20-30ms in front of verifying the execution payload. State catchup also sometimes takes ~500ms if we get a cache miss and need to rebuild the tree hash cache.
The remaining task that's taking substantial time (~20ms) is importing the block to fork choice. I _think_ this is because of pull-tips, and we should be able to optimise it out with a clever total active balance cache in the state (which would be computed in parallel with payload verification). I've decided to leave that for future work though. For now it can be observed via the new `beacon_block_processing_post_exec_pre_attestable_seconds` metric.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Part of https://github.com/sigp/lighthouse/issues/3651.
## Proposed Changes
Add a flag for enabling the light client server, which should be checked before gossip/RPC traffic is processed (e.g. https://github.com/sigp/lighthouse/pull/3693, https://github.com/sigp/lighthouse/pull/3711). The flag is available at runtime from `beacon_chain.config.enable_light_client_server`.
Additionally, a new method `BeaconChain::with_mutable_state_for_block` is added which I envisage being used for computing light client updates. Unfortunately its performance will be quite poor on average because it will only run quickly with access to the tree hash cache. Each slot the tree hash cache is only available for a brief window of time between the head block being processed and the state advance at 9s in the slot. When the state advance happens the cache is moved and mutated to get ready for the next slot, which makes it no longer useful for merkle proofs related to the head block. Rather than spend more time trying to optimise this I think we should continue prototyping with this code, and I'll make sure `tree-states` is ready to ship before we enable the light client server in prod (cf. https://github.com/sigp/lighthouse/pull/3206).
## Additional Info
I also fixed a bug in the implementation of `BeaconState::compute_merkle_proof` whereby the tree hash cache was moved with `.take()` but never put back with `.restore()`.
## Issue Addressed
This PR addresses partially #3651
## Proposed Changes
This PR adds the following methods:
* a new method to trait `TreeHash`, `hash_tree_leaves` which returns all the Merkle leaves of the ssz object.
* a new method to `BeaconState`: `compute_merkle_proof` which generates a specific merkle proof for given depth and index by using the `hash_tree_leaves` as leaves function.
## Additional Info
Now here is some rationale on why I decided to go down this route: adding a new function to commonly used trait is a pain but was necessary to make sure we have all merkle leaves for every object, that is why I just added `hash_tree_leaves` in the trait and not `compute_merkle_proof` as well. although it would make sense it gives us code duplication/harder review time and we just need it from one specific object in one specific usecase so not worth the effort YET. In my humble opinion.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
New lints for rust 1.65
## Proposed Changes
Notable change is the identification or parameters that are only used in recursion
## Additional Info
na
## Summary
The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.
This has three benefits:
1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.
## More Details
Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting `DepositTreeSnapshot`
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)
## Other Considerations
I've changed the structure of the `SszDepositCache` so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
## Issue Addressed
This PR partially addresses #3651
## Proposed Changes
This PR adds the following containers types from [the lightclient specs](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/light-client/sync-protocol.md): `LightClientUpdate`, `LightClientFinalityUpdate`, `LightClientOptimisticUpdate` and `LightClientBootstrap`. It also implements the creation of each updates as delined by this [document](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/light-client/full-node.md).
## Additional Info
Here is a brief description of what each of these container signify:
`LightClientUpdate`: This container is only provided by server (full node) to lightclients when catching up new sync committees beetwen periods and we want possibly one lightclient update ready for each post-altair period the lighthouse node go over. it is needed in the resp/req in method `light_client_update_by_range`.
`LightClientFinalityUpdate/LightClientFinalityUpdate`: Lighthouse will need only the latest of each of this kind of updates, so no need to store them in the database, we can just store the latest one of each one in memory and then just supply them via gossip or respreq, only the latest ones are served by a full node. finality updates marks the transition to a new finalized header, while optimistic updates signify new non-finalized header which are imported optimistically.
`LightClientBootstrap`: This object is retrieved by lightclients during the bootstrap process after a finalized checkpoint is retrieved, ideally we want to store a LightClientBootstrap for each finalized root and then serve each of them by finalized root in respreq protocol id `light_client_bootstrap`.
Little digression to how we implement the creation of each updates: the creation of a optimistic/finality update is just a version of the lightclient_update creation mechanism with less fields being set, there is underlying concept of inheritance, if you look at the specs it becomes very obvious that a lightclient update is just an extension of a finality update and a finality update an extension to an optimistic update.
## Extra note
`LightClientStore` is not implemented as it is only useful as internal storage design for the lightclient side.
* add capella gossip boiler plate
* get everything compiling
Co-authored-by: realbigsean <sean@sigmaprime.io
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
* small cleanup
* small cleanup
* cargo fix + some test cleanup
* improve block production
* add fixme for potential panic
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
## Issue Addressed
This reverts commit ca9dc8e094 (PR #3559) with some modifications.
## Proposed Changes
Unfortunately that PR introduced a performance regression in fork choice. The optimisation _intended_ to build the exit and pubkey caches on the head state _only if_ they were not already built. However, due to the head state always being cloned without these caches, we ended up building them every time the head changed, leading to a ~70ms+ penalty on mainnet.
fcfd02aeec/beacon_node/beacon_chain/src/canonical_head.rs (L633-L636)
I believe this is a severe enough regression to justify immediately releasing v3.2.1 with this change.
## Additional Info
I didn't fully revert #3559, because there were some unrelated deletions of dead code in that PR which I figured we may as well keep.
An alternative would be to clone the extra caches, but this likely still imposes some cost, so in the interest of applying a conservative fix quickly, I think reversion is the best approach. The optimisation from #3559 was not even optimising a particularly significant path, it was mostly for VCs running larger numbers of inactive keys. We can re-do it in the `tree-states` world where cache clones are cheap.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2371
## Proposed Changes
Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`.
With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required.
## Additional Info
In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards.
There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).
## Issue Addressed
While digging around in some logs I noticed that queries for validators by pubkey were taking 10ms+, which seemed too long. This was due to a loop through the entire validator registry for each lookup.
## Proposed Changes
Rather than using a loop through the register, this PR utilises the pubkey cache which is usually initialised at the head*. In case the cache isn't built, we fall back to the previous loop logic. In the vast majority of cases I expect the cache will be built, as the validator client queries at the `head` where all caches should be built.
## Additional Info
*I had to modify the cache build that runs after fork choice to build the pubkey cache. I think it had been optimised out, perhaps accidentally. I think it's preferable to have the exit cache and the pubkey cache built on the head state, as they are required for verifying deposits and exits respectively, and we may as well build them off the hot path of block processing. Previously they'd get built the first time a deposit or exit needed to be verified.
I've deleted the unused `map_state` function which was obsoleted by `map_state_and_execution_optimistic`.
## Issue Addressed
#2847
## Proposed Changes
Add under a feature flag the required changes to subscribe to long lived subnets in a deterministic way
## Additional Info
There is an additional required change that is actually searching for peers using the prefix, but I find that it's best to make this change in the future
## Issue Addressed
fixes lints from the last rust release
## Proposed Changes
Fix the lints, most of the lints by `clippy::question-mark` are false positives in the form of https://github.com/rust-lang/rust-clippy/issues/9518 so it's allowed for now
## Additional Info
## Issue Addressed
https://github.com/ethereum/beacon-APIs/pull/222
## Proposed Changes
Update Lighthouse's randao verification API to match the `beacon-APIs` spec. We implemented the API before spec stabilisation, and it changed slightly in the course of review.
Rather than a flag `verify_randao` taking a boolean value, the new API uses a `skip_randao_verification` flag which takes no argument. The new spec also requires the randao reveal to be present and equal to the point-at-infinity when `skip_randao_verification` is set.
I've also updated the `POST /lighthouse/analysis/block_rewards` API to take blinded blocks as input, as the execution payload is irrelevant and we may want to assess blocks produced by builders.
## Additional Info
This is technically a breaking change, but seeing as I suspect I'm the only one using these parameters/APIs, I think we're OK to include this in a patch release.
## Issue Addressed
Fixes a potential regression in memory fragmentation identified by @paulhauner here: https://github.com/sigp/lighthouse/pull/3371#discussion_r931770045.
## Proposed Changes
Immediately allocate a vector with sufficient size to hold all decoded elements in SSZ decoding. The `size_hint` is derived from the range iterator here:
2983235650/consensus/ssz/src/decode/impls.rs (L489)
## Additional Info
I'd like to test this out on some infra for a substantial duration to see if it affects total fragmentation.
## Issue Addressed
NA
## Proposed Changes
I've noticed that our block hashing times increase significantly after the merge. I did some flamegraph-ing and noticed that we're allocating a `Vec` for each byte of each execution payload transaction. This seems like unnecessary work and a bit of a fragmentation risk.
This PR switches to `SmallVec<[u8; 32]>` for the packed encoding of `TreeHash`. I believe this is a nice simple optimisation with no downside.
### Benchmarking
These numbers were computed using #3580 on my desktop (i7 hex-core). You can see a bit of noise in the numbers, that's probably just my computer doing other things. Generally I found this change takes the time from 10-11ms to 8-9ms. I can also see all the allocations disappear from flamegraph.
This is the block being benchmarked: https://beaconcha.in/slot/4704236
#### Before
```
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 980: 10.553003ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 981: 10.563737ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 982: 10.646352ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 983: 10.628532ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 984: 10.552112ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 985: 10.587778ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 986: 10.640526ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 987: 10.587243ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 988: 10.554748ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 989: 10.551111ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 990: 11.559031ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 991: 11.944827ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 992: 10.554308ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 993: 11.043397ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 994: 11.043315ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 995: 11.207711ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 996: 11.056246ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 997: 11.049706ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 998: 11.432449ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 999: 11.149617ms
```
#### After
```
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 980: 14.011653ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 981: 8.925314ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 982: 8.849563ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 983: 8.893689ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 984: 8.902964ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 985: 8.942067ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 986: 8.907088ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 987: 9.346101ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 988: 8.96142ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 989: 9.366437ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 990: 9.809334ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 991: 9.541561ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 992: 11.143518ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 993: 10.821181ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 994: 9.855973ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 995: 10.941006ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 996: 9.596155ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 997: 9.121739ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 998: 9.090019ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 999: 9.071885ms
```
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
NA
## Proposed Changes
I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.
This PR modifies the shuffling cache so that each entry can be either:
- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)
Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:
1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.
In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.
## Additional Info
NA
## Issue Addressed
Add a flag that can increase count unrealized strictness, defaults to false
## Proposed Changes
Please list or describe the changes introduced by this PR.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: sean <seananderson33@gmail.com>
## Issue Addressed
When requesting an index which is not active during `start_epoch`, Lighthouse returns:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
{
"code": 500,
"message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))",
"stacktraces": []
}
```
This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing.
## Proposed Changes
In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`:
```
-> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
[
{
"index": 999999999,
"epochs": {
"100000": {
"active": false,
"head": false,
"target": false,
"source": false
}
}
}
]
```
By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`.
## Additional Info
Note that this error only occurs for epochs after the Altair hard fork.
## Issue Addressed
NA
## Proposed Changes
Adds more `debug` logging to help troubleshoot invalid execution payload blocks. I was doing some of this recently and found it to be challenging.
With this PR we should be able to grep `Invalid execution payload` and get one-liners that will show the block, slot and details about the proposer.
I also changed the log in `process_invalid_execution_payload` since it was a little misleading; the `block_root` wasn't necessary the block which had an invalid payload.
## Additional Info
NA