* introduce availability pending block
* add intoavailableblock trait
* small fixes
* add 'gossip blob cache' and start to clean up processing and transition types
* shard memory blob cache
* Initial commit
* Fix after rebase
* Add gossip verification conditions
* cache cleanup
* general chaos
* extended chaos
* cargo fmt
* more progress
* more progress
* tons of changes, just tryna compile
* everything, everywhere, all at once
* Reprocess an ExecutedBlock on unavailable blobs
* Add sus gossip verification for blobs
* Merge stuff
* Remove reprocessing cache stuff
* lint
* Add a wrapper to allow construction of only valid `AvailableBlock`s
* rename blob arc list to blob list
* merge cleanuo
* Revert "merge cleanuo"
This reverts commit 5e98326878c77528d0c4668c5a4db4a4b0fbaeaa.
* Revert "Revert "merge cleanuo""
This reverts commit 3a4009443a5812b3028abe855079307436dc5419.
* fix rpc methods
* move beacon block and blob to eth2/types
* rename gossip blob cache to data availability checker
* lots of changes
* fix some compilation issues
* fix compilation issues
* fix compilation issues
* fix compilation issues
* fix compilation issues
* fix compilation issues
* cargo fmt
* use a common data structure for block import types
* fix availability check on proposal import
* refactor the blob cache and split the block wrapper into two types
* add type conversion for signed block and block wrapper
* fix beacon chain tests and do some renaming, add some comments
* Partial processing (#4)
* move beacon block and blob to eth2/types
* rename gossip blob cache to data availability checker
* lots of changes
* fix some compilation issues
* fix compilation issues
* fix compilation issues
* fix compilation issues
* fix compilation issues
* fix compilation issues
* cargo fmt
* use a common data structure for block import types
* fix availability check on proposal import
* refactor the blob cache and split the block wrapper into two types
* add type conversion for signed block and block wrapper
* fix beacon chain tests and do some renaming, add some comments
* cargo update (#6)
---------
Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: realbigsean <seananderson33@gmail.com>
* Update get blobs endpoint to return BlobSidecarList
* Update code comment
* Update blob retrieval to return BlobSidecarList without Arc
* Remove usage of BlobSidecarList type alias to avoid code conflicts
* Add clippy allow exception
This PR enables the user to adjust the shuffling cache size.
This is useful for some HTTP API requests which require re-computing old shufflings. This PR currently optimizes the
beacon/states/{state_id}/committees HTTP API by first checking the cache before re-building shuffling.
If the shuffling is set to a non-default value, then the HTTP API request will also fill the cache when as it constructs new shufflings.
If the CLI flag is not present or the value is set to the default of 16 the default behaviour is observed.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
When producing a block from a builder, there are two points where we could consider the block "broadcast":
1. When the blinded block is published to the builder.
2. When the un-blinded block is published to the P2P network (this is always *after* the previous step).
Our logging for late block broadcasts was using (2) for builder-blocks, which was creating a lot of false-positive logs. This is because the builder publishes the block on the P2P network themselves before returning it to us and we perform (2). For clarity, the logs were false-positives because we claim that the block was published late by us when it was actually published earlier by the builder.
This PR changes our logging behavior so we do our logging at (1) instead. It also updates our metrics for block broadcast to distinguish between local and builder blocks. I believe the metrics change will be natively compatible with existing Grafana dashboards.
## Additional Info
One could argue that the builder *should* return the block to us faster, however that's not the case. I think it's more important that we don't desensitize users with false-positives.
## Issue Addressed
Add support for ipv6 and dual stack in lighthouse.
## Proposed Changes
From an user perspective, now setting an ipv6 address, optionally configuring the ports should feel exactly the same as using an ipv4 address. If listening over both ipv4 and ipv6 then the user needs to:
- use the `--listen-address` two times (ipv4 and ipv6 addresses)
- `--port6` becomes then required
- `--discovery-port6` can now be used to additionally configure the ipv6 udp port
### Rough list of code changes
- Discovery:
- Table filter and ip mode set to match the listening config.
- Ipv6 address, tcp port and udp port set in the ENR builder
- Reported addresses now check which tcp port to give to libp2p
- LH Network Service:
- Can listen over Ipv6, Ipv4, or both. This uses two sockets. Using mapped addresses is disabled from libp2p and it's the most compatible option.
- NetworkGlobals:
- No longer stores udp port since was not used at all. Instead, stores the Ipv4 and Ipv6 TCP ports.
- NetworkConfig:
- Update names to make it clear that previous udp and tcp ports in ENR were Ipv4
- Add fields to configure Ipv6 udp and tcp ports in the ENR
- Include advertised enr Ipv6 address.
- Add type to model Listening address that's either Ipv4, Ipv6 or both. A listening address includes the ip, udp port and tcp port.
- UPnP:
- Kept only for ipv4
- Cli flags:
- `--listen-addresses` now can take up to two values
- `--port` will apply to ipv4 or ipv6 if only one listening address is given. If two listening addresses are given it will apply only to Ipv4.
- `--port6` New flag required when listening over ipv4 and ipv6 that applies exclusively to Ipv6.
- `--discovery-port` will now apply to ipv4 and ipv6 if only one listening address is given.
- `--discovery-port6` New flag to configure the individual udp port of ipv6 if listening over both ipv4 and ipv6.
- `--enr-udp-port` Updated docs to specify that it only applies to ipv4. This is an old behaviour.
- `--enr-udp6-port` Added to configure the enr udp6 field.
- `--enr-tcp-port` Updated docs to specify that it only applies to ipv4. This is an old behaviour.
- `--enr-tcp6-port` Added to configure the enr tcp6 field.
- `--enr-addresses` now can take two values.
- `--enr-match` updated behaviour.
- Common:
- rename `unused_port` functions to specify that they are over ipv4.
- add functions to get unused ports over ipv6.
- Testing binaries
- Updated code to reflect network config changes and unused_port changes.
## Additional Info
TODOs:
- use two sockets in discovery. I'll get back to this and it's on https://github.com/sigp/discv5/pull/160
- lcli allow listening over two sockets in generate_bootnodes_enr
- add at least one smoke flag for ipv6 (I have tested this and works for me)
- update the book
## Proposed Changes
Two tiny updates to satisfy Clippy 1.68
Plus refactoring of the `http_api` into less complex types so the compiler can chew and digest them more easily.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
* wip
* fix router
* arc the byroot responses we send
* add placeholder for blob verification
* respond to blobs by range and blobs by root request in the most horrible and gross way ever
* everything in sync is now unimplemented
* fix compiation issues
* http_pi change is very small, just add it
* remove ctrl-c ctrl-v's docs
## Issue Addressed
Closes#3896Closes#3998Closes#3700
## Proposed Changes
- Optimise the calculation of withdrawals for payload attributes by avoiding state clones, avoiding unnecessary state advances and reading from the snapshot cache if possible.
- Use the execution layer's payload attributes cache to avoid re-calculating payload attributes. I actually implemented a new LRU cache just for withdrawals but it had the exact same key and most of the same data as the existing payload attributes cache, so I deleted it.
- Add a new SSE event that fires when payloadAttributes are calculated. This is useful for block builders, a la https://github.com/ethereum/beacon-APIs/issues/244.
- Add a new CLI flag `--always-prepare-payload` which forces payload attributes to be sent with every fcU regardless of connected proposers. This is intended for use by builders/relays.
For maximum effect, the flags I've been using to run Lighthouse in "payload builder mode" are:
```
--always-prepare-payload \
--prepare-payload-lookahead 12000 \
--suggested-fee-recipient 0x0000000000000000000000000000000000000000
```
The fee recipient is required so Lighthouse has something to pack in the payload attributes (it can be ignored by the builder). The lookahead causes fcU to be sent at the start of every slot rather than at 8s. As usual, fcU will also be sent after each change of head block. I think this combination is sufficient for builders to build on all viable heads. Often there will be two fcU (and two payload attributes) sent for the same slot: one sent at the start of the slot with the head from `n - 1` as the parent, and one sent after the block arrives with `n` as the parent.
Example usage of the new event stream:
```bash
curl -N "http://localhost:5052/eth/v1/events?topics=payload_attributes"
```
## Additional Info
- [x] Tests added by updating the proposer re-org tests. This has the benefit of testing the proposer re-org code paths with withdrawals too, confirming that the new changes don't interact poorly.
- [ ] Benchmarking with `blockdreamer` on devnet-7 showed promising results but I'm yet to do a comparison to `unstable`.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Cleans up all the remnants of 4844 in capella. This makes sure when 4844 is reviewed there is nothing we are missing because it got included here
## Proposed Changes
drop a bomb on every 4844 thing
## Additional Info
Merge process I did (locally) is as follows:
- squash merge to produce one commit
- in new branch off unstable with the squashed commit create a `git revert HEAD` commit
- merge that new branch onto 4844 with `--strategy ours`
- compare local 4844 to remote 4844 and make sure the diff is empty
- enjoy
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
Adds two new `DEBG` logs to the HTTP API:
1. As soon as we are requested to produce a block.
2. As soon as a signed block is received.
In #3858 we added some very helpful logs to the VC so we could see when things are happening with block proposals in the VC. After doing some more debugging, I found that I can tell when the VC is sending a block but I *can't* tell the time that the BN receives it (I can only get the time after the BN has started doing some work with the block). Knowing when the VC published and as soon as the BN receives is useful for determining the delays introduced by network latency (and some other things like JSON decoding, etc).
## Additional Info
NA
## Issue Addressed
#3804
## Proposed Changes
- Add `total_balance` to the validator monitor and adjust the number of historical epochs which are cached.
- Allow certain values in the cache to be served out via the HTTP API without requiring a state read.
## Usage
```
curl -X POST "http://localhost:5052/lighthouse/ui/validator_info" -d '{"indices": [0]}' -H "Content-Type: application/json" | jq
```
```
{
"data": {
"validators": {
"0": {
"info": [
{
"epoch": 172981,
"total_balance": 36566388519
},
...
{
"epoch": 172990,
"total_balance": 36566496513
}
]
},
"1": {
"info": [
{
"epoch": 172981,
"total_balance": 36355797968
},
...
{
"epoch": 172990,
"total_balance": 36355905962
}
]
}
}
}
}
```
## Additional Info
This requires no historical states to operate which mean it will still function on the freshly checkpoint synced node, however because of this, the values will populate each epoch (up to a maximum of 10 entries).
Another benefit of this method, is that we can easily cache any other values which would normally require a state read and serve them via the same endpoint. However, we would need be cautious about not overly increasing block processing time by caching values from complex computations.
This also caches some of the validator metrics directly, rather than pulling them from the Prometheus metrics when the API is called. This means when the validator count exceeds the individual monitor threshold, the cached values will still be available.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
Our `ERRO` stream has been rather noisy since the merge due to some unexpected behaviours of builders and EEs. Now that we've been running post-merge for a while, I think we can drop some of these `ERRO` to `WARN` so we're not "crying wolf".
The modified logs are:
#### `ERRO Execution engine call failed`
I'm seeing this quite frequently on Geth nodes. They seem to timeout when they're busy and it rarely indicates a serious issue. We also have logging across block import, fork choice updating and payload production that raise `ERRO` or `CRIT` when the EE times out, so I think we're not at risk of silencing actual issues.
#### `ERRO "Builder failed to reveal payload"`
In #3775 we reduced this log from `CRIT` to `ERRO` since it's common for builders to fail to reveal the block to the producer directly whilst still broadcasting it to the networ. I think it's worth dropping this to `WARN` since it's rarely interesting.
I elected to stay with `WARN` since I really do wish builders would fulfill their API promises by returning the block to us. Perhaps I'm just being pedantic here, I could be convinced otherwise.
#### `ERRO "Relay error when registering validator(s)"`
It seems like builders and/or mev-boost struggle to handle heavy loads of validator registrations. I haven't observed issues with validators not actually being registered, but I see timeouts on these endpoints many times a day. It doesn't seem like this `ERRO` is worth it.
#### `ERRO Error fetching block for peer ExecutionLayerErrorPayloadReconstruction`
This means we failed to respond to a peer on the P2P network with a block they requested because of an error in the `execution_layer`. It's very common to see timeouts or incomplete responses on this endpoint whilst the EE is busy and I don't think it's important enough for an `ERRO`. As long as the peer count stays high, I don't think the user needs to be actively concerned about how we're responding to peers.
## Additional Info
NA
* Add first efforts at broadcast
* Tidy
* Move broadcast code to client
* Progress with broadcast impl
* Rename to address change
* Fix compile errors
* Use `while` loop
* Tidy
* Flip broadcast condition
* Switch to forgetting individual indices
* Always broadcast when the node starts
* Refactor into two functions
* Add testing
* Add another test
* Tidy, add more testing
* Tidy
* Add test, rename enum
* Rename enum again
* Tidy
* Break loop early
* Add V15 schema migration
* Bump schema version
* Progress with migration
* Update beacon_node/client/src/address_change_broadcast.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Fix typo in function name
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Use Local Payload if More Profitable than Builder
* Rename clone -> clone_from_ref
* Minimize Clones of GetPayloadResponse
* Cleanup & Fix Tests
* Added Tests for Payload Choice by Profit
* Fix Outdated Comments
* Import BLS to execution changes before Capella
* Test for BLS to execution change HTTP API
* Pack BLS to execution changes in LIFO order
* Remove unused var
* Clippy
Issue #3112
Add `Filter::recover` to the GET chain to handle rejections specifically as 404 NOT FOUND
Making a request to `http://localhost:5052/not_real` now returns the following:
```
{
"code": 404,
"message": "NOT_FOUND",
"stacktraces": []
}
```
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Import BLS to execution changes before Capella
* Test for BLS to execution change HTTP API
* Pack BLS to execution changes in LIFO order
* Remove unused var
* Clippy
## Issue Addressed
Issue #3112
## Proposed Changes
Add `Filter::recover` to the GET chain to handle rejections specifically as 404 NOT FOUND
## Additional Info
Making a request to `http://localhost:5052/not_real` now returns the following:
```
{
"code": 404,
"message": "NOT_FOUND",
"stacktraces": []
}
```
Co-authored-by: Paul Hauner <paul@paulhauner.com>
I've needed to do this work in order to do some episub testing.
This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.
Co-authored-by: Diva M <divma@protonmail.com>
## Proposed Changes
With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks.
This PR adds three flags to the BN to control this behaviour:
* `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default).
* `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled.
* `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally.
For safety Lighthouse will only attempt a re-org under very specific conditions:
1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`.
2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes.
3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense.
4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost.
## Additional Info
For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004
There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
## Issue Addressed
NA
## Proposed Changes
In #3725 I introduced a `CRIT` log for unrevealed payloads, against @michaelsproul's [advice](https://github.com/sigp/lighthouse/pull/3725#discussion_r1034142113). After being woken up in the middle of the night by a block that was not revealed to the BN but *was* revealed to the network, I have capitulated. This PR implements @michaelsproul's suggestion and reduces the severity to `ERRO`.
Additionally, I have dropped a `CRIT` to an `ERRO` for when a block is published late. The block in question was indeed published late on the network, however now that we have builders that can slow down block production I don't think the error is "actionable" enough to warrant a `CRIT` for the user.
## Additional Info
NA
## Issue Addressed
#3724
## Proposed Changes
Exposes certain `validator_monitor` as an endpoint on the HTTP API. Will only return metrics for validators which are actively being monitored.
### Usage
```bash
curl -X GET "http://localhost:5052/lighthouse/ui/validator_metrics" -H "accept: application/json" | jq
```
```json
{
"data": {
"validators": {
"12345": {
"attestation_hits": 10,
"attestation_misses": 0,
"attestation_hit_percentage": 100,
"attestation_head_hits": 10,
"attestation_head_misses": 0,
"attestation_head_hit_percentage": 100,
"attestation_target_hits": 5,
"attestation_target_misses": 5,
"attestation_target_hit_percentage": 50
}
}
}
}
```
## Additional Info
Based on #3756 which should be merged first.
* Add API endpoint to count statuses of all validators (#3756)
* Delete DB schema migrations for v11 and earlier (#3761)
Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
#3724
## Proposed Changes
Adds an endpoint to quickly count the number of occurances of each status in the validator set.
## Usage
```bash
curl -X GET "http://localhost:5052/lighthouse/ui/validator_count" -H "accept: application/json" | jq
```
```json
{
"data": {
"active_ongoing":479508,
"active_exiting":0,
"active_slashed":0,
"pending_initialized":28,
"pending_queued":0,
"withdrawal_possible":933,
"withdrawal_done":0,
"exited_unslashed":0,
"exited_slashed":3
}
}
```
## Issue Addressed
#3704
## Proposed Changes
Adds is_syncing_finalized: bool parameter for block verification functions. Sets the payload_verification_status to Optimistic if is_syncing_finalized is true. Uses SyncState in NetworkGlobals in BeaconProcessor to retrieve the syncing status.
## Additional Info
I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer.
This PR adds some health endpoints for the beacon node and the validator client.
Specifically it adds the endpoint:
`/lighthouse/ui/health`
These are not entirely stable yet. But provide a base for modification for our UI.
These also may have issues with various platforms and may need modification.
## Issue Addressed
NA
## Proposed Changes
Adds clarification to an error log when there is an error submitting a validator registration.
There seems to be a few cases where relays return errors during validator registration, including spurious timeouts and when a validator has been very recently activated/made pending.
Changing this log helps indicate that it's "just another registration error" rather than something more serious. I didn't drop this to a `WARN` since I still have hope we can eliminate these errors completely by chatting with relays and adjusting timeouts.
## Additional Info
NA
## Issue Addressed
New lints for rust 1.65
## Proposed Changes
Notable change is the identification or parameters that are only used in recursion
## Additional Info
na
## Summary
The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.
This has three benefits:
1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.
## More Details
Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting `DepositTreeSnapshot`
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)
## Other Considerations
I've changed the structure of the `SszDepositCache` so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
* add capella gossip boiler plate
* get everything compiling
Co-authored-by: realbigsean <sean@sigmaprime.io
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
* small cleanup
* small cleanup
* cargo fix + some test cleanup
* improve block production
* add fixme for potential panic
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
## Issue Addressed
This reverts commit ca9dc8e094 (PR #3559) with some modifications.
## Proposed Changes
Unfortunately that PR introduced a performance regression in fork choice. The optimisation _intended_ to build the exit and pubkey caches on the head state _only if_ they were not already built. However, due to the head state always being cloned without these caches, we ended up building them every time the head changed, leading to a ~70ms+ penalty on mainnet.
fcfd02aeec/beacon_node/beacon_chain/src/canonical_head.rs (L633-L636)
I believe this is a severe enough regression to justify immediately releasing v3.2.1 with this change.
## Additional Info
I didn't fully revert #3559, because there were some unrelated deletions of dead code in that PR which I figured we may as well keep.
An alternative would be to clone the extra caches, but this likely still imposes some cost, so in the interest of applying a conservative fix quickly, I think reversion is the best approach. The optimisation from #3559 was not even optimising a particularly significant path, it was mostly for VCs running larger numbers of inactive keys. We can re-do it in the `tree-states` world where cache clones are cheap.
## Issue Addressed
While digging around in some logs I noticed that queries for validators by pubkey were taking 10ms+, which seemed too long. This was due to a loop through the entire validator registry for each lookup.
## Proposed Changes
Rather than using a loop through the register, this PR utilises the pubkey cache which is usually initialised at the head*. In case the cache isn't built, we fall back to the previous loop logic. In the vast majority of cases I expect the cache will be built, as the validator client queries at the `head` where all caches should be built.
## Additional Info
*I had to modify the cache build that runs after fork choice to build the pubkey cache. I think it had been optimised out, perhaps accidentally. I think it's preferable to have the exit cache and the pubkey cache built on the head state, as they are required for verifying deposits and exits respectively, and we may as well build them off the hot path of block processing. Previously they'd get built the first time a deposit or exit needed to be verified.
I've deleted the unused `map_state` function which was obsoleted by `map_state_and_execution_optimistic`.
## Issue Addressed
N/A
## Proposed Changes
With https://github.com/sigp/lighthouse/pull/3214 we made it such that you can either have 1 auth endpoint or multiple non auth endpoints. Now that we are post merge on all networks (testnets and mainnet), we cannot progress a chain without a dedicated auth execution layer connection so there is no point in having a non-auth eth1-endpoint for syncing deposit cache.
This code removes all fallback related code in the eth1 service. We still keep the single non-auth endpoint since it's useful for testing.
## Additional Info
This removes all eth1 fallback related metrics that were relevant for the monitoring service, so we might need to change the api upstream.
## Issue Addressed
fixes lints from the last rust release
## Proposed Changes
Fix the lints, most of the lints by `clippy::question-mark` are false positives in the form of https://github.com/rust-lang/rust-clippy/issues/9518 so it's allowed for now
## Additional Info
## Issue Addressed
NA
## Proposed Changes
This PR removes duplicated block root computation.
Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`.
Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations *after* the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times.
## Additional Info
NA
## Issue Addressed
https://github.com/ethereum/beacon-APIs/pull/222
## Proposed Changes
Update Lighthouse's randao verification API to match the `beacon-APIs` spec. We implemented the API before spec stabilisation, and it changed slightly in the course of review.
Rather than a flag `verify_randao` taking a boolean value, the new API uses a `skip_randao_verification` flag which takes no argument. The new spec also requires the randao reveal to be present and equal to the point-at-infinity when `skip_randao_verification` is set.
I've also updated the `POST /lighthouse/analysis/block_rewards` API to take blinded blocks as input, as the execution payload is irrelevant and we may want to assess blocks produced by builders.
## Additional Info
This is technically a breaking change, but seeing as I suspect I'm the only one using these parameters/APIs, I think we're OK to include this in a patch release.
## Issue Addressed
Resolves https://github.com/sigp/lighthouse/issues/3517
## Proposed Changes
Adds a `--builder-profit-threshold <wei value>` flag to the BN. If an external payload's value field is less than this value, the local payload will be used. The value of the local payload will not be checked (it can't really be checked until the engine API is updated to support this).
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
When requesting an index which is not active during `start_epoch`, Lighthouse returns:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
{
"code": 500,
"message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))",
"stacktraces": []
}
```
This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing.
## Proposed Changes
In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`:
```
-> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
[
{
"index": 999999999,
"epochs": {
"100000": {
"active": false,
"head": false,
"target": false,
"source": false
}
}
}
]
```
By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`.
## Additional Info
Note that this error only occurs for epochs after the Altair hard fork.
## Issue Addressed
NA
## Proposed Changes
As we've seen on Prater, there seems to be a correlation between these messages
```
WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service
```
... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators.
After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first.
~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~
## Additional Info
- Currently being tested on Prater
## Proposed Changes
This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12).
Attestation packing is sped up by removing several inefficiencies:
- No more recalculation of `attesting_indices` during packing.
- No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`.
- No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra).
- No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release).
So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms.
Verification of attester slashings, proposer slashings and voluntary exits is fixed by:
- Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message.
- Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot.
This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f.
## Additional Info
The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s.
This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency.
Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.
## Issue Addressed
#3465
## Proposed Changes
Filter out any validator registrations for validators that are not `active` or `pending`. I'm adding this filtering the beacon node because all the information is readily available there. In other parts of the VC we are usually sending per-validator requests based on duties from the BN. And duties will only be provided for active validators so we don't have this type of filtering elsewhere in the VC.
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
N/A
## Proposed Changes
Fix clippy lints for latest rust version 1.63. I have allowed the [derive_partial_eq_without_eq](https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq) lint as satisfying this lint would result in more code that we might not want and I feel it's not required.
Happy to fix this lint across lighthouse if required though.
## Issue Addressed
Resolves#3388Resolves#2638
## Proposed Changes
- Return the `BellatrixPreset` on `/eth/v1/config/spec` by default.
- Allow users to opt out of this by providing `--http-spec-fork=altair` (unless there's a Bellatrix fork epoch set).
- Add the Altair constants from #2638 and make serving the constants non-optional (the `http-disable-legacy-spec` flag is deprecated).
- Modify the VC to only read the `Config` and not to log extra fields. This prevents it from having to muck around parsing the `ConfigAndPreset` fields it doesn't need.
## Additional Info
This change is backwards-compatible for the VC and the BN, but is marked as a breaking change for the removal of `--http-disable-legacy-spec`.
I tried making `Config` a `superstruct` too, but getting the automatic decoding to work was a huge pain and was going to require a lot of hacks, so I gave up in favour of keeping the default-based approach we have now.
## Issue Addressed
- Resolves#3266
## Proposed Changes
Return 200 OK rather than an error when a block, attestation or sync message is already known.
Presently, we will log return an error which causes a BN to go "offline" from the VCs perspective which causes the fallback mechanism to do work to try and avoid and upcheck offline nodes. This can be observed as instability in the `vc_beacon_nodes_available_count` metric.
The current behaviour also causes scary logs for the user. There's nothing to *actually* be concerned about when we see duplicate messages, this can happen on fallback systems (see code comments).
## Additional Info
NA
## Issue Addressed
Resolves https://github.com/sigp/lighthouse/issues/3394
Adds a check in `is_healthy` about whether the head is optimistic when choosing whether to use the builder network.
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
#3418
## Proposed Changes
- Remove `eth/v2/validator/blinded_blocks/{slot}` as this endpoint does not exist in the spec.
- Return `version` in the `eth/v1/validator/blinded_blocks/{slot}` endpoint.
## Additional Info
Since this removes the `v2` endpoint, this is *technically* a breaking change, but as this does not exist in the spec users may or may not be relying on this.
Depending on what we feel is appropriate, I'm happy to edit this so we keep the `v2` endpoint for now but simply bring the `v1` endpoint in line with `v2`.
## Issue Addressed
https://github.com/status-im/nimbus-eth2/issues/3930
## Proposed Changes
We can trivially support beacon nodes which do not provide the `is_optimistic` field by wrapping the field in an `Option`.
## Issue Addressed
NA
## Proposed Changes
This PR will make Lighthouse return blocks with invalid payloads via the API with `execution_optimistic = true`. This seems a bit awkward, however I think it's better than returning a 404 or some other error.
Let's consider the case where the only possible head is invalid (#3370 deals with this). In such a scenario all of the duties endpoints will start failing because the head is invalid. I think it would be better if the duties endpoints continue to work, because it's likely that even though the head is invalid the duties are still based upon valid blocks and we want the VC to have them cached. There's no risk to the VC here because we won't actually produce an attestation pointing to an invalid head.
Ultimately, I don't think it's particularly important for us to distinguish between optimistic and invalid blocks on the API. Neither should be trusted and the only *real* reason that we track this is so we can try and fork around the invalid blocks.
## Additional Info
- ~~Blocked on #3370~~
## Issue Addressed
https://github.com/sigp/lighthouse/issues/3091
Extends https://github.com/sigp/lighthouse/pull/3062, adding pre-bellatrix block support on blinded endpoints and allowing the normal proposal flow (local payload construction) on blinded endpoints. This resulted in better fallback logic because the VC will not have to switch endpoints on failure in the BN <> Builder API, the BN can just fallback immediately and without repeating block processing that it shouldn't need to. We can also keep VC fallback from the VC<>BN API's blinded endpoint to full endpoint.
## Proposed Changes
- Pre-bellatrix blocks on blinded endpoints
- Add a new `PayloadCache` to the execution layer
- Better fallback-from-builder logic
## Todos
- [x] Remove VC transition logic
- [x] Add logic to only enable builder flow after Merge transition finalization
- [x] Tests
- [x] Fix metrics
- [x] Rustdocs
Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network.
This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload.
## Reviewer Notes
The following changes are included:
1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process.
1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run.
- This should have practically no effect, since most callers were still continuing if recomputing the head failed.
- The outlier is that the API will return 200 rather than a 500 when fork choice fails.
1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.
## Issue Addressed
Resolves#3151
## Proposed Changes
When fetching duties for sync committee contributions, check the value of `execution_optimistic` of the head block from the BN and refuse to sign any sync committee messages `if execution_optimistic == true`.
## Additional Info
- Is backwards compatible with older BNs
- Finding a way to add test coverage for this would be prudent. Open to suggestions.
## Issue Addressed
As specified in the [Beacon Chain API specs](https://github.com/ethereum/beacon-APIs/blob/master/apis/node/syncing.yaml#L32-L35) we should return `is_optimistic` as part of the response to a query for the `eth/v1/node/syncing` endpoint.
## Proposed Changes
Compute the optimistic status of the head and add it to the `SyncingData` response.
## Issue Addressed
Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style.
Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: Mac L <mjladson@pm.me>
## Issue Addressed
#3031
## Proposed Changes
Updates the following API endpoints to conform with https://github.com/ethereum/beacon-APIs/pull/190 and https://github.com/ethereum/beacon-APIs/pull/196
- [x] `beacon/states/{state_id}/root`
- [x] `beacon/states/{state_id}/fork`
- [x] `beacon/states/{state_id}/finality_checkpoints`
- [x] `beacon/states/{state_id}/validators`
- [x] `beacon/states/{state_id}/validators/{validator_id}`
- [x] `beacon/states/{state_id}/validator_balances`
- [x] `beacon/states/{state_id}/committees`
- [x] `beacon/states/{state_id}/sync_committees`
- [x] `beacon/headers`
- [x] `beacon/headers/{block_id}`
- [x] `beacon/blocks/{block_id}`
- [x] `beacon/blocks/{block_id}/root`
- [x] `beacon/blocks/{block_id}/attestations`
- [x] `debug/beacon/states/{state_id}`
- [x] `debug/beacon/heads`
- [x] `validator/duties/attester/{epoch}`
- [x] `validator/duties/proposer/{epoch}`
- [x] `validator/duties/sync/{epoch}`
Updates the following Server-Sent Events:
- [x] `events?topics=head`
- [x] `events?topics=block`
- [x] `events?topics=finalized_checkpoint`
- [x] `events?topics=chain_reorg`
## Backwards Incompatible
There is a very minor breaking change with the way the API now handles requests to `beacon/blocks/{block_id}/root` and `beacon/states/{state_id}/root` when `block_id` or `state_id` is the `Root` variant of `BlockId` and `StateId` respectively.
Previously a request to a non-existent root would simply echo the root back to the requester:
```
curl "http://localhost:5052/eth/v1/beacon/states/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root"
{"data":{"root":"0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}}
```
Now it will return a `404`:
```
curl "http://localhost:5052/eth/v1/beacon/blocks/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root"
{"code":404,"message":"NOT_FOUND: beacon block with root 0xaaaa…aaaa","stacktraces":[]}
```
In addition to this is the block root `0x0000000000000000000000000000000000000000000000000000000000000000` previously would return the genesis block. It will now return a `404`:
```
curl "http://localhost:5052/eth/v1/beacon/blocks/0x0000000000000000000000000000000000000000000000000000000000000000"
{"code":404,"message":"NOT_FOUND: beacon block with root 0x0000…0000","stacktraces":[]}
```
## Additional Info
- `execution_optimistic` is always set, and will return `false` pre-Bellatrix. I am also open to the idea of doing something like `#[serde(skip_serializing_if = "Option::is_none")]`.
- The value of `execution_optimistic` is set to `false` where possible. Any computation that is reliant on the `head` will simply use the `ExecutionStatus` of the head (unless the head block is pre-Bellatrix).
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Follow up to https://github.com/sigp/lighthouse/pull/3290 that fixes a caching bug
## Proposed Changes
Build the committee cache for the new `POST /lighthouse/analysis/block_rewards` API. Due to an unusual quirk of the total active balance cache the API endpoint would sometimes fail after loading a state from disk which had a current epoch cache _but not_ a total active balance cache. This PR adds calls to build the caches immediately before they're required, and has been running smoothly with `blockdreamer` the last few days.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/3270
## Proposed Changes
Optimize the calculation of historic beacon committees in the HTTP API.
This is achieved by allowing committee caches to be constructed for historic epochs, and constructing these committee caches on the fly in the API. This is much faster than reconstructing the state at the requested epoch, which usually takes upwards of 20s, and sometimes minutes with SPRP=8192. The depth of the `randao_mixes` array allows us to look back 64K epochs/0.8 years from a single state, which is pretty awesome!
We always use the `state_id` provided by the caller, but will return a nice 400 error if the epoch requested is out of range for the state requested, e.g.
```bash
# Prater
curl "http://localhost:5052/eth/v1/beacon/states/3170304/committees?epoch=33538"
```
```json
{"code":400,"message":"BAD_REQUEST: epoch out of bounds, try state at slot 1081344","stacktraces":[]}
```
Queries will be fastest when aligned to `slot % SPRP == 0`, so the hint suggests a slot that is 0 mod 8192.
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
## Issue Addressed
Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint
## Proposed Changes
- Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN.
- TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17
- BN only sends VC registration data to builders on demand, but VC registration data *does update* the BN's prepare proposer cache and send an updated fcU to a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback. Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense.
- Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884
- I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo.
- Should application mask appear in the api?
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Proposed Changes
Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them.
Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.
## Proposed Changes
Expand the set of paths tracked by the HTTP API metrics to include all paths hit by the validator client.
These paths were only partially updated for Altair, so we were missing some of the sync committee and v2 APIs.
## Issue Addressed
Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878
## Proposed Changes
1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.
## Additional Info
### Block Proposal Accuracy
This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.
### Attestation Accuracy
This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.
### Why remove the call before the state advance?
If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).
### Performance
Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.
### Implementation
Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
# Description
Since the `TaskExecutor` currently requires a `Weak<Runtime>`, it's impossible to use it in an async test where the `Runtime` is created outside our scope. Whilst we *could* create a new `Runtime` instance inside the async test, dropping that `Runtime` would cause a panic (you can't drop a `Runtime` in an async context).
To address this issue, this PR creates the `enum Handle`, which supports either:
- A `Weak<Runtime>` (for use in our production code)
- A `Handle` to a runtime (for use in testing)
In theory, there should be no change to the behaviour of our production code (beyond some slightly different descriptions in HTTP 500 errors), or even our tests. If there is no change, you might ask *"why bother?"*. There are two PRs (#3070 and #3175) that are waiting on these fixes to introduce some new tests. Since we've added the EL to the `BeaconChain` (for the merge), we are now doing more async stuff in tests.
I've also added a `RuntimeExecutor` to the `BeaconChainTestHarness`. Whilst that's not immediately useful, it will become useful in the near future with all the new async testing.
Code simplifications using `Option`/`Result` combinators to make pattern-matches a tad simpler.
Opinions on these loosely held, happy to adjust in review.
Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).
## Proposed Changes
Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.
⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.
The main changes are:
- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
- `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
- `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
- I've tested manually that it works on Kiln, using Geth and Nethermind.
- This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
- We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).
## Additional Info
- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end.
I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid).
## Additional Info
- ~~Blocked on #3126~~
## Proposed Changes
I did some gardening 🌳 in our dependency tree:
- Remove duplicate versions of `warp` (git vs patch)
- Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`.
- Update MDBX (should resolve#3028). I tested and Lighthouse compiles on Windows 11 now.
- Restore `psutil` back to upstream
- Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7.
Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly.
## Additional Info
Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see https://github.com/rust-lang/cargo/issues/6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version.
I've also opened a PR to `blst` related to zeroize: https://github.com/supranational/blst/pull/111
## Proposed Changes
Increase the default `--slots-per-restore-point` to 8192 for a 4x reduction in freezer DB disk usage.
Existing nodes that use the previous default of 2048 will be left unchanged. Newly synced nodes (with or without checkpoint sync) will use the new 8192 default.
Long-term we could do away with the freezer DB entirely for validator-only nodes, but this change is much simpler and grants us some extra space in the short term. We can also roll it out gradually across our nodes by purging databases one by one, while keeping the Ansible config the same.
## Additional Info
We ignore a change from 2048 to 8192 if the user hasn't set the 8192 explicitly. We fire a debug log in the case where we do ignore:
```
DEBG Ignoring slots-per-restore-point config in favour of on-disk value, on_disk: 2048, config: 8192
```
## Issue Addressed
MEV boost compatibility
## Proposed Changes
See #2987
## Additional Info
This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20).
Additional TODO's and outstanding questions
- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Proposed Changes
Allow Lighthouse to speculatively create blocks via the `/eth/v1/validators/blocks` endpoint by optionally skipping the RANDAO verification that we introduced in #2740. When `verify_randao=false` is passed as a query parameter the `randao_reveal` is not required to be present, and if present will only be lightly checked (must be a valid BLS sig). If `verify_randao` is omitted it defaults to true and Lighthouse behaves exactly as it did previously, hence this PR is backwards-compatible.
I'd like to get this change into `unstable` pretty soon as I've got 3 projects building on top of it:
- [`blockdreamer`](https://github.com/michaelsproul/blockdreamer), which mocks block production every slot in order to fingerprint clients
- analysis of Lighthouse's block packing _optimality_, which uses `blockdreamer` to extract interesting instances of the attestation packing problem
- analysis of Lighthouse's block packing _performance_ (as in speed) on the `tree-states` branch
## Additional Info
Having tested `blockdreamer` with Prysm, Nimbus and Teku I noticed that none of them verify the randao signature on `/eth/v1/validator/blocks`. I plan to open a PR to the `beacon-APIs` repo anyway so that this parameter can be standardised in case the other clients add RANDAO verification by default in future.
## Issue Addressed
#3103
## Proposed Changes
Parse `http-address` and `metrics-address` as `IpAddr` for both the beacon node and validator client to support IPv6 addresses.
Also adjusts parsing of CORS origins to allow for IPv6 addresses.
## Usage
You can now set `http-address` and/or `metrics-address` flags to IPv6 addresses.
For example, the following:
`lighthouse bn --http --http-address :: --metrics --metrics-address ::1`
will expose the beacon node HTTP server on `[::]` (equivalent of `0.0.0.0` in IPv4) and the metrics HTTP server on `localhost` (the equivalent of `127.0.0.1` in IPv4)
The beacon node API can then be accessed by:
`curl "http://[server-ipv6-address]:5052/eth/v1/some_endpoint"`
And the metrics server api can be accessed by:
`curl "http://localhost:5054/metrics"` or by `curl "http://[::1]:5054/metrics"`
## Additional Info
On most Linux distributions the `v6only` flag is set to `false` by default (see the section for the `IPV6_V6ONLY` flag in https://www.man7.org/linux/man-pages/man7/ipv6.7.html) which means IPv4 connections will continue to function on a IPv6 address (providing it is appropriately mapped). This means that even if the Lighthouse API is running on `::` it is also possible to accept IPv4 connections.
However on Windows, this is not the case. The `v6only` flag is set to `true` so binding to `::` will only allow IPv6 connections.
## Issue Addressed
Resolves#2936
## Proposed Changes
Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance.
There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information.
Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code.
## Additional Info
- ~~Blocked on #2837~~
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
## Proposed Changes
Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.
## Additional Info
We need this PR to unblock CI.
## Issue Addressed
N/A
## Proposed Changes
Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs.
## Usage
### Request
```
curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732"
```
### Response
```
[
{
"slot": "1847360",
"block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f",
"proposer_info": {
"validator_index": 1686,
"graffiti": ""
},
"available_attestations": 7096,
"included_attestations": 6459,
"prior_skip_slots": 0
},
...
]
```
## Additional Info
This is notably different to the existing lcli code:
- Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method.
- Corrects the off-by-one #2878
- Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873
- Includes `prior_skip_slots`.
## Issue Addressed
Closes#2880
## Proposed Changes
Support requests to the next epoch in proposer_duties api.
## Additional Info
Implemented with skipping proposer cache for this case because the cache for the future epoch will be missed every new slot as dependent_root is changed and we don't want to "wash it out" by saving additional values.
## Issue Addressed
Lighthouse gossiping late messages
## Proposed Changes
Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected
Upgrade libp2p
## Additional Info
https://github.com/libp2p/rust-libp2p/issues/2497
## Issue Addressed
#2883
## Proposed Changes
* Added `suggested-fee-recipient` & `suggested-fee-recipient-file` flags to validator client (similar to graffiti / graffiti-file implementation).
* Added proposer preparation service to VC, which sends the fee-recipient of all known validators to the BN via [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api once per slot
* Added [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api endpoint and preparation data caching
* Added cleanup routine to remove cached proposer preparations when not updated for 2 epochs
## Additional Info
Changed the Implementation following the discussion in #2883.
Co-authored-by: pk910 <philipp@pk910.de>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Philipp K <philipp@pk910.de>
## Issue Addressed
N/A
## Proposed Changes
Add a HTTP API which can be used to compute the attestation performances of a validator (or all validators) over a discrete range of epochs.
Performances can be computed for a single validator, or for the global validator set.
## Usage
### Request
The API can be used as follows:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/{validator_index}?start_epoch=57730&end_epoch=57732"
```
Alternatively, to compute performances for the global validator set:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/global?start_epoch=57730&end_epoch=57732"
```
### Response
The response is JSON formatted as follows:
```
[
{
"index": 72,
"epochs": {
"57730": {
"active": true,
"head": false,
"target": false,
"source": false
},
"57731": {
"active": true,
"head": true,
"target": true,
"source": true,
"delay": 1
},
"57732": {
"active": true,
"head": true,
"target": true,
"source": true,
"delay": 1
},
}
}
]
```
> Note that the `"epochs"` are not guaranteed to be in ascending order.
## Additional Info
- This API is intended to be used in our upcoming validator analysis tooling (#2873) and will likely not be very useful for regular users. Some advanced users or block explorers may find this API useful however.
- The request range is limited to 100 epochs (since the range is inclusive and it also computes the `end_epoch` it's actually 101 epochs) to prevent Lighthouse using exceptionally large amounts of memory.
## Issues Addressed
Closes#2739Closes#2812
## Proposed Changes
Support the deserialization of query strings containing duplicate keys into their corresponding types.
As `warp` does not support this feature natively (as discussed in #2739), it relies on the external library [`serde_array_query`](https://github.com/sigp/serde_array_query) (written by @michaelsproul)
This is backwards compatible meaning that both of the following requests will produce the same output:
```
curl "http://localhost:5052/eth/v1/events?topics=head,block"
```
```
curl "http://localhost:5052/eth/v1/events?topics=head&topics=block"
```
## Additional Info
Certain error messages have changed slightly. This only affects endpoints which accept multiple values.
For example:
```
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```
is now
```
{"code":400,"message":"BAD_REQUEST: unable to parse query","stacktraces":[]}
```
The serve order of the endpoints `get_beacon_state_validators` and `get_beacon_state_validators_id` have flipped:
```rust
.or(get_beacon_state_validators_id.boxed())
.or(get_beacon_state_validators.boxed())
```
This is to ensure proper error messages when filter fallback occurs due to the use of the `and_then` filter.
## Future Work
- Cleanup / remove filter fallback behaviour by substituting `and_then` with `then` where appropriate.
- Add regression tests for HTTP API error messages.
## Credits
- @mooori for doing the ground work of investigating possible solutions within the existing Rust ecosystem.
- @michaelsproul for writing [`serde_array_query`](https://github.com/sigp/serde_array_query) and for helping debug the behaviour of the `warp` filter fallback leading to incorrect error messages.
## Issue Addressed
New rust lints
## Proposed Changes
- Boxing some enum variants
- removing some unused fields (is the validator lockfile unused? seemed so to me)
## Additional Info
- some error fields were marked as dead code but are logged out in areas
- left some dead fields in our ef test code because I assume they are useful for debugging?
Co-authored-by: realbigsean <seananderson33@gmail.com>
* Add payload verification status to fork choice
* Pass payload verification status to import_block
* Add valid back-propagation
* Add head safety status latch to API
* Remove ExecutionLayerStatus
* Add execution info to client notifier
* Update notifier logs
* Change use of "hash" to refer to beacon block
* Shutdown on invalid finalized block
* Tidy, add comments
* Fix failing FC tests
* Allow blocks with unsafe head
* Fix forkchoiceUpdate call on startup
* Reject some HTTP endpoints when EL is not ready
* Restrict more endpoints
* Add watchdog task
* Change scheduling
* Update to new schedule
* Add "syncing" concept
* Remove RequireSynced
* Add is_merge_complete to head_info
* Cache latest_head in Engines
* Call consensus_forkchoiceUpdate on startup
* Thread eth1_block_hash into interop genesis state
* Add merge-fork-epoch flag
* Build LH with minimal spec by default
* Add verbose logs to execution_layer
* Add --http-allow-sync-stalled flag
* Update lcli new-testnet to create genesis state
* Fix http test
* Fix compile errors in tests
## Issue Addressed
Part of a bigger effort to make the network globals read only. This moves all writes to the `PeerDB` to the `eth2_libp2p` crate. Limiting writes to the peer manager is a slightly more complicated issue for a next PR, to keep things reviewable.
## Proposed Changes
- Make the peers field in the globals a private field.
- Allow mutable access to the peers field to `eth2_libp2p` for now.
- Add a new network message to update the sync state.
Co-authored-by: Age Manning <Age@AgeManning.com>
This simply moves some functions that were "swarm notifications" to a network behaviour implementation.
Notes
------
- We could disconnect from the peer manager but we would lose the rpc shutdown message
- We still notify from the swarm since this is the most reliable way to get some events. Ugly but best for now
- Events need to be pushed with "add event" to wake the waker
Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
## Issue Addressed
I've done this change in a couple of WIPs already so I might as well submit it on its own. This changes no functionality but reduces coupling in a 0.0001%. It also helps new people who need to work in the peer manager to better understand what it actually needs from the outside
## Proposed Changes
Add a config to the peer manager
## Proposed Changes
* Add the `Eth-Consensus-Version` header to the HTTP API for the block and state endpoints. This is part of the v2.1.0 API that was recently released: https://github.com/ethereum/beacon-APIs/pull/170
* Add tests for the above. I refactored the `eth2` crate's helper functions to make this more straight-forward, and introduced some new mixin traits that I think greatly improve readability and flexibility.
* Add a new `map_with_fork!` macro which is useful for decoding a superstruct type without naming all its variants. It is now used for SSZ-decoding `BeaconBlock` and `BeaconState`, and for JSON-decoding `SignedBeaconBlock` in the API.
## Additional Info
The `map_with_fork!` changes will conflict with the Merge changes, but when resolving the conflict the changes from this branch should be preferred (it is no longer necessary to enumerate every fork). The merge fork _will_ need to be added to `map_fork_name_with`.
## Description
The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration.
As of this writing it currently houses the following high-level lighthouse-specific logic:
- Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings
- Integration and handling of ENRs with respect to libp2p and eth2
- Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers.
- Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc.
- Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api.
- Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks.
- Lighthouse specific types for managing gossipsub topics, sync status and ENR fields
- Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring
- Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p.
Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network`
Co-authored-by: Paul Hauner <paul@paulhauner.com>