## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
## Issue Addressed
This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets
## Proposed Changes
- Removes redundancy in "builders" (servers implementing the builder spec)
- Renames `payload-builder` flag to `builder`
- Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194)
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint
## Proposed Changes
- Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN.
- TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17
- BN only sends VC registration data to builders on demand, but VC registration data *does update* the BN's prepare proposer cache and send an updated fcU to a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback. Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense.
- Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884
- I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo.
- Should application mask appear in the api?
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Resolves#3069
## Proposed Changes
Unify the `eth1-endpoints` and `execution-endpoints` flags in a backwards compatible way as described in https://github.com/sigp/lighthouse/issues/3069#issuecomment-1134219221
Users have 2 options:
1. Use multiple non auth execution endpoints for deposit processing pre-merge
2. Use a single jwt authenticated execution endpoint for both execution layer and deposit processing post merge
Related https://github.com/sigp/lighthouse/issues/3118
To enable jwt authenticated deposit processing, this PR removes the calls to `net_version` as the `net` namespace is not exposed in the auth server in execution clients.
Moving away from using `networkId` is a good step in my opinion as it doesn't provide us with any added guarantees over `chainId`. See https://github.com/ethereum/consensus-specs/issues/2163 and https://github.com/sigp/lighthouse/issues/2115
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Failures in our CI integration tests for Geth.
## Proposed Changes
Only connect to the authenticated execution endpoints during execution tests.
This is necessary now that it is impossible to connect to the `engine` api on an unauthenticated endpoint.
See https://github.com/ethereum/go-ethereum/pull/24997
## Additional Info
As these tests break semi-regularly, I have kept logs enabled to ease future debugging.
I've also updated the Nethermind tests, although these weren't broken. This should future-proof us if Nethermind decides to follow suit with Geth
## Issue Addressed
N/A
## Proposed Changes
Preemptively switch Nethermind integration tests to use the `master` branch along with the baked in `kiln` config.
## Additional Info
There have been some spurious timeouts across CI so this also increases the timeout to 20s.
## Issue Addressed
This fixes the low-hanging Clippy lints introduced in Rust 1.61 (due any hour now). It _ignores_ one lint, because fixing it requires a structural refactor of the validator client that needs to be done delicately. I've started on that refactor and will create another PR that can be reviewed in more depth in the coming days. I think we should merge this PR in the meantime to unblock CI.
## Issue Addressed
Web3Signer validators do not support client authentication. This means the `--tls-known-clients-file` option on Web3Signer can't be used with Lighthouse.
## Proposed Changes
Add two new fields to Web3Signer validators, `client_identity_path` and `client_identity_password`, which specify the path and password for a PKCS12 file containing a certificate and private key. If `client_identity_path` is present, use the certificate for SSL client authentication.
## Additional Info
I am successfully validating on Prater using client authentication with Web3Signer and client authentication.
## Proposed Changes
Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.
⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.
The main changes are:
- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
- `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
- `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
- I've tested manually that it works on Kiln, using Geth and Nethermind.
- This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
- We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).
## Additional Info
- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Proposed Changes
Recently, changes to Nethermind's Kiln branch have broken our integration tests.
This PR updates the chainspec to Kiln to ensure proper compatibility.
## Issue Addressed
N/A
## Proposed Changes
https://github.com/sigp/lighthouse/pull/3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check.
This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`.
This PR fixes the issue by also accounting for forks earlier than current_fork in the rpc limits calculation in the `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and can the limits can be easily checked in a review.
Adds unit tests and modifies the syncing simulator to check the syncing from across fork boundaries.
The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).
## Proposed Changes
I did some gardening 🌳 in our dependency tree:
- Remove duplicate versions of `warp` (git vs patch)
- Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`.
- Update MDBX (should resolve#3028). I tested and Lighthouse compiles on Windows 11 now.
- Restore `psutil` back to upstream
- Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7.
Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly.
## Additional Info
Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see https://github.com/rust-lang/cargo/issues/6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version.
I've also opened a PR to `blst` related to zeroize: https://github.com/supranational/blst/pull/111
## Issue Addressed
MEV boost compatibility
## Proposed Changes
See #2987
## Additional Info
This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20).
Additional TODO's and outstanding questions
- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Resolves#2936
## Proposed Changes
Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance.
There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information.
Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code.
## Additional Info
- ~~Blocked on #2837~~
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
## Issue Addressed
Resolves#3015
## Proposed Changes
Add JWT token based authentication to engine api requests. The jwt secret key is read from the provided file and is used to sign tokens that are used for authenticated communication with the EL node.
- [x] Interop with geth (synced `merge-devnet-4` with the `merge-kiln-v2` branch on geth)
- [x] Interop with other EL clients (nethermind on `merge-devnet-4`)
- [x] ~Implement `zeroize` for jwt secrets~
- [x] Add auth server tests with `mock_execution_layer`
- [x] Get auth working with the `execution_engine_integration` tests
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
As discussed on last-night's consensus call, the testnets next week will target the [Kiln Spec v2](https://hackmd.io/@n0ble/kiln-spec).
Presently, we support Kiln V1. V2 is backwards compatible, except for renaming `random` to `prev_randao` in:
- https://github.com/ethereum/execution-apis/pull/180
- https://github.com/ethereum/consensus-specs/pull/2835
With this PR we'll no longer be compatible with the existing Kintsugi and Kiln testnets, however we'll be ready for the testnets next week. I raised this breaking change in the call last night, we are all keen to move forward and break things.
We now target the [`merge-kiln-v2`](https://github.com/MariusVanDerWijden/go-ethereum/tree/merge-kiln-v2) branch for interop with Geth. This required adding the `--http.aauthport` to the tester to avoid a port conflict at startup.
### Changes to exec integration tests
There's some change in the `merge-kiln-v2` version of Geth that means it can't compile on a vanilla Github runner. Bumping the `go` version on the runner solved this issue.
Whilst addressing this, I refactored the `testing/execution_integration` crate to be a *binary* rather than a *library* with tests. This means that we don't need to run the `build.rs` and build Geth whenever someone runs `make lint` or `make test-release`. This is nice for everyday users, but it's also nice for CI so that we can have a specific runner for these tests and we don't need to ensure *all* runners support everything required to build all execution clients.
## More Info
- [x] ~~EF tests are failing since the rename has broken some tests that reference the old field name. I have been told there will be new tests released in the coming days (25/02/22 or 26/02/22).~~
## Issue Addressed
#3006
## Proposed Changes
This PR changes the default behaviour of lighthouse to ignore discovered IPs that are not globally routable. It adds a CLI flag, --enable-local-discovery to permit the non-global IPs in discovery.
NOTE: We should take care in merging this as I will break current set-ups that rely on local IP discovery. I made this the non-default behaviour because we dont really want to be wasting resources attempting to connect to non-routable addresses and we dont want to propagate these to others (on the chance we can connect to one of these local nodes), improving discoveries efficiency.
## Issue Addressed
NA
## Proposed Changes
Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means:
- Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates.
- Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct.
- Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs.
- Adding testing framework for optimistic imports.
- Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between *beacon block roots* and *execution payload hashes*.
- Add `merge` to [`FORKS`](c3a793fd73/Makefile (L17)) in the `Makefile` to ensure we test the beacon chain with merge settings.
- Fix some tests here that were failing due to a missing execution layer.
## TODO
- [ ] Balance tests
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
## Proposed Changes
Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.
## Additional Info
We need this PR to unblock CI.
## Issue Addressed
NA
## Proposed Changes
This PR extends #3018 to address my review comments there and add automated integration tests with Geth (and other implementations, in the future).
I've also de-duplicated the "unused port" logic by creating an `common/unused_port` crate.
## Additional Info
I'm not sure if we want to merge this PR, or update #3018 and merge that. I don't mind, I'm primarily opening this PR to make sure CI works.
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
## Issue Addressed
Closes#3014
## Proposed Changes
- Rename `receipt_root` to `receipts_root`
- Rename `execute_payload` to `notify_new_payload`
- This is slightly weird since we modify everything except the actual HTTP call to the engine API. That change is expected to be implemented in #2985 (cc @ethDreamer)
- Enable "random" tests for Bellatrix.
## Notes
This will break *partially* compatibility with Kintusgi testnets in order to gain compatibility with [Kiln](https://hackmd.io/@n0ble/kiln-spec) testnets. I think it will only break the BN APIs due to the `receipts_root` change, however it might have some other effects too.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
N/A
## Proposed Changes
Removes all configurations and hard-coded rules related to the deprecated Pyrmont testnet.
## Additional Info
Pyrmont is deprecated/will be shut down after being used for scenario testing, this PR removes configurations related to it.
Co-authored-by: Zachinquarantine <zachinquarantine@yahoo.com>
## Issue Addressed
#2883
## Proposed Changes
* Added `suggested-fee-recipient` & `suggested-fee-recipient-file` flags to validator client (similar to graffiti / graffiti-file implementation).
* Added proposer preparation service to VC, which sends the fee-recipient of all known validators to the BN via [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api once per slot
* Added [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api endpoint and preparation data caching
* Added cleanup routine to remove cached proposer preparations when not updated for 2 epochs
## Additional Info
Changed the Implementation following the discussion in #2883.
Co-authored-by: pk910 <philipp@pk910.de>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Philipp K <philipp@pk910.de>
## Issue Addressed
Closes#2938
## Proposed Changes
* Build and publish images with a `-modern` suffix which enable CPU optimizations for modern hardware.
* Add docs for the plethora of available images!
* Unify all the Docker workflows in `docker.yml` (including for tagged releases).
## Additional Info
The `Dockerfile` is no longer used by our Docker Hub builds, as we use `cross` and a generic approach for ARM and x86. There's a new CI job `docker-build-from-source` which tests the `Dockerfile` without publishing anything.
## Proposed Changes
Change the canonical fork name for the merge to Bellatrix. Keep other merge naming the same to avoid churn.
I've also fixed and enabled the `fork` and `transition` tests for Bellatrix, and the v1.1.7 fork choice tests.
Additionally, the `BellatrixPreset` has been added with tests. It gets served via the `/config/spec` API endpoint along with the other presets.
## Issue Addressed
Automates a build and push to antithesis servers on merges to unstable. They run tests against lighthouse daily and have requested more frequent pushes. Currently we are just manually pushing stable images when we have a new release.
## Proposed Changes
- Add a `Dockerfile.libvoidstar`
- Add the `libvoidstar.so` binary
- Add a new workflow to autmatically build and push on merges to unstable
## Additional Info
Requires adding the following secrets
-`ANTITHESIS_USERNAME`
-`ANTITHESIS_PASSWORD`
-`ANTITHESIS_REPOSITORY`
-`ANTITHESIS_SERVER`
Tested here: https://github.com/realbigsean/lighthouse/actions/runs/1612821446
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Proposed Changes
Update `superstruct` to bring in @realbigsean's fixes necessary for MEV-compatible private beacon block types (a la #2795).
The refactoring is due to another change in superstruct that allows partial getters to be auto-generated.
## Issue Addressed
Successor to #2431
## Proposed Changes
* Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing.
* Add a variant of the forwards state root iterator that does not require an `end_state`.
* Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
* Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431).
Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).
## Additional Info
Required by https://github.com/sigp/lighthouse/pull/2628
## Issue Addressed
Resolves: https://github.com/sigp/lighthouse/issues/2741
Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller
## Proposed Changes
- Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration.
- Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db.
- updates related to https://github.com/ethereum/consensus-specs/pull/2727
- We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one.
- AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that
- I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting.
- This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730
## Additional Info
I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727
It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch.
Todo:
- [x] Fix fork choice tests
- [x] Self review
- [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727
- [x] Rebase onto Kintusgi
- [x] Fix `num_active_validators` calculation as @michaelsproul pointed out
- [x] Clean up db migrations
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
N/A
## Proposed Changes
Changes required for the `merge-devnet-3`. Added some more non substantive renames on top of @realbigsean 's commit.
Note: this doesn't include the proposer boosting changes in kintsugi v3.
This devnet isn't running with the proposer boosting fork choice changes so if we are looking to merge https://github.com/sigp/lighthouse/pull/2822 into `unstable`, then I think we should just maintain this branch for the devnet temporarily.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
New rust lints
## Proposed Changes
- Boxing some enum variants
- removing some unused fields (is the validator lockfile unused? seemed so to me)
## Additional Info
- some error fields were marked as dead code but are logged out in areas
- left some dead fields in our ef test code because I assume they are useful for debugging?
Co-authored-by: realbigsean <seananderson33@gmail.com>
* Fix fork choice after rebase
* Remove paulhauner warp dep
* Fix fork choice test compile errors
* Assume fork choice payloads are valid
* Add comment
* Ignore new tests
* Fix error in test skipping
* Fix arbitrary check kintsugi
* Add merge chain spec fields, and a function to determine which constant to use based on the state variant
* increment spec test version
* Remove `Transaction` enum wrapper
* Remove Transaction new-type
* Remove gas validations
* Add `--terminal-block-hash-epoch-override` flag
* Increment spec tests version to 1.1.5
* Remove extraneous gossip verification https://github.com/ethereum/consensus-specs/pull/2687
* - Remove unused Error variants
- Require both "terminal-block-hash-epoch-override" and "terminal-block-hash-override" when either flag is used
* - Remove a couple more unused Error variants
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* update initializing from eth1 for merge genesis
* read execution payload header from file lcli
* add `create-payload-header` command to `lcli`
* fix base fee parsing
* Apply suggestions from code review
* default `execution_payload_header` bool to false when deserializing `meta.yml` in EF tests
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Added Execution Payload from Rayonism Fork
Updated new Containers to match Merge Spec
Updated BeaconBlockBody for Merge Spec
Completed updating BeaconState and BeaconBlockBody
Modified ExecutionPayload<T> to use Transaction<T>
Mostly Finished Changes for beacon-chain.md
Added some things for fork-choice.md
Update to match new fork-choice.md/fork.md changes
ran cargo fmt
Added Missing Pieces in eth2_libp2p for Merge
fix ef test
Various Changes to Conform Closer to Merge Spec
## Issue Addressed
Closes#1996
## Proposed Changes
Run a second `Logger` via `sloggers` which logs to a file in the background with:
- separate `debug-level` for background and terminal logging
- the ability to limit log size
- rotation through a customizable number of log files
- an option to compress old log files (`.gz` format)
Add the following new CLI flags:
- `--logfile-debug-level`: The debug level of the log files
- `--logfile-max-size`: The maximum size of each log file
- `--logfile-max-number`: The number of old log files to store
- `--logfile-compress`: Whether to compress old log files
By default background logging uses the `debug` log level and saves logfiles to:
- Beacon Node: `$HOME/.lighthouse/$network/beacon/logs/beacon.log`
- Validator Client: `$HOME/.lighthouse/$network/validators/logs/validator.log`
Or, when using the `--datadir` flag:
`$datadir/beacon/logs/beacon.log` and `$datadir/validators/logs/validator.log`
Once rotated, old logs are stored like so: `beacon.log.1`, `beacon.log.2` etc.
> Note: `beacon.log.1` is always newer than `beacon.log.2`.
## Additional Info
Currently the default value of `--logfile-max-size` is 200 (MB) and `--logfile-max-number` is 5.
This means that the maximum storage space that the logs will take up by default is 1.2GB.
(200MB x 5 from old log files + <200MB the current logfile being written to)
Happy to adjust these default values to whatever people think is appropriate.
It's also worth noting that when logging to a file, we lose our custom `slog` formatting. This means the logfile logs look like this:
```
Oct 27 16:02:50.305 INFO Lighthouse started, version: Lighthouse/v2.0.1-8edd9d4+, module: lighthouse:413
Oct 27 16:02:50.305 INFO Configured for network, name: prater, module: lighthouse:414
```
## Issue Addressed
Resolves#2775
## Proposed Changes
1. Reduces the target_peers to stop recursive discovery requests.
2. Changes the eth1 `auto_update_interval` config parameter to be equal to the eth1 block time in the simulator. Without this, the eth1 `latest_cached_block` wouldn't be updated for some slots around `eth1_voting_period` boundaries which would lead to the eth1 cache falling out of sync. This is what caused the `Syncing eth1 block cache` and `No valid eth1_data votes` logs.
## Issue Addressed
Resolves#2545
## Proposed Changes
Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too.
During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset.
## Failing Cases
There is one failing case which is presently marked as `SkippedKnownFailure`:
```
case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual:
head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } | Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 }
```
This failure is due to #2741. It's not a particularly high priority issue at the moment, so we fix it after merging this PR.
## Issue Addressed
Currently, running the Web3Signer tests locally without having a java runtime environment installed and available on your PATH will result in the tests failing.
## Proposed Changes
Add a note regarding the Web3Signer tests being dependent on java (similar to what we have for `ganache-cli`)
## Issue Addressed
e895074ba updated the altair fork and now that we are a week away this test no longer panics.
## Proposed Changes
Remove the expected panic and explanatory note.
## Issue Addressed
When compiling with Rust 1.56.0 the compiler generates 3 instances of this warning:
```
warning: trailing semicolon in macro used in expression position
--> common/eth2_network_config/src/lib.rs:181:24
|
181 | })?;
| ^
...
195 | let deposit_contract_deploy_block = load_from_file!(DEPLOY_BLOCK_FILE);
| ---------------------------------- in this macro invocation
|
= note: `#[warn(semicolon_in_expressions_from_macros)]` on by default
= warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
= note: for more information, see issue #79813 <https://github.com/rust-lang/rust/issues/79813>
= note: this warning originates in the macro `load_from_file` (in Nightly builds, run with -Z macro-backtrace for more info)
```
This warning is completely harmless, but will be visible to users compiling Lighthouse v2.0.1 (or earlier) with Rust 1.56.0 (to be released October 21st). It is **completely safe** to ignore this warning, it's just a superficial change to Rust's syntax.
## Proposed Changes
This PR removes the semi-colon as recommended, and fixes the new Clippy lints from 1.56.0
## Issue Addressed
NA
## Proposed Changes
This is a wholesale rip-off of #2708, see that PR for more of a description.
I've made this PR since @realbigsean is offline and I can't merge his PR due to Github's frustrating `target-branch-check` bug. I also changed the branch to `unstable`, since I'm trying to minimize the diff between `merge-f2f`/`unstable`. I'll just rebase `merge-f2f` onto `unstable` after this PR merges.
When running `make lint` I noticed the following warning:
```
warning: patch for `fixed-hash` uses the features mechanism. default-features and features will not take effect because the patch dependency does not support this mechanism
```
So, I removed the `features` section from the patch.
## Additional Info
NA
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
NA
## Proposed Changes
This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning.
I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs.
## Additional Info
NA
## Issue Addressed
Fix#2585
## Proposed Changes
Provide a canonical version of test_logger that can be used
throughout lighthouse.
## Additional Info
This allows tests to conditionally emit logging data by adding
test_logger as the default logger. And then when executing
`cargo test --features logging/test_logger` log output
will be visible:
wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
$ cargo test --features logging/test_logger
Finished test [unoptimized + debuginfo] target(s) in 0.02s
Running unittests (target/debug/deps/test_logger-e20115db6a5e3714)
running 1 test
Sep 10 12:53:45.212 INFO hi, module: test_logger:8
test tests::test_fn_with_logging ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests test-logger
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Or, in normal scenarios where logging isn't needed, executing
`cargo test` the log output will not be visible:
wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.02s
Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a)
running 1 test
test tests::test_fn_with_logging ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests test-logger
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
## Issue Addressed
NA
## Proposed Changes
Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enums`:
https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union
The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.
### Crate Updates
This PR will (hopefully) cause CI to publish new versions for the following crates:
- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`
These these crates depend on each other, I've had to add a workspace-level `[patch]` for these crates. A follow-up PR will need to remove this patch, ones the new versions are published.
### Union Behaviors
We already had SSZ `Encode` and `TreeHash` derive for enums, however it just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec is in conflict with the transparent method, I've required that all `enum` have exactly one of the following enum-level attributes:
#### SSZ
- `#[ssz(enum_behaviour = "union")]`
- matches the spec used for the merge
- `#[ssz(enum_behaviour = "transparent")]`
- maintains existing functionality
- not supported for `Decode` (never was)
#### TreeHash
- `#[tree_hash(enum_behaviour = "union")]`
- matches the spec used for the merge
- `#[tree_hash(enum_behaviour = "transparent")]`
- maintains existing functionality
This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.
### Legacy Option Encoding
Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the *old* SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector.
Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migrate script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migrate script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.
The `ssz::legacy::four_byte_impl!` macro allows a one-liner to define the module required for the `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.
Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.
### Crate Publish Ordering
I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon.
## TODO
- [ ] Queue a follow-up `[patch]`-removing PR.
[EIP-3030]: https://eips.ethereum.org/EIPS/eip-3030
[Web3Signer]: https://consensys.github.io/web3signer/web3signer-eth2.html
## Issue Addressed
Resolves#2498
## Proposed Changes
Allows the VC to call out to a [Web3Signer] remote signer to obtain signatures.
## Additional Info
### Making Signing Functions `async`
To allow remote signing, I needed to make all the signing functions `async`. This caused a bit of noise where I had to convert iterators into `for` loops.
In `duties_service.rs` there was a particularly tricky case where we couldn't hold a write-lock across an `await`, so I had to first take a read-lock, then grab a write-lock.
### Move Signing from Core Executor
Whilst implementing this feature, I noticed that we signing was happening on the core tokio executor. I suspect this was causing the executor to temporarily lock and occasionally trigger some HTTP timeouts (and potentially SQL pool timeouts, but I can't verify this). Since moving all signing into blocking tokio tasks, I noticed a distinct drop in the "atttestations_http_get" metric on a Prater node:
![http_get_times](https://user-images.githubusercontent.com/6660660/132143737-82fd3836-2e7e-445b-a143-cb347783baad.png)
I think this graph indicates that freeing the core executor allows the VC to operate more smoothly.
### Refactor TaskExecutor
I noticed that the `TaskExecutor::spawn_blocking_handle` function would fail to spawn tasks if it were unable to obtain handles to some metrics (this can happen if the same metric is defined twice). It seemed that a more sensible approach would be to keep spawning tasks, but without metrics. To that end, I refactored the function so that it would still function without metrics. There are no other changes made.
## TODO
- [x] Restructure to support multiple signing methods.
- [x] Add calls to remote signer from VC.
- [x] Documentation
- [x] Test all endpoints
- [x] Test HTTPS certificate
- [x] Allow adding remote signer validators via the API
- [x] Add Altair support via [21.8.1-rc1](https://github.com/ConsenSys/web3signer/releases/tag/21.8.1-rc1)
- [x] Create issue to start using latest version of web3signer. (See #2570)
## Notes
- ~~Web3Signer doesn't yet support the Altair fork for Prater. See https://github.com/ConsenSys/web3signer/issues/423.~~
- ~~There is not yet a release of Web3Signer which supports Altair blocks. See https://github.com/ConsenSys/web3signer/issues/391.~~
## Proposed Changes
Cache the total active balance for the current epoch in the `BeaconState`. Computing this value takes around 1ms, and this was negatively impacting block processing times on Prater, particularly when reconstructing states.
With a large number of attestations in each block, I saw the `process_attestations` function taking 150ms, which means that reconstructing hot states can take up to 4.65s (31 * 150ms), and reconstructing freezer states can take up to 307s (2047 * 150ms).
I opted to add the cache to the beacon state rather than computing the total active balance at the start of state processing and threading it through. Although this would be simpler in a way, it would waste time, particularly during block replay, as the total active balance doesn't change for the duration of an epoch. So we save ~32ms for hot states, and up to 8.1s for freezer states (using `--slots-per-restore-point 8192`).
## Proposed Changes
This PR deletes all `remote_signer` code from Lighthouse, for the following reasons:
* The `remote_signer` code is unused, and we have no plans to use it now that we're moving to supporting the Web3Signer APIs: #2522
* It represents a significant maintenance burden. The HTTP API tests have been prone to platform-specific failures, and breakages due to dependency upgrades, e.g. #2400.
Although the code is deleted it remains in the Git history should we ever want to recover it. For ease of reference:
- The last commit containing remote signer code: 5a3bcd2904
- The last Lighthouse version: v1.5.1
## Issue Addressed
N/A
## Proposed Changes
Set a valid fork epoch in the simulator and run checks on
1. If all nodes transitioned at the fork
2. If all altair block sync aggregates are full
## Issue Addressed
Related to: #2259
Made an attempt at all the necessary updates here to publish the crates to crates.io. I incremented the minor versions on all the crates that have been previously published. We still might run into some issues as we try to publish because I'm not able to test this out but I think it's a good starting point.
## Proposed Changes
- Add description and license to `ssz_types` and `serde_util`
- rename `serde_util` to `eth2_serde_util`
- increment minor versions
- remove path dependencies
- remove patch dependencies
## Additional Info
Crates published:
- [x] `tree_hash` -- need to publish `tree_hash_derive` and `eth2_hashing` first
- [x] `eth2_ssz_types` -- need to publish `eth2_serde_util` first
- [x] `tree_hash_derive`
- [x] `eth2_ssz`
- [x] `eth2_ssz_derive`
- [x] `eth2_serde_util`
- [x] `eth2_hashing`
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Proposed Changes
* Consolidate Tokio versions: everything now uses the latest v1.10.0, no more `tokio-compat`.
* Many semver-compatible changes via `cargo update`. Notably this upgrades from the yanked v0.8.0 version of crossbeam-deque which is present in v1.5.0-rc.0
* Many semver incompatible upgrades via `cargo upgrades` and `cargo upgrade --workspace pkg_name`. Notable ommissions:
- Prometheus, to be handled separately: https://github.com/sigp/lighthouse/issues/2485
- `rand`, `rand_xorshift`: the libsecp256k1 package requires 0.7.x, so we'll stick with that for now
- `ethereum-types` is pinned at 0.11.0 because that's what `web3` is using and it seems nice to have just a single version
## Additional Info
We still have two versions of `libp2p-core` due to `discv5` depending on the v0.29.0 release rather than `master`. AFAIK it should be OK to release in this state (cc @AgeManning )
## Issue Addressed
Resolves#2069
## Proposed Changes
- Adds a `--doppelganger-detection` flag
- Adds a `lighthouse/seen_validators` endpoint, which will make it so the lighthouse VC is not interopable with other client beacon nodes if the `--doppelganger-detection` flag is used, but hopefully this will become standardized. Relevant Eth2 API repo issue: https://github.com/ethereum/eth2.0-APIs/issues/64
- If the `--doppelganger-detection` flag is used, the VC will wait until the beacon node is synced, and then wait an additional 2 epochs. The reason for this is to make sure the beacon node is able to subscribe to the subnets our validators should be attesting on. I think an alternative would be to have the beacon node subscribe to all subnets for 2+ epochs on startup by default.
## Additional Info
I'd like to add tests and would appreciate feedback.
TODO: handle validators started via the API, potentially make this default behavior
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
N/A
## Proposed Changes
- Removing a bunch of unnecessary references
- Updated `Error::VariantError` to `Error::Variant`
- There were additional enum variant lints that I ignored, because I thought our variant names were fine
- removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index
## Additional Info
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Proposed Changes
Add the `sync_aggregate` from `BeaconBlock` to the bulk signature verifier for blocks. This necessitates a new signature set constructor for the sync aggregate, which is different from the others due to the use of [`eth2_fast_aggregate_verify`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/bls.md#eth2_fast_aggregate_verify) for sync aggregates, per [`process_sync_aggregate`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/beacon-chain.md#sync-aggregate-processing). I made the choice to return an optional signature set, with `None` representing the case where the signature is valid on account of being the point at infinity (requires no further checking).
To "dogfood" the changes and prevent duplication, the consensus logic now uses the signature set approach as well whenever it is required to verify signatures (which should only be in testing AFAIK). The EF tests pass with the code as it exists currently, but failed before I adapted the `eth2_fast_aggregate_verify` changes (which is good).
As a result of this change Altair block processing should be a little faster, and importantly, we will no longer accidentally verify signatures when replaying blocks, e.g. when replaying blocks from the database.
## Issue Addressed
NA
## Proposed Changes
This PR addresses two things:
1. Allows the `ValidatorMonitor` to work with Altair states.
1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description)
## Breaking Changes
The breaking changes in this PR revolve around one premise:
*After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.*
Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification.
For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations.
The following section details where we've removed fields and provides replacement values.
### Breaking Changes: Prometheus Metrics
Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana).
The following metrics were added as replacements:
- `beacon_participation_prev_epoch_head_attesting_gwei_total`
- `beacon_participation_prev_epoch_target_attesting_gwei_total`
- `beacon_participation_prev_epoch_source_attesting_gwei_total`
- `beacon_participation_prev_epoch_active_gwei_total`
The following metrics were removed:
- `beacon_participation_prev_epoch_attester`
- instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
- `beacon_participation_prev_epoch_target_attester`
- instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
- `beacon_participation_prev_epoch_head_attester`
- instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`.
### Breaking Changes: HTTP API
The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields:
- `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead)
- `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead)
The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields:
- `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead)
- `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead)
- `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`.
- `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`.
## Additional Info
NA
## TODO
- [x] Deal with total balances
- [x] Update validator_inclusion API
- [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Proposed Changes
Update to the latest version of the Altair spec, which includes new tests and a tweak to the target sync aggregators.
## Additional Info
This change is _not_ required for the imminent Altair devnet, and is waiting on the merge of #2321 to unstable.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Proposed Changes
Remove the remaining Altair `FIXME`s from consensus land.
1. Implement tree hash caching for the participation lists. This required some light type manipulation, including removing the `TreeHash` bound from `CachedTreeHash` which was purely descriptive.
2. Plumb the proposer index through Altair attestation processing, to avoid calculating it for _every_ attestation (potentially 128ms on large networks). This duplicates some work from #2431, but with the aim of getting it in sooner, particularly for the Altair devnets.
3. Removes two FIXMEs related to `superstruct` and cloning, which are unlikely to be particularly detrimental and will be tracked here instead: https://github.com/sigp/superstruct/issues/5
## Issue Addressed
Resolves#2313
## Proposed Changes
Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct.
This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration.
Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available.
Add a CLI flag called `use-long-timeouts` to revert to the old behavior.
## Additional Info
Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Proposed Changes
Implement the consensus changes necessary for the upcoming Altair hard fork.
## Additional Info
This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`.
Co-authored-by: realbigsean <seananderson33@gmail.com>
This updates some older dependencies to address a few cargo audit warnings.
The majority of warnings come from network dependencies which will be addressed in #2389.
This PR contains some minor dep updates that are not network related.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
`make lint` failing on rust 1.53.0.
## Proposed Changes
1.53.0 updates
## Additional Info
I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Proposed Changes
Unescape text for json comparison in:
3a24ca5f14/remote_signer/tests/sign.rs (L282-L285)
Which causes this error:
```
---- sign::invalid_field_fork stdout ----
thread 'sign::invalid_field_fork' panicked at 'assertion failed: `(left == right)`
left: `"Unable to parse body message from JSON: Error(\"invalid hex (InvalidHexCharacter { c: 'I', index: 0 })\", line: 1, column: 237097)"`,
right: `"Unable to parse body message from JSON: Error(\"invalid hex (InvalidHexCharacter { c: \\'I\\', index: 0 })\", line: 1, column: 237097)"`', testing/remote_signer_test/src/consumer.rs:144:5
```
This is my first contribution and happy to receive feedback if you have any. Thanks
## Issue Addressed
Windows incompatibility.
## Proposed Changes
On windows, lighthouse needs to default to STDIN as tty doesn't exist. Also Windows uses ACLs for file permissions. So to mirror chmod 600, we will remove every entry in a file's ACL and add only a single SID that is an alias for the file owner.
Beyond that, there were several changes made to different unit tests because windows has slightly different error messages as well as frustrating nuances around killing a process :/
## Additional Info
Tested on my Windows VM and it appears to work, also compiled & tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running full validator client on windows.
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
The latest version of Rust has new clippy rules & the codebase isn't up to date with them.
## Proposed Changes
Small formatting changes that clippy tells me are functionally equivalent
## Issue Addressed
#2276
## Proposed Changes
Add the `SensitiveUrl` struct which wraps `Url` and implements custom `Display` and `Debug` traits to redact user secrets from being logged in eth1 endpoints, beacon node endpoints and metrics.
## Additional Info
This also includes a small rewrite of the eth1 crate to make requests using `Url` instead of `&str`.
Some error messages have also been changed to remove `Url` data.
## Issue Addressed
Closes#2274
## Proposed Changes
* Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard.
* Log a debug message if there are extra fields returned to the VC from one of its BNs.
This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)
## Issue Addressed
resolves#2129resolves#2099
addresses some of #1712
unblocks #2076
unblocks #2153
## Proposed Changes
- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR.
- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.
- We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue` --> PR in discv5: https://github.com/sigp/discv5/pull/58
## Additional Info
tokio 1.0 changes that required some changes in lighthouse:
- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028
- `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350
- stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870
- I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
N/A
## Proposed Changes
Another attempt at fixing simulator issues for `eth1-sim`.
The `LocalValidatorClient` here blocks till genesis has occurred.
e4b62139d7/testing/simulator/src/local_network.rs (L145-L150)
Due to this, only the first validator(validator_0) starts before genesis. The remaining 3 vc's in the simulation start only after genesis. This was probably causing issues with missing the duties and eventually the proposal for slot 1.
This PR spawns each `LocalValidatorClient` in it's own tokio task to allow the remaining validators to start before genesis.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
NA
## Proposed Changes
Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).
This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.
## Additional Info
NA
Co-authored-by: blacktemplar <blacktemplar@a1.net>
## Issue Addressed
`test_dht_persistence` failing
## Proposed Changes
Bind `NetworkService::start` to an underscore prefixed variable rather than `_`. `_` was causing it to be dropped immediately
This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
We have a race condition when counting the number of beacon nodes. The user could end up seeing a duplicated service name (node_N).
## Proposed Changes
I have updated to acquire write lock before counting the number of beacon nodes.
## Issue Addressed
Fixes#2141
Remove [tempdir](https://docs.rs/tempdir/0.3.7/tempdir/) in favor of [tempfile](https://docs.rs/tempfile/3.1.0/tempfile/).
## Proposed Changes
`tempfile` has a slightly different api that makes creating temp folders with a name prefix a chore (`tempdir::TempDir::new("toto")` => `tempfile::Builder::new().prefix("toto").tempdir()`).
So I removed temp folder name prefix where I deemed it not useful.
Otherwise, the functionality is the same.
## Issue Addressed
- Resolves#1883
## Proposed Changes
This follows on from @blacktemplar's work in #2018.
- Allows the VC to connect to multiple BN for redundancy.
- Update the simulator so some nodes always need to rely on their fallback.
- Adds some extra deprecation warnings for `--eth1-endpoint`
- Pass `SignatureBytes` as a reference instead of by value.
## Additional Info
NA
Co-authored-by: blacktemplar <blacktemplar@a1.net>
## Issue Addressed
NA
## Proposed Changes
Fixes problems with slot times below 1 second which got revealed by running the syncing simulator with the default speedup time.
## Issue Addressed
Closes#1264
## Proposed Changes
* Milagro BLS: tweak the feature flags so that Milagro doesn't get compiled if we're using BLST. Profiling showed that it was consuming about 1 minute of CPU time out of 60 minutes of CPU time (real time ~15 mins). A 1.6% saving.
* Reduce monomorphization: compiling for 3 different `EthSpec` types causes a heck of a lot of generic functions to be instantiated (monomorphized). Removing 2 of 3 cuts the LLVM+linking step from around 250 seconds to 180 seconds, a saving of 70 seconds (real time!). This applies only to `make` and not the CI build, because we test with the minimal spec on CI.
* Update `web3` crate to v0.13. This is perhaps the most controversial change, because it requires axing some deposit contract tools from `lcli`. I suspect these tools weren't used much anyway, and could be maintained separately, but I'm also happy to revert this change. However, it does save us a lot of compile time. With #1839, we now have 3 versions of Tokio (and all of Tokio's deps). This change brings us down to 2 versions, but 1 should be achievable once web3 (and reqwest) move to Tokio 0.3.
* Remove `lcli` from the Docker image. It's a dev tool and can be built from the repo if required.
## Issue Addressed
NA
## Proposed Changes
Updates out of date dependencies.
## Additional Info
See also https://github.com/sigp/lighthouse/issues/1712 for a list of dependencies that are still out of date and the resasons.
## Issue Addressed
Resolves#1434 (this is the last major feature in the standard spec. There are only a couple of places we may be off-spec due to recent spec changes or ongoing discussion)
Partly addresses #1669
## Proposed Changes
- remove the websocket server
- remove the `TeeEventHandler` and `NullEventHandler`
- add server sent events according to the eth2 API spec
## Additional Info
This is according to the currently unmerged PR here: https://github.com/ethereum/eth2.0-APIs/pull/117
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Description
This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.
This also brings with it a number of various improvements:
- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max
Co-authored-by: blacktemplar <blacktemplar@a1.net>
## Issue Addressed
part of #1883
## Proposed Changes
Adds a new cli argument `--eth1-endpoints` that can be used instead of `--eth1-endpoint` to specify a comma-separated list of endpoints. If the first endpoint returns an error for some request the other endpoints are tried in the given order.
## Additional Info
Currently if the first endpoint fails the fallbacks are used silently (except for `try_fallback_test_endpoint` that is used in `do_update` which logs a `WARN` for each endpoint that is not reachable). A question is if we should add more logs so that the user gets warned if his main endpoint is for example just slow and sometimes hits timeouts.