## Issue Addressed
N/A
## Proposed Changes
Add a HTTP API which can be used to compute the attestation performances of a validator (or all validators) over a discrete range of epochs.
Performances can be computed for a single validator, or for the global validator set.
## Usage
### Request
The API can be used as follows:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/{validator_index}?start_epoch=57730&end_epoch=57732"
```
Alternatively, to compute performances for the global validator set:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/global?start_epoch=57730&end_epoch=57732"
```
### Response
The response is JSON formatted as follows:
```
[
{
"index": 72,
"epochs": {
"57730": {
"active": true,
"head": false,
"target": false,
"source": false
},
"57731": {
"active": true,
"head": true,
"target": true,
"source": true,
"delay": 1
},
"57732": {
"active": true,
"head": true,
"target": true,
"source": true,
"delay": 1
},
}
}
]
```
> Note that the `"epochs"` are not guaranteed to be in ascending order.
## Additional Info
- This API is intended to be used in our upcoming validator analysis tooling (#2873) and will likely not be very useful for regular users. Some advanced users or block explorers may find this API useful however.
- The request range is limited to 100 epochs (since the range is inclusive and it also computes the `end_epoch` it's actually 101 epochs) to prevent Lighthouse using exceptionally large amounts of memory.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2857
## Proposed Changes
Explicitly set GNU malloc's MMAP_THRESHOLD to 128KB, disabling dynamic adjustments. For rationale see the linked issue.
## Issue Addressed
NA
## Proposed Changes
- Bump Lighthouse version to v2.1.1
- Update `thread_local` from v1.1.3 to v1.1.4 to address https://rustsec.org/advisories/RUSTSEC-2022-0006
## Additional Info
- ~~Blocked on #2950~~
- ~~Blocked on #2952~~
## Proposed Changes
This PR establishes compatibility between Lighthouse's VC and Nimbus's BN.
Lighthouse was previously `POST`ing unquoted lists of validator indices to the attester and sync duties endpoints which were (correctly) not accepted by Nimbus. These lists had slipped through the cracks because they didn't have an explicit wrapper type to add `serde` annotations to. I've added the `ValidatorIndexDataRef` newtype in order to implement the modified serialisation behaviour.
## Testing
Combined with https://github.com/sigp/lighthouse/pull/2940, I've confirmed that this PR allows my Lighthouse VC on Prater to validate with the public Nimbus BN listed here: https://github.com/status-im/nimbus-eth2#quickly-test-your-tooling-against-nimbus. I haven't had a block proposal yet, but attestations and sync committee messages are working.
## Additional Info
This may also provide compatibility with Prysm BNs but I haven't had a chance to test that yet.
## Issues Addressed
Closes#2739Closes#2812
## Proposed Changes
Support the deserialization of query strings containing duplicate keys into their corresponding types.
As `warp` does not support this feature natively (as discussed in #2739), it relies on the external library [`serde_array_query`](https://github.com/sigp/serde_array_query) (written by @michaelsproul)
This is backwards compatible meaning that both of the following requests will produce the same output:
```
curl "http://localhost:5052/eth/v1/events?topics=head,block"
```
```
curl "http://localhost:5052/eth/v1/events?topics=head&topics=block"
```
## Additional Info
Certain error messages have changed slightly. This only affects endpoints which accept multiple values.
For example:
```
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```
is now
```
{"code":400,"message":"BAD_REQUEST: unable to parse query","stacktraces":[]}
```
The serve order of the endpoints `get_beacon_state_validators` and `get_beacon_state_validators_id` have flipped:
```rust
.or(get_beacon_state_validators_id.boxed())
.or(get_beacon_state_validators.boxed())
```
This is to ensure proper error messages when filter fallback occurs due to the use of the `and_then` filter.
## Future Work
- Cleanup / remove filter fallback behaviour by substituting `and_then` with `then` where appropriate.
- Add regression tests for HTTP API error messages.
## Credits
- @mooori for doing the ground work of investigating possible solutions within the existing Rust ecosystem.
- @michaelsproul for writing [`serde_array_query`](https://github.com/sigp/serde_array_query) and for helping debug the behaviour of the `warp` filter fallback leading to incorrect error messages.
## Proposed Changes
Change the canonical fork name for the merge to Bellatrix. Keep other merge naming the same to avoid churn.
I've also fixed and enabled the `fork` and `transition` tests for Bellatrix, and the v1.1.7 fork choice tests.
Additionally, the `BellatrixPreset` has been added with tests. It gets served via the `/config/spec` API endpoint along with the other presets.
## Issue Addressed
NA
## Proposed Changes
We have observed occasions were under-resourced nodes will receive messages that were valid *at the time*, but later become invalidated due to long waits for a `BeaconProcessor` worker.
In this PR, we will check to see if the message was valid *at the time of receipt*. If it was initially valid but invalid now, we just ignore the message without penalizing the peer.
## Additional Info
NA
## Proposed Changes
Add `mallinfo2` behind a feature flag so that we can get accurate memory metrics during debugging. It can be enabled when building Lighthouse like so (so long as the platform supports it):
```
cargo install --path lighthouse --features "malloc_utils/mallinfo2"
```
## Issue Addressed
Resolves: https://github.com/sigp/lighthouse/issues/2741
Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller
## Proposed Changes
- Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration.
- Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db.
- updates related to https://github.com/ethereum/consensus-specs/pull/2727
- We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one.
- AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that
- I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting.
- This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730
## Additional Info
I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727
It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch.
Todo:
- [x] Fix fork choice tests
- [x] Self review
- [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727
- [x] Rebase onto Kintusgi
- [x] Fix `num_active_validators` calculation as @michaelsproul pointed out
- [x] Clean up db migrations
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
New rust lints
## Proposed Changes
- Boxing some enum variants
- removing some unused fields (is the validator lockfile unused? seemed so to me)
## Additional Info
- some error fields were marked as dead code but are logged out in areas
- left some dead fields in our ef test code because I assume they are useful for debugging?
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
Closes#1996
## Proposed Changes
Run a second `Logger` via `sloggers` which logs to a file in the background with:
- separate `debug-level` for background and terminal logging
- the ability to limit log size
- rotation through a customizable number of log files
- an option to compress old log files (`.gz` format)
Add the following new CLI flags:
- `--logfile-debug-level`: The debug level of the log files
- `--logfile-max-size`: The maximum size of each log file
- `--logfile-max-number`: The number of old log files to store
- `--logfile-compress`: Whether to compress old log files
By default background logging uses the `debug` log level and saves logfiles to:
- Beacon Node: `$HOME/.lighthouse/$network/beacon/logs/beacon.log`
- Validator Client: `$HOME/.lighthouse/$network/validators/logs/validator.log`
Or, when using the `--datadir` flag:
`$datadir/beacon/logs/beacon.log` and `$datadir/validators/logs/validator.log`
Once rotated, old logs are stored like so: `beacon.log.1`, `beacon.log.2` etc.
> Note: `beacon.log.1` is always newer than `beacon.log.2`.
## Additional Info
Currently the default value of `--logfile-max-size` is 200 (MB) and `--logfile-max-number` is 5.
This means that the maximum storage space that the logs will take up by default is 1.2GB.
(200MB x 5 from old log files + <200MB the current logfile being written to)
Happy to adjust these default values to whatever people think is appropriate.
It's also worth noting that when logging to a file, we lose our custom `slog` formatting. This means the logfile logs look like this:
```
Oct 27 16:02:50.305 INFO Lighthouse started, version: Lighthouse/v2.0.1-8edd9d4+, module: lighthouse:413
Oct 27 16:02:50.305 INFO Configured for network, name: prater, module: lighthouse:414
```
While testing some code on Windows, I ran into a failure when using `clippy` via (`make lint`):
```
error: this expression borrows a reference (`&str`) that is immediately dereferenced by the compiler
--> common/filesystem/src/lib.rs:105:43
|
105 | let mut acl = ACL::from_file_path(&path_str, false).map_err(Error::UnableToRetrieveACL)?;
| ^^^^^^^^^ help: change this to: `path_str`
|
= note: `-D clippy::needless-borrow` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
error: could not compile `filesystem` due to previous error
```
## Proposed Changes
Remove the unnecessary borrow as suggested.
## Additional Info
Since we are only running `clippy` in CI on Ubuntu, I believe we don't have any way (in CI) to detect these Windows specific lint errors (either from new code, or from linting changes from new Rust versions.
This is because code marked as `#[cfg(windows)]` is not checked on `unix` systems and vice versa.
I'm conscious that our CI runs are already taking a long time, and that adding a new Windows `clippy` run would add a non-negligible amount of time to the runs (not sure if this topic has already been discussed), but it something to be aware of.
## Extra Note
I don't think this is the case, but it might be worth someone else running `make lint` on their Windows machines to eliminate the possibility that this is an error specific to my setup.
## Proposed Changes
* Add the `Eth-Consensus-Version` header to the HTTP API for the block and state endpoints. This is part of the v2.1.0 API that was recently released: https://github.com/ethereum/beacon-APIs/pull/170
* Add tests for the above. I refactored the `eth2` crate's helper functions to make this more straight-forward, and introduced some new mixin traits that I think greatly improve readability and flexibility.
* Add a new `map_with_fork!` macro which is useful for decoding a superstruct type without naming all its variants. It is now used for SSZ-decoding `BeaconBlock` and `BeaconState`, and for JSON-decoding `SignedBeaconBlock` in the API.
## Additional Info
The `map_with_fork!` changes will conflict with the Merge changes, but when resolving the conflict the changes from this branch should be preferred (it is no longer necessary to enumerate every fork). The merge fork _will_ need to be added to `map_fork_name_with`.
I have been in the process of debugging libp2p tasks as there is something locking our executor.
This addition adds a metric allowing us to track all tasks within lighthouse allowing us to identify various sections of Lighthouse code that may be taking longer than normal to process.
## Issue Addressed
When compiling with Rust 1.56.0 the compiler generates 3 instances of this warning:
```
warning: trailing semicolon in macro used in expression position
--> common/eth2_network_config/src/lib.rs:181:24
|
181 | })?;
| ^
...
195 | let deposit_contract_deploy_block = load_from_file!(DEPLOY_BLOCK_FILE);
| ---------------------------------- in this macro invocation
|
= note: `#[warn(semicolon_in_expressions_from_macros)]` on by default
= warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
= note: for more information, see issue #79813 <https://github.com/rust-lang/rust/issues/79813>
= note: this warning originates in the macro `load_from_file` (in Nightly builds, run with -Z macro-backtrace for more info)
```
This warning is completely harmless, but will be visible to users compiling Lighthouse v2.0.1 (or earlier) with Rust 1.56.0 (to be released October 21st). It is **completely safe** to ignore this warning, it's just a superficial change to Rust's syntax.
## Proposed Changes
This PR removes the semi-colon as recommended, and fixes the new Clippy lints from 1.56.0
## Description
The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration.
As of this writing it currently houses the following high-level lighthouse-specific logic:
- Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings
- Integration and handling of ENRs with respect to libp2p and eth2
- Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers.
- Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc.
- Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api.
- Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks.
- Lighthouse specific types for managing gossipsub topics, sync status and ENR fields
- Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring
- Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p.
Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network`
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
- Update versions to `v2.0.1` in anticipation for a release early next week.
- Add `--ignore` to `cargo audit`. See #2727.
## Additional Info
NA
Currently, the beacon node has no ability to serve the HTTP API over TLS.
Adding this functionality would be helpful for certain use cases, such as when you need a validator client to connect to a backup beacon node which is outside your local network, and the use of an SSH tunnel or reverse proxy would be inappropriate.
## Proposed Changes
- Add three new CLI flags to the beacon node
- `--http-enable-tls`: enables TLS
- `--http-tls-cert`: to specify the path to the certificate file
- `--http-tls-key`: to specify the path to the key file
- Update the HTTP API to optionally use `warp`'s [`TlsServer`](https://docs.rs/warp/0.3.1/warp/struct.TlsServer.html) depending on the presence of the `--http-enable-tls` flag
- Update tests and docs
- Use a custom branch for `warp` to ensure proper error handling
## Additional Info
Serving the API over TLS should currently be considered experimental. The reason for this is that it uses code from an [unmerged PR](https://github.com/seanmonstar/warp/pull/717). This commit provides the `try_bind_with_graceful_shutdown` method to `warp`, which is helpful for controlling error flow when the TLS configuration is invalid (cert/key files don't exist, incorrect permissions, etc).
I've implemented the same code in my [branch here](https://github.com/macladson/warp/tree/tls).
Once the code has been reviewed and merged upstream into `warp`, we can remove the dependency on my branch and the feature can be considered more stable.
Currently, the private key file must not be password-protected in order to be read into Lighthouse.
## Issue Addressed
Fix `cargo audit` failures on `unstable`
Closes#2698
## Proposed Changes
The main culprit is `nix`, which is vulnerable for versions below v0.23.0. We can't get by with a straight-forward `cargo update` because `psutil` depends on an old version of `nix` (cf. https://github.com/rust-psutil/rust-psutil/pull/93). Hence I've temporarily forked `psutil` under the `sigp` org, where I've included the update to `nix` v0.23.0.
Additionally, I took the chance to update the `time` dependency to v0.3, which removed a bunch of stale deps including `stdweb` which is no longer maintained. Lighthouse only uses the `time` crate in the notifier to do some pretty printing, and so wasn't affected by any of the breaking changes in v0.3 ([changelog here](https://github.com/time-rs/time/blob/main/CHANGELOG.md#030-2021-07-30)).
## Proposed Changes
While investigating memory usage I noticed that the malloc metrics were going negative once they passed 2GiB. This is because the underlying `mallinfo` function returns a `i32`, and we were casting it straight to an `i64`, preserving the sign.
The long-term fix will be to move to `mallinfo2`, but it's still not yet widely available.
## Issue Addressed
Fix#2585
## Proposed Changes
Provide a canonical version of test_logger that can be used
throughout lighthouse.
## Additional Info
This allows tests to conditionally emit logging data by adding
test_logger as the default logger. And then when executing
`cargo test --features logging/test_logger` log output
will be visible:
wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
$ cargo test --features logging/test_logger
Finished test [unoptimized + debuginfo] target(s) in 0.02s
Running unittests (target/debug/deps/test_logger-e20115db6a5e3714)
running 1 test
Sep 10 12:53:45.212 INFO hi, module: test_logger:8
test tests::test_fn_with_logging ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests test-logger
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Or, in normal scenarios where logging isn't needed, executing
`cargo test` the log output will not be visible:
wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
$ cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.02s
Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a)
running 1 test
test tests::test_fn_with_logging ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests test-logger
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
## Proposed Changes
Cut the first release candidate for v2.0.0, in preparation for testing and release this week
## Additional Info
Builds on #2632, which should either be merged first or in the same batch
## Issue Addressed
Closes#2528
## Proposed Changes
- Add `BlockTimesCache` to provide block timing information to `BeaconChain`. This allows additional metrics to be calculated for blocks that are set as head too late.
- Thread the `seen_timestamp` of blocks received from RPC responses (except blocks from syncing) through to the sync manager, similar to what is done for blocks from gossip.
## Additional Info
This provides the following additional metrics:
- `BEACON_BLOCK_OBSERVED_SLOT_START_DELAY_TIME`
- The delay between the start of the slot and when the block was first observed.
- `BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME`
- The delay between when the block was first observed and when the block was imported.
- `BEACON_BLOCK_HEAD_IMPORTED_DELAY_TIME`
- The delay between when the block was imported and when the block was set as head.
The metric `BEACON_BLOCK_IMPORTED_SLOT_START_DELAY_TIME` was removed.
A log is produced when a block is set as head too late, e.g.:
```
Aug 27 03:46:39.006 DEBG Delayed head block set_as_head_delay: Some(21.731066ms), imported_delay: Some(119.929934ms), observed_delay: Some(3.864596988s), block_delay: 4.006257988s, slot: 1931331, proposer_index: 24294, block_root: 0x937602c89d3143afa89088a44bdf4b4d0d760dad082abacb229495c048648a9e, service: beacon
```
## Issue Addressed
NA
## Proposed Changes
As `cargo audit` astutely pointed out, the version of `zeroize_derive` were were using had a vulnerability:
```
Crate: zeroize_derive
Version: 1.1.0
Title: `#[zeroize(drop)]` doesn't implement `Drop` for `enum`s
Date: 2021-09-24
ID: RUSTSEC-2021-0115
URL: https://rustsec.org/advisories/RUSTSEC-2021-0115
Solution: Upgrade to >=1.2.0
```
This PR updates `zeroize` and `zeroize_derive` to appease `cargo audit`.
`tiny-bip39` was also updated to allow compile.
## Additional Info
I don't believe this vulnerability actually affected the Lighthouse code-base directly. However, `tiny-bip39` may have been affected which may have resulted in some uncleaned memory in Lighthouse. Whilst this is not ideal, it's not a major issue. Zeroization is a nice-to-have since it only protects from sophisticated attacks or attackers that already have a high level of access already.
## Issue Addressed
NA
## Proposed Changes
Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enums`:
https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union
The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.
### Crate Updates
This PR will (hopefully) cause CI to publish new versions for the following crates:
- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`
These these crates depend on each other, I've had to add a workspace-level `[patch]` for these crates. A follow-up PR will need to remove this patch, ones the new versions are published.
### Union Behaviors
We already had SSZ `Encode` and `TreeHash` derive for enums, however it just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec is in conflict with the transparent method, I've required that all `enum` have exactly one of the following enum-level attributes:
#### SSZ
- `#[ssz(enum_behaviour = "union")]`
- matches the spec used for the merge
- `#[ssz(enum_behaviour = "transparent")]`
- maintains existing functionality
- not supported for `Decode` (never was)
#### TreeHash
- `#[tree_hash(enum_behaviour = "union")]`
- matches the spec used for the merge
- `#[tree_hash(enum_behaviour = "transparent")]`
- maintains existing functionality
This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.
### Legacy Option Encoding
Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the *old* SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector.
Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migrate script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migrate script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.
The `ssz::legacy::four_byte_impl!` macro allows a one-liner to define the module required for the `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.
Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.
### Crate Publish Ordering
I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon.
## TODO
- [ ] Queue a follow-up `[patch]`-removing PR.
## Issue Addressed
Closes#1891Closes#1784
## Proposed Changes
Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.
## Additional Info
- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.
Considered out of scope for this PR:
- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)
Co-authored-by: Diva M <divma@protonmail.com>
Sigma Prime is transitioning our mainnet bootnodes and this PR represents the transition of our bootnodes.
After a few releases, old boot-nodes will be deprecated.
[EIP-3030]: https://eips.ethereum.org/EIPS/eip-3030
[Web3Signer]: https://consensys.github.io/web3signer/web3signer-eth2.html
## Issue Addressed
Resolves#2498
## Proposed Changes
Allows the VC to call out to a [Web3Signer] remote signer to obtain signatures.
## Additional Info
### Making Signing Functions `async`
To allow remote signing, I needed to make all the signing functions `async`. This caused a bit of noise where I had to convert iterators into `for` loops.
In `duties_service.rs` there was a particularly tricky case where we couldn't hold a write-lock across an `await`, so I had to first take a read-lock, then grab a write-lock.
### Move Signing from Core Executor
Whilst implementing this feature, I noticed that we signing was happening on the core tokio executor. I suspect this was causing the executor to temporarily lock and occasionally trigger some HTTP timeouts (and potentially SQL pool timeouts, but I can't verify this). Since moving all signing into blocking tokio tasks, I noticed a distinct drop in the "atttestations_http_get" metric on a Prater node:
![http_get_times](https://user-images.githubusercontent.com/6660660/132143737-82fd3836-2e7e-445b-a143-cb347783baad.png)
I think this graph indicates that freeing the core executor allows the VC to operate more smoothly.
### Refactor TaskExecutor
I noticed that the `TaskExecutor::spawn_blocking_handle` function would fail to spawn tasks if it were unable to obtain handles to some metrics (this can happen if the same metric is defined twice). It seemed that a more sensible approach would be to keep spawning tasks, but without metrics. To that end, I refactored the function so that it would still function without metrics. There are no other changes made.
## TODO
- [x] Restructure to support multiple signing methods.
- [x] Add calls to remote signer from VC.
- [x] Documentation
- [x] Test all endpoints
- [x] Test HTTPS certificate
- [x] Allow adding remote signer validators via the API
- [x] Add Altair support via [21.8.1-rc1](https://github.com/ConsenSys/web3signer/releases/tag/21.8.1-rc1)
- [x] Create issue to start using latest version of web3signer. (See #2570)
## Notes
- ~~Web3Signer doesn't yet support the Altair fork for Prater. See https://github.com/ConsenSys/web3signer/issues/423.~~
- ~~There is not yet a release of Web3Signer which supports Altair blocks. See https://github.com/ConsenSys/web3signer/issues/391.~~
## Proposed Changes
* Modify the `TaskExecutor` so that it spawns a "monitor" future for each future spawned by `spawn` or `spawn_blocking`. This monitor future joins the handle of the child future and shuts down the executor if it detects a panic.
* Enable backtraces by default by setting the environment variable `RUST_BACKTRACE`.
* Spawn the `ProductionBeaconNode` on the `TaskExecutor` so that if a panic occurs during start-up it will take down the whole process. Previously we were using a raw Tokio `spawn`, but I can't see any reason not to use the executor (perhaps someone else can).
## Additional Info
I considered using [`std::panic::set_hook`](https://doc.rust-lang.org/std/panic/fn.set_hook.html) to instantiate a custom panic handler, however this doesn't allow us to send a shutdown signal because `Fn` functions can't move variables (i.e. the shutdown sender) out of their environment. This also prevents it from receiving a `Logger`. Hence I decided to leave the panic handler untouched, but with backtraces turned on by default.
I did a run through the code base with all the raw Tokio spawn functions disallowed by Clippy, and found only two instances where we bypass the `TaskExecutor`: the HTTP API and `InitializedValidators` in the VC. In both places we use `spawn_blocking` and handle the return value, so I figured that was OK for now.
In terms of performance I think the overhead should be minimal. The monitor tasks will just get parked by the executor until their child resolves.
I've checked that this covers Discv5, as the `TaskExecutor` gets injected into Discv5 here: f9bba92db3/beacon_node/src/lib.rs (L125-L126)