## Issue Addressed
NA
## Proposed Changes
Carries on from #4115, with the following modifications:
1. Self-hosted runners are only enabled if `github.repository == sigp/lighthouse`.
- This allows forks to still have Github-hosted CI.
- This gives us a method to switch back to Github-runners if we have extended downtime on self-hosted.
1. Does not remove any existing dependency builds for Github-hosted runners (e.g., installing the latest Rust).
1. Adds the `WATCH_HOST` environment variable which defines where we expect to find the postgres db in the `watch` tests. This should be set to `host.docker.internal` for the tests to pass on self-hosted runners.
## Additional Info
NA
Co-authored-by: antondlr <anton@delaruelle.net>
## Issue Addressed
n/a Noticed this while working on something else
## Proposed Changes
- leverage the appropriate types to avoid a bunch of `unwrap` and errors
## Additional Info
n/a
## Issue Addressed
Speed up CI by installing foundry with Github action instead of building Anvil from source.
Building anvil from source on GItHub hosted runners currently takes about 10 mins. Using the `foundry-toolchain` action to install only takes about 2 seconds.
## Issue Addressed
N/A
## Proposed Changes
Add lints for rust 1.71
[3789134](3789134ae2) is probably the one that needs most attention as it changes beacon state code. I changed the `is_in_inactivity_leak ` function to return a `ArithError` as not all consumers of that function work well with a `BeaconState::Error`.
## Issue Addressed
This PR attempts to workaround the recent frequent eth1 simulator failures caused by missing eth logs from Anvil.
> FailedToInsertDeposit(NonConsecutive { log_index: 1, expected: 0 })
This usually occurs at the beginning of the tests, and it guarantees a timeout after a few hours if this log shows up, and this is currently causing our CIs to fail quite frequently.
Example failure here: https://github.com/sigp/lighthouse/actions/runs/5525760195/jobs/10079736914
## Proposed Changes
The quick fix applied here adds a timeout to node startup and restarts the node again.
- Add a 60 seconds timeout to beacon node startup in eth1 simulator tests. It takes ~10 seconds on my machine, but could take longer on CI runners.
- Wrap the startup code in a retry function, that allows for 3 retries before returning an error.
## Additional Info
We should probably raise an issue under the Anvil GitHub repo there so this can be further investigated.
## Issue Addressed
Addresses an issue where CI could fail due to an nonexisting file error:
```
Run ./clean.sh
rm: cannot remove '/home/runner/.lighthouse/local-testnet/geth_datadir4/geth/fastcache.tmp.1549331618': No such file or directory
Error: Process completed with exit code 1.
```
This seems to happen quite frequently now, I'm not sure exactly why but perhaps worth trying suppressing the prompt?
https://github.com/sigp/lighthouse/actions/runs/5455027574/jobs/9925916159?pr=4463
## Proposed Changes
Replace `wget` in the EF-tests makefile with `curl`.
On macOS `curl` is pre-installed, and I found myself making this change to avoid installing `wget`.
The `-L` flag is used to follow redirects which is useful if a repo gets renamed, and more similar to `wget`'s default behaviour.
## Issue Addressed
Fixes occasional compilation errors with mev-rs (see #4456).
## Proposed Changes
- Update `mev-rs` to the latest version, which allows us to remove hacky `[patch]` sections
- Update the `axum` version used in `watch` so LH only uses a single version
## Issue Addressed
#4488
## Proposed Changes
- Remove all instances of the `required` modifier where we have a default value specified for a subcommand
## Additional Info
N/A
## Issue Addressed
When trying to run `eth1-sim` locally, the simulator doesn't start for me, and panicked due to duplicate arg names for `proposer-nodes` (using same arg names as `nodes`). Not sure why this isn't failing on CI but failing on mine 🤔
```
thread 'main' panicked at 'Argument short must be unique
thread 'main' panicked at 'Argument long must be unique
```
## Issue Addressed
Fix an issue observed by `@zlan` on Discord where Lighthouse would sometimes return this error when looking up states via the API:
> {"code":500,"message":"UNHANDLED_ERROR: ForkChoiceError(MissingProtoArrayBlock(0xc9cf1495421b6ef3215d82253b388d77321176a1dcef0db0e71a0cd0ffc8cdb7))","stacktraces":[]}
## Proposed Changes
The error stems from a faulty assumption in the HTTP API logic: that any state in the hot database must have its block in fork choice. This isn't true because the state's hot database may update much less frequently than the fork choice store, e.g. if reconstructing states (where freezer migration pauses), or if the freezer migration runs slowly. There could also be a race between loading the hot state and checking fork choice, e.g. even if the finalization migration of DB+fork choice were atomic, the update could happen between the 1st and 2nd calls.
To address this I've changed the HTTP API logic to use the finalized block's execution status as a fallback where it is safe to do so. In the case where a block is non-canonical and prior to finalization (permanently orphaned) we default `execution_optimistic` to `true`.
## Additional Info
I've also added a new CLI flag to reduce the frequency of the finalization migration as this is useful for several purposes:
- Spacing out database writes (less frequent, larger batches)
- Keeping a limited chain history with high availability, e.g. the last month in the hot database.
This new flag made it _substantially_ easier to test this change. It was extracted from `tree-states` (where it's called `--db-migration-period`), which is why this PR also carries the `tree-states` label.
## Issue Addressed
#4494
## Proposed Changes
- Remove explicit re-exports of various types to appease the new compiler lint
## Additional Info
It seems `warn(hidden_glob_reexports)` is the main culprit.
## Proposed Changes
* Add `lcli state-root` command for computing the hash tree root of a `BeaconState`.
* Add a `--network` flag which can be used instead of `--testnet-dir` to set the network, e.g. Mainnet, Goerli, Gnosis.
* Use the new network flag in `transition-blocks`, `skip-slots`, and `block-root`, which previously only supported mainnet.
* **BREAKING CHANGE** Remove the default value of `~/.lighthouse/testnet` from `--testnet-dir`. This may have made sense in previous versions where `lcli` was more testnet focussed, but IMO it is an unnecessary complication and foot-gun today.
*Replaces #4434. It is identical, but this PR has a smaller diff due to a curated commit history.*
## Issue Addressed
NA
## Proposed Changes
This PR moves the scheduling logic for the `BeaconProcessor` into a new crate in `beacon_node/beacon_processor`. Previously it existed in the `beacon_node/network` crate.
This addresses a circular-dependency problem where it's not possible to use the `BeaconProcessor` from the `beacon_chain` crate. The `network` crate depends on the `beacon_chain` crate (`network -> beacon_chain`), but importing the `BeaconProcessor` into the `beacon_chain` crate would create a circular dependancy of `beacon_chain -> network`.
The `BeaconProcessor` was designed to provide queuing and prioritized scheduling for messages from the network. It has proven to be quite valuable and I believe we'd make Lighthouse more stable and effective by using it elsewhere. In particular, I think we should use the `BeaconProcessor` for:
1. HTTP API requests.
1. Scheduled tasks in the `BeaconChain` (e.g., state advance).
Using the `BeaconProcessor` for these tasks would help prevent the BN from becoming overwhelmed and would also help it to prioritize operations (e.g., choosing to process blocks from gossip before responding to low-priority HTTP API requests).
## Additional Info
This PR is intended to have zero impact on runtime behaviour. It aims to simply separate the *scheduling* code (i.e., the `BeaconProcessor`) from the *business logic* in the `network` crate (i.e., the `Worker` impls). Future PRs (see #4462) can build upon these works to actually use the `BeaconProcessor` for more operations.
I've gone to some effort to use `git mv` to make the diff look more like "file was moved and modified" rather than "file was deleted and a new one added". This should reduce review burden and help maintain commit attribution.
## Issue Addressed
[Users on Twitter](https://twitter.com/ashekhirin/status/1676334843192397824) are getting checkpoint sync URL timeouts with the default of 60s, so this PR increases the default timeout to 3 minutes.
I've also added a short section to the book about adjusting the timeout with `--checkpoint-sync-url-timeout`.
## Issue Addressed
#4331
## Proposed Changes
- Use comparison rather than strict equality between the earliest epoch we know about and the backfill target (which will be the most recent WSP by default or genesis)
- Add helper function `BackFillSync<T>::would_complete` to achieve this in one location
## Additional Info
- There's an ad hoc test for this in #4461
Co-authored-by: Age Manning <Age@AgeManning.com>
We now officially have ipv6 support. The mainnet bootnodes have been updated to support ipv6. This PR updates lighthouse's internal bootnodes for mainnet to avoid fetching them on initial load.
## Issue Addressed
#4118
## Proposed Changes
This PR introduces a "progressive balances" cache on the `BeaconState`, which keeps track of the accumulated target attestation balance for the current & previous epochs. The cached values are utilised by fork choice to calculate unrealized justification and finalization (instead of converting epoch participation arrays to balances for each block we receive).
This optimization will be rolled out gradually to allow for more testing. A new `--progressive-balances disabled|checked|strict|fast` flag is introduced to support this:
- `checked`: enabled with checks against participation cache, and falls back to the existing epoch processing calculation if there is a total target attester balance mismatch. There is no performance gain from this as the participation cache still needs to be computed. **This is the default mode for now.**
- `strict`: enabled with checks against participation cache, returns error if there is a mismatch. **Used for testing only**.
- `fast`: enabled with no comparative checks and without computing the participation cache. This mode gives us the performance gains from the optimization. This is still experimental and not currently recommended for production usage, but will become the default mode in a future release.
- `disabled`: disable the usage of progressive cache, and use the existing method for FFG progression calculation. This mode may be useful if we find a bug and want to stop the frequent error logs.
### Tasks
- [x] Initial cache implementation in `BeaconState`
- [x] Perform checks in fork choice to compare the progressive balances cache against results from `ParticipationCache`
- [x] Add CLI flag, and disable the optimization by default
- [x] Testing on Goerli & Benchmarking
- [x] Move caching logic from state processing to the `ProgressiveBalancesCache` (see [this comment](https://github.com/sigp/lighthouse/pull/4362#discussion_r1230877001))
- [x] Add attesting balance metrics
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
## Issue Addressed
[#4292](https://github.com/sigp/lighthouse/issues/4292)
## Proposed Changes
Updated the node health endpoint
will return a 200 status code if `!syncing && !el_offline && !optimistic`
wil return a 206 if `(syncing || optimistic) && !el_offline`
will return a 503 if `el_offline`
## Additional Info
## Issue Addressed
[#4259](https://github.com/sigp/lighthouse/issues/4259)
## Proposed Changes
debounce spammy `Unable to send message to the beacon processor` log messages
## Additional Info
We could potentially debounce other logs that have the potential to be "spammy".
After some feedback we decided to additionally add the following change:
create a newtype wrapper around `mpsc::Sender<BeaconWorkEvent<T>>`. When there is an error on the try_send method on the wrapper, we increase a counter metric with one label per work type.
## Issue Addressed
- #4293
- #4264
## Proposed Changes
*Changes largely follow those suggested in the main issue*.
- Add new routes to HTTP API
- `post_beacon_blocks_v2`
- `post_blinded_beacon_blocks_v2`
- Add new routes to `BeaconNodeHttpClient`
- `post_beacon_blocks_v2`
- `post_blinded_beacon_blocks_v2`
- Define new Eth2 common types
- `BroadcastValidation`, enum representing the level of validation to apply to blocks prior to broadcast
- `BroadcastValidationQuery`, the corresponding HTTP query string type for the above type
- ~~Define `_checked` variants of both `publish_block` and `publish_blinded_block` that enforce a validation level at a type level~~
- Add interactive tests to the `bn_http_api_tests` test target covering each validation level (to their own test module, `broadcast_validation_tests`)
- `beacon/blocks`
- `broadcast_validation=gossip`
- Invalid (400)
- Full Pass (200)
- Partial Pass (202)
- `broadcast_validation=consensus`
- Invalid (400)
- Only gossip (400)
- Only consensus pass (i.e., equivocates) (200)
- Full pass (200)
- `broadcast_validation=consensus_and_equivocation`
- Invalid (400)
- Invalid due to early equivocation (400)
- Only gossip (400)
- Only consensus (400)
- Pass (200)
- `beacon/blinded_blocks`
- `broadcast_validation=gossip`
- Invalid (400)
- Full Pass (200)
- Partial Pass (202)
- `broadcast_validation=consensus`
- Invalid (400)
- Only gossip (400)
- ~~Only consensus pass (i.e., equivocates) (200)~~
- Full pass (200)
- `broadcast_validation=consensus_and_equivocation`
- Invalid (400)
- Invalid due to early equivocation (400)
- Only gossip (400)
- Only consensus (400)
- Pass (200)
- Add a new trait, `IntoGossipVerifiedBlock`, which allows type-level guarantees to be made as to gossip validity
- Modify the structure of the `ObservedBlockProducers` cache from a `(slot, validator_index)` mapping to a `((slot, validator_index), block_root)` mapping
- Modify `ObservedBlockProducers::proposer_has_been_observed` to return a `SeenBlock` rather than a boolean on success
- Punish gossip peer (low) for submitting equivocating blocks
- Rename `BlockError::SlashablePublish` to `BlockError::SlashableProposal`
## Additional Info
This PR contains changes that directly modify how blocks are verified within the client. For more context, consult [comments in-thread](https://github.com/sigp/lighthouse/pull/4316#discussion_r1234724202).
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
Implements the `PrettyReqwestError` to wrap a `reqwest::Error` and give nicer `Debug` formatting. It also wraps the `Url` component in a `SensitiveUrl` to avoid leaking sensitive info in logs.
### Before
```
Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(9999), path: "/eth/v1/node/version", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 61, kind: ConnectionRefused, message: "Connection refused" })) })
```
### After
```
HttpClient(url: http://localhost:9999/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 61))
```
## Additional Info
I've also renamed the `Reqwest` error enum variants to `HttpClient`, to give people a better chance at knowing what's going on. Reqwest is pretty odd and looks like a typo.
I've implemented it in the `eth2` and `execution_layer` crates. This should affect most logs in the VC and EE-related ones in the BN.
I think the last crate that could benefit from the is the `beacon_node/eth1` crate. I haven't updated it in this PR since its error type is not so amenable to it (everything goes into a `String`). I don't have a whole lot of time to jig around with that at the moment and I feel that this PR as it stands is a significant enough improvement to merge on its own. Leaving it as-is is fine for the time being and we can always come back for it later (or implement in-protocol deposits!).
## Issue Addressed
Resolves#3238
## Proposed Changes
Please list or describe the changes introduced by this PR.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
This PR adds a new `lint-fix` task to automatically fix simple Clippy warnings using `cargo clippy --fix`.
Usage:
```
make lint-fix
```
## Issue Addressed
#4386
## Proposed Changes
The original proposal described in the issue adds a new endpoint to support updating validator graffiti, but I realized we already have an endpoint that we use for updating various validator fields in memory and in the validator definitions file, so I think that would be the best place to add this to.
### API endpoint
`PATCH lighthouse/validators/{validator_pubkey}`
This endpoint updates the graffiti in both the [ validator definition file](https://lighthouse-book.sigmaprime.io/graffiti.html#2-setting-the-graffiti-in-the-validator_definitionsyml) and the in memory `InitializedValidators`. In the next block proposal, the new graffiti will be used.
Note that the [`--graffiti-file`](https://lighthouse-book.sigmaprime.io/graffiti.html#1-using-the---graffiti-file-flag-on-the-validator-client) flag has a priority over the validator definitions file, so if the caller attempts to update the graffiti while the `--graffiti-file` flag is present, the endpoint will return an error (Bad request 400).
## Tasks
- [x] Add graffiti update support to `PATCH lighthouse/validators/{validator_pubkey}`
- [x] Return error if user tries to update graffiti while the `--graffiti-flag` is set
- [x] Update Lighthouse book
## Issue Addressed
NA
## Proposed Changes
Adds the `--validator-registration-batch-size` flag to the VC to allow runtime configuration of the number of validators POSTed to the [`validator/register_validator`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/registerValidator) endpoint.
There are builders (Agnostic and Eden) that are timing out with `regsiterValidator` requests with ~400 validators, even with a 9 second timeout. Exposing the batch size will help tune batch sizes to (hopefully) avoid this.
This PR should not change the behavior of Lighthouse when the new flag is not provided (i.e., the same default value is used).
## Additional Info
NA
## Proposed Changes
Remove `max-skip-slots` checks when processing blocks.
This was legacy code which was previously used in the Medalla testnet to sync to the correct fork.
With the addition of checkpoint sync which allows us to sync to any arbitrary fork, this is no longer a necessary feature, so it has been removed for simplicity.
## Additional Notes
The CLI flag and checks for attestation processing have been retained as it still may have uses in DoS protection.
Currently, the ENR of the node may not be correctly updated when specifying ipv6 fields through the CLI if an ENR exists on disk.
This remedies a bug where we were not checking for ipv6 fields when comparing whether to use an on-disk ENR or updating based on CLI configuration parameters.
This is only a minor correction to the table not properly showing up in Lighthouse book. The changes solves the formatting issue. Another change is on the link to do port-forwarding.
## Issue Addressed
Closes#4332
## Proposed Changes
Remove the `CountUnrealized` type, defaulting unrealized justification to _on_. This fixes the #4332 issue by ensuring that importing the same block to fork choice always results in the same outcome.
Finalized sync speed may be slightly impacted by this change, but that is deemed an acceptable trade-off until the optimisation from #4118 is implemented.
TODO:
- [x] Also check that the block isn't a duplicate before importing
Checkpointz now supports deposit snapshot but [they only support returning them in JSON](https://github.com/ethpandaops/checkpointz/issues/74) so I've modified lighthouse to request them in JSON by default.
There's also `get_opt` & `get_opt_with_timeout` methods which seem to expect responses in JSON but were not adding `Accept: application/json` to the request headers so I fixed that as well.
Also the beacon API puts quantities in quotes so I fixed that in the snapshot JSON serialization
## Issue Addressed
Resolves#3980. Builds on work by @GeemoCandama in #4084
## Proposed Changes
Extends the `SupportedProtocol` abstraction added in Geemo's PR and attempts to fix internal versioning of requests that are mentioned in this comment https://github.com/sigp/lighthouse/pull/4084#issuecomment-1496380033
Co-authored-by: geemo <geemo@tutanota.com>
## Issue Addressed
Closes#3964
## Proposed Changes
Use the `maxperf` profile to build Windows binaries, now that Rust 1.70.0 fixed the underlying issue.
## Proposed Changes
Correct some typos in the book, also update information about withdrawals since the Mainnet will be having 700K validators in about a month
## Issue Addressed
#3510
## Proposed Changes
- Replace mime with MediaTypeList
- Remove parse_accept fn as MediaTypeList does it built-in
- Get the supported media type of the highest q-factor in a single list iteration without sorting
## Additional Info
I have addressed the suggested changes in previous [PR](https://github.com/sigp/lighthouse/pull/3520#discussion_r959048633)
Done in different PRs so that they can reviewed independently, as it's likely this won't be merged before I leave
Includes resolution for #4080
- [ ] #4299
- [ ] #4318
- [ ] #4320
Co-authored-by: Diva M <divma@protonmail.com>
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
This PR addresses issue https://github.com/sigp/lighthouse/issues/4350
## Proposed Changes
This change will enable slasher broadcast in the following cases:
No flag is passed,
`--slasher-broadcast` is passed and,
`--slasher-broadcast=true` is passed.
Only when an explicit false value is passed the slasher does not broadcast.(`--slasher-broadcast=false`).
## Additional Info
TODO
- [x] Modify CLI parsing logic
- [x] Write test
Refer to #4353
Co-authored-by: Rahul Dogra <rahulcooldogra@gmail.com>
Co-authored-by: Gua00va <105484243+Gua00va@users.noreply.github.com>
## Issue Addressed
Workaround for https://github.com/foundry-rs/foundry/issues/5115.
## Proposed Changes
Allow Anvil to be installed on Windows without errors by enabling the IPC features (which we don't use, but Anvil expects to exist).
## Issue Addressed
NA
## Proposed Changes
Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block.
It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive.
These changes have the same intent as #4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC.
I've also tidied the log messages to:
- Give them all the same title (*"Error whilst producing block"*) to help with grepping.
- Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped.
## Additional Info
This PR should not change any logic beyond logging.
## Issue Addressed
Closes#4354Closes#3987
Replaces #4305, #4283
## Proposed Changes
This switches the default slasher backend _back_ to LMDB.
If an MDBX database exists and the MDBX backend is enabled then MDBX will continue to be used. Our release binaries and Docker images will continue to include MDBX for as long as it is practical, so users of these should not notice any difference.
The main benefit is to users compiling from source and devs running tests. These users no longer have to struggle to compile MDBX and deal with the compatibility issues that arises. Similarly, devs don't need to worry about toggling feature flags in tests or risk forgetting to run the slasher tests due to backend issues.
## Issue Addressed
N/A
## Proposed Changes
This change will log the value of the relay block and the local block when the relay block is more profitable.
## Additional Info
This change will help validators understand the block selection (as it looks like the execution reward sometimes is higher that the MEV-reward).
The rationale for this change is to aid operators to better understand why a relay-block was chosen over a local block.
Looking at produced blocks (at beaconcha.in for example) it sometimes looks like the builder is making a profit just from the execution reward vs the MEV-reward, and creates the nagging question: "Could i have built this block and made that extra profit?"... The answer is probably "No, not without the extra transactions included by the relay", but by logging the value of the local block-candidate, this will no longer be an issue..
### Example (Mainnet)
https://beaconcha.in/block/17370329
MEV Block Reward: 0.17122 Ether to 0xE35bBaFa0266089f95d745d348b468622805D82B
Execution Reward: 0.17528 Ether to 0x1f9090aaE28b8a3dCeaDf281B0F12828e676c326
Difference: 0.00406 Ether
### Examples (Goerli)
https://goerli.beaconcha.in/block/9040065
MEV Block Reward: 0.56423 Ether to 0xF5794543CF6055Ae710E9c8E99E31343Cea004a8
Execution Reward: 0.56488 Ether to 0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e
Difference: 0.00065 Ether
https://goerli.beaconcha.in/block/9019921
MEV Block Reward: 1.39440 Ether to 0xF5794543CF6055Ae710E9c8E99E31343Cea004a8
Execution Reward: 1.39469 Ether to 0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e
Difference: 0.00029 Ether
https://goerli.beaconcha.in/block/9015583
MEV Block Reward: 1.04356 Ether to 0xF5794543CF6055Ae710E9c8E99E31343Cea004a8
Execution Reward: 1.04896 Ether to 0xfC0157aA4F5DB7177830ACddB3D5a9BB5BE9cc5e
Difference: 0.0054 Ether