## Proposed Changes
Add a new Cargo compilation profile called `maxperf` which enables more aggressive compiler optimisations at the expense of compilation time.
Some rough initial benchmarks show that this can provide up to a 25% reduction to run time for CPU bound tasks like block processing: https://docs.google.com/spreadsheets/d/15jHuZe7lLHhZq9Nw8kc6EL0Qh_N_YAYqkW2NQ_Afmtk/edit
The numbers in that spreadsheet compare the `consensus-context` branch from #3604 to the same branch compiled with the `maxperf` profile using:
```
PROFILE=maxperf make install-lcli
```
## Additional Info
The downsides of the maxperf profile are:
- It increases compile times substantially, which will particularly impact low-spec hardware. Compiling `lcli` is about 3x slower. Compiling Lighthouse is about 5x slower on my 5950X: 17m 38s rather than 3m 28s.
As a result I think we should not enable this everywhere by default.
- **Option 1**: enable by default for our released binaries. This gives the majority of users the fastest version of `lighthouse` possible, at the expense of slowing down our release CI. Source builds will continue to use the default `release` profile unless users opt-in to `maxperf`.
- **Option 2**: enable by default for source builds. This gives users building from source an edge, but makes them pay for it with compilation time.
I think I would prefer Option 1. I'll try doing some benchmarking to see how long a maxperf build of Lighthouse would take on GitHub actions.
Credit to Nicholas Nethercote for documenting these options in the Rust Performance Book: https://nnethercote.github.io/perf-book/build-configuration.html.
## Issue Addressed
Resolves#3516
## Proposed Changes
Adds a beacon fallback function for running a beacon node http query on all available fallbacks instead of returning on a first successful result. Uses the new `run_on_all` method for attestation and sync committee subscriptions.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Proposed Changes
Add some docs on checking the builder configuration, which is a frequently asked question on Discord.
## Additional Info
My text editor also insisted on stripping some trailing newlines, but can put 'em back if we want
## Issue Addressed
fixes lints from the last rust release
## Proposed Changes
Fix the lints, most of the lints by `clippy::question-mark` are false positives in the form of https://github.com/rust-lang/rust-clippy/issues/9518 so it's allowed for now
## Additional Info
## Issue Addressed
🐞 in which we don't actually unsubscribe from a random long lived subnet when it expires
## Proposed Changes
Remove code addressing a specific case in which we are subscribed to all subnets and handle the removal of the long lived subnet. I don't think the special case code is particularly important as, if someone is running with that many validators to be subscribed to all subnets, it should use `--subscribe-all-subnets` instead
## Additional Info
Noticed on some test nodes climbing bandwidth usage periodically (around 27hours, the time of subnet expirations) I'm running this code to test this does not happen anymore, but I think it should be good now
## Issue Addressed
NA
## Proposed Changes
This PR attempts to fix the following spurious CI failure:
```
---- store_tests::garbage_collect_temp_states_from_failed_block stdout ----
thread 'store_tests::garbage_collect_temp_states_from_failed_block' panicked at 'disk store should initialize: DBError { message: "Error { message: \"IO error: lock /tmp/.tmp6DcBQ9/cold_db/LOCK: already held by process\" }" }', beacon_node/beacon_chain/tests/store_tests.rs:59:10
```
I believe that some async task is taking a clone of the store and holding it in some other thread for a short time. This creates a race-condition when we try to open a new instance of the store.
## Additional Info
NA
## Issue Addressed
Resolves#3573
## Proposed Changes
Fix the bytecode for the deposit contract deployment transaction and value for deposit transaction in the execution engine integration tests. Also verify that all published transaction make it to the execution payload and have a valid status.
## Issue Addressed
NA
## Proposed Changes
This PR removes duplicated block root computation.
Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`.
Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations *after* the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Fixes an issue introduced in #3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a *broadcast* queue. This is incorrect, each message will be received by *only one* receiver. The effect of this mistake is these logs:
```
Sep 20 06:56:17.001 INFO Synced slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1
```
## Additional Info
NA
## Proposed Changes
Improve the payload pruning feature in several ways:
- Payload pruning is now entirely optional. It is enabled by default but can be disabled with `--prune-payloads false`. The previous `--prune-payloads-on-startup` flag from #3565 is removed.
- Initial payload pruning on startup now runs in a background thread. This thread will always load the split state, which is a small fraction of its total work (up to ~300ms) and then backtrack from that state. This pruning process ran in 2m5s on one Prater node with good I/O and 16m on a node with slower I/O.
- To work with the optional payload pruning the database function `try_load_full_block` will now attempt to load execution payloads for finalized slots _if_ pruning is currently disabled. This gives users an opt-out for the extensive traffic between the CL and EL for reconstructing payloads.
## Additional Info
If the `prune-payloads` flag is toggled on and off then the on-startup check may not see any payloads to delete and fail to clean them up. In this case the `lighthouse db prune_payloads` command should be used to force a manual sweep of the database.
## Issue Addressed
https://github.com/ethereum/beacon-APIs/pull/222
## Proposed Changes
Update Lighthouse's randao verification API to match the `beacon-APIs` spec. We implemented the API before spec stabilisation, and it changed slightly in the course of review.
Rather than a flag `verify_randao` taking a boolean value, the new API uses a `skip_randao_verification` flag which takes no argument. The new spec also requires the randao reveal to be present and equal to the point-at-infinity when `skip_randao_verification` is set.
I've also updated the `POST /lighthouse/analysis/block_rewards` API to take blinded blocks as input, as the execution payload is irrelevant and we may want to assess blocks produced by builders.
## Additional Info
This is technically a breaking change, but seeing as I suspect I'm the only one using these parameters/APIs, I think we're OK to include this in a patch release.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/3556
## Proposed Changes
Delete finalized execution payloads from the database in two places:
1. When running the finalization migration in `migrate_database`. We delete the finalized payloads between the last split point and the new updated split point. _If_ payloads are already pruned prior to this then this is sufficient to prune _all_ payloads as non-canonical payloads are already deleted by the head pruner, and all canonical payloads prior to the previous split will already have been pruned.
2. To address the fact that users will update to this code _after_ the merge on mainnet (and testnets), we need a one-off scan to delete the finalized payloads from the canonical chain. This is implemented in `try_prune_execution_payloads` which runs on startup and scans the chain back to the Bellatrix fork or the anchor slot (if checkpoint synced after Bellatrix). In the case where payloads are already pruned this check only imposes a single state load for the split state, which shouldn't be _too slow_. Even so, a flag `--prepare-payloads-on-startup=false` is provided to turn this off after it has run the first time, which provides faster start-up times.
There is also a new `lighthouse db prune_payloads` subcommand for users who prefer to run the pruning manually.
## Additional Info
The tests have been updated to not rely on finalized payloads in the database, instead using the `MockExecutionLayer` to reconstruct them. Additionally a check was added to `check_chain_dump` which asserts the non-existence or existence of payloads on disk depending on their slot.
## Issue Addressed
Fixes a potential regression in memory fragmentation identified by @paulhauner here: https://github.com/sigp/lighthouse/pull/3371#discussion_r931770045.
## Proposed Changes
Immediately allocate a vector with sufficient size to hold all decoded elements in SSZ decoding. The `size_hint` is derived from the range iterator here:
2983235650/consensus/ssz/src/decode/impls.rs (L489)
## Additional Info
I'd like to test this out on some infra for a substantial duration to see if it affects total fragmentation.
## Issue Addressed
NA
## Proposed Changes
I've noticed that our block hashing times increase significantly after the merge. I did some flamegraph-ing and noticed that we're allocating a `Vec` for each byte of each execution payload transaction. This seems like unnecessary work and a bit of a fragmentation risk.
This PR switches to `SmallVec<[u8; 32]>` for the packed encoding of `TreeHash`. I believe this is a nice simple optimisation with no downside.
### Benchmarking
These numbers were computed using #3580 on my desktop (i7 hex-core). You can see a bit of noise in the numbers, that's probably just my computer doing other things. Generally I found this change takes the time from 10-11ms to 8-9ms. I can also see all the allocations disappear from flamegraph.
This is the block being benchmarked: https://beaconcha.in/slot/4704236
#### Before
```
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 980: 10.553003ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 981: 10.563737ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 982: 10.646352ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 983: 10.628532ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 984: 10.552112ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 985: 10.587778ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 986: 10.640526ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 987: 10.587243ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 988: 10.554748ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 989: 10.551111ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 990: 11.559031ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 991: 11.944827ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 992: 10.554308ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 993: 11.043397ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 994: 11.043315ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 995: 11.207711ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 996: 11.056246ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 997: 11.049706ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 998: 11.432449ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 999: 11.149617ms
```
#### After
```
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 980: 14.011653ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 981: 8.925314ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 982: 8.849563ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 983: 8.893689ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 984: 8.902964ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 985: 8.942067ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 986: 8.907088ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 987: 9.346101ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 988: 8.96142ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 989: 9.366437ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 990: 9.809334ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 991: 9.541561ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 992: 11.143518ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 993: 10.821181ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 994: 9.855973ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 995: 10.941006ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 996: 9.596155ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 997: 9.121739ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 998: 9.090019ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 999: 9.071885ms
```
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
NA
## Proposed Changes
Adds a simple tool for computing the block root of some block from a beacon-API or a file. This is useful for benchmarking.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.
This PR modifies the shuffling cache so that each entry can be either:
- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)
Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:
1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.
In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.
## Additional Info
NA
## Issue Addressed
Fix a `cargo-audit` failure. We don't use `axum` for anything besides tests, but `cargo-audit` is failing due to this vulnerability in `axum-core`: https://rustsec.org/advisories/RUSTSEC-2022-0055
## Issue Addressed
Make sure gas limit examples in our docs represent sane values.
Thanks @dankrad for raising this in discord.
## Additional Info
We could also consider logging warnings about whether the gas limits configured are sane. Prysm has an open issue for this: https://github.com/prysmaticlabs/prysm/issues/10810
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
#3562
## Proposed Changes
Change the fork endpoint from `localhost` to `127.0.0.1` to match the ganache default listening host.
This way it doesn't try (and fail) to connect to `::1` on IPV6 machines.
## Additional Info
First PR
## Issue Addressed
#3285
## Proposed Changes
Adds support for specifying histogram with buckets and adds new metric buckets for metrics mentioned in issue.
## Additional Info
Need some help for the buckets.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Proposed Changes
Add a section on the new community checkpoint sync endpoints in the book. This should help stakers sync faster even without having to create an account.
## Issue Addressed
We were unable to update lighthouse by running `cargo update` because some of the `mev-build-rs` deps weren't pinned. But `mev-build-rs` is now pinned here and includes it's own pinned commits for `ssz-rs` and `etheruem-consensus`
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Resolves: #3550
Remove the `--strict-fee-recipient` flag. It will cause missed proposals prior to the bellatrix transition.
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Related: #3550
Remove references to the `--strict-fee-recipient` flag in docs. The flag will cause missed proposals prior to the merge transition.
Co-authored-by: realbigsean <sean@sigmaprime.io>
Add flag 'log-color' which preserves colors of log when stdout is redirected to a file.
This is my first lighthouse PR, please let me know if I'm not following contribution guidelines, I welcome meta-feeback (feedback on git commit messages, git branch naming, and the contents of the description of this PR.)
## Issue Addressed
Solves https://github.com/sigp/lighthouse/issues/3527
## Proposed Changes
Adding a flag which enables log color preserving when stdout is redirected to a file.
### Usage
Below I demonstrate current behaviour (without using the new flag) and the new behaviur (when using new flag).
In the screenshot below, I have to panes, one on top running `lighthouse` which redirects to file `~/Desktop/test.log` and one pane in the bottom which runs `tail -f ~/Desktop/test.log`.
#### Current behaviour
```sh
lighthouse --network prater vc |& tee -a ~/Desktop/test.log
```
**Result is no colors**
<img width="1624" alt="current" src="https://user-images.githubusercontent.com/864410/188258226-bfcf8271-4c9e-474c-848e-ac92a60df25c.png">
#### New behaviour
```sh
lighthouse --network prater vc --log-color |& tee -a ~/Desktop/test.log
```
**Result is colors** 🔴🟢🔵🟡
<img width="1624" alt="new" src="https://user-images.githubusercontent.com/864410/188258223-7d9ecf09-92c8-4cba-8f24-bd4d88fc0353.png">
## Additional Info
I chose American spelling of "color" instead of Brittish "colour' since that was aligned with `slog`'s API - method`force_color()`, let me know if you prefer spelling "colour" instead. I also chose to let it be an arg not taking any argument, just like `logfile-compress` flag, rather than having to write `--log-color true`.
## Issue Addressed
Closes#3514
## Proposed Changes
- Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit.
- Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.
## Issue Addressed
Resolves https://github.com/sigp/lighthouse/issues/3517
## Proposed Changes
Adds a `--builder-profit-threshold <wei value>` flag to the BN. If an external payload's value field is less than this value, the local payload will be used. The value of the local payload will not be checked (it can't really be checked until the engine API is updated to support this).
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Add a flag that can increase count unrealized strictness, defaults to false
## Proposed Changes
Please list or describe the changes introduced by this PR.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: sean <seananderson33@gmail.com>
## Issue Addressed
[Have --checkpoint-sync-url timeout](https://github.com/sigp/lighthouse/issues/3478)
## Proposed Changes
I added a parameter for `get_bytes_opt_accept_header<U: IntoUrl>` which accept a timeout duration, and modified the body of `get_beacon_blocks_ssz` and `get_debug_beacon_states_ssz` to pass corresponding timeout durations.
## Issue Addressed
When requesting an index which is not active during `start_epoch`, Lighthouse returns:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
{
"code": 500,
"message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))",
"stacktraces": []
}
```
This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing.
## Proposed Changes
In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`:
```
-> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
[
{
"index": 999999999,
"epochs": {
"100000": {
"active": false,
"head": false,
"target": false,
"source": false
}
}
}
]
```
By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`.
## Additional Info
Note that this error only occurs for epochs after the Altair hard fork.
## Issue Addressed
We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately.
## Proposed Changes
- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.
## Additional Info
This is also an effort in making the code easier to understand
## Issue Addressed
Resolves#3524
## Proposed Changes
Log fee recipient in the `Validator exists in beacon chain` log. Logging in the BN already happens here 18c61a5e8b/beacon_node/beacon_chain/src/beacon_chain.rs (L3858-L3865)
I also think it's good practice to encourage users to set the fee recipient in the VC rather than the BN because of issues mentioned here https://github.com/sigp/lighthouse/issues/3432
Some example logs from prater:
```
Aug 30 03:47:09.922 INFO Validator exists in beacon chain fee_recipient: 0xab97_ad88, validator_index: 213615, pubkey: 0xb542b69ba14ddbaf717ca1762ece63a4804c08d38a1aadf156ae718d1545942e86763a1604f5065d4faa550b7259d651, service: duties
Aug 30 03:48:05.505 INFO Validator exists in beacon chain fee_recipient: Fee recipient for validator not set in validator_definitions.yml or provided with the `--suggested-fee-recipient flag`, validator_index: 210710, pubkey: 0xad5d67cc7f990590c7b3fa41d593c4cf12d9ead894be2311fbb3e5c733d8c1b909e9d47af60ea3480fb6b37946c35390, service: duties
```
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
As we've seen on Prater, there seems to be a correlation between these messages
```
WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service
```
... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators.
After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first.
~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~
## Additional Info
- Currently being tested on Prater
## Issue Addressed
I think the antithesis is failing due to an OOM which may be resolved by updating the ubuntu image it runs on. The lcli build looks like it's failing because the image lacks the `libclang` dependency
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Partly resolves#3518
## Proposed Changes
Change the slot notifier to use `duration_to_next_slot` rather than an interval timer. This makes it robust against underlying clock changes.