## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/3556
## Proposed Changes
Delete finalized execution payloads from the database in two places:
1. When running the finalization migration in `migrate_database`. We delete the finalized payloads between the last split point and the new updated split point. _If_ payloads are already pruned prior to this then this is sufficient to prune _all_ payloads as non-canonical payloads are already deleted by the head pruner, and all canonical payloads prior to the previous split will already have been pruned.
2. To address the fact that users will update to this code _after_ the merge on mainnet (and testnets), we need a one-off scan to delete the finalized payloads from the canonical chain. This is implemented in `try_prune_execution_payloads` which runs on startup and scans the chain back to the Bellatrix fork or the anchor slot (if checkpoint synced after Bellatrix). In the case where payloads are already pruned this check only imposes a single state load for the split state, which shouldn't be _too slow_. Even so, a flag `--prepare-payloads-on-startup=false` is provided to turn this off after it has run the first time, which provides faster start-up times.
There is also a new `lighthouse db prune_payloads` subcommand for users who prefer to run the pruning manually.
## Additional Info
The tests have been updated to not rely on finalized payloads in the database, instead using the `MockExecutionLayer` to reconstruct them. Additionally a check was added to `check_chain_dump` which asserts the non-existence or existence of payloads on disk depending on their slot.
## Issue Addressed
Fixes a potential regression in memory fragmentation identified by @paulhauner here: https://github.com/sigp/lighthouse/pull/3371#discussion_r931770045.
## Proposed Changes
Immediately allocate a vector with sufficient size to hold all decoded elements in SSZ decoding. The `size_hint` is derived from the range iterator here:
2983235650/consensus/ssz/src/decode/impls.rs (L489)
## Additional Info
I'd like to test this out on some infra for a substantial duration to see if it affects total fragmentation.
## Issue Addressed
NA
## Proposed Changes
I've noticed that our block hashing times increase significantly after the merge. I did some flamegraph-ing and noticed that we're allocating a `Vec` for each byte of each execution payload transaction. This seems like unnecessary work and a bit of a fragmentation risk.
This PR switches to `SmallVec<[u8; 32]>` for the packed encoding of `TreeHash`. I believe this is a nice simple optimisation with no downside.
### Benchmarking
These numbers were computed using #3580 on my desktop (i7 hex-core). You can see a bit of noise in the numbers, that's probably just my computer doing other things. Generally I found this change takes the time from 10-11ms to 8-9ms. I can also see all the allocations disappear from flamegraph.
This is the block being benchmarked: https://beaconcha.in/slot/4704236
#### Before
```
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 980: 10.553003ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 981: 10.563737ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 982: 10.646352ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 983: 10.628532ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 984: 10.552112ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 985: 10.587778ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 986: 10.640526ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 987: 10.587243ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 988: 10.554748ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 989: 10.551111ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 990: 11.559031ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 991: 11.944827ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 992: 10.554308ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 993: 11.043397ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 994: 11.043315ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 995: 11.207711ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 996: 11.056246ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 997: 11.049706ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 998: 11.432449ms
[2022-09-15T21:44:19Z INFO lcli::block_root] Run 999: 11.149617ms
```
#### After
```
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 980: 14.011653ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 981: 8.925314ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 982: 8.849563ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 983: 8.893689ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 984: 8.902964ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 985: 8.942067ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 986: 8.907088ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 987: 9.346101ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 988: 8.96142ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 989: 9.366437ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 990: 9.809334ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 991: 9.541561ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 992: 11.143518ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 993: 10.821181ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 994: 9.855973ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 995: 10.941006ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 996: 9.596155ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 997: 9.121739ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 998: 9.090019ms
[2022-09-15T21:41:49Z INFO lcli::block_root] Run 999: 9.071885ms
```
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
NA
## Proposed Changes
Adds a simple tool for computing the block root of some block from a beacon-API or a file. This is useful for benchmarking.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.
This PR modifies the shuffling cache so that each entry can be either:
- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)
Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:
1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.
In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.
## Additional Info
NA
## Issue Addressed
Fix a `cargo-audit` failure. We don't use `axum` for anything besides tests, but `cargo-audit` is failing due to this vulnerability in `axum-core`: https://rustsec.org/advisories/RUSTSEC-2022-0055
## Issue Addressed
Make sure gas limit examples in our docs represent sane values.
Thanks @dankrad for raising this in discord.
## Additional Info
We could also consider logging warnings about whether the gas limits configured are sane. Prysm has an open issue for this: https://github.com/prysmaticlabs/prysm/issues/10810
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
#3562
## Proposed Changes
Change the fork endpoint from `localhost` to `127.0.0.1` to match the ganache default listening host.
This way it doesn't try (and fail) to connect to `::1` on IPV6 machines.
## Additional Info
First PR
## Issue Addressed
#3285
## Proposed Changes
Adds support for specifying histogram with buckets and adds new metric buckets for metrics mentioned in issue.
## Additional Info
Need some help for the buckets.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Proposed Changes
Add a section on the new community checkpoint sync endpoints in the book. This should help stakers sync faster even without having to create an account.
## Issue Addressed
We were unable to update lighthouse by running `cargo update` because some of the `mev-build-rs` deps weren't pinned. But `mev-build-rs` is now pinned here and includes it's own pinned commits for `ssz-rs` and `etheruem-consensus`
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Resolves: #3550
Remove the `--strict-fee-recipient` flag. It will cause missed proposals prior to the bellatrix transition.
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Related: #3550
Remove references to the `--strict-fee-recipient` flag in docs. The flag will cause missed proposals prior to the merge transition.
Co-authored-by: realbigsean <sean@sigmaprime.io>
Add flag 'log-color' which preserves colors of log when stdout is redirected to a file.
This is my first lighthouse PR, please let me know if I'm not following contribution guidelines, I welcome meta-feeback (feedback on git commit messages, git branch naming, and the contents of the description of this PR.)
## Issue Addressed
Solves https://github.com/sigp/lighthouse/issues/3527
## Proposed Changes
Adding a flag which enables log color preserving when stdout is redirected to a file.
### Usage
Below I demonstrate current behaviour (without using the new flag) and the new behaviur (when using new flag).
In the screenshot below, I have to panes, one on top running `lighthouse` which redirects to file `~/Desktop/test.log` and one pane in the bottom which runs `tail -f ~/Desktop/test.log`.
#### Current behaviour
```sh
lighthouse --network prater vc |& tee -a ~/Desktop/test.log
```
**Result is no colors**
<img width="1624" alt="current" src="https://user-images.githubusercontent.com/864410/188258226-bfcf8271-4c9e-474c-848e-ac92a60df25c.png">
#### New behaviour
```sh
lighthouse --network prater vc --log-color |& tee -a ~/Desktop/test.log
```
**Result is colors** 🔴🟢🔵🟡
<img width="1624" alt="new" src="https://user-images.githubusercontent.com/864410/188258223-7d9ecf09-92c8-4cba-8f24-bd4d88fc0353.png">
## Additional Info
I chose American spelling of "color" instead of Brittish "colour' since that was aligned with `slog`'s API - method`force_color()`, let me know if you prefer spelling "colour" instead. I also chose to let it be an arg not taking any argument, just like `logfile-compress` flag, rather than having to write `--log-color true`.
## Issue Addressed
Closes#3514
## Proposed Changes
- Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit.
- Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.
## Issue Addressed
Resolves https://github.com/sigp/lighthouse/issues/3517
## Proposed Changes
Adds a `--builder-profit-threshold <wei value>` flag to the BN. If an external payload's value field is less than this value, the local payload will be used. The value of the local payload will not be checked (it can't really be checked until the engine API is updated to support this).
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Add a flag that can increase count unrealized strictness, defaults to false
## Proposed Changes
Please list or describe the changes introduced by this PR.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: sean <seananderson33@gmail.com>
## Issue Addressed
[Have --checkpoint-sync-url timeout](https://github.com/sigp/lighthouse/issues/3478)
## Proposed Changes
I added a parameter for `get_bytes_opt_accept_header<U: IntoUrl>` which accept a timeout duration, and modified the body of `get_beacon_blocks_ssz` and `get_debug_beacon_states_ssz` to pass corresponding timeout durations.
## Issue Addressed
When requesting an index which is not active during `start_epoch`, Lighthouse returns:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
{
"code": 500,
"message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))",
"stacktraces": []
}
```
This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing.
## Proposed Changes
In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`:
```
-> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
[
{
"index": 999999999,
"epochs": {
"100000": {
"active": false,
"head": false,
"target": false,
"source": false
}
}
}
]
```
By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`.
## Additional Info
Note that this error only occurs for epochs after the Altair hard fork.
## Issue Addressed
We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately.
## Proposed Changes
- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.
## Additional Info
This is also an effort in making the code easier to understand
## Issue Addressed
Resolves#3524
## Proposed Changes
Log fee recipient in the `Validator exists in beacon chain` log. Logging in the BN already happens here 18c61a5e8b/beacon_node/beacon_chain/src/beacon_chain.rs (L3858-L3865)
I also think it's good practice to encourage users to set the fee recipient in the VC rather than the BN because of issues mentioned here https://github.com/sigp/lighthouse/issues/3432
Some example logs from prater:
```
Aug 30 03:47:09.922 INFO Validator exists in beacon chain fee_recipient: 0xab97_ad88, validator_index: 213615, pubkey: 0xb542b69ba14ddbaf717ca1762ece63a4804c08d38a1aadf156ae718d1545942e86763a1604f5065d4faa550b7259d651, service: duties
Aug 30 03:48:05.505 INFO Validator exists in beacon chain fee_recipient: Fee recipient for validator not set in validator_definitions.yml or provided with the `--suggested-fee-recipient flag`, validator_index: 210710, pubkey: 0xad5d67cc7f990590c7b3fa41d593c4cf12d9ead894be2311fbb3e5c733d8c1b909e9d47af60ea3480fb6b37946c35390, service: duties
```
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
As we've seen on Prater, there seems to be a correlation between these messages
```
WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service
```
... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators.
After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first.
~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~
## Additional Info
- Currently being tested on Prater
## Issue Addressed
I think the antithesis is failing due to an OOM which may be resolved by updating the ubuntu image it runs on. The lcli build looks like it's failing because the image lacks the `libclang` dependency
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
Partly resolves#3518
## Proposed Changes
Change the slot notifier to use `duration_to_next_slot` rather than an interval timer. This makes it robust against underlying clock changes.