## Issue Addressed
Adds self rate limiting options, mainly with the idea to comply with peer's rate limits in small testnets
## Proposed Changes
Add a hidden flag `self-limiter` this can take no value, or customs values to configure quotas per protocol
## Additional Info
### How to use
`--self-limiter` will turn on the self rate limiter applying the same params we apply to inbound requests (requests from other peers)
`--self-limiter "beacon_blocks_by_range:64/1"` will turn on the self rate limiter for ALL protocols, but change the quota for bbrange to 64 requested blocks per 1 second.
`--self-limiter "beacon_blocks_by_range:64/1;ping:1/10"` same as previous one, changing the quota for ping as well.
### Caveats
- The rate limiter is either on or off for all protocols. I added the custom values to be able to change the quotas per protocol so that some protocols can be given extremely loose or tight quotas. I think this should satisfy every need even if we can't technically turn off rate limits per protocol.
- This reuses the rate limiter struct for the inbound requests so there is this ugly part of the code in which we need to deal with the inbound only protocols (light client stuff) if this becomes too ugly as we add lc protocols, we might want to split the rate limiters. I've checked this and looks doable with const generics to avoid so much code duplication
### Knowing if this is on
```
Feb 06 21:12:05.493 DEBG Using self rate limiting params config: OutboundRateLimiterConfig { ping: 2/10s, metadata: 1/15s, status: 5/15s, goodbye: 1/10s, blocks_by_range: 1024/10s, blocks_by_root: 128/10s }, service: libp2p_rpc, service: libp2p
```
* Add first efforts at broadcast
* Tidy
* Move broadcast code to client
* Progress with broadcast impl
* Rename to address change
* Fix compile errors
* Use `while` loop
* Tidy
* Flip broadcast condition
* Switch to forgetting individual indices
* Always broadcast when the node starts
* Refactor into two functions
* Add testing
* Add another test
* Tidy, add more testing
* Tidy
* Add test, rename enum
* Rename enum again
* Tidy
* Break loop early
* Add V15 schema migration
* Bump schema version
* Progress with migration
* Update beacon_node/client/src/address_change_broadcast.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Fix typo in function name
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Use Local Payload if More Profitable than Builder
* Rename clone -> clone_from_ref
* Minimize Clones of GetPayloadResponse
* Cleanup & Fix Tests
* Added Tests for Payload Choice by Profit
* Fix Outdated Comments
## Proposed Changes
Clippy 1.67.0 put us on blast for the size of some of our errors, most of them written by me ( 👀 ). This PR shrinks the size of `BeaconChainError` by dropping some extraneous info and boxing an inner error which should only occur infrequently anyway.
For the `AttestationSlashInfo` and `BlockSlashInfo` I opted to ignore the lint as they are always used in a `Result<A, Info>` where `A` is a similar size. This means they don't bloat the size of the `Result`, so it's a bit annoying for Clippy to report this as an issue.
I also chose to ignore `clippy::uninlined-format-args` because I think the benefit-to-churn ratio is too low. E.g. sometimes we have long identifiers in `format!` args and IMO the non-inlined form is easier to read:
```rust
// I prefer this...
format!(
"{} did {} to {}",
REALLY_LONG_CONSTANT_NAME,
ANOTHER_REALLY_LONG_CONSTANT_NAME,
regular_long_identifier_name
);
// To this
format!("{REALLY_LONG_CONSTANT_NAME} did {ANOTHER_REALLY_LONG_CONSTANT_NAME} to {regular_long_identifier_name}");
```
I tried generating an automatic diff with `cargo clippy --fix` but it came out at:
```
250 files changed, 1209 insertions(+), 1469 deletions(-)
```
Which seems like a bad idea when we'd have to back-merge it to `capella` and `eip4844` 😱
Currently there is a race between receiving blocks and receiving light client optimistic updates (in unstable), which results in processing errors. This is a continuation of PR #3693 and seeks to progress on issue #3651
Add the parent_root to ReprocessQueueMessage::BlockImported so we can remove blocks from queue when a block arrives that has the same parent root. We use the parent root as opposed to the block_root because the LightClientOptimisticUpdate does not contain the block_root.
If light_client_optimistic_update.attested_header.canonical_root() != head_block.message().parent_root() then we queue the update. Otherwise we process immediately.
michaelsproul came up with this idea.
The code was heavily based off of the attestation reprocessing.
I have not properly tested this to see if it works as intended.
* Use eth1_withdrawal_credential in Some Test States
* Update beacon_node/genesis/src/interop.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/genesis/src/interop.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Increase validator sizes
* Pick next sync committee message
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Import BLS to execution changes before Capella
* Test for BLS to execution change HTTP API
* Pack BLS to execution changes in LIFO order
* Remove unused var
* Clippy
We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.
Although the metrics recovered from lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics and get some specific values for each block (currently these can be lost to large exponential histogram buckets).
This PR increases the resolution of the block delay histogram buckets, but also introduces a new metric which records the last block delay. Depending on the polling resolution of the metric server, we can lose some block delay information, however it will always give us a specific value and we will not lose exact data based on poor resolution histogram buckets.
Issue #3112
Add `Filter::recover` to the GET chain to handle rejections specifically as 404 NOT FOUND
Making a request to `http://localhost:5052/not_real` now returns the following:
```
{
"code": 404,
"message": "NOT_FOUND",
"stacktraces": []
}
```
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Currently there is a race between receiving blocks and receiving light client optimistic updates (in unstable), which results in processing errors. This is a continuation of PR #3693 and seeks to progress on issue #3651
## Proposed Changes
Add the parent_root to ReprocessQueueMessage::BlockImported so we can remove blocks from queue when a block arrives that has the same parent root. We use the parent root as opposed to the block_root because the LightClientOptimisticUpdate does not contain the block_root.
If light_client_optimistic_update.attested_header.canonical_root() != head_block.message().parent_root() then we queue the update. Otherwise we process immediately.
## Additional Info
michaelsproul came up with this idea.
The code was heavily based off of the attestation reprocessing.
I have not properly tested this to see if it works as intended.
* Use eth1_withdrawal_credential in Some Test States
* Update beacon_node/genesis/src/interop.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/genesis/src/interop.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Increase validator sizes
* Pick next sync committee message
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Import BLS to execution changes before Capella
* Test for BLS to execution change HTTP API
* Pack BLS to execution changes in LIFO order
* Remove unused var
* Clippy
We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.
Although the metrics recovered from lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics and get some specific values for each block (currently these can be lost to large exponential histogram buckets).
This PR increases the resolution of the block delay histogram buckets, but also introduces a new metric which records the last block delay. Depending on the polling resolution of the metric server, we can lose some block delay information, however it will always give us a specific value and we will not lose exact data based on poor resolution histogram buckets.
## Issue Addressed
Issue #3112
## Proposed Changes
Add `Filter::recover` to the GET chain to handle rejections specifically as 404 NOT FOUND
## Additional Info
Making a request to `http://localhost:5052/not_real` now returns the following:
```
{
"code": 404,
"message": "NOT_FOUND",
"stacktraces": []
}
```
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* add historical summaries
* fix tree hash caching, disable the sanity slots test with fake crypto
* add ssz static HistoricalSummary
* only store historical summaries after capella
* Teach `UpdatePattern` about Capella
* Tidy EF tests
* Clippy
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
Myself and others (#3678) have observed that when running with lots of validators (e.g., 1000s) the cardinality is too much for Prometheus. I've seen Prometheus instances just grind to a halt when we turn the validator monitor on for our testnet validators (we have 10,000s of Goerli validators). Additionally, the debug log volume can get very high with one log per validator, per attestation.
To address this, the `bn --validator-monitor-individual-tracking-threshold <INTEGER>` flag has been added to *disable* per-validator (i.e., non-aggregated) metrics/logging once the validator monitor exceeds the threshold of validators. The default value is `64`, which is a finger-to-the-wind value. I don't actually know the value at which Prometheus starts to become overwhelmed, but I've seen it work with ~64 validators and I've seen it *not* work with 1000s of validators. A default of `64` seems like it will result in a breaking change to users who are running millions of dollars worth of validators whilst resulting in a no-op for low-validator-count users. I'm open to changing this number, though.
Additionally, this PR starts collecting aggregated Prometheus metrics (e.g., total count of head hits across all validators), so that high-validator-count validators still have some interesting metrics. We already had logging for aggregated values, so nothing has been added there.
I've opted to make this a breaking change since it can be rather damaging to your Prometheus instance to accidentally enable the validator monitor with large numbers of validators. I've crashed a Prometheus instance myself and had a report from another user who's done the same thing.
## Additional Info
NA
## Breaking Changes Note
A new label has been added to the validator monitor Prometheus metrics: `total`. This label tracks the aggregated metrics of all validators in the validator monitor (as opposed to each validator being tracking individually using its pubkey as the label).
Additionally, a new flag has been added to the Beacon Node: `--validator-monitor-individual-tracking-threshold`. The default value is `64`, which means that when the validator monitor is tracking more than 64 validators then it will stop tracking per-validator metrics and only track the `all_validators` metric. It will also stop logging per-validator logs and only emit aggregated logs (the exception being that exit and slashing logs are always emitted).
These changes were introduced in #3728 to address issues with untenable Prometheus cardinality and log volume when using the validator monitor with high validator counts (e.g., 1000s of validators). Users with less than 65 validators will see no change in behavior (apart from the added `all_validators` metric). Users with more than 65 validators who wish to maintain the previous behavior can set something like `--validator-monitor-individual-tracking-threshold 999999`.
## Issue Addressed
Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in #3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.
## Proposed Changes
I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.
I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).
Block hash verification is pretty quick, about 500us per block on my machine (mainnet).
The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.
## Additional Info
This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.
I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.
I've needed to do this work in order to do some episub testing.
This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.
Co-authored-by: Diva M <divma@protonmail.com>
Our custom RPC implementation is lagging from the libp2p v50 version.
We are going to need to change a bunch of function names and would be nice to have consistent ordering of function names inside the handlers.
This is a precursor to the libp2p upgrade to minimize merge conflicts in function ordering.
- there was a bug in responding range blob requests where we would incorrectly label the first slot of an epoch as a non-skipped slot if it were skipped. this bug did not exist in the code for responding to block range request because the logic error was mitigated by defensive coding elsewhere
- there was a bug where a block received during range sync without a corresponding blob (and vice versa) was incorrectly interpreted as a stream termination
- RPC size limit fixes.
- Our blob cache was dead locking so I removed use of it for now.
- Because of our change in finalized sync batch size from 2 to 1 and our transition to using exact epoch boundaries for batches (rather than one slot past the epoch boundary), we need to sync finalized sync to 2 epochs + 1 slot past our peer's finalized slot in order to finalize the chain locally.
- use fork context bytes in rpc methods on both the server and client side