* Add first efforts at broadcast
* Tidy
* Move broadcast code to client
* Progress with broadcast impl
* Rename to address change
* Fix compile errors
* Use `while` loop
* Tidy
* Flip broadcast condition
* Switch to forgetting individual indices
* Always broadcast when the node starts
* Refactor into two functions
* Add testing
* Add another test
* Tidy, add more testing
* Tidy
* Add test, rename enum
* Rename enum again
* Tidy
* Break loop early
* Add V15 schema migration
* Bump schema version
* Progress with migration
* Update beacon_node/client/src/address_change_broadcast.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Fix typo in function name
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Currently there is a race between receiving blocks and receiving light client optimistic updates (in unstable), which results in processing errors. This is a continuation of PR #3693 and seeks to progress on issue #3651
Add the parent_root to ReprocessQueueMessage::BlockImported so we can remove blocks from queue when a block arrives that has the same parent root. We use the parent root as opposed to the block_root because the LightClientOptimisticUpdate does not contain the block_root.
If light_client_optimistic_update.attested_header.canonical_root() != head_block.message().parent_root() then we queue the update. Otherwise we process immediately.
michaelsproul came up with this idea.
The code was heavily based off of the attestation reprocessing.
I have not properly tested this to see if it works as intended.
* Import BLS to execution changes before Capella
* Test for BLS to execution change HTTP API
* Pack BLS to execution changes in LIFO order
* Remove unused var
* Clippy
We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.
Although the metrics recovered from lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics and get some specific values for each block (currently these can be lost to large exponential histogram buckets).
This PR increases the resolution of the block delay histogram buckets, but also introduces a new metric which records the last block delay. Depending on the polling resolution of the metric server, we can lose some block delay information, however it will always give us a specific value and we will not lose exact data based on poor resolution histogram buckets.
## Issue Addressed
Currently there is a race between receiving blocks and receiving light client optimistic updates (in unstable), which results in processing errors. This is a continuation of PR #3693 and seeks to progress on issue #3651
## Proposed Changes
Add the parent_root to ReprocessQueueMessage::BlockImported so we can remove blocks from queue when a block arrives that has the same parent root. We use the parent root as opposed to the block_root because the LightClientOptimisticUpdate does not contain the block_root.
If light_client_optimistic_update.attested_header.canonical_root() != head_block.message().parent_root() then we queue the update. Otherwise we process immediately.
## Additional Info
michaelsproul came up with this idea.
The code was heavily based off of the attestation reprocessing.
I have not properly tested this to see if it works as intended.
* Import BLS to execution changes before Capella
* Test for BLS to execution change HTTP API
* Pack BLS to execution changes in LIFO order
* Remove unused var
* Clippy
We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.
Although the metrics recovered from lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics and get some specific values for each block (currently these can be lost to large exponential histogram buckets).
This PR increases the resolution of the block delay histogram buckets, but also introduces a new metric which records the last block delay. Depending on the polling resolution of the metric server, we can lose some block delay information, however it will always give us a specific value and we will not lose exact data based on poor resolution histogram buckets.
## Issue Addressed
NA
## Proposed Changes
Myself and others (#3678) have observed that when running with lots of validators (e.g., 1000s) the cardinality is too much for Prometheus. I've seen Prometheus instances just grind to a halt when we turn the validator monitor on for our testnet validators (we have 10,000s of Goerli validators). Additionally, the debug log volume can get very high with one log per validator, per attestation.
To address this, the `bn --validator-monitor-individual-tracking-threshold <INTEGER>` flag has been added to *disable* per-validator (i.e., non-aggregated) metrics/logging once the validator monitor exceeds the threshold of validators. The default value is `64`, which is a finger-to-the-wind value. I don't actually know the value at which Prometheus starts to become overwhelmed, but I've seen it work with ~64 validators and I've seen it *not* work with 1000s of validators. A default of `64` seems like it will result in a breaking change to users who are running millions of dollars worth of validators whilst resulting in a no-op for low-validator-count users. I'm open to changing this number, though.
Additionally, this PR starts collecting aggregated Prometheus metrics (e.g., total count of head hits across all validators), so that high-validator-count validators still have some interesting metrics. We already had logging for aggregated values, so nothing has been added there.
I've opted to make this a breaking change since it can be rather damaging to your Prometheus instance to accidentally enable the validator monitor with large numbers of validators. I've crashed a Prometheus instance myself and had a report from another user who's done the same thing.
## Additional Info
NA
## Breaking Changes Note
A new label has been added to the validator monitor Prometheus metrics: `total`. This label tracks the aggregated metrics of all validators in the validator monitor (as opposed to each validator being tracking individually using its pubkey as the label).
Additionally, a new flag has been added to the Beacon Node: `--validator-monitor-individual-tracking-threshold`. The default value is `64`, which means that when the validator monitor is tracking more than 64 validators then it will stop tracking per-validator metrics and only track the `all_validators` metric. It will also stop logging per-validator logs and only emit aggregated logs (the exception being that exit and slashing logs are always emitted).
These changes were introduced in #3728 to address issues with untenable Prometheus cardinality and log volume when using the validator monitor with high validator counts (e.g., 1000s of validators). Users with less than 65 validators will see no change in behavior (apart from the added `all_validators` metric). Users with more than 65 validators who wish to maintain the previous behavior can set something like `--validator-monitor-individual-tracking-threshold 999999`.
## Issue Addressed
Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in #3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.
## Proposed Changes
I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.
I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).
Block hash verification is pretty quick, about 500us per block on my machine (mainnet).
The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.
## Additional Info
This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.
I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.
- there was a bug in responding range blob requests where we would incorrectly label the first slot of an epoch as a non-skipped slot if it were skipped. this bug did not exist in the code for responding to block range request because the logic error was mitigated by defensive coding elsewhere
- there was a bug where a block received during range sync without a corresponding blob (and vice versa) was incorrectly interpreted as a stream termination
- RPC size limit fixes.
- Our blob cache was dead locking so I removed use of it for now.
- Because of our change in finalized sync batch size from 2 to 1 and our transition to using exact epoch boundaries for batches (rather than one slot past the epoch boundary), we need to sync finalized sync to 2 epochs + 1 slot past our peer's finalized slot in order to finalize the chain locally.
- use fork context bytes in rpc methods on both the server and client side
## Issue Addressed
While testing withdrawals with @ethDreamer we noticed lighthouse is sending empty batches when an error occurs. As LH peer receiving this, we would consider this a low tolerance action because the peer is claiming the batch is right and is empty.
## Proposed Changes
If any kind of error occurs, send a error response instead
## Additional Info
Right now we don't handle such thing as a partial batch with an error. If an error is received, the whole batch is discarded. Because of this it makes little sense to send partial batches that end with an error, so it's better to do the proposed solution instead of sending empty batches.
## Issue Addressed
Implementing the light_client_gossip topics but I'm not there yet.
Which issue # does this PR address?
Partially #3651
## Proposed Changes
Add light client gossip topics.
Please list or describe the changes introduced by this PR.
I'm going to Implement light_client_finality_update and light_client_optimistic_update gossip topics. Currently I've attempted the former and I'm seeking feedback.
## Additional Info
I've only implemented the light_client_finality_update topic because I wanted to make sure I was on the correct path. Also checking that the gossiped LightClientFinalityUpdate is the same as the locally constructed one is not implemented because caching the updates will make this much easier. Could someone give me some feedback on this please?
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>
## Issue Addressed
#3704
## Proposed Changes
Adds is_syncing_finalized: bool parameter for block verification functions. Sets the payload_verification_status to Optimistic if is_syncing_finalized is true. Uses SyncState in NetworkGlobals in BeaconProcessor to retrieve the syncing status.
## Additional Info
I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer.
## Issue Addressed
Partially addresses #3651
## Proposed Changes
Adds server-side support for light_client_bootstrap_v1 topic
## Additional Info
This PR, creates each time a bootstrap without using cache, I do not know how necessary a cache is in this case as this topic is not supposed to be called frequently and IMHO we can just prevent abuse by using the limiter, but let me know what you think or if there is any caveat to this, or if it is necessary only for the sake of good practice.
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* add a rt is_blob_batch
* use the mixed type everywhere
* glue
* more glue
* minor fixes
* fix range tests
* filling in the gaps
* moore filling in the gaps
* add capella gossip boiler plate
* get everything compiling
Co-authored-by: realbigsean <sean@sigmaprime.io
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
* small cleanup
* small cleanup
* cargo fix + some test cleanup
* improve block production
* add fixme for potential panic
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
## Issue Addressed
#2847
## Proposed Changes
Add under a feature flag the required changes to subscribe to long lived subnets in a deterministic way
## Additional Info
There is an additional required change that is actually searching for peers using the prefix, but I find that it's best to make this change in the future
## Issue Addressed
🐞 in which we don't actually unsubscribe from a random long lived subnet when it expires
## Proposed Changes
Remove code addressing a specific case in which we are subscribed to all subnets and handle the removal of the long lived subnet. I don't think the special case code is particularly important as, if someone is running with that many validators to be subscribed to all subnets, it should use `--subscribe-all-subnets` instead
## Additional Info
Noticed on some test nodes climbing bandwidth usage periodically (around 27hours, the time of subnet expirations) I'm running this code to test this does not happen anymore, but I think it should be good now
## Issue Addressed
NA
## Proposed Changes
This PR removes duplicated block root computation.
Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`.
Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations *after* the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.
This PR modifies the shuffling cache so that each entry can be either:
- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)
Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:
1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.
In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.
## Additional Info
NA
## Issue Addressed
#3285
## Proposed Changes
Adds support for specifying histogram with buckets and adds new metric buckets for metrics mentioned in issue.
## Additional Info
Need some help for the buckets.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately.
## Proposed Changes
- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.
## Additional Info
This is also an effort in making the code easier to understand
## Issue Addressed
NA
## Proposed Changes
As we've seen on Prater, there seems to be a correlation between these messages
```
WARN Not enough time for a discovery search subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service
```
... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators.
After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first.
~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~
## Additional Info
- Currently being tested on Prater
## Proposed Changes
This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12).
Attestation packing is sped up by removing several inefficiencies:
- No more recalculation of `attesting_indices` during packing.
- No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`.
- No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra).
- No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release).
So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms.
Verification of attester slashings, proposer slashings and voluntary exits is fixed by:
- Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message.
- Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot.
This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f.
## Additional Info
The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s.
This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency.
Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.
## Issue Addressed
#3032
## Proposed Changes
Pause sync when ee is offline. Changes include three main parts:
- Online/offline notification system
- Pause sync
- Resume sync
#### Online/offline notification system
- The engine state is now guarded behind a new struct `State` that ensures every change is correctly notified. Notifications are only sent if the state changes. The new `State` is behind a `RwLock` (as before) as the synchronization mechanism.
- The actual notification channel is a [tokio::sync::watch](https://docs.rs/tokio/latest/tokio/sync/watch/index.html) which ensures only the last value is in the receiver channel. This way we don't need to worry about message order etc.
- Sync waits for state changes concurrently with normal messages.
#### Pause Sync
Sync has four components, pausing is done differently in each:
- **Block lookups**: Disabled while in this state. We drop current requests and don't search for new blocks. Block lookups are infrequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it.
- **Parent lookups**: Disabled while in this state. We drop current requests and don't search for new parents. Parent lookups are even less frequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it.
- **Range**: Chains don't send batches for processing to the beacon processor. This is easily done by guarding the channel to the beacon processor and giving it access only if the ee is responsive. I find this the simplest and most powerful approach since we don't need to deal with new sync states and chain segments that are added while the ee is offline will follow the same logic without needing to synchronize a shared state among those. Another advantage of passive pause vs active pause is that we can still keep track of active advertised chain segments so that on resume we don't need to re-evaluate all our peers.
- **Backfill**: Not affected by ee states, we don't pause.
#### Resume Sync
- **Block lookups**: Enabled again.
- **Parent lookups**: Enabled again.
- **Range**: Active resume. Since the only real pause range does is not sending batches for processing, resume makes all chains that are holding read-for-processing batches send them.
- **Backfill**: Not affected by ee states, no need to resume.
## Additional Info
**QUESTION**: Originally I made this to notify and change on synced state, but @pawanjay176 on talks with @paulhauner concluded we only need to check online/offline states. The upcheck function mentions extra checks to have a very up to date sync status to aid the networking stack. However, the only need the networking stack would have is this one. I added a TODO to review if the extra check can be removed
Next gen of #3094
Will work best with #3439
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
## Issue Addressed
Solves #3390
So after checking some logs @pawanjay176 got, we conclude that this happened because we blacklisted a chain after trying it "too much". Now here, in all occurrences it seems that "too much" means we got too many download failures. This happened very slowly, exactly because the batch is allowed to stay alive for very long times after not counting penalties when the ee is offline. The error here then was not that the batch failed because of offline ee errors, but that we blacklisted a chain because of download errors, which we can't pin on the chain but on the peer. This PR fixes that.
## Proposed Changes
Adds a missing piece of logic so that if a chain fails for errors that can't be attributed to an objectively bad behavior from the peer, it is not blacklisted. The issue at hand occurred when new peers arrived claiming a head that had wrongfully blacklisted, even if the original peers participating in the chain were not penalized.
Another notable change is that we need to consider a batch invalid if it processed correctly but its next non empty batch fails processing. Now since a batch can fail processing in non empty ways, there is no need to mark as invalid previous batches.
Improves some logging as well.
## Additional Info
We should do this regardless of pausing sync on ee offline/unsynced state. This is because I think it's almost impossible to ensure a processing result will reach in a predictable order with a synced notification from the ee. Doing this handles what I think are inevitable data races when we actually pause sync
This also fixes a return that reports which batch failed and caused us some confusion checking the logs