Commit Graph

2391 Commits

Author SHA1 Message Date
Emilia Hane
b88d888145
fixup! Plug in pruning of blobs into app 2023-02-08 11:44:30 +01:00
Emilia Hane
2a41f25d68
fixup! Prune blobs before data availability breakpoint 2023-02-08 11:44:30 +01:00
Emilia Hane
fe0c911402
Plug in pruning of blobs into app 2023-02-08 11:44:30 +01:00
Emilia Hane
7bf88c2336
Prune blobs before data availability breakpoint 2023-02-08 11:44:30 +01:00
Emilia Hane
e2a6da4274
Boiler plate code for blobs pruning 2023-02-08 11:44:20 +01:00
realbigsean
a42d07592c
fix compilation issues after merge 2023-02-07 12:33:29 -05:00
realbigsean
26a296246d
Merge branch 'capella' of https://github.com/sigp/lighthouse into eip4844
# Conflicts:
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/block_verification.rs
#	beacon_node/beacon_chain/src/test_utils.rs
#	beacon_node/execution_layer/src/engine_api.rs
#	beacon_node/execution_layer/src/engine_api/http.rs
#	beacon_node/execution_layer/src/lib.rs
#	beacon_node/execution_layer/src/test_utils/handle_rpc.rs
#	beacon_node/http_api/src/lib.rs
#	beacon_node/http_api/tests/fork_tests.rs
#	beacon_node/network/src/beacon_processor/mod.rs
#	beacon_node/network/src/beacon_processor/work_reprocessing_queue.rs
#	beacon_node/network/src/beacon_processor/worker/sync_methods.rs
#	beacon_node/operation_pool/src/bls_to_execution_changes.rs
#	beacon_node/operation_pool/src/lib.rs
#	beacon_node/operation_pool/src/persistence.rs
#	consensus/serde_utils/src/u256_hex_be_opt.rs
#	testing/antithesis/Dockerfile.libvoidstar
2023-02-07 12:12:56 -05:00
realbigsean
3533ed418e
pr feedback and bugfixes 2023-02-07 08:36:35 -05:00
Paul Hauner
e062a7cf76
Broadcast address changes at Capella (#3919)
* Add first efforts at broadcast

* Tidy

* Move broadcast code to client

* Progress with broadcast impl

* Rename to address change

* Fix compile errors

* Use `while` loop

* Tidy

* Flip broadcast condition

* Switch to forgetting individual indices

* Always broadcast when the node starts

* Refactor into two functions

* Add testing

* Add another test

* Tidy, add more testing

* Tidy

* Add test, rename enum

* Rename enum again

* Tidy

* Break loop early

* Add V15 schema migration

* Bump schema version

* Progress with migration

* Update beacon_node/client/src/address_change_broadcast.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Fix typo in function name

---------

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-02-07 17:13:49 +11:00
ethDreamer
5b398b1990
Don't Reject all Builder Bids After Capella (#3940)
* Fix bug in Builder API Post-Capella

* Fix Clippy Complaints
2023-02-06 13:09:13 +11:00
sean
e5896d9b71 re-order methods 2023-02-05 18:14:24 -05:00
sean
1315098c15 variable list from -> new 2023-02-05 17:56:03 -05:00
sean
38db8d7952 add back in 4844 tx consistencycheck during payload reconstruction 2023-02-05 17:31:20 -05:00
sean
f22aac1603 improve error handling 2023-02-05 17:26:36 -05:00
sean
90e25dc6cf from to new 2023-02-05 16:55:55 -05:00
sean
94a369b9df blob tx decoding 2023-02-05 16:39:15 -05:00
realbigsean
d9e83e6cec blob decoding 2023-02-03 17:06:33 -05:00
ethDreamer
90b6ae62e6
Use Local Payload if More Profitable than Builder (#3934)
* Use Local Payload if More Profitable than Builder

* Rename clone -> clone_from_ref

* Minimize Clones of GetPayloadResponse

* Cleanup & Fix Tests

* Added Tests for Payload Choice by Profit

* Fix Outdated Comments
2023-02-01 19:37:46 -06:00
Mark Mackey
d842215a44
Merge branch 'upstream/unstable' into capella 2023-01-31 12:16:26 -06:00
ethDreamer
7b7595347d
exchangeCapabilities & Capella Readiness Logging (#3918)
* Undo Passing Spec to Engine API

* Utilize engine_exchangeCapabilities

* Add Logging to Indicate Capella Readiness

* Add exchangeCapabilities to mock_execution_layer

* Send Nested Array for engine_exchangeCapabilities

* Use Mutex Instead of RwLock for EngineCapabilities

* Improve Locking to Avoid Deadlock

* Prettier logic for get_engine_capabilities

* Improve Comments

* Update beacon_node/beacon_chain/src/capella_readiness.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/beacon_chain/src/capella_readiness.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/beacon_chain/src/capella_readiness.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/beacon_chain/src/capella_readiness.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/beacon_chain/src/capella_readiness.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/client/src/notifier.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/execution_layer/src/engine_api/http.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Addressed Michael's Comments

---------

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-01-31 18:26:23 +01:00
realbigsean
6d2dff66a0
Merge pull request #3926 from divagant-martian/stream-timeout-blob-bug
send stream terminators
2023-01-27 19:05:04 +01:00
realbigsean
b7e20fb87a
Update beacon_node/lighthouse_network/src/rpc/protocol.rs 2023-01-27 19:03:43 +01:00
Diva M
9976d3bbbc
send stream terminators 2023-01-27 18:11:26 +01:00
realbigsean
37e7c1d5c7 keep verification of payloads pre 4844 2023-01-27 17:59:40 +01:00
realbigsean
2645249eb5 Merge branch 'eip4844' of https://github.com/sigp/lighthouse into fix-and-loosen-execution-block-decoding 2023-01-27 11:54:30 +01:00
realbigsean
2c12200a1a fix compile 2023-01-27 11:41:59 +01:00
realbigsean
dd512cd82a stub out tx root check, fix block hash calculation 2023-01-27 11:39:26 +01:00
Michael Sproul
0866b739d0 Clippy 1.67 (#3916)
## Proposed Changes

Clippy 1.67.0 put us on blast for the size of some of our errors, most of them written by me ( 👀 ). This PR shrinks the size of `BeaconChainError` by dropping some extraneous info and boxing an inner error which should only occur infrequently anyway.

For the `AttestationSlashInfo` and `BlockSlashInfo` I opted to ignore the lint as they are always used in a `Result<A, Info>` where `A` is a similar size. This means they don't bloat the size of the `Result`, so it's a bit annoying for Clippy to report this as an issue.

I also chose to ignore `clippy::uninlined-format-args` because I think the benefit-to-churn ratio is too low. E.g. sometimes we have long identifiers in `format!` args and IMO the non-inlined form is easier to read:

```rust
// I prefer this...
format!(
    "{} did {} to {}",
    REALLY_LONG_CONSTANT_NAME,
    ANOTHER_REALLY_LONG_CONSTANT_NAME,
    regular_long_identifier_name
);
  
// To this
format!("{REALLY_LONG_CONSTANT_NAME} did {ANOTHER_REALLY_LONG_CONSTANT_NAME} to {regular_long_identifier_name}");
```

I tried generating an automatic diff with `cargo clippy --fix` but it came out at:

```
250 files changed, 1209 insertions(+), 1469 deletions(-)
```

Which seems like a bad idea when we'd have to back-merge it to `capella` and `eip4844` 😱
2023-01-27 09:48:42 +00:00
realbigsean
7c8d97c06e remove unused import 2023-01-25 14:26:01 +01:00
Michael Sproul
17b6a6094f Update another test broken by the shuffling change 2023-01-25 14:23:35 +01:00
Michael Sproul
494a270279 Fix the new BLS to execution change test 2023-01-25 14:23:35 +01:00
Michael Sproul
550d63f3fe Update sync rewards API for abstract exec payload 2023-01-25 14:23:35 +01:00
GeemoCandama
f857811e5f light client optimistic update reprocessing (#3799)
Currently there is a race between receiving blocks and receiving light client optimistic updates (in unstable), which results in processing errors. This is a continuation of PR #3693 and seeks to progress on issue #3651

Add the parent_root to ReprocessQueueMessage::BlockImported so we can remove blocks from queue when a block arrives that has the same parent root. We use the parent root as opposed to the block_root because the LightClientOptimisticUpdate does not contain the block_root.

If light_client_optimistic_update.attested_header.canonical_root() != head_block.message().parent_root() then we queue the update. Otherwise we process immediately.
michaelsproul came up with this idea.
The code was heavily based off of the attestation reprocessing.
I have not properly tested this to see if it works as intended.
2023-01-25 14:23:33 +01:00
naviechan
9b5c2eefd5 Implement sync_committee_rewards API (per-validator reward) (#3903)
[#3661](https://github.com/sigp/lighthouse/issues/3661)

`/eth/v1/beacon/rewards/sync_committee/{block_id}`

```
{
  "execution_optimistic": false,
  "finalized": false,
  "data": [
    {
      "validator_index": "0",
      "reward": "2000"
    }
  ]
}
```

The issue contains the implementation of three per-validator reward APIs:
* `sync_committee_rewards`
* [`attestation_rewards`](https://github.com/sigp/lighthouse/pull/3822)
* `block_rewards`

This PR only implements the `sync_committe_rewards `.

The endpoints can be viewed in the Ethereum Beacon nodes API browser: [https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Rewards](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Rewards)

The implementation of [consensus client reward APIs](https://github.com/eth-protocol-fellows/cohort-three/blob/master/projects/project-ideas.md#consensus-client-reward-apis) is part of the [EPF](https://github.com/eth-protocol-fellows/cohort-three).

Co-authored-by: navie <naviechan@gmail.com>
Co-authored-by: kevinbogner <kevbogner@gmail.com>
2023-01-25 14:22:15 +01:00
ethDreamer
eb9da6c837 Use eth1_withdrawal_credentials in Test States (#3898)
* Use eth1_withdrawal_credential in Some Test States

* Update beacon_node/genesis/src/interop.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/genesis/src/interop.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Increase validator sizes

* Pick next sync committee message

Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2023-01-25 14:21:54 +01:00
Michael Sproul
a4cfe50ade Import BLS to execution changes before Capella (#3892)
* Import BLS to execution changes before Capella

* Test for BLS to execution change HTTP API

* Pack BLS to execution changes in LIFO order

* Remove unused var

* Clippy
2023-01-25 14:21:54 +01:00
Age Manning
528f7181bc Improve block delay metrics (#3894)
We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.

Although the metrics recovered from lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics and get some specific values for each block (currently these can be lost to large exponential histogram buckets). 

This PR increases the resolution of the block delay histogram buckets, but also introduces a new metric which records the last block delay. Depending on the polling resolution of the metric server, we can lose some block delay information, however it will always give us a specific value and we will not lose exact data based on poor resolution histogram buckets.
2023-01-25 14:21:53 +01:00
Santiago Medina
bd7bd005ee Return HTTP 404 rather than 405 (#3836)
Issue #3112

Add `Filter::recover` to the GET chain to handle rejections specifically as 404 NOT FOUND

Making a request to `http://localhost:5052/not_real` now returns the following:

```
{
    "code": 404,
    "message": "NOT_FOUND",
    "stacktraces": []
}
```

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2023-01-25 14:21:07 +01:00
realbigsean
5e8d79891b merge conflict resolution 2023-01-25 11:10:44 +01:00
realbigsean
9fde813a7e
Merge pull request #3913 from divagant-martian/reject-pre-fork-rpc-upgrades
only support 4844 rpc methods if on 4844
2023-01-25 10:56:56 +01:00
Michael Sproul
16bdb2771b
Update another test broken by the shuffling change 2023-01-25 16:18:00 +11:00
Michael Sproul
e48487db01
Fix the new BLS to execution change test 2023-01-25 15:47:07 +11:00
Michael Sproul
79a20e8a5f
Update sync rewards API for abstract exec payload 2023-01-25 15:46:47 +11:00
Michael Sproul
c76a1971cc
Merge remote-tracking branch 'origin/unstable' into capella 2023-01-25 14:20:16 +11:00
GeemoCandama
a7351c00c0 light client optimistic update reprocessing (#3799)
## Issue Addressed
Currently there is a race between receiving blocks and receiving light client optimistic updates (in unstable), which results in processing errors. This is a continuation of PR #3693 and seeks to progress on issue #3651

## Proposed Changes

Add the parent_root to ReprocessQueueMessage::BlockImported so we can remove blocks from queue when a block arrives that has the same parent root. We use the parent root as opposed to the block_root because the LightClientOptimisticUpdate does not contain the block_root.

If light_client_optimistic_update.attested_header.canonical_root() != head_block.message().parent_root() then we queue the update. Otherwise we process immediately.
## Additional Info
michaelsproul came up with this idea.
The code was heavily based off of the attestation reprocessing.
I have not properly tested this to see if it works as intended.
2023-01-24 22:17:50 +00:00
ethDreamer
3d4dd6af75
Use eth1_withdrawal_credentials in Test States (#3898)
* Use eth1_withdrawal_credential in Some Test States

* Update beacon_node/genesis/src/interop.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/genesis/src/interop.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Increase validator sizes

* Pick next sync committee message

Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2023-01-24 16:22:51 +01:00
realbigsean
d3240c1ffb fix common issue across blocks by range and blobs by range 2023-01-24 15:42:28 +01:00
realbigsean
18d4faf611 review updates 2023-01-24 15:30:29 +01:00
realbigsean
2225e6ac89 pass in data availability boundary to the get_blobs method 2023-01-24 14:35:07 +01:00
realbigsean
a4ea1761bb
Update beacon_node/beacon_chain/src/beacon_chain.rs 2023-01-24 12:28:58 +01:00
realbigsean
5b4cd997d0
Update beacon_node/lighthouse_network/src/rpc/methods.rs 2023-01-24 12:20:40 +01:00
Diva M
2d2da92132
only support 4844 rpc methods if on 4844 2023-01-24 05:15:23 -05:00
realbigsean
b658cc7aaf simplify checking attester cache for block and blobs. use ResourceUnavailable according to the spec 2023-01-24 10:50:47 +01:00
naviechan
2802bc9a9c Implement sync_committee_rewards API (per-validator reward) (#3903)
## Issue Addressed

[#3661](https://github.com/sigp/lighthouse/issues/3661)

## Proposed Changes

`/eth/v1/beacon/rewards/sync_committee/{block_id}`

```
{
  "execution_optimistic": false,
  "finalized": false,
  "data": [
    {
      "validator_index": "0",
      "reward": "2000"
    }
  ]
}
```

The issue contains the implementation of three per-validator reward APIs:
* `sync_committee_rewards`
* [`attestation_rewards`](https://github.com/sigp/lighthouse/pull/3822)
* `block_rewards`

This PR only implements the `sync_committe_rewards `.

The endpoints can be viewed in the Ethereum Beacon nodes API browser: [https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Rewards](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Rewards)

## Additional Info

The implementation of [consensus client reward APIs](https://github.com/eth-protocol-fellows/cohort-three/blob/master/projects/project-ideas.md#consensus-client-reward-apis) is part of the [EPF](https://github.com/eth-protocol-fellows/cohort-three).

Co-authored-by: navie <naviechan@gmail.com>
Co-authored-by: kevinbogner <kevbogner@gmail.com>
2023-01-24 02:06:42 +00:00
Emilia Hane
e14550425d
Fix mismatched response bug 2023-01-23 13:23:04 +01:00
realbigsean
4a51f65ce2 move available block comment 2023-01-23 09:17:57 +01:00
realbigsean
75320ff8bc cleanup 2023-01-22 05:54:25 +01:00
Emilia Hane
81a754577d
fixup! Improve error handling 2023-01-21 15:47:33 +01:00
Emilia Hane
f32f08eec0
Fix typo 2023-01-21 14:47:14 +01:00
Emilia Hane
5fc648217d
fixup! Improve error handling 2023-01-21 14:46:24 +01:00
realbigsean
a83fd1afb4 remove unused imports 2023-01-21 04:52:36 -05:00
realbigsean
cbd09dc281 finish refactor 2023-01-21 04:48:25 -05:00
Michael Sproul
d8abf2fc41
Import BLS to execution changes before Capella (#3892)
* Import BLS to execution changes before Capella

* Test for BLS to execution change HTTP API

* Pack BLS to execution changes in LIFO order

* Remove unused var

* Clippy
2023-01-21 10:39:59 +11:00
Michael Sproul
bb0e99c097 Merge remote-tracking branch 'origin/unstable' into capella 2023-01-21 10:37:26 +11:00
realbigsean
eb9feed784
add new traits 2023-01-20 16:04:35 -05:00
Emilia Hane
f7eb89ddd9
Improve error handling 2023-01-20 21:16:47 +01:00
realbigsean
8e57eef0ed
return a BlobsUnavailable error when the block root is a pre-4844 block 2023-01-20 21:16:47 +01:00
realbigsean
c6479444c2
don't send errors when we *correctly* don't have blobs 2023-01-20 21:16:47 +01:00
realbigsean
e1ce4e5b78
make explicity BlobsUnavailable error and handle it directly 2023-01-20 21:16:47 +01:00
realbigsean
f7f64eb007
fix/consolidate some error handling 2023-01-20 21:16:47 +01:00
Emilia Hane
89cb58d17b
Fix typo
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2023-01-20 21:16:47 +01:00
Emilia Hane
9cc25162e2
Send error message if eip4844 fork disabled
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2023-01-20 21:16:46 +01:00
Emilia Hane
654e59cbba
Fix rename fn bug
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2023-01-20 21:16:46 +01:00
Emilia Hane
a00b355800
fixup! Less strict handling of faulty rpc req params and syntax improvement 2023-01-20 21:16:46 +01:00
Emilia Hane
b4ec4c1ccf
Less strict handling of faulty rpc req params and syntax improvement 2023-01-20 21:16:46 +01:00
Emilia Hane
9445ac70d8
Check data availability boundary in rpc request 2023-01-20 21:16:46 +01:00
realbigsean
3cb8fb7973
block wrapper refactor initial commit 2023-01-20 11:50:16 -05:00
Age Manning
f8a3b3b95a Improve block delay metrics (#3894)
We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.

Although the metrics recovered from lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics and get some specific values for each block (currently these can be lost to large exponential histogram buckets). 

This PR increases the resolution of the block delay histogram buckets, but also introduces a new metric which records the last block delay. Depending on the polling resolution of the metric server, we can lose some block delay information, however it will always give us a specific value and we will not lose exact data based on poor resolution histogram buckets.
2023-01-20 00:46:56 +00:00
realbigsean
8f8b7211d0
execution API related fixes 2023-01-19 09:32:08 -05:00
realbigsean
acbbdd8b9e
merge eip4844 2023-01-19 08:46:38 -05:00
ethDreamer
26787412cd
Update engine_api to Latest spec (#3893)
* Update engine_api to Latest spec

* Small Test Fix

* Fix Test Deserialization Issue
2023-01-19 22:42:17 +11:00
Pawan Dhananjay
f04486dc71
Update kzg library to use bytes only interface 2023-01-17 12:12:17 +05:30
realbigsean
ddcd10b194
merge latest capella changes 2023-01-16 09:17:18 -05:00
Santiago Medina
912ea2a5ca Return HTTP 404 rather than 405 (#3836)
## Issue Addressed

Issue #3112

## Proposed Changes

Add `Filter::recover` to the GET chain to handle rejections specifically as 404 NOT FOUND

## Additional Info

Making a request to `http://localhost:5052/not_real` now returns the following:

```
{
    "code": 404,
    "message": "NOT_FOUND",
    "stacktraces": []
}
```


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2023-01-16 03:42:09 +00:00
ethDreamer
51088725fb
CL-EL withdrawals harmonization using Gwei units (#3884) 2023-01-16 12:03:42 +11:00
realbigsean
48a8cbbc45
Merge pull request #3877 from michaelsproul/withdrawals-block-hash
Verify blockHash with withdrawals
2023-01-13 15:12:58 -05:00
realbigsean
1319683736
Update gossip_methods.rs 2023-01-13 14:59:03 -05:00
Mark Mackey
05c1291d8a Don't Penalize Early bls_to_execution_change 2023-01-13 12:53:25 -06:00
realbigsean
e9affafb6b
get rid of EL endpoint switching at forks 2023-01-13 10:51:45 -05:00
Michael Sproul
8e2931d73b
Verify blockHash with withdrawals 2023-01-13 12:46:54 +11:00
realbigsean
d96d793bfb
fix compilation issues 2023-01-12 14:17:14 -05:00
realbigsean
06f71e8cce
merge capella 2023-01-12 12:51:09 -05:00
Michael Sproul
aa896decc1 Fix some beacon_chain tests 2023-01-12 19:13:01 +11:00
Michael Sproul
2af8110529
Merge remote-tracking branch 'origin/unstable' into capella
Fixing the conflicts involved patching up some of the `block_hash` verification,
the rest will be done as part of https://github.com/sigp/lighthouse/issues/3870
2023-01-12 16:22:00 +11:00
ethDreamer
52c1055fdc
Remove withdrawals-processing feature (#3864)
* Use spec to Determine Supported Engine APIs

* Remove `withdrawals-processing` feature

* Fixed Tests

* Missed Some Spots

* Fixed Another Test

* Stupid Clippy
2023-01-12 15:15:08 +11:00
realbigsean
f7f351784a
get ef tests passing after capella rebase 2023-01-11 18:32:15 -05:00
realbigsean
438126f19a
merge upstream, fix compile errors 2023-01-11 13:52:58 -05:00
Paul Hauner
38514c07f2 Release v3.4.0 (#3862)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

- [x] ~~Blocked on #3728, #3801~~
- [x] ~~Blocked on #3866~~
- [x] Requires additional testing
2023-01-11 03:27:08 +00:00
realbigsean
98b11bbd3f
add historical summaries (#3865)
* add historical summaries

* fix tree hash caching, disable the sanity slots test with fake crypto

* add ssz static HistoricalSummary

* only store historical summaries after capella

* Teach `UpdatePattern` about Capella

* Tidy EF tests

* Clippy

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-01-11 12:40:21 +11:00
Paul Hauner
830efdb5c2 Improve validator monitor experience for high validator counts (#3728)
## Issue Addressed

NA

## Proposed Changes

Myself and others (#3678) have observed  that when running with lots of validators (e.g., 1000s) the cardinality is too much for Prometheus. I've seen Prometheus instances just grind to a halt when we turn the validator monitor on for our testnet validators (we have 10,000s of Goerli validators). Additionally, the debug log volume can get very high with one log per validator, per attestation.

To address this, the `bn --validator-monitor-individual-tracking-threshold <INTEGER>` flag has been added to *disable* per-validator (i.e., non-aggregated) metrics/logging once the validator monitor exceeds the threshold of validators. The default value is `64`, which is a finger-to-the-wind value. I don't actually know the value at which Prometheus starts to become overwhelmed, but I've seen it work with ~64 validators and I've seen it *not* work with 1000s of validators. A default of `64` seems like it will result in a breaking change to users who are running millions of dollars worth of validators whilst resulting in a no-op for low-validator-count users. I'm open to changing this number, though.

Additionally, this PR starts collecting aggregated Prometheus metrics (e.g., total count of head hits across all validators), so that high-validator-count validators still have some interesting metrics. We already had logging for aggregated values, so nothing has been added there.

I've opted to make this a breaking change since it can be rather damaging to your Prometheus instance to accidentally enable the validator monitor with large numbers of validators. I've crashed a Prometheus instance myself and had a report from another user who's done the same thing.

## Additional Info

NA

## Breaking Changes Note

A new label has been added to the validator monitor Prometheus metrics: `total`. This label tracks the aggregated metrics of all validators in the validator monitor (as opposed to each validator being tracking individually using its pubkey as the label).

Additionally, a new flag has been added to the Beacon Node: `--validator-monitor-individual-tracking-threshold`. The default value is `64`, which means that when the validator monitor is tracking more than 64 validators then it will stop tracking per-validator metrics and only track the `all_validators` metric. It will also stop logging per-validator logs and only emit aggregated logs (the exception being that exit and slashing logs are always emitted).

These changes were introduced in #3728 to address issues with untenable Prometheus cardinality and log volume when using the validator monitor with high validator counts (e.g., 1000s of validators). Users with less than 65 validators will see no change in behavior (apart from the added `all_validators` metric). Users with more than 65 validators who wish to maintain the previous behavior can set something like `--validator-monitor-individual-tracking-threshold 999999`.
2023-01-09 08:18:55 +00:00