## Issue Addressed
#3804
## Proposed Changes
- Add `total_balance` to the validator monitor and adjust the number of historical epochs which are cached.
- Allow certain values in the cache to be served out via the HTTP API without requiring a state read.
## Usage
```
curl -X POST "http://localhost:5052/lighthouse/ui/validator_info" -d '{"indices": [0]}' -H "Content-Type: application/json" | jq
```
```
{
"data": {
"validators": {
"0": {
"info": [
{
"epoch": 172981,
"total_balance": 36566388519
},
...
{
"epoch": 172990,
"total_balance": 36566496513
}
]
},
"1": {
"info": [
{
"epoch": 172981,
"total_balance": 36355797968
},
...
{
"epoch": 172990,
"total_balance": 36355905962
}
]
}
}
}
}
```
## Additional Info
This requires no historical states to operate which mean it will still function on the freshly checkpoint synced node, however because of this, the values will populate each epoch (up to a maximum of 10 entries).
Another benefit of this method, is that we can easily cache any other values which would normally require a state read and serve them via the same endpoint. However, we would need be cautious about not overly increasing block processing time by caching values from complex computations.
This also caches some of the validator metrics directly, rather than pulling them from the Prometheus metrics when the API is called. This means when the validator count exceeds the individual monitor threshold, the cached values will still be available.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Modify comment to only include 4844
Capella only modifies per epoch processing by adding
`process_historical_summaries_update`, which does not change the realization of
justification or finality.
Whilst 4844 does not currently modify realization, the spec is not yet final
enough to say that it never will.
* Clarify address change verification comment
The verification of the address change doesn't really have anything to do with
the current epoch. I think this was just a copy-paste from a function like
`verify_exit`.
* Add extra encoding/decoding tests
* Remove TODO
The method LGTM
* Remove `FreeAttestation`
This is an ancient relic, I'm surprised it still existed!
* Add paranoid check for eip4844 code
This is not technically necessary, but I think it's nice to be explicit about
EIP4844 consensus code for the time being.
* Reduce big-O complexity of address change pruning
I'm not sure this is *actually* useful, but it might come in handy if we see a
ton of address changes at the fork boundary. I know the devops team have been
testing with ~100k changes, so maybe this will help in that case.
* Revert "Reduce big-O complexity of address change pruning"
This reverts commit e7d93e6cc7cf1b92dd5a9e1966ce47d4078121eb.
* Remove CapellaReadiness::NotSynced
Some EEs have a habit of flipping between synced/not-synced, which causes some
spurious "Not read for the merge" messages back before the merge. For the
merge, if the EE wasn't synced the CE simple wouldn't go through the transition
(due to optimistic sync stuff). However, we don't have that hard requirement
for Capella; the CE will go through the fork and just wait for the EE to catch
up. I think that removing `NotSynced` here will avoid false-positives on the
"Not ready logs..". We'll be creating other WARN/ERRO logs if the EE isn't
synced, anyway.
* Change some Capella readiness logging
There's two changes here:
1. Shorten the log messages, for readability.
2. Change the hints.
Connecting a Capella-ready LH to a non-Capella-ready EE gives this log:
```
WARN Not ready for Capella info: The execution endpoint does not appear to support the required engine api methods for Capella: Required Methods Unsupported: engine_getPayloadV2 engine_forkchoiceUpdatedV2 engine_newPayloadV2, service: slot_notifier
```
This variant of error doesn't get a "try updating" style hint, when it's the
one that needs it. This is because we detect the method-not-found reponse from
the EE and return default capabilities, rather than indicating that the request
fails. I think it's fair to say that an EE upgrade is required whenever it
doesn't provide the required methods.
I changed the `ExchangeCapabilitiesFailed` message since that can only happen
when the EE fails to respond with anything other than success or not-found.
## Proposed Changes
* Bump Go from 1.17 to 1.20. The latest Geth release v1.11.0 requires 1.18 minimum.
* Prevent a cache miss during payload building by using the right fee recipient. This prevents Geth v1.11.0 from building a block with 0 transactions. The payload building mechanism is overhauled in the new Geth to improve the payload every 2s, and the tests were failing because we were falling back on a `getPayload` call with no lookahead due to `get_payload_id` cache miss caused by the mismatched fee recipient. Alternatively we could hack the tests to send `proposer_preparation_data`, but I think the static fee recipient is simpler for now.
* Add support for optionally enabling Lighthouse logs in the integration tests. Enable using `cargo run --release --features logging/test_logger`. This was very useful for debugging.
## Issue Addressed
I discovered this issue while implementing [this test](https://github.com/jimmygchen/lighthouse/blob/test-example/beacon_node/network/src/beacon_processor/tests.rs#L895), where I tried to manipulate the slot clock with:
`rig.chain.slot_clock.set_current_time(duration);`
however the change doesn't get reflected in the `slot_clock` in `ReprocessQueue`, and I realised `slot_clock` was cloned a few times in the code, and therefore changing the time in `rig.chain.slot_clock` doesn't have any effect in `ReprocessQueue`.
I've incorporated the suggestion from the @paulhauner and @michaelsproul - wrapping the `ManualSlotClock.current_time` (`RwLock<Duration>)` in an `Arc`, and the above test now passes.
Let's see if this breaks any existing tests :)
## Issue Addressed
Windows tests for subscription and unsubscriptions fail in CI sporadically. We usually ignore this failures, so this PR aims to help reduce the failure noise. Associated issue is https://github.com/sigp/lighthouse/issues/3960
On heavily crowded networks, we are seeing many attempted connections to our node every second.
Often these connections come from peers that have just been disconnected. This can be for a number of reasons including:
- We have deemed them to be not as useful as other peers
- They have performed poorly
- They have dropped the connection with us
- The connection was spontaneously lost
- They were randomly removed because we have too many peers
In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place.
This PR adds a timeout, that prevents recently disconnected peers from reconnecting to us.
Technically we implement a ban at the swarm layer to prevent immediate re connections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers.
This cache is bounded in time. An extra space bound could be added should people consider this a risk.
Co-authored-by: Diva M <divma@protonmail.com>