## Issue Addressed
Implements the standard key manager API from https://ethereum.github.io/keymanager-APIs/, formerly https://github.com/ethereum/beacon-APIs/pull/151
Related to https://github.com/sigp/lighthouse/issues/2557
## Proposed Changes
- [x] Add all of the new endpoints from the standard API: GET, POST and DELETE.
- [x] Add a `validators.enabled` column to the slashing protection database to support atomic disable + export.
- [x] Add tests for all the common sequential accesses of the API
- [x] Add tests for interactions with remote signer validators
- [x] Add end-to-end tests for migration of validators from one VC to another
- [x] Implement the authentication scheme from the standard (token bearer auth)
## Additional Info
The `enabled` column in the validators SQL database is necessary to prevent a race condition when exporting slashing protection data. Without the slashing protection database having a way of knowing that a key has been disabled, a concurrent request to sign a message could insert a new record into the database. The `delete_concurrent_with_signing` test exercises this code path, and was indeed failing before the `enabled` column was added.
The validator client authentication has been modified from basic auth to bearer auth, with basic auth preserved for backwards compatibility.
## Proposed Changes
Remove the check for exact equality on the beacon node spec when polling `/config/spec` from the VC. This check was always overzealous, and mostly served to check that the BN was configured for upcoming forks. I've replaced it by explicit checks of the `altair_fork_epoch` and `bellatrix_fork_epoch` instead.
## Additional Info
We should come back to this and clean it up so that we can retain compatibility while removing the field `default`s we installed.
## Issue Addressed
There was an overeager assert in the import of slashing protection data here:
fff01b24dd/validator_client/slashing_protection/src/slashing_database.rs (L939)
We were asserting that if the import contained any blocks for a validator, then the database should contain only a single block for that validator due to pruning/consolidation. However, we would only prune if the import contained _relevant blocks_ (that would actually change the maximum slot):
fff01b24dd/validator_client/slashing_protection/src/slashing_database.rs (L629-L633)
This lead to spurious failures (in the form of `ConsistencyError`s) when importing an interchange containing no new blocks for any of the validators. This wasn't hard to trigger, e.g. export and then immediately re-import the same file.
## Proposed Changes
This PR fixes the issue by simplifying the import so that it's more like the import for attestations. I.e. we make the assert true by always pruning when the imported file contains blocks.
In practice this doesn't have any downsides: if we import a new block then the behaviour is as before, except that we drop the `signing_root`. If we import an existing block or an old block then we prune the database to a single block. The only time this would be relevant is during extreme clock drift locally _plus_ import of a non-drifted interchange, which should occur infrequently.
## Additional Info
I've also added `Arbitrary` implementations to the slashing protection types so that we can fuzz them. I have a fuzzer sitting in a separate directory which I may or may not commit in a subsequent PR.
There's a new test in the standard interchange tests v5.2.1 that checks for this issue: https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12
## Issue Addressed
New rust lints
## Proposed Changes
- Boxing some enum variants
- removing some unused fields (is the validator lockfile unused? seemed so to me)
## Additional Info
- some error fields were marked as dead code but are logged out in areas
- left some dead fields in our ef test code because I assume they are useful for debugging?
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
Resolves#2612
## Proposed Changes
Implements both the checks mentioned in the original issue.
1. Verifies the `randao_reveal` in the beacon node
2. Cross checks the proposer index after getting back the block from the beacon node.
## Additional info
The block production time increases by ~10x because of the signature verification on the beacon node (based on the `beacon_block_production_process_seconds` metric) when running on a local testnet.
## Issue Addressed
Part of https://github.com/sigp/lighthouse/issues/2557
## Proposed Changes
Refactor the slashing protection export so that it can export data for a subset of validators.
This is the last remaining building block required for supporting the standard validator API (which I'll start to build atop this branch)
## Additional Info
Built on and requires #2598
## Issue Addressed
Resolves#2262
## Proposed Changes
Add a new CLI flag `--beacon-nodes-tls-certs` which allows the user to specify a path to a certificate file (or a list of files, separated by commas). The VC will then use these certificates (in addition to the existing certificates in the OS trust store) when connecting to a beacon node over HTTPS.
## Additional Info
This only supports certificates in PEM format.
## Issue Addressed
Closes#2419
## Proposed Changes
Address a long-standing issue with the import of slashing protection data where the import would fail due to the data appearing slashable w.r.t the existing database. Importing is now idempotent, and will have no issues importing data that has been handed back and forth between different validator clients, or different implementations.
The implementation works by updating the high and low watermarks if they need updating, and not attempting to check if the input is slashable w.r.t itself or the database. This is a strengthening of the minification that we started to do by default since #2380, and what Teku has been doing since the beginning.
## Additional Info
The only feature we lose by doing this is the ability to do non-minified imports of clock drifted messages (cf. Prysm on Medalla). In theory, with the previous implementation we could import all the messages in case of clock drift and be aware of the "gap" between the real present time and the messages signed in the far future. _However_ for attestations this is close to useless, as the source epoch will advance as soon as justification occurs, which will require us to make slashable attestations with respect to our bogus attestation(s). E.g. if I sign an attestation 100=>200 when the current epoch is 101, then I won't be able to vote in any epochs prior to 101 becoming justified because 101=>102, 101=>103, etc are all surrounded by 100=>200. Seeing as signing attestations gets blocked almost immediately in this case regardless of our import behaviour, there's no point trying to handle it. For blocks the situation is more hopeful due to the lack of surrounds, but losing block proposals from validators who by definition can't attest doesn't seem like an issue (the other block proposers can pick up the slack).
Currently, the beacon node has no ability to serve the HTTP API over TLS.
Adding this functionality would be helpful for certain use cases, such as when you need a validator client to connect to a backup beacon node which is outside your local network, and the use of an SSH tunnel or reverse proxy would be inappropriate.
## Proposed Changes
- Add three new CLI flags to the beacon node
- `--http-enable-tls`: enables TLS
- `--http-tls-cert`: to specify the path to the certificate file
- `--http-tls-key`: to specify the path to the key file
- Update the HTTP API to optionally use `warp`'s [`TlsServer`](https://docs.rs/warp/0.3.1/warp/struct.TlsServer.html) depending on the presence of the `--http-enable-tls` flag
- Update tests and docs
- Use a custom branch for `warp` to ensure proper error handling
## Additional Info
Serving the API over TLS should currently be considered experimental. The reason for this is that it uses code from an [unmerged PR](https://github.com/seanmonstar/warp/pull/717). This commit provides the `try_bind_with_graceful_shutdown` method to `warp`, which is helpful for controlling error flow when the TLS configuration is invalid (cert/key files don't exist, incorrect permissions, etc).
I've implemented the same code in my [branch here](https://github.com/macladson/warp/tree/tls).
Once the code has been reviewed and merged upstream into `warp`, we can remove the dependency on my branch and the feature can be considered more stable.
Currently, the private key file must not be password-protected in order to be read into Lighthouse.
## Proposed Changes
Instead of checking for strict equality between a BN's spec and the VC's local spec, just check the genesis fork version. This prevents us from failing eagerly for minor differences, while still protecting the VC from connecting to a completely incompatible BN.
A warning is retained for the previous case where the specs are not exactly equal, which is to be expected if e.g. running against Infura before Infura configures the mainnet Altair fork epoch.
## Issue Addressed
NA
## Proposed Changes
Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enums`:
https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union
The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.
### Crate Updates
This PR will (hopefully) cause CI to publish new versions for the following crates:
- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`
These these crates depend on each other, I've had to add a workspace-level `[patch]` for these crates. A follow-up PR will need to remove this patch, ones the new versions are published.
### Union Behaviors
We already had SSZ `Encode` and `TreeHash` derive for enums, however it just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec is in conflict with the transparent method, I've required that all `enum` have exactly one of the following enum-level attributes:
#### SSZ
- `#[ssz(enum_behaviour = "union")]`
- matches the spec used for the merge
- `#[ssz(enum_behaviour = "transparent")]`
- maintains existing functionality
- not supported for `Decode` (never was)
#### TreeHash
- `#[tree_hash(enum_behaviour = "union")]`
- matches the spec used for the merge
- `#[tree_hash(enum_behaviour = "transparent")]`
- maintains existing functionality
This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.
### Legacy Option Encoding
Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the *old* SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector.
Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migrate script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migrate script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.
The `ssz::legacy::four_byte_impl!` macro allows a one-liner to define the module required for the `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.
Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.
### Crate Publish Ordering
I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon.
## TODO
- [ ] Queue a follow-up `[patch]`-removing PR.
[EIP-3030]: https://eips.ethereum.org/EIPS/eip-3030
[Web3Signer]: https://consensys.github.io/web3signer/web3signer-eth2.html
## Issue Addressed
Resolves#2498
## Proposed Changes
Allows the VC to call out to a [Web3Signer] remote signer to obtain signatures.
## Additional Info
### Making Signing Functions `async`
To allow remote signing, I needed to make all the signing functions `async`. This caused a bit of noise where I had to convert iterators into `for` loops.
In `duties_service.rs` there was a particularly tricky case where we couldn't hold a write-lock across an `await`, so I had to first take a read-lock, then grab a write-lock.
### Move Signing from Core Executor
Whilst implementing this feature, I noticed that we signing was happening on the core tokio executor. I suspect this was causing the executor to temporarily lock and occasionally trigger some HTTP timeouts (and potentially SQL pool timeouts, but I can't verify this). Since moving all signing into blocking tokio tasks, I noticed a distinct drop in the "atttestations_http_get" metric on a Prater node:
![http_get_times](https://user-images.githubusercontent.com/6660660/132143737-82fd3836-2e7e-445b-a143-cb347783baad.png)
I think this graph indicates that freeing the core executor allows the VC to operate more smoothly.
### Refactor TaskExecutor
I noticed that the `TaskExecutor::spawn_blocking_handle` function would fail to spawn tasks if it were unable to obtain handles to some metrics (this can happen if the same metric is defined twice). It seemed that a more sensible approach would be to keep spawning tasks, but without metrics. To that end, I refactored the function so that it would still function without metrics. There are no other changes made.
## TODO
- [x] Restructure to support multiple signing methods.
- [x] Add calls to remote signer from VC.
- [x] Documentation
- [x] Test all endpoints
- [x] Test HTTPS certificate
- [x] Allow adding remote signer validators via the API
- [x] Add Altair support via [21.8.1-rc1](https://github.com/ConsenSys/web3signer/releases/tag/21.8.1-rc1)
- [x] Create issue to start using latest version of web3signer. (See #2570)
## Notes
- ~~Web3Signer doesn't yet support the Altair fork for Prater. See https://github.com/ConsenSys/web3signer/issues/423.~~
- ~~There is not yet a release of Web3Signer which supports Altair blocks. See https://github.com/ConsenSys/web3signer/issues/391.~~
## Issue Addressed
Resolves#2438Resolves#2437
## Proposed Changes
Changes the permissions for validator client http server api token file and secret key to 600 from 644. Also changes the permission for logfiles generated using the `--logfile` cli option to 600.
Logs the path to the api token instead of the actual api token. Updates docs to reflect the change.
## Issue Addressed
Related to: #2259
Made an attempt at all the necessary updates here to publish the crates to crates.io. I incremented the minor versions on all the crates that have been previously published. We still might run into some issues as we try to publish because I'm not able to test this out but I think it's a good starting point.
## Proposed Changes
- Add description and license to `ssz_types` and `serde_util`
- rename `serde_util` to `eth2_serde_util`
- increment minor versions
- remove path dependencies
- remove patch dependencies
## Additional Info
Crates published:
- [x] `tree_hash` -- need to publish `tree_hash_derive` and `eth2_hashing` first
- [x] `eth2_ssz_types` -- need to publish `eth2_serde_util` first
- [x] `tree_hash_derive`
- [x] `eth2_ssz`
- [x] `eth2_ssz_derive`
- [x] `eth2_serde_util`
- [x] `eth2_hashing`
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Proposed Changes
* Consolidate Tokio versions: everything now uses the latest v1.10.0, no more `tokio-compat`.
* Many semver-compatible changes via `cargo update`. Notably this upgrades from the yanked v0.8.0 version of crossbeam-deque which is present in v1.5.0-rc.0
* Many semver incompatible upgrades via `cargo upgrades` and `cargo upgrade --workspace pkg_name`. Notable ommissions:
- Prometheus, to be handled separately: https://github.com/sigp/lighthouse/issues/2485
- `rand`, `rand_xorshift`: the libsecp256k1 package requires 0.7.x, so we'll stick with that for now
- `ethereum-types` is pinned at 0.11.0 because that's what `web3` is using and it seems nice to have just a single version
## Additional Info
We still have two versions of `libp2p-core` due to `discv5` depending on the v0.29.0 release rather than `master`. AFAIK it should be OK to release in this state (cc @AgeManning )
## Issue Addressed
NA
## Proposed Changes
- Adds docs for Doppelganger Protection
- Shortens a log message since it was a bit longer than our usual formatting.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
## Issue Addressed
NA
## Proposed Changes
Registers validators with the doppelganger service at the earliest possible point.
This avoids the following (non-harmful, but scary) log when pruning the slashing DB on startup:
```
CRIT Validator unknown to doppelganger service, pubkey: 0xabc..., msg: preventing validator from performing duties, service: doppelganger
```
## Additional Info
NA
## Proposed Changes
* Implement the validator client and HTTP API changes necessary to support Altair
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
N/A
## Proposed Changes
This is just a cosmetic change to print only the unique list of violaters. We could repeat violaters in the list if an attestation and aggregation both were detected from the same validator.
## Issue Addressed
NA
## Proposed Changes
Fixes a bug in Doppelganger Protection which would cause false-positives when a VC is restarted in the same epoch where it has already produced a signed message.
It could also cause a false-negative in the scenario where time skips forward (perhaps due to host suspend/wake). The new `time_skips_forward_with_doppelgangers` test covers this case.
This was a simple (and embarrassing, on my behalf) `>=` instead of `<=` bug that was missed by my tests but detected during manual testing by @michaelsproul (🙏). Regression tests have been added.
## Additional Info
NA
## TODO
- [x] Add test for doppelganger in epoch > next_check_epoch
## Issue Addressed
Resolves#2069
## Proposed Changes
- Adds a `--doppelganger-detection` flag
- Adds a `lighthouse/seen_validators` endpoint, which will make it so the lighthouse VC is not interopable with other client beacon nodes if the `--doppelganger-detection` flag is used, but hopefully this will become standardized. Relevant Eth2 API repo issue: https://github.com/ethereum/eth2.0-APIs/issues/64
- If the `--doppelganger-detection` flag is used, the VC will wait until the beacon node is synced, and then wait an additional 2 epochs. The reason for this is to make sure the beacon node is able to subscribe to the subnets our validators should be attesting on. I think an alternative would be to have the beacon node subscribe to all subnets for 2+ epochs on startup by default.
## Additional Info
I'd like to add tests and would appreciate feedback.
TODO: handle validators started via the API, potentially make this default behavior
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
N/A
## Proposed Changes
- Removing a bunch of unnecessary references
- Updated `Error::VariantError` to `Error::Variant`
- There were additional enum variant lints that I ignored, because I thought our variant names were fine
- removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index
## Additional Info
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
#2454
## Proposed Changes
Adds the `--http-address` flag to allow the user to use custom HTTP addresses. This can be helpful for certain Docker setups.
Since using custom HTTP addresses is unsafe due to the server being unencrypted, `--unencrypted-http-transport` was also added as a safety flag and must be used in tandem with `--http-address`. This is to ensure the user is aware of the risks associated with using non-local HTTP addresses.
## Issue Addressed
Resolves#2313
## Proposed Changes
Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct.
This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration.
Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available.
Add a CLI flag called `use-long-timeouts` to revert to the old behavior.
## Additional Info
Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Proposed Changes
Implement the consensus changes necessary for the upcoming Altair hard fork.
## Additional Info
This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`.
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
Closes#2354
## Proposed Changes
Add a `minify` method to `slashing_protection::Interchange` that keeps only the maximum-epoch attestation and maximum-slot block for each validator. Specifically, `minify` constructs "synthetic" attestations (with no `signing_root`) containing the maximum source epoch _and_ the maximum target epoch from the input. This is equivalent to the `minify_synth` algorithm that I've formally verified in this repository:
https://github.com/michaelsproul/slashing-proofs
## Additional Info
Includes the JSON loading optimisation from #2347
## Issue Addressed
`make lint` failing on rust 1.53.0.
## Proposed Changes
1.53.0 updates
## Additional Info
I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Windows incompatibility.
## Proposed Changes
On windows, lighthouse needs to default to STDIN as tty doesn't exist. Also Windows uses ACLs for file permissions. So to mirror chmod 600, we will remove every entry in a file's ACL and add only a single SID that is an alias for the file owner.
Beyond that, there were several changes made to different unit tests because windows has slightly different error messages as well as frustrating nuances around killing a process :/
## Additional Info
Tested on my Windows VM and it appears to work, also compiled & tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running full validator client on windows.
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
The latest version of Rust has new clippy rules & the codebase isn't up to date with them.
## Proposed Changes
Small formatting changes that clippy tells me are functionally equivalent
## Issue Addressed
#2276
## Proposed Changes
Add the `SensitiveUrl` struct which wraps `Url` and implements custom `Display` and `Debug` traits to redact user secrets from being logged in eth1 endpoints, beacon node endpoints and metrics.
## Additional Info
This also includes a small rewrite of the eth1 crate to make requests using `Url` instead of `&str`.
Some error messages have also been changed to remove `Url` data.
## Issue Addressed
NA
## Proposed Changes
Fixes a compile error when using the `milagro` feature. I can't see any need to use the specific BLST object here. @pawanjay176 can you please confirm?
## Additional Info
NA
## Issue Addressed
Closes#2274
## Proposed Changes
* Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard.
* Log a debug message if there are extra fields returned to the VC from one of its BNs.
This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)
## Issue Addressed
Closes#2052
## Proposed Changes
- Refactor the attester/proposer duties endpoints in the BN
- Performance improvements
- Fixes some potential inconsistencies with the dependent root fields.
- Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
- Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
- Required to reduce the delay on broadcasting new blocks.
- Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
- Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes.
- Unfortunately this has created lots of dust changes.
- In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
- Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code.
- This also fixes a bug with some functions which were failing to include a state root as per [this comment](072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
- Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base.
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments.
## Observations on Prymont
- Proposer duties times down from peaks of 450ms to consistent <1ms.
- Current epoch attester duties times down from >1s peaks to a consistent 20-30ms.
- Block production down from +600ms to 100-200ms.
## Additional Info
- ~~Blocked on #2241~~
- ~~Blocked on #2234~~
## TODO
- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)) in the scenario of replacing the state roots
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Resolves#1944
## Proposed Changes
Adds a "graffiti" key to the `validator_definitions.yml`. Setting the key will override anything passed through the validator `--graffiti` flag.
Returns an error if the value for the graffiti key is > 32 bytes instead of silently truncating.
## Proposed Changes
Prune the slashing protection database so that it doesn't exhibit unbounded growth. Prune by dropping attestations and blocks from more than 512 epochs ago, relying on the guards that prevent signing messages with slots or epochs less than the minimum recorded in the DB.
The pruning process is potentially time consuming, so it's scheduled to run only every 512 epochs, in the last 2/3rds of a slot. This gives it at least 4 seconds to run without impacting other signing, which I think should be sufficient. I've seen it run for several minutes (yikes!) on our Pyrmont nodes, but I suspect that 1) this will only occur on the first run when the database is still huge 2) no other production users will be impacted because they don't have enough validators per node.
Pruning also happens at start-up, as I figured this is a fairly infrequent event, and if a user is experiencing problems with the VC related to pruning, it's nice to be able to trigger it with a quick restart. Users are also conditioned to not mind missing a few attestations during a restart.
We need to include a note in the release notes that users may see the message `timed out waiting for connection` the first time they prune a huge database, but that this is totally fine and to be expected (the VC will miss those attestations in the meantime).
I'm also open to making this opt-in for now, although the sooner we get users doing it, the less painful it will be: prune early, prune often!