lighthouse/validator_client/slashing_protection/src/interchange.rs

160 lines
5.8 KiB
Rust
Raw Normal View History

use crate::InterchangeError;
use serde_derive::{Deserialize, Serialize};
use std::cmp::max;
use std::collections::{HashMap, HashSet};
use std::io;
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
use types::{Epoch, Hash256, PublicKeyBytes, Slot};
#[derive(Debug, Clone, PartialEq, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
Fix assert in slashing protection import (#2881) ## Issue Addressed There was an overeager assert in the import of slashing protection data here: https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L939 We were asserting that if the import contained any blocks for a validator, then the database should contain only a single block for that validator due to pruning/consolidation. However, we would only prune if the import contained _relevant blocks_ (that would actually change the maximum slot): https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L629-L633 This lead to spurious failures (in the form of `ConsistencyError`s) when importing an interchange containing no new blocks for any of the validators. This wasn't hard to trigger, e.g. export and then immediately re-import the same file. ## Proposed Changes This PR fixes the issue by simplifying the import so that it's more like the import for attestations. I.e. we make the assert true by always pruning when the imported file contains blocks. In practice this doesn't have any downsides: if we import a new block then the behaviour is as before, except that we drop the `signing_root`. If we import an existing block or an old block then we prune the database to a single block. The only time this would be relevant is during extreme clock drift locally _plus_ import of a non-drifted interchange, which should occur infrequently. ## Additional Info I've also added `Arbitrary` implementations to the slashing protection types so that we can fuzz them. I have a fuzzer sitting in a separate directory which I may or may not commit in a subsequent PR. There's a new test in the standard interchange tests v5.2.1 that checks for this issue: https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12
2022-01-04 20:46:44 +00:00
#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
pub struct InterchangeMetadata {
#[serde(with = "eth2_serde_utils::quoted_u64::require_quotes")]
pub interchange_format_version: u64,
pub genesis_validators_root: Hash256,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
Fix assert in slashing protection import (#2881) ## Issue Addressed There was an overeager assert in the import of slashing protection data here: https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L939 We were asserting that if the import contained any blocks for a validator, then the database should contain only a single block for that validator due to pruning/consolidation. However, we would only prune if the import contained _relevant blocks_ (that would actually change the maximum slot): https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L629-L633 This lead to spurious failures (in the form of `ConsistencyError`s) when importing an interchange containing no new blocks for any of the validators. This wasn't hard to trigger, e.g. export and then immediately re-import the same file. ## Proposed Changes This PR fixes the issue by simplifying the import so that it's more like the import for attestations. I.e. we make the assert true by always pruning when the imported file contains blocks. In practice this doesn't have any downsides: if we import a new block then the behaviour is as before, except that we drop the `signing_root`. If we import an existing block or an old block then we prune the database to a single block. The only time this would be relevant is during extreme clock drift locally _plus_ import of a non-drifted interchange, which should occur infrequently. ## Additional Info I've also added `Arbitrary` implementations to the slashing protection types so that we can fuzz them. I have a fuzzer sitting in a separate directory which I may or may not commit in a subsequent PR. There's a new test in the standard interchange tests v5.2.1 that checks for this issue: https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12
2022-01-04 20:46:44 +00:00
#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
pub struct InterchangeData {
Optimize validator duties (#2243) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/consensus/state_processing/src/state_advance.rs#L69-L74). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR *reduces* the size of the codebase :tada:~~ It *used* to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](https://github.com/sigp/lighthouse/blob/072695284f7eff82c51f79bc921ad942fea7483a/beacon_node/store/src/hot_cold_store.rs#L811-L812) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
pub pubkey: PublicKeyBytes,
pub signed_blocks: Vec<SignedBlock>,
pub signed_attestations: Vec<SignedAttestation>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
Fix assert in slashing protection import (#2881) ## Issue Addressed There was an overeager assert in the import of slashing protection data here: https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L939 We were asserting that if the import contained any blocks for a validator, then the database should contain only a single block for that validator due to pruning/consolidation. However, we would only prune if the import contained _relevant blocks_ (that would actually change the maximum slot): https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L629-L633 This lead to spurious failures (in the form of `ConsistencyError`s) when importing an interchange containing no new blocks for any of the validators. This wasn't hard to trigger, e.g. export and then immediately re-import the same file. ## Proposed Changes This PR fixes the issue by simplifying the import so that it's more like the import for attestations. I.e. we make the assert true by always pruning when the imported file contains blocks. In practice this doesn't have any downsides: if we import a new block then the behaviour is as before, except that we drop the `signing_root`. If we import an existing block or an old block then we prune the database to a single block. The only time this would be relevant is during extreme clock drift locally _plus_ import of a non-drifted interchange, which should occur infrequently. ## Additional Info I've also added `Arbitrary` implementations to the slashing protection types so that we can fuzz them. I have a fuzzer sitting in a separate directory which I may or may not commit in a subsequent PR. There's a new test in the standard interchange tests v5.2.1 that checks for this issue: https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12
2022-01-04 20:46:44 +00:00
#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
pub struct SignedBlock {
#[serde(with = "eth2_serde_utils::quoted_u64::require_quotes")]
pub slot: Slot,
#[serde(skip_serializing_if = "Option::is_none")]
pub signing_root: Option<Hash256>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
Fix assert in slashing protection import (#2881) ## Issue Addressed There was an overeager assert in the import of slashing protection data here: https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L939 We were asserting that if the import contained any blocks for a validator, then the database should contain only a single block for that validator due to pruning/consolidation. However, we would only prune if the import contained _relevant blocks_ (that would actually change the maximum slot): https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L629-L633 This lead to spurious failures (in the form of `ConsistencyError`s) when importing an interchange containing no new blocks for any of the validators. This wasn't hard to trigger, e.g. export and then immediately re-import the same file. ## Proposed Changes This PR fixes the issue by simplifying the import so that it's more like the import for attestations. I.e. we make the assert true by always pruning when the imported file contains blocks. In practice this doesn't have any downsides: if we import a new block then the behaviour is as before, except that we drop the `signing_root`. If we import an existing block or an old block then we prune the database to a single block. The only time this would be relevant is during extreme clock drift locally _plus_ import of a non-drifted interchange, which should occur infrequently. ## Additional Info I've also added `Arbitrary` implementations to the slashing protection types so that we can fuzz them. I have a fuzzer sitting in a separate directory which I may or may not commit in a subsequent PR. There's a new test in the standard interchange tests v5.2.1 that checks for this issue: https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12
2022-01-04 20:46:44 +00:00
#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
pub struct SignedAttestation {
#[serde(with = "eth2_serde_utils::quoted_u64::require_quotes")]
pub source_epoch: Epoch,
#[serde(with = "eth2_serde_utils::quoted_u64::require_quotes")]
pub target_epoch: Epoch,
#[serde(skip_serializing_if = "Option::is_none")]
pub signing_root: Option<Hash256>,
}
#[derive(Debug, Clone, PartialEq, Deserialize, Serialize)]
Fix assert in slashing protection import (#2881) ## Issue Addressed There was an overeager assert in the import of slashing protection data here: https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L939 We were asserting that if the import contained any blocks for a validator, then the database should contain only a single block for that validator due to pruning/consolidation. However, we would only prune if the import contained _relevant blocks_ (that would actually change the maximum slot): https://github.com/sigp/lighthouse/blob/fff01b24ddedcd54486e374460855ca20d3dd232/validator_client/slashing_protection/src/slashing_database.rs#L629-L633 This lead to spurious failures (in the form of `ConsistencyError`s) when importing an interchange containing no new blocks for any of the validators. This wasn't hard to trigger, e.g. export and then immediately re-import the same file. ## Proposed Changes This PR fixes the issue by simplifying the import so that it's more like the import for attestations. I.e. we make the assert true by always pruning when the imported file contains blocks. In practice this doesn't have any downsides: if we import a new block then the behaviour is as before, except that we drop the `signing_root`. If we import an existing block or an old block then we prune the database to a single block. The only time this would be relevant is during extreme clock drift locally _plus_ import of a non-drifted interchange, which should occur infrequently. ## Additional Info I've also added `Arbitrary` implementations to the slashing protection types so that we can fuzz them. I have a fuzzer sitting in a separate directory which I may or may not commit in a subsequent PR. There's a new test in the standard interchange tests v5.2.1 that checks for this issue: https://github.com/eth-clients/slashing-protection-interchange-tests/pull/12
2022-01-04 20:46:44 +00:00
#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
pub struct Interchange {
pub metadata: InterchangeMetadata,
pub data: Vec<InterchangeData>,
}
impl Interchange {
pub fn from_json_str(json: &str) -> Result<Self, serde_json::Error> {
serde_json::from_str(json)
}
pub fn from_json_reader(mut reader: impl std::io::Read) -> Result<Self, io::Error> {
// We read the entire file into memory first, as this is *a lot* faster than using
// `serde_json::from_reader`. See https://github.com/serde-rs/json/issues/160
let mut json_str = String::new();
reader.read_to_string(&mut json_str)?;
Ok(Interchange::from_json_str(&json_str)?)
}
pub fn write_to(&self, writer: impl std::io::Write) -> Result<(), serde_json::Error> {
serde_json::to_writer(writer, self)
}
/// Do these two `Interchange`s contain the same data (ignoring ordering)?
pub fn equiv(&self, other: &Self) -> bool {
let self_set = self.data.iter().collect::<HashSet<_>>();
let other_set = other.data.iter().collect::<HashSet<_>>();
self.metadata == other.metadata && self_set == other_set
}
/// The number of entries in `data`.
pub fn len(&self) -> usize {
self.data.len()
}
/// Is the `data` part of the interchange completely empty?
pub fn is_empty(&self) -> bool {
self.len() == 0
}
/// Minify an interchange by constructing a synthetic block & attestation for each validator.
pub fn minify(&self) -> Result<Self, InterchangeError> {
// Map from pubkey to optional max block and max attestation.
let mut validator_data =
HashMap::<PublicKeyBytes, (Option<SignedBlock>, Option<SignedAttestation>)>::new();
for data in self.data.iter() {
// Existing maximum attestation and maximum block.
let (max_block, max_attestation) = validator_data
.entry(data.pubkey)
.or_insert_with(|| (None, None));
// Find maximum source and target epochs.
let max_source_epoch = data
.signed_attestations
.iter()
.map(|attestation| attestation.source_epoch)
.max();
let max_target_epoch = data
.signed_attestations
.iter()
.map(|attestation| attestation.target_epoch)
.max();
match (max_source_epoch, max_target_epoch) {
(Some(source_epoch), Some(target_epoch)) => {
if let Some(prev_max) = max_attestation {
prev_max.source_epoch = max(prev_max.source_epoch, source_epoch);
prev_max.target_epoch = max(prev_max.target_epoch, target_epoch);
} else {
*max_attestation = Some(SignedAttestation {
source_epoch,
target_epoch,
signing_root: None,
});
}
}
(None, None) => {}
Make slashing protection import more resilient (#2598) ## Issue Addressed Closes #2419 ## Proposed Changes Address a long-standing issue with the import of slashing protection data where the import would fail due to the data appearing slashable w.r.t the existing database. Importing is now idempotent, and will have no issues importing data that has been handed back and forth between different validator clients, or different implementations. The implementation works by updating the high and low watermarks if they need updating, and not attempting to check if the input is slashable w.r.t itself or the database. This is a strengthening of the minification that we started to do by default since #2380, and what Teku has been doing since the beginning. ## Additional Info The only feature we lose by doing this is the ability to do non-minified imports of clock drifted messages (cf. Prysm on Medalla). In theory, with the previous implementation we could import all the messages in case of clock drift and be aware of the "gap" between the real present time and the messages signed in the far future. _However_ for attestations this is close to useless, as the source epoch will advance as soon as justification occurs, which will require us to make slashable attestations with respect to our bogus attestation(s). E.g. if I sign an attestation 100=>200 when the current epoch is 101, then I won't be able to vote in any epochs prior to 101 becoming justified because 101=>102, 101=>103, etc are all surrounded by 100=>200. Seeing as signing attestations gets blocked almost immediately in this case regardless of our import behaviour, there's no point trying to handle it. For blocks the situation is more hopeful due to the lack of surrounds, but losing block proposals from validators who by definition can't attest doesn't seem like an issue (the other block proposers can pick up the slack).
2021-10-13 01:49:51 +00:00
_ => return Err(InterchangeError::MaxInconsistent),
};
// Find maximum block slot.
let max_block_slot = data.signed_blocks.iter().map(|block| block.slot).max();
if let Some(max_slot) = max_block_slot {
if let Some(prev_max) = max_block {
prev_max.slot = max(prev_max.slot, max_slot);
} else {
*max_block = Some(SignedBlock {
slot: max_slot,
signing_root: None,
});
}
}
}
let data = validator_data
.into_iter()
.map(|(pubkey, (maybe_block, maybe_att))| InterchangeData {
pubkey,
signed_blocks: maybe_block.into_iter().collect(),
signed_attestations: maybe_att.into_iter().collect(),
})
.collect();
Ok(Self {
metadata: self.metadata.clone(),
data,
})
}
}