lighthouse/validator_client/src/validator_store.rs
Paul Hauner 015ab7d0a7 Optimize validator duties (#2243)
## Issue Addressed

Closes #2052

## Proposed Changes

- Refactor the attester/proposer duties endpoints in the BN
    - Performance improvements
    - Fixes some potential inconsistencies with the dependent root fields.
    - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
    - Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
    - Required to reduce the delay on broadcasting new blocks.
    - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
    - Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte array; it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes (see the first sketch below).
    - Unfortunately this has created lots of dust changes.
- In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublicKeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
- Add the `state_processing::state_advance` mod, which dedups a lot of the "apply `n` skip slots to the state" code.
    - This also fixes a bug with some functions which were failing to include a state root as per [this comment](072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
- Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base (see the second sketch below).
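
For illustration, here is a minimal sketch of why the byte-array representation is cheaper (the types below are stand-ins, not Lighthouse's actual definitions): cloning is a memcpy and hashing is a plain byte hash, with no curve arithmetic or point validation involved.

```rust
use std::collections::HashMap;

/// Stand-in for `PublicKeyBytes`: 48 compressed BLS bytes and nothing else.
/// `Clone` is a memcpy and `Hash` is a byte hash; no crypto code is touched.
#[derive(Clone, PartialEq, Eq, Hash)]
struct PublicKeyBytes([u8; 48]);

fn main() {
    // Maps keyed by the byte wrapper never pay for point decompression.
    let mut duties: HashMap<PublicKeyBytes, u64> = HashMap::new();
    duties.insert(PublicKeyBytes([0u8; 48]), 42);
    assert_eq!(duties.get(&PublicKeyBytes([0u8; 48])), Some(&42));
}
```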
    
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments. 
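
A minimal sketch of the task separation described above (the task bodies are hypothetical, not the real `DutiesService` code): each duty type runs in its own `tokio` task with `tokio::sync::mpsc` carrying results, so a slow attester-duties poll cannot delay a block proposal.

```rust
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (block_tx, mut block_rx) = mpsc::channel::<u64>(4);

    // Hypothetical block-duty task: announces the slot to propose at.
    tokio::spawn(async move {
        block_tx.send(42).await.expect("receiver should be alive");
    });

    // Hypothetical attester-duty task: may be arbitrarily slow without
    // holding up the block-duty task above.
    tokio::spawn(async {
        tokio::time::sleep(Duration::from_secs(1)).await;
    });

    // Block production proceeds as soon as a duty arrives.
    if let Some(slot) = block_rx.recv().await {
        println!("produce block at slot {}", slot);
    }
}
```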

## Observations on Pyrmont

- Proposer duties response times down from peaks of 450ms to a consistent <1ms.
- Current-epoch attester duties response times down from peaks of >1s to a consistent 20-30ms.
- Block production times down from 600ms+ to 100-200ms.

## Additional Info

- ~~Blocked on #2241~~
- ~~Blocked on #2234~~

## TODO

- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)) in the scenario of replacing the state roots


Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00


use crate::{
    fork_service::ForkService, http_metrics::metrics, initialized_validators::InitializedValidators,
};
use account_utils::{validator_definitions::ValidatorDefinition, ZeroizeString};
use parking_lot::{Mutex, RwLock};
use slashing_protection::{NotSafe, Safe, SlashingDatabase};
use slog::{crit, error, info, warn, Logger};
use slot_clock::SlotClock;
use std::path::Path;
use std::sync::Arc;
use tempfile::TempDir;
use types::{
    graffiti::GraffitiString, Attestation, BeaconBlock, ChainSpec, Domain, Epoch, EthSpec, Fork,
    Graffiti, Hash256, Keypair, PublicKeyBytes, SelectionProof, Signature, SignedAggregateAndProof,
    SignedBeaconBlock, SignedRoot, Slot,
};
use validator_dir::ValidatorDir;

/// Number of epochs of slashing protection history to keep.
///
/// This acts as a maximum safe-guard against clock drift.
const SLASHING_PROTECTION_HISTORY_EPOCHS: u64 = 512;
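
/// A validator stored on the local filesystem, pairing its on-disk directory with its voting
/// keypair.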
struct LocalValidator {
    validator_dir: ValidatorDir,
    voting_keypair: Keypair,
}

/// We implement our own `PartialEq` to avoid doing equality checks between secret keys.
///
/// It's nice to avoid secret key comparisons from a security perspective, but it's also a little
/// risky when it comes to `HashMap` integrity (that's why we need `PartialEq`).
///
/// Currently, we obtain keypairs from keystores, where the `PublicKey` is derived from the
/// `SecretKey` via a one-way function. In order to have two equal `PublicKey`s with different
/// `SecretKey`s we would need either:
///
/// - A serious upstream integrity error.
/// - A collision in the key-derivation function.
///
/// It seems reasonable to rule both out, and therefore to skip the secret key comparison.
impl PartialEq for LocalValidator {
    fn eq(&self, other: &Self) -> bool {
        self.validator_dir == other.validator_dir
            && self.voting_keypair.pk == other.voting_keypair.pk
    }
}

#[derive(Clone)]
pub struct ValidatorStore<T, E: EthSpec> {
    validators: Arc<RwLock<InitializedValidators>>,
    slashing_protection: SlashingDatabase,
    slashing_protection_last_prune: Arc<Mutex<Epoch>>,
    genesis_validators_root: Hash256,
    spec: Arc<ChainSpec>,
    log: Logger,
    temp_dir: Option<Arc<TempDir>>,
    fork_service: ForkService<T, E>,
}

impl<T: SlotClock + 'static, E: EthSpec> ValidatorStore<T, E> {
    pub fn new(
        validators: InitializedValidators,
        slashing_protection: SlashingDatabase,
        genesis_validators_root: Hash256,
        spec: ChainSpec,
        fork_service: ForkService<T, E>,
        log: Logger,
    ) -> Self {
        Self {
            validators: Arc::new(RwLock::new(validators)),
            slashing_protection,
            slashing_protection_last_prune: Arc::new(Mutex::new(Epoch::new(0))),
            genesis_validators_root,
            spec: Arc::new(spec),
            log,
            temp_dir: None,
            fork_service,
        }
    }

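    /// Returns a shared handle to the set of initialized validators.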
    pub fn initialized_validators(&self) -> Arc<RwLock<InitializedValidators>> {
        self.validators.clone()
    }

    /// Insert a new validator into `self`, where the validator is represented by an EIP-2335
    /// keystore on the filesystem.
    ///
    /// This function will:
    ///
    /// - Add the validator definition to the YAML file, saving it to the filesystem.
    /// - Register the validator with the slashing protection database.
    /// - If `enable == true`, start performing duties for the validator.
    pub async fn add_validator_keystore<P: AsRef<Path>>(
        &self,
        voting_keystore_path: P,
        password: ZeroizeString,
        enable: bool,
        graffiti: Option<GraffitiString>,
    ) -> Result<ValidatorDefinition, String> {
        let mut validator_def = ValidatorDefinition::new_keystore_with_password(
            voting_keystore_path,
            Some(password),
            graffiti.map(Into::into),
        )
        .map_err(|e| format!("failed to create validator definitions: {:?}", e))?;

        self.slashing_protection
            .register_validator(validator_def.voting_public_key.compress())
            .map_err(|e| format!("failed to register validator: {:?}", e))?;

        validator_def.enabled = enable;

        self.validators
            .write()
            .add_definition(validator_def.clone())
            .await
            .map_err(|e| format!("Unable to add definition: {:?}", e))?;

        Ok(validator_def)
    }

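    /// Returns the voting public keys held by this store.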
    pub fn voting_pubkeys(&self) -> Vec<PublicKeyBytes> {
        self.validators
            .read()
            .iter_voting_pubkeys()
            .cloned()
            .collect()
    }

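    /// Returns the number of enabled validators.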
    pub fn num_voting_validators(&self) -> usize {
        self.validators.read().num_enabled()
    }

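    /// Returns the current fork, as reported by the fork service.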
    fn fork(&self) -> Fork {
        self.fork_service.fork()
    }

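    /// Signs the given `epoch` under the RANDAO domain, producing the randao reveal used during
    /// block production. Returns `None` if the validator's keypair is unknown to this store.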
    pub fn randao_reveal(
        &self,
        validator_pubkey: &PublicKeyBytes,
        epoch: Epoch,
    ) -> Option<Signature> {
        self.validators
            .read()
            .voting_keypair(validator_pubkey)
            .map(|voting_keypair| {
                let domain = self.spec.get_domain(
                    epoch,
                    Domain::Randao,
                    &self.fork(),
                    self.genesis_validators_root,
                );
                let message = epoch.signing_root(domain);

                voting_keypair.sk.sign(message)
            })
    }

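    /// Returns the graffiti configured for the given validator, if any.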
    pub fn graffiti(&self, validator_pubkey: &PublicKeyBytes) -> Option<Graffiti> {
        self.validators.read().graffiti(validator_pubkey)
    }

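    /// Signs the given `block` after checking the slashing protection database. Returns `None`
    /// if the block is from a slot later than `current_slot`, is slashable, repeats previously
    /// signed data, or the validator is unknown to this store.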
    pub fn sign_block(
        &self,
        validator_pubkey: &PublicKeyBytes,
        block: BeaconBlock<E>,
        current_slot: Slot,
    ) -> Option<SignedBeaconBlock<E>> {
        // Make sure the block slot is not higher than the current slot to avoid potential attacks.
        if block.slot > current_slot {
            warn!(
                self.log,
                "Not signing block with slot greater than current slot";
                "block_slot" => block.slot.as_u64(),
                "current_slot" => current_slot.as_u64()
            );
            return None;
        }

        // Check for slashing conditions.
        let fork = self.fork();
        let domain = self.spec.get_domain(
            block.epoch(),
            Domain::BeaconProposer,
            &fork,
            self.genesis_validators_root,
        );

        let slashing_status = self.slashing_protection.check_and_insert_block_proposal(
            validator_pubkey,
            &block.block_header(),
            domain,
        );

        match slashing_status {
            // We can safely sign this block.
            Ok(Safe::Valid) => {
                let validators = self.validators.read();
                let voting_keypair = validators.voting_keypair(validator_pubkey)?;

                metrics::inc_counter_vec(&metrics::SIGNED_BLOCKS_TOTAL, &[metrics::SUCCESS]);

                Some(block.sign(
                    &voting_keypair.sk,
                    &fork,
                    self.genesis_validators_root,
                    &self.spec,
                ))
            }
            Ok(Safe::SameData) => {
                warn!(
                    self.log,
                    "Skipping signing of previously signed block";
                );
                metrics::inc_counter_vec(&metrics::SIGNED_BLOCKS_TOTAL, &[metrics::SAME_DATA]);
                None
            }
            Err(NotSafe::UnregisteredValidator(pk)) => {
                warn!(
                    self.log,
                    "Not signing block for unregistered validator";
                    "msg" => "Carefully consider running with --init-slashing-protection (see --help)",
                    "public_key" => format!("{:?}", pk)
                );
                metrics::inc_counter_vec(&metrics::SIGNED_BLOCKS_TOTAL, &[metrics::UNREGISTERED]);
                None
            }
            Err(e) => {
                crit!(
                    self.log,
                    "Not signing slashable block";
                    "error" => format!("{:?}", e)
                );
                metrics::inc_counter_vec(&metrics::SIGNED_BLOCKS_TOTAL, &[metrics::SLASHABLE]);
                None
            }
        }
    }

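    /// Adds this validator's signature to the `attestation` at `validator_committee_position`,
    /// after checking the slashing protection database. Returns `None` if the attestation targets
    /// an epoch later than `current_epoch`, is slashable, repeats previously signed data, or the
    /// validator is unknown to this store.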
    pub fn sign_attestation(
        &self,
        validator_pubkey: &PublicKeyBytes,
        validator_committee_position: usize,
        attestation: &mut Attestation<E>,
        current_epoch: Epoch,
    ) -> Option<()> {
        // Make sure the target epoch is not higher than the current epoch to avoid potential attacks.
        if attestation.data.target.epoch > current_epoch {
            return None;
        }

        // Check for slashing conditions.
        let fork = self.fork();
        let domain = self.spec.get_domain(
            attestation.data.target.epoch,
            Domain::BeaconAttester,
            &fork,
            self.genesis_validators_root,
        );

        let slashing_status = self.slashing_protection.check_and_insert_attestation(
            validator_pubkey,
            &attestation.data,
            domain,
        );

        match slashing_status {
            // We can safely sign this attestation.
            Ok(Safe::Valid) => {
                let validators = self.validators.read();
                let voting_keypair = validators.voting_keypair(validator_pubkey)?;

                attestation
                    .sign(
                        &voting_keypair.sk,
                        validator_committee_position,
                        &fork,
                        self.genesis_validators_root,
                        &self.spec,
                    )
                    .map_err(|e| {
                        error!(
                            self.log,
                            "Error whilst signing attestation";
                            "error" => format!("{:?}", e)
                        )
                    })
                    .ok()?;

                metrics::inc_counter_vec(&metrics::SIGNED_ATTESTATIONS_TOTAL, &[metrics::SUCCESS]);

                Some(())
            }
            Ok(Safe::SameData) => {
                warn!(
                    self.log,
                    "Skipping signing of previously signed attestation"
                );
                metrics::inc_counter_vec(
                    &metrics::SIGNED_ATTESTATIONS_TOTAL,
                    &[metrics::SAME_DATA],
                );
                None
            }
            Err(NotSafe::UnregisteredValidator(pk)) => {
                warn!(
                    self.log,
                    "Not signing attestation for unregistered validator";
                    "msg" => "Carefully consider running with --init-slashing-protection (see --help)",
                    "public_key" => format!("{:?}", pk)
                );
                metrics::inc_counter_vec(
                    &metrics::SIGNED_ATTESTATIONS_TOTAL,
                    &[metrics::UNREGISTERED],
                );
                None
            }
            Err(e) => {
                crit!(
                    self.log,
                    "Not signing slashable attestation";
                    "attestation" => format!("{:?}", attestation.data),
                    "error" => format!("{:?}", e)
                );
                metrics::inc_counter_vec(
                    &metrics::SIGNED_ATTESTATIONS_TOTAL,
                    &[metrics::SLASHABLE],
                );
                None
            }
        }
    }

    /// Signs an `AggregateAndProof` for a given validator.
    ///
    /// The resulting `SignedAggregateAndProof` is sent on the aggregation channel and cannot be
    /// modified by actors other than the signing validator.
    pub fn produce_signed_aggregate_and_proof(
        &self,
        validator_pubkey: &PublicKeyBytes,
        validator_index: u64,
        aggregate: Attestation<E>,
        selection_proof: SelectionProof,
    ) -> Option<SignedAggregateAndProof<E>> {
        let validators = self.validators.read();
        let voting_keypair = &validators.voting_keypair(validator_pubkey)?;

        metrics::inc_counter_vec(&metrics::SIGNED_AGGREGATES_TOTAL, &[metrics::SUCCESS]);

        Some(SignedAggregateAndProof::from_aggregate(
            validator_index,
            aggregate,
            Some(selection_proof),
            &voting_keypair.sk,
            &self.fork(),
            self.genesis_validators_root,
            &self.spec,
        ))
    }

    /// Produces a `SelectionProof` for the `slot`, signed with the secret key corresponding to
    /// `validator_pubkey`.
    pub fn produce_selection_proof(
        &self,
        validator_pubkey: &PublicKeyBytes,
        slot: Slot,
    ) -> Option<SelectionProof> {
        let validators = self.validators.read();
        let voting_keypair = &validators.voting_keypair(validator_pubkey)?;

        metrics::inc_counter_vec(&metrics::SIGNED_SELECTION_PROOFS_TOTAL, &[metrics::SUCCESS]);

        Some(SelectionProof::new::<E>(
            slot,
            &voting_keypair.sk,
            &self.fork(),
            self.genesis_validators_root,
            &self.spec,
        ))
    }

    /// Prune the slashing protection database so that it remains performant.
    ///
    /// This function will only do actual pruning periodically, so it should usually be cheap to
    /// call. The `first_run` flag can be used to print a more verbose message when pruning runs.
    pub fn prune_slashing_protection_db(&self, current_epoch: Epoch, first_run: bool) {
        // Attempt to prune every `SLASHING_PROTECTION_HISTORY_EPOCHS` epochs, with a tolerance
        // for missing the epoch that aligns exactly.
        let mut last_prune = self.slashing_protection_last_prune.lock();
        if current_epoch / SLASHING_PROTECTION_HISTORY_EPOCHS
            <= *last_prune / SLASHING_PROTECTION_HISTORY_EPOCHS
        {
            return;
        }

        if first_run {
            info!(
                self.log,
                "Pruning slashing protection DB";
                "epoch" => current_epoch,
                "msg" => "pruning may take several minutes the first time it runs"
            );
        } else {
            info!(self.log, "Pruning slashing protection DB"; "epoch" => current_epoch);
        }

        let _timer = metrics::start_timer(&metrics::SLASHING_PROTECTION_PRUNE_TIMES);

        let new_min_target_epoch = current_epoch.saturating_sub(SLASHING_PROTECTION_HISTORY_EPOCHS);
        let new_min_slot = new_min_target_epoch.start_slot(E::slots_per_epoch());

        let validators = self.validators.read();
        if let Err(e) = self
            .slashing_protection
            .prune_all_signed_attestations(validators.iter_voting_pubkeys(), new_min_target_epoch)
        {
            error!(
                self.log,
                "Error during pruning of signed attestations";
                "error" => ?e,
            );
            return;
        }

        if let Err(e) = self
            .slashing_protection
            .prune_all_signed_blocks(validators.iter_voting_pubkeys(), new_min_slot)
        {
            error!(
                self.log,
                "Error during pruning of signed blocks";
                "error" => ?e,
            );
            return;
        }

        *last_prune = current_epoch;

        info!(self.log, "Completed pruning of slashing protection DB");
    }
}