Super Silky Smooth Syncs, like a Sir (#1628)

## Issue Addressed
In principle, closes #1551, but in general these are improvements for performance, maintainability and readability. The logic for the optimistic sync is actually simple

## Proposed Changes
There are miscellaneous things here:
- Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic
- Make batches a state machine. This ensures batch state transitions respect our logic (previously this was enforced by moving batches between `Vec`s) and eases the cognitive load of the `SyncingChain` struct
- Move most batch-related logic to the batch
- Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches)
- Add the `must_use` decoration to `ProcessingResult` so that chains that request to be removed are handled accordingly (sketched after this list). This also means that chains are now removed in more places than before, to account for previously unhandled cases
- Store batches in a sorted map (`BTreeMap`). Access is not O(1), but since the number of _active_ batches is bounded this should be fast, and it saves hashing operations. Batches are indexed by the epoch they start at, and kept sorted to easily handle chain advancement (range logic); see the sketch after this list
- Produce the chain Id from its identifying fields: target root and target slot. This guarantees there can't be duplicate chains, and lets us consistently search for chains by either Id or checkpoint (sketched after this list)
- Fix chain_id not being present in all chain loggers
- Handle the mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would previously lose the blocks, remain in a "syncing" state, and wait for a result that would never arrive, effectively stalling sync.
- When a batch imports blocks, or the chain starts syncing with a local finalized epoch greater than the chain's start epoch, the chain is advanced instead of reset. This avoids losing download progress and validates batches faster. It also means that the old `start_epoch` now means "current first unvalidated batch", so it represents the progress of the chain more accurately.
- Status peers of the same chain in batches, to reduce Arc accesses.
- A couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically, if we forget to do it now, we will know.
- Do not send the blocks back from the processor to the batch. Instead, register the attempt before sending the blocks (this does not count as a failed attempt)
- When re-requesting a batch, try to avoid not only the last failed peer, but all previously failed peers.
- Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code)
- In chain_collection, store chains by their id in a map
- Include a mapping from request_ids to the (chain, batch) that requested the batch, to avoid the double O(n) search on block responses
- Other stuff:
  - impl `slog::KV` for batches
  - impl `slog::KV` for syncing chains
  - PSA: when logging, we can use `%thing` if `thing` implements `Display`; likewise `?thing` for `Debug`. See the example after this list.
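
For reference, a minimal sketch of the `must_use` decoration described above, assuming a `KeepChain` counterpart to the `RemoveChain` variant that appears in the diff (the real enum lives in `chain.rs`):

```rust
/// Ignoring this result is now a compiler warning, so every caller must
/// decide what to do with a chain that asks to be removed.
#[must_use = "Syncing chains that request removal must be removed"]
pub enum ProcessingResult {
    KeepChain,
    RemoveChain,
}
```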
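
The sorted batch storage and the chain advancement it enables, as a hedged sketch (`Epoch` is simplified to `u64` and `BatchInfo` to a unit struct; `advance_chain` is illustrative, not the exact implementation):

```rust
use std::collections::BTreeMap;

struct BatchInfo; // stand-in for the real `BatchInfo<T: EthSpec>`

struct SyncingChain {
    /// Batches indexed by the epoch they start at. Sorted, so advancing
    /// the chain is a cheap range operation instead of a scan.
    batches: BTreeMap<u64, BatchInfo>,
}

impl SyncingChain {
    /// Consider everything before `validating_epoch` validated: keep only
    /// the batches at or after it, preserving their download progress.
    fn advance_chain(&mut self, validating_epoch: u64) {
        // `split_off` returns the entries with keys >= `validating_epoch`;
        // the older (now validated) batches are dropped.
        self.batches = self.batches.split_off(&validating_epoch);
    }
}
```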
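
Deriving the chain Id from the target checkpoint can be as simple as hashing the two identifying fields. A sketch with stand-in types (`[u8; 32]` for `Hash256`, `u64` for `Slot`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Two peers pointing at the same (root, slot) checkpoint always produce
/// the same id, so duplicate chains cannot be created.
fn chain_id(target_root: &[u8; 32], target_slot: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    target_root.hash(&mut hasher);
    target_slot.hash(&mut hasher);
    hasher.finish()
}
```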
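
And a self-contained example of the logging shorthand (using a discard drain, just to show the syntax):

```rust
use slog::{debug, o, Logger};

fn main() {
    let log = Logger::root(slog::Discard, o!());
    let peer_id = "16Uiu2HAm..."; // any type works; only the traits matter
    // `%peer_id` formats with `Display`, `?peer_id` with `Debug`, avoiding
    // the `format!(...)` allocations previously used in log statements.
    debug!(log, "Peer status"; "peer" => %peer_id, "peer_debug" => ?peer_id);
}
```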

### Optimistic syncing:
First try the batch that contains the current head. If the batch imports any block, advance the chain. If not, and this optimistic batch is inside the current processing window, leave it there for future use; otherwise drop it. The failure tolerance for this batch is the same as for downloading, but only one attempt is allowed for processing.
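
A hedged sketch of that decision (all names are illustrative; the real logic lives in the `chain.rs` diff, which is suppressed below for size):

```rust
use std::collections::BTreeMap;

const PROCESSING_WINDOW: u64 = 2; // assumed look-ahead, in batches

struct Chain {
    /// Downloaded batches indexed by their starting epoch; `Vec<u8>` stands
    /// in for the downloaded blocks.
    batches: BTreeMap<u64, Vec<u8>>,
    /// The next batch the sequential flow will process.
    processing_target: u64,
}

impl Chain {
    fn on_optimistic_batch_processed(&mut self, batch_epoch: u64, imported_blocks: bool) {
        if imported_blocks {
            // The head guess paid off: advance the chain past this batch
            // instead of processing everything before it.
            self.processing_target = batch_epoch + 1;
            self.batches = self.batches.split_off(&(batch_epoch + 1));
        } else if batch_epoch < self.processing_target + PROCESSING_WINDOW {
            // Inside the processing window: keep the blocks, the sequential
            // flow will reach and use them.
        } else {
            // Outside the window: drop the batch, it only gets the one
            // processing attempt.
            self.batches.remove(&batch_epoch);
        }
    }
}
```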



Co-authored-by: Age Manning <Age@AgeManning.com>
divma 2020-09-23 06:29:55 +00:00
parent 80e52a0263
commit b8013b7b2c
13 changed files with 1480 additions and 1199 deletions


@ -35,7 +35,6 @@ tokio-util = { version = "0.3.1", features = ["codec", "compat"] }
discv5 = { version = "0.1.0-alpha.10", features = ["libp2p"] }
tiny-keccak = "2.0.2"
environment = { path = "../../lighthouse/environment" }
# TODO: Remove rand crate for mainnet
rand = "0.7.3"
regex = "1.3.9"


@ -32,6 +32,7 @@ tokio = { version = "0.2.21", features = ["full"] }
parking_lot = "0.11.0"
smallvec = "1.4.1"
# TODO: Remove rand crate for mainnet
# NOTE: why?
rand = "0.7.3"
fnv = "1.0.6"
rlp = "0.4.5"


@ -28,39 +28,26 @@ pub fn handle_chain_segment<T: BeaconChainTypes>(
match process_id {
// this a request from the range sync
ProcessId::RangeBatchId(chain_id, epoch) => {
let len = downloaded_blocks.len();
let start_slot = if len > 0 {
downloaded_blocks[0].message.slot.as_u64()
} else {
0
};
let end_slot = if len > 0 {
downloaded_blocks[len - 1].message.slot.as_u64()
} else {
0
};
let start_slot = downloaded_blocks.first().map(|b| b.message.slot.as_u64());
let end_slot = downloaded_blocks.last().map(|b| b.message.slot.as_u64());
let sent_blocks = downloaded_blocks.len();
debug!(log, "Processing batch"; "batch_epoch" => epoch, "blocks" => downloaded_blocks.len(), "first_block_slot" => start_slot, "last_block_slot" => end_slot, "service" => "sync");
let result = match process_blocks(chain, downloaded_blocks.iter(), &log) {
(_, Ok(_)) => {
debug!(log, "Batch processed"; "batch_epoch" => epoch , "first_block_slot" => start_slot, "last_block_slot" => end_slot, "service"=> "sync");
BatchProcessResult::Success
debug!(log, "Batch processed"; "batch_epoch" => epoch, "first_block_slot" => start_slot,
"last_block_slot" => end_slot, "processed_blocks" => sent_blocks, "service"=> "sync");
BatchProcessResult::Success(sent_blocks > 0)
}
(imported_blocks, Err(e)) if imported_blocks > 0 => {
debug!(log, "Batch processing failed but imported some blocks";
"batch_epoch" => epoch, "error" => e, "imported_blocks"=> imported_blocks, "service" => "sync");
BatchProcessResult::Partial
}
(_, Err(e)) => {
debug!(log, "Batch processing failed"; "batch_epoch" => epoch, "error" => e, "service" => "sync");
BatchProcessResult::Failed
(imported_blocks, Err(e)) => {
debug!(log, "Batch processing failed"; "batch_epoch" => epoch, "first_block_slot" => start_slot,
"last_block_slot" => end_slot, "error" => e, "imported_blocks" => imported_blocks, "service" => "sync");
BatchProcessResult::Failed(imported_blocks > 0)
}
};
let msg = SyncMessage::BatchProcessed {
chain_id,
epoch,
downloaded_blocks,
result,
};
sync_send.send(msg).unwrap_or_else(|_| {
@ -70,7 +57,7 @@ pub fn handle_chain_segment<T: BeaconChainTypes>(
);
});
}
// this a parent lookup request from the sync manager
// this is a parent lookup request from the sync manager
ProcessId::ParentLookup(peer_id, chain_head) => {
debug!(
log, "Processing parent lookup";
@ -81,7 +68,7 @@ pub fn handle_chain_segment<T: BeaconChainTypes>(
// reverse
match process_blocks(chain, downloaded_blocks.iter().rev(), &log) {
(_, Err(e)) => {
debug!(log, "Parent lookup failed"; "last_peer_id" => format!("{}", peer_id), "error" => e);
debug!(log, "Parent lookup failed"; "last_peer_id" => %peer_id, "error" => e);
sync_send
.send(SyncMessage::ParentLookupFailed{peer_id, chain_head})
.unwrap_or_else(|_| {
@ -114,13 +101,7 @@ fn process_blocks<
match chain.process_chain_segment(blocks) {
ChainSegmentResult::Successful { imported_blocks } => {
metrics::inc_counter(&metrics::BEACON_PROCESSOR_CHAIN_SEGMENT_SUCCESS_TOTAL);
if imported_blocks == 0 {
debug!(log, "All blocks already known");
} else {
debug!(
log, "Imported blocks from network";
"count" => imported_blocks,
);
if imported_blocks > 0 {
// Batch completed successfully with at least one block, run fork choice.
run_fork_choice(chain, log);
}
@ -153,7 +134,7 @@ fn run_fork_choice<T: BeaconChainTypes>(chain: Arc<BeaconChain<T>>, log: &slog::
Err(e) => error!(
log,
"Fork choice failed";
"error" => format!("{:?}", e),
"error" => ?e,
"location" => "batch import error"
),
}
@ -219,7 +200,7 @@ fn handle_failed_chain_segment<T: EthSpec>(
warn!(
log, "BlockProcessingFailure";
"msg" => "unexpected condition in processing block.",
"outcome" => format!("{:?}", e)
"outcome" => ?e,
);
Err(format!("Internal error whilst processing block: {:?}", e))
@ -228,7 +209,7 @@ fn handle_failed_chain_segment<T: EthSpec>(
debug!(
log, "Invalid block received";
"msg" => "peer sent invalid block",
"outcome" => format!("{:?}", other),
"outcome" => %other,
);
Err(format!("Peer sent invalid block. Reason: {:?}", other))


@ -535,9 +535,10 @@ impl<T: BeaconChainTypes> Worker<T> {
///
/// Creates a log if there is an internal error.
fn send_sync_message(&self, message: SyncMessage<T::EthSpec>) {
self.sync_tx
.send(message)
.unwrap_or_else(|_| error!(self.log, "Could not send message to the sync service"));
self.sync_tx.send(message).unwrap_or_else(|e| {
error!(self.log, "Could not send message to the sync service";
"error" => %e)
});
}
/// Handle an error whilst verifying an `Attestation` or `SignedAggregateAndProof` from the


@ -82,10 +82,11 @@ impl<T: BeaconChainTypes> Processor<T> {
}
fn send_to_sync(&mut self, message: SyncMessage<T::EthSpec>) {
self.sync_send.send(message).unwrap_or_else(|_| {
self.sync_send.send(message).unwrap_or_else(|e| {
warn!(
self.log,
"Could not send message to the sync service";
"error" => %e,
)
});
}
@ -691,9 +692,10 @@ impl<T: EthSpec> HandlerNetworkContext<T> {
/// Sends a message to the network task.
fn inform_network(&mut self, msg: NetworkMessage<T>) {
let msg_r = &format!("{:?}", msg);
self.network_send
.send(msg)
.unwrap_or_else(|_| warn!(self.log, "Could not send message to the network service"))
.unwrap_or_else(|e| warn!(self.log, "Could not send message to the network service"; "error" => %e, "message" => msg_r))
}
/// Disconnects and ban's a peer, sending a Goodbye request with the associated reason.


@ -29,9 +29,9 @@
//!
//! Block Lookup
//!
//! To keep the logic maintained to the syncing thread (and manage the request_ids), when a block needs to be searched for (i.e
//! if an attestation references an unknown block) this manager can search for the block and
//! subsequently search for parents if needed.
//! To keep the logic maintained to the syncing thread (and manage the request_ids), when a block
//! needs to be searched for (i.e if an attestation references an unknown block) this manager can
//! search for the block and subsequently search for parents if needed.
use super::network_context::SyncNetworkContext;
use super::peer_sync_info::{PeerSyncInfo, PeerSyncType};
@ -106,7 +106,6 @@ pub enum SyncMessage<T: EthSpec> {
BatchProcessed {
chain_id: ChainId,
epoch: Epoch,
downloaded_blocks: Vec<SignedBeaconBlock<T>>,
result: BatchProcessResult,
},
@ -123,12 +122,10 @@ pub enum SyncMessage<T: EthSpec> {
// TODO: When correct batch error handling occurs, we will include an error type.
#[derive(Debug)]
pub enum BatchProcessResult {
/// The batch was completed successfully.
Success,
/// The batch processing failed.
Failed,
/// The batch processing failed but managed to import at least one block.
Partial,
/// The batch was completed successfully. It carries whether the sent batch contained blocks.
Success(bool),
/// The batch processing failed. It carries whether the processing imported any block.
Failed(bool),
}
/// Maintains a sequential list of parents to lookup and the lookup's current state.
@ -275,9 +272,9 @@ impl<T: BeaconChainTypes> SyncManager<T> {
match local_peer_info.peer_sync_type(&remote) {
PeerSyncType::FullySynced => {
trace!(self.log, "Peer synced to our head found";
"peer" => format!("{:?}", peer_id),
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
"peer" => %peer_id,
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
);
self.synced_peer(&peer_id, remote);
// notify the range sync that a peer has been added
@ -285,11 +282,11 @@ impl<T: BeaconChainTypes> SyncManager<T> {
}
PeerSyncType::Advanced => {
trace!(self.log, "Useful peer for sync found";
"peer" => format!("{:?}", peer_id),
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
"peer_finalized_epoch" => remote.finalized_epoch,
"local_finalized_epoch" => local_peer_info.finalized_epoch,
"peer" => %peer_id,
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
"peer_finalized_epoch" => remote.finalized_epoch,
"local_finalized_epoch" => local_peer_info.finalized_epoch,
);
// There are few cases to handle here:
@ -908,14 +905,12 @@ impl<T: BeaconChainTypes> SyncManager<T> {
SyncMessage::BatchProcessed {
chain_id,
epoch,
downloaded_blocks,
result,
} => {
self.range_sync.handle_block_process_result(
&mut self.network,
chain_id,
epoch,
downloaded_blocks,
result,
);
}


@ -1,11 +1,14 @@
//! Provides network functionality for the Syncing thread. This fundamentally wraps a network
//! channel and stores a global RPC ID to perform requests.
use super::range_sync::{BatchId, ChainId};
use super::RequestId as SyncRequestId;
use crate::router::processor::status_message;
use crate::service::NetworkMessage;
use beacon_chain::{BeaconChain, BeaconChainTypes};
use eth2_libp2p::rpc::{BlocksByRangeRequest, BlocksByRootRequest, GoodbyeReason, RequestId};
use eth2_libp2p::{Client, NetworkGlobals, PeerAction, PeerId, Request};
use fnv::FnvHashMap;
use slog::{debug, trace, warn};
use std::sync::Arc;
use tokio::sync::mpsc;
@ -21,7 +24,11 @@ pub struct SyncNetworkContext<T: EthSpec> {
network_globals: Arc<NetworkGlobals<T>>,
/// A sequential ID for all RPC requests.
request_id: usize,
request_id: SyncRequestId,
/// BlocksByRange requests made by range syncing chains.
range_requests: FnvHashMap<SyncRequestId, (ChainId, BatchId)>,
/// Logger for the `SyncNetworkContext`.
log: slog::Logger,
}
@ -36,6 +43,7 @@ impl<T: EthSpec> SyncNetworkContext<T> {
network_send,
network_globals,
request_id: 1,
range_requests: FnvHashMap::default(),
log,
}
}
@ -50,24 +58,26 @@ impl<T: EthSpec> SyncNetworkContext<T> {
.unwrap_or_default()
}
pub fn status_peer<U: BeaconChainTypes>(
pub fn status_peers<U: BeaconChainTypes>(
&mut self,
chain: Arc<BeaconChain<U>>,
peer_id: PeerId,
peers: impl Iterator<Item = PeerId>,
) {
if let Some(status_message) = status_message(&chain) {
debug!(
self.log,
"Sending Status Request";
"peer" => format!("{:?}", peer_id),
"fork_digest" => format!("{:?}", status_message.fork_digest),
"finalized_root" => format!("{:?}", status_message.finalized_root),
"finalized_epoch" => format!("{:?}", status_message.finalized_epoch),
"head_root" => format!("{}", status_message.head_root),
"head_slot" => format!("{}", status_message.head_slot),
);
for peer_id in peers {
debug!(
self.log,
"Sending Status Request";
"peer" => %peer_id,
"fork_digest" => ?status_message.fork_digest,
"finalized_root" => ?status_message.finalized_root,
"finalized_epoch" => ?status_message.finalized_epoch,
"head_root" => %status_message.head_root,
"head_slot" => %status_message.head_slot,
);
let _ = self.send_rpc_request(peer_id, Request::Status(status_message));
let _ = self.send_rpc_request(peer_id, Request::Status(status_message.clone()));
}
}
}
@ -75,15 +85,34 @@ impl<T: EthSpec> SyncNetworkContext<T> {
&mut self,
peer_id: PeerId,
request: BlocksByRangeRequest,
) -> Result<usize, &'static str> {
chain_id: ChainId,
batch_id: BatchId,
) -> Result<(), &'static str> {
trace!(
self.log,
"Sending BlocksByRange Request";
"method" => "BlocksByRange",
"count" => request.count,
"peer" => format!("{:?}", peer_id)
"peer" => %peer_id,
);
self.send_rpc_request(peer_id, Request::BlocksByRange(request))
let req_id = self.send_rpc_request(peer_id, Request::BlocksByRange(request))?;
self.range_requests.insert(req_id, (chain_id, batch_id));
Ok(())
}
pub fn blocks_by_range_response(
&mut self,
request_id: usize,
remove: bool,
) -> Option<(ChainId, BatchId)> {
// NOTE: we can't guarantee that the request must be registered as it could receive more
// than an error, and be removed after receiving the first one.
// FIXME: https://github.com/sigp/lighthouse/issues/1634
if remove {
self.range_requests.remove(&request_id)
} else {
self.range_requests.get(&request_id).cloned()
}
}
pub fn blocks_by_root_request(


@ -1,35 +1,274 @@
use super::chain::EPOCHS_PER_BATCH;
use eth2_libp2p::rpc::methods::*;
use eth2_libp2p::rpc::methods::BlocksByRangeRequest;
use eth2_libp2p::PeerId;
use fnv::FnvHashMap;
use ssz::Encode;
use std::cmp::min;
use std::cmp::Ordering;
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::ops::Sub;
use types::{Epoch, EthSpec, SignedBeaconBlock, Slot};
/// A collection of sequential blocks that are requested from peers in a single RPC request.
#[derive(PartialEq, Debug)]
pub struct Batch<T: EthSpec> {
/// The requested start epoch of the batch.
pub start_epoch: Epoch,
/// The requested end slot of batch, exclusive.
pub end_slot: Slot,
/// The `Attempts` that have been made to send us this batch.
pub attempts: Vec<Attempt>,
/// The peer that is currently assigned to the batch.
pub current_peer: PeerId,
/// The number of retries this batch has undergone due to a failed request.
/// This occurs when peers do not respond or we get an RPC error.
pub retries: u8,
/// The number of times this batch has attempted to be re-downloaded and re-processed. This
/// occurs when a batch has been received but cannot be processed.
pub reprocess_retries: u8,
/// The blocks that have been downloaded.
pub downloaded_blocks: Vec<SignedBeaconBlock<T>>,
/// The number of times to retry a batch before it is considered failed.
const MAX_BATCH_DOWNLOAD_ATTEMPTS: u8 = 5;
/// Invalid batches are attempted to be re-downloaded from other peers. If a batch cannot be processed
/// after `MAX_BATCH_PROCESSING_ATTEMPTS` times, it is considered faulty.
const MAX_BATCH_PROCESSING_ATTEMPTS: u8 = 3;
/// A segment of a chain.
pub struct BatchInfo<T: EthSpec> {
/// Start slot of the batch.
start_slot: Slot,
/// End slot of the batch.
end_slot: Slot,
/// The `Attempts` that have been made and failed to send us this batch.
failed_processing_attempts: Vec<Attempt>,
/// The number of download retries this batch has undergone due to a failed request.
failed_download_attempts: Vec<PeerId>,
/// State of the batch.
state: BatchState<T>,
}
/// Current state of a batch
pub enum BatchState<T: EthSpec> {
/// The batch has failed either downloading or processing, but can be requested again.
AwaitingDownload,
/// The batch is being downloaded.
Downloading(PeerId, Vec<SignedBeaconBlock<T>>),
/// The batch has been completely downloaded and is ready for processing.
AwaitingProcessing(PeerId, Vec<SignedBeaconBlock<T>>),
/// The batch is being processed.
Processing(Attempt),
/// The batch was successfully processed and is waiting to be validated.
///
/// It is not sufficient to process a batch successfully to consider it correct. This is
/// because batches could be erroneously empty, or incomplete. Therefore, a batch is considered
/// valid, only if the next sequential batch imports at least a block.
AwaitingValidation(Attempt),
/// Intermediate state for inner state handling.
Poisoned,
/// The batch has maxed out the allowed attempts for either downloading or processing. It
/// cannot be recovered.
Failed,
}
impl<T: EthSpec> BatchState<T> {
/// Helper function for poisoning a state.
pub fn poison(&mut self) -> BatchState<T> {
std::mem::replace(self, BatchState::Poisoned)
}
}
impl<T: EthSpec> BatchInfo<T> {
/// Batches are downloaded excluding the first block of the epoch assuming it has already been
/// downloaded.
///
/// For example:
///
/// Epoch boundary | |
/// ... | 30 | 31 | 32 | 33 | 34 | ... | 61 | 62 | 63 | 64 | 65 |
/// Batch 1 | Batch 2 | Batch 3
pub fn new(start_epoch: &Epoch, num_of_epochs: u64) -> Self {
let start_slot = start_epoch.start_slot(T::slots_per_epoch()) + 1;
let end_slot = start_slot + num_of_epochs * T::slots_per_epoch();
BatchInfo {
start_slot,
end_slot,
failed_processing_attempts: Vec::new(),
failed_download_attempts: Vec::new(),
state: BatchState::AwaitingDownload,
}
}
/// Gives a list of peers from which this batch has had a failed download or processing
/// attempt.
pub fn failed_peers(&self) -> HashSet<PeerId> {
let mut peers = HashSet::with_capacity(
self.failed_processing_attempts.len() + self.failed_download_attempts.len(),
);
for attempt in &self.failed_processing_attempts {
peers.insert(attempt.peer_id.clone());
}
for download in &self.failed_download_attempts {
peers.insert(download.clone());
}
peers
}
pub fn current_peer(&self) -> Option<&PeerId> {
match &self.state {
BatchState::AwaitingDownload | BatchState::Failed => None,
BatchState::Downloading(peer_id, _)
| BatchState::AwaitingProcessing(peer_id, _)
| BatchState::Processing(Attempt { peer_id, .. })
| BatchState::AwaitingValidation(Attempt { peer_id, .. }) => Some(&peer_id),
BatchState::Poisoned => unreachable!("Poisoned batch"),
}
}
pub fn to_blocks_by_range_request(&self) -> BlocksByRangeRequest {
BlocksByRangeRequest {
start_slot: self.start_slot.into(),
count: self.end_slot.sub(self.start_slot).into(),
step: 1,
}
}
pub fn state(&self) -> &BatchState<T> {
&self.state
}
pub fn attempts(&self) -> &[Attempt] {
&self.failed_processing_attempts
}
/// Adds a block to a downloading batch.
pub fn add_block(&mut self, block: SignedBeaconBlock<T>) {
match self.state.poison() {
BatchState::Downloading(peer, mut blocks) => {
blocks.push(block);
self.state = BatchState::Downloading(peer, blocks)
}
other => unreachable!("Add block for batch in wrong state: {:?}", other),
}
}
/// Marks the batch as ready to be processed if the blocks are in the range. The number of
/// received blocks is returned, or the wrong batch end on failure
#[must_use = "Batch may have failed"]
pub fn download_completed(
&mut self,
) -> Result<
usize, /* Received blocks */
(
Slot, /* expected slot */
Slot, /* received slot */
&BatchState<T>,
),
> {
match self.state.poison() {
BatchState::Downloading(peer, blocks) => {
// verify that blocks are in range
if let Some(last_slot) = blocks.last().map(|b| b.slot()) {
// the batch is non-empty
let first_slot = blocks[0].slot();
let failed_range = if first_slot < self.start_slot {
Some((self.start_slot, first_slot))
} else if self.end_slot < last_slot {
Some((self.end_slot, last_slot))
} else {
None
};
if let Some(range) = failed_range {
// this is a failed download, register the attempt and check if the batch
// can be tried again
self.failed_download_attempts.push(peer);
self.state = if self.failed_download_attempts.len()
>= MAX_BATCH_DOWNLOAD_ATTEMPTS as usize
{
BatchState::Failed
} else {
// drop the blocks
BatchState::AwaitingDownload
};
return Err((range.0, range.1, &self.state));
}
}
let received = blocks.len();
self.state = BatchState::AwaitingProcessing(peer, blocks);
Ok(received)
}
other => unreachable!("Download completed for batch in wrong state: {:?}", other),
}
}
#[must_use = "Batch may have failed"]
pub fn download_failed(&mut self) -> &BatchState<T> {
match self.state.poison() {
BatchState::Downloading(peer, _) => {
// register the attempt and check if the batch can be tried again
self.failed_download_attempts.push(peer);
self.state = if self.failed_download_attempts.len()
>= MAX_BATCH_DOWNLOAD_ATTEMPTS as usize
{
BatchState::Failed
} else {
// drop the blocks
BatchState::AwaitingDownload
};
&self.state
}
other => unreachable!("Download failed for batch in wrong state: {:?}", other),
}
}
pub fn start_downloading_from_peer(&mut self, peer: PeerId) {
match self.state.poison() {
BatchState::AwaitingDownload => {
self.state = BatchState::Downloading(peer, Vec::new());
}
other => unreachable!("Starting download for batch in wrong state: {:?}", other),
}
}
pub fn start_processing(&mut self) -> Vec<SignedBeaconBlock<T>> {
match self.state.poison() {
BatchState::AwaitingProcessing(peer, blocks) => {
self.state = BatchState::Processing(Attempt::new(peer, &blocks));
blocks
}
other => unreachable!("Start processing for batch in wrong state: {:?}", other),
}
}
#[must_use = "Batch may have failed"]
pub fn processing_completed(&mut self, was_sucessful: bool) -> &BatchState<T> {
match self.state.poison() {
BatchState::Processing(attempt) => {
self.state = if !was_sucessful {
// register the failed attempt
self.failed_processing_attempts.push(attempt);
// check if the batch can be downloaded again
if self.failed_processing_attempts.len()
>= MAX_BATCH_PROCESSING_ATTEMPTS as usize
{
BatchState::Failed
} else {
BatchState::AwaitingDownload
}
} else {
BatchState::AwaitingValidation(attempt)
};
&self.state
}
other => unreachable!("Processing completed for batch in wrong state: {:?}", other),
}
}
#[must_use = "Batch may have failed"]
pub fn validation_failed(&mut self) -> &BatchState<T> {
match self.state.poison() {
BatchState::AwaitingValidation(attempt) => {
self.failed_processing_attempts.push(attempt);
// check if the batch can be downloaded again
self.state = if self.failed_processing_attempts.len()
>= MAX_BATCH_PROCESSING_ATTEMPTS as usize
{
BatchState::Failed
} else {
BatchState::AwaitingDownload
};
&self.state
}
other => unreachable!("Validation failed for batch in wrong state: {:?}", other),
}
}
}
/// Represents a peer's attempt and providing the result for this batch.
@ -43,131 +282,61 @@ pub struct Attempt {
pub hash: u64,
}
impl<T: EthSpec> Eq for Batch<T> {}
impl<T: EthSpec> Batch<T> {
pub fn new(start_epoch: Epoch, end_slot: Slot, peer_id: PeerId) -> Self {
Batch {
start_epoch,
end_slot,
attempts: Vec::new(),
current_peer: peer_id,
retries: 0,
reprocess_retries: 0,
downloaded_blocks: Vec::new(),
}
}
pub fn start_slot(&self) -> Slot {
// batches are shifted by 1
self.start_epoch.start_slot(T::slots_per_epoch()) + 1
}
pub fn end_slot(&self) -> Slot {
self.end_slot
}
pub fn to_blocks_by_range_request(&self) -> BlocksByRangeRequest {
let start_slot = self.start_slot();
BlocksByRangeRequest {
start_slot: start_slot.into(),
count: min(
T::slots_per_epoch() * EPOCHS_PER_BATCH,
self.end_slot.sub(start_slot).into(),
),
step: 1,
}
}
/// This gets a hash that represents the blocks currently downloaded. This allows comparing a
/// previously downloaded batch of blocks with a new downloaded batch of blocks.
pub fn hash(&self) -> u64 {
// the hash used is the ssz-encoded list of blocks
impl Attempt {
#[allow(clippy::ptr_arg)]
fn new<T: EthSpec>(peer_id: PeerId, blocks: &Vec<SignedBeaconBlock<T>>) -> Self {
let mut hasher = std::collections::hash_map::DefaultHasher::new();
self.downloaded_blocks.as_ssz_bytes().hash(&mut hasher);
hasher.finish()
blocks.as_ssz_bytes().hash(&mut hasher);
let hash = hasher.finish();
Attempt { peer_id, hash }
}
}
impl<T: EthSpec> Ord for Batch<T> {
fn cmp(&self, other: &Self) -> Ordering {
self.start_epoch.cmp(&other.start_epoch)
impl<T: EthSpec> slog::KV for &mut BatchInfo<T> {
fn serialize(
&self,
record: &slog::Record,
serializer: &mut dyn slog::Serializer,
) -> slog::Result {
slog::KV::serialize(*self, record, serializer)
}
}
impl<T: EthSpec> PartialOrd for Batch<T> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
impl<T: EthSpec> slog::KV for BatchInfo<T> {
fn serialize(
&self,
record: &slog::Record,
serializer: &mut dyn slog::Serializer,
) -> slog::Result {
use slog::Value;
Value::serialize(&self.start_slot, record, "start_slot", serializer)?;
Value::serialize(
&(self.end_slot - 1), // NOTE: The -1 shows inclusive blocks
record,
"end_slot",
serializer,
)?;
serializer.emit_usize("downloaded", self.failed_download_attempts.len())?;
serializer.emit_usize("processed", self.failed_processing_attempts.len())?;
serializer.emit_str("state", &format!("{:?}", self.state))?;
slog::Result::Ok(())
}
}
/// A structure that contains a mapping of pending batch requests, that also keeps track of which
/// peers are currently making batch requests.
///
/// This is used to optimise searches for idle peers (peers that have no outbound batch requests).
pub struct PendingBatches<T: EthSpec> {
/// The current pending batches.
batches: FnvHashMap<usize, Batch<T>>,
/// A mapping of peers to the number of pending requests.
peer_requests: HashMap<PeerId, HashSet<usize>>,
}
impl<T: EthSpec> PendingBatches<T> {
pub fn new() -> Self {
PendingBatches {
batches: FnvHashMap::default(),
peer_requests: HashMap::new(),
}
}
pub fn insert(&mut self, request_id: usize, batch: Batch<T>) -> Option<Batch<T>> {
let peer_request = batch.current_peer.clone();
self.peer_requests
.entry(peer_request)
.or_insert_with(HashSet::new)
.insert(request_id);
self.batches.insert(request_id, batch)
}
pub fn remove(&mut self, request_id: usize) -> Option<Batch<T>> {
if let Some(batch) = self.batches.remove(&request_id) {
if let Entry::Occupied(mut entry) = self.peer_requests.entry(batch.current_peer.clone())
{
entry.get_mut().remove(&request_id);
if entry.get().is_empty() {
entry.remove();
}
impl<T: EthSpec> std::fmt::Debug for BatchState<T> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
BatchState::Processing(_) => f.write_str("Processing"),
BatchState::AwaitingValidation(_) => f.write_str("AwaitingValidation"),
BatchState::AwaitingDownload => f.write_str("AwaitingDownload"),
BatchState::Failed => f.write_str("Failed"),
BatchState::AwaitingProcessing(ref peer, ref blocks) => {
write!(f, "AwaitingProcessing({}, {} blocks)", peer, blocks.len())
}
Some(batch)
} else {
None
BatchState::Downloading(peer, blocks) => {
write!(f, "Downloading({}, {} blocks)", peer, blocks.len())
}
BatchState::Poisoned => f.write_str("Poisoned"),
}
}
/// The number of current pending batch requests.
pub fn len(&self) -> usize {
self.batches.len()
}
/// Adds a block to the batches if the request id exists. Returns None if there is no batch
/// matching the request id.
pub fn add_block(&mut self, request_id: usize, block: SignedBeaconBlock<T>) -> Option<()> {
let batch = self.batches.get_mut(&request_id)?;
batch.downloaded_blocks.push(block);
Some(())
}
/// Returns true if there the peer does not exist in the peer_requests mapping. Indicating it
/// has no pending outgoing requests.
pub fn peer_is_idle(&self, peer_id: &PeerId) -> bool {
self.peer_requests.get(peer_id).is_none()
}
/// Removes a batch for a given peer.
pub fn remove_batch_by_peer(&mut self, peer_id: &PeerId) -> Option<Batch<T>> {
let request_ids = self.peer_requests.get(peer_id)?;
let request_id = *request_ids.iter().next()?;
self.remove(request_id)
}
}

File diff suppressed because it is too large.


@ -1,15 +1,18 @@
//! This provides the logic for the finalized and head chains.
//!
//! Each chain type is stored in it's own vector. A variety of helper functions are given along
//! with this struct to to simplify the logic of the other layers of sync.
//! Each chain type is stored in it's own map. A variety of helper functions are given along with
//! this struct to simplify the logic of the other layers of sync.
use super::chain::{ChainSyncingState, SyncingChain};
use super::chain::{ChainId, ChainSyncingState, ProcessingResult, SyncingChain};
use super::sync_type::RangeSyncType;
use crate::beacon_processor::WorkEvent as BeaconWorkEvent;
use crate::sync::network_context::SyncNetworkContext;
use crate::sync::PeerSyncInfo;
use beacon_chain::{BeaconChain, BeaconChainTypes};
use eth2_libp2p::{types::SyncState, NetworkGlobals, PeerId};
use slog::{debug, error, info, o};
use fnv::FnvHashMap;
use slog::{crit, debug, error, info, trace};
use std::collections::hash_map::Entry;
use std::sync::Arc;
use tokio::sync::mpsc;
use types::EthSpec;
@ -83,9 +86,9 @@ pub struct ChainCollection<T: BeaconChainTypes> {
/// A reference to the global network parameters.
network_globals: Arc<NetworkGlobals<T::EthSpec>>,
/// The set of finalized chains being synced.
finalized_chains: Vec<SyncingChain<T>>,
finalized_chains: FnvHashMap<ChainId, SyncingChain<T>>,
/// The set of head chains being synced.
head_chains: Vec<SyncingChain<T>>,
head_chains: FnvHashMap<ChainId, SyncingChain<T>>,
/// The current sync state of the process.
state: RangeSyncState,
/// Logger for the collection.
@ -101,8 +104,8 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
ChainCollection {
beacon_chain,
network_globals,
finalized_chains: Vec::new(),
head_chains: Vec::new(),
finalized_chains: FnvHashMap::default(),
head_chains: FnvHashMap::default(),
state: RangeSyncState::Idle,
log,
}
@ -129,7 +132,7 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
.unwrap_or_else(|| SyncState::Stalled);
let mut peer_state = self.network_globals.sync_state.write();
if new_state != *peer_state {
info!(self.log, "Sync state updated"; "old_state" => format!("{}",peer_state), "new_state" => format!("{}",new_state));
info!(self.log, "Sync state updated"; "old_state" => %peer_state, "new_state" => %new_state);
if new_state == SyncState::Synced {
network.subscribe_core_topics();
}
@ -141,7 +144,7 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
let new_state: SyncState = self.state.clone().into();
if *node_sync_state != new_state {
// we are updating the state, inform the user
info!(self.log, "Sync state updated"; "old_state" => format!("{}",node_sync_state), "new_state" => format!("{}",new_state));
info!(self.log, "Sync state updated"; "old_state" => %node_sync_state, "new_state" => %new_state);
}
*node_sync_state = new_state;
}
@ -182,30 +185,67 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
}
}
/// Finds any finalized chain if it exists.
pub fn get_finalized_mut(
&mut self,
target_head_root: Hash256,
target_head_slot: Slot,
) -> Option<&mut SyncingChain<T>> {
ChainCollection::get_chain(
self.finalized_chains.as_mut(),
target_head_root,
target_head_slot,
)
/// Calls `func` on every chain of the collection. If the result is
/// `ProcessingResult::RemoveChain`, the chain is removed and returned.
pub fn call_all<F>(&mut self, mut func: F) -> Vec<(SyncingChain<T>, RangeSyncType)>
where
F: FnMut(&mut SyncingChain<T>) -> ProcessingResult,
{
let mut to_remove = Vec::new();
for (id, chain) in self.finalized_chains.iter_mut() {
if let ProcessingResult::RemoveChain = func(chain) {
to_remove.push((*id, RangeSyncType::Finalized));
}
}
for (id, chain) in self.head_chains.iter_mut() {
if let ProcessingResult::RemoveChain = func(chain) {
to_remove.push((*id, RangeSyncType::Head));
}
}
let mut results = Vec::with_capacity(to_remove.len());
for (id, sync_type) in to_remove.into_iter() {
let chain = match sync_type {
RangeSyncType::Finalized => self.finalized_chains.remove(&id),
RangeSyncType::Head => self.head_chains.remove(&id),
};
results.push((chain.expect("Chain exits"), sync_type));
}
results
}
/// Finds any finalized chain if it exists.
pub fn get_head_mut(
/// Executes a function on the chain with the given id.
///
/// If the function returns `ProcessingResult::RemoveChain`, the chain is removed and returned.
/// If the chain is found, its syncing type is returned, or an error otherwise.
pub fn call_by_id<F>(
&mut self,
target_head_root: Hash256,
target_head_slot: Slot,
) -> Option<&mut SyncingChain<T>> {
ChainCollection::get_chain(
self.head_chains.as_mut(),
target_head_root,
target_head_slot,
)
id: ChainId,
func: F,
) -> Result<(Option<SyncingChain<T>>, RangeSyncType), ()>
where
F: FnOnce(&mut SyncingChain<T>) -> ProcessingResult,
{
if let Entry::Occupied(mut entry) = self.finalized_chains.entry(id) {
// Search in our finalized chains first
if let ProcessingResult::RemoveChain = func(entry.get_mut()) {
Ok((Some(entry.remove()), RangeSyncType::Finalized))
} else {
Ok((None, RangeSyncType::Finalized))
}
} else if let Entry::Occupied(mut entry) = self.head_chains.entry(id) {
// Search in our head chains next
if let ProcessingResult::RemoveChain = func(entry.get_mut()) {
Ok((Some(entry.remove()), RangeSyncType::Head))
} else {
Ok((None, RangeSyncType::Head))
}
} else {
// Chain was not found in the finalized collection, nor the head collection
Err(())
}
}
/// Updates the state of the chain collection.
@ -214,9 +254,8 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
/// updates the state of the collection. This starts head chains syncing if any are required to
/// do so.
pub fn update(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) {
let local_epoch = {
let local = match PeerSyncInfo::from_chain(&self.beacon_chain) {
Some(local) => local,
let (local_finalized_epoch, local_head_epoch) =
match PeerSyncInfo::from_chain(&self.beacon_chain) {
None => {
return error!(
self.log,
@ -224,20 +263,21 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
"msg" => "likely due to head lock contention"
)
}
Some(local) => (
local.finalized_epoch,
local.head_slot.epoch(T::EthSpec::slots_per_epoch()),
),
};
local.finalized_epoch
};
// Remove any outdated finalized/head chains
self.purge_outdated_chains(network);
// Choose the best finalized chain if one needs to be selected.
self.update_finalized_chains(network, local_epoch);
self.update_finalized_chains(network, local_finalized_epoch, local_head_epoch);
if self.finalized_syncing_index().is_none() {
if self.finalized_syncing_chain().is_none() {
// Handle head syncing chains if there are no finalized chains left.
self.update_head_chains(network, local_epoch);
self.update_head_chains(network, local_finalized_epoch, local_head_epoch);
}
}
@ -247,53 +287,57 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
&mut self,
network: &mut SyncNetworkContext<T::EthSpec>,
local_epoch: Epoch,
local_head_epoch: Epoch,
) {
// Check if any chains become the new syncing chain
if let Some(index) = self.finalized_syncing_index() {
// There is a current finalized chain syncing
let _syncing_chain_peer_count = self.finalized_chains[index].peer_pool.len();
// search for a chain with more peers
if let Some((new_index, chain)) =
self.finalized_chains
.iter_mut()
.enumerate()
.find(|(_iter_index, _chain)| {
false
// && *iter_index != index
// && chain.peer_pool.len() > syncing_chain_peer_count
})
{
// A chain has more peers. Swap the syncing chain
debug!(self.log, "Switching finalized chains to sync"; "new_target_root" => format!("{}", chain.target_head_root), "new_end_slot" => chain.target_head_slot, "new_start_epoch"=> local_epoch);
// update the state to a new finalized state
let state = RangeSyncState::Finalized {
start_slot: chain.start_epoch.start_slot(T::EthSpec::slots_per_epoch()),
head_slot: chain.target_head_slot,
head_root: chain.target_head_root,
};
self.state = state;
// Stop the current chain from syncing
self.finalized_chains[index].stop_syncing();
// Start the new chain
self.finalized_chains[new_index].start_syncing(network, local_epoch);
}
} else if let Some(chain) = self
// Find the chain with most peers and check if it is already syncing
if let Some((new_id, peers)) = self
.finalized_chains
.iter_mut()
.max_by_key(|chain| chain.peer_pool.len())
.iter()
.max_by_key(|(_, chain)| chain.available_peers())
.map(|(id, chain)| (*id, chain.available_peers()))
{
// There is no currently syncing finalization chain, starting the one with the most peers
debug!(self.log, "New finalized chain started syncing"; "new_target_root" => format!("{}", chain.target_head_root), "new_end_slot" => chain.target_head_slot, "new_start_epoch"=> chain.start_epoch);
chain.start_syncing(network, local_epoch);
let old_id = self.finalized_syncing_chain().map(
|(currently_syncing_id, currently_syncing_chain)| {
if *currently_syncing_id != new_id
&& peers > currently_syncing_chain.available_peers()
{
currently_syncing_chain.stop_syncing();
// we stop this chain and start syncing the one with more peers
Some(*currently_syncing_id)
} else {
// the best chain is already the syncing chain, advance it if possible
None
}
},
);
let chain = self
.finalized_chains
.get_mut(&new_id)
.expect("Chain exists");
match old_id {
Some(Some(old_id)) => debug!(self.log, "Switching finalized chains";
"old_id" => old_id, &chain),
None => debug!(self.log, "Syncing new chain"; &chain),
Some(None) => trace!(self.log, "Advancing currently syncing chain"),
// this is the same chain. We try to advance it.
}
// update the state to a new finalized state
let state = RangeSyncState::Finalized {
start_slot: chain.start_epoch.start_slot(T::EthSpec::slots_per_epoch()),
head_slot: chain.target_head_slot,
head_root: chain.target_head_root,
};
self.state = state;
if let ProcessingResult::RemoveChain =
chain.start_syncing(network, local_epoch, local_head_epoch)
{
// this happens only if sending a batch over the `network` fails a lot
error!(self.log, "Chain removed while switching chains");
self.finalized_chains.remove(&new_id);
}
}
}
@ -302,6 +346,7 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
&mut self,
network: &mut SyncNetworkContext<T::EthSpec>,
local_epoch: Epoch,
local_head_epoch: Epoch,
) {
// There are no finalized chains, update the state.
if self.head_chains.is_empty() {
@ -311,42 +356,41 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
let mut currently_syncing = self
.head_chains
.iter()
.values()
.filter(|chain| chain.is_syncing())
.count();
let mut not_syncing = self.head_chains.len() - currently_syncing;
// Find all head chains that are not currently syncing ordered by peer count.
while currently_syncing <= PARALLEL_HEAD_CHAINS && not_syncing > 0 {
// Find the chain with the most peers and start syncing
if let Some((_index, chain)) = self
if let Some((_id, chain)) = self
.head_chains
.iter_mut()
.filter(|chain| !chain.is_syncing())
.enumerate()
.max_by_key(|(_index, chain)| chain.peer_pool.len())
.filter(|(_id, chain)| !chain.is_syncing())
.max_by_key(|(_id, chain)| chain.available_peers())
{
// start syncing this chain
debug!(self.log, "New head chain started syncing"; "new_target_root" => format!("{}", chain.target_head_root), "new_end_slot" => chain.target_head_slot, "new_start_epoch"=> chain.start_epoch);
chain.start_syncing(network, local_epoch);
debug!(self.log, "New head chain started syncing"; &chain);
if let ProcessingResult::RemoveChain =
chain.start_syncing(network, local_epoch, local_head_epoch)
{
error!(self.log, "Chain removed while switching head chains")
}
}
// update variables
currently_syncing = self
.head_chains
.iter()
.filter(|chain| chain.is_syncing())
.filter(|(_id, chain)| chain.is_syncing())
.count();
not_syncing = self.head_chains.len() - currently_syncing;
}
// Start
// for the syncing API, we find the minimal start_slot and the maximum
// target_slot of all head chains to report back.
let (min_epoch, max_slot) = self
.head_chains
.iter()
.values()
.filter(|chain| chain.is_syncing())
.fold(
(Epoch::from(0u64), Slot::from(0u64)),
@ -368,10 +412,9 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
/// chains and re-status their peers.
pub fn clear_head_chains(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) {
let log_ref = &self.log;
self.head_chains.retain(|chain| {
if !chain.is_syncing()
{
debug!(log_ref, "Removing old head chain"; "start_epoch" => chain.start_epoch, "end_slot" => chain.target_head_slot);
self.head_chains.retain(|_id, chain| {
if !chain.is_syncing() {
debug!(log_ref, "Removing old head chain"; &chain);
chain.status_peers(network);
false
} else {
@ -380,140 +423,20 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
});
}
/// Add a new finalized chain to the collection.
pub fn new_finalized_chain(
&mut self,
local_finalized_epoch: Epoch,
target_head: Hash256,
target_slot: Slot,
peer_id: PeerId,
beacon_processor_send: mpsc::Sender<BeaconWorkEvent<T::EthSpec>>,
) {
let chain_id = rand::random();
self.finalized_chains.push(SyncingChain::new(
chain_id,
local_finalized_epoch,
target_slot,
target_head,
peer_id,
beacon_processor_send,
self.beacon_chain.clone(),
self.log.new(o!("chain" => chain_id)),
));
}
/// Add a new finalized chain to the collection and starts syncing it.
#[allow(clippy::too_many_arguments)]
pub fn new_head_chain(
&mut self,
remote_finalized_epoch: Epoch,
target_head: Hash256,
target_slot: Slot,
peer_id: PeerId,
beacon_processor_send: mpsc::Sender<BeaconWorkEvent<T::EthSpec>>,
) {
// remove the peer from any other head chains
self.head_chains.iter_mut().for_each(|chain| {
chain.peer_pool.remove(&peer_id);
});
self.head_chains.retain(|chain| !chain.peer_pool.is_empty());
let chain_id = rand::random();
let new_head_chain = SyncingChain::new(
chain_id,
remote_finalized_epoch,
target_slot,
target_head,
peer_id,
beacon_processor_send,
self.beacon_chain.clone(),
self.log.clone(),
);
self.head_chains.push(new_head_chain);
}
/// Returns if `true` if any finalized chains exist, `false` otherwise.
pub fn is_finalizing_sync(&self) -> bool {
!self.finalized_chains.is_empty()
}
/// Given a chain iterator, runs a given function on each chain until the function returns
/// `Some`. This allows the `RangeSync` struct to loop over chains and optionally remove the
/// chain from the collection if the function results in completing the chain.
fn request_function<'a, F, I, U>(chain: I, mut func: F) -> Option<(usize, U)>
where
I: Iterator<Item = &'a mut SyncingChain<T>>,
F: FnMut(&'a mut SyncingChain<T>) -> Option<U>,
{
chain
.enumerate()
.find_map(|(index, chain)| Some((index, func(chain)?)))
}
/// Given a chain iterator, runs a given function on each chain and return all `Some` results.
fn request_function_all<'a, F, I, U>(chain: I, mut func: F) -> Vec<(usize, U)>
where
I: Iterator<Item = &'a mut SyncingChain<T>>,
F: FnMut(&'a mut SyncingChain<T>) -> Option<U>,
{
chain
.enumerate()
.filter_map(|(index, chain)| Some((index, func(chain)?)))
.collect()
}
/// Runs a function on finalized chains until we get the first `Some` result from `F`.
pub fn finalized_request<F, U>(&mut self, func: F) -> Option<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function(self.finalized_chains.iter_mut(), func)
}
/// Runs a function on head chains until we get the first `Some` result from `F`.
pub fn head_request<F, U>(&mut self, func: F) -> Option<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function(self.head_chains.iter_mut(), func)
}
/// Runs a function on finalized and head chains until we get the first `Some` result from `F`.
pub fn head_finalized_request<F, U>(&mut self, func: F) -> Option<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function(
self.finalized_chains
.iter_mut()
.chain(self.head_chains.iter_mut()),
func,
)
}
/// Runs a function on all finalized and head chains and collects all `Some` results from `F`.
pub fn head_finalized_request_all<F, U>(&mut self, func: F) -> Vec<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function_all(
self.finalized_chains
.iter_mut()
.chain(self.head_chains.iter_mut()),
func,
)
}
/// Removes any outdated finalized or head chains.
///
/// This removes chains with no peers, or chains whose start block slot is less than our current
/// finalized block slot.
pub fn purge_outdated_chains(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) {
// Remove any chains that have no peers
self.finalized_chains
.retain(|chain| !chain.peer_pool.is_empty());
self.head_chains.retain(|chain| !chain.peer_pool.is_empty());
.retain(|_id, chain| chain.available_peers() > 0);
self.head_chains
.retain(|_id, chain| chain.available_peers() > 0);
let local_info = match PeerSyncInfo::from_chain(&self.beacon_chain) {
Some(local) => local,
@ -533,28 +456,28 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
let beacon_chain = &self.beacon_chain;
let log_ref = &self.log;
// Remove chains that are out-dated and re-status their peers
self.finalized_chains.retain(|chain| {
self.finalized_chains.retain(|_id, chain| {
if chain.target_head_slot <= local_finalized_slot
|| beacon_chain
.fork_choice
.read()
.contains_block(&chain.target_head_root)
{
debug!(log_ref, "Purging out of finalized chain"; "start_epoch" => chain.start_epoch, "end_slot" => chain.target_head_slot);
debug!(log_ref, "Purging out of finalized chain"; &chain);
chain.status_peers(network);
false
} else {
true
}
});
self.head_chains.retain(|chain| {
self.head_chains.retain(|_id, chain| {
if chain.target_head_slot <= local_finalized_slot
|| beacon_chain
.fork_choice
.read()
.contains_block(&chain.target_head_root)
{
debug!(log_ref, "Purging out of date head chain"; "start_epoch" => chain.start_epoch, "end_slot" => chain.target_head_slot);
debug!(log_ref, "Purging out of date head chain"; &chain);
chain.status_peers(network);
false
} else {
@ -563,63 +486,71 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
});
}
/// Removes and returns a finalized chain from the collection.
pub fn remove_finalized_chain(&mut self, index: usize) -> SyncingChain<T> {
self.finalized_chains.swap_remove(index)
}
/// Removes and returns a head chain from the collection.
pub fn remove_head_chain(&mut self, index: usize) -> SyncingChain<T> {
self.head_chains.swap_remove(index)
}
/// Removes a chain from either finalized or head chains based on the index. Using a request
/// iterates of finalized chains before head chains. Thus an index that is greater than the
/// finalized chain length, indicates a head chain.
///
/// This will re-status the chains peers on removal. The index must exist.
pub fn remove_chain(&mut self, network: &mut SyncNetworkContext<T::EthSpec>, index: usize) {
let chain = if index >= self.finalized_chains.len() {
let index = index - self.finalized_chains.len();
let chain = self.head_chains.swap_remove(index);
chain.status_peers(network);
chain
/// Adds a peer to a chain with the given target, or creates a new syncing chain if it doesn't
/// exist.
#[allow(clippy::too_many_arguments)]
pub fn add_peer_or_create_chain(
&mut self,
start_epoch: Epoch,
target_head_root: Hash256,
target_head_slot: Slot,
peer: PeerId,
sync_type: RangeSyncType,
beacon_processor_send: &mpsc::Sender<BeaconWorkEvent<T::EthSpec>>,
network: &mut SyncNetworkContext<T::EthSpec>,
) {
let id = SyncingChain::<T>::id(&target_head_root, &target_head_slot);
let collection = if let RangeSyncType::Finalized = sync_type {
if let Some(chain) = self.head_chains.get(&id) {
// sanity verification for chain duplication / purging issues
crit!(self.log, "Adding known head chain as finalized chain"; chain);
}
&mut self.finalized_chains
} else {
let chain = self.finalized_chains.swap_remove(index);
chain.status_peers(network);
chain
if let Some(chain) = self.finalized_chains.get(&id) {
// sanity verification for chain duplication / purging issues
crit!(self.log, "Adding known finalized chain as head chain"; chain);
}
&mut self.head_chains
};
debug!(self.log, "Chain was removed"; "start_epoch" => chain.start_epoch, "end_slot" => chain.target_head_slot);
// update the state
self.update(network);
match collection.entry(id) {
Entry::Occupied(mut entry) => {
let chain = entry.get_mut();
debug!(self.log, "Adding peer to known chain"; "peer_id" => %peer, "sync_type" => ?sync_type, &chain);
assert_eq!(chain.target_head_root, target_head_root);
assert_eq!(chain.target_head_slot, target_head_slot);
if let ProcessingResult::RemoveChain = chain.add_peer(network, peer) {
debug!(self.log, "Chain removed after adding peer"; "chain" => id);
entry.remove();
}
}
Entry::Vacant(entry) => {
let peer_rpr = peer.to_string();
let new_chain = SyncingChain::new(
start_epoch,
target_head_slot,
target_head_root,
peer,
beacon_processor_send.clone(),
self.beacon_chain.clone(),
&self.log,
);
assert_eq!(new_chain.get_id(), id);
debug!(self.log, "New chain added to sync"; "peer_id" => peer_rpr, "sync_type" => ?sync_type, &new_chain);
entry.insert(new_chain);
}
}
}
/// Returns the index of finalized chain that is currently syncing. Returns `None` if no
/// finalized chain is currently syncing.
fn finalized_syncing_index(&self) -> Option<usize> {
self.finalized_chains
.iter()
.enumerate()
.find_map(|(index, chain)| {
if chain.state == ChainSyncingState::Syncing {
Some(index)
} else {
None
}
})
}
/// Returns a chain given the target head root and slot.
fn get_chain<'a>(
chain: &'a mut [SyncingChain<T>],
target_head_root: Hash256,
target_head_slot: Slot,
) -> Option<&'a mut SyncingChain<T>> {
chain.iter_mut().find(|iter_chain| {
iter_chain.target_head_root == target_head_root
&& iter_chain.target_head_slot == target_head_slot
fn finalized_syncing_chain(&mut self) -> Option<(&ChainId, &mut SyncingChain<T>)> {
self.finalized_chains.iter_mut().find_map(|(id, chain)| {
if chain.state == ChainSyncingState::Syncing {
Some((id, chain))
} else {
None
}
})
}
}


@ -7,6 +7,6 @@ mod chain_collection;
mod range;
mod sync_type;
pub use batch::Batch;
pub use chain::{ChainId, EPOCHS_PER_BATCH};
pub use batch::BatchInfo;
pub use chain::{BatchId, ChainId, EPOCHS_PER_BATCH};
pub use range::RangeSync;


@ -39,7 +39,7 @@
//! Each chain is downloaded in batches of blocks. The batched blocks are processed sequentially
//! and further batches are requested as current blocks are being processed.
use super::chain::{ChainId, ProcessingResult};
use super::chain::ChainId;
use super::chain_collection::{ChainCollection, RangeSyncState};
use super::sync_type::RangeSyncType;
use crate::beacon_processor::WorkEvent as BeaconWorkEvent;
@ -49,7 +49,7 @@ use crate::sync::PeerSyncInfo;
use crate::sync::RequestId;
use beacon_chain::{BeaconChain, BeaconChainTypes};
use eth2_libp2p::{NetworkGlobals, PeerId};
use slog::{debug, error, trace};
use slog::{debug, error, trace, warn};
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::mpsc;
@ -121,21 +121,15 @@ impl<T: BeaconChainTypes> RangeSync<T> {
let local_info = match PeerSyncInfo::from_chain(&self.beacon_chain) {
Some(local) => local,
None => {
return error!(
self.log,
"Failed to get peer sync info";
"msg" => "likely due to head lock contention"
)
return error!(self.log, "Failed to get peer sync info";
"msg" => "likely due to head lock contention")
}
};
// convenience variables
// convenience variable
let remote_finalized_slot = remote_info
.finalized_epoch
.start_slot(T::EthSpec::slots_per_epoch());
let local_finalized_slot = local_info
.finalized_epoch
.start_slot(T::EthSpec::slots_per_epoch());
// NOTE: A peer that has been re-status'd may now exist in multiple finalized chains.
@ -146,7 +140,7 @@ impl<T: BeaconChainTypes> RangeSync<T> {
match RangeSyncType::new(&self.beacon_chain, &local_info, &remote_info) {
RangeSyncType::Finalized => {
// Finalized chain search
debug!(self.log, "Finalization sync peer joined"; "peer_id" => format!("{:?}", peer_id));
debug!(self.log, "Finalization sync peer joined"; "peer_id" => %peer_id);
// remove the peer from the awaiting_head_peers list if it exists
self.awaiting_head_peers.remove(&peer_id);
@ -154,37 +148,19 @@ impl<T: BeaconChainTypes> RangeSync<T> {
// Note: We keep current head chains. These can continue syncing whilst we complete
// this new finalized chain.
// If a finalized chain already exists that matches, add this peer to the chain's peer
// pool.
if let Some(chain) = self
.chains
.get_finalized_mut(remote_info.finalized_root, remote_finalized_slot)
{
debug!(self.log, "Finalized chain exists, adding peer"; "peer_id" => peer_id.to_string(), "target_root" => chain.target_head_root.to_string(), "targe_slot" => chain.target_head_slot);
self.chains.add_peer_or_create_chain(
local_info.finalized_epoch,
remote_info.finalized_root,
remote_finalized_slot,
peer_id,
RangeSyncType::Finalized,
&self.beacon_processor_send,
network,
);
// add the peer to the chain's peer pool
chain.add_peer(network, peer_id);
// check if the new peer's addition will favour a new syncing chain.
self.chains.update(network);
// update the global sync state if necessary
self.chains.update_sync_state(network);
} else {
// there is no finalized chain that matches this peer's last finalized target
// create a new finalized chain
debug!(self.log, "New finalized chain added to sync"; "peer_id" => format!("{:?}", peer_id), "start_slot" => local_finalized_slot, "end_slot" => remote_finalized_slot, "finalized_root" => format!("{}", remote_info.finalized_root));
self.chains.new_finalized_chain(
local_info.finalized_epoch,
remote_info.finalized_root,
remote_finalized_slot,
peer_id,
self.beacon_processor_send.clone(),
);
self.chains.update(network);
// update the global sync state
self.chains.update_sync_state(network);
}
self.chains.update(network);
// update the global sync state
self.chains.update_sync_state(network);
}
RangeSyncType::Head => {
// This peer requires a head chain sync
@ -192,7 +168,7 @@ impl<T: BeaconChainTypes> RangeSync<T> {
if self.chains.is_finalizing_sync() {
// If there are finalized chains to sync, finish these first, before syncing head
// chains. This allows us to re-sync all known peers
trace!(self.log, "Waiting for finalized sync to complete"; "peer_id" => format!("{:?}", peer_id));
trace!(self.log, "Waiting for finalized sync to complete"; "peer_id" => %peer_id);
// store the peer to re-status after all finalized chains complete
self.awaiting_head_peers.insert(peer_id);
return;
@ -203,31 +179,18 @@ impl<T: BeaconChainTypes> RangeSync<T> {
// The new peer has the same finalized (earlier filters should prevent a peer with an
// earlier finalized chain from reaching here).
debug!(self.log, "New peer added for recent head sync"; "peer_id" => format!("{:?}", peer_id));
let start_epoch = std::cmp::min(local_info.head_slot, remote_finalized_slot)
.epoch(T::EthSpec::slots_per_epoch());
self.chains.add_peer_or_create_chain(
start_epoch,
remote_info.head_root,
remote_info.head_slot,
peer_id,
RangeSyncType::Head,
&self.beacon_processor_send,
network,
);
self.chains.update(network);
self.chains.update_sync_state(network);
}
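Both the finalized and head arms above now funnel into `add_peer_or_create_chain`. Below is a minimal, self-contained sketch of the lookup-or-insert pattern this implies, deriving the chain id from its identifying fields so duplicate chains cannot exist; the `Chain` struct, `chain_id` helper, and plain scalar types here are illustrative stand-ins, not the PR's actual definitions:

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type ChainId = u64;

/// Derive the chain id from the target root and slot, so two chains with the
/// same target can never coexist and a chain can be found by id or checkpoint.
fn chain_id(target_root: &[u8; 32], target_slot: u64) -> ChainId {
    let mut hasher = DefaultHasher::new();
    (target_root, target_slot).hash(&mut hasher);
    hasher.finish()
}

struct Chain {
    start_epoch: u64,
    peers: Vec<u64>, // simplified peer ids
}

fn add_peer_or_create_chain(
    chains: &mut HashMap<ChainId, Chain>,
    start_epoch: u64,
    target_root: [u8; 32],
    target_slot: u64,
    peer: u64,
) {
    match chains.entry(chain_id(&target_root, target_slot)) {
        // A chain with this exact target exists: just add the peer to its pool.
        Entry::Occupied(mut entry) => entry.get_mut().peers.push(peer),
        // No matching chain: create one, starting from our local progress.
        Entry::Vacant(entry) => {
            entry.insert(Chain {
                start_epoch,
                peers: vec![peer],
            });
        }
    }
}
```

With the id a pure function of the target, the old `get_finalized_mut`/`get_head_mut` searches collapse into a single map lookup, and `call_by_id` (used below) can address any chain directly.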
@@ -245,23 +208,27 @@ impl<T: BeaconChainTypes> RangeSync<T> {
request_id: RequestId,
beacon_block: Option<SignedBeaconBlock<T::EthSpec>>,
) {
// get the chain and batch for which this response belongs
if let Some((chain_id, batch_id)) =
network.blocks_by_range_response(request_id, beacon_block.is_none())
{
// check if this chunk removes the chain
match self.chains.call_by_id(chain_id, |chain| {
chain.on_block_response(network, batch_id, peer_id, beacon_block)
}) {
Ok((removed_chain, sync_type)) => {
if let Some(removed_chain) = removed_chain {
debug!(self.log, "Chain removed after block response"; "sync_type" => ?sync_type, "chain_id" => chain_id);
removed_chain.status_peers(network);
// TODO: update & update_sync_state?
}
}
Err(_) => {
debug!(self.log, "BlocksByRange response for removed chain"; "chain" => chain_id)
}
}
} else {
warn!(self.log, "Response/Error for non registered request"; "request_id" => request_id)
}
}
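The `network.blocks_by_range_response(request_id, beacon_block.is_none())` call above is what removes the old double O(n) search: the network context resolves a request id straight to the `(chain, batch)` that issued it. A plausible sketch of that bookkeeping, with illustrative struct and type names rather than the PR's exact API:

```rust
use std::collections::HashMap;

type RequestId = usize;
type ChainId = u64;
type BatchId = u64; // the epoch the batch starts at, simplified

/// Illustrative request bookkeeping: one map from request id to the
/// (chain, batch) that issued it, so block responses never scan the chains.
struct RangeRequests {
    range_requests: HashMap<RequestId, (ChainId, BatchId)>,
}

impl RangeRequests {
    fn blocks_by_range_response(
        &mut self,
        request_id: RequestId,
        remove: bool,
    ) -> Option<(ChainId, BatchId)> {
        if remove {
            // The stream terminated (or errored): the request is finished,
            // so drop the entry.
            self.range_requests.remove(&request_id)
        } else {
            // More blocks may still arrive for this request; keep the entry.
            self.range_requests.get(&request_id).copied()
        }
    }
}
```

Passing `beacon_block.is_none()` works because a `BlocksByRange` response is a stream: the final, empty message marks termination, at which point the entry can be dropped. `inject_error` (below) passes `true` for the same reason.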
@@ -269,76 +236,57 @@ impl<T: BeaconChainTypes> RangeSync<T> {
&mut self,
network: &mut SyncNetworkContext<T::EthSpec>,
chain_id: ChainId,
batch_id: Epoch,
result: BatchProcessResult,
) {
// check if this response removes the chain
match self.chains.call_by_id(chain_id, |chain| {
chain.on_batch_process_result(network, batch_id, &result)
}) {
Ok((None, _sync_type)) => {
// Chain was found and not removed
}
Ok((Some(removed_chain), sync_type)) => {
debug!(self.log, "Chain removed after processing result"; "chain" => chain_id, "sync_type" => ?sync_type);
// Chain ended, re-status its peers
removed_chain.status_peers(network);
match sync_type {
RangeSyncType::Finalized => {
// update the state of the collection
self.chains.update(network);
// set the state to a head sync if there are no finalized chains, to inform
// the manager that we are awaiting a head chain.
self.chains.set_head_sync();
// Update the global variables
self.chains.update_sync_state(network);
// if there are no more finalized chains, re-status all known peers
// awaiting a head sync
match self.chains.state() {
RangeSyncState::Idle | RangeSyncState::Head { .. } => {
network.status_peers(
self.beacon_chain.clone(),
self.awaiting_head_peers.drain(),
);
}
RangeSyncState::Finalized { .. } => {} // Have more finalized chains to complete
}
}
RangeSyncType::Head => {
// Remove non-syncing head chains and re-status the peers. This removes a
// build-up of potentially duplicate head chains. Any legitimate head
// chains will be re-established
self.chains.clear_head_chains(network);
// update the state of the collection
self.chains.update(network);
// update the global state and log any change
self.chains.update_sync_state(network);
}
}
}
Err(_) => {
debug!(self.log, "BlocksByRange response for removed chain"; "chain" => chain_id)
}
}
}
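The closure passed to `call_by_id` returns the chain's keep/remove decision, so the collection can own the removal itself and hand the removed chain back for its peers to be re-statused. A minimal sketch of that contract, assuming the PR's `ProcessingResult` naming and its `#[must_use]` decoration, with simplified types throughout:

```rust
use std::collections::HashMap;

type ChainId = u64;

struct Chain; // stand-in for SyncingChain<T>

#[derive(Debug, Clone, Copy)]
enum SyncType {
    Finalized,
    Head,
}

// The decoration turns any forgotten chain removal into a compiler warning.
#[must_use = "chains that request to be removed must be removed"]
enum ProcessingResult {
    KeepChain,
    RemoveChain,
}

struct ChainCollection {
    finalized_chains: HashMap<ChainId, Chain>,
    head_chains: HashMap<ChainId, Chain>,
}

impl ChainCollection {
    /// Run `f` on the chain with the given id. `Err(())` means no such chain
    /// exists (e.g. it was purged while a request was in flight); `Ok` carries
    /// the removed chain, if `f` asked for removal, plus the chain's sync type.
    fn call_by_id<F>(&mut self, id: ChainId, f: F) -> Result<(Option<Chain>, SyncType), ()>
    where
        F: FnOnce(&mut Chain) -> ProcessingResult,
    {
        let (chains, sync_type) = if self.finalized_chains.contains_key(&id) {
            (&mut self.finalized_chains, SyncType::Finalized)
        } else if self.head_chains.contains_key(&id) {
            (&mut self.head_chains, SyncType::Head)
        } else {
            return Err(());
        };
        let chain = chains.get_mut(&id).expect("contains_key checked above");
        let removed = match f(chain) {
            ProcessingResult::RemoveChain => chains.remove(&id),
            ProcessingResult::KeepChain => None,
        };
        Ok((removed, sync_type))
    }
}
```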
@@ -352,7 +300,7 @@ impl<T: BeaconChainTypes> RangeSync<T> {
// if the peer is in the awaiting head mapping, remove it
self.awaiting_head_peers.remove(peer_id);
// remove the peer from any peer pool, failing its batches
self.remove_peer(network, peer_id);
// update the state of the collection
@@ -361,30 +309,17 @@ impl<T: BeaconChainTypes> RangeSync<T> {
self.chains.update_sync_state(network);
}
/// When a peer gets removed, both the head and finalized chains need to be searched to check
/// which pool the peer is in. The chain may also have a batch or batches awaiting
/// this peer. If so, we mark those batches as failed. A batch may then hit its maximum
/// retries. In this case, we need to remove the chain and re-status all its peers.
fn remove_peer(&mut self, network: &mut SyncNetworkContext<T::EthSpec>, peer_id: &PeerId) {
for (removed_chain, sync_type) in self
.chains
.call_all(|chain| chain.remove_peer(peer_id, network))
{
debug!(self.log, "Chain removed after removing peer"; "sync_type" => ?sync_type, "chain" => removed_chain.get_id());
// TODO: anything else to do?
}
}
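The doc comment above describes the interaction this PR makes explicit: peer membership is now a map from peers to the batches they are downloading (replacing the old `PendingBatches`), and a failed download is registered on the batch's state machine, which decides whether the chain is still viable. A simplified sketch under those assumptions; the names, the retry limit, and the scalar types are illustrative only:

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Assumed retry limit, for illustration only.
const MAX_BATCH_DOWNLOAD_ATTEMPTS: u8 = 5;

#[derive(PartialEq)]
enum ProcessingResult {
    KeepChain,
    RemoveChain,
}

struct Batch {
    failed_download_attempts: u8,
}

impl Batch {
    /// Register one more failed download on the batch's state machine; once
    /// the limit is hit, the batch (and therefore the chain) is given up on.
    fn register_failed_download(&mut self) -> ProcessingResult {
        self.failed_download_attempts += 1;
        if self.failed_download_attempts >= MAX_BATCH_DOWNLOAD_ATTEMPTS {
            ProcessingResult::RemoveChain
        } else {
            ProcessingResult::KeepChain
        }
    }
}

struct Chain {
    /// Peers mapped to the batches they are currently downloading; this is
    /// the structure that replaced the separate `PendingBatches`.
    peers: HashMap<u64, HashSet<u64>>,
    /// Batches indexed by the epoch they start at, kept sorted.
    batches: BTreeMap<u64, Batch>,
}

fn remove_peer(chain: &mut Chain, peer: u64) -> ProcessingResult {
    // Every batch this peer was downloading counts as a failed attempt.
    for batch_id in chain.peers.remove(&peer).unwrap_or_default() {
        if let Some(batch) = chain.batches.get_mut(&batch_id) {
            if batch.register_failed_download() == ProcessingResult::RemoveChain {
                return ProcessingResult::RemoveChain;
            }
        }
    }
    ProcessingResult::KeepChain
}
```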
@@ -398,17 +333,25 @@ impl<T: BeaconChainTypes> RangeSync<T> {
peer_id: PeerId,
request_id: RequestId,
) {
// get the chain and batch for which this response belongs
if let Some((chain_id, batch_id)) = network.blocks_by_range_response(request_id, true) {
// check that this request is pending
match self.chains.call_by_id(chain_id, |chain| {
chain.inject_error(network, batch_id, peer_id)
}) {
Ok((removed_chain, sync_type)) => {
if let Some(removed_chain) = removed_chain {
debug!(self.log, "Chain removed on rpc error"; "sync_type" => ?sync_type, "chain" => removed_chain.get_id());
removed_chain.status_peers(network);
// TODO: update & update_sync_state?
}
}
Err(_) => {
debug!(self.log, "BlocksByRange response for removed chain"; "chain" => chain_id)
}
}
} else {
warn!(self.log, "Response/Error for non registered request"; "request_id" => request_id)
}
}
}
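One detail worth calling out before the diff below, which adds `#[derive(Debug)]` to `RangeSyncType`: slog's `%x` shorthand logs via `Display` while `?x` logs via `Debug`, so the new `"sync_type" => ?sync_type` log keys used throughout require that derive. A runnable toy example; the logger setup and values are illustrative and assume the `slog`, `slog-term` and `slog-async` crates:

```rust
use slog::{debug, o, Drain};

#[derive(Debug)]
enum RangeSyncType {
    Finalized,
    Head,
}

fn main() {
    // Minimal terminal logger, as in the slog docs.
    let decorator = slog_term::TermDecorator::new().build();
    let drain = slog_term::FullFormat::new(decorator).build().fuse();
    let drain = slog_async::Async::new(drain).build().fuse();
    let log = slog::Logger::root(drain, o!());

    let peer_id = "16Uiu2HAm..."; // stand-in for a real PeerId, which implements Display
    let sync_type = RangeSyncType::Finalized;

    // `%` uses Display, `?` uses Debug: without the derive above,
    // the `?sync_type` key would not compile.
    debug!(log, "Finalization sync peer joined"; "peer_id" => %peer_id, "sync_type" => ?sync_type);
    let _ = RangeSyncType::Head; // silence the dead-code warning in this sketch
}
```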


@@ -6,6 +6,7 @@ use beacon_chain::{BeaconChain, BeaconChainTypes};
use std::sync::Arc;
/// The type of Range sync that should be done relative to our current state.
#[derive(Debug)]
pub enum RangeSyncType {
/// A finalized chain sync should be started with this peer.
Finalized,