Super Silky Smooth Syncs, like a Sir (#1628)

## Issue Addressed
In principle, this closes #1551, but more generally it contains improvements for performance, maintainability and readability. The logic for the optimistic sync is actually simple.

## Proposed Changes
There are miscellaneous things here:
- Remove the unnecessary `BatchProcessResult::Partial` variant to simplify the batch validation logic
- Make batches a state machine. This ensures batch state transitions respect our logic (previously this was enforced by moving batches between `Vec`s) and eases the cognitive load of the `SyncingChain` struct; see the sketch after this list
- Move most batch-related logic to the batch itself
- Remove `PendingBatches` in favor of a map of peers to their batches. This avoids duplicating peers inside the chain (previously in both `peer_pool` and `pending_batches`)
- Add the `#[must_use]` attribute to `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before, to account for previously unhandled cases
- Store batches in a sorted map (`BTreeMap`). Access is not O(1), but since the number of _active_ batches is bounded this should be fast, and it saves performing hashing ops. Batches are indexed by the epoch at which they start and kept sorted, to easily handle chain advancements (range logic)
- Produce the chain Id from its identifying fields: the target root and target slot. This guarantees there can't be duplicated chains and allows chains to be searched consistently by either Id or checkpoint
- Fix `chain_id` not being present in all chain loggers
- Handle the mega-edge case where the processor's work queue is full and the batch can't be sent. Previously the chain would lose the blocks and remain in a "syncing" state, waiting for a result that would never arrive, effectively stalling sync
- When a batch imports blocks, or the chain starts syncing with a local finalized epoch greater than the chain's start epoch, the chain is advanced instead of reset. This avoids losing download progress and validates batches faster. It also means that the old `start_epoch` now means "current first unvalidated batch", so it represents the progress of the chain more accurately
- Status peers from the same chain in batches, to reduce `Arc` accesses
- A couple of cases where the retry counters for a batch were not updated or checked are now handled via the batch state machine. Basically, if we forget to do it, we will now know
- Do not send the blocks back from the processor to the batch. Instead, register the attempt before sending the blocks (this does not count as a failed attempt)
- When re-requesting a batch, try to avoid not only the last failed peer, but all previously failed peers
- Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this addresses a couple of old TODOs in the code)
- In `chain_collection`, store chains in a map keyed by their Id
- Include a mapping from request Ids to the (chain, batch) pair that requested the batch, to avoid the double O(n) search on block responses
- Other stuff:
  - impl `slog::KV` for batches
  - impl `slog::KV` for syncing chains
  - PSA: when logging, we can use `%thing` if `thing` implements `Display`; likewise, `?thing` works if `thing` implements `Debug`
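
To make the state-machine change concrete, here is the core of the new `BatchState` type, condensed from the `batch.rs` diff below (the transition methods such as `start_downloading_from_peer` and `processing_completed` are in the full diff):

```rust
// Condensed from `batch.rs` (see diff below). Every batch owns its state, and
// each transition method `poison`s the state, matches on the old value and
// writes the new one, so an illegal transition is an `unreachable!` instead of
// a silent logic error.
pub enum BatchState<T: EthSpec> {
    /// The batch has failed either downloading or processing, but can be requested again.
    AwaitingDownload,
    /// The batch is being downloaded.
    Downloading(PeerId, Vec<SignedBeaconBlock<T>>),
    /// The batch has been completely downloaded and is ready for processing.
    AwaitingProcessing(PeerId, Vec<SignedBeaconBlock<T>>),
    /// The batch is being processed.
    Processing(Attempt),
    /// Successfully processed, but only considered valid once the next
    /// sequential batch imports at least one block.
    AwaitingValidation(Attempt),
    /// Intermediate state used while a transition is in progress.
    Poisoned,
    /// Maxed out the allowed download/processing attempts; unrecoverable.
    Failed,
}
```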

### Optimistic syncing:
First, try the batch that contains the current head. If that batch imports any block, advance the chain. If it doesn't, leave the optimistic batch in place for future use as long as it falls inside the current processing window; otherwise drop it. The download failure tolerance for this batch is the same as for any other batch, but it is only attempted for processing once.
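
A minimal sketch of that decision flow, assuming a `BTreeMap` of batches keyed by start epoch as described above. All names here are illustrative only: the real implementation lives in `chain.rs`, whose diff is suppressed below.

```rust
use std::collections::BTreeMap;

type Epoch = u64;
struct Batch; // stand-in for the real `BatchInfo<T>`

struct Chain {
    /// Batches indexed by their starting epoch, as described above.
    batches: BTreeMap<Epoch, Batch>,
    /// Epoch of the first unvalidated batch (the old `start_epoch`).
    processing_target: Epoch,
}

impl Chain {
    /// Hypothetical handler for the result of processing the optimistic
    /// (head-containing) batch.
    fn optimistic_batch_processed(&mut self, batch_epoch: Epoch, imported_any_block: bool) {
        if imported_any_block {
            // The batch was useful: everything before it can be validated and
            // the chain advances past it.
            self.advance_to(batch_epoch);
        } else if batch_epoch >= self.processing_target {
            // Inside the current processing window: keep the downloaded batch
            // around so regular (sequential) processing can reuse it.
        } else {
            // Behind the window: the batch is no longer useful, drop it.
            self.batches.remove(&batch_epoch);
        }
    }

    fn advance_to(&mut self, epoch: Epoch) {
        // `split_off` keeps batches from `epoch` onwards; earlier ones are
        // considered validated and discarded.
        self.batches = self.batches.split_off(&epoch);
        self.processing_target = self.processing_target.max(epoch);
    }
}
```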



Co-authored-by: Age Manning <Age@AgeManning.com>
Commit: b8013b7b2c (parent: 80e52a0263)
Author: divma, 2020-09-23 06:29:55 +00:00
13 changed files with 1480 additions and 1199 deletions


```diff
@@ -35,7 +35,6 @@ tokio-util = { version = "0.3.1", features = ["codec", "compat"] }
 discv5 = { version = "0.1.0-alpha.10", features = ["libp2p"] }
 tiny-keccak = "2.0.2"
 environment = { path = "../../lighthouse/environment" }
-# TODO: Remove rand crate for mainnet
 rand = "0.7.3"
 regex = "1.3.9"
```


```diff
@@ -32,6 +32,7 @@ tokio = { version = "0.2.21", features = ["full"] }
 parking_lot = "0.11.0"
 smallvec = "1.4.1"
 # TODO: Remove rand crate for mainnet
+# NOTE: why?
 rand = "0.7.3"
 fnv = "1.0.6"
 rlp = "0.4.5"
```


```diff
@@ -28,39 +28,26 @@ pub fn handle_chain_segment<T: BeaconChainTypes>(
     match process_id {
         // this a request from the range sync
         ProcessId::RangeBatchId(chain_id, epoch) => {
-            let len = downloaded_blocks.len();
-            let start_slot = if len > 0 {
-                downloaded_blocks[0].message.slot.as_u64()
-            } else {
-                0
-            };
-            let end_slot = if len > 0 {
-                downloaded_blocks[len - 1].message.slot.as_u64()
-            } else {
-                0
-            };
-            debug!(log, "Processing batch"; "batch_epoch" => epoch, "blocks" => downloaded_blocks.len(), "first_block_slot" => start_slot, "last_block_slot" => end_slot, "service" => "sync");
+            let start_slot = downloaded_blocks.first().map(|b| b.message.slot.as_u64());
+            let end_slot = downloaded_blocks.last().map(|b| b.message.slot.as_u64());
+            let sent_blocks = downloaded_blocks.len();

             let result = match process_blocks(chain, downloaded_blocks.iter(), &log) {
                 (_, Ok(_)) => {
-                    debug!(log, "Batch processed"; "batch_epoch" => epoch , "first_block_slot" => start_slot, "last_block_slot" => end_slot, "service"=> "sync");
-                    BatchProcessResult::Success
+                    debug!(log, "Batch processed"; "batch_epoch" => epoch, "first_block_slot" => start_slot,
+                        "last_block_slot" => end_slot, "processed_blocks" => sent_blocks, "service"=> "sync");
+                    BatchProcessResult::Success(sent_blocks > 0)
                 }
-                (imported_blocks, Err(e)) if imported_blocks > 0 => {
-                    debug!(log, "Batch processing failed but imported some blocks";
-                        "batch_epoch" => epoch, "error" => e, "imported_blocks"=> imported_blocks, "service" => "sync");
-                    BatchProcessResult::Partial
-                }
-                (_, Err(e)) => {
-                    debug!(log, "Batch processing failed"; "batch_epoch" => epoch, "error" => e, "service" => "sync");
-                    BatchProcessResult::Failed
+                (imported_blocks, Err(e)) => {
+                    debug!(log, "Batch processing failed"; "batch_epoch" => epoch, "first_block_slot" => start_slot,
+                        "last_block_slot" => end_slot, "error" => e, "imported_blocks" => imported_blocks, "service" => "sync");
+                    BatchProcessResult::Failed(imported_blocks > 0)
                 }
             };

             let msg = SyncMessage::BatchProcessed {
                 chain_id,
                 epoch,
-                downloaded_blocks,
                 result,
             };
             sync_send.send(msg).unwrap_or_else(|_| {
@@ -70,7 +57,7 @@ pub fn handle_chain_segment<T: BeaconChainTypes>(
                 );
             });
         }
-        // this a parent lookup request from the sync manager
+        // this is a parent lookup request from the sync manager
         ProcessId::ParentLookup(peer_id, chain_head) => {
             debug!(
                 log, "Processing parent lookup";
@@ -81,7 +68,7 @@ pub fn handle_chain_segment<T: BeaconChainTypes>(
             // reverse
             match process_blocks(chain, downloaded_blocks.iter().rev(), &log) {
                 (_, Err(e)) => {
-                    debug!(log, "Parent lookup failed"; "last_peer_id" => format!("{}", peer_id), "error" => e);
+                    debug!(log, "Parent lookup failed"; "last_peer_id" => %peer_id, "error" => e);
                     sync_send
                         .send(SyncMessage::ParentLookupFailed{peer_id, chain_head})
                         .unwrap_or_else(|_| {
@@ -114,13 +101,7 @@ fn process_blocks<
     match chain.process_chain_segment(blocks) {
         ChainSegmentResult::Successful { imported_blocks } => {
             metrics::inc_counter(&metrics::BEACON_PROCESSOR_CHAIN_SEGMENT_SUCCESS_TOTAL);
-            if imported_blocks == 0 {
-                debug!(log, "All blocks already known");
-            } else {
-                debug!(
-                    log, "Imported blocks from network";
-                    "count" => imported_blocks,
-                );
+            if imported_blocks > 0 {
                 // Batch completed successfully with at least one block, run fork choice.
                 run_fork_choice(chain, log);
             }
@@ -153,7 +134,7 @@ fn run_fork_choice<T: BeaconChainTypes>(chain: Arc<BeaconChain<T>>, log: &slog::
         Err(e) => error!(
             log,
             "Fork choice failed";
-            "error" => format!("{:?}", e),
+            "error" => ?e,
             "location" => "batch import error"
         ),
     }
@@ -219,7 +200,7 @@ fn handle_failed_chain_segment<T: EthSpec>(
             warn!(
                 log, "BlockProcessingFailure";
                 "msg" => "unexpected condition in processing block.",
-                "outcome" => format!("{:?}", e)
+                "outcome" => ?e,
             );
             Err(format!("Internal error whilst processing block: {:?}", e))
@@ -228,7 +209,7 @@ fn handle_failed_chain_segment<T: EthSpec>(
             debug!(
                 log, "Invalid block received";
                 "msg" => "peer sent invalid block",
-                "outcome" => format!("{:?}", other),
+                "outcome" => %other,
             );
             Err(format!("Peer sent invalid block. Reason: {:?}", other))
```


```diff
@@ -535,9 +535,10 @@ impl<T: BeaconChainTypes> Worker<T> {
     ///
     /// Creates a log if there is an interal error.
     fn send_sync_message(&self, message: SyncMessage<T::EthSpec>) {
-        self.sync_tx
-            .send(message)
-            .unwrap_or_else(|_| error!(self.log, "Could not send message to the sync service"));
+        self.sync_tx.send(message).unwrap_or_else(|e| {
+            error!(self.log, "Could not send message to the sync service";
+                "error" => %e)
+        });
     }

     /// Handle an error whilst verifying an `Attestation` or `SignedAggregateAndProof` from the
```


```diff
@@ -82,10 +82,11 @@ impl<T: BeaconChainTypes> Processor<T> {
     }

     fn send_to_sync(&mut self, message: SyncMessage<T::EthSpec>) {
-        self.sync_send.send(message).unwrap_or_else(|_| {
+        self.sync_send.send(message).unwrap_or_else(|e| {
             warn!(
                 self.log,
                 "Could not send message to the sync service";
+                "error" => %e,
             )
         });
     }
@@ -691,9 +692,10 @@ impl<T: EthSpec> HandlerNetworkContext<T> {
     /// Sends a message to the network task.
     fn inform_network(&mut self, msg: NetworkMessage<T>) {
+        let msg_r = &format!("{:?}", msg);
         self.network_send
             .send(msg)
-            .unwrap_or_else(|_| warn!(self.log, "Could not send message to the network service"))
+            .unwrap_or_else(|e| warn!(self.log, "Could not send message to the network service"; "error" => %e, "message" => msg_r))
     }

     /// Disconnects and ban's a peer, sending a Goodbye request with the associated reason.
```


```diff
@@ -29,9 +29,9 @@
 //!
 //! Block Lookup
 //!
-//! To keep the logic maintained to the syncing thread (and manage the request_ids), when a block needs to be searched for (i.e
-//! if an attestation references an unknown block) this manager can search for the block and
-//! subsequently search for parents if needed.
+//! To keep the logic maintained to the syncing thread (and manage the request_ids), when a block
+//! needs to be searched for (i.e if an attestation references an unknown block) this manager can
+//! search for the block and subsequently search for parents if needed.

 use super::network_context::SyncNetworkContext;
 use super::peer_sync_info::{PeerSyncInfo, PeerSyncType};
@@ -106,7 +106,6 @@ pub enum SyncMessage<T: EthSpec> {
     BatchProcessed {
         chain_id: ChainId,
         epoch: Epoch,
-        downloaded_blocks: Vec<SignedBeaconBlock<T>>,
         result: BatchProcessResult,
     },
@@ -123,12 +122,10 @@ pub enum SyncMessage<T: EthSpec> {
 // TODO: When correct batch error handling occurs, we will include an error type.
 #[derive(Debug)]
 pub enum BatchProcessResult {
-    /// The batch was completed successfully.
-    Success,
-    /// The batch processing failed.
-    Failed,
-    /// The batch processing failed but managed to import at least one block.
-    Partial,
+    /// The batch was completed successfully. It carries whether the sent batch contained blocks.
+    Success(bool),
+    /// The batch processing failed. It carries whether the processing imported any block.
+    Failed(bool),
 }

 /// Maintains a sequential list of parents to lookup and the lookup's current state.
@@ -275,9 +272,9 @@ impl<T: BeaconChainTypes> SyncManager<T> {
         match local_peer_info.peer_sync_type(&remote) {
             PeerSyncType::FullySynced => {
                 trace!(self.log, "Peer synced to our head found";
-                    "peer" => format!("{:?}", peer_id),
+                    "peer" => %peer_id,
                     "peer_head_slot" => remote.head_slot,
                     "local_head_slot" => local_peer_info.head_slot,
                 );
                 self.synced_peer(&peer_id, remote);
                 // notify the range sync that a peer has been added
@@ -285,11 +282,11 @@ impl<T: BeaconChainTypes> SyncManager<T> {
             }
             PeerSyncType::Advanced => {
                 trace!(self.log, "Useful peer for sync found";
-                    "peer" => format!("{:?}", peer_id),
+                    "peer" => %peer_id,
                     "peer_head_slot" => remote.head_slot,
                     "local_head_slot" => local_peer_info.head_slot,
                     "peer_finalized_epoch" => remote.finalized_epoch,
                     "local_finalized_epoch" => local_peer_info.finalized_epoch,
                 );
                 // There are few cases to handle here:
@@ -908,14 +905,12 @@ impl<T: BeaconChainTypes> SyncManager<T> {
             SyncMessage::BatchProcessed {
                 chain_id,
                 epoch,
-                downloaded_blocks,
                 result,
             } => {
                 self.range_sync.handle_block_process_result(
                     &mut self.network,
                     chain_id,
                     epoch,
-                    downloaded_blocks,
                     result,
                 );
             }
```


```diff
@@ -1,11 +1,14 @@
 //! Provides network functionality for the Syncing thread. This fundamentally wraps a network
 //! channel and stores a global RPC ID to perform requests.

+use super::range_sync::{BatchId, ChainId};
+use super::RequestId as SyncRequestId;
 use crate::router::processor::status_message;
 use crate::service::NetworkMessage;
 use beacon_chain::{BeaconChain, BeaconChainTypes};
 use eth2_libp2p::rpc::{BlocksByRangeRequest, BlocksByRootRequest, GoodbyeReason, RequestId};
 use eth2_libp2p::{Client, NetworkGlobals, PeerAction, PeerId, Request};
+use fnv::FnvHashMap;
 use slog::{debug, trace, warn};
 use std::sync::Arc;
 use tokio::sync::mpsc;
@@ -21,7 +24,11 @@ pub struct SyncNetworkContext<T: EthSpec> {
     network_globals: Arc<NetworkGlobals<T>>,

     /// A sequential ID for all RPC requests.
-    request_id: usize,
+    request_id: SyncRequestId,
+
+    /// BlocksByRange requests made by range syncing chains.
+    range_requests: FnvHashMap<SyncRequestId, (ChainId, BatchId)>,

     /// Logger for the `SyncNetworkContext`.
     log: slog::Logger,
 }
@@ -36,6 +43,7 @@ impl<T: EthSpec> SyncNetworkContext<T> {
             network_send,
             network_globals,
             request_id: 1,
+            range_requests: FnvHashMap::default(),
             log,
         }
     }
@@ -50,24 +58,26 @@ impl<T: EthSpec> SyncNetworkContext<T> {
             .unwrap_or_default()
     }

-    pub fn status_peer<U: BeaconChainTypes>(
+    pub fn status_peers<U: BeaconChainTypes>(
         &mut self,
         chain: Arc<BeaconChain<U>>,
-        peer_id: PeerId,
+        peers: impl Iterator<Item = PeerId>,
     ) {
         if let Some(status_message) = status_message(&chain) {
-            debug!(
-                self.log,
-                "Sending Status Request";
-                "peer" => format!("{:?}", peer_id),
-                "fork_digest" => format!("{:?}", status_message.fork_digest),
-                "finalized_root" => format!("{:?}", status_message.finalized_root),
-                "finalized_epoch" => format!("{:?}", status_message.finalized_epoch),
-                "head_root" => format!("{}", status_message.head_root),
-                "head_slot" => format!("{}", status_message.head_slot),
-            );
-
-            let _ = self.send_rpc_request(peer_id, Request::Status(status_message));
+            for peer_id in peers {
+                debug!(
+                    self.log,
+                    "Sending Status Request";
+                    "peer" => %peer_id,
+                    "fork_digest" => ?status_message.fork_digest,
+                    "finalized_root" => ?status_message.finalized_root,
+                    "finalized_epoch" => ?status_message.finalized_epoch,
+                    "head_root" => %status_message.head_root,
+                    "head_slot" => %status_message.head_slot,
+                );
+
+                let _ = self.send_rpc_request(peer_id, Request::Status(status_message.clone()));
+            }
         }
     }
@@ -75,15 +85,34 @@ impl<T: EthSpec> SyncNetworkContext<T> {
         &mut self,
         peer_id: PeerId,
         request: BlocksByRangeRequest,
-    ) -> Result<usize, &'static str> {
+        chain_id: ChainId,
+        batch_id: BatchId,
+    ) -> Result<(), &'static str> {
         trace!(
             self.log,
             "Sending BlocksByRange Request";
             "method" => "BlocksByRange",
             "count" => request.count,
-            "peer" => format!("{:?}", peer_id)
+            "peer" => %peer_id,
         );
-        self.send_rpc_request(peer_id, Request::BlocksByRange(request))
+        let req_id = self.send_rpc_request(peer_id, Request::BlocksByRange(request))?;
+        self.range_requests.insert(req_id, (chain_id, batch_id));
+        Ok(())
+    }
+
+    pub fn blocks_by_range_response(
+        &mut self,
+        request_id: usize,
+        remove: bool,
+    ) -> Option<(ChainId, BatchId)> {
+        // NOTE: we can't guarantee that the request must be registered as it could receive more
+        // than an error, and be removed after receiving the first one.
+        // FIXME: https://github.com/sigp/lighthouse/issues/1634
+        if remove {
+            self.range_requests.remove(&request_id)
+        } else {
+            self.range_requests.get(&request_id).cloned()
+        }
     }

     pub fn blocks_by_root_request(
```
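A hedged sketch of the consuming side of this mapping (the actual caller sits in the range sync manager, not shown on this page; the wrapper below is illustrative and assumes the imports of `network_context.rs` above):

```rust
// Illustrative only: on each BlocksByRange response chunk, the sync side can
// resolve the (chain, batch) that issued the request in O(1). The entry is
// removed only when the stream terminates (`maybe_block == None`), since more
// chunks may still arrive for the same request id.
fn resolve_range_request<T: EthSpec>(
    network: &mut SyncNetworkContext<T>,
    request_id: usize,
    maybe_block: &Option<SignedBeaconBlock<T>>,
) -> Option<(ChainId, BatchId)> {
    network.blocks_by_range_response(request_id, maybe_block.is_none())
}
```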


```diff
@@ -1,35 +1,274 @@
-use super::chain::EPOCHS_PER_BATCH;
-use eth2_libp2p::rpc::methods::*;
+use eth2_libp2p::rpc::methods::BlocksByRangeRequest;
 use eth2_libp2p::PeerId;
-use fnv::FnvHashMap;
 use ssz::Encode;
-use std::cmp::min;
-use std::cmp::Ordering;
-use std::collections::hash_map::Entry;
-use std::collections::{HashMap, HashSet};
+use std::collections::HashSet;
 use std::hash::{Hash, Hasher};
 use std::ops::Sub;
 use types::{Epoch, EthSpec, SignedBeaconBlock, Slot};

-/// A collection of sequential blocks that are requested from peers in a single RPC request.
-#[derive(PartialEq, Debug)]
-pub struct Batch<T: EthSpec> {
-    /// The requested start epoch of the batch.
-    pub start_epoch: Epoch,
-    /// The requested end slot of batch, exclusive.
-    pub end_slot: Slot,
-    /// The `Attempts` that have been made to send us this batch.
-    pub attempts: Vec<Attempt>,
-    /// The peer that is currently assigned to the batch.
-    pub current_peer: PeerId,
-    /// The number of retries this batch has undergone due to a failed request.
-    /// This occurs when peers do not respond or we get an RPC error.
-    pub retries: u8,
-    /// The number of times this batch has attempted to be re-downloaded and re-processed. This
-    /// occurs when a batch has been received but cannot be processed.
-    pub reprocess_retries: u8,
-    /// The blocks that have been downloaded.
-    pub downloaded_blocks: Vec<SignedBeaconBlock<T>>,
-}
+/// The number of times to retry a batch before it is considered failed.
+const MAX_BATCH_DOWNLOAD_ATTEMPTS: u8 = 5;
+
+/// Invalid batches are attempted to be re-downloaded from other peers. If a batch cannot be processed
+/// after `MAX_BATCH_PROCESSING_ATTEMPTS` times, it is considered faulty.
+const MAX_BATCH_PROCESSING_ATTEMPTS: u8 = 3;
+
+/// A segment of a chain.
+pub struct BatchInfo<T: EthSpec> {
+    /// Start slot of the batch.
+    start_slot: Slot,
+    /// End slot of the batch.
+    end_slot: Slot,
+    /// The `Attempts` that have been made and failed to send us this batch.
+    failed_processing_attempts: Vec<Attempt>,
+    /// The number of download retries this batch has undergone due to a failed request.
+    failed_download_attempts: Vec<PeerId>,
+    /// State of the batch.
+    state: BatchState<T>,
+}
+
+/// Current state of a batch
+pub enum BatchState<T: EthSpec> {
+    /// The batch has failed either downloading or processing, but can be requested again.
+    AwaitingDownload,
+    /// The batch is being downloaded.
+    Downloading(PeerId, Vec<SignedBeaconBlock<T>>),
+    /// The batch has been completely downloaded and is ready for processing.
+    AwaitingProcessing(PeerId, Vec<SignedBeaconBlock<T>>),
+    /// The batch is being processed.
+    Processing(Attempt),
+    /// The batch was successfully processed and is waiting to be validated.
+    ///
+    /// It is not sufficient to process a batch successfully to consider it correct. This is
+    /// because batches could be erroneously empty, or incomplete. Therefore, a batch is considered
+    /// valid, only if the next sequential batch imports at least a block.
+    AwaitingValidation(Attempt),
+    /// Intermediate state for inner state handling.
+    Poisoned,
+    /// The batch has maxed out the allowed attempts for either downloading or processing. It
+    /// cannot be recovered.
+    Failed,
+}
+
+impl<T: EthSpec> BatchState<T> {
+    /// Helper function for poisoning a state.
+    pub fn poison(&mut self) -> BatchState<T> {
+        std::mem::replace(self, BatchState::Poisoned)
+    }
+}
+
+impl<T: EthSpec> BatchInfo<T> {
+    /// Batches are downloaded excluding the first block of the epoch assuming it has already been
+    /// downloaded.
+    ///
+    /// For example:
+    ///
+    /// Epoch boundary |                                   |
+    ///  ... | 30 | 31 | 32 | 33 | 34 | ... | 61 | 62 | 63 | 64 | 65 |
+    ///       Batch 1       |              Batch 2              |  Batch 3
+    pub fn new(start_epoch: &Epoch, num_of_epochs: u64) -> Self {
+        let start_slot = start_epoch.start_slot(T::slots_per_epoch()) + 1;
+        let end_slot = start_slot + num_of_epochs * T::slots_per_epoch();
+        BatchInfo {
+            start_slot,
+            end_slot,
+            failed_processing_attempts: Vec::new(),
+            failed_download_attempts: Vec::new(),
+            state: BatchState::AwaitingDownload,
+        }
+    }
+
+    /// Gives a list of peers from which this batch has had a failed download or processing
+    /// attempt.
+    pub fn failed_peers(&self) -> HashSet<PeerId> {
+        let mut peers = HashSet::with_capacity(
+            self.failed_processing_attempts.len() + self.failed_download_attempts.len(),
+        );
+
+        for attempt in &self.failed_processing_attempts {
+            peers.insert(attempt.peer_id.clone());
+        }
+
+        for download in &self.failed_download_attempts {
+            peers.insert(download.clone());
+        }
+
+        peers
+    }
+
+    pub fn current_peer(&self) -> Option<&PeerId> {
+        match &self.state {
+            BatchState::AwaitingDownload | BatchState::Failed => None,
+            BatchState::Downloading(peer_id, _)
+            | BatchState::AwaitingProcessing(peer_id, _)
+            | BatchState::Processing(Attempt { peer_id, .. })
+            | BatchState::AwaitingValidation(Attempt { peer_id, .. }) => Some(&peer_id),
+            BatchState::Poisoned => unreachable!("Poisoned batch"),
+        }
+    }
+
+    pub fn to_blocks_by_range_request(&self) -> BlocksByRangeRequest {
+        BlocksByRangeRequest {
+            start_slot: self.start_slot.into(),
+            count: self.end_slot.sub(self.start_slot).into(),
+            step: 1,
+        }
+    }
+
+    pub fn state(&self) -> &BatchState<T> {
+        &self.state
+    }
+
+    pub fn attempts(&self) -> &[Attempt] {
+        &self.failed_processing_attempts
+    }
+
+    /// Adds a block to a downloading batch.
+    pub fn add_block(&mut self, block: SignedBeaconBlock<T>) {
+        match self.state.poison() {
+            BatchState::Downloading(peer, mut blocks) => {
+                blocks.push(block);
+                self.state = BatchState::Downloading(peer, blocks)
+            }
+            other => unreachable!("Add block for batch in wrong state: {:?}", other),
+        }
+    }
+
+    /// Marks the batch as ready to be processed if the blocks are in the range. The number of
+    /// received blocks is returned, or the wrong batch end on failure
+    #[must_use = "Batch may have failed"]
+    pub fn download_completed(
+        &mut self,
+    ) -> Result<
+        usize, /* Received blocks */
+        (
+            Slot, /* expected slot */
+            Slot, /* received slot */
+            &BatchState<T>,
+        ),
+    > {
+        match self.state.poison() {
+            BatchState::Downloading(peer, blocks) => {
+                // verify that blocks are in range
+                if let Some(last_slot) = blocks.last().map(|b| b.slot()) {
+                    // the batch is non-empty
+                    let first_slot = blocks[0].slot();
+
+                    let failed_range = if first_slot < self.start_slot {
+                        Some((self.start_slot, first_slot))
+                    } else if self.end_slot < last_slot {
+                        Some((self.end_slot, last_slot))
+                    } else {
+                        None
+                    };
+
+                    if let Some(range) = failed_range {
+                        // this is a failed download, register the attempt and check if the batch
+                        // can be tried again
+                        self.failed_download_attempts.push(peer);
+                        self.state = if self.failed_download_attempts.len()
+                            >= MAX_BATCH_DOWNLOAD_ATTEMPTS as usize
+                        {
+                            BatchState::Failed
+                        } else {
+                            // drop the blocks
+                            BatchState::AwaitingDownload
+                        };
+                        return Err((range.0, range.1, &self.state));
+                    }
+                }
+
+                let received = blocks.len();
+                self.state = BatchState::AwaitingProcessing(peer, blocks);
+                Ok(received)
+            }
+            other => unreachable!("Download completed for batch in wrong state: {:?}", other),
+        }
+    }
+
+    #[must_use = "Batch may have failed"]
+    pub fn download_failed(&mut self) -> &BatchState<T> {
+        match self.state.poison() {
+            BatchState::Downloading(peer, _) => {
+                // register the attempt and check if the batch can be tried again
+                self.failed_download_attempts.push(peer);
+                self.state = if self.failed_download_attempts.len()
+                    >= MAX_BATCH_DOWNLOAD_ATTEMPTS as usize
+                {
+                    BatchState::Failed
+                } else {
+                    // drop the blocks
+                    BatchState::AwaitingDownload
+                };
+                &self.state
+            }
+            other => unreachable!("Download failed for batch in wrong state: {:?}", other),
+        }
+    }
+
+    pub fn start_downloading_from_peer(&mut self, peer: PeerId) {
+        match self.state.poison() {
+            BatchState::AwaitingDownload => {
+                self.state = BatchState::Downloading(peer, Vec::new());
+            }
+            other => unreachable!("Starting download for batch in wrong state: {:?}", other),
+        }
+    }
+
+    pub fn start_processing(&mut self) -> Vec<SignedBeaconBlock<T>> {
+        match self.state.poison() {
+            BatchState::AwaitingProcessing(peer, blocks) => {
+                self.state = BatchState::Processing(Attempt::new(peer, &blocks));
+                blocks
+            }
+            other => unreachable!("Start processing for batch in wrong state: {:?}", other),
+        }
+    }
+
+    #[must_use = "Batch may have failed"]
+    pub fn processing_completed(&mut self, was_sucessful: bool) -> &BatchState<T> {
+        match self.state.poison() {
+            BatchState::Processing(attempt) => {
+                self.state = if !was_sucessful {
+                    // register the failed attempt
+                    self.failed_processing_attempts.push(attempt);
+
+                    // check if the batch can be downloaded again
+                    if self.failed_processing_attempts.len()
+                        >= MAX_BATCH_PROCESSING_ATTEMPTS as usize
+                    {
+                        BatchState::Failed
+                    } else {
+                        BatchState::AwaitingDownload
+                    }
+                } else {
+                    BatchState::AwaitingValidation(attempt)
+                };
+                &self.state
+            }
+            other => unreachable!("Processing completed for batch in wrong state: {:?}", other),
+        }
+    }
+
+    #[must_use = "Batch may have failed"]
+    pub fn validation_failed(&mut self) -> &BatchState<T> {
+        match self.state.poison() {
+            BatchState::AwaitingValidation(attempt) => {
+                self.failed_processing_attempts.push(attempt);
+
+                // check if the batch can be downloaded again
+                self.state = if self.failed_processing_attempts.len()
+                    >= MAX_BATCH_PROCESSING_ATTEMPTS as usize
+                {
+                    BatchState::Failed
+                } else {
+                    BatchState::AwaitingDownload
+                };
+                &self.state
+            }
+            other => unreachable!("Validation failed for batch in wrong state: {:?}", other),
+        }
+    }
+}

 /// Represents a peer's attempt and providing the result for this batch.
@@ -43,131 +282,61 @@ pub struct Attempt {
     pub hash: u64,
 }

-impl<T: EthSpec> Eq for Batch<T> {}
-
-impl<T: EthSpec> Batch<T> {
-    pub fn new(start_epoch: Epoch, end_slot: Slot, peer_id: PeerId) -> Self {
-        Batch {
-            start_epoch,
-            end_slot,
-            attempts: Vec::new(),
-            current_peer: peer_id,
-            retries: 0,
-            reprocess_retries: 0,
-            downloaded_blocks: Vec::new(),
-        }
-    }
-
-    pub fn start_slot(&self) -> Slot {
-        // batches are shifted by 1
-        self.start_epoch.start_slot(T::slots_per_epoch()) + 1
-    }
-
-    pub fn end_slot(&self) -> Slot {
-        self.end_slot
-    }
-
-    pub fn to_blocks_by_range_request(&self) -> BlocksByRangeRequest {
-        let start_slot = self.start_slot();
-        BlocksByRangeRequest {
-            start_slot: start_slot.into(),
-            count: min(
-                T::slots_per_epoch() * EPOCHS_PER_BATCH,
-                self.end_slot.sub(start_slot).into(),
-            ),
-            step: 1,
-        }
-    }
-
-    /// This gets a hash that represents the blocks currently downloaded. This allows comparing a
-    /// previously downloaded batch of blocks with a new downloaded batch of blocks.
-    pub fn hash(&self) -> u64 {
-        // the hash used is the ssz-encoded list of blocks
+impl Attempt {
+    #[allow(clippy::ptr_arg)]
+    fn new<T: EthSpec>(peer_id: PeerId, blocks: &Vec<SignedBeaconBlock<T>>) -> Self {
         let mut hasher = std::collections::hash_map::DefaultHasher::new();
-        self.downloaded_blocks.as_ssz_bytes().hash(&mut hasher);
-        hasher.finish()
+        blocks.as_ssz_bytes().hash(&mut hasher);
+        let hash = hasher.finish();
+        Attempt { peer_id, hash }
     }
 }

-impl<T: EthSpec> Ord for Batch<T> {
-    fn cmp(&self, other: &Self) -> Ordering {
-        self.start_epoch.cmp(&other.start_epoch)
+impl<T: EthSpec> slog::KV for &mut BatchInfo<T> {
+    fn serialize(
+        &self,
+        record: &slog::Record,
+        serializer: &mut dyn slog::Serializer,
+    ) -> slog::Result {
+        slog::KV::serialize(*self, record, serializer)
     }
 }

-impl<T: EthSpec> PartialOrd for Batch<T> {
-    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
-        Some(self.cmp(other))
+impl<T: EthSpec> slog::KV for BatchInfo<T> {
+    fn serialize(
+        &self,
+        record: &slog::Record,
+        serializer: &mut dyn slog::Serializer,
+    ) -> slog::Result {
+        use slog::Value;
+        Value::serialize(&self.start_slot, record, "start_slot", serializer)?;
+        Value::serialize(
+            &(self.end_slot - 1), // NOTE: The -1 shows inclusive blocks
+            record,
+            "end_slot",
+            serializer,
+        )?;
+        serializer.emit_usize("downloaded", self.failed_download_attempts.len())?;
+        serializer.emit_usize("processed", self.failed_processing_attempts.len())?;
+        serializer.emit_str("state", &format!("{:?}", self.state))?;
+        slog::Result::Ok(())
     }
 }

-/// A structure that contains a mapping of pending batch requests, that also keeps track of which
-/// peers are currently making batch requests.
-///
-/// This is used to optimise searches for idle peers (peers that have no outbound batch requests).
-pub struct PendingBatches<T: EthSpec> {
-    /// The current pending batches.
-    batches: FnvHashMap<usize, Batch<T>>,
-    /// A mapping of peers to the number of pending requests.
-    peer_requests: HashMap<PeerId, HashSet<usize>>,
-}
-
-impl<T: EthSpec> PendingBatches<T> {
-    pub fn new() -> Self {
-        PendingBatches {
-            batches: FnvHashMap::default(),
-            peer_requests: HashMap::new(),
-        }
-    }
-
-    pub fn insert(&mut self, request_id: usize, batch: Batch<T>) -> Option<Batch<T>> {
-        let peer_request = batch.current_peer.clone();
-        self.peer_requests
-            .entry(peer_request)
-            .or_insert_with(HashSet::new)
-            .insert(request_id);
-        self.batches.insert(request_id, batch)
-    }
-
-    pub fn remove(&mut self, request_id: usize) -> Option<Batch<T>> {
-        if let Some(batch) = self.batches.remove(&request_id) {
-            if let Entry::Occupied(mut entry) = self.peer_requests.entry(batch.current_peer.clone())
-            {
-                entry.get_mut().remove(&request_id);
-                if entry.get().is_empty() {
-                    entry.remove();
-                }
-            }
-            Some(batch)
-        } else {
-            None
-        }
-    }
-
-    /// The number of current pending batch requests.
-    pub fn len(&self) -> usize {
-        self.batches.len()
-    }
-
-    /// Adds a block to the batches if the request id exists. Returns None if there is no batch
-    /// matching the request id.
-    pub fn add_block(&mut self, request_id: usize, block: SignedBeaconBlock<T>) -> Option<()> {
-        let batch = self.batches.get_mut(&request_id)?;
-        batch.downloaded_blocks.push(block);
-        Some(())
-    }
-
-    /// Returns true if there the peer does not exist in the peer_requests mapping. Indicating it
-    /// has no pending outgoing requests.
-    pub fn peer_is_idle(&self, peer_id: &PeerId) -> bool {
-        self.peer_requests.get(peer_id).is_none()
-    }
-
-    /// Removes a batch for a given peer.
-    pub fn remove_batch_by_peer(&mut self, peer_id: &PeerId) -> Option<Batch<T>> {
-        let request_ids = self.peer_requests.get(peer_id)?;
-        let request_id = *request_ids.iter().next()?;
-        self.remove(request_id)
-    }
+impl<T: EthSpec> std::fmt::Debug for BatchState<T> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            BatchState::Processing(_) => f.write_str("Processing"),
+            BatchState::AwaitingValidation(_) => f.write_str("AwaitingValidation"),
+            BatchState::AwaitingDownload => f.write_str("AwaitingDownload"),
+            BatchState::Failed => f.write_str("Failed"),
+            BatchState::AwaitingProcessing(ref peer, ref blocks) => {
+                write!(f, "AwaitingProcessing({}, {} blocks)", peer, blocks.len())
+            }
+            BatchState::Downloading(peer, blocks) => {
+                write!(f, "Downloading({}, {} blocks)", peer, blocks.len())
+            }
+            BatchState::Poisoned => f.write_str("Poisoned"),
+        }
+    }
 }
```
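
A hedged usage sketch of the `BatchInfo` API defined above. The driver code here is hypothetical (the real caller is the `SyncingChain` in the suppressed `chain.rs` diff), and it assumes the imports and types of the `batch.rs` module above are in scope:

```rust
// Illustrative driver for one download cycle of a batch. `E: EthSpec` and the
// peer/block values come from the surrounding sync logic.
fn download_cycle<E: EthSpec>(
    peer: PeerId,
    blocks: Vec<SignedBeaconBlock<E>>,
    start_epoch: Epoch,
) -> Option<Vec<SignedBeaconBlock<E>>> {
    let mut batch = BatchInfo::<E>::new(&start_epoch, 2); // e.g. 2 epochs per batch
    batch.start_downloading_from_peer(peer);
    for block in blocks {
        batch.add_block(block); // only legal while in the Downloading state
    }
    // `download_completed` is #[must_use]: ignoring it would leave a failed
    // range check (and the batch's retry accounting) unhandled.
    let in_range = match batch.download_completed() {
        Ok(_received_blocks) => true,
        // the batch is now AwaitingDownload or Failed
        Err((_expected_slot, _received_slot, _state)) => false,
    };
    // Only an in-range batch can move on to processing.
    if in_range {
        Some(batch.start_processing())
    } else {
        None
    }
}
```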

(File diff suppressed because it is too large.)


@ -1,15 +1,18 @@
//! This provides the logic for the finalized and head chains. //! This provides the logic for the finalized and head chains.
//! //!
//! Each chain type is stored in it's own vector. A variety of helper functions are given along //! Each chain type is stored in it's own map. A variety of helper functions are given along with
//! with this struct to to simplify the logic of the other layers of sync. //! this struct to simplify the logic of the other layers of sync.
use super::chain::{ChainSyncingState, SyncingChain}; use super::chain::{ChainId, ChainSyncingState, ProcessingResult, SyncingChain};
use super::sync_type::RangeSyncType;
use crate::beacon_processor::WorkEvent as BeaconWorkEvent; use crate::beacon_processor::WorkEvent as BeaconWorkEvent;
use crate::sync::network_context::SyncNetworkContext; use crate::sync::network_context::SyncNetworkContext;
use crate::sync::PeerSyncInfo; use crate::sync::PeerSyncInfo;
use beacon_chain::{BeaconChain, BeaconChainTypes}; use beacon_chain::{BeaconChain, BeaconChainTypes};
use eth2_libp2p::{types::SyncState, NetworkGlobals, PeerId}; use eth2_libp2p::{types::SyncState, NetworkGlobals, PeerId};
use slog::{debug, error, info, o}; use fnv::FnvHashMap;
use slog::{crit, debug, error, info, trace};
use std::collections::hash_map::Entry;
use std::sync::Arc; use std::sync::Arc;
use tokio::sync::mpsc; use tokio::sync::mpsc;
use types::EthSpec; use types::EthSpec;
@ -83,9 +86,9 @@ pub struct ChainCollection<T: BeaconChainTypes> {
/// A reference to the global network parameters. /// A reference to the global network parameters.
network_globals: Arc<NetworkGlobals<T::EthSpec>>, network_globals: Arc<NetworkGlobals<T::EthSpec>>,
/// The set of finalized chains being synced. /// The set of finalized chains being synced.
finalized_chains: Vec<SyncingChain<T>>, finalized_chains: FnvHashMap<ChainId, SyncingChain<T>>,
/// The set of head chains being synced. /// The set of head chains being synced.
head_chains: Vec<SyncingChain<T>>, head_chains: FnvHashMap<ChainId, SyncingChain<T>>,
/// The current sync state of the process. /// The current sync state of the process.
state: RangeSyncState, state: RangeSyncState,
/// Logger for the collection. /// Logger for the collection.
@ -101,8 +104,8 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
ChainCollection { ChainCollection {
beacon_chain, beacon_chain,
network_globals, network_globals,
finalized_chains: Vec::new(), finalized_chains: FnvHashMap::default(),
head_chains: Vec::new(), head_chains: FnvHashMap::default(),
state: RangeSyncState::Idle, state: RangeSyncState::Idle,
log, log,
} }
@ -129,7 +132,7 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
.unwrap_or_else(|| SyncState::Stalled); .unwrap_or_else(|| SyncState::Stalled);
let mut peer_state = self.network_globals.sync_state.write(); let mut peer_state = self.network_globals.sync_state.write();
if new_state != *peer_state { if new_state != *peer_state {
info!(self.log, "Sync state updated"; "old_state" => format!("{}",peer_state), "new_state" => format!("{}",new_state)); info!(self.log, "Sync state updated"; "old_state" => %peer_state, "new_state" => %new_state);
if new_state == SyncState::Synced { if new_state == SyncState::Synced {
network.subscribe_core_topics(); network.subscribe_core_topics();
} }
@ -141,7 +144,7 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
let new_state: SyncState = self.state.clone().into(); let new_state: SyncState = self.state.clone().into();
if *node_sync_state != new_state { if *node_sync_state != new_state {
// we are updating the state, inform the user // we are updating the state, inform the user
info!(self.log, "Sync state updated"; "old_state" => format!("{}",node_sync_state), "new_state" => format!("{}",new_state)); info!(self.log, "Sync state updated"; "old_state" => %node_sync_state, "new_state" => %new_state);
} }
*node_sync_state = new_state; *node_sync_state = new_state;
} }
@ -182,30 +185,67 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
} }
} }
/// Finds any finalized chain if it exists. /// Calls `func` on every chain of the collection. If the result is
pub fn get_finalized_mut( /// `ProcessingResult::RemoveChain`, the chain is removed and returned.
&mut self, pub fn call_all<F>(&mut self, mut func: F) -> Vec<(SyncingChain<T>, RangeSyncType)>
target_head_root: Hash256, where
target_head_slot: Slot, F: FnMut(&mut SyncingChain<T>) -> ProcessingResult,
) -> Option<&mut SyncingChain<T>> { {
ChainCollection::get_chain( let mut to_remove = Vec::new();
self.finalized_chains.as_mut(),
target_head_root, for (id, chain) in self.finalized_chains.iter_mut() {
target_head_slot, if let ProcessingResult::RemoveChain = func(chain) {
) to_remove.push((*id, RangeSyncType::Finalized));
}
}
for (id, chain) in self.head_chains.iter_mut() {
if let ProcessingResult::RemoveChain = func(chain) {
to_remove.push((*id, RangeSyncType::Head));
}
}
let mut results = Vec::with_capacity(to_remove.len());
for (id, sync_type) in to_remove.into_iter() {
let chain = match sync_type {
RangeSyncType::Finalized => self.finalized_chains.remove(&id),
RangeSyncType::Head => self.head_chains.remove(&id),
};
results.push((chain.expect("Chain exits"), sync_type));
}
results
} }
/// Finds any finalized chain if it exists. /// Executes a function on the chain with the given id.
pub fn get_head_mut( ///
/// If the function returns `ProcessingResult::RemoveChain`, the chain is removed and returned.
/// If the chain is found, its syncing type is returned, or an error otherwise.
pub fn call_by_id<F>(
&mut self, &mut self,
target_head_root: Hash256, id: ChainId,
target_head_slot: Slot, func: F,
) -> Option<&mut SyncingChain<T>> { ) -> Result<(Option<SyncingChain<T>>, RangeSyncType), ()>
ChainCollection::get_chain( where
self.head_chains.as_mut(), F: FnOnce(&mut SyncingChain<T>) -> ProcessingResult,
target_head_root, {
target_head_slot, if let Entry::Occupied(mut entry) = self.finalized_chains.entry(id) {
) // Search in our finalized chains first
if let ProcessingResult::RemoveChain = func(entry.get_mut()) {
Ok((Some(entry.remove()), RangeSyncType::Finalized))
} else {
Ok((None, RangeSyncType::Finalized))
}
} else if let Entry::Occupied(mut entry) = self.head_chains.entry(id) {
// Search in our head chains next
if let ProcessingResult::RemoveChain = func(entry.get_mut()) {
Ok((Some(entry.remove()), RangeSyncType::Head))
} else {
Ok((None, RangeSyncType::Head))
}
} else {
// Chain was not found in the finalized collection, nor the head collection
Err(())
}
} }
/// Updates the state of the chain collection. /// Updates the state of the chain collection.
@ -214,9 +254,8 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
/// updates the state of the collection. This starts head chains syncing if any are required to /// updates the state of the collection. This starts head chains syncing if any are required to
/// do so. /// do so.
pub fn update(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) { pub fn update(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) {
let local_epoch = { let (local_finalized_epoch, local_head_epoch) =
let local = match PeerSyncInfo::from_chain(&self.beacon_chain) { match PeerSyncInfo::from_chain(&self.beacon_chain) {
Some(local) => local,
None => { None => {
return error!( return error!(
self.log, self.log,
@ -224,20 +263,21 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
"msg" => "likely due to head lock contention" "msg" => "likely due to head lock contention"
) )
} }
Some(local) => (
local.finalized_epoch,
local.head_slot.epoch(T::EthSpec::slots_per_epoch()),
),
}; };
local.finalized_epoch
};
// Remove any outdated finalized/head chains // Remove any outdated finalized/head chains
self.purge_outdated_chains(network); self.purge_outdated_chains(network);
// Choose the best finalized chain if one needs to be selected. // Choose the best finalized chain if one needs to be selected.
self.update_finalized_chains(network, local_epoch); self.update_finalized_chains(network, local_finalized_epoch, local_head_epoch);
if self.finalized_syncing_index().is_none() { if self.finalized_syncing_chain().is_none() {
// Handle head syncing chains if there are no finalized chains left. // Handle head syncing chains if there are no finalized chains left.
self.update_head_chains(network, local_epoch); self.update_head_chains(network, local_finalized_epoch, local_head_epoch);
} }
} }
@ -247,53 +287,57 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
&mut self, &mut self,
network: &mut SyncNetworkContext<T::EthSpec>, network: &mut SyncNetworkContext<T::EthSpec>,
local_epoch: Epoch, local_epoch: Epoch,
local_head_epoch: Epoch,
) { ) {
// Check if any chains become the new syncing chain // Find the chain with most peers and check if it is already syncing
if let Some(index) = self.finalized_syncing_index() { if let Some((new_id, peers)) = self
// There is a current finalized chain syncing
let _syncing_chain_peer_count = self.finalized_chains[index].peer_pool.len();
// search for a chain with more peers
if let Some((new_index, chain)) =
self.finalized_chains
.iter_mut()
.enumerate()
.find(|(_iter_index, _chain)| {
false
// && *iter_index != index
// && chain.peer_pool.len() > syncing_chain_peer_count
})
{
// A chain has more peers. Swap the syncing chain
debug!(self.log, "Switching finalized chains to sync"; "new_target_root" => format!("{}", chain.target_head_root), "new_end_slot" => chain.target_head_slot, "new_start_epoch"=> local_epoch);
// update the state to a new finalized state
let state = RangeSyncState::Finalized {
start_slot: chain.start_epoch.start_slot(T::EthSpec::slots_per_epoch()),
head_slot: chain.target_head_slot,
head_root: chain.target_head_root,
};
self.state = state;
// Stop the current chain from syncing
self.finalized_chains[index].stop_syncing();
// Start the new chain
self.finalized_chains[new_index].start_syncing(network, local_epoch);
}
} else if let Some(chain) = self
.finalized_chains .finalized_chains
.iter_mut() .iter()
.max_by_key(|chain| chain.peer_pool.len()) .max_by_key(|(_, chain)| chain.available_peers())
.map(|(id, chain)| (*id, chain.available_peers()))
{ {
// There is no currently syncing finalization chain, starting the one with the most peers let old_id = self.finalized_syncing_chain().map(
debug!(self.log, "New finalized chain started syncing"; "new_target_root" => format!("{}", chain.target_head_root), "new_end_slot" => chain.target_head_slot, "new_start_epoch"=> chain.start_epoch); |(currently_syncing_id, currently_syncing_chain)| {
chain.start_syncing(network, local_epoch); if *currently_syncing_id != new_id
&& peers > currently_syncing_chain.available_peers()
{
currently_syncing_chain.stop_syncing();
// we stop this chain and start syncing the one with more peers
Some(*currently_syncing_id)
} else {
// the best chain is already the syncing chain, advance it if possible
None
}
},
);
let chain = self
.finalized_chains
.get_mut(&new_id)
.expect("Chain exists");
match old_id {
Some(Some(old_id)) => debug!(self.log, "Switching finalized chains";
"old_id" => old_id, &chain),
None => debug!(self.log, "Syncing new chain"; &chain),
Some(None) => trace!(self.log, "Advancing currently syncing chain"),
// this is the same chain. We try to advance it.
}
// update the state to a new finalized state
let state = RangeSyncState::Finalized { let state = RangeSyncState::Finalized {
start_slot: chain.start_epoch.start_slot(T::EthSpec::slots_per_epoch()), start_slot: chain.start_epoch.start_slot(T::EthSpec::slots_per_epoch()),
head_slot: chain.target_head_slot, head_slot: chain.target_head_slot,
head_root: chain.target_head_root, head_root: chain.target_head_root,
}; };
self.state = state; self.state = state;
if let ProcessingResult::RemoveChain =
chain.start_syncing(network, local_epoch, local_head_epoch)
{
// this happens only if sending a batch over the `network` fails a lot
error!(self.log, "Chain removed while switching chains");
self.finalized_chains.remove(&new_id);
}
} }
} }
@ -302,6 +346,7 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
&mut self, &mut self,
network: &mut SyncNetworkContext<T::EthSpec>, network: &mut SyncNetworkContext<T::EthSpec>,
local_epoch: Epoch, local_epoch: Epoch,
local_head_epoch: Epoch,
) { ) {
// There are no finalized chains, update the state. // There are no finalized chains, update the state.
if self.head_chains.is_empty() { if self.head_chains.is_empty() {
@ -311,42 +356,41 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
let mut currently_syncing = self let mut currently_syncing = self
.head_chains .head_chains
.iter() .values()
.filter(|chain| chain.is_syncing()) .filter(|chain| chain.is_syncing())
.count(); .count();
let mut not_syncing = self.head_chains.len() - currently_syncing; let mut not_syncing = self.head_chains.len() - currently_syncing;
// Find all head chains that are not currently syncing ordered by peer count. // Find all head chains that are not currently syncing ordered by peer count.
while currently_syncing <= PARALLEL_HEAD_CHAINS && not_syncing > 0 { while currently_syncing <= PARALLEL_HEAD_CHAINS && not_syncing > 0 {
// Find the chain with the most peers and start syncing // Find the chain with the most peers and start syncing
if let Some((_index, chain)) = self if let Some((_id, chain)) = self
.head_chains .head_chains
.iter_mut() .iter_mut()
.filter(|chain| !chain.is_syncing()) .filter(|(_id, chain)| !chain.is_syncing())
.enumerate() .max_by_key(|(_id, chain)| chain.available_peers())
.max_by_key(|(_index, chain)| chain.peer_pool.len())
{ {
// start syncing this chain // start syncing this chain
debug!(self.log, "New head chain started syncing"; "new_target_root" => format!("{}", chain.target_head_root), "new_end_slot" => chain.target_head_slot, "new_start_epoch"=> chain.start_epoch); debug!(self.log, "New head chain started syncing"; &chain);
chain.start_syncing(network, local_epoch); if let ProcessingResult::RemoveChain =
chain.start_syncing(network, local_epoch, local_head_epoch)
{
error!(self.log, "Chain removed while switching head chains")
}
} }
// update variables // update variables
currently_syncing = self currently_syncing = self
.head_chains .head_chains
.iter() .iter()
.filter(|chain| chain.is_syncing()) .filter(|(_id, chain)| chain.is_syncing())
.count(); .count();
not_syncing = self.head_chains.len() - currently_syncing; not_syncing = self.head_chains.len() - currently_syncing;
} }
// Start // Start
// for the syncing API, we find the minimal start_slot and the maximum // for the syncing API, we find the minimal start_slot and the maximum
// target_slot of all head chains to report back. // target_slot of all head chains to report back.
let (min_epoch, max_slot) = self let (min_epoch, max_slot) = self
.head_chains .head_chains
.iter() .values()
.filter(|chain| chain.is_syncing()) .filter(|chain| chain.is_syncing())
.fold( .fold(
(Epoch::from(0u64), Slot::from(0u64)), (Epoch::from(0u64), Slot::from(0u64)),
@ -368,10 +412,9 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
/// chains and re-status their peers. /// chains and re-status their peers.
pub fn clear_head_chains(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) { pub fn clear_head_chains(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) {
let log_ref = &self.log; let log_ref = &self.log;
self.head_chains.retain(|chain| { self.head_chains.retain(|_id, chain| {
if !chain.is_syncing() if !chain.is_syncing() {
{ debug!(log_ref, "Removing old head chain"; &chain);
debug!(log_ref, "Removing old head chain"; "start_epoch" => chain.start_epoch, "end_slot" => chain.target_head_slot);
chain.status_peers(network); chain.status_peers(network);
false false
} else { } else {
@ -380,140 +423,20 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
}); });
} }
/// Add a new finalized chain to the collection.
pub fn new_finalized_chain(
&mut self,
local_finalized_epoch: Epoch,
target_head: Hash256,
target_slot: Slot,
peer_id: PeerId,
beacon_processor_send: mpsc::Sender<BeaconWorkEvent<T::EthSpec>>,
) {
let chain_id = rand::random();
self.finalized_chains.push(SyncingChain::new(
chain_id,
local_finalized_epoch,
target_slot,
target_head,
peer_id,
beacon_processor_send,
self.beacon_chain.clone(),
self.log.new(o!("chain" => chain_id)),
));
}
/// Add a new finalized chain to the collection and starts syncing it.
#[allow(clippy::too_many_arguments)]
pub fn new_head_chain(
&mut self,
remote_finalized_epoch: Epoch,
target_head: Hash256,
target_slot: Slot,
peer_id: PeerId,
beacon_processor_send: mpsc::Sender<BeaconWorkEvent<T::EthSpec>>,
) {
// remove the peer from any other head chains
self.head_chains.iter_mut().for_each(|chain| {
chain.peer_pool.remove(&peer_id);
});
self.head_chains.retain(|chain| !chain.peer_pool.is_empty());
let chain_id = rand::random();
let new_head_chain = SyncingChain::new(
chain_id,
remote_finalized_epoch,
target_slot,
target_head,
peer_id,
beacon_processor_send,
self.beacon_chain.clone(),
self.log.clone(),
);
self.head_chains.push(new_head_chain);
}
/// Returns if `true` if any finalized chains exist, `false` otherwise. /// Returns if `true` if any finalized chains exist, `false` otherwise.
pub fn is_finalizing_sync(&self) -> bool { pub fn is_finalizing_sync(&self) -> bool {
!self.finalized_chains.is_empty() !self.finalized_chains.is_empty()
} }
/// Given a chain iterator, runs a given function on each chain until the function returns
/// `Some`. This allows the `RangeSync` struct to loop over chains and optionally remove the
/// chain from the collection if the function results in completing the chain.
fn request_function<'a, F, I, U>(chain: I, mut func: F) -> Option<(usize, U)>
where
I: Iterator<Item = &'a mut SyncingChain<T>>,
F: FnMut(&'a mut SyncingChain<T>) -> Option<U>,
{
chain
.enumerate()
.find_map(|(index, chain)| Some((index, func(chain)?)))
}
/// Given a chain iterator, runs a given function on each chain and return all `Some` results.
fn request_function_all<'a, F, I, U>(chain: I, mut func: F) -> Vec<(usize, U)>
where
I: Iterator<Item = &'a mut SyncingChain<T>>,
F: FnMut(&'a mut SyncingChain<T>) -> Option<U>,
{
chain
.enumerate()
.filter_map(|(index, chain)| Some((index, func(chain)?)))
.collect()
}
/// Runs a function on finalized chains until we get the first `Some` result from `F`.
pub fn finalized_request<F, U>(&mut self, func: F) -> Option<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function(self.finalized_chains.iter_mut(), func)
}
/// Runs a function on head chains until we get the first `Some` result from `F`.
pub fn head_request<F, U>(&mut self, func: F) -> Option<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function(self.head_chains.iter_mut(), func)
}
/// Runs a function on finalized and head chains until we get the first `Some` result from `F`.
pub fn head_finalized_request<F, U>(&mut self, func: F) -> Option<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function(
self.finalized_chains
.iter_mut()
.chain(self.head_chains.iter_mut()),
func,
)
}
/// Runs a function on all finalized and head chains and collects all `Some` results from `F`.
pub fn head_finalized_request_all<F, U>(&mut self, func: F) -> Vec<(usize, U)>
where
F: FnMut(&mut SyncingChain<T>) -> Option<U>,
{
ChainCollection::request_function_all(
self.finalized_chains
.iter_mut()
.chain(self.head_chains.iter_mut()),
func,
)
}
/// Removes any outdated finalized or head chains. /// Removes any outdated finalized or head chains.
///
/// This removes chains with no peers, or chains whose start block slot is less than our current /// This removes chains with no peers, or chains whose start block slot is less than our current
/// finalized block slot. /// finalized block slot.
pub fn purge_outdated_chains(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) { pub fn purge_outdated_chains(&mut self, network: &mut SyncNetworkContext<T::EthSpec>) {
// Remove any chains that have no peers // Remove any chains that have no peers
self.finalized_chains self.finalized_chains
.retain(|chain| !chain.peer_pool.is_empty()); .retain(|_id, chain| chain.available_peers() > 0);
self.head_chains.retain(|chain| !chain.peer_pool.is_empty()); self.head_chains
.retain(|_id, chain| chain.available_peers() > 0);
let local_info = match PeerSyncInfo::from_chain(&self.beacon_chain) { let local_info = match PeerSyncInfo::from_chain(&self.beacon_chain) {
Some(local) => local, Some(local) => local,
@@ -533,28 +456,28 @@ impl<T: BeaconChainTypes> ChainCollection<T> {
    let beacon_chain = &self.beacon_chain;
    let log_ref = &self.log;

    // Remove chains that are out-dated and re-status their peers
    self.finalized_chains.retain(|_id, chain| {
        if chain.target_head_slot <= local_finalized_slot
            || beacon_chain
                .fork_choice
                .read()
                .contains_block(&chain.target_head_root)
        {
            debug!(log_ref, "Purging out of finalized chain"; &chain);
            chain.status_peers(network);
            false
        } else {
            true
        }
    });
    self.head_chains.retain(|_id, chain| {
        if chain.target_head_slot <= local_finalized_slot
            || beacon_chain
                .fork_choice
                .read()
                .contains_block(&chain.target_head_root)
        {
            debug!(log_ref, "Purging out of date head chain"; &chain);
            chain.status_peers(network);
            false
        } else {
            true
        }
    });
}
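The two-argument `retain` closures above are the map form: with chains now keyed by id, `retain` hands the closure `(&key, &mut value)`. A minimal sketch, with a toy chain type and `HashMap` standing in for whatever map backs the collection:

```rust
use std::collections::HashMap;

struct Chain {
    peers: Vec<u64>,
}

impl Chain {
    fn available_peers(&self) -> usize {
        self.peers.len()
    }
}

fn main() {
    let mut chains: HashMap<u64, Chain> = HashMap::new();
    chains.insert(1, Chain { peers: vec![7, 8] });
    chains.insert(2, Chain { peers: vec![] });
    // Drop chains with no peers, mirroring `purge_outdated_chains`.
    chains.retain(|_id, chain| chain.available_peers() > 0);
    assert_eq!(chains.len(), 1);
    assert!(chains.contains_key(&1));
}
```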
/// Adds a peer to a chain with the given target, or creates a new syncing chain if it doesn't
/// exist.
#[allow(clippy::too_many_arguments)]
pub fn add_peer_or_create_chain(
    &mut self,
    start_epoch: Epoch,
    target_head_root: Hash256,
    target_head_slot: Slot,
    peer: PeerId,
    sync_type: RangeSyncType,
    beacon_processor_send: &mpsc::Sender<BeaconWorkEvent<T::EthSpec>>,
    network: &mut SyncNetworkContext<T::EthSpec>,
) {
    let id = SyncingChain::<T>::id(&target_head_root, &target_head_slot);
    let collection = if let RangeSyncType::Finalized = sync_type {
        if let Some(chain) = self.head_chains.get(&id) {
            // sanity verification for chain duplication / purging issues
            crit!(self.log, "Adding known head chain as finalized chain"; chain);
        }
        &mut self.finalized_chains
    } else {
        if let Some(chain) = self.finalized_chains.get(&id) {
            // sanity verification for chain duplication / purging issues
            crit!(self.log, "Adding known finalized chain as head chain"; chain);
        }
        &mut self.head_chains
    };
    match collection.entry(id) {
        Entry::Occupied(mut entry) => {
            let chain = entry.get_mut();
            debug!(self.log, "Adding peer to known chain"; "peer_id" => %peer, "sync_type" => ?sync_type, &chain);
            assert_eq!(chain.target_head_root, target_head_root);
            assert_eq!(chain.target_head_slot, target_head_slot);
            if let ProcessingResult::RemoveChain = chain.add_peer(network, peer) {
                debug!(self.log, "Chain removed after adding peer"; "chain" => id);
                entry.remove();
            }
        }
        Entry::Vacant(entry) => {
            let peer_rpr = peer.to_string();
            let new_chain = SyncingChain::new(
                start_epoch,
                target_head_slot,
                target_head_root,
                peer,
                beacon_processor_send.clone(),
                self.beacon_chain.clone(),
                &self.log,
            );
            assert_eq!(new_chain.get_id(), id);
            debug!(self.log, "New chain added to sync"; "peer_id" => peer_rpr, "sync_type" => ?sync_type, &new_chain);
            entry.insert(new_chain);
        }
    }
}
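The occupied/vacant split above is the standard map entry API: a single lookup decides between "add the peer to a known chain" and "insert a brand-new chain". A toy version of that flow (the types and names here are illustrative):

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

struct Chain {
    peers: Vec<String>,
}

fn add_peer_or_create(chains: &mut HashMap<u64, Chain>, id: u64, peer: String) {
    match chains.entry(id) {
        Entry::Occupied(mut entry) => {
            // Known chain: register the peer. If adding it signalled removal
            // (the `ProcessingResult::RemoveChain` case above), `entry.remove()`
            // would drop the chain without a second lookup.
            entry.get_mut().peers.push(peer);
        }
        Entry::Vacant(entry) => {
            // Unknown chain: build it from the identifying fields and insert.
            entry.insert(Chain { peers: vec![peer] });
        }
    }
}

fn main() {
    let mut chains = HashMap::new();
    add_peer_or_create(&mut chains, 42, "peer_a".into());
    add_peer_or_create(&mut chains, 42, "peer_b".into());
    assert_eq!(chains[&42].peers.len(), 2);
}
```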
/// Returns the finalized chain that is currently syncing. Returns `None` if no
/// finalized chain is currently syncing.
fn finalized_syncing_chain(&mut self) -> Option<(&ChainId, &mut SyncingChain<T>)> {
    self.finalized_chains.iter_mut().find_map(|(id, chain)| {
        if chain.state == ChainSyncingState::Syncing {
            Some((id, chain))
        } else {
            None
        }
    })
}
}
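`SyncingChain::id`, used above to key both collections, derives the id from the chain's identifying fields, so a given `(target_root, target_slot)` target can never yield two distinct chains and the asserts in `add_peer_or_create_chain` hold by construction. A plausible sketch; the hasher choice is an assumption:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type ChainId = u64;

/// Derive a stable id from the fields that identify a chain.
fn chain_id(target_root: &[u8; 32], target_slot: u64) -> ChainId {
    let mut hasher = DefaultHasher::new();
    target_root.hash(&mut hasher);
    target_slot.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let root = [0xab; 32];
    // Deterministic: the same target always maps to the same chain id.
    assert_eq!(chain_id(&root, 64), chain_id(&root, 64));
    assert_ne!(chain_id(&root, 64), chain_id(&root, 96));
}
```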

@@ -7,6 +7,6 @@ mod chain_collection;
mod range;
mod sync_type;
pub use batch::BatchInfo;
pub use chain::{BatchId, ChainId, EPOCHS_PER_BATCH};
pub use range::RangeSync;

@@ -39,7 +39,7 @@
//! Each chain is downloaded in batches of blocks. The batched blocks are processed sequentially
//! and further batches are requested as current blocks are being processed.
use super::chain::ChainId;
use super::chain_collection::{ChainCollection, RangeSyncState};
use super::sync_type::RangeSyncType;
use crate::beacon_processor::WorkEvent as BeaconWorkEvent;
@@ -49,7 +49,7 @@ use crate::sync::PeerSyncInfo;
use crate::sync::RequestId;
use beacon_chain::{BeaconChain, BeaconChainTypes};
use eth2_libp2p::{NetworkGlobals, PeerId};
use slog::{debug, error, trace, warn};
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::mpsc;
@@ -121,21 +121,15 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    let local_info = match PeerSyncInfo::from_chain(&self.beacon_chain) {
        Some(local) => local,
        None => {
            return error!(self.log, "Failed to get peer sync info";
                "msg" => "likely due to head lock contention")
        }
    };

    // convenience variable
    let remote_finalized_slot = remote_info
        .finalized_epoch
        .start_slot(T::EthSpec::slots_per_epoch());

    // NOTE: A peer that has been re-status'd may now exist in multiple finalized chains.
@@ -146,7 +140,7 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    match RangeSyncType::new(&self.beacon_chain, &local_info, &remote_info) {
        RangeSyncType::Finalized => {
            // Finalized chain search
            debug!(self.log, "Finalization sync peer joined"; "peer_id" => %peer_id);

            // remove the peer from the awaiting_head_peers list if it exists
            self.awaiting_head_peers.remove(&peer_id);
@@ -154,37 +148,19 @@ impl<T: BeaconChainTypes> RangeSync<T> {
            // Note: We keep current head chains. These can continue syncing whilst we complete
            // this new finalized chain.
            self.chains.add_peer_or_create_chain(
                local_info.finalized_epoch,
                remote_info.finalized_root,
                remote_finalized_slot,
                peer_id,
                RangeSyncType::Finalized,
                &self.beacon_processor_send,
                network,
            );
            self.chains.update(network);
            // update the global sync state
            self.chains.update_sync_state(network);
        }
        RangeSyncType::Head => {
            // This peer requires a head chain sync
@@ -192,7 +168,7 @@ impl<T: BeaconChainTypes> RangeSync<T> {
            if self.chains.is_finalizing_sync() {
                // If there are finalized chains to sync, finish these first, before syncing head
                // chains. This allows us to re-sync all known peers
                trace!(self.log, "Waiting for finalized sync to complete"; "peer_id" => %peer_id);
                // store the peer to re-status after all finalized chains complete
                self.awaiting_head_peers.insert(peer_id);
                return;
@@ -203,31 +179,18 @@ impl<T: BeaconChainTypes> RangeSync<T> {
            // The new peer has the same finalized (earlier filters should prevent a peer with an
            // earlier finalized chain from reaching here).
            let start_epoch = std::cmp::min(local_info.head_slot, remote_finalized_slot)
                .epoch(T::EthSpec::slots_per_epoch());
            self.chains.add_peer_or_create_chain(
                start_epoch,
                remote_info.head_root,
                remote_info.head_slot,
                peer_id,
                RangeSyncType::Head,
                &self.beacon_processor_send,
                network,
            );
            self.chains.update(network);
            self.chains.update_sync_state(network);
        }
@@ -245,23 +208,27 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    request_id: RequestId,
    beacon_block: Option<SignedBeaconBlock<T::EthSpec>>,
) {
    // get the chain and batch for which this response belongs
    if let Some((chain_id, batch_id)) =
        network.blocks_by_range_response(request_id, beacon_block.is_none())
    {
        // check if this chunk removes the chain
        match self.chains.call_by_id(chain_id, |chain| {
            chain.on_block_response(network, batch_id, peer_id, beacon_block)
        }) {
            Ok((removed_chain, sync_type)) => {
                if let Some(removed_chain) = removed_chain {
                    debug!(self.log, "Chain removed after block response"; "sync_type" => ?sync_type, "chain_id" => chain_id);
                    removed_chain.status_peers(network);
                    // TODO: update & update_sync_state?
                }
            }
            Err(_) => {
                debug!(self.log, "BlocksByRange response for removed chain"; "chain" => chain_id)
            }
        }
    } else {
        warn!(self.log, "Response/Error for non registered request"; "request_id" => request_id)
    }
}
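For context, `blocks_by_range_response` resolves a request id back to the `(chain, batch)` that issued it, which is what lets block responses skip the O(n) search over chains that the removed code performed. A minimal sketch of such a registry; the struct and method internals are illustrative, not the actual `SyncNetworkContext` implementation:

```rust
use std::collections::HashMap;

type RequestId = usize;
type ChainId = u64;
type BatchId = u64; // the epoch the batch starts at

#[derive(Default)]
struct RequestRegistry {
    range_requests: HashMap<RequestId, (ChainId, BatchId)>,
}

impl RequestRegistry {
    /// Called when a batch request is sent out.
    fn register(&mut self, request_id: RequestId, chain: ChainId, batch: BatchId) {
        self.range_requests.insert(request_id, (chain, batch));
    }

    /// Called on each response chunk. Only the final chunk of the stream
    /// (`remove == true`) clears the entry; earlier chunks just look it up.
    fn blocks_by_range_response(
        &mut self,
        request_id: RequestId,
        remove: bool,
    ) -> Option<(ChainId, BatchId)> {
        if remove {
            self.range_requests.remove(&request_id)
        } else {
            self.range_requests.get(&request_id).copied()
        }
    }
}

fn main() {
    let mut reg = RequestRegistry::default();
    reg.register(10, 3, 96);
    // Mid-stream chunk: the entry stays for the remaining blocks.
    assert_eq!(reg.blocks_by_range_response(10, false), Some((3, 96)));
    // Stream termination: the entry is consumed.
    assert_eq!(reg.blocks_by_range_response(10, true), Some((3, 96)));
    assert_eq!(reg.blocks_by_range_response(10, true), None);
}
```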
@@ -269,76 +236,57 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    &mut self,
    network: &mut SyncNetworkContext<T::EthSpec>,
    chain_id: ChainId,
    batch_id: Epoch,
    result: BatchProcessResult,
) {
    // check if this response removes the chain
    match self.chains.call_by_id(chain_id, |chain| {
        chain.on_batch_process_result(network, batch_id, &result)
    }) {
        Ok((None, _sync_type)) => {
            // Chain was found and not removed
        }
        Ok((Some(removed_chain), sync_type)) => {
            debug!(self.log, "Chain removed after processing result"; "chain" => chain_id, "sync_type" => ?sync_type);
            // Chain ended, re-status its peers
            removed_chain.status_peers(network);
            match sync_type {
                RangeSyncType::Finalized => {
                    // update the state of the collection
                    self.chains.update(network);
                    // set the state to a head sync if there are no finalized chains, to inform
                    // the manager that we are awaiting a head chain.
                    self.chains.set_head_sync();
                    // Update the global variables
                    self.chains.update_sync_state(network);
                    // if there are no more finalized chains, re-status all known peers
                    // awaiting a head sync
                    match self.chains.state() {
                        RangeSyncState::Idle | RangeSyncState::Head { .. } => {
                            network.status_peers(
                                self.beacon_chain.clone(),
                                self.awaiting_head_peers.drain(),
                            );
                        }
                        RangeSyncState::Finalized { .. } => {} // Have more finalized chains to complete
                    }
                }
                RangeSyncType::Head => {
                    // Remove non-syncing head chains and re-status the peers. This removes a
                    // build-up of potentially duplicate head chains. Any legitimate head
                    // chains will be re-established
                    self.chains.clear_head_chains(network);
                    // update the state of the collection
                    self.chains.update(network);
                    // update the global state and log any change
                    self.chains.update_sync_state(network);
                }
            }
        }
        Err(_) => {
            debug!(self.log, "BlocksByRange response for removed chain"; "chain" => chain_id)
        }
    }
}
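The `ProcessingResult` driving these branches carries the `must_use` decoration described in the PR summary, so a caller that ignores a `RemoveChain` verdict gets a compiler warning instead of a silently stalled chain. A toy demonstration; the enum body here is a sketch:

```rust
#[must_use = "chains that request removal must be removed by the caller"]
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ProcessingResult {
    KeepChain,
    RemoveChain,
}

fn poll_chain() -> ProcessingResult {
    ProcessingResult::RemoveChain
}

fn main() {
    // Warns: unused `ProcessingResult` that must be used. A forgotten
    // `RemoveChain` can no longer be dropped on the floor unnoticed.
    poll_chain();

    // The intended pattern, as in the handlers above:
    if let ProcessingResult::RemoveChain = poll_chain() {
        // remove the chain and re-status its peers
    }
}
```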
@@ -352,7 +300,7 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    // if the peer is in the awaiting head mapping, remove it
    self.awaiting_head_peers.remove(peer_id);

    // remove the peer from any peer pool, failing its batches
    self.remove_peer(network, peer_id);

    // update the state of the collection
@@ -361,30 +309,17 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    self.chains.update_sync_state(network);
}
/// When a peer gets removed, both the head and finalized chains need to be searched to check
/// which pool the peer is in. The chain may also have a batch or batches awaiting
/// this peer. If so, we mark the batch as failed. The batch may then hit its maximum
/// retries. In this case, we need to remove the chain and re-status all the peers.
fn remove_peer(&mut self, network: &mut SyncNetworkContext<T::EthSpec>, peer_id: &PeerId) {
    for (removed_chain, sync_type) in self
        .chains
        .call_all(|chain| chain.remove_peer(peer_id, network))
    {
        debug!(self.log, "Chain removed after removing peer"; "sync_type" => ?sync_type, "chain" => removed_chain.get_id());
        // TODO: anything else to do?
    }
}
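For context, `call_by_id` and `call_all` (used throughout these handlers) run a closure against one or all chains and hand back any chain that asked for removal, so the caller can re-status its peers. A rough, self-contained sketch; the real signatures also carry the chain's `RangeSyncType`:

```rust
use std::collections::HashMap;

type ChainId = u64;

enum ProcessingResult {
    KeepChain,
    RemoveChain,
}

struct Chain; // stand-in for SyncingChain<T>

#[derive(Default)]
struct ChainCollection {
    chains: HashMap<ChainId, Chain>,
}

impl ChainCollection {
    /// Runs `func` on the chain with this id. `Err(())` means the chain is
    /// unknown (e.g. already purged); `Ok(Some(chain))` hands back the chain
    /// that requested removal.
    fn call_by_id(
        &mut self,
        id: ChainId,
        func: impl FnOnce(&mut Chain) -> ProcessingResult,
    ) -> Result<Option<Chain>, ()> {
        let chain = self.chains.get_mut(&id).ok_or(())?;
        match func(chain) {
            ProcessingResult::KeepChain => Ok(None),
            ProcessingResult::RemoveChain => Ok(self.chains.remove(&id)),
        }
    }

    /// Runs `func` on every chain, draining those that requested removal.
    fn call_all(&mut self, mut func: impl FnMut(&mut Chain) -> ProcessingResult) -> Vec<Chain> {
        let remove: Vec<ChainId> = self
            .chains
            .iter_mut()
            .filter_map(|(id, chain)| match func(chain) {
                ProcessingResult::RemoveChain => Some(*id),
                ProcessingResult::KeepChain => None,
            })
            .collect();
        remove
            .into_iter()
            .filter_map(|id| self.chains.remove(&id))
            .collect()
    }
}

fn main() {
    let mut collection = ChainCollection::default();
    collection.chains.insert(1, Chain);
    collection.chains.insert(2, Chain);
    // Unknown ids surface as `Err`, e.g. a response for a purged chain.
    assert!(collection
        .call_by_id(99, |_| ProcessingResult::KeepChain)
        .is_err());
    // Remove every chain, as `remove_peer` might when batches fail.
    let removed = collection.call_all(|_| ProcessingResult::RemoveChain);
    assert_eq!(removed.len(), 2);
    assert!(collection.chains.is_empty());
}
```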
@@ -398,17 +333,25 @@ impl<T: BeaconChainTypes> RangeSync<T> {
    peer_id: PeerId,
    request_id: RequestId,
) {
    // get the chain and batch for which this response belongs
    if let Some((chain_id, batch_id)) = network.blocks_by_range_response(request_id, true) {
        // check that this request is pending
        match self.chains.call_by_id(chain_id, |chain| {
            chain.inject_error(network, batch_id, peer_id)
        }) {
            Ok((removed_chain, sync_type)) => {
                if let Some(removed_chain) = removed_chain {
                    debug!(self.log, "Chain removed on rpc error"; "sync_type" => ?sync_type, "chain" => removed_chain.get_id());
                    removed_chain.status_peers(network);
                    // TODO: update & update_sync_state?
                }
            }
            Err(_) => {
                debug!(self.log, "BlocksByRange response for removed chain"; "chain" => chain_id)
            }
        }
    } else {
        warn!(self.log, "Response/Error for non registered request"; "request_id" => request_id)
    }
}
}
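A side note on the logging syntax used throughout these handlers: `%value` logs via `Display` and `?value` via `Debug`, which is why `RangeSyncType` gains `#[derive(Debug)]` in the file below. A minimal sketch, assuming only the `slog` crate with a discard drain:

```rust
use slog::{debug, o, Logger};

#[allow(dead_code)]
#[derive(Debug)]
enum SyncType {
    Finalized,
    Head,
}

fn main() {
    // A discard drain keeps the example dependency-free beyond `slog`.
    let log = Logger::root(slog::Discard, o!());
    let peer_id = "peer_a".to_string();
    let sync_type = SyncType::Head;
    // `%` formats with Display, `?` with Debug.
    debug!(log, "Peer joined"; "peer_id" => %peer_id, "sync_type" => ?sync_type);
}
```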

@@ -6,6 +6,7 @@ use beacon_chain::{BeaconChain, BeaconChainTypes};
use std::sync::Arc;

/// The type of Range sync that should be done relative to our current state.
#[derive(Debug)]
pub enum RangeSyncType {
    /// A finalized chain sync should be started with this peer.
    Finalized,