lighthouse

Author	SHA1	Message	Date
realbigsean	8a70d80a2f	Revert "Revert "renames, remove , wrap BlockWrapper enum to make descontruction private"" This reverts commit `1931a442dc`.	2022-12-28 10:31:18 -05:00
realbigsean	1931a442dc	Revert "renames, remove , wrap BlockWrapper enum to make descontruction private" This reverts commit `5b3b34a9d7`.	2022-12-28 10:30:36 -05:00
realbigsean	5b3b34a9d7	renames, remove , wrap BlockWrapper enum to make descontruction private	2022-12-28 10:28:45 -05:00
Divma	bf5005244e	Blob syncing (#24 ) * add a rt is_blob_batch * use the mixed type everywhere * glue * more glue * minor fixes * fix range tests * filling in the gaps * moore filling in the gaps	2022-11-24 07:45:38 -05:00
Divma	f4ffa9e0b4	Handle processing results of non faulty batches (#3439 ) ## Issue Addressed Solves #3390 So after checking some logs @pawanjay176 got, we conclude that this happened because we blacklisted a chain after trying it "too much". Now here, in all occurrences it seems that "too much" means we got too many download failures. This happened very slowly, exactly because the batch is allowed to stay alive for very long times after not counting penalties when the ee is offline. The error here then was not that the batch failed because of offline ee errors, but that we blacklisted a chain because of download errors, which we can't pin on the chain but on the peer. This PR fixes that. ## Proposed Changes Adds a missing piece of logic so that if a chain fails for errors that can't be attributed to an objectively bad behavior from the peer, it is not blacklisted. The issue at hand occurred when new peers arrived claiming a head that had wrongfully blacklisted, even if the original peers participating in the chain were not penalized. Another notable change is that we need to consider a batch invalid if it processed correctly but its next non empty batch fails processing. Now since a batch can fail processing in non empty ways, there is no need to mark as invalid previous batches. Improves some logging as well. ## Additional Info We should do this regardless of pausing sync on ee offline/unsynced state. This is because I think it's almost impossible to ensure a processing result will reach in a predictable order with a synced notification from the ee. Doing this handles what I think are inevitable data races when we actually pause sync This also fixes a return that reports which batch failed and caused us some confusion checking the logs	2022-08-12 00:56:38 +00:00
Divma	cfd26d25e0	do not count sync batch attempts when peer is not at fault (#3245 ) ## Issue Addressed currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 ## Proposed Changes If a batch fails but the peer is not at fault, do not count the attempt Also removes some annoying logs ## Additional Info We still get a counter on ignored attempts.. just in case	2022-06-07 02:35:56 +00:00
Divma	31386277c3	Sync wrong dbg assertion (#2821 ) ## Issue Addressed Running a beacon node I triggered a sync debug panic. And so finally the time to create tests for sync arrived. Fortunately, te bug was not in the sync algorithm itself but a wrong assertion ## Proposed Changes - Split Range's impl from the BeaconChain via a trait. This is needed for testing. The TestingRig/Harness is way bigger than needed and does not provide the modification functionalities that are needed to test sync. I find this simpler, tho some could disagree. - Add a regression test for sync that fails before the changes. - Fix the wrong assertion.	2021-11-19 02:38:25 +00:00
Michael Sproul	9667dc2f03	Implement checkpoint sync (#2244 ) ## Issue Addressed Closes #1891 Closes #1784 ## Proposed Changes Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint. ## Additional Info - [x] Return unavailable status for out-of-range blocks requested by peers (#2561) - [x] Implement sync daemon for fetching historical blocks (#2561) - [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module) - [x] Consistency check for initial block + state - [x] Fetch the initial state and block from a beacon node HTTP endpoint - [x] Don't crash fetching beacon states by slot from the API - [x] Background service for state reconstruction, triggered by CLI flag or API call. Considered out of scope for this PR: - Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification) Co-authored-by: Diva M <divma@protonmail.com>	2021-09-22 00:37:28 +00:00
divma	2acf75785c	More sync updates (#1791 ) ## Issue Addressed #1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager	2020-10-20 22:34:18 +00:00
divma	b8013b7b2c	Super Silky Smooth Syncs, like a Sir (#1628 ) ## Issue Addressed In principle.. closes #1551 but in general are improvements for performance, maintainability and readability. The logic for the optimistic sync in actually simple ## Proposed Changes There are miscellaneous things here: - Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic - Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct - Move most batch-related logic to the batch - Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches) - Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases - Store batches in a sorted map (`BTreeMap`) access is not O(1) but since the number of _active_ batches is bounded this should be fast, and saves performing hashing ops. Batches are indexed by the epoch they start. Sorted, to easily handle chain advancements (range logic) - Produce the chain Id from the identifying fields: target root and target slot. This, to guarantee there can't be duplicated chains and be able to consistently search chains by either Id or checkpoint - Fix chain_id not being present in all chain loggers - Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and waiting for a result that won't arrive, effectively stalling sync. - When a batch imports blocks or the chain starts syncing with a local finalized epoch greater that the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it represents more accurately the progress of the chain. - Batch status peers from the same chain to reduce Arc access. - Handle a couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know. - Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed) - When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers. - Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code) - In chain_collection, store chains by their id in a map - Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses - Other stuff: - impl `slog::KV` for batches - impl `slog::KV` for syncing chains - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug` ### Optimistic syncing: Try first the batch that contains the current head, if the batch imports any block, advance the chain. If not, if this optimistic batch is inside the current processing window leave it there for future use, if not drop it. The tolerance for this block is the same for downloading, but just once for processing Co-authored-by: Age Manning <Age@AgeManning.com>	2020-09-23 06:29:55 +00:00
divma	46dbf027af	Do not reset batch ids & redownload out of range batches (#1528 ) The changes are somewhat simple but should solve two issues: - When quickly changing between chains once and a second time back again, batchIds would collide and cause havoc. - If we got an out of range response from a peer, sync would remain in syncing but without advancing Changes: - remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH - for the same reason, move the "to_be_downloaded_id" to be an epoch - remove a sneaky line that dropped an out of range batch without downloading it - bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed	2020-08-18 01:29:51 +00:00
Pawan Dhananjay	bb8b88edcf	Use SSZ types in rpc (#1244 ) * Update `milagro_bls` to new release (#1183) * Update milagro_bls to new release Signed-off-by: Kirk Baird <baird.k@outlook.com> * Tidy up fake cryptos Signed-off-by: Kirk Baird <baird.k@outlook.com> * move SecretHash to bls and put plaintext back Signed-off-by: Kirk Baird <baird.k@outlook.com> * Update v0.12.0 to v0.12.1 * Use ssz types for Request and error types * Fix errors * Constrain BlocksByRangeRequest count to MAX_REQUEST_BLOCKS * Fix issues after rebasing * Address review comments Co-authored-by: Kirk Baird <baird.k@outlook.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: Age Manning <Age@AgeManning.com>	2020-06-12 10:04:50 +10:00
Age Manning	79cc9473c1	Sync and multi-client updates (#1044 ) * Update finalized/head sync logic * Correct sync logging * Handle status during sync gracefully	2020-04-23 19:01:29 +10:00
divma	2469bde6b1	Add chain_id in range syncing to avoid wrong dispatching of batch results (#1037 )	2020-04-22 21:17:56 +10:00
divma	47aef629d1	move the parent lookup process to a dedicated thread (#906 ) * Upgrade the parent lookup logic * Apply reviewer suggestions * move the parent lookup process to a dedicated thread * move the logic of parent lookup and range syncing to a block processor * review suggestions * more review suggestions * Add small logging changes * Process parent lookups in reverse Co-authored-by: Age Manning <Age@AgeManning.com>	2020-03-23 12:07:41 +11:00
Age Manning	fdb6e28f94	Super/Silky smooth syncs (#816 ) * Initial block processing thread design * Correct compilation issues * Increase logging and request from all given peers * Patch peer request bug * Adds fork choice to block processing * Adds logging for bug isolation * Patch syncing for chains with skip-slots * Bump block processing error logs * Improve logging for attestation processing * Randomize peer selection during sync * Resuming chains restarts from local finalized slot * Downgrades Arc batches to Rc batches * Add clippy fixes * Downgrade Rc<Batch> to Option<Batch> to pass processed batches to chains * Add reviewers suggestions	2020-01-23 17:30:49 +11:00
Age Manning	c184a98170	Sync fixes (#801 ) * Randomize peer selection for batch errors * Downgrade attestation logging * Handle range sync errors * Update lock file * Downgrade logs * Decrease batch size for better thread handling * Optimise peer selection in range sync	2020-01-15 14:48:09 +11:00
Age Manning	5853326342	Sync Re-Write (#663 ) * Apply clippy lints to beacon node * Remove unnecessary logging and correct formatting * Initial bones of load-balanced range-sync * Port bump meshsup tests * Further structure and network handling logic added * Basic structure, ignoring error handling * Correct max peers delay bug * Clean up and re-write message processor and sync manager * Restructure directory, correct type issues * Fix compiler issues * Completed first testing of new sync * Correct merge issues * Clean up warnings * Push attestation processed log down to dbg * Correct math error, downgraded logs * Add RPC error handling and improved syncing code * Add libp2p stream error handling and dropping of invalid peers * Lower logs * Add discovery tweak * Correct libp2p service locking * Handles peer disconnects for sync * Add logs downgrade discovery log * Less fork choice (#679) * Try merge in change to reduce fork choice calls * Remove fork choice from process block * Minor log fix * Check successes > 0 * Fix failing beacon chain tests * Fix re-org warnings * Fix mistake in prev commit * Range sync refactor - Introduces `ChainCollection` - Correct Disconnect node handling - Removes duplicate code * Various bug fixes * Remove unnecessary logs * Maintain syncing state in the transition from finalied to head * Improved disconnect handling * Adds forwards block interator * Notifies lighthouse on stream timeouts * Apply new gossipsub updates	2019-12-09 18:50:21 +11:00

18 Commits