lighthouse

Author	SHA1	Message	Date
divma	b8013b7b2c	Super Silky Smooth Syncs, like a Sir (#1628 ) ## Issue Addressed In principle, closes #1551, but in general these are improvements for performance, maintainability and readability. The logic for the optimistic sync is actually simple. ## Proposed Changes There are miscellaneous things here: - Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic - Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct - Move most batch-related logic to the batch - Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches) - Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases - Store batches in a sorted map (`BTreeMap`). Access is not O(1), but since the number of _active_ batches is bounded this should be fast, and it saves performing hashing ops. Batches are indexed by the epoch they start at. Sorted, to easily handle chain advancements (range logic) - Produce the chain Id from the identifying fields: target root and target slot. This guarantees there can't be duplicate chains and makes it possible to consistently search chains by either Id or checkpoint - Fix chain_id not being present in all chain loggers - Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and wait for a result that won't arrive, effectively stalling sync. - When a batch imports blocks or the chain starts syncing with a local finalized epoch greater than the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it more accurately represents the progress of the chain. - Batch status peers from the same chain to reduce Arc access. - A couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know. - Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed) - When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers. - Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code) - In chain_collection, store chains by their id in a map - Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses - Other stuff: - impl `slog::KV` for batches - impl `slog::KV` for syncing chains - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug` ### Optimistic syncing: Try first the batch that contains the current head; if the batch imports any block, advance the chain. If not, and this optimistic batch is inside the current processing window, leave it there for future use; otherwise drop it. The tolerance for this batch is the same as for downloading, but just once for processing Co-authored-by: Age Manning <Age@AgeManning.com>	2020-09-23 06:29:55 +00:00
Age Manning	80e52a0263	Subscribe to core topics after sync (#1613 ) ## Issue Addressed N/A ## Proposed Changes Prevent subscribing to core gossipsub topics until after we have achieved a full sync. This prevents us censoring gossipsub channels, getting penalised in gossipsub 1.1 scoring and saves us computation time in attempting to validate gossipsub messages which we will be unable to do with a non-sync'd chain.	2020-09-23 03:26:33 +00:00
Pawan Dhananjay	a97ec318c4	Subscribe to subnets an epoch in advance (#1600 ) ## Issue Addressed N/A ## Proposed Changes Subscribe to subnets an epoch in advance of the attestation slot instead of 4 slots in advance.	2020-09-22 07:29:34 +00:00
Pawan Dhananjay	14ff38539c	Add trusted peers (#1640 ) ## Issue Addressed Closes #1581 ## Proposed Changes Adds a new cli option for trusted peers who always have the maximum possible score.	2020-09-22 01:12:36 +00:00
Age Manning	1db8daae0c	Shift metadata to the global network variables (#1631 ) ## Issue Addressed N/A ## Proposed Changes Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse. ## Additional Info N/A	2020-09-21 02:00:38 +00:00
Age Manning	c9596fcf0e	Temporary Sync Work-Around (#1615 ) ## Issue Addressed #1590 ## Proposed Changes This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.	2020-09-13 23:58:49 +00:00
Age Manning	c6abc56113	Prevent large step-size parameters (#1583 ) ## Issue Addressed Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadratically weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups. ## Proposed Changes Weight the step parameter in the RPC filter and prevent any overflows that affect us in the step parameter. ## Additional Info	2020-09-11 02:33:36 +00:00
blacktemplar	7f1b936905	ignore too early / too late attestations instead of penalizing them (#1608 ) ## Issue Addressed NA ## Proposed Changes This ignores attestations that are too early or too late as it is specified in the spec (see https://github.com/ethereum/eth2.0-specs/blob/v0.12.1/specs/phase0/p2p-interface.md#global-topics first subpoint of `beacon_aggregate_and_proof`)	2020-09-11 01:43:15 +00:00
Age Manning	b19cf02d2d	Penalise bad peer behaviour (#1602 ) ## Issue Addressed #1386 ## Proposed Changes Penalises peers in our scoring system that produce invalid attestations or blocks.	2020-09-10 03:51:06 +00:00
Age Manning	fb9d828e5e	Extended Gossipsub metrics (#1577 ) ## Issue Addressed N/A ## Proposed Changes Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topic, subscriptions and peer scores. ## Additional Info	2020-09-01 06:59:14 +00:00
blacktemplar	c18d37c202	Use Gossipsub 1.1 (#1516 ) ## Issue Addressed #1172 ## Proposed Changes * updates the libp2p dependency * small adaptations based on changes in libp2p * report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-30 13:06:50 +00:00
Adam Szkoda	d9f4819fe0	Alternative (to BeaconChainHarness) BeaconChain testing API (#1380 ) The PR: * Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots): ![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png) * New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots. * Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in an epoch-by-epoch manner. * Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework * Rewrites existing fork pruning tests to use the new API * Adds a test that triggers finalization at a non-epoch-boundary slot * Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former was too confusing * Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-26 09:24:55 +00:00
Pawan Dhananjay	bbed42f30c	Refactor attestation service (#1415 ) ## Issue Addressed N/A ## Proposed Changes Refactor attestation service to send out requests to find peers for subnets as soon as we get attestation duties. Earlier, we had much more involved logic to send the discovery requests to the discovery service only 6 slots before the attestation slot. Now that discovery is much smarter with grouped queries, the complexity in attestation service can be reduced considerably. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-19 08:46:25 +00:00
divma	fdc6e2aa8e	Shutdown like a Sir (#1545 ) ## Issue Addressed #1494 ## Proposed Changes - Give the TaskExecutor the sender side of a channel that a task can clone to request shutting down - The receiver side of this channel is in environment and now we block until ctrl+c or an internal shutdown signal is received - The swarm now informs when it has reached 0 listeners - The network receives this message and requests the shutdown	2020-08-19 05:51:14 +00:00
Paul Hauner	8e7dd7b2b1	Add remaining network ops to queuing system (#1546 ) ## Issue Addressed NA ## Proposed Changes - Refactors the `BeaconProcessor` to remove some excessive nesting and file bloat - Sorry about the noise from this, it's all contained in 4d3f8c5 though. - Adds exits, proposer slashings, attester slashings to the `BeaconProcessor` so we don't get overwhelmed with large amounts of slashings (which happened a few hours ago). ## Additional Info NA	2020-08-19 05:09:53 +00:00
Age Manning	2d0b214b57	Clean up logs (#1541 ) ## Description This PR improves some logging for the end-user. It downgrades some warning logs and removes the slots per second sync speed if we are syncing and the speed is 0. This is likely because we are syncing from a finalised checkpoint and the head doesn't change.	2020-08-18 08:11:39 +00:00
Age Manning	8311074d68	Purge out-dated head chains on chain completion (#1538 ) ## Description There can be many head chains queued up to complete. Currently we try and process all of these to completion before we consider the node synced. In a chaotic network, there can be many of these and processing them to completion can be very expensive and slow. This PR removes any non-syncing head chains from the queue, and re-statuses the peers. If, after we have synced to head on one chain, there is still a valid head chain to download, it will be re-established once the status has been returned. This should assist with getting nodes to sync on Medalla faster.	2020-08-18 05:22:34 +00:00
Age Manning	3bb30754d9	Keep track of failed head chains and prevent re-lookups (#1534 ) ## Overview There are forked chains which get referenced by blocks and attestations on a network. Typically if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, when many peers are on many chains, the chains can be very deep and lookups become time-consuming. This PR adds a cache of known failed chain lookups. This prevents us from starting a parent-lookup (or stops one half way through) if we have attempted the chain lookup in the past.	2020-08-18 03:54:09 +00:00
Age Manning	cc44a64d15	Limit parallelism of head chain sync (#1527 ) ## Description Currently Lighthouse load-balances a single finalized chain across peers. The chain selected is the one with the most peers. Once synced to the latest finalized epoch, Lighthouse creates chains amongst its peers and syncs them all in parallel amongst each peer (grouped by their current head block). This is typically fast and relatively efficient under normal operations. However, if the chain has not finalized in a long time, the head chains can grow quite long. Peers' head chains will update every slot as new blocks are added to the head. Syncing all head chains in parallel is a bottleneck and highly inefficient: block duplication leads to RPC timeouts when attempting to handle all new head chains at once. This PR limits the parallelism of head syncing chains to 2. We now sync at most two head chains at a time. This allows for the possibility of sync progressing alongside a peer being slow and holding up one chain via RPC timeouts.	2020-08-18 02:49:24 +00:00
divma	46dbf027af	Do not reset batch ids & redownload out of range batches (#1528 ) The changes are somewhat simple but should solve two issues: - When quickly switching from one chain to another and back again, batchIds would collide and cause havoc. - If we got an out of range response from a peer, sync would remain in the syncing state without advancing. Changes: - remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH - for the same reason, move the "to_be_downloaded_id" to be an epoch - remove a sneaky line that dropped an out of range batch without downloading it - bonus: put the chain_id in the log given to the chain. This is why explicit logging of the chain_id has been removed	2020-08-18 01:29:51 +00:00
Michael Sproul	719a69aee0	Ignore blocks that skip a large distance from their parent (#1530 ) ## Proposed Changes To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-08-17 10:54:58 +00:00
Paul Hauner	f85485884f	Process gossip blocks on the GossipProcessor (#1523 ) ## Issue Addressed NA ## Proposed Changes Moves beacon block processing over to the newly-added `GossipProcessor`. This moves the task off the core executor onto the blocking one. ## Additional Info - With this PR, gossip blocks are being ignored during sync.	2020-08-17 09:20:27 +00:00
Age Manning	afdc4fea1d	Correct logic for peer sync identification (#1525 ) Fix a small sync bug which can mis-classify newly connected peers.	2020-08-17 03:00:10 +00:00
divma	113b40f321	Add multiaddr support in bootnodes (#1481 ) ## Issue Addressed #1384 The only catch: as currently implemented, when dialing the multiaddr nodes there is no way to ask the peer manager if they are already connected or dialing	2020-08-17 02:13:26 +00:00
Paul Hauner	b0a3731fff	Introduce a queue for attestations from the network (#1511 ) ## Issue Addressed N/A ## Proposed Changes Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`. Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load. ## Additional Info TBC	2020-08-14 04:38:45 +00:00
divma	138c0cf7f0	Remove block clone (#1448 ) ## Issue Addressed #1028 A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed	2020-08-06 04:29:17 +00:00
Age Manning	09a615b2c0	Lighthouse crate v0.2.0 bump (#1450 ) ## Description This PR marks Lighthouse v0.2.0. This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.	2020-08-06 03:43:05 +00:00
Paul Hauner	5629126f45	Add reason to invalid attestation log (#1460 ) ## Issue Addressed NA ## Proposed Changes Adds an extra field to a debug log so we can see why an attestation was invalid. ## Additional Info NA	2020-08-05 01:49:52 +00:00
Age Manning	31707ccf45	Shift author to sigma prime on some crates (#1440 ) Shifts the author to sigma prime on some crates	2020-08-04 02:31:41 +00:00
Age Manning	f634f073a8	Correct issue with network message passing (#1439 ) ## Issue Addressed Sync was breaking occasionally. The root cause appears to be identify crashing as events were being sent to the protocol after nodes were banned. Have not been able to reproduce sync issues since this update. ## Proposed Changes Only send messages to sub-behaviour protocols if the peer manager thinks the peer is connected. All other messages are dropped.	2020-08-03 09:35:53 +00:00
Age Manning	a37e75f44b	Downgrade sync and rpc warn logs (#1417 ) * Downgrade sync and rpc warn logs * Correct warning	2020-07-30 13:52:44 +10:00
Age Manning	395d99ce03	Sync update (#1412 ) ## Issue Addressed Recurring sync loop and invalid batch downloading ## Proposed Changes Shifts the batches to include the first slot of each epoch. This ensures the finalized block is always downloaded once a chain has completed syncing. Also adds logic to prevent re-dialing disconnected peers. Non-performant peers get disconnected during sync; this prevents re-connection to these during sync. ## Additional Info N/A	2020-07-29 05:25:10 +00:00
Age Manning	ba0f3daf9d	Gossipsub update (#1400 ) ## Issue Addressed N/A ## Proposed Changes This provides a number of corrections and improvements to gossipsub. Specifically - Enables options for greater privacy around the message author - Provides greater flexibility on message validation - Prevents unvalidated messages from being gossiped - Shifts the duplicate cache to a time-based cache inside gossipsub - Updates the message-id to handle bytes - Bug fixes related to mesh maintenance and topic subscription. This should improve our attestation inclusion rate.	2020-07-29 03:40:22 +00:00
realbigsean	09b40b7a5e	Discover query grouping (#1364 ) ## Issue Addressed #1281 ## Proposed Changes Groups queries for specific subnets into groups of up to 3. ## Additional Info	2020-07-29 02:43:50 +00:00
divma	9ae9df806c	Fix clippy lints rpc (#1401 ) ## Issue Addressed #1388 partially (eth2_libp2p & network) ## Proposed Changes TLDR at the end - Complex types: 3 of these are on the handlers/Behaviours, but the types are `Poll<ComplexType>` where `ComplexType` comes from the traits of libp2p. I don't think those are worth an alias. A couple more came from using tokio combinators and were removed by writing things the async way and using [`BoxFuture`](https://docs.rs/futures/0.3.5/futures/future/type.BoxFuture.html) - Cognitive complexity: I tried to address those before (they come from the poll functions too) and tbh they are simpler to understand the way they are now. Moving separate parts to functions doesn't add much since that code is not repeated and they all do early returns. If moved, those returns would probably need to be wrapped in an Option and checked to be returned again. I would leave them like that, but that's just preference. - Too many arguments: they are not easily put together in a wrapping struct since the parameters don't relate semantically (e.g. fn new with a log, a reference to the chain, a peer, etc.) but some may differ. - Needless returns were indeed needless ## Additional Info TLDR: removed needless return, used BoxFuture and async, left the rest untouched since those lgtm	2020-07-28 01:39:42 +00:00
blacktemplar	23a8f31f83	Fix clippy warnings (#1385 ) ## Issue Addressed NA ## Proposed Changes Fixes most clippy warnings and ignores the rest of them, see issue #1388.	2020-07-23 14:18:00 +00:00
Pawan Dhananjay	b885d79ac3	Fix attestation propagation (#1360 ) * Add `should_process` for conditional processing of Attestations * Remove ATTESTATIONS_IGNORED metric	2020-07-20 12:55:32 +10:00
Age Manning	f500b24242	Update smallvec (#1339 )	2020-07-07 16:57:27 +10:00
Age Manning	5bc8fea2e0	Activate peer scoring (#1284 ) * Initial score structure * Peer manager update * Updates to dialing * Correct tests * Correct typos and remove unused function * Integrate scoring into the network crate * Clean warnings * Formatting * Shift core functionality into the behaviour * Temp commit * Shift disconnections into the behaviour * Temp commit * Update libp2p and gossipsub * Remove gossipsub lru cache * Correct merge conflicts * Modify handler and correct tests * Update enr network globals on socket update * Apply clippy lints * Add new prysm fingerprint * More clippy fixes	2020-07-07 10:13:16 +10:00
Paul Hauner	e429c3eefe	Remove old block processing shim (#1327 ) * Remove old block processing shim * Run rustfmt * Fix log formatting * Swap peer ids over to display	2020-07-06 16:28:00 +10:00
Paul Hauner	25cd91ce26	Update deps (#1322 ) * Run cargo update * Upgrade prometheus * Update hex * Upgrade parking-lot * Upgrade num-bigint * Upgrade sha2 * Update dockerfile Rust version * Run cargo update	2020-07-06 11:55:56 +10:00
Age Manning	9fc290a344	Add waker to attestation service (#1305 ) * Add waker to attestation service * Formatting	2020-06-28 22:29:27 +10:00
Paul Hauner	6e7d5c6a7c	Add metrics for validator subscriptions (#1302 )	2020-06-28 10:47:03 +10:00
Michael Sproul	7688b5f1dd	Merge remote-tracking branch 'origin/master' into spec-v0.12	2020-06-26 12:57:56 +10:00
pscott	02174e21d8	Fix clippy's performance lints (#1286 ) * Fix clippy perf lints * Cargo fmt * Add and to lint rule in Makefile * Fix some leftover clippy lints	2020-06-26 00:04:08 +10:00
Paul Hauner	decea48c78	Merge branch 'master' into spec-v0.12	2020-06-21 10:33:02 +10:00
Age Manning	710409c2ba	Userland clean up (#1277 ) * Improve logging, remove unused CLI and move discovery * Correct tests * Handle flag correctly	2020-06-20 09:34:28 +10:00
Age Manning	e379ad0f4e	Silky smooth discovery (#1274 ) * Initial structural re-write * Improving discovery update and correcting attestation service logic * Rework discovery.mod * Handling lifetimes of query futures * Discovery update first draft * format fixes * Stabilise discv5 update * Formatting corrections * Limit FindPeers queries and bug correction * Update to stable release discv5 * Remove unnecessary pin * formatting	2020-06-19 14:13:23 +10:00
Michael Sproul	9450a0f30d	Merge remote-tracking branch 'origin/master' into spec-v0.12	2020-06-18 21:59:59 +10:00
Michael Sproul	bcb6afa0aa	Process exits and slashings off the network (#1253 ) * Process exits and slashings off the network * Fix rest_api tests * Add op verification tests * Add tests for pruning of slashings in the op pool * Address Paul's review comments	2020-06-18 21:06:34 +10:00
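The #1628 entry describes three ideas. Batches are a state machine. They are stored in a `BTreeMap` keyed by the epoch they start at. The chain is advanced rather than reset. The following is a minimal Rust sketch of those ideas under stated assumptions, not Lighthouse's actual code. The `BatchState` variants, the `SyncingChain` fields and methods, the `u32` peer id and the value of `EPOCHS_PER_BATCH` are all hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; Lighthouse's real types differ.
type Epoch = u64;
type PeerId = u32;

/// Assumed batch width for this sketch only.
const EPOCHS_PER_BATCH: Epoch = 2;

/// Each batch is in exactly one state, so an invalid transition is reported
/// instead of a batch silently sitting in the wrong `Vec`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum BatchState {
    AwaitingDownload,
    Downloading { peer: PeerId },
    AwaitingProcessing,
    Processing,
    AwaitingValidation,
    Failed,
}

struct SyncingChain {
    /// Start epoch of the first unvalidated batch.
    start_epoch: Epoch,
    /// Batches indexed by the epoch they start at, kept sorted for range logic.
    batches: BTreeMap<Epoch, BatchState>,
}

impl SyncingChain {
    fn new(start_epoch: Epoch) -> Self {
        SyncingChain { start_epoch, batches: BTreeMap::new() }
    }

    /// Adds a batch right after the highest one already tracked.
    fn include_next_batch(&mut self) -> Epoch {
        let next = match self.batches.keys().next_back() {
            Some(last) => last + EPOCHS_PER_BATCH,
            None => self.start_epoch,
        };
        self.batches.insert(next, BatchState::AwaitingDownload);
        next
    }

    /// Only a batch that is awaiting download may start downloading.
    fn start_download(&mut self, epoch: Epoch, peer: PeerId) -> Result<(), String> {
        let state = self.batches.get_mut(&epoch).ok_or("no batch at that epoch")?;
        if *state != BatchState::AwaitingDownload {
            return Err(format!("invalid transition from {:?}", state));
        }
        *state = BatchState::Downloading { peer };
        Ok(())
    }

    /// Advances the chain instead of resetting it. Batches starting before
    /// `new_start` are dropped, and later ones keep their download progress.
    fn advance_chain(&mut self, new_start: Epoch) {
        self.batches = self.batches.split_off(&new_start);
        self.start_epoch = new_start;
    }
}
```

Because the map is sorted, dropping every batch before the new start epoch takes a single `BTreeMap::split_off` call. That is one way to get the "range logic" the commit message mentions.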

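The #1530 entry ignores blocks whose parent lies more than 320 slots (10 epochs of 32 slots) in the past. The limit is configurable via `--max-skip-slots N` or disabled with `--max-skip-slots none`. A minimal Rust sketch of that check follows. The function name and the use of `Option<u64>` to model `none` are assumptions, not Lighthouse's API.

```rust
// Hypothetical stand-in for Lighthouse's `Slot` type.
type Slot = u64;

/// Default from the commit message: 320 slots = 10 epochs of 32 slots.
const DEFAULT_MAX_SKIP_SLOTS: Slot = 320;

/// Returns `true` if the block skips more than `max_skip_slots` slots past its parent.
/// `None` models `--max-skip-slots none` and disables the check.
fn exceeds_max_skip(block_slot: Slot, parent_slot: Slot, max_skip_slots: Option<Slot>) -> bool {
    match max_skip_slots {
        // `saturating_sub` prevents underflow if a malformed parent has a later slot than its child.
        Some(max) => block_slot.saturating_sub(parent_slot) > max,
        None => false,
    }
}
```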

296 Commits