lighthouse

Author	SHA1	Message	Date
Age Manning	df40700ddd	Rename eth2_libp2p to lighthouse_network (#2702 ) ## Description The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration. As of this writing it currently houses the following high-level lighthouse-specific logic: - Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings - Integration and handling of ENRs with respect to libp2p and eth2 - Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers. - Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc. - Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api. - Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks. - Lighthouse specific types for managing gossipsub topics, sync status and ENR fields - Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring - Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p. Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network` Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-10-19 00:30:39 +00:00
Paul Hauner	fff01b24dd	Release v2.0.1 (#2726 ) ## Issue Addressed NA ## Proposed Changes - Update versions to `v2.0.1` in anticipation for a release early next week. - Add `--ignore` to `cargo audit`. See #2727. ## Additional Info NA	2021-10-18 03:08:32 +00:00
Age Manning	180c90bf6d	Correct peer connection transition logic (#2725 ) ## Description This PR updates the peer connection transition logic. It is acceptable for a peer to immediately transition from a disconnected state to a disconnecting state. This can occur when we are at our peer limit and a new peer's dial us.	2021-10-17 04:04:36 +00:00
Paul Hauner	a7b675460d	Add Altair tests to op pool (#2723 ) ## Issue Addressed NA ## Proposed Changes Adds some more testing for Altair to the op pool. Credits to @michaelsproul for some appropriated efforts here. ## Additional Info NA Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-10-16 05:07:23 +00:00
Michael Sproul	5cde3fc4da	Reduce lock contention in backfill sync (#2716 ) ## Proposed Changes Clone the proposer pubkeys during backfill signature verification to reduce the time that the pubkey cache lock is held for. Cloning such a small number of pubkeys has negligible impact on the total running time, but greatly reduces lock contention. On a Ryzen 5950X, the setup step seems to take around 180us regardless of whether the key is cloned or not, while the verification takes 7ms. When Lighthouse is limited to 10% of one core using `sudo cpulimit --pid <pid> --limit 10` the total time jumps up to 800ms, but the setup step remains only 250us. This means that under heavy load this PR could cut the time the lock is held for from 800ms to 250us, which is a huge saving of 99.97%!	2021-10-15 03:28:03 +00:00
Paul Hauner	9c5a8ab7f2	Change "too many resources" to "insufficient resources" in eth2_libp2p (#2713 ) ## Issue Addressed NA ## Proposed Changes Fixes what I assume is a typo in a log message. See the diff for details. ## Additional Info NA	2021-10-15 00:07:12 +00:00
Age Manning	05040e68ec	Update discovery (#2711 ) ## Issue Addressed #2695 ## Proposed Changes This updates discovery to the latest version which has patched a panic that occurred due to a race condition in the bucket logic.	2021-10-14 22:09:38 +00:00
Paul Hauner	e2d09bb8ac	Add `BeaconChainHarness::builder` (#2707 ) ## Issue Addressed NA ## Proposed Changes This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning. I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs. ## Additional Info NA	2021-10-14 02:58:10 +00:00
Pawan Dhananjay	34d22b5920	Reduce validator monitor logging verbosity (#2606 ) ## Issue Addressed Resolves #2541 ## Proposed Changes Reduces verbosity of validator monitor per epoch logging by batching info logs for multiple validators. Instead of a log for every validator managed by the validator monitor, we now batch logs for attestation records for previous epoch. Before: ```log Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 1, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 2, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 3, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 4, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 5, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match head validator: 5, epoch: 65875, service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 6, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match head validator: 6, epoch: 65875, service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 7, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match target validator: 7, epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay validator: 7, epoch: 65875, optimal: 1, delay: 2, service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 8, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match target validator: 8, epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay validator: 8, epoch: 65875, optimal: 1, delay: 2, service: val_mon Sep 20 06:53:08.239 ERRO Previous epoch attestation missing validator: 9, epoch: 65875, service: val_mon Sep 20 06:53:08.239 ERRO Previous epoch attestation missing validator: 10, epoch: 65875, service: val_mon ``` after ``` Sep 20 06:53:08.239 INFO Previous epoch attestation success validators: [1,2,3,4,5,6,7,8,9] , epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match head, validators: [5,6], epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match target, validators: [7,8], epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Previous epoch attestations had sub-optimal inclusion delay, validators: [7,8], epoch: 65875, service: val_mon Sep 20 06:53:08.239 ERRO Previous epoch attestation missing validators: [9,10], epoch: 65875, service: val_mon ``` The detailed individual logs are downgraded to debug logs.	2021-10-12 05:06:48 +00:00
Mac L	a73d698e30	Add TLS capability to the beacon node HTTP API (#2668 ) Currently, the beacon node has no ability to serve the HTTP API over TLS. Adding this functionality would be helpful for certain use cases, such as when you need a validator client to connect to a backup beacon node which is outside your local network, and the use of an SSH tunnel or reverse proxy would be inappropriate. ## Proposed Changes - Add three new CLI flags to the beacon node - `--http-enable-tls`: enables TLS - `--http-tls-cert`: to specify the path to the certificate file - `--http-tls-key`: to specify the path to the key file - Update the HTTP API to optionally use `warp`'s [`TlsServer`](https://docs.rs/warp/0.3.1/warp/struct.TlsServer.html) depending on the presence of the `--http-enable-tls` flag - Update tests and docs - Use a custom branch for `warp` to ensure proper error handling ## Additional Info Serving the API over TLS should currently be considered experimental. The reason for this is that it uses code from an [unmerged PR](https://github.com/seanmonstar/warp/pull/717). This commit provides the `try_bind_with_graceful_shutdown` method to `warp`, which is helpful for controlling error flow when the TLS configuration is invalid (cert/key files don't exist, incorrect permissions, etc). I've implemented the same code in my [branch here](https://github.com/macladson/warp/tree/tls). Once the code has been reviewed and merged upstream into `warp`, we can remove the dependency on my branch and the feature can be considered more stable. Currently, the private key file must not be password-protected in order to be read into Lighthouse.	2021-10-12 03:35:49 +00:00
Age Manning	0aee7ec873	Refactor Peerdb and PeerManager (#2660 ) ## Proposed Changes This is a refactor of the PeerDB and PeerManager. A number of bugs have been surfacing around the connection state of peers and their interaction with the score state. This refactor tightens the mutability properties of peers such that only specific modules are able to modify the state of peer information preventing inadvertant state changes that can lead to our local peer manager db being out of sync with libp2p. Further, the logic around connection and scoring was quite convoluted and the distinction between the PeerManager and Peerdb was not well defined. Although these issues are not fully resolved, this PR is step to cleaning up this logic. The peerdb solely manages most mutability operations of peers leaving high-order logic to the peer manager. A single `update_connection_state()` function has been added to the peer-db making it solely responsible for modifying the peer's connection state. The way the peer's scores can be modified have been reduced to three simple functions (`update_scores()`, `update_gossipsub_scores()` and `report_peer()`). This prevents any add-hoc modifications of scores and only natural processes of score modification is allowed which simplifies the reasoning of score and state changes.	2021-10-11 02:45:06 +00:00
Michael Sproul	708557a473	Fix cargo audit warns for nix, psutil, time (#2699 ) ## Issue Addressed Fix `cargo audit` failures on `unstable` Closes #2698 ## Proposed Changes The main culprit is `nix`, which is vulnerable for versions below v0.23.0. We can't get by with a straight-forward `cargo update` because `psutil` depends on an old version of `nix` (cf. https://github.com/rust-psutil/rust-psutil/pull/93). Hence I've temporarily forked `psutil` under the `sigp` org, where I've included the update to `nix` v0.23.0. Additionally, I took the chance to update the `time` dependency to v0.3, which removed a bunch of stale deps including `stdweb` which is no longer maintained. Lighthouse only uses the `time` crate in the notifier to do some pretty printing, and so wasn't affected by any of the breaking changes in v0.3 ([changelog here](https://github.com/time-rs/time/blob/main/CHANGELOG.md#030-2021-07-30)).	2021-10-11 00:10:35 +00:00
Pawan Dhananjay	7c7ba770de	Update broken api links (#2665 ) ## Issue Addressed Resolves #2563 Replacement for #2653 as I'm not able to reopen that PR after force pushing. ## Proposed Changes Fixes all broken api links. Cherry picked changes in #2590 and updated a few more links. Co-authored-by: Mason Stallmo <masonstallmo@gmail.com>	2021-10-06 00:46:09 +00:00
Pawan Dhananjay	73ec29c267	Don't log errors on resubscription of gossip topics (#2613 ) ## Issue Addressed Resolves #2555 ## Proposed Changes Don't log errors on resubscribing to topics. Also don't log errors if we are setting already set attnet/syncnet bits.	2021-10-06 00:46:08 +00:00
Wink Saville	58870fc6d3	Add test_logger as feature to logging (#2586 ) ## Issue Addressed Fix #2585 ## Proposed Changes Provide a canonical version of test_logger that can be used throughout lighthouse. ## Additional Info This allows tests to conditionally emit logging data by adding test_logger as the default logger. And then when executing `cargo test --features logging/test_logger` log output will be visible: wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging) $ cargo test --features logging/test_logger Finished test [unoptimized + debuginfo] target(s) in 0.02s Running unittests (target/debug/deps/test_logger-e20115db6a5e3714) running 1 test Sep 10 12:53:45.212 INFO hi, module: test_logger:8 test tests::test_fn_with_logging ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Doc-tests test-logger running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Or, in normal scenarios where logging isn't needed, executing `cargo test` the log output will not be visible: wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging) $ cargo test Finished test [unoptimized + debuginfo] target(s) in 0.02s Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a) running 1 test test tests::test_fn_with_logging ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Doc-tests test-logger running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s	2021-10-06 00:46:07 +00:00
Michael Sproul	7c88f582d9	Release v2.0.0 (#2673 ) ## Proposed Changes * Bump version to v2.0.0 * Update dependencies (obsoletes #2670). `tokio-macros` v1.4.0 had been yanked due to a bug.	2021-10-05 03:53:18 +00:00
Michael Sproul	ed1fc7cca6	Fix I/O atomicity issues with checkpoint sync (#2671 ) ## Issue Addressed This PR addresses an issue found by @YorickDowne during testing of v2.0.0-rc.0. Due to a lack of atomic database writes on checkpoint sync start-up, it was possible for the database to get into an inconsistent state from which it couldn't recover without `--purge-db`. The core of the issue was that the store's anchor info was being stored _before_ the `PersistedBeaconChain`. If a crash occured so that anchor info was stored but _not_ the `PersistedBeaconChain`, then on restart Lighthouse would think the database was unitialized and attempt to compare-and-swap a `None` value, but would actually find the stale info from the previous run. ## Proposed Changes The issue is fixed by writing the anchor info, the split point, and the `PersistedBeaconChain` atomically on start-up. Some type-hinting ugliness was required, which could possibly be cleaned up in future refactors.	2021-10-05 03:53:17 +00:00
Kane Wallmann	28b79084cd	Fix chain_id value in config/deposit_contract RPC method (#2659 ) ## Issue Addressed This PR addresses issue #2657 ## Proposed Changes Changes `/eth/v1/config/deposit_contract` endpoint to return the chain ID from the loaded chain spec instead of eth1::DEFAULT_NETWORK_ID which is the Goerli chain ID of 5. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-10-01 06:32:38 +00:00
Michael Sproul	ea78315749	Release v2.0.0-rc.0 (#2634 ) ## Proposed Changes Cut the first release candidate for v2.0.0, in preparation for testing and release this week ## Additional Info Builds on #2632, which should either be merged first or in the same batch	2021-10-01 01:23:55 +00:00
Age Manning	29a8865d07	Consistent tracking of disconnected peers (#2650 ) ## Issue Addressed N/A ## Proposed Changes When peers switching to a disconnecting state, decrement the disconnected peers counter. This also downgrades some crit logs to errors. I've also added a re-sync point when peers get unbanned the disconnected peer count will match back to the number of disconnected peers if it has gone out of sync previously.	2021-09-30 04:31:43 +00:00
Squirrel	db4d72c4f1	Remove unused deps (#2592 ) Found some deps you're possibly not using. Please shout if you think they are indeed still needed.	2021-09-30 04:31:42 +00:00
Mac L	4c510f8f6b	Add `BlockTimesCache` to allow additional block delay metrics (#2546 ) ## Issue Addressed Closes #2528 ## Proposed Changes - Add `BlockTimesCache` to provide block timing information to `BeaconChain`. This allows additional metrics to be calculated for blocks that are set as head too late. - Thread the `seen_timestamp` of blocks received from RPC responses (except blocks from syncing) through to the sync manager, similar to what is done for blocks from gossip. ## Additional Info This provides the following additional metrics: - `BEACON_BLOCK_OBSERVED_SLOT_START_DELAY_TIME` - The delay between the start of the slot and when the block was first observed. - `BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME` - The delay between when the block was first observed and when the block was imported. - `BEACON_BLOCK_HEAD_IMPORTED_DELAY_TIME` - The delay between when the block was imported and when the block was set as head. The metric `BEACON_BLOCK_IMPORTED_SLOT_START_DELAY_TIME` was removed. A log is produced when a block is set as head too late, e.g.: ``` Aug 27 03:46:39.006 DEBG Delayed head block set_as_head_delay: Some(21.731066ms), imported_delay: Some(119.929934ms), observed_delay: Some(3.864596988s), block_delay: 4.006257988s, slot: 1931331, proposer_index: 24294, block_root: 0x937602c89d3143afa89088a44bdf4b4d0d760dad082abacb229495c048648a9e, service: beacon ```	2021-09-30 04:31:41 +00:00
Pawan Dhananjay	70441aa554	Improve valmon inclusion delay calculation (#2618 ) ## Issue Addressed Resolves #2552 ## Proposed Changes Offers some improvement in inclusion distance calculation in the validator monitor. When registering an attestation from a block, instead of doing `block.slot() - attesstation.data.slot()` to get the inclusion distance, we now pass the parent block slot from the beacon chain and do `parent_slot.saturating_sub(attestation.data.slot())`. This allows us to give best effort inclusion distance in scenarios where the attestation was included right after a skip slot. Note that this does not give accurate results in scenarios where the attestation was included few blocks after the skip slot. In this case, if the attestation slot was `b1` and was included in block `b2` with a skip slot in between, we would get the inclusion delay as 0 (by ignoring the skip slot) which is the best effort inclusion delay. ``` b1 <- missed <- b2 ``` Here, if the attestation slot was `b1` and was included in block `b3` with a skip slot and valid block `b2` in between, then we would get the inclusion delay as 2 instead of 1 (by ignoring the skip slot). ``` b1 <- missed <- b2 <- b3 ``` A solution for the scenario 2 would be to count number of slots between included slot and attestation slot ignoring the skip slots in the beacon chain and pass the value to the validator monitor. But I'm concerned that it could potentially lead to db accesses for older blocks in extreme cases. This PR also uses the validator monitor data for logging per epoch inclusion distance. This is useful as we won't get inclusion data in post-altair summaries. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2021-09-30 01:22:43 +00:00
realbigsean	7d13e57d9f	Add interop metrics (#2645 ) ## Issue Addressed Resolves: #2644 ## Proposed Changes - Adds mandatory metrics mentioned here: https://github.com/ethereum/beacon-metrics/blob/master/metrics.md#interop-metrics ## Additional Info Couldn't figure out how to alias metrics, so I created them all as new gauges/counters. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-09-29 23:44:24 +00:00
realbigsean	113ef74ef6	Add contribution and proof event (#2527 ) ## Issue Addressed N/A ## Proposed Changes Add the new ContributionAndProof event: https://github.com/ethereum/beacon-APIs/pull/158 ## Additional Info N/A Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-09-25 07:53:58 +00:00
Paul Hauner	fe52322088	Implement SSZ union type (#2579 ) ## Issue Addressed NA ## Proposed Changes Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enums`: https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`. ### Crate Updates This PR will (hopefully) cause CI to publish new versions for the following crates: - `eth2_ssz_derive`: `0.2.1` -> `0.3.0` - `eth2_ssz`: `0.3.0` -> `0.4.0` - `eth2_ssz_types`: `0.2.0` -> `0.2.1` - `tree_hash`: `0.3.0` -> `0.4.0` - `tree_hash_derive`: `0.3.0` -> `0.4.0` These these crates depend on each other, I've had to add a workspace-level `[patch]` for these crates. A follow-up PR will need to remove this patch, ones the new versions are published. ### Union Behaviors We already had SSZ `Encode` and `TreeHash` derive for enums, however it just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec is in conflict with the transparent method, I've required that all `enum` have exactly one of the following enum-level attributes: #### SSZ - `#[ssz(enum_behaviour = "union")]` - matches the spec used for the merge - `#[ssz(enum_behaviour = "transparent")]` - maintains existing functionality - not supported for `Decode` (never was) #### TreeHash - `#[tree_hash(enum_behaviour = "union")]` - matches the spec used for the merge - `#[tree_hash(enum_behaviour = "transparent")]` - maintains existing functionality This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent. ### Legacy Option Encoding Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the old SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector. Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migrate script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migrate script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis. The `ssz::legacy::four_byte_impl!` macro allows a one-liner to define the module required for the `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`. Notably, I have removed `Encode` and `Decode` impls for `Option`. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be that annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl. ### Crate Publish Ordering I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon. ## TODO - [ ] Queue a follow-up `[patch]`-removing PR.	2021-09-25 05:58:36 +00:00
Age Manning	00a7ef0036	Correct bug in sync (#2615 ) A bug that causes failed batches to continually download in a loop is corrected.	2021-09-23 01:32:04 +00:00
Paul Hauner	be11437c27	Batch BLS verification for attestations (#2399 ) ## Issue Addressed NA ## Proposed Changes Adds the ability to verify batches of aggregated/unaggregated attestations from the network. When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue: - `== 1` verify the attestation individually. - `>= 2` take up to 64 of those attestations and verify them in a batch. Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try and force them into batches. ### Batching Details To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations: - Indexed attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`. - Verified attestations: those attestations which were indexed and also passed signature verification. These are well-formed, interesting messages which were signed by validators. The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required. When we batch verify attestations, we first try to map all those attestations to indexed attestations. If any of those attestations were able to be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into verified attestations, disabling individual signature checking. If the batch fails, we convert to verified attestations with individual signature checking enabled. Ultimately, we optimistically try to do a batch verification of attestation signatures and fall-back to individual verification if it fails. This opens an attach vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and the typical-case gains massively outweigh the worst-case losses. ## Additional Info Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference. I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to actual production code would pass under the existing test suite.	2021-09-22 08:49:41 +00:00
Michael Sproul	9667dc2f03	Implement checkpoint sync (#2244 ) ## Issue Addressed Closes #1891 Closes #1784 ## Proposed Changes Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint. ## Additional Info - [x] Return unavailable status for out-of-range blocks requested by peers (#2561) - [x] Implement sync daemon for fetching historical blocks (#2561) - [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module) - [x] Consistency check for initial block + state - [x] Fetch the initial state and block from a beacon node HTTP endpoint - [x] Don't crash fetching beacon states by slot from the API - [x] Background service for state reconstruction, triggered by CLI flag or API call. Considered out of scope for this PR: - Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification) Co-authored-by: Diva M <divma@protonmail.com>	2021-09-22 00:37:28 +00:00
Age Manning	280e4fe23d	Increase connection limits and allow priority connections (#2604 ) In previous network updates we have made our libp2p connections more lean by limiting the maximum number of connections a lighthouse node will accept before libp2p rejects new connections. However, we still maintain the logic that at maximum connections, we try to dial extra peers if they are needed by a validator client to publish messages on a specific subnet. The dials typically result in failures due the libp2p connection limits. This PR adds an extra factor, `PRIORITY_PEER_EXCESS` which sets aside a new allocation of peers we are able to dial in case we need these peers for the VC client. This allocation sits along side the excess peer (which allows extra incoming peers on top of our target peer limit). The drawback here, is that libp2p now allows extra peers to connect to us (beyond the standard peer limit) which the peer manager should subsequently reject.	2021-09-21 07:45:13 +00:00
Age Manning	a73dcb7b6d	Improved handling of IP Banning (#2530 ) This PR in general improves the handling around peer banning. Specifically there were issues when multiple peers under a single IP connected to us after we banned the IP for poor behaviour. This PR should now handle these peers gracefully as well as make some improvements around how we previously disconnected and banned peers. The logic now goes as follows: - Once a peer gets banned, its gets registered with its known IP addresses - Once enough banned peers exist under a single IP that IP is banned - We retain connections with existing peers under this IP - Any new connections under this IP are rejected	2021-09-17 04:02:31 +00:00
Pawan Dhananjay	64ad2af100	Subscribe to altair gossip topics 2 slots before fork (#2532 ) ## Issue Addressed N/A ## Proposed Changes Add a fork_digest to `ForkContext` only if it is set in the config. Reject gossip messages on post fork topics before the fork happens. Edit: Instead of rejecting gossip messages on post fork topics, we now subscribe to post fork topics 2 slots before the fork. Co-authored-by: Age Manning <Age@AgeManning.com>	2021-09-17 01:11:16 +00:00
Age Manning	56e0615df8	Experimental discovery (#2577 ) # Description A few changes have been made to discovery. In particular a custom re-write of an LRU cache which previously was read/write O(N) for all our sessions ~5k, to a more reasonable hashmap-style O(1). Further there has been reported issues in the current discv5, so added error handling to help identify the issue has been added.	2021-09-16 04:45:05 +00:00
Age Manning	95b17137a8	Reduce network debug noise (#2593 ) The identify network debug logs can get quite noisy and are unnecessary to print on every request/response. This PR reduces debug noise by only printing messages for identify messages that offer some new information.	2021-09-14 08:28:35 +00:00
Wink Saville	4755d4b236	Update sloggers to v2.0.2 (#2588 ) fixes #2584	2021-09-14 06:48:26 +00:00
Paul Hauner	f9bba92db3	v1.5.2 (#2595 ) ## Issue Addressed NA ## Proposed Changes Version bump ## Additional Info Please do not `bors` without my approval, I am still testing.	2021-09-13 23:01:19 +00:00
Paul Hauner	ddbd4e6965	v1.5.2-rc.0 (#2565 ) ## Issue Addressed NA ## Proposed Changes - Bump version - Tidy some comments mangled by the version change regex. ## Additional Info NA	2021-09-03 23:28:21 +00:00
Michael Sproul	9c785a9b33	Optimize `process_attestation` with active balance cache (#2560 ) ## Proposed Changes Cache the total active balance for the current epoch in the `BeaconState`. Computing this value takes around 1ms, and this was negatively impacting block processing times on Prater, particularly when reconstructing states. With a large number of attestations in each block, I saw the `process_attestations` function taking 150ms, which means that reconstructing hot states can take up to 4.65s (31 * 150ms), and reconstructing freezer states can take up to 307s (2047 * 150ms). I opted to add the cache to the beacon state rather than computing the total active balance at the start of state processing and threading it through. Although this would be simpler in a way, it would waste time, particularly during block replay, as the total active balance doesn't change for the duration of an epoch. So we save ~32ms for hot states, and up to 8.1s for freezer states (using `--slots-per-restore-point 8192`).	2021-09-03 07:50:43 +00:00
realbigsean	50321c6671	Updates to make crates publishable (#2472 ) ## Issue Addressed Related to: #2259 Made an attempt at all the necessary updates here to publish the crates to crates.io. I incremented the minor versions on all the crates that have been previously published. We still might run into some issues as we try to publish because I'm not able to test this out but I think it's a good starting point. ## Proposed Changes - Add description and license to `ssz_types` and `serde_util` - rename `serde_util` to `eth2_serde_util` - increment minor versions - remove path dependencies - remove patch dependencies ## Additional Info Crates published: - [x] `tree_hash` -- need to publish `tree_hash_derive` and `eth2_hashing` first - [x] `eth2_ssz_types` -- need to publish `eth2_serde_util` first - [x] `tree_hash_derive` - [x] `eth2_ssz` - [x] `eth2_ssz_derive` - [x] `eth2_serde_util` - [x] `eth2_hashing` Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-09-03 01:10:25 +00:00
Pawan Dhananjay	5a3bcd2904	Validator monitor support for sync committees (#2476 ) ## Issue Addressed N/A ## Proposed Changes Add functionality in the validator monitor to provide sync committee related metrics for monitored validators. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-08-31 23:31:36 +00:00
Paul Hauner	44fa54004c	Persist to DB after setting canonical head (#2547 ) ## Issue Addressed NA ## Proposed Changes Missed head votes on attestations is a well-known issue. The primary cause is a block getting set as the head after the attestation deadline. This PR aims to shorten the overall time between "block received" and "block set as head" by: 1. Persisting the head and fork choice after setting the canonical head - Informal measurements show this takes ~200ms 1. Pruning the op pool after setting the canonical head. 1. No longer persisting the op pool to disk during `BeaconChain::fork_choice` - Informal measurements show this can take up to 1.2s. I also add some metrics to help measure the effect of these changes. Persistence changes like this run the risk of breaking assumptions downstream. However, I have considered these risks and I think we're fine here. I will describe my reasoning for each change. ## Reasoning ### Change 1: Persisting the head and fork choice after setting the canonical head For (1), although the function is called `persist_head_and_fork_choice`, it only persists: - Fork choice - Head tracker - Genesis block root Since `BeaconChain::fork_choice_internal` does not modify these values between the original time we were persisting it and the current time, I assert that the change I've made is non-substantial in terms of what ends up on-disk. There's the possibility that some other thread has modified fork choice in the extra time we've given it, but that's totally fine. Since the only time we read those values from disk is during startup, I assert that this has no impact during runtime. ### Change 2: Pruning the op pool after setting the canonical head Similar to the argument above, we don't modify the op pool during `BeaconChain::fork_choice_internal` so it shouldn't matter when we prune. This change should be non-substantial. ### Change 3: No longer persisting the op pool to disk during `BeaconChain::fork_choice` This change is substantial. With the proposed changes, we'll only be persisting the op pool to disk when we shut down cleanly (i.e., the `BeaconChain` gets dropped). This means we'll save disk IO and time during usual operation, but a `kill -9` or similar "crash" will probably result in an out-of-date op pool when we reboot. An out-of-date op pool can only have an impact when producing blocks or aggregate attestations/sync committees. I think it's pretty reasonable that a crash might result in an out-of-date op pool, since: - Crashes are fairly rare. Practically the only time I see LH suffer a full crash is when the OOM killer shows up, and that's a very serious event. - It's generally quite rare to produce a block/aggregate immediately after a reboot. Just a few slots of runtime is probably enough to have a decent-enough op pool again. ## Additional Info Credits to @macladson for the timings referenced here.	2021-08-31 04:48:21 +00:00
Pawan Dhananjay	b4dd98b3c6	Shutdown after sync (#2519 ) ## Issue Addressed Resolves #2033 ## Proposed Changes Adds a flag to enable shutting down beacon node right after sync is completed. ## Additional Info Will need modification after weak subjectivity sync is enabled to change definition of a fully synced node.	2021-08-30 13:46:13 +00:00
Michael Sproul	10945e0619	Revert bad blocks on missed fork (#2529 ) ## Issue Addressed Closes #2526 ## Proposed Changes If the head block fails to decode on start up, do two things: 1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`). 2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint. ## Additional Info I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞	2021-08-30 06:41:31 +00:00
Mason Stallmo	bc14d1d73d	Add more unix signal handlers (#2486 ) ## Issue Addressed Resolves #2114 Swapped out the ctrlc crate for tokio signals to hook register handlers for SIGPIPE and SIGHUP along with SIGTERM and SIGINT. ## Proposed Changes - Swap out the ctrlc crate for tokio signals for unix signal handing - Register signals for SIGPIPE and SHIGUP that trigger the same shutdown procedure as SIGTERM and SIGINT ## Additional Info I tested these changes against the examples in the original issue and noticed some interesting behavior on my machine. When running `lighthouse bn --network pyrmont \|& tee -a pyrmont_bn.log` or `lighthouse bn --network pyrmont 2>&1 \| tee -a pyrmont_bn.log` none of the above signals are sent to the lighthouse program in a way I was able to observe. The only time it seems that the signal gets sent to the lighthouse program is if there is no redirection of stderr to stdout. I'm not as familiar with the details of how unix signals work in linux with a redirect like that so I'm not sure if this is a bug in the program or expected behavior. Signals are correctly received without the redirection and if the above signals are sent directly to the program with something like `kill`.	2021-08-30 05:19:34 +00:00
Pawan Dhananjay	99737c551a	Improve eth1 fallback logging (#2490 ) ## Issue Addressed Resolves #2487 ## Proposed Changes Logs a message once in every invocation of `Eth1Service::update` method if the primary endpoint is unavailable for some reason. e.g. ```log Aug 03 00:09:53.517 WARN Error connecting to eth1 node endpoint action: trying fallbacks, endpoint: http://localhost:8545/, service: eth1_rpc Aug 03 00:09:56.959 INFO Fetched data from fallback fallback_number: 1, service: eth1_rpc ``` The main aim of this PR is to have an accompanying message to the "action: trying fallbacks" error message that is returned when checking the endpoint for liveness. This is mainly to indicate to the user that the fallback was live and reachable. ## Additional info This PR is not meant to be a catch all for all cases where the primary endpoint failed. For instance, this won't log anything if the primary node was working fine during endpoint liveness checking and failed during deposit/block fetching. This is done intentionally to reduce number of logs while initial deposit/block sync and to avoid more complicated logic.	2021-08-30 00:51:26 +00:00
Paul Hauner	b0ac3464ca	v1.5.1 (#2544 ) ## Issue Addressed NA ## Proposed Changes - Bump version ## Additional Info NA	2021-08-27 01:58:19 +00:00
Paul Hauner	4405425726	Expand gossip duplicate cache time (#2542 ) ## Issue Addressed NA ## Proposed Changes This PR expands the time that entries exist in the gossip-sub duplicate cache. Recent investigations found that this cache is one slot (12s) shorter than the period for which an attestation is permitted to propagate on the gossip network. Before #2540, this was causing peers to be unnecessarily down-scored for sending old attestations. Although that issue has been fixed, the duplicate cache time is increased here to avoid such messages from getting any further up the networking stack then required. ## Additional Info NA	2021-08-26 23:25:50 +00:00
Paul Hauner	3fdad38eba	Remove penality for duplicate attestation from same validator (#2540 ) ## Issue Addressed NA ## Proposed Changes A Discord user presented logs which indicated a drop in their peer count caused by a variety of peers sending attestations where we'd already seen an attestation for that validator. It's presently unclear how this case came about, but during our investigation I noticed that we are down-voting peers for sending such attestations. There are three scenarios where we may receive duplicate unagg. attestations from the same validator: 1. The validator is committing a slashable offense. 2. The gossipsub message-deduping functionality is not working as expected. 3. We received the message via the HTTP prior to seeing it via gossip. Scenario (1) would be so costly for an attacker that I don't think we need to add DoS protection for it. Scenario (2) seems feasible. Our "seen message" caches in gossipsub might fill up/expire and let through these duplicates. There are also cases involving message ID mismatches with the other peers. In both these cases, I don't think we should be doing 1 attestation == -1 point down-voting. Scenario (3) is not necessarily a fault of the peer and we shouldn't down-score them for it. ## Additional Info NA	2021-08-26 08:00:50 +00:00
Age Manning	09545fe668	Increase maximum gossipsub subscriptions (#2531 ) Due to the altair fork, in principle we can now subscribe to up to 148 topics. This bypasses our original limit and we can end up rejecting subscriptions. This PR increases the limit to account for the fork.	2021-08-26 02:01:10 +00:00
Pawan Dhananjay	d3b4cbed53	Packet filter cli option (#2523 ) ## Issue Addressed N/A ## Proposed Changes Adds a cli option to disable packet filter in `lighthouse bootnode`. This is useful in running local testnets as the bootnode bans requests from the same ip(localhost) if the packet filter is enabled.	2021-08-26 00:29:39 +00:00

1 2 3 4 5 ...

1666 Commits