lighthouse

Author	SHA1	Message	Date
Age Manning	0b319d4926	Inform dialing via the behaviour (#2814 ) I had this change but it seems to have been lost in chaos of network upgrades. The swarm dialing event seems to miss some cases where we dial via the behaviour. This causes an error to be logged as the peer manager doesn't know about some dialing events. This shifts the logic to the behaviour to inform the peer manager.	2021-11-19 04:42:33 +00:00
Divma	53562010ec	Move peer db writes to eth2 libp2p (#2724 ) ## Issue Addressed Part of a bigger effort to make the network globals read only. This moves all writes to the `PeerDB` to the `eth2_libp2p` crate. Limiting writes to the peer manager is a slightly more complicated issue for a next PR, to keep things reviewable. ## Proposed Changes - Make the peers field in the globals a private field. - Allow mutable access to the peers field to `eth2_libp2p` for now. - Add a new network message to update the sync state. Co-authored-by: Age Manning <Age@AgeManning.com>	2021-11-19 04:42:31 +00:00
Divma	31386277c3	Sync wrong dbg assertion (#2821 ) ## Issue Addressed Running a beacon node I triggered a sync debug panic. And so finally the time to create tests for sync arrived. Fortunately, te bug was not in the sync algorithm itself but a wrong assertion ## Proposed Changes - Split Range's impl from the BeaconChain via a trait. This is needed for testing. The TestingRig/Harness is way bigger than needed and does not provide the modification functionalities that are needed to test sync. I find this simpler, tho some could disagree. - Add a regression test for sync that fails before the changes. - Fix the wrong assertion.	2021-11-19 02:38:25 +00:00
Age Manning	e519af9012	Update Lighthouse Dependencies (#2818 ) ## Issue Addressed Updates lighthouse dependencies to resolve audit issues in out-dated deps.	2021-11-18 05:08:42 +00:00
Pawan Dhananjay	e32c09bfda	Fix decoding max length (#2816 ) ## Issue Addressed N/A ## Proposed Changes Fix encoder max length to the correct value (`MAX_RPC_SIZE`).	2021-11-16 22:23:39 +00:00
Age Manning	a43a2448b7	Investigate and correct RPC Response Timeouts (#2804 ) RPC Responses are for some reason not removing their timeout when they are completing. As an example: ``` Nov 09 01:18:20.256 DEBG Received BlocksByRange Request step: 1, start_slot: 728465, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:20.263 DEBG Received BlocksByRange Request step: 1, start_slot: 728593, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:20.483 DEBG BlocksByRange Response sent returned: 63, requested: 64, current_slot: 2466389, start_slot: 728465, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:20.500 DEBG BlocksByRange Response sent returned: 64, requested: 64, current_slot: 2466389, start_slot: 728593, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:21.068 DEBG Received BlocksByRange Request step: 1, start_slot: 728529, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:21.272 DEBG BlocksByRange Response sent returned: 63, requested: 64, current_slot: 2466389, start_slot: 728529, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:23.434 DEBG Received BlocksByRange Request step: 1, start_slot: 728657, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:23.665 DEBG BlocksByRange Response sent returned: 64, requested: 64, current_slot: 2466390, start_slot: 728657, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:25.851 DEBG Received BlocksByRange Request step: 1, start_slot: 728337, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:25.851 DEBG Received BlocksByRange Request step: 1, start_slot: 728401, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:26.094 DEBG BlocksByRange Response sent returned: 62, requested: 64, current_slot: 2466390, start_slot: 728401, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:26.100 DEBG BlocksByRange Response sent returned: 63, requested: 64, current_slot: 2466390, start_slot: 728337, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:31.070 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:31.070 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:31.085 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:31.085 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:31.459 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:31.459 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:34.129 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:34.130 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:35.686 DEBG Peer Manager disconnecting peer reason: Too many peers, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, service: libp2p ``` This PR is to investigate and correct the issue. ~~My current thoughts are that for some reason we are not closing the streams correctly, or fast enough, or the executor is not registering the closes and waking up.~~ - Pretty sure this is not the case, see message below for a more accurate reason. ~~I've currently added a timeout to stream closures in an attempt to force streams to close and the future to always complete.~~ I removed this	2021-11-16 03:42:25 +00:00
Paul Hauner	931daa40d7	Add fork choice EF tests (#2737 ) ## Issue Addressed Resolves #2545 ## Proposed Changes Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too. During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset. ## Failing Cases There is one failing case which is presently marked as `SkippedKnownFailure`: ``` case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual: head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } \| Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 } ``` This failure is due to #2741. It's not a particularly high priority issue at the moment, so we fix it after merging this PR.	2021-11-08 07:29:04 +00:00
Divma	fbafe416d1	Move the peer manager to be a behaviour (#2773 ) This simply moves some functions that were "swarm notifications" to a network behaviour implementation. Notes ------ - We could disconnect from the peer manager but we would lose the rpc shutdown message - We still notify from the swarm since this is the most reliable way to get some events. Ugly but best for now - Events need to be pushed with "add event" to wake the waker Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2021-11-08 00:01:10 +00:00
Michael Sproul	df02639b71	De-duplicate attestations in the slasher (#2767 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/2112 Closes https://github.com/sigp/lighthouse/issues/1861 ## Proposed Changes Collect attestations by validator index in the slasher, and use the magic of reference counting to automatically discard redundant attestations. This results in us storing only 1-2% of the attestations observed when subscribed to all subnets, which carries over to a 50-100x reduction in data stored 🎉 ## Additional Info There's some nuance to the configuration of the `slot-offset`. It has a profound effect on the effictiveness of de-duplication, see the docs added to the book for an explanation: `5442e695e5/book/src/slasher.md (slot-offset)`	2021-11-08 00:01:09 +00:00
Divma	a683e0296a	Peer manager cfg (#2766 ) ## Issue Addressed I've done this change in a couple of WIPs already so I might as well submit it on its own. This changes no functionality but reduces coupling in a 0.0001%. It also helps new people who need to work in the peer manager to better understand what it actually needs from the outside ## Proposed Changes Add a config to the peer manager	2021-11-03 23:44:44 +00:00
Divma	7502970a7d	Do not compute metrics in the network service if the cli flag is not set (#2765 ) ## Issue Addressed The computation of metrics in the network service can be expensive. This disables the computation unless the cli flag `metrics` is set. ## Additional Info Metrics in other parts of the network are still updated, since most are simple metrics and checking if metrics are enabled each time each metric is updated doesn't seem like a gain.	2021-11-03 00:06:03 +00:00
realbigsean	c4ad0e3fb3	Ensure dependent root consistency in head events (#2753 ) ## Issue Addressed @paulhauner noticed that when we send head events, we use the block root from `new_head` in `fork_choice_internal`, but calculate `dependent_root` and `previous_dependent_root` using the `canonical_head`. This is normally fine because `new_head` updates the `canonical_head` in `fork_choice_internal`, but it's possible we have a reorg updating `canonical_head` before our head events are sent. So this PR ensures `dependent_root` and `previous_dependent_root` are always derived from the state associated with `new_head`. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-11-02 02:26:32 +00:00
Pawan Dhananjay	4499adc7fd	Check proposer index during block production (#2740 ) ## Issue Addressed Resolves #2612 ## Proposed Changes Implements both the checks mentioned in the original issue. 1. Verifies the `randao_reveal` in the beacon node 2. Cross checks the proposer index after getting back the block from the beacon node. ## Additional info The block production time increases by ~10x because of the signature verification on the beacon node (based on the `beacon_block_production_process_seconds` metric) when running on a local testnet.	2021-11-01 07:44:40 +00:00
Michael Sproul	ffb04e1a9e	Add op pool metrics for attestations (#2758 ) ## Proposed Changes Add several metrics for the number of attestations in the op pool. These give us a way to observe the number of valid, non-trivial attestations during block packing rather than just the size of the entire op pool.	2021-11-01 05:52:31 +00:00
Divma	e2c0650d16	Relax late sync committee penalty (#2752 ) ## Issue Addressed Getting too many peers kicked due to slightly late sync committee messages as tested on.. under-performant hardware. ## Proposed Changes Only penalize if the message is more than one slot late. Still ignore the message- Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2021-10-31 22:30:19 +00:00
Age Manning	1790010260	Upgrade to latest libp2p (#2605 ) This is a pre-cursor to the next libp2p upgrade. It is currently being used for staging a number of PR upgrades which are contingent on the latest libp2p.	2021-10-29 01:59:29 +00:00
ethDreamer	2c4413454a	Fixed Gossip Topics on Fork Boundary (#2619 ) ## Issue Addressed The [p2p-interface section of the `altair` spec](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/p2p-interface.md#transitioning-the-gossip) says you should subscribe to the topics for a fork "In advance of the fork" and unsubscribe from old topics `2 Epochs` after the new fork is activated. We've chosen to subscribe to new fork topics `2 slots` before the fork is initiated. This function is supposed to return the required fork digests at any given time but as it was currently written, it doesn't return the fork digest for a previous fork if you've switched to the current fork less than 2 epoch's ago. Also this function required modification for every new fork we add. ## Proposed Changes Make this function fork-agnostic and correctly handle the previous fork topic digests when you've only just switched to the new fork.	2021-10-29 00:05:27 +00:00
Pawan Dhananjay	88063398f6	Prevent double import of blocks (#2647 ) ## Issue Addressed Resolves #2611 ## Proposed Changes Adds a duplicate block root cache to the `BeaconProcessor`. Adds the block root to the cache before calling `process_gossip_block` and `process_rpc_block`. Since `process_rpc_block` is called only for single block lookups, we don't have to worry about batched block imports. The block is imported from the source(gossip/rpc) that arrives first. The block that arrives second is not imported to avoid the db access issue. There are 2 cases: 1. Block that arrives second is from rpc: In this case, we return an optimistic `BlockError::BlockIsAlreadyKnown` to sync. 2. Block that arrives second is from gossip: In this case, we only do gossip verification and forwarding but don't import the block into the the beacon chain. ## Additional info Splits up `process_gossip_block` function to `process_gossip_unverified_block` and `process_gossip_verified_block`.	2021-10-28 03:36:14 +00:00
Michael Sproul	2dc6163043	Add API version headers and `map_fork_name!` (#2745 ) ## Proposed Changes * Add the `Eth-Consensus-Version` header to the HTTP API for the block and state endpoints. This is part of the v2.1.0 API that was recently released: https://github.com/ethereum/beacon-APIs/pull/170 * Add tests for the above. I refactored the `eth2` crate's helper functions to make this more straight-forward, and introduced some new mixin traits that I think greatly improve readability and flexibility. * Add a new `map_with_fork!` macro which is useful for decoding a superstruct type without naming all its variants. It is now used for SSZ-decoding `BeaconBlock` and `BeaconState`, and for JSON-decoding `SignedBeaconBlock` in the API. ## Additional Info The `map_with_fork!` changes will conflict with the Merge changes, but when resolving the conflict the changes from this branch should be preferred (it is no longer necessary to enumerate every fork). The merge fork _will_ need to be added to `map_fork_name_with`.	2021-10-28 01:18:04 +00:00
Mac L	8edd9d45ab	Fix purge-db edge case (#2747 ) ## Issue Addressed Currently, if you launch the beacon node with the `--purge-db` flag and the `beacon` directory exists, but one (or both) of the `chain_db` or `freezer-db` directories are missing, it will error unnecessarily with: ``` Failed to remove chain_db: No such file or directory (os error 2) ``` This is an edge case which can occur in cases of manual intervention (a user deleted the directory) or if you had previously run with the `--purge-db` flag and Lighthouse errored before it could initialize the db directories. ## Proposed Changes Check if the `chain_db`/`freezer_db` exists before attempting to remove them. This prevents unnecessary errors.	2021-10-25 22:11:28 +00:00
Divma	d4819bfd42	Add a waker to the RPC handler (#2721 ) ## Issue Addressed Attempts to fix #2701 but I doubt this is the reason behind that. ## Proposed Changes maintain a waker in the rpc handler and call it if an event is received	2021-10-21 06:14:36 +00:00
Pawan Dhananjay	de34001e78	Update `next_fork_subscriptions` correctly (#2688 ) ## Issue Addressed N/A ## Proposed Changes Update the `next_fork_subscriptions` timer only after a fork happens.	2021-10-21 04:38:44 +00:00
divma	99f7a7db58	remove double backfill sync state (#2733 ) ## Issue Addressed In the backfill sync the state was maintained twice, once locally and also in the globals. This makes it so that it's maintained only once. The only behavioral change is that when backfill sync in paused, the global backfill state is updated. I asked @AgeManning about this and he deemed it a bug, so this solves it.	2021-10-19 22:32:25 +00:00
Michael Sproul	aad397f00a	Resolve Rust 1.56 lints and warnings (#2728 ) ## Issue Addressed When compiling with Rust 1.56.0 the compiler generates 3 instances of this warning: ``` warning: trailing semicolon in macro used in expression position --> common/eth2_network_config/src/lib.rs:181:24 \| 181 \| })?; \| ^ ... 195 \| let deposit_contract_deploy_block = load_from_file!(DEPLOY_BLOCK_FILE); \| ---------------------------------- in this macro invocation \| = note: `#[warn(semicolon_in_expressions_from_macros)]` on by default = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release! = note: for more information, see issue #79813 <https://github.com/rust-lang/rust/issues/79813> = note: this warning originates in the macro `load_from_file` (in Nightly builds, run with -Z macro-backtrace for more info) ``` This warning is completely harmless, but will be visible to users compiling Lighthouse v2.0.1 (or earlier) with Rust 1.56.0 (to be released October 21st). It is completely safe to ignore this warning, it's just a superficial change to Rust's syntax. ## Proposed Changes This PR removes the semi-colon as recommended, and fixes the new Clippy lints from 1.56.0	2021-10-19 00:30:42 +00:00
Akihito Nakano	efec60ee90	Tiny fix: wrong log level (#2720 ) ## Proposed Changes If the `RemoveChain` is critical log level should be crit. 🙂	2021-10-19 00:30:41 +00:00
Michael Sproul	d2e3d4c6f1	Add flag to disable lock timeouts (#2714 ) ## Issue Addressed Mitigates #1096 ## Proposed Changes Add a flag to the beacon node called `--disable-lock-timeouts` which allows opting out of lock timeouts. The lock timeouts serve a dual purpose: 1. They prevent any single operation from hogging the lock for too long. When a timeout occurs it logs a nasty error which indicates that there's suboptimal lock use occurring, which we can then act on. 2. They allow deadlock detection. We're fairly sure there are no deadlocks left in Lighthouse anymore but the timeout locks offer a safeguard against that. However, timeouts on locks are not without downsides: They allow for the possibility of livelock, particularly on slower hardware. If lock timeouts keep failing spuriously the node can be prevented from making any progress, even if it would be able to make progress slowly without the timeout. One particularly concerning scenario which could occur would be if a DoS attack succeeded in slowing block signature verification times across the network, and all Lighthouse nodes got livelocked because they timed out repeatedly. This could also occur on just a subset of nodes (e.g. dual core VPSs or Raspberri Pis). By making the behaviour runtime configurable this PR allows us to choose the behaviour we want depending on circumstance. I suspect that long term we could make the timeout-free approach the default (#2381 moves in this direction) and just enable the timeouts on our testnet nodes for debugging purposes. This PR conservatively leaves the default as-is so we can gain some more experience before switching the default.	2021-10-19 00:30:40 +00:00
Age Manning	df40700ddd	Rename eth2_libp2p to lighthouse_network (#2702 ) ## Description The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration. As of this writing it currently houses the following high-level lighthouse-specific logic: - Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings - Integration and handling of ENRs with respect to libp2p and eth2 - Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers. - Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc. - Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api. - Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks. - Lighthouse specific types for managing gossipsub topics, sync status and ENR fields - Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring - Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p. Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network` Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-10-19 00:30:39 +00:00
Paul Hauner	fff01b24dd	Release v2.0.1 (#2726 ) ## Issue Addressed NA ## Proposed Changes - Update versions to `v2.0.1` in anticipation for a release early next week. - Add `--ignore` to `cargo audit`. See #2727. ## Additional Info NA	2021-10-18 03:08:32 +00:00
Age Manning	180c90bf6d	Correct peer connection transition logic (#2725 ) ## Description This PR updates the peer connection transition logic. It is acceptable for a peer to immediately transition from a disconnected state to a disconnecting state. This can occur when we are at our peer limit and a new peer's dial us.	2021-10-17 04:04:36 +00:00
Paul Hauner	a7b675460d	Add Altair tests to op pool (#2723 ) ## Issue Addressed NA ## Proposed Changes Adds some more testing for Altair to the op pool. Credits to @michaelsproul for some appropriated efforts here. ## Additional Info NA Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-10-16 05:07:23 +00:00
Michael Sproul	5cde3fc4da	Reduce lock contention in backfill sync (#2716 ) ## Proposed Changes Clone the proposer pubkeys during backfill signature verification to reduce the time that the pubkey cache lock is held for. Cloning such a small number of pubkeys has negligible impact on the total running time, but greatly reduces lock contention. On a Ryzen 5950X, the setup step seems to take around 180us regardless of whether the key is cloned or not, while the verification takes 7ms. When Lighthouse is limited to 10% of one core using `sudo cpulimit --pid <pid> --limit 10` the total time jumps up to 800ms, but the setup step remains only 250us. This means that under heavy load this PR could cut the time the lock is held for from 800ms to 250us, which is a huge saving of 99.97%!	2021-10-15 03:28:03 +00:00
Paul Hauner	9c5a8ab7f2	Change "too many resources" to "insufficient resources" in eth2_libp2p (#2713 ) ## Issue Addressed NA ## Proposed Changes Fixes what I assume is a typo in a log message. See the diff for details. ## Additional Info NA	2021-10-15 00:07:12 +00:00
Age Manning	05040e68ec	Update discovery (#2711 ) ## Issue Addressed #2695 ## Proposed Changes This updates discovery to the latest version which has patched a panic that occurred due to a race condition in the bucket logic.	2021-10-14 22:09:38 +00:00
Paul Hauner	e2d09bb8ac	Add `BeaconChainHarness::builder` (#2707 ) ## Issue Addressed NA ## Proposed Changes This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning. I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs. ## Additional Info NA	2021-10-14 02:58:10 +00:00
Pawan Dhananjay	34d22b5920	Reduce validator monitor logging verbosity (#2606 ) ## Issue Addressed Resolves #2541 ## Proposed Changes Reduces verbosity of validator monitor per epoch logging by batching info logs for multiple validators. Instead of a log for every validator managed by the validator monitor, we now batch logs for attestation records for previous epoch. Before: ```log Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 1, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 2, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 3, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 4, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 5, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match head validator: 5, epoch: 65875, service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 6, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match head validator: 6, epoch: 65875, service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 7, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match target validator: 7, epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay validator: 7, epoch: 65875, optimal: 1, delay: 2, service: val_mon Sep 20 06:53:08.239 INFO Previous epoch attestation success validator: 8, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon Sep 20 06:53:08.239 WARN Attestation failed to match target validator: 8, epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay validator: 8, epoch: 65875, optimal: 1, delay: 2, service: val_mon Sep 20 06:53:08.239 ERRO Previous epoch attestation missing validator: 9, epoch: 65875, service: val_mon Sep 20 06:53:08.239 ERRO Previous epoch attestation missing validator: 10, epoch: 65875, service: val_mon ``` after ``` Sep 20 06:53:08.239 INFO Previous epoch attestation success validators: [1,2,3,4,5,6,7,8,9] , epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match head, validators: [5,6], epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match target, validators: [7,8], epoch: 65875, service: val_mon Sep 20 06:53:08.239 WARN Previous epoch attestations had sub-optimal inclusion delay, validators: [7,8], epoch: 65875, service: val_mon Sep 20 06:53:08.239 ERRO Previous epoch attestation missing validators: [9,10], epoch: 65875, service: val_mon ``` The detailed individual logs are downgraded to debug logs.	2021-10-12 05:06:48 +00:00
Mac L	a73d698e30	Add TLS capability to the beacon node HTTP API (#2668 ) Currently, the beacon node has no ability to serve the HTTP API over TLS. Adding this functionality would be helpful for certain use cases, such as when you need a validator client to connect to a backup beacon node which is outside your local network, and the use of an SSH tunnel or reverse proxy would be inappropriate. ## Proposed Changes - Add three new CLI flags to the beacon node - `--http-enable-tls`: enables TLS - `--http-tls-cert`: to specify the path to the certificate file - `--http-tls-key`: to specify the path to the key file - Update the HTTP API to optionally use `warp`'s [`TlsServer`](https://docs.rs/warp/0.3.1/warp/struct.TlsServer.html) depending on the presence of the `--http-enable-tls` flag - Update tests and docs - Use a custom branch for `warp` to ensure proper error handling ## Additional Info Serving the API over TLS should currently be considered experimental. The reason for this is that it uses code from an [unmerged PR](https://github.com/seanmonstar/warp/pull/717). This commit provides the `try_bind_with_graceful_shutdown` method to `warp`, which is helpful for controlling error flow when the TLS configuration is invalid (cert/key files don't exist, incorrect permissions, etc). I've implemented the same code in my [branch here](https://github.com/macladson/warp/tree/tls). Once the code has been reviewed and merged upstream into `warp`, we can remove the dependency on my branch and the feature can be considered more stable. Currently, the private key file must not be password-protected in order to be read into Lighthouse.	2021-10-12 03:35:49 +00:00
Age Manning	0aee7ec873	Refactor Peerdb and PeerManager (#2660 ) ## Proposed Changes This is a refactor of the PeerDB and PeerManager. A number of bugs have been surfacing around the connection state of peers and their interaction with the score state. This refactor tightens the mutability properties of peers such that only specific modules are able to modify the state of peer information preventing inadvertant state changes that can lead to our local peer manager db being out of sync with libp2p. Further, the logic around connection and scoring was quite convoluted and the distinction between the PeerManager and Peerdb was not well defined. Although these issues are not fully resolved, this PR is step to cleaning up this logic. The peerdb solely manages most mutability operations of peers leaving high-order logic to the peer manager. A single `update_connection_state()` function has been added to the peer-db making it solely responsible for modifying the peer's connection state. The way the peer's scores can be modified have been reduced to three simple functions (`update_scores()`, `update_gossipsub_scores()` and `report_peer()`). This prevents any add-hoc modifications of scores and only natural processes of score modification is allowed which simplifies the reasoning of score and state changes.	2021-10-11 02:45:06 +00:00
Michael Sproul	708557a473	Fix cargo audit warns for nix, psutil, time (#2699 ) ## Issue Addressed Fix `cargo audit` failures on `unstable` Closes #2698 ## Proposed Changes The main culprit is `nix`, which is vulnerable for versions below v0.23.0. We can't get by with a straight-forward `cargo update` because `psutil` depends on an old version of `nix` (cf. https://github.com/rust-psutil/rust-psutil/pull/93). Hence I've temporarily forked `psutil` under the `sigp` org, where I've included the update to `nix` v0.23.0. Additionally, I took the chance to update the `time` dependency to v0.3, which removed a bunch of stale deps including `stdweb` which is no longer maintained. Lighthouse only uses the `time` crate in the notifier to do some pretty printing, and so wasn't affected by any of the breaking changes in v0.3 ([changelog here](https://github.com/time-rs/time/blob/main/CHANGELOG.md#030-2021-07-30)).	2021-10-11 00:10:35 +00:00
Pawan Dhananjay	7c7ba770de	Update broken api links (#2665 ) ## Issue Addressed Resolves #2563 Replacement for #2653 as I'm not able to reopen that PR after force pushing. ## Proposed Changes Fixes all broken api links. Cherry picked changes in #2590 and updated a few more links. Co-authored-by: Mason Stallmo <masonstallmo@gmail.com>	2021-10-06 00:46:09 +00:00
Pawan Dhananjay	73ec29c267	Don't log errors on resubscription of gossip topics (#2613 ) ## Issue Addressed Resolves #2555 ## Proposed Changes Don't log errors on resubscribing to topics. Also don't log errors if we are setting already set attnet/syncnet bits.	2021-10-06 00:46:08 +00:00
Wink Saville	58870fc6d3	Add test_logger as feature to logging (#2586 ) ## Issue Addressed Fix #2585 ## Proposed Changes Provide a canonical version of test_logger that can be used throughout lighthouse. ## Additional Info This allows tests to conditionally emit logging data by adding test_logger as the default logger. And then when executing `cargo test --features logging/test_logger` log output will be visible: wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging) $ cargo test --features logging/test_logger Finished test [unoptimized + debuginfo] target(s) in 0.02s Running unittests (target/debug/deps/test_logger-e20115db6a5e3714) running 1 test Sep 10 12:53:45.212 INFO hi, module: test_logger:8 test tests::test_fn_with_logging ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Doc-tests test-logger running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Or, in normal scenarios where logging isn't needed, executing `cargo test` the log output will not be visible: wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging) $ cargo test Finished test [unoptimized + debuginfo] target(s) in 0.02s Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a) running 1 test test tests::test_fn_with_logging ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s Doc-tests test-logger running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s	2021-10-06 00:46:07 +00:00
Michael Sproul	7c88f582d9	Release v2.0.0 (#2673 ) ## Proposed Changes * Bump version to v2.0.0 * Update dependencies (obsoletes #2670). `tokio-macros` v1.4.0 had been yanked due to a bug.	2021-10-05 03:53:18 +00:00
Michael Sproul	ed1fc7cca6	Fix I/O atomicity issues with checkpoint sync (#2671 ) ## Issue Addressed This PR addresses an issue found by @YorickDowne during testing of v2.0.0-rc.0. Due to a lack of atomic database writes on checkpoint sync start-up, it was possible for the database to get into an inconsistent state from which it couldn't recover without `--purge-db`. The core of the issue was that the store's anchor info was being stored _before_ the `PersistedBeaconChain`. If a crash occured so that anchor info was stored but _not_ the `PersistedBeaconChain`, then on restart Lighthouse would think the database was unitialized and attempt to compare-and-swap a `None` value, but would actually find the stale info from the previous run. ## Proposed Changes The issue is fixed by writing the anchor info, the split point, and the `PersistedBeaconChain` atomically on start-up. Some type-hinting ugliness was required, which could possibly be cleaned up in future refactors.	2021-10-05 03:53:17 +00:00
Kane Wallmann	28b79084cd	Fix chain_id value in config/deposit_contract RPC method (#2659 ) ## Issue Addressed This PR addresses issue #2657 ## Proposed Changes Changes `/eth/v1/config/deposit_contract` endpoint to return the chain ID from the loaded chain spec instead of eth1::DEFAULT_NETWORK_ID which is the Goerli chain ID of 5. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-10-01 06:32:38 +00:00
Michael Sproul	ea78315749	Release v2.0.0-rc.0 (#2634 ) ## Proposed Changes Cut the first release candidate for v2.0.0, in preparation for testing and release this week ## Additional Info Builds on #2632, which should either be merged first or in the same batch	2021-10-01 01:23:55 +00:00
Age Manning	29a8865d07	Consistent tracking of disconnected peers (#2650 ) ## Issue Addressed N/A ## Proposed Changes When peers switching to a disconnecting state, decrement the disconnected peers counter. This also downgrades some crit logs to errors. I've also added a re-sync point when peers get unbanned the disconnected peer count will match back to the number of disconnected peers if it has gone out of sync previously.	2021-09-30 04:31:43 +00:00
Squirrel	db4d72c4f1	Remove unused deps (#2592 ) Found some deps you're possibly not using. Please shout if you think they are indeed still needed.	2021-09-30 04:31:42 +00:00
Mac L	4c510f8f6b	Add `BlockTimesCache` to allow additional block delay metrics (#2546 ) ## Issue Addressed Closes #2528 ## Proposed Changes - Add `BlockTimesCache` to provide block timing information to `BeaconChain`. This allows additional metrics to be calculated for blocks that are set as head too late. - Thread the `seen_timestamp` of blocks received from RPC responses (except blocks from syncing) through to the sync manager, similar to what is done for blocks from gossip. ## Additional Info This provides the following additional metrics: - `BEACON_BLOCK_OBSERVED_SLOT_START_DELAY_TIME` - The delay between the start of the slot and when the block was first observed. - `BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME` - The delay between when the block was first observed and when the block was imported. - `BEACON_BLOCK_HEAD_IMPORTED_DELAY_TIME` - The delay between when the block was imported and when the block was set as head. The metric `BEACON_BLOCK_IMPORTED_SLOT_START_DELAY_TIME` was removed. A log is produced when a block is set as head too late, e.g.: ``` Aug 27 03:46:39.006 DEBG Delayed head block set_as_head_delay: Some(21.731066ms), imported_delay: Some(119.929934ms), observed_delay: Some(3.864596988s), block_delay: 4.006257988s, slot: 1931331, proposer_index: 24294, block_root: 0x937602c89d3143afa89088a44bdf4b4d0d760dad082abacb229495c048648a9e, service: beacon ```	2021-09-30 04:31:41 +00:00
Pawan Dhananjay	70441aa554	Improve valmon inclusion delay calculation (#2618 ) ## Issue Addressed Resolves #2552 ## Proposed Changes Offers some improvement in inclusion distance calculation in the validator monitor. When registering an attestation from a block, instead of doing `block.slot() - attesstation.data.slot()` to get the inclusion distance, we now pass the parent block slot from the beacon chain and do `parent_slot.saturating_sub(attestation.data.slot())`. This allows us to give best effort inclusion distance in scenarios where the attestation was included right after a skip slot. Note that this does not give accurate results in scenarios where the attestation was included few blocks after the skip slot. In this case, if the attestation slot was `b1` and was included in block `b2` with a skip slot in between, we would get the inclusion delay as 0 (by ignoring the skip slot) which is the best effort inclusion delay. ``` b1 <- missed <- b2 ``` Here, if the attestation slot was `b1` and was included in block `b3` with a skip slot and valid block `b2` in between, then we would get the inclusion delay as 2 instead of 1 (by ignoring the skip slot). ``` b1 <- missed <- b2 <- b3 ``` A solution for the scenario 2 would be to count number of slots between included slot and attestation slot ignoring the skip slots in the beacon chain and pass the value to the validator monitor. But I'm concerned that it could potentially lead to db accesses for older blocks in extreme cases. This PR also uses the validator monitor data for logging per epoch inclusion distance. This is useful as we won't get inclusion data in post-altair summaries. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2021-09-30 01:22:43 +00:00
realbigsean	7d13e57d9f	Add interop metrics (#2645 ) ## Issue Addressed Resolves: #2644 ## Proposed Changes - Adds mandatory metrics mentioned here: https://github.com/ethereum/beacon-metrics/blob/master/metrics.md#interop-metrics ## Additional Info Couldn't figure out how to alias metrics, so I created them all as new gauges/counters. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-09-29 23:44:24 +00:00

1 2 3 4 5 ...

1692 Commits