lighthouse

Author	SHA1	Message	Date
Divma	36fc887a40	Gossip cache timeout adjustments (#2997 ) ## Proposed Changes - Do not retry to publish sync committee messages. - Give a more lenient timeout to slashings and exits	2022-02-07 23:25:06 +00:00
Age Manning	675c7b7e26	Correct a dial race condition (#2992 ) ## Issue Addressed On a network with few nodes, it is possible that the same node can be found from a subnet discovery and a normal peer discovery at the same time. The network behaviour loads these peers into events and processes them when it has the chance. It can happen that the same peer can enter the event queue more than once and then attempt to be dialed twice. This PR shifts the registration of nodes in the peerdb as being dialed before they enter the NetworkBehaviour queue, preventing multiple attempts of the same peer being entered into the queue and avoiding the race condition.	2022-02-07 23:25:05 +00:00
Divma	48b7c8685b	upgrade libp2p (#2933 ) ## Issue Addressed Upgrades libp2p to v.0.42.0 pre release (https://github.com/libp2p/rust-libp2p/pull/2440)	2022-02-07 23:25:03 +00:00
Divma	615695776e	Retry gossipsub messages when insufficient peers (#2964 ) ## Issue Addressed #2947 ## Proposed Changes Store messages that fail to be published due to insufficient peers for retry later. Messages expire after half an epoch and are retried if gossipsub informs us that an useful peer has connected. Currently running in Atlanta ## Additional Info If on retry sending the messages fails they will not be tried again	2022-02-03 01:12:30 +00:00
Mac L	286996b090	Fix small typo in error log (#2975 ) ## Proposed Changes Fixes a small typo I came across.	2022-01-31 22:55:07 +00:00
Age Manning	bdd70d7aef	Reduce gossip history (#2969 ) The gossipsub history was increased to a good portion of a slot from 2.1 seconds in the last release. Although it shouldn't cause too much issue, it could be related to recieving later messages than usual and interacting with our scoring system penalizing peers. For consistency, this PR reduces the time we gossip messages back to the same values of the previous release. It also adjusts the gossipsub heartbeat time for testing purposes with a developer flag but this should not effect end users.	2022-01-31 07:29:41 +00:00
Age Manning	ca29b580a2	Increase target subnet peers (#2948 ) In the latest release we decreased the target number of subnet peers. It appears this could be causing issues in some cases and so reverting it back to the previous number it wise. A larger PR that follows this will address some other related discovery issues and peer management around subnet peer discovery.	2022-01-24 12:08:00 +00:00
Age Manning	fc7a1a7dc7	Allow disconnected states to introduce new peers without warning (#2922 ) ## Issue Addressed We emit a warning to verify that all peer connection state information is consistent. A warning is given under one edge case; We try to dial a peer with peer-id X and multiaddr Y. The peer responds to multiaddr Y with a different peer-id, Z. The dialing to the peer fails, but libp2p injects the failed attempt as peer-id Z. In this instance, our PeerDB tries to add a new peer in the disconnected state under a previously unknown peer-id. This is harmless and so this PR permits this behaviour without logging a warning.	2022-01-20 09:14:25 +00:00
Michael Sproul	ef7351ddfe	Update to spec v1.1.8 (#2893 ) ## Proposed Changes Change the canonical fork name for the merge to Bellatrix. Keep other merge naming the same to avoid churn. I've also fixed and enabled the `fork` and `transition` tests for Bellatrix, and the v1.1.7 fork choice tests. Additionally, the `BellatrixPreset` has been added with tests. It gets served via the `/config/spec` API endpoint along with the other presets.	2022-01-19 00:24:19 +00:00
Age Manning	1c667ad3ca	PeerDB Status unknown bug fix (#2907 ) ## Issue Addressed The PeerDB was getting out of sync with the number of disconnected peers compared to the actual count. As this value determines how many we store in our cache, over time the cache was depleting and we were removing peers immediately resulting in errors that manifest as unknown peers for some operations. The error occurs when dialing a peer fails, we were not correctly updating the peerdb counter because the increment to the counter was placed in the wrong order and was therefore not incrementing the count. This PR corrects this.	2022-01-14 05:42:48 +00:00
Age Manning	6f4102aab6	Network performance tuning (#2608 ) There is a pretty significant tradeoff between bandwidth and speed of gossipsub messages. We can reduce our bandwidth usage considerably at the cost of minimally delaying gossipsub messages. The impact of delaying messages has not been analyzed thoroughly yet, however this PR in conjunction with some gossipsub updates show considerable bandwidth reduction. This PR allows the user to set a CLI value (`network-load`) which is an integer in the range of 1 of 5 depending on their bandwidth appetite. 1 represents the least bandwidth but slowest message recieving and 5 represents the most bandwidth and fastest received message time. For low-bandwidth users it is likely to be more efficient to use a lower value. The default is set to 3, which currently represents a reduced bandwidth usage compared to previous version of this PR. The previous lighthouse versions are equivalent to setting the `network-load` CLI to 4. This PR is awaiting a few gossipsub updates before we can get it into lighthouse.	2022-01-14 05:42:47 +00:00
Paul Hauner	aaa5344eab	Add peer score adjustment msgs (#2901 ) ## Issue Addressed N/A ## Proposed Changes This PR adds the `msg` field to `Peer score adjusted` log messages. These `msg` fields help identify why a peer was banned. Example: ``` Jan 11 04:18:48.096 DEBG Peer score adjusted score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -27.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -28.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -29.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p ``` There is also a `libp2p_report_peer_msgs_total` metrics which allows us to see count of reports per `msg` tag. ## Additional Info NA	2022-01-12 05:32:14 +00:00
Michael Sproul	fac117667b	Update to superstruct v0.4.1 (#2886 ) ## Proposed Changes Update `superstruct` to bring in @realbigsean's fixes necessary for MEV-compatible private beacon block types (a la #2795). The refactoring is due to another change in superstruct that allows partial getters to be auto-generated.	2022-01-06 03:14:58 +00:00
Age Manning	81c667b58e	Additional networking metrics (#2549 ) Adds additional metrics for network monitoring and evaluation. Co-authored-by: Mark Mackey <mark@sigmaprime.io>	2021-12-22 06:17:14 +00:00
Divma	56d596ee42	Unban peers at the swarm level when purged (#2855 ) ## Issue Addressed #2840	2021-12-20 23:45:21 +00:00
Divma	eee0260a68	do not count dialing peers in the connection limit (#2856 ) ## Issue Addressed #2841 ## Proposed Changes Not counting dialing peers while deciding if we have reached the target peers in case of outbound peers. ## Additional Info Checked this running in nodes and bandwidth looks normal, peer count looks normal too	2021-12-15 05:48:45 +00:00
realbigsean	b22ac95d7f	v1.1.6 Fork Choice changes (#2822 ) ## Issue Addressed Resolves: https://github.com/sigp/lighthouse/issues/2741 Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller ## Proposed Changes - Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration. - Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db. - updates related to https://github.com/ethereum/consensus-specs/pull/2727 - We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one. - AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that - I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting. - This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730 ## Additional Info I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727 It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch. Todo: - [x] Fix fork choice tests - [x] Self review - [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727 - [x] Rebase onto Kintusgi - [x] Fix `num_active_validators` calculation as @michaelsproul pointed out - [x] Clean up db migrations Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-12-13 20:43:22 +00:00
Lion - dapplion	2984f4b474	Remove wrong duplicated comment (#2751 ) ## Issue Addressed Remove wrong duplicated comment. Comment was copied from ban_peer() but doesn't apply to unban_peer()	2021-12-06 05:34:15 +00:00
Pawan Dhananjay	f3c237cfa0	Restrict network limits based on merge fork epoch (#2839 )	2021-12-02 14:32:31 +11:00
Paul Hauner	ab86b42874	Kintsugi Diva comments (#2836 ) * Remove TODOs * Fix typo	2021-12-02 14:29:59 +11:00
Paul Hauner	82a81524e3	Bump crate versions (#2829 )	2021-12-02 14:29:57 +11:00
pawan	44a7b37ce3	Increase network limits (#2796 ) Fix max packet sizes Fix max_payload_size function Add merge block test Fix max size calculation; fix up test Clear comments Add a payload_size_function Use safe arith for payload calculation Return an error if block too big in block production Separate test to check if block is over limit	2021-12-02 14:29:20 +11:00
Mark Mackey	5687c56d51	Initial merge changes Added Execution Payload from Rayonism Fork Updated new Containers to match Merge Spec Updated BeaconBlockBody for Merge Spec Completed updating BeaconState and BeaconBlockBody Modified ExecutionPayload<T> to use Transaction<T> Mostly Finished Changes for beacon-chain.md Added some things for fork-choice.md Update to match new fork-choice.md/fork.md changes ran cargo fmt Added Missing Pieces in eth2_libp2p for Merge fix ef test Various Changes to Conform Closer to Merge Spec	2021-12-02 14:26:50 +11:00
Age Manning	6625aa4afe	Status'd Peer Not Found (#2761 ) ## Issue Addressed Users are experiencing `Status'd peer not found` errors ## Proposed Changes Although I cannot reproduce this error, this is only one connection state change that is not addressed in the peer manager (that I could see). The error occurs because the number of disconnected peers in the peerdb becomes out of sync with the actual number of disconnected peers. From what I can tell almost all possible connection state changes are handled, except for the case when a disconnected peer changes to be disconnecting. This can potentially happen at the peer connection limit, where a previously connected peer switches to disconnecting. This PR decrements the disconnected counter when this event occurs and from what I can tell, covers all possible disconnection state changes in the peer manager.	2021-11-28 22:46:17 +00:00
Pawan Dhananjay	9eedb6b888	Allow additional subnet peers (#2823 ) ## Issue Addressed N/A ## Proposed Changes 1. Don't disconnect peer from dht on connection limit errors 2. Bump up `PRIORITY_PEER_EXCESS` to allow for dialing upto 60 peers by default. Co-authored-by: Diva M <divma@protonmail.com>	2021-11-25 21:27:08 +00:00
Michael Sproul	2c07a72980	Revert peer DB changes from #2724 (#2828 ) ## Proposed Changes This reverts commit `53562010ec` from PR #2724 Hopefully this will restore the reliability of the sync simulator.	2021-11-25 03:45:52 +00:00
Age Manning	0b319d4926	Inform dialing via the behaviour (#2814 ) I had this change but it seems to have been lost in chaos of network upgrades. The swarm dialing event seems to miss some cases where we dial via the behaviour. This causes an error to be logged as the peer manager doesn't know about some dialing events. This shifts the logic to the behaviour to inform the peer manager.	2021-11-19 04:42:33 +00:00
Divma	53562010ec	Move peer db writes to eth2 libp2p (#2724 ) ## Issue Addressed Part of a bigger effort to make the network globals read only. This moves all writes to the `PeerDB` to the `eth2_libp2p` crate. Limiting writes to the peer manager is a slightly more complicated issue for a next PR, to keep things reviewable. ## Proposed Changes - Make the peers field in the globals a private field. - Allow mutable access to the peers field to `eth2_libp2p` for now. - Add a new network message to update the sync state. Co-authored-by: Age Manning <Age@AgeManning.com>	2021-11-19 04:42:31 +00:00
Age Manning	e519af9012	Update Lighthouse Dependencies (#2818 ) ## Issue Addressed Updates lighthouse dependencies to resolve audit issues in out-dated deps.	2021-11-18 05:08:42 +00:00
Pawan Dhananjay	e32c09bfda	Fix decoding max length (#2816 ) ## Issue Addressed N/A ## Proposed Changes Fix encoder max length to the correct value (`MAX_RPC_SIZE`).	2021-11-16 22:23:39 +00:00
Age Manning	a43a2448b7	Investigate and correct RPC Response Timeouts (#2804 ) RPC Responses are for some reason not removing their timeout when they are completing. As an example: ``` Nov 09 01:18:20.256 DEBG Received BlocksByRange Request step: 1, start_slot: 728465, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:20.263 DEBG Received BlocksByRange Request step: 1, start_slot: 728593, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:20.483 DEBG BlocksByRange Response sent returned: 63, requested: 64, current_slot: 2466389, start_slot: 728465, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:20.500 DEBG BlocksByRange Response sent returned: 64, requested: 64, current_slot: 2466389, start_slot: 728593, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:21.068 DEBG Received BlocksByRange Request step: 1, start_slot: 728529, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:21.272 DEBG BlocksByRange Response sent returned: 63, requested: 64, current_slot: 2466389, start_slot: 728529, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:23.434 DEBG Received BlocksByRange Request step: 1, start_slot: 728657, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:23.665 DEBG BlocksByRange Response sent returned: 64, requested: 64, current_slot: 2466390, start_slot: 728657, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:25.851 DEBG Received BlocksByRange Request step: 1, start_slot: 728337, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:25.851 DEBG Received BlocksByRange Request step: 1, start_slot: 728401, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:26.094 DEBG BlocksByRange Response sent returned: 62, requested: 64, current_slot: 2466390, start_slot: 728401, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:26.100 DEBG BlocksByRange Response sent returned: 63, requested: 64, current_slot: 2466390, start_slot: 728337, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw Nov 09 01:18:31.070 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:31.070 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:31.085 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:31.085 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:31.459 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:31.459 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:34.129 DEBG RPC Error direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p Nov 09 01:18:34.130 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p Nov 09 01:18:35.686 DEBG Peer Manager disconnecting peer reason: Too many peers, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, service: libp2p ``` This PR is to investigate and correct the issue. ~~My current thoughts are that for some reason we are not closing the streams correctly, or fast enough, or the executor is not registering the closes and waking up.~~ - Pretty sure this is not the case, see message below for a more accurate reason. ~~I've currently added a timeout to stream closures in an attempt to force streams to close and the future to always complete.~~ I removed this	2021-11-16 03:42:25 +00:00
Divma	fbafe416d1	Move the peer manager to be a behaviour (#2773 ) This simply moves some functions that were "swarm notifications" to a network behaviour implementation. Notes ------ - We could disconnect from the peer manager but we would lose the rpc shutdown message - We still notify from the swarm since this is the most reliable way to get some events. Ugly but best for now - Events need to be pushed with "add event" to wake the waker Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2021-11-08 00:01:10 +00:00
Divma	a683e0296a	Peer manager cfg (#2766 ) ## Issue Addressed I've done this change in a couple of WIPs already so I might as well submit it on its own. This changes no functionality but reduces coupling in a 0.0001%. It also helps new people who need to work in the peer manager to better understand what it actually needs from the outside ## Proposed Changes Add a config to the peer manager	2021-11-03 23:44:44 +00:00
Divma	7502970a7d	Do not compute metrics in the network service if the cli flag is not set (#2765 ) ## Issue Addressed The computation of metrics in the network service can be expensive. This disables the computation unless the cli flag `metrics` is set. ## Additional Info Metrics in other parts of the network are still updated, since most are simple metrics and checking if metrics are enabled each time each metric is updated doesn't seem like a gain.	2021-11-03 00:06:03 +00:00
Age Manning	1790010260	Upgrade to latest libp2p (#2605 ) This is a pre-cursor to the next libp2p upgrade. It is currently being used for staging a number of PR upgrades which are contingent on the latest libp2p.	2021-10-29 01:59:29 +00:00
Divma	d4819bfd42	Add a waker to the RPC handler (#2721 ) ## Issue Addressed Attempts to fix #2701 but I doubt this is the reason behind that. ## Proposed Changes maintain a waker in the rpc handler and call it if an event is received	2021-10-21 06:14:36 +00:00
Age Manning	df40700ddd	Rename eth2_libp2p to lighthouse_network (#2702 ) ## Description The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration. As of this writing it currently houses the following high-level lighthouse-specific logic: - Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings - Integration and handling of ENRs with respect to libp2p and eth2 - Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers. - Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc. - Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api. - Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks. - Lighthouse specific types for managing gossipsub topics, sync status and ENR fields - Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring - Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p. Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network` Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-10-19 00:30:39 +00:00

37 Commits