Commit Graph

144 Commits

Author SHA1 Message Date
Paul Hauner
90ea075c62 Revert "Network protocol upgrades (#2345)" (#2388)
## Issue Addressed

NA

## Proposed Changes

Reverts #2345 in the interests of getting v1.4.0 out this week. Once we have released that, we can go back to testing this again.

## Additional Info

NA
2021-06-02 01:07:28 +00:00
Kevin Lu
320a683e72 Minimum Outbound-Only Peers Requirement (#2356)
## Issue Addressed

#2325 

## Proposed Changes

This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. 

## Additional Info

Unit test for heartbeat function tests that disconnection behavior is correct. Continual querying for peers if outbound-only hasn't been met is not directly tested, but indirectly through unit testing of the helper function that counts the number of outbound-only peers.

EDIT: Am concerned about the behavior of ```update_peer_scores```. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. 

```rust
let previous_state = info.score_state();            
// Update scores            
info.score_update();
Self::handle_score_transitions(                
               previous_state,
                peer_id,
                info, 
               &mut to_ban_peers,
               &mut to_unban_peers,
               &mut self.events,
               &self.log,
);
```

```previous_state``` will be set to Disconnected, and then because ```handle_score_transitions``` only changes connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure realistically how often this scenario would occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. 

Co-authored-by: Kevin Lu <kevlu93@gmail.com>
2021-05-31 04:18:19 +00:00
Age Manning
ec5cceba50 Correct issue with dialing peers (#2375)
The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update.

This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial.

This makes #2374 obsolete.
2021-05-29 07:25:06 +00:00
Age Manning
d12e746b50 Network protocol upgrades (#2345)
This provides a number of upgrades to gossipsub and discovery. 

The updates are extensive and this needs thorough testing.
2021-05-28 22:02:10 +00:00
Age Manning
55aada006f
More stringent dialing (#2363)
* More stringent dialing

* Cover cached enr dialing
2021-05-26 14:21:44 +10:00
ethDreamer
cb47388ad7 Updated to comply with new clippy formatting rules (#2336)
## Issue Addressed

The latest version of Rust has new clippy rules & the codebase isn't up to date with them.

## Proposed Changes

Small formatting changes that clippy tells me are functionally equivalent
2021-05-10 00:53:09 +00:00
ethDreamer
0aa8509525 Filter Disconnected Peers from Discv5 DHT (#2219)
## Issue Addressed
#2107

## Proposed Change
The peer manager will mark peers as disconnected in the discv5 DHT when they disconnect or dial fails

## Additional Info
Rationale for this particular change is explained in my comment on #2107
2021-04-28 04:07:37 +00:00
Pawan Dhananjay
95a362213d Fix local testnet scripts (#2229)
## Issue Addressed

Resolves #2094 

## Proposed Changes

Fixes scripts for creating local testnets. Adds an option in `lighthouse boot_node` to run with a previously generated enr.
2021-03-30 05:17:58 +00:00
Michael Sproul
f9d60f5436 VC: accept unknown fields in chain spec (#2277)
## Issue Addressed

Closes #2274

## Proposed Changes

* Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard.
* Log a debug message if there are extra fields returned to the VC from one of its BNs.

This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)
2021-03-26 04:53:57 +00:00
Paul Hauner
8e5c20b6d1 Update for clippy 1.50 (#2193)
## Issue Addressed

NA

## Proposed Changes

Rust 1.50 has landed 🎉

The shiny new `clippy` peers down upon us mere mortals with disgust. Brutish peasants wrapping our `usize`s in superfluous `Option`s... tsk tsk.

I've performed the goat sacrifice and corrected our evil ways in this PR. Tonight we shall pray that Github Actions bestows the almighty green tick upon us.

## Additional Info

NA


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-02-15 00:09:12 +00:00
realbigsean
e20f64b21a Update to tokio 1.1 (#2172)
## Issue Addressed

resolves #2129
resolves #2099 
addresses some of #1712
unblocks #2076
unblocks #2153 

## Proposed Changes

- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. 

- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.

- We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue`  --> PR in discv5:  https://github.com/sigp/discv5/pull/58

## Additional Info

tokio 1.0 changes that required some changes in lighthouse:

- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028
- `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350
- stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870
- I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-10 23:29:49 +00:00
Akihito Nakano
1a22a096c6 Fix clippy errors on tests (#2160)
## Issue Addressed

There are some clippy error on tests.


## Proposed Changes

Enable clippy check on tests and fix the errors. 💪
2021-01-28 23:31:06 +00:00
Paul Hauner
1eb0915301 Fix bug from #2163 (#2165)
## Issue Addressed

NA

## Proposed Changes

Fixes a bug that I missed during a review in #2163. I found this bug by observing that nodes were receiving far less attestations (~1/2 of previous).

I'm not certain on *exactly* how this mistake manifested in a reduction in attestations, but the mistake touches so much code that I think it's reasonable to declare that this it the cause of the observed issue (drop in attestations).

## Additional Info

NA
2021-01-20 10:28:12 +00:00
Paul Hauner
d9f940613f Represent slots in secs instead of millisecs (#2163)
## Issue Addressed

NA

## Proposed Changes

Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).

This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 09:39:51 +00:00
Paul Hauner
805e152f66 Simplify enum -> str with strum (#2164)
## Issue Addressed

NA

## Proposed Changes

As per #2100, uses derives from the sturm library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counter into one IntCounterVec vector.

These works are originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 06:33:58 +00:00
realbigsean
7a71977987 Clippy 1.49.0 updates and dht persistence test fix (#2156)
## Issue Addressed

`test_dht_persistence` failing

## Proposed Changes

Bind `NetworkService::start` to an underscore prefixed variable rather than `_`.  `_` was causing it to be dropped immediately

This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 00:34:28 +00:00
Pawan Dhananjay
28238d97b1 Disconnect from peers quicker on internet issues (#2147)
## Issue Addressed

Fixes #2146 

## Proposed Changes

Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.
2021-01-13 08:09:10 +00:00
Age Manning
7e4b190df0 Reduce ping interval (#2132)
## Issue Addressed

#2123

## Description

Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.
2021-01-06 04:35:52 +00:00
Samuel E. Moelius
939fa717fd test_decode_malicious_status_message improvements (#2104)
## Issue Addressed

None

## Proposed Changes

* Correct typo in one comment, elaborate some others.
* Add asserts to ensure comments match code.
* Eliminate one unnecessary `clone`.

## Additional Info

None
2021-01-06 01:10:26 +00:00
Samuel E. Moelius
0245ddd37b Fix typo in ssz_snappy.rs comment (#2103)
## Issue Addressed

None

## Proposed Changes

Correct a typo in `ssz_snappy.rs`.

## Additional Info

Pedantry at it finest.
2021-01-06 01:10:24 +00:00
Age Manning
2931b05582 Update libp2p (#2101)
This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. 

This needs a bit of thorough testing before merging. 

The primary code changes are:
- General libp2p dependency update
- Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression



Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-12-23 07:53:36 +00:00
Samuel E. Moelius
3381266998 Eliminate uses of expect in ssz_snappy.rs (#2105)
## Issue Addressed

None

## Proposed Changes

Eliminate three uses of `expect` in `ssz_snappy.rs`.

## Additional Info

None
2020-12-22 02:28:37 +00:00
Pawan Dhananjay
f998eff7ce Subnet discovery fixes (#2095)
## Issue Addressed

N/A

## Proposed Changes

Fixes multiple issues related to discovering of subnet peers.
1. Subnet discovery retries after yielding no results
2. Metadata updates if peer send older metadata
3. peerdb stores the peer subscriptions from gossipsub
2020-12-17 00:39:15 +00:00
blacktemplar
3fcc517993 Fix Syncing Simulator (#2049)
## Issue Addressed

NA

## Proposed Changes

Fixes problems with slot times below 1 second which got revealed by running the syncing simulator with the default speedup time.
2020-12-16 05:37:38 +00:00
divma
11c299cbf6 impl Resource Unavailable RPC error (#2072)
## Issue Addressed

Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131)

## Proposed Changes

Implement the proposed error, banning peers that send it

## Additional Info

NA
2020-12-15 00:17:32 +00:00
Age Manning
4f85371ce8 Downgrades a valid log (#2057)
## Issue Addressed

#2046 

## Proposed Changes

The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to  debug log.
2020-12-08 10:48:45 +00:00
divma
57489e620f fix default network handling (#2029)
## Issue Addressed
#1992 and #1987, and also to be considered a continuation of #1751

## Proposed Changes
many changed files but most are renaming to align the code with the semantics of `--network` 
- remove the `--network` default value (in clap) and instead set it after checking the `network` and `testnet-dir` flags
- move `eth2_testnet_config` crate to `eth2_network_config`
- move `Eth2TestnetConfig` to `Eth2NetworkConfig`
- move `DEFAULT_HARDCODED_TESTNET` to `DEFAULT_HARDCODED_NETWORK`
- `beacon_node`s `get_eth2_testnet_config` loads the `DEFAULT_HARDCODED_NETWORK` if there is no network nor testnet provided
- `boot_node`s config loads the config same as the `beacon_node`, it was using the configuration only for preconfigured networks (That code is ~1year old so I asume it was not intended)
- removed a one year old comment stating we should try to emulate `https://github.com/eth2-clients/eth2-testnets/tree/master/nimbus/testnet1` it looks outdated (?)
- remove `lighthouse`s `load_testnet_config` in favor of `get_eth2_network_config` to centralize that logic (It had differences)
- some spelling

## Additional Info
Both the command of #1992 and the scripts of #1987 seem to work fine, same as `bn` and `vc`
2020-12-08 05:41:10 +00:00
divma
f3200784b4 More metrics + RPC tweaks (#2041)
## Issue Addressed

NA

## Proposed Changes
This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here
- Add metrics for rpc errors per client, error type and direction
- Add metrics for downscoring events per source type, client and penalty type
- Add metrics for gossip validation results per client for non-accepted messages
- Make the RPC handler return errors and requests/responses in the order we see them
- Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
- Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
- Remove some unused code in the `PeerAction` and the rpc handler
- Remove the unused variant `RateLimited`. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png)

RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png)

Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)
2020-12-08 03:55:50 +00:00
Age Manning
2682f46025 Fingerprint new client identify agent string (#2027)
Nimbus have modified their identify agent string. 

This PR adds their new agent string to identify new nimbus peers.
2020-12-03 22:07:14 +00:00
blacktemplar
d8cda2d86e Fix new clippy lints (#2036)
## Issue Addressed

NA

## Proposed Changes

Fixes new clippy lints in the whole project (mainly [manual_strip](https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip) and [unnecessary_lazy_evaluations](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations)). Furthermore, removes `to_string()` calls on literals when used with the `?`-operator.
2020-12-03 01:10:26 +00:00
Age Manning
c718e81eaf Add privacy option (#2016)
Adds a `--privacy` CLI flag to the beacon node that users may opt into. 

This does two things:
- Removes client identifying information from the identify libp2p protocol
- Changes the default graffiti to "" if no graffiti is set.
2020-11-30 22:55:08 +00:00
divma
8fcd22992c No string in slog (#2017)
## Issue Addressed

Following slog's documentation, this should help a bit with string allocations. I left it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't harm anyway 

## Proposed Changes

remove `String` creation in logs when possible
2020-11-30 10:33:00 +00:00
Paul Hauner
85e69249e6 Drop discovery log to trace (#2007)
## Issue Addressed

NA

## Proposed Changes

This was causing:

```
Nov 28 21:56:08.154 ERRO slog-async: logger dropped messages due to channel overflow, count: 44, service: libp2p
```

## Additional Info

NA
2020-11-29 03:02:23 +00:00
Age Manning
a567f788bd Upgrade to tokio 0.3 (#1839)
## Description

This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.

This also brings with it a number of various improvements:

- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max



Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-11-28 05:30:57 +00:00
divma
fc07cc3fdf Sync metrics (#1975)
## Issue Addressed
- Add metrics to keep track of peer counts by sync type
- Add metric to keep track of the number of syncing chains in range

## Proposed Changes
Plugin to the network metrics update interval and update too the counts for peers wrt to their sync status with us

## Additional Info
For the peer counts
- By the way it is implemented the numbers won't always match to the total peer count in the `libp2p` metric.
- Updating the gauge with every change is messy because it requires to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate) on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. If you think more accuracy is needed we can do it too, but it would be harder to maintain)

ATM those look like this
![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)
2020-11-26 05:23:17 +00:00
divma
3b4afc27bf Status race condition (#1967)
## Issue Addressed

Sync stalls due to race conditions between dc notifications and status processing
2020-11-25 02:15:38 +00:00
Age Manning
a96893744c
Update bootnodes and boot_node cli (#1961) 2020-11-25 02:01:37 +11:00
Paul Hauner
65b1cf2af1 Add flag to import all attestations (#1941)
## Issue Addressed

NA

## Proposed Changes

Adds the `--import-all-attestations` flag which tells the `network::AttestationService` to import/aggregate all attestations after verification (instead of only ones for subnets that are relevant to local validators).

This is useful for testing/debugging and also for creating back-up nodes that should be all cached up and ready for any validator.

## Additional Info

NA
2020-11-22 23:58:25 +00:00
Pawan Dhananjay
e47739047d Add additional libp2p tests (#1867)
## Issue Addressed

N/A

## Proposed Changes

Adds tests for the eth2_libp2p crate.
2020-11-19 22:32:09 +00:00
blacktemplar
3408de8151 Avoid string initialization in network metrics and replace by &str where possible (#1898)
## Issue Addressed

NA

## Proposed Changes

Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895.

For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script.

## Additional Info

We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.
2020-11-18 23:31:37 +00:00
Age Manning
49c4630045 Performance improvement for db reads (#1909)
This PR adds a number of improvements:
- Downgrade a warning log when we ignore blocks for gossipsub processing
- Revert a a correction to improve logging of peer score changes
- Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages
- Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.
2020-11-16 07:28:30 +00:00
divma
eb56140582 Update logs + do not downscore peers if WE time out (#1901)
## Issue Addressed

- RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one 
- The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include there info that is not relevant to the user
- The processor didn't have the service tag so add it
- Impl `KV` for status message
- Do not downscore peers if we are the ones that timed out

Other small improvements
2020-11-16 04:06:14 +00:00
divma
8a16548715 Misc Peer sync info adjustments (#1896)
## Issue Addressed
#1856 

## Proposed Changes
- For clarity, the router's processor now only decides if a peer is compatible and it disconnects it or sends it to sync accordingly. No logic here regarding how useful is the peer. 
- Update peer_sync_info's rules
- Add an `IrrelevantPeer` sync status to account for incompatible peers (maybe this should be "IncompatiblePeer" now that I think about it?) this state is update upon receiving an internal goodbye in the peer manager
- Misc code cleanups
- Reduce the need to create `StatusMessage`s (and thus, `Arc` accesses )
- Add missing calls to update the global sync state

The overall effect should be:
- More peers recognized as Behind, and less as Unknown
- Peers identified as incompatible
2020-11-13 09:00:10 +00:00
Age Manning
c00e6c2c6f Small network adjustments (#1884)
## Issue Addressed

- Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections
- Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected
- Improved logging

There is likely more to come - I'll leave this open as we investigate further testnets
2020-11-13 06:06:33 +00:00
blacktemplar
c7ac967d5a handle peer state transitions on gossipsub score changes + refactoring (#1892)
## Issue Addressed

NA

## Proposed Changes

Correctly handles peer state transitions on gossipsub changes + refactors handling of peer state transitions into one function used for lighthouse score changes and gossipsub score changes.


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-11-13 03:15:03 +00:00
blacktemplar
fcb4893f72 do subnet discoveries until we have MESH_N_LOW many peers (#1886)
## Issue Addressed

NA

## Proposed Changes

Increases the target peers for a subnet, so that subnet queries are executed until we have at least the minimum required peers for a mesh (`MESH_N_LOW`). We keep the limit of `6` target peers for aggregated subnet discovery queries, therefore the size (and the time needed) for a query doesn't change.
2020-11-13 00:56:05 +00:00
blacktemplar
7404f1ce54 Gossipsub scoring (#1668)
## Issue Addressed

#1606 

## Proposed Changes

Uses dynamic gossipsub scoring parameters depending on the number of active validators as specified in https://gist.github.com/blacktemplar/5c1862cb3f0e32a1a7fb0b25e79e6e2c.

## Additional Info

Although the parameters got tested on Medalla, extensive testing using simulations on larger networks is still to be done and we expect that we need to change the parameters, although this might only affect constants within the dynamic parameter framework.
2020-11-12 01:48:28 +00:00
divma
b0e9e3dcef Seen addresses store port (#1841)
## Issue Addressed
#1764
2020-11-09 04:01:03 +00:00
Age Manning
e2ae5010a6 Update libp2p (#1865)
Updates libp2p to the latest version. 

This adds tokio 0.3 support and brings back yamux support. 

This also updates some discv5 configuration parameters for leaner discovery queries
2020-11-06 04:14:14 +00:00
blacktemplar
7e7fad5734 Ignore RPC messages of disconnected peers and remove old peers based on disconnection time (#1854)
## Issue Addressed

NA

## Proposed Changes

Lets the networking behavior ignore messages of peers that are not connected. Furthermore, old peers are not removed from the peerdb based on score anymore but based on the disconnection time.
2020-11-03 23:43:10 +00:00