Commit Graph

189 Commits

Author SHA1 Message Date
Pawan Dhananjay
e8c0d1f19b Altair networking (#2300)
## Issue Addressed

Resolves #2278 

## Proposed Changes

Implements the networking components for the Altair hard fork https://github.com/ethereum/eth2.0-specs/blob/dev/specs/altair/p2p-interface.md

## Additional Info

This PR acts as the base branch for networking changes and tracks https://github.com/sigp/lighthouse/pull/2279 . Changes to gossip, rpc and discovery can be separate PRs to be merged here for ease of review.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-04 01:44:57 +00:00
Michael Sproul
187425cdc1 Bump discv5 to v0.1.0-beta.9 (#2479)
Bump discv5 to fix the issues with IP filters and removing nodes.

~~Blocked on an upstream release, and more testnet data.~~
2021-08-03 01:05:06 +00:00
realbigsean
303deb9969 Rust 1.54.0 lints (#2483)
## Issue Addressed

N/A

## Proposed Changes

- Removing a bunch of unnecessary references
- Updated `Error::VariantError` to `Error::Variant`
- There were additional enum variant lints that I ignored, because I thought our variant names were fine
- removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index

## Additional Info



Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-30 01:11:47 +00:00
Michael Sproul
63923eaa29 Bump discv5 to v0.1.0-beta.8 (#2471)
## Proposed Changes

Update discv5 to fix bugs seen on `altair-devnet-1`
2021-07-21 07:10:52 +00:00
Age Manning
08fedbfcba
Libp2p Connection Limit (#2455)
* Get libp2p to handle connection limits

* fmt
2021-07-15 16:43:18 +10:00
Age Manning
6818a94171
Discovery update (#2458) 2021-07-15 16:43:18 +10:00
Age Manning
381befbf82
Ensure disconnecting peers are added to the peerdb (#2451) 2021-07-15 16:43:18 +10:00
Age Manning
059d9ec1b1
Gossipsub scoring improvements (#2391)
* Tweak gossipsub parameters for improved scoring

* Modify gossip history

* Update settings

* Make mesh window constant

* Decrease the mesh message deliveries weight

* Fmt
2021-07-15 16:43:18 +10:00
Age Manning
c62810b408
Update to Libp2p to 39.1 (#2448)
* Adjust beacon node timeouts for validator client HTTP requests (#2352)

Resolves #2313

Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct.
This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration.

Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available.

Add a CLI flag called `use-long-timeouts` to revert to the old behavior.

Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications.

Co-authored-by: Paul Hauner <paul@paulhauner.com>

* Use read_recursive locks in database (#2417)

Closes #2245

Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`.

* Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly which cannot compile Lighthouse (one of our deps uses `ptr::addr_of!` which is too new). A fun side-project at some point might be to update the deadlock detector.
* The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us.
* I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp
* [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation.

* Return more detail when invalid data is found in the DB during startup (#2445)

- Resolves #2444

Adds some more detail to the error message returned when the `BeaconChainBuilder` is unable to access or decode block/state objects during startup.

NA

* Use hardware acceleration for SHA256 (#2426)

Modify the SHA256 implementation in `eth2_hashing` so that it switches between `ring` and `sha2` to take advantage of [x86_64 SHA extensions](https://en.wikipedia.org/wiki/Intel_SHA_extensions). The extensions are available on modern Intel and AMD CPUs, and seem to provide a considerable speed-up: on my Ryzen 5950X it dropped state tree hashing times by about 30% from 35ms to 25ms (on Prater).

The extensions became available in the `sha2` crate [last year](https://www.reddit.com/r/rust/comments/hf2vcx/ann_rustcryptos_sha1_and_sha2_now_support/), and are not available in Ring, which uses a [pure Rust implementation of sha2](https://github.com/briansmith/ring/blob/main/src/digest/sha2.rs). Ring is faster on CPUs that lack the extensions so I've implemented a runtime switch to use `sha2` only when the extensions are available. The runtime switching seems to impose a miniscule penalty (see the benchmarks linked below).

* Start a release checklist (#2270)

NA

Add a checklist to the release draft created by CI. I know @michaelsproul was also working on this and I suspect @realbigsean also might have useful input.

NA

* Serious banning

* fmt

Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-07-15 16:43:18 +10:00
Age Manning
3c0d3227ab
Global Network Behaviour Refactor (#2442)
* Network upgrades (#2345)

* Discovery patch (#2382)

* Upgrade libp2p and unstable gossip

* Network protocol upgrades

* Correct dependencies, reduce incoming bucket limit

* Clean up dirty DHT entries before repopulating

* Update cargo lock

* Update lockfile

* Update ENR dep

* Update deps to specific versions

* Update test dependencies

* Update docker rust, and remote signer tests

* More remote signer test fixes

* Temp commit

* Update discovery

* Remove cached enrs after dialing

* Increase the session capacity, for improved efficiency

* Bleeding edge discovery (#2435)

* Update discovery banning logic and tokio

* Update to latest discovery

* Shift to latest discovery

* Fmt

* Initial re-factor of the behaviour

* More progress

* Missed changes

* First draft

* Discovery as a behaviour

* Adding back event waker (not convinced its neccessary, but have made this many changes already)

* Corrections

* Speed up discovery

* Remove double log

* Fmt

* After disconnect inform swarm about ban

* More fmt

* Appease clippy

* Improve ban handling

* Update tests

* Update cargo.lock

* Correct tests

* Downgrade log
2021-07-15 16:43:17 +10:00
Pawan Dhananjay
64226321b3
Relax requirement for enr fork digest predicate (#2433) 2021-07-15 16:43:17 +10:00
Age Manning
c1d2e35c9e
Bleeding edge discovery (#2435)
* Update discovery banning logic and tokio

* Update to latest discovery

* Shift to latest discovery

* Fmt
2021-07-15 16:43:17 +10:00
Age Manning
f4bc9db16d
Change the window mode of yamux (#2390) 2021-07-15 16:43:17 +10:00
Age Manning
6fb48b45fa
Discovery patch (#2382)
* Upgrade libp2p and unstable gossip

* Network protocol upgrades

* Correct dependencies, reduce incoming bucket limit

* Clean up dirty DHT entries before repopulating

* Update cargo lock

* Update lockfile

* Update ENR dep

* Update deps to specific versions

* Update test dependencies

* Update docker rust, and remote signer tests

* More remote signer test fixes

* Temp commit

* Update discovery

* Remove cached enrs after dialing

* Increase the session capacity, for improved efficiency
2021-07-15 16:43:17 +10:00
Age Manning
4aa06c9555
Network upgrades (#2345) 2021-07-15 16:43:10 +10:00
Michael Sproul
b4689e20c6 Altair consensus changes and refactors (#2279)
## Proposed Changes

Implement the consensus changes necessary for the upcoming Altair hard fork.

## Additional Info

This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`.


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-09 06:15:32 +00:00
Age Manning
73d002ef92 Update outdated dependencies (#2425)
This updates some older dependencies to address a few cargo audit warnings.

The majority of warnings come from network dependencies which will be addressed in #2389. 

This PR contains some minor dep updates that are not network related.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-07-05 00:54:17 +00:00
realbigsean
b84ff9f793 rust 1.53.0 updates (#2411)
## Issue Addressed

`make lint` failing on rust 1.53.0.

## Proposed Changes

1.53.0 updates

## Additional Info

I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-06-18 05:58:01 +00:00
divma
3261eff0bf split outbound and inbound codecs encoded types (#2410)
Splits the inbound and outbound requests, for maintainability.
2021-06-17 00:40:16 +00:00
Paul Hauner
90ea075c62 Revert "Network protocol upgrades (#2345)" (#2388)
## Issue Addressed

NA

## Proposed Changes

Reverts #2345 in the interests of getting v1.4.0 out this week. Once we have released that, we can go back to testing this again.

## Additional Info

NA
2021-06-02 01:07:28 +00:00
Kevin Lu
320a683e72 Minimum Outbound-Only Peers Requirement (#2356)
## Issue Addressed

#2325 

## Proposed Changes

This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. 

## Additional Info

Unit test for heartbeat function tests that disconnection behavior is correct. Continual querying for peers if outbound-only hasn't been met is not directly tested, but indirectly through unit testing of the helper function that counts the number of outbound-only peers.

EDIT: Am concerned about the behavior of ```update_peer_scores```. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. 

```rust
let previous_state = info.score_state();            
// Update scores            
info.score_update();
Self::handle_score_transitions(                
               previous_state,
                peer_id,
                info, 
               &mut to_ban_peers,
               &mut to_unban_peers,
               &mut self.events,
               &self.log,
);
```

```previous_state``` will be set to Disconnected, and then because ```handle_score_transitions``` only changes connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure realistically how often this scenario would occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. 

Co-authored-by: Kevin Lu <kevlu93@gmail.com>
2021-05-31 04:18:19 +00:00
Age Manning
ec5cceba50 Correct issue with dialing peers (#2375)
The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update.

This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial.

This makes #2374 obsolete.
2021-05-29 07:25:06 +00:00
Age Manning
d12e746b50 Network protocol upgrades (#2345)
This provides a number of upgrades to gossipsub and discovery. 

The updates are extensive and this needs thorough testing.
2021-05-28 22:02:10 +00:00
Age Manning
55aada006f
More stringent dialing (#2363)
* More stringent dialing

* Cover cached enr dialing
2021-05-26 14:21:44 +10:00
ethDreamer
cb47388ad7 Updated to comply with new clippy formatting rules (#2336)
## Issue Addressed

The latest version of Rust has new clippy rules & the codebase isn't up to date with them.

## Proposed Changes

Small formatting changes that clippy tells me are functionally equivalent
2021-05-10 00:53:09 +00:00
Mac L
bacc38c3da Add testing for beacon node and validator client CLI flags (#2311)
## Issue Addressed

N/A

## Proposed Changes

Add unit tests for the various CLI flags associated with the beacon node and validator client. These changes require the addition of two new flags: `dump-config` and `immediate-shutdown`.

## Additional Info

Both `dump-config` and `immediate-shutdown` are marked as hidden since they should only be used in testing and other advanced use cases.
**Note:** This requires changing `main.rs` so that the flags can adjust the program behavior as necessary.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-05-06 00:36:22 +00:00
ethDreamer
0aa8509525 Filter Disconnected Peers from Discv5 DHT (#2219)
## Issue Addressed
#2107

## Proposed Change
The peer manager will mark peers as disconnected in the discv5 DHT when they disconnect or dial fails

## Additional Info
Rationale for this particular change is explained in my comment on #2107
2021-04-28 04:07:37 +00:00
Pawan Dhananjay
95a362213d Fix local testnet scripts (#2229)
## Issue Addressed

Resolves #2094 

## Proposed Changes

Fixes scripts for creating local testnets. Adds an option in `lighthouse boot_node` to run with a previously generated enr.
2021-03-30 05:17:58 +00:00
Michael Sproul
f9d60f5436 VC: accept unknown fields in chain spec (#2277)
## Issue Addressed

Closes #2274

## Proposed Changes

* Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard.
* Log a debug message if there are extra fields returned to the VC from one of its BNs.

This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)
2021-03-26 04:53:57 +00:00
Age Manning
1c507c588e Update to the latest libp2p (#2239)
Updates to the latest libp2p and ignores RUSTSEC-2020-0146 from cargo-audit


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-02 05:59:49 +00:00
Paul Hauner
8949ae7c4e Address ENR update loop (#2216)
## Issue Addressed

- Resolves #2215

## Proposed Changes

Addresses a potential loop when the majority of peers indicate that we are contactable via an IPv6 address.

See https://github.com/sigp/discv5/pull/62 for further rationale.

## Additional Info

The alternative to this PR is to use `--disable-enr-auto-update` and then manually supply an `--enr-address` and `--enr-upd-port`. However, that requires the user to know their IP addresses in order for discovery to work properly. This might not be practical/achievable for some users, hence this hotfix.
2021-02-21 23:47:52 +00:00
Paul Hauner
8e5c20b6d1 Update for clippy 1.50 (#2193)
## Issue Addressed

NA

## Proposed Changes

Rust 1.50 has landed 🎉

The shiny new `clippy` peers down upon us mere mortals with disgust. Brutish peasants wrapping our `usize`s in superfluous `Option`s... tsk tsk.

I've performed the goat sacrifice and corrected our evil ways in this PR. Tonight we shall pray that Github Actions bestows the almighty green tick upon us.

## Additional Info

NA


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-02-15 00:09:12 +00:00
realbigsean
e20f64b21a Update to tokio 1.1 (#2172)
## Issue Addressed

resolves #2129
resolves #2099 
addresses some of #1712
unblocks #2076
unblocks #2153 

## Proposed Changes

- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. 

- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.

- We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue`  --> PR in discv5:  https://github.com/sigp/discv5/pull/58

## Additional Info

tokio 1.0 changes that required some changes in lighthouse:

- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028
- `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350
- stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870
- I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-10 23:29:49 +00:00
Akihito Nakano
1a22a096c6 Fix clippy errors on tests (#2160)
## Issue Addressed

There are some clippy error on tests.


## Proposed Changes

Enable clippy check on tests and fix the errors. 💪
2021-01-28 23:31:06 +00:00
Paul Hauner
1eb0915301 Fix bug from #2163 (#2165)
## Issue Addressed

NA

## Proposed Changes

Fixes a bug that I missed during a review in #2163. I found this bug by observing that nodes were receiving far less attestations (~1/2 of previous).

I'm not certain on *exactly* how this mistake manifested in a reduction in attestations, but the mistake touches so much code that I think it's reasonable to declare that this it the cause of the observed issue (drop in attestations).

## Additional Info

NA
2021-01-20 10:28:12 +00:00
Paul Hauner
d9f940613f Represent slots in secs instead of millisecs (#2163)
## Issue Addressed

NA

## Proposed Changes

Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).

This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 09:39:51 +00:00
Paul Hauner
805e152f66 Simplify enum -> str with strum (#2164)
## Issue Addressed

NA

## Proposed Changes

As per #2100, uses derives from the sturm library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counter into one IntCounterVec vector.

These works are originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 06:33:58 +00:00
realbigsean
7a71977987 Clippy 1.49.0 updates and dht persistence test fix (#2156)
## Issue Addressed

`test_dht_persistence` failing

## Proposed Changes

Bind `NetworkService::start` to an underscore prefixed variable rather than `_`.  `_` was causing it to be dropped immediately

This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 00:34:28 +00:00
Pawan Dhananjay
28238d97b1 Disconnect from peers quicker on internet issues (#2147)
## Issue Addressed

Fixes #2146 

## Proposed Changes

Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.
2021-01-13 08:09:10 +00:00
realbigsean
423dea169c update smallvec (#2152)
## Issue Addressed

`cargo audit` is failing because of a potential for an overflow in the version of `smallvec` we're using

## Proposed Changes

Update to the latest version of `smallvec`, which has the fix


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-11 23:32:11 +00:00
Arthur Woimbée
851a4dca3c replace tempdir by tempfile (#2143)
## Issue Addressed

Fixes #2141 
Remove [tempdir](https://docs.rs/tempdir/0.3.7/tempdir/) in favor of [tempfile](https://docs.rs/tempfile/3.1.0/tempfile/).

## Proposed Changes

`tempfile` has a slightly different api that makes creating temp folders with a name prefix a chore (`tempdir::TempDir::new("toto")` => `tempfile::Builder::new().prefix("toto").tempdir()`).

So I removed temp folder name prefix where I deemed it not useful.

Otherwise, the functionality is the same.
2021-01-06 06:36:11 +00:00
Age Manning
7e4b190df0 Reduce ping interval (#2132)
## Issue Addressed

#2123

## Description

Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.
2021-01-06 04:35:52 +00:00
Samuel E. Moelius
939fa717fd test_decode_malicious_status_message improvements (#2104)
## Issue Addressed

None

## Proposed Changes

* Correct typo in one comment, elaborate some others.
* Add asserts to ensure comments match code.
* Eliminate one unnecessary `clone`.

## Additional Info

None
2021-01-06 01:10:26 +00:00
Samuel E. Moelius
0245ddd37b Fix typo in ssz_snappy.rs comment (#2103)
## Issue Addressed

None

## Proposed Changes

Correct a typo in `ssz_snappy.rs`.

## Additional Info

Pedantry at it finest.
2021-01-06 01:10:24 +00:00
Age Manning
2931b05582 Update libp2p (#2101)
This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. 

This needs a bit of thorough testing before merging. 

The primary code changes are:
- General libp2p dependency update
- Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression



Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-12-23 07:53:36 +00:00
Samuel E. Moelius
3381266998 Eliminate uses of expect in ssz_snappy.rs (#2105)
## Issue Addressed

None

## Proposed Changes

Eliminate three uses of `expect` in `ssz_snappy.rs`.

## Additional Info

None
2020-12-22 02:28:37 +00:00
Pawan Dhananjay
f998eff7ce Subnet discovery fixes (#2095)
## Issue Addressed

N/A

## Proposed Changes

Fixes multiple issues related to discovering of subnet peers.
1. Subnet discovery retries after yielding no results
2. Metadata updates if peer send older metadata
3. peerdb stores the peer subscriptions from gossipsub
2020-12-17 00:39:15 +00:00
blacktemplar
3fcc517993 Fix Syncing Simulator (#2049)
## Issue Addressed

NA

## Proposed Changes

Fixes problems with slot times below 1 second which got revealed by running the syncing simulator with the default speedup time.
2020-12-16 05:37:38 +00:00
divma
11c299cbf6 impl Resource Unavailable RPC error (#2072)
## Issue Addressed

Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131)

## Proposed Changes

Implement the proposed error, banning peers that send it

## Additional Info

NA
2020-12-15 00:17:32 +00:00
Age Manning
4f85371ce8 Downgrades a valid log (#2057)
## Issue Addressed

#2046 

## Proposed Changes

The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to  debug log.
2020-12-08 10:48:45 +00:00