Commit Graph

191 Commits

Author SHA1 Message Date
divma
11c299cbf6 impl Resource Unavailable RPC error (#2072)
## Issue Addressed

Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131)

## Proposed Changes

Implement the proposed error, banning peers that send it

## Additional Info

NA
2020-12-15 00:17:32 +00:00
Age Manning
4f85371ce8 Downgrades a valid log (#2057)
## Issue Addressed

#2046 

## Proposed Changes

The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to  debug log.
2020-12-08 10:48:45 +00:00
divma
57489e620f fix default network handling (#2029)
## Issue Addressed
#1992 and #1987, and also to be considered a continuation of #1751

## Proposed Changes
many changed files but most are renaming to align the code with the semantics of `--network` 
- remove the `--network` default value (in clap) and instead set it after checking the `network` and `testnet-dir` flags
- move `eth2_testnet_config` crate to `eth2_network_config`
- move `Eth2TestnetConfig` to `Eth2NetworkConfig`
- move `DEFAULT_HARDCODED_TESTNET` to `DEFAULT_HARDCODED_NETWORK`
- `beacon_node`s `get_eth2_testnet_config` loads the `DEFAULT_HARDCODED_NETWORK` if there is no network nor testnet provided
- `boot_node`s config loads the config same as the `beacon_node`, it was using the configuration only for preconfigured networks (That code is ~1year old so I asume it was not intended)
- removed a one year old comment stating we should try to emulate `https://github.com/eth2-clients/eth2-testnets/tree/master/nimbus/testnet1` it looks outdated (?)
- remove `lighthouse`s `load_testnet_config` in favor of `get_eth2_network_config` to centralize that logic (It had differences)
- some spelling

## Additional Info
Both the command of #1992 and the scripts of #1987 seem to work fine, same as `bn` and `vc`
2020-12-08 05:41:10 +00:00
divma
f3200784b4 More metrics + RPC tweaks (#2041)
## Issue Addressed

NA

## Proposed Changes
This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here
- Add metrics for rpc errors per client, error type and direction
- Add metrics for downscoring events per source type, client and penalty type
- Add metrics for gossip validation results per client for non-accepted messages
- Make the RPC handler return errors and requests/responses in the order we see them
- Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
- Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
- Remove some unused code in the `PeerAction` and the rpc handler
- Remove the unused variant `RateLimited`. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png)

RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png)

Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)
2020-12-08 03:55:50 +00:00
blacktemplar
a28e8decbf update dependencies (#2032)
## Issue Addressed

NA

## Proposed Changes

Updates out of date dependencies.

## Additional Info

See also https://github.com/sigp/lighthouse/issues/1712 for a list of dependencies that are still out of date and the resasons.
2020-12-07 08:20:33 +00:00
Age Manning
2682f46025 Fingerprint new client identify agent string (#2027)
Nimbus have modified their identify agent string. 

This PR adds their new agent string to identify new nimbus peers.
2020-12-03 22:07:14 +00:00
blacktemplar
d8cda2d86e Fix new clippy lints (#2036)
## Issue Addressed

NA

## Proposed Changes

Fixes new clippy lints in the whole project (mainly [manual_strip](https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip) and [unnecessary_lazy_evaluations](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations)). Furthermore, removes `to_string()` calls on literals when used with the `?`-operator.
2020-12-03 01:10:26 +00:00
Age Manning
c718e81eaf Add privacy option (#2016)
Adds a `--privacy` CLI flag to the beacon node that users may opt into. 

This does two things:
- Removes client identifying information from the identify libp2p protocol
- Changes the default graffiti to "" if no graffiti is set.
2020-11-30 22:55:08 +00:00
divma
8fcd22992c No string in slog (#2017)
## Issue Addressed

Following slog's documentation, this should help a bit with string allocations. I left it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't harm anyway 

## Proposed Changes

remove `String` creation in logs when possible
2020-11-30 10:33:00 +00:00
Paul Hauner
85e69249e6 Drop discovery log to trace (#2007)
## Issue Addressed

NA

## Proposed Changes

This was causing:

```
Nov 28 21:56:08.154 ERRO slog-async: logger dropped messages due to channel overflow, count: 44, service: libp2p
```

## Additional Info

NA
2020-11-29 03:02:23 +00:00
Age Manning
a567f788bd Upgrade to tokio 0.3 (#1839)
## Description

This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.

This also brings with it a number of various improvements:

- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max



Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-11-28 05:30:57 +00:00
divma
fc07cc3fdf Sync metrics (#1975)
## Issue Addressed
- Add metrics to keep track of peer counts by sync type
- Add metric to keep track of the number of syncing chains in range

## Proposed Changes
Plugin to the network metrics update interval and update too the counts for peers wrt to their sync status with us

## Additional Info
For the peer counts
- By the way it is implemented the numbers won't always match to the total peer count in the `libp2p` metric.
- Updating the gauge with every change is messy because it requires to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate) on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. If you think more accuracy is needed we can do it too, but it would be harder to maintain)

ATM those look like this
![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)
2020-11-26 05:23:17 +00:00
divma
3b4afc27bf Status race condition (#1967)
## Issue Addressed

Sync stalls due to race conditions between dc notifications and status processing
2020-11-25 02:15:38 +00:00
Age Manning
a96893744c
Update bootnodes and boot_node cli (#1961) 2020-11-25 02:01:37 +11:00
Paul Hauner
65b1cf2af1 Add flag to import all attestations (#1941)
## Issue Addressed

NA

## Proposed Changes

Adds the `--import-all-attestations` flag which tells the `network::AttestationService` to import/aggregate all attestations after verification (instead of only ones for subnets that are relevant to local validators).

This is useful for testing/debugging and also for creating back-up nodes that should be all cached up and ready for any validator.

## Additional Info

NA
2020-11-22 23:58:25 +00:00
Pawan Dhananjay
e47739047d Add additional libp2p tests (#1867)
## Issue Addressed

N/A

## Proposed Changes

Adds tests for the eth2_libp2p crate.
2020-11-19 22:32:09 +00:00
blacktemplar
3408de8151 Avoid string initialization in network metrics and replace by &str where possible (#1898)
## Issue Addressed

NA

## Proposed Changes

Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895.

For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script.

## Additional Info

We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.
2020-11-18 23:31:37 +00:00
Age Manning
49c4630045 Performance improvement for db reads (#1909)
This PR adds a number of improvements:
- Downgrade a warning log when we ignore blocks for gossipsub processing
- Revert a a correction to improve logging of peer score changes
- Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages
- Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.
2020-11-16 07:28:30 +00:00
divma
eb56140582 Update logs + do not downscore peers if WE time out (#1901)
## Issue Addressed

- RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one 
- The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include there info that is not relevant to the user
- The processor didn't have the service tag so add it
- Impl `KV` for status message
- Do not downscore peers if we are the ones that timed out

Other small improvements
2020-11-16 04:06:14 +00:00
divma
8a16548715 Misc Peer sync info adjustments (#1896)
## Issue Addressed
#1856 

## Proposed Changes
- For clarity, the router's processor now only decides if a peer is compatible and it disconnects it or sends it to sync accordingly. No logic here regarding how useful is the peer. 
- Update peer_sync_info's rules
- Add an `IrrelevantPeer` sync status to account for incompatible peers (maybe this should be "IncompatiblePeer" now that I think about it?) this state is update upon receiving an internal goodbye in the peer manager
- Misc code cleanups
- Reduce the need to create `StatusMessage`s (and thus, `Arc` accesses )
- Add missing calls to update the global sync state

The overall effect should be:
- More peers recognized as Behind, and less as Unknown
- Peers identified as incompatible
2020-11-13 09:00:10 +00:00
Age Manning
c00e6c2c6f Small network adjustments (#1884)
## Issue Addressed

- Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections
- Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected
- Improved logging

There is likely more to come - I'll leave this open as we investigate further testnets
2020-11-13 06:06:33 +00:00
blacktemplar
c7ac967d5a handle peer state transitions on gossipsub score changes + refactoring (#1892)
## Issue Addressed

NA

## Proposed Changes

Correctly handles peer state transitions on gossipsub changes + refactors handling of peer state transitions into one function used for lighthouse score changes and gossipsub score changes.


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-11-13 03:15:03 +00:00
blacktemplar
fcb4893f72 do subnet discoveries until we have MESH_N_LOW many peers (#1886)
## Issue Addressed

NA

## Proposed Changes

Increases the target peers for a subnet, so that subnet queries are executed until we have at least the minimum required peers for a mesh (`MESH_N_LOW`). We keep the limit of `6` target peers for aggregated subnet discovery queries, therefore the size (and the time needed) for a query doesn't change.
2020-11-13 00:56:05 +00:00
blacktemplar
7404f1ce54 Gossipsub scoring (#1668)
## Issue Addressed

#1606 

## Proposed Changes

Uses dynamic gossipsub scoring parameters depending on the number of active validators as specified in https://gist.github.com/blacktemplar/5c1862cb3f0e32a1a7fb0b25e79e6e2c.

## Additional Info

Although the parameters got tested on Medalla, extensive testing using simulations on larger networks is still to be done and we expect that we need to change the parameters, although this might only affect constants within the dynamic parameter framework.
2020-11-12 01:48:28 +00:00
divma
b0e9e3dcef Seen addresses store port (#1841)
## Issue Addressed
#1764
2020-11-09 04:01:03 +00:00
Age Manning
e2ae5010a6 Update libp2p (#1865)
Updates libp2p to the latest version. 

This adds tokio 0.3 support and brings back yamux support. 

This also updates some discv5 configuration parameters for leaner discovery queries
2020-11-06 04:14:14 +00:00
blacktemplar
7e7fad5734 Ignore RPC messages of disconnected peers and remove old peers based on disconnection time (#1854)
## Issue Addressed

NA

## Proposed Changes

Lets the networking behavior ignore messages of peers that are not connected. Furthermore, old peers are not removed from the peerdb based on score anymore but based on the disconnection time.
2020-11-03 23:43:10 +00:00
Age Manning
0a0f4daf9d Prevent errors for stream termination race (#1853)
Prevents an error being propagated on a race condition for RPC stream termination
2020-11-03 10:37:00 +00:00
divma
6c0c050fbb Tweak head syncing (#1845)
## Issue Addressed

Fixes head syncing

## Proposed Changes

- Get back to statusing peers after removing chain segments and making the peer manager deal with status according to the Sync status, preventing an old known deadlock
- Also a bug where a chain would get removed if the optimistic batch succeeds being empty

## Additional Info

Tested on Medalla and looking good
2020-11-01 23:37:39 +00:00
blacktemplar
2bd5b9182f fix unbanning of peers (#1838)
## Issue Addressed

NA

## Proposed Changes

Currently a banned peer will remain banned indefinitely as long as update is called on the score struct regularly. This fixes this bug and the score decay starts after `BANNED_BEFORE_DECAY` seconds after banning.
2020-10-29 01:25:02 +00:00
divma
ad846ad280 Inform peers of requests that exceed the maximum rate limit + log downgrade (#1830)
## Issue Addressed

#1825 

## Proposed Changes

Since we penalize more blocks by range requests that have large steps, it is possible to get requests that will never be processed. We were not informing peers about this requests and also logging CRIT that is no longer relevant. Later we should check if more sophisticated handling for those requests is needed
2020-10-27 11:46:38 +00:00
Age Manning
7453f39d68 Prevent unbanning of disconnected peers (#1822)
## Issue Addressed

Further testing revealed another edge case where we attempt to unban a peer that can be in a disconnected start. Although this causes no real issue, it does log an error to the user. 

This PR adds a check to prevent this edge case and prevents the error being logged to the user.
2020-10-24 05:24:20 +00:00
Age Manning
a3cc1a1e0f Call unban only when necessary (#1821)
This PR prevents a user-facing error. 

It prevents optimistically unbanning a peer and instead checks the state of the peer before requesting the peers state to be unbanned.
2020-10-24 03:24:19 +00:00
blacktemplar
1644289a08 Updates the libp2p to the second newest commit => Allow only one topic per message (#1819)
As @AgeManning mentioned the newest libp2p version had some problems and got downgraded again on lighthouse master. This is an intermediate version that makes no problems and only adds a small change of allowing only one topic per message.
2020-10-24 01:05:37 +00:00
Age Manning
7870b81ade Downgrade libp2p (#1817)
## Description

This downgrades the recent libp2p upgrade. 

There were issues with the RPC which prevented syncing of the chain and this upgrade needs to be further investigated.
2020-10-23 09:33:59 +00:00
Age Manning
2c7f362908 Discovery v5.1 (#1786)
## Overview 

This updates lighthouse to discovery v5.1

Note: This makes lighthouse's discovery not compatible with any previous version. Lighthouse cannot discover peers or send/receive ENR's from any previous version. This is a breaking change. 

This resolves #1605
2020-10-23 04:16:33 +00:00
Age Manning
c49dd94e20 Update to latest libp2p (#1810)
## Description

Updates to the latest libp2p and includes gossipsub updates. 

Of particular note is the limitation of a single topic per gossipsub message.

Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-10-23 03:01:31 +00:00
Age Manning
66f0cf4430 Improve peer handling (#1796)
## Issue Addressed

Potentially resolves #1647 and sync stalls. 

## Proposed Changes

The handling of the state of banned peers was inadequate for the complex peerdb data structure. We store a limited number of disconnected and banned peers in the db. We were not tracking intermediate "disconnecting" states and the in some circumstances we were updating the peer state without informing the peerdb. This lead to a number of inconsistencies in the peer state. 

Further, the peer manager could ban a peer changing a peer's state from being connected to banned. In this circumstance, if the peer then disconnected, we didn't inform the application layer, which lead to applications like sync not being informed of a peers disconnection. This could lead to sync stalling and having to require a lighthouse restart. 

Improved handling for peer states and interactions with the peerdb is made in this PR.
2020-10-23 01:27:48 +00:00
realbigsean
a3552a4b70 Node endpoints (#1778)
## Issue Addressed

`node` endpoints in #1434

## Proposed Changes

Implement these:
```
 /eth/v1/node/health
 /eth/v1/node/peers/{peer_id}
 /eth/v1/node/peers
```
- Add an `Option<Enr>` to `PeerInfo`
- Finish implementation of `/eth/v1/node/identity`

## Additional Info
- should update the `peers` endpoints when #1764 is resolved



Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-10-22 02:59:42 +00:00
divma
668513b67e Sync state adjustments (#1804)
check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads
2020-10-22 00:26:06 +00:00
divma
2acf75785c More sync updates (#1791)
## Issue Addressed
#1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager
2020-10-20 22:34:18 +00:00
blacktemplar
6ba997b88e add direction information to PeerInfo (#1768)
## Issue Addressed

NA

## Proposed Changes

Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.
2020-10-16 05:24:21 +00:00
Herman Junge
d7b9d0dd9f Implement matches! macro (#1777)
Fix #1775
2020-10-15 21:42:43 +00:00
Pawan Dhananjay
97be2ca295 Simulator and attestation service fixes (#1747)
## Issue Addressed

#1729 #1730 

Which issue # does this PR address?

## Proposed Changes

1. Fixes a bug in the simulator where nodes can't find each other due to 0 udp ports in their enr.
2. Fixes bugs in attestation service where we are unsubscribing from a subnet prematurely.

More testing is needed for attestation service fixes.
2020-10-15 07:11:31 +00:00
blacktemplar
a0634cc64f Gossipsub topic filters (#1767)
## Proposed Changes

Adds a gossipsub topic filter that only allows subscribing and incoming subscriptions from valid ETH2 topics.

## Additional Info

Currently the preparation of the valid topic hashes uses only the current fork id but in the future it must also use all possible future fork ids for planned forks. This has to get added when hard coded forks get implemented.

DO NOT MERGE: We first need to merge the libp2p changes (see https://github.com/sigp/rust-libp2p/pull/70) so that we can refer from here to a commit hash inside the lighthouse branch.
2020-10-14 10:12:57 +00:00
blacktemplar
8248afa793 Updates the message-id according to the Networking Spec (#1752)
## Proposed Changes

Implement the new message id function (see https://github.com/ethereum/eth2.0-specs/pull/2089) using an additional fast message id function for better performance + caching decompressed data.
2020-10-14 06:51:58 +00:00
Pawan Dhananjay
99a02fd2ab Limit snappy input stream (#1738)
## Issue Addressed

N/A

## Proposed Changes

This PR limits the length of the stream received by the snappy decoder to be the maximum allowed size for the received rpc message type. Also adds further checks to ensure that the length specified in the rpc [encoding-dependent header](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#encoding-strategies) is within the bounds for the rpc message type being decoded.
2020-10-11 22:45:33 +00:00
Paul Hauner
0e4cc50262
Remove unused deps 2020-10-09 15:58:20 +11:00
Paul Hauner
db3e0578e9
Merge branch 'v0.3.0-staging' into v3-master 2020-10-09 15:27:08 +11:00
Paul Hauner
da44821e39
Clean up obsolete TODOs (#1734)
Squashed commit of the following:

commit f99373cbaec9adb2bdbae3f7e903284327962083
Author: Age Manning <Age@AgeManning.com>
Date:   Mon Oct 5 18:44:09 2020 +1100

    Clean up obsolute TODOs
2020-10-05 21:08:14 +11:00
Paul Hauner
ee7c8a0b7e Update external deps (#1711)
## Issue Addressed

- Resolves #1706 

## Proposed Changes

Updates dependencies across the workspace. Any crate that was not able to be brought to the latest version is listed in #1712.

## Additional Info

NA
2020-10-05 08:22:19 +00:00
Age Manning
240181e840
Upgrade discovery and restructure task execution (#1693)
* Initial rebase

* Remove old code

* Correct release tests

* Rebase commit

* Remove eth2-testnet dep on eth2libp2p

* Remove crates lost in rebase

* Remove unused dep
2020-10-05 18:45:54 +11:00
Age Manning
a8c5af8874
Increase content-id length (#1725)
## Issue Addressed

N/A

## Proposed Changes

Increase gossipsub's content-id length to the full 32 byte hash. 

## Additional Info

N/A
2020-10-05 17:33:42 +11:00
Age Manning
47c921f326 Update libp2p (#1728)
## Issue Addressed

N/A

## Proposed Changes

Updates the libp2p dependency to the latest version

## Additional Info

N/A
2020-10-05 05:16:27 +00:00
Age Manning
6b68c628df Increase content-id length (#1725)
## Issue Addressed

N/A

## Proposed Changes

Increase gossipsub's content-id length to the full 32 byte hash. 

## Additional Info

N/A
2020-10-04 23:49:16 +00:00
divma
e3c7b58657 Address a couple of TODOs (#1724)
## Issue Addressed
couple of TODOs
2020-10-04 22:50:44 +00:00
Sean
6af3bc9ce2
Add UPnP support for Lighthouse (#1587)
This commit was modified by Paul H whilst rebasing master onto
v0.3.0-staging

Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers.

Using IGD library: https://docs.rs/igd/0.10.0/igd/

Adding the  the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-10-03 10:07:47 +10:00
Sean
94b17ce02b Add UPnP support for Lighthouse (#1587)
Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers.

## Issue Addressed

#927 

## Proposed Changes

Using IGD library: https://docs.rs/igd/0.10.0/igd/

Adding the  the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on

## Additional Info



Co-authored-by: Age Manning <Age@AgeManning.com>
2020-10-02 08:47:00 +00:00
Pawan Dhananjay
8e20176337
Directory restructure (#1532)
Closes #1487
Closes #1427

Directory restructure in accordance with #1487. Also has temporary migration code to move the old directories into new structure.
Also extracts all default directory names and utility functions into a `directory` crate to avoid repetitio.

~Since `validator_definition.yaml` stores absolute paths, users will have to manually change the keystore paths or delete the file to get the validators picked up by the vc.~. `validator_definition.yaml` is migrated as well from the default directories.

Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-10-01 11:12:35 +10:00
Age Manning
13cb642f39
Update boot-node and discovery (#1682)
* Improve boot_node and upgrade discovery

* Clippy lints
2020-09-29 18:28:29 +10:00
blacktemplar
ae28773965
Networking bug fixes (#1684)
* call correct unsubscribe method for subnets

* correctly delegate closed connections in behaviour

* correct unsubscribe method name
2020-09-29 18:28:15 +10:00
Age Manning
28b6d921c6 Remove banned peers from DHT and track IPs (#1656)
## Issue Addressed

#629 

## Proposed Changes

This removes banned peers from the DHT and informs discovery to block the node_id and the known source IP's associated with this node. It has the capabilities of un banning this peer after a period of time. 

This also corrects the logic about banning specific IP addresses. We now use seen_ip addresses from libp2p rather than those sent to us via identify (which also include local addresses).
2020-09-25 01:52:39 +00:00
divma
b8013b7b2c Super Silky Smooth Syncs, like a Sir (#1628)
## Issue Addressed
In principle.. closes #1551 but in general are improvements for performance, maintainability and readability. The logic for the optimistic sync in actually simple

## Proposed Changes
There are miscellaneous things here:
- Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic
- Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct
- Move most batch-related logic to the batch
- Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches)
- Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases
- Store batches in a sorted map (`BTreeMap`) access is not O(1) but since the number of _active_ batches is bounded this should be fast, and saves performing hashing ops. Batches are indexed by the epoch they start. Sorted, to easily handle chain advancements (range logic)
- Produce the chain Id from the identifying fields: target root and target slot. This, to guarantee there can't be duplicated chains and be able to consistently search chains by either Id or checkpoint
- Fix chain_id not being present in all chain loggers
- Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and waiting for a result that won't arrive, effectively stalling sync.
- When a batch imports blocks or the chain starts syncing with a local finalized epoch greater that the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it represents more accurately the progress of the chain.
- Batch status peers from the same chain to reduce Arc access.
- Handle a couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know.
- Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed)
- When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers.
- Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code)
- In chain_collection, store chains by their id in a map
- Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses
- Other stuff:
  - impl `slog::KV` for batches
  - impl `slog::KV` for syncing chains
  - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug`

### Optimistic syncing:
Try first the batch that contains the current head, if the batch imports any block, advance the chain. If not, if this optimistic batch is inside the current processing window leave it there for future use, if not drop it. The tolerance for this block is the same for downloading, but just once for processing



Co-authored-by: Age Manning <Age@AgeManning.com>
2020-09-23 06:29:55 +00:00
Age Manning
80e52a0263 Subscribe to core topics after sync (#1613)
## Issue Addressed

N/A

## Proposed Changes

Prevent subscribing to core gossipsub topics until after we have achieved a full sync. This prevents us censoring gossipsub channels, getting penalised in gossipsub 1.1 scoring and saves us computation time in attempting to validate gossipsub messages which we will be unable to do with a non-sync'd chain.
2020-09-23 03:26:33 +00:00
Pawan Dhananjay
14ff38539c Add trusted peers (#1640)
## Issue Addressed

Closes #1581 

## Proposed Changes

Adds a new cli option for trusted peers who always have the maximum possible score.
2020-09-22 01:12:36 +00:00
Age Manning
1db8daae0c Shift metadata to the global network variables (#1631)
## Issue Addressed

N/A

## Proposed Changes

Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse.

## Additional Info

N/A
2020-09-21 02:00:38 +00:00
Pawan Dhananjay
7b97c4ad30 Snappy additional sanity checks (#1625)
## Issue Addressed

N/A

## Proposed Changes

Adds the following check from the spec

> A reader SHOULD NOT read more than max_encoded_len(n) bytes after reading the SSZ length-prefix n from the header.
2020-09-21 01:06:25 +00:00
Age Manning
49ab414594 Shift gossipsub validation (#1612)
## Issue Addressed

N/A

## Proposed Changes

This will consider all gossipsub messages that have either the `from`, `seqno` or `signature` field as invalid. 

## Additional Info

We should not merge this until all other clients have been sending empty fields for a while.

See https://github.com/ethereum/eth2.0-specs/issues/1981 for reference
2020-09-18 02:05:36 +00:00
Age Manning
2074beccdc Gossipsub message id to shortened bytes (#1607)
## Issue Addressed

https://github.com/ethereum/eth2.0-specs/pull/2044

## Proposed Changes

Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field.

## Additional Info

We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT) messages from clients that also haven't made this update.
2020-09-18 02:05:34 +00:00
Age Manning
c6abc56113 Prevent large step-size parameters (#1583)
## Issue Addressed

Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadraticaly weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups. 

## Proposed Changes

Weight the step parameter in the RPC filter and prevent any overflows that effect us in the step parameter.

## Additional Info
2020-09-11 02:33:36 +00:00
Pawan Dhananjay
0525876882 Dial cached enr's before making subnet discovery query (#1376)
## Issue Addressed

Closes #1365 

## Proposed Changes

Dial peers in the `cached_enrs` who aren't connected, aren't banned and satisfy the subnet predicate before making a subnet discovery query.
2020-09-11 00:52:27 +00:00
Age Manning
d79366c503 Prevent printing binary in RPC errors (#1604)
## Issue Addressed

#1566 

## Proposed Changes

Prevents printing binary characters in the RPC error response from peers.
2020-09-10 04:43:22 +00:00
Daniel Schonfeld
2a9a815f29 conforming to the p2p specs, requiring error_messages to be bound (#1593)
## Issue Addressed

#1421 

## Proposed Changes

Bounding the error_message that can be returned for RPC domain errors


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-09-07 06:47:05 +00:00
Age Manning
a6376b4585 Update discv5 to v10 (#1592)
## Issue Addressed

Code improvements, dependency improvements and better async handling.
2020-09-07 05:53:20 +00:00
Age Manning
fb9d828e5e Extended Gossipsub metrics (#1577)
## Issue Addressed

N/A

## Proposed Changes

Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topics, subscriptions and peer scores. 

## Additional Info
2020-09-01 06:59:14 +00:00
blacktemplar
c18d37c202 Use Gossipsub 1.1 (#1516)
## Issue Addressed

#1172

## Proposed Changes

* updates the libp2p dependency
* small adaptions based on changes in libp2p
* report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-08-30 13:06:50 +00:00
blacktemplar
2bc9115a94 reuse beacon_node methods for initializing network configs in boot_node (#1520)
## Issue Addressed

#1378

## Proposed Changes

Boot node reuses code from beacon_node to initialize network config. This also enables using the network directory to store/load the enr and the private key.

## Additional Info

Note that before this PR the port cli arguments were off (the argument was named `enr-port` but used as `boot-node-enr-port`).
Therefore as port always the cli port argument was used (for both enr and listening). Now the enr-port argument can be used to overwrite the listening port as the public port others should connect to.

Last but not least note, that this restructuring reuses `ethlibp2p::NetworkConfig` that has many more options than the ones used in the boot node. For example the network config has an own `discv5_config` field that gets never used in the boot node and instead another `Discv5Config` gets created later in the boot node process.

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-08-21 12:00:01 +00:00
blacktemplar
3f0a113c7f ban IP addresses if too many banned peers for this IP address (#1543)
## Issue Addressed

#1283 

## Proposed Changes

All peers with the same IP will be considered banned as long as there are more than 5 (constant) peers with this IP that have a score below the ban threshold. As soon as some of those 5 peers get unbanned (through decay) and if there are then less than 5 peers with a score below the threshold the IP will be considered not banned anymore.
2020-08-21 01:41:12 +00:00
Pawan Dhananjay
bbed42f30c Refactor attestation service (#1415)
## Issue Addressed

N/A

## Proposed Changes

Refactor attestation service to send out requests to find peers for subnets as soon as we get attestation duties. 
Earlier, we had much more involved logic to send the discovery requests to the discovery service only 6 slots before the attestation slot. Now that discovery is much smarter with grouped queries, the complexity in attestation service can be reduced considerably.



Co-authored-by: Age Manning <Age@AgeManning.com>
2020-08-19 08:46:25 +00:00
divma
fdc6e2aa8e Shutdown like a Sir (#1545)
## Issue Addressed
#1494 

## Proposed Changes
- Give the TaskExecutor the sender side of a channel that a task can clone to request shutting down
- The receiver side of this channel is in environment and now we block until ctrl+c or an internal shutdown signal is received
- The swarm now informs when it has reached 0 listeners
- The network receives this message and requests the shutdown
2020-08-19 05:51:14 +00:00
Age Manning
e1e5002d3c Fingerprint Lodestar (#1536)
Fingerprints the Lodestar client
2020-08-18 06:28:24 +00:00
Paul Hauner
a58aa6ee55 Revert back to discv5 alpha 8 to maintain ARM support (#1531)
## Issue Addressed

NA

## Proposed Changes

See title.

## Additional Info

NA
2020-08-17 10:06:08 +00:00
Age Manning
3c689a6837 Remove yamux support (#1526)
## Issue Addressed

There is currently an issue with yamux when connecting to prysm peers. The source of the issue is currently unknown. 

This PR removes yamux support to force mplex negotation. We can add back yamux support once we have isolated and corrected the issue.
2020-08-17 05:05:06 +00:00
Pawan Dhananjay
850a2d5985 Persist metadata and enr across restarts (#1513)
## Issue Addressed

Resolves #1489 

## Proposed Changes

- Change starting metadata seq num to 0 according to the [spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#metadata).
- Remove metadata field from `NetworkGlobals`
- Persist metadata to disk on every update
- Load metadata seq number from disk on restart
- Persist enr to disk on update to ensure enr sequence number increments are persisted as well.

## Additional info

Since we modified starting metadata seq num to 0 from 1, we might still see `Invalid Sequence number provided` like in #1489  from prysm nodes if they have our metadata cached.
2020-08-17 02:13:28 +00:00
divma
113b40f321 Add multiaddr support in bootnodes (#1481)
## Issue Addressed
#1384 

Only catch, as currently implemented, when dialing the multiaddr nodes, there is no way to ask the peer manager if they are already connected or dialing
2020-08-17 02:13:26 +00:00
Age Manning
99acfb50f2 Update gossipsub duplicate cache (#1524)
This potentially handles memory leak issues by preventing adding references to already seen gossipsub messages.
2020-08-17 01:27:33 +00:00
Age Manning
c75c06cf16 Update discv5 to alpha.9 (#1517)
## Discovery v5 update

In this update we remove the openssl dependency in favour of rust-crypto. 

The update also removes a series of unnecessary async functions which may improve some of the issues we have been experiencing.
2020-08-15 04:02:14 +00:00
Paul Hauner
b063df5bf9 Cross-compile to vendored x86_84, aarch64 (Raspberry Pi 4) (#1497)
## Issue Addressed

NA

## Proposed Changes

Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images.

Provides quite clean and simple cross-compiles cause all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group though.

## Details

- Adds shortcut commands to `Makefile`
- Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared).
- Switches to a [commit](284f705964) of blst that has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21).
- Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`.
- Off-topic, but adds extra message about Windows support as suggested by Discord user.

## Additional Info

- ~~Blocked on #1495~~
- There are no tests in CI for this yet for a few reasons:
  - I'm hesitant to add more long-running tasks.
  - Short-term bitrot should be avoided since we'll use it each release.
  - In the long term I think it would be good to automate binary creation on a release.
- I observed the binaries increase in size from 50mb to 52mb after these changes.
2020-08-11 05:16:30 +00:00
divma
1a67d15701 Mitigate too many outgoing connections (#1469)
limit simultaneous outgoing connections attempts to a reasonable top as an extra layer of protection
also shift the keep alive logic of the rpc handler to avoid needing to update it by hand. I think In rare cases this could make shutting down a connection a bit faster.
2020-08-11 02:16:31 +00:00
Age Manning
cbfae87aa6 Upgrade logs (#1495)
## Issue Addressed

#1483 

## Proposed Changes

Upgrades the log to a critical if a listener fails. We are able to listen on many interfaces so a single instance is not critical. We should however gracefully shutdown the client if we have no listeners, although the client can still function solely on outgoing connections.

For now a critical is raised and I leave #1494 for more sophisticated handling of this. 

This also updates discv5 to handle errors of binding to a UDP socket such that lighthouse is now able to handle them.
2020-08-10 05:19:51 +00:00
Age Manning
04e4389efe Patch gossipsub (#1490)
## Issue Addressed

Some nodes not following head, high CPU usage and HTTP API delays

## Proposed Changes

Patches gossipsub. Gossipsub was using an `lru_time_cache` to check for duplicates. This contained an `O(N)` lookup for every gossipsub message to update the time cache. This was causing high cpu usage and blocking network threads. 

This PR introduces a custom cache without `O(N)` inserts. 

This also adds built in safety mechanisms to prevent gossipsub from excessively retrying connections upon failure. A maximum limit is set after which we disconnect from the node from too many failed substream connections.
2020-08-08 08:09:04 +00:00
Age Manning
08a31c5a1a Disconnect peers (#1484)
## Issue Addressed

Peers that connected after the peer limit may remain connected in some circumstances. 

This ensures peers not in the peer manager's list get disconnected. Further logging is also added to track this behaviour.
2020-08-08 06:08:44 +00:00
Age Manning
a1f9769040 Libp2p update (#1482)
Updates to latest libp2p master. 

This now has native noise support. 

This PR
- Removes secio support
- Prioritises mplex over yamux
2020-08-08 02:17:32 +00:00
divma
7d87e11e0f Fix rpc coded response display (#1470)
Prevent errors to be printed in debug mode
2020-08-06 04:29:23 +00:00
Pawan Dhananjay
983f768034 Remove ssz encoding support from rpc (#1457)
## Issue Addressed

Partially resolves #1422 

## Proposed Changes

Remove ssz encoding from req/resp in rpc.
2020-08-06 04:29:19 +00:00
Age Manning
09a615b2c0 Lighthouse crate v0.2.0 bump (#1450)
## Description

This PR marks Lighthouse v0.2.0. 

This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.
2020-08-06 03:43:05 +00:00
divma
924ba66218 Update v0.12.2 gossip params (#1449)
## Issue Addressed
#1422
2020-08-06 00:04:33 +00:00
Paul Hauner
f26adc0a36 Lighthouse v0.2.0 (Medalla) (#1452)
## Issue Addressed

NA

## Proposed Changes

- Moves the git-based versioning we were doing into the `lighthouse_version` crate in `common`.
- Removes the `beacon_node/version` crate, replacing it with `lighthouse_version`.
- Bumps the version to `v0.2.0`.

## Additional Info

There are now two types of version string:

1. `const VERSION: &str = Lighthouse/v0.2.0-1419501f2+`
1. `version_with_platform() = Lighthouse/v0.2.0-1419501f2+/x86_64-linux`

(1) is handy cause it's a `const` and shorter. (2) has platform info so it's more useful. Note that the plus-sign (`+`) indicates the the git commit is dirty (it used to be `(modified)` but I had to shorten it to fit into graffiti).

These version strings are now included on:

- `lighthouse --version`
- `lcli --version`
- `curl localhost:5052/node/version`
- p2p messages when we communicate our version

You can update the version by changing this constant (version is not related to a `Cargo.toml`):

b9ad7102d5/common/lighthouse_version/src/lib.rs (L4-L15)
2020-08-04 07:44:53 +00:00
divma
1bbecbcf26 Track gossip subscriptions as a metric (#1445)
## Issue Addressed
#1399 

## Proposed Changes
Set an Int gauge per topic and inc/dec when peers subscribe/unsubscribe
2020-08-04 04:18:10 +00:00
Age Manning
31707ccf45 Shift author to sigma prime on some crates (#1440)
Shifts the author to sigma prime on some crates
2020-08-04 02:31:41 +00:00