Commit Graph

1273 Commits

Author SHA1 Message Date
Pawan Dhananjay
850a2d5985 Persist metadata and enr across restarts (#1513)
## Issue Addressed

Resolves #1489 

## Proposed Changes

- Change starting metadata seq num to 0 according to the [spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#metadata).
- Remove metadata field from `NetworkGlobals`
- Persist metadata to disk on every update
- Load metadata seq number from disk on restart
- Persist enr to disk on update to ensure enr sequence number increments are persisted as well.

## Additional info

Since we modified starting metadata seq num to 0 from 1, we might still see `Invalid Sequence number provided` like in #1489  from prysm nodes if they have our metadata cached.
2020-08-17 02:13:28 +00:00
divma
113b40f321 Add multiaddr support in bootnodes (#1481)
## Issue Addressed
#1384 

Only catch, as currently implemented, when dialing the multiaddr nodes, there is no way to ask the peer manager if they are already connected or dialing
2020-08-17 02:13:26 +00:00
Age Manning
99acfb50f2 Update gossipsub duplicate cache (#1524)
This potentially handles memory leak issues by preventing adding references to already seen gossipsub messages.
2020-08-17 01:27:33 +00:00
Age Manning
c75c06cf16 Update discv5 to alpha.9 (#1517)
## Discovery v5 update

In this update we remove the openssl dependency in favour of rust-crypto. 

The update also removes a series of unnecessary async functions which may improve some of the issues we have been experiencing.
2020-08-15 04:02:14 +00:00
Paul Hauner
f4a7311008 Update to v0.2.3 (#1519)
## Issue Addressed

NA

## Proposed Changes

Bump versions to v0.2.3.

## Additional Info

NA
2020-08-14 08:32:31 +00:00
Paul Hauner
619ad106cf Restrict fork choice getters to finalized blocks (#1475)
## Issue Addressed

- Resolves #1451

## Proposed Changes

- Restricts the `contains_block` and `contains_block` so they only indicate a block is present if it descends from the finalized root. This helps to ensure that fork choice never points to a block that has been pruned from the database.
- Resolves #1451
- Before importing a block, double-check that its parent is known and a descendant of the finalized root.
- Split a big, monolithic block verification test into smaller tests. 

## Additional Notes

I suspect there would be a craftier way to do the `is_descendant_of_finalized` check, but we're a bit tight on time now and we can optimize later if it starts showing in benches.

## TODO

- [x] Tests
2020-08-14 06:36:38 +00:00
Paul Hauner
b0a3731fff Introduce a queue for attestations from the network (#1511)
## Issue Addressed

N/A

## Proposed Changes

Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`.

Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load.

## Additional Info

TBC
2020-08-14 04:38:45 +00:00
Adam Szkoda
05a8399769 Wind down the SSE thread when the client disconnects (#1514)
These started to appear when I `^C` `curl -N http://localhost:5052/beacon/fork/stream`: `Aug 12 13:00:01.539 ERRO Couldn't stream piece hyper::Error(ChannelClosed), service: http`

Something must have changed in hyper since SSE has been implemented because I'm sure I haven't seen those errors before.

This PR properly detects a closed SSE stream and cleans up.
2020-08-13 06:12:18 +00:00
Adam Szkoda
8a1a4051cf Fix a bug in fork pruning (#1507)
Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-08-12 07:00:00 +00:00
Paul Hauner
b063df5bf9 Cross-compile to vendored x86_84, aarch64 (Raspberry Pi 4) (#1497)
## Issue Addressed

NA

## Proposed Changes

Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images.

Provides quite clean and simple cross-compiles cause all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group though.

## Details

- Adds shortcut commands to `Makefile`
- Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared).
- Switches to a [commit](284f705964) of blst that has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21).
- Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`.
- Off-topic, but adds extra message about Windows support as suggested by Discord user.

## Additional Info

- ~~Blocked on #1495~~
- There are no tests in CI for this yet for a few reasons:
  - I'm hesitant to add more long-running tasks.
  - Short-term bitrot should be avoided since we'll use it each release.
  - In the long term I think it would be good to automate binary creation on a release.
- I observed the binaries increase in size from 50mb to 52mb after these changes.
2020-08-11 05:16:30 +00:00
divma
1a67d15701 Mitigate too many outgoing connections (#1469)
limit simultaneous outgoing connections attempts to a reasonable top as an extra layer of protection
also shift the keep alive logic of the rpc handler to avoid needing to update it by hand. I think In rare cases this could make shutting down a connection a bit faster.
2020-08-11 02:16:31 +00:00
realbigsean
ec84183e05 Add graffiti cli flag to the validator client. (#1425)
## Issue Addressed

#1419

## Proposed Changes

Creates a `--graffiti` cli flag in the validator client. If the flag is set, it overrides graffiti in the beacon node. 

## Additional Info
2020-08-11 02:16:29 +00:00
divma
95b55d7170 Block error display (#1503)
## Issue Addressed

#1486
2020-08-11 01:30:26 +00:00
Age Manning
134676fd6f Version bump to v0.2.2 (#1496)
Version bump to v0.2.2
2020-08-10 06:49:03 +00:00
Age Manning
cbfae87aa6 Upgrade logs (#1495)
## Issue Addressed

#1483 

## Proposed Changes

Upgrades the log to a critical if a listener fails. We are able to listen on many interfaces so a single instance is not critical. We should however gracefully shutdown the client if we have no listeners, although the client can still function solely on outgoing connections.

For now a critical is raised and I leave #1494 for more sophisticated handling of this. 

This also updates discv5 to handle errors of binding to a UDP socket such that lighthouse is now able to handle them.
2020-08-10 05:19:51 +00:00
Age Manning
04e4389efe Patch gossipsub (#1490)
## Issue Addressed

Some nodes not following head, high CPU usage and HTTP API delays

## Proposed Changes

Patches gossipsub. Gossipsub was using an `lru_time_cache` to check for duplicates. This contained an `O(N)` lookup for every gossipsub message to update the time cache. This was causing high cpu usage and blocking network threads. 

This PR introduces a custom cache without `O(N)` inserts. 

This also adds built in safety mechanisms to prevent gossipsub from excessively retrying connections upon failure. A maximum limit is set after which we disconnect from the node from too many failed substream connections.
2020-08-08 08:09:04 +00:00
Age Manning
08a31c5a1a Disconnect peers (#1484)
## Issue Addressed

Peers that connected after the peer limit may remain connected in some circumstances. 

This ensures peers not in the peer manager's list get disconnected. Further logging is also added to track this behaviour.
2020-08-08 06:08:44 +00:00
Age Manning
a1f9769040 Libp2p update (#1482)
Updates to latest libp2p master. 

This now has native noise support. 

This PR
- Removes secio support
- Prioritises mplex over yamux
2020-08-08 02:17:32 +00:00
Paul Hauner
0b287f6ece Push naive attestations into op pool (#1466)
## Issue Addressed

NA

## Proposed Changes

- When producing a block, go and ensure every attestation in the naive aggregation pool is included in the operation pool. This should help us increase the number of useful attestations in a block.
- Lift the `RwLock`s inside `NaiveAggregationPool` up into a single high-level lock. There were race conditions in the existing setup and it was hard to reason about.

## Additional Info

NA
2020-08-06 07:26:46 +00:00
divma
7d87e11e0f Fix rpc coded response display (#1470)
Prevent errors to be printed in debug mode
2020-08-06 04:29:23 +00:00
Pawan Dhananjay
983f768034 Remove ssz encoding support from rpc (#1457)
## Issue Addressed

Partially resolves #1422 

## Proposed Changes

Remove ssz encoding from req/resp in rpc.
2020-08-06 04:29:19 +00:00
divma
138c0cf7f0 Remove block clone (#1448)
## Issue Addressed

#1028 

A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed
2020-08-06 04:29:17 +00:00
Age Manning
09a615b2c0 Lighthouse crate v0.2.0 bump (#1450)
## Description

This PR marks Lighthouse v0.2.0. 

This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.
2020-08-06 03:43:05 +00:00
divma
924ba66218 Update v0.12.2 gossip params (#1449)
## Issue Addressed
#1422
2020-08-06 00:04:33 +00:00
Paul Hauner
5629126f45 Add reason to invalid attestation log (#1460)
## Issue Addressed

NA

## Proposed Changes

Adds an extra field to a debug log so we can see *why* an attestation was invalid.

## Additional Info

NA
2020-08-05 01:49:52 +00:00
Paul Hauner
f26adc0a36 Lighthouse v0.2.0 (Medalla) (#1452)
## Issue Addressed

NA

## Proposed Changes

- Moves the git-based versioning we were doing into the `lighthouse_version` crate in `common`.
- Removes the `beacon_node/version` crate, replacing it with `lighthouse_version`.
- Bumps the version to `v0.2.0`.

## Additional Info

There are now two types of version string:

1. `const VERSION: &str = Lighthouse/v0.2.0-1419501f2+`
1. `version_with_platform() = Lighthouse/v0.2.0-1419501f2+/x86_64-linux`

(1) is handy cause it's a `const` and shorter. (2) has platform info so it's more useful. Note that the plus-sign (`+`) indicates the the git commit is dirty (it used to be `(modified)` but I had to shorten it to fit into graffiti).

These version strings are now included on:

- `lighthouse --version`
- `lcli --version`
- `curl localhost:5052/node/version`
- p2p messages when we communicate our version

You can update the version by changing this constant (version is not related to a `Cargo.toml`):

b9ad7102d5/common/lighthouse_version/src/lib.rs (L4-L15)
2020-08-04 07:44:53 +00:00
divma
1bbecbcf26 Track gossip subscriptions as a metric (#1445)
## Issue Addressed
#1399 

## Proposed Changes
Set an Int gauge per topic and inc/dec when peers subscribe/unsubscribe
2020-08-04 04:18:10 +00:00
Age Manning
31707ccf45 Shift author to sigma prime on some crates (#1440)
Shifts the author to sigma prime on some crates
2020-08-04 02:31:41 +00:00
Age Manning
1419501f2e Update peerdb constants (#1444)
Increases the cache for disconnected and banned peers.
2020-08-03 12:48:22 +00:00
Age Manning
37679b8898
Update score decay behaviour (#1442) 2020-08-03 20:46:08 +10:00
Age Manning
f634f073a8 Correct issue with network message passing (#1439)
## Issue Addressed

Sync was breaking occasionally. The root cause appears to be identify crashing as events we being sent to the protocol after nodes were banned. Have not been able to reproduce sync issues since this update. 

## Proposed Changes

Only send messages to sub-behaviour protocols if the peer manager thinks the peer is connected. All other messages are dropped.
2020-08-03 09:35:53 +00:00
Age Manning
3b5da8f35f Gossipsub update (#1432)
## Issue Addressed

The most recent gossipsub update had an issue where some privacy settings lead to not sending a sequence number with the message. Although Lighthouse treats these as valid (based on current configuration) other clients may not. 

This corrects gossipsub to send sequence numbers where expected and based on the configuration settings.
2020-08-02 13:19:56 +00:00
divma
4d77784bb8 Rate limit RPC requests (#1402)
## Issue Addressed
#1056 

## Proposed Changes
- Add a rate limiter to the RPC behaviour. This also means the rate limiting occurs just before the door to the application level, so the number of connections a peer opens does not affect this (this would happen in the future if put on the handler)
- The algorithm used is the leaky bucket as a meter / token bucket implemented the GCRA way
- Each protocol has its own limit. Due to the way the algorithm works, the "small" protocols have a hard limit, while bbrange and bbroot allow [burstiness](https://www.wikiwand.com/en/Burstiness). This is so that a peer can't request hundreds of individual requests expecting only one block in a short period of time, it also allows a peer to send two half size requests instead of one with max if they want to without getting limited, and.. it also allows a peer to request a batch of the maximum size and then send _appropriately spaced_ requests of really small sizes. From what I've seen in sync this is plausible when reaching the target slot.

## Additional Info
Needs to be heavily tested
2020-07-31 05:47:09 +00:00
Age Manning
a37e75f44b
Downgrade sync and rpc warn logs (#1417)
* Downgrade sycn and rpc warn logs

* Correct warning
2020-07-30 13:52:44 +10:00
Age Manning
febb300a2d Limit incoming connection requests (#1413)
## Issue Addressed

Limits the number of incoming connections and adjusts the buffer sizes in libp2p
2020-07-29 06:39:30 +00:00
Paul Hauner
36d3d37cb4 Add support for multiple testnet flags (#1396)
## Issue Addressed

NA

## Proposed Changes

Allows for multiple "hardcoded" testnets.

## Additional Info

This PR is incomplete.

## TODO

- [x] Add flag to CLI, integrate with rest of Lighthouse.


Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-07-29 06:39:29 +00:00
Age Manning
395d99ce03 Sync update (#1412)
## Issue Addressed

Recurring sync loop and invalid batch downloading

## Proposed Changes

Shifts the batches to include the first slot of each epoch. This ensures the finalized is always downloaded once a chain has completed syncing. 

Also add in logic to prevent re-dialing disconnected peers. Non-performant peers get disconnected during sync, this prevents re-connection to these during sync. 

## Additional Info

N/A
2020-07-29 05:25:10 +00:00
Age Manning
ba0f3daf9d Gossipsub update (#1400)
## Issue Addressed

N/A

## Proposed Changes

This provides a number of corrections and improvements to gossipsub. Specifically
- Enables options for greater privacy around the message author
- Provides greater flexibility on message validation
- Prevents unvalidated messages from being gossiped
- Shifts the duplicate cache to a time-based cache inside gossipsub
- Updates the message-id to handle bytes
- Bug fixes related to mesh maintenance and topic subscription. This should improve our attestation inclusion rate.
2020-07-29 03:40:22 +00:00
realbigsean
09b40b7a5e Discover query grouping (#1364)
## Issue Addressed

#1281

## Proposed Changes

Groups queries for specific subnets into groups of up to 3.

## Additional Info
2020-07-29 02:43:50 +00:00
divma
9ae9df806c Fix clippy lints rpc (#1401)
## Issue Addressed
#1388 partially (eth2_libp2p & network)

## Proposed Changes 
TLDR at the end
- *Complex types* are 3 on the handlers/Behaviours but the types are `Poll<ComplexType>` where `ComplexType` comes from the traits of libp2p. Those, I don't thing are worth an alias. A couple more were from using tokio combinators and were removed writing things the async way and using [`BoxFuture`](https://docs.rs/futures/0.3.5/futures/future/type.BoxFuture.html)
- The *cognitive complexity*.. I tried to address those before (they come from the poll functions too) and tbh they are cognitively simpler to understand the way they are now. Moving separate parts to functions doesn't add much since that code is not repeated and they all do early returns. If moved those returns would now need to be wrapped in an Option, probably, and checked to be returned again. I would leave them like that but that's just preference.
- *Too many arguments*: They are not easily put together in a wrapping struct since the parameters don't relate semantically (Ex: fn new with a log, a reference to the chain, a peer, etc) but some may differ.
- *Needless returns* were indeed needless

## Additional Info
TLDR: removed needless return, used BoxFuture and async, left the rest untouched since those lgtm
2020-07-28 01:39:42 +00:00
Paul Hauner
0b5be9b2c0
Add info about peer scoring to block/attestation errors (#1393)
* Add comments to `BlockError`

* Add `AttnError` comments

* Clean up
2020-07-26 13:16:49 +10:00
Paul Hauner
e5d9d6179f Add info about valid deposit count to logs (#1391)
## Issue Addressed

NA

## Proposed Changes

Adds a `valid_deposits` field to the logs whilst waiting for genesis:

```
Jul 25 11:02:25.631 INFO Waiting for more deposits               valid_deposits: 3085, total_deposits: 3188, min_genesis_active_validators: 16384, service: beacon
```

In this example we can see there are `3188` deposits, but only `3085` have valid signatures.

## Additional Info

NA
2020-07-25 04:44:10 +00:00
Paul Hauner
b73c497be2 Support multiple BLS implementations (#1335)
## Issue Addressed

NA

## Proposed Changes

- Refactor the `bls` crate to support multiple BLS "backends" (e.g., milagro, blst, etc).
- Removes some duplicate, unused code in `common/rest_types/src/validator.rs`.
- Removes the old "upgrade legacy keypairs" functionality (these were unencrypted keys that haven't been supported for a few testnets, no one should be using them anymore).

## Additional Info

Most of the files changed are just inconsequential changes to function names.

## TODO

- [x] Optimization levels
- [x] Infinity point: https://github.com/supranational/blst/issues/11
- [x] Ensure milagro *and* blst are tested via CI
- [x] What to do with unsafe code?
- [x] Test infinity point in signature sets
2020-07-25 02:03:18 +00:00
blacktemplar
23a8f31f83 Fix clippy warnings (#1385)
## Issue Addressed

NA

## Proposed Changes

Fixes most clippy warnings and ignores the rest of them, see issue #1388.
2020-07-23 14:18:00 +00:00
divma
ba10c80633 Refactor inbound substream logic with async (#1325)
## Issue Addressed
#1112 

The logic is slightly different but still valid wrt to error handling.
- Inbound state is either Busy with a future that return the subtream (and info about the processing)
- The state machine works as follows:
  - `Idle` with pending responses => `Busy`
  - `Busy` => finished ? if so and there are new pending responses then `Busy`, if not then `Idle`
               => not finished remains `Busy`
- Add an `InboundInfo` for readability
- Other stuff:
  - Close inbound substreams when all expected responses are sent
  - Remove the error variants from `RPCCodedResponse` and use the codes instead
  - Fix various spelling mistakes because I got sloppy last time

Sorry for the delay

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-07-23 12:30:43 +00:00
blacktemplar
3c4daec9af
replace max_peers cli argument by target_peers and use excess peers above target_peers capped by a new constant PEER_EXCESS_FACTOR (relative to target_peers) (#1383) 2020-07-23 13:55:36 +10:00
Pawan Dhananjay
3a888d6ef3 Fix early return from DepositLog parsing (#1382)
## Issue Addressed

N/A

## Proposed Changes

When parsing deposit logs, we were returning an error in case `PublicKeyBytes` or `SignatureBytes` didn't convert to valid bls `PublicKey` or `Signature` types. This would stall our import of deposit logs. 
Fixes this by returning `signature_is_valid` as `false` in `DepositLog` if the bytes are invalid `PublicKey/Signature` types.

Tested this fix on the Onyx deposit contract where the bug was observed and it works correctly as expected.
2020-07-22 10:24:37 +00:00
Akihito Nakano
41f7547645 Remove unused event handler function (#1377)
## Issue Addressed

`websocket_event_handler` has been unused since #1107.
2020-07-22 10:24:35 +00:00
Akihito Nakano
ea0e936ac4 Small improvement: encapsulate a public field (#1362)
## Issue Addressed

This PR makes the `Eth1Chain::use_dummy_backend` field private. I believe this could be good to ensure the consistency  of a Eth1Chain instance. 💡
2020-07-22 09:34:57 +00:00
blacktemplar
f61a7113ac
Do not send regular status updates during syncing (#1375) 2020-07-22 15:39:56 +10:00