Commit Graph

459 Commits

Author SHA1 Message Date
Paul Hauner
2b2a358522 Detailed validator monitoring (#2151)
## Issue Addressed

- Resolves #2064

## Proposed Changes

Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators.

Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint.

Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored.

See the new docs included in this PR for more info.

## TODO

- [x] Track validator balance, `slashed` status, etc.
- [x] ~~Register slashings in current epoch, not offense epoch~~
- [ ] Publish Grafana dashboard, update TODO link in docs
- [x] ~~#2130 is merged into this branch, resolve that~~
2021-01-20 19:19:38 +00:00
Paul Hauner
d9f940613f Represent slots in secs instead of millisecs (#2163)
## Issue Addressed

NA

## Proposed Changes

Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).

This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 09:39:51 +00:00
Paul Hauner
805e152f66 Simplify enum -> str with strum (#2164)
## Issue Addressed

NA

## Proposed Changes

As per #2100, uses derives from the sturm library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counter into one IntCounterVec vector.

These works are originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 06:33:58 +00:00
realbigsean
7a71977987 Clippy 1.49.0 updates and dht persistence test fix (#2156)
## Issue Addressed

`test_dht_persistence` failing

## Proposed Changes

Bind `NetworkService::start` to an underscore prefixed variable rather than `_`.  `_` was causing it to be dropped immediately

This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 00:34:28 +00:00
realbigsean
423dea169c update smallvec (#2152)
## Issue Addressed

`cargo audit` is failing because of a potential for an overflow in the version of `smallvec` we're using

## Proposed Changes

Update to the latest version of `smallvec`, which has the fix


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-11 23:32:11 +00:00
Age Manning
2931b05582 Update libp2p (#2101)
This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. 

This needs a bit of thorough testing before merging. 

The primary code changes are:
- General libp2p dependency update
- Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression



Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-12-23 07:53:36 +00:00
Pawan Dhananjay
f998eff7ce Subnet discovery fixes (#2095)
## Issue Addressed

N/A

## Proposed Changes

Fixes multiple issues related to discovering of subnet peers.
1. Subnet discovery retries after yielding no results
2. Metadata updates if peer send older metadata
3. peerdb stores the peer subscriptions from gossipsub
2020-12-17 00:39:15 +00:00
Age Manning
dfb588e521 Softer penalties for missing blocks (#2075)
## Issue Addressed

Users are reporting errors for sending attestations to peers. If the clock sync is a little out or we receive attestations before blocks, peers are being too harshly penalized. They can get scored many times per missing block and we typically need these peers on subnets. 


## Proposed Changes

This removes the penalization for missing blocks with attestations. The penalty should be handled when #635 gets built as it will allow us to group attestations per missing block and penalize once.
2020-12-10 00:40:12 +00:00
Age Manning
4f85371ce8 Downgrades a valid log (#2057)
## Issue Addressed

#2046 

## Proposed Changes

The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to  debug log.
2020-12-08 10:48:45 +00:00
divma
f3200784b4 More metrics + RPC tweaks (#2041)
## Issue Addressed

NA

## Proposed Changes
This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here
- Add metrics for rpc errors per client, error type and direction
- Add metrics for downscoring events per source type, client and penalty type
- Add metrics for gossip validation results per client for non-accepted messages
- Make the RPC handler return errors and requests/responses in the order we see them
- Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
- Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
- Remove some unused code in the `PeerAction` and the rpc handler
- Remove the unused variant `RateLimited`. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png)

RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png)

Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)
2020-12-08 03:55:50 +00:00
Michael Sproul
c1ec386d18 Pass failed gossip blocks to the slasher (#2047)
## Issue Addressed

Closes #2042

## Proposed Changes

Pass blocks that fail gossip verification to the slasher. Blocks that are successfully verified are not passed immediately, but will be passed as part of full block verification.
2020-12-04 05:03:30 +00:00
realbigsean
fdfb81a74a Server sent events (#1920)
## Issue Addressed

Resolves #1434 (this is the last major feature in the standard spec. There are only a couple of places we may be off-spec due to recent spec changes or ongoing discussion)
Partly addresses #1669
 
## Proposed Changes

- remove the websocket server
- remove the `TeeEventHandler` and `NullEventHandler` 
- add server sent events according to the eth2 API spec

## Additional Info

This is according to the currently unmerged PR here: https://github.com/ethereum/eth2.0-APIs/pull/117


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-04 00:18:58 +00:00
blacktemplar
d8cda2d86e Fix new clippy lints (#2036)
## Issue Addressed

NA

## Proposed Changes

Fixes new clippy lints in the whole project (mainly [manual_strip](https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip) and [unnecessary_lazy_evaluations](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations)). Furthermore, removes `to_string()` calls on literals when used with the `?`-operator.
2020-12-03 01:10:26 +00:00
divma
8fcd22992c No string in slog (#2017)
## Issue Addressed

Following slog's documentation, this should help a bit with string allocations. I left it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't harm anyway 

## Proposed Changes

remove `String` creation in logs when possible
2020-11-30 10:33:00 +00:00
Age Manning
a567f788bd Upgrade to tokio 0.3 (#1839)
## Description

This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.

This also brings with it a number of various improvements:

- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max



Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-11-28 05:30:57 +00:00
divma
fc07cc3fdf Sync metrics (#1975)
## Issue Addressed
- Add metrics to keep track of peer counts by sync type
- Add metric to keep track of the number of syncing chains in range

## Proposed Changes
Plugin to the network metrics update interval and update too the counts for peers wrt to their sync status with us

## Additional Info
For the peer counts
- By the way it is implemented the numbers won't always match to the total peer count in the `libp2p` metric.
- Updating the gauge with every change is messy because it requires to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate) on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. If you think more accuracy is needed we can do it too, but it would be harder to maintain)

ATM those look like this
![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)
2020-11-26 05:23:17 +00:00
divma
3b4afc27bf Status race condition (#1967)
## Issue Addressed

Sync stalls due to race conditions between dc notifications and status processing
2020-11-25 02:15:38 +00:00
divma
6f890c398e Sync Bug fixes (#1950)
## Issue Addressed

Two issues related to empty batches
- Chain target's was not being advanced when the batch was successful, empty and the chain didn't have an optimistic batch
- Not switching finalized chains. We now switch finalized chains requiring a minimum work first
2020-11-24 02:11:31 +00:00
Michael Sproul
5828ff1204 Implement slasher (#1567)
This is an implementation of a slasher that lives inside the BN and can be enabled via `lighthouse bn --slasher`.

Features included in this PR:

- [x] Detection of attester slashing conditions (double votes, surrounds existing, surrounded by existing)
- [x] Integration into Lighthouse's attestation verification flow
- [x] Detection of proposer slashing conditions
- [x] Extraction of attestations from blocks as they are verified
- [x] Compression of chunks
- [x] Configurable history length
- [x] Pruning of old attestations and blocks
- [x] More tests

Future work:

* Focus on a slice of history separate from the most recent N epochs (e.g. epochs `current - K` to `current - M`)
* Run out-of-process
* Ingest attestations from the chain without a resync

Design notes are here https://hackmd.io/@sproul/HJSEklmPL
2020-11-23 03:43:22 +00:00
Paul Hauner
65b1cf2af1 Add flag to import all attestations (#1941)
## Issue Addressed

NA

## Proposed Changes

Adds the `--import-all-attestations` flag which tells the `network::AttestationService` to import/aggregate all attestations after verification (instead of only ones for subnets that are relevant to local validators).

This is useful for testing/debugging and also for creating back-up nodes that should be all cached up and ready for any validator.

## Additional Info

NA
2020-11-22 23:58:25 +00:00
divma
d0cbf3111a move sync state to the chains KV (#1940)
## Issue Addressed
we have a log saying we add a peer to a chain, and an another one in case the chain is not syncing. To avoid needing to peer there two (and reduce log entries) simply log the chain's syncing state in the chain's KV
2020-11-22 23:58:23 +00:00
divma
d727e55abe Move some rpc processing to the beacon_processor (#1936)
## Issue Addressed
`BlocksByRange` requests were the main culprit of a series of timeouts to peer's requests in general because they produce build up in the router's processor. Those were moved to the blocking executor but a task is being spawned for each; also not ideal since the amount of resources we give to those is not controlled

## Proposed Changes
- Move `BlocksByRange` and `BlocksByRoots` to the `beacon_processor`. The processor crafts the responses and sends them.
- Move too the processing of `StatusMessage`s from other peers. This is a fast operation but it can also build up and won't scale if we keep it in the router (processing one at the time). These don't need to send an answer, so there is no harm in processing them "later" if that were to happen. Sending responses to status requests is still in the router, so we answer as soon as we see them.
- Some "extras" that are basically clean up:
  - Split the `Worker` logic in sync methods (chain processing and rpc blocks), gossip methods (the majority of methods) and rpc methods (the new ones)
  - Move the `status_message` function previously provided by the router's processor to a more central place since it is used by the router, sync, network_context and beacon_processor
 - Some spelling

## Additional Info
What's left to decide/test more thoroughly is the length of the queues and the priority rules. @paulhauner suggested at some point to put status above attestations, and @AgeManning had described an importance of "protecting gossipsub" so my solution is leaving status requests in the router and RPC methods below attestations. Slashings and Exits are at the end.
2020-11-19 23:33:44 +00:00
Pawan Dhananjay
e47739047d Add additional libp2p tests (#1867)
## Issue Addressed

N/A

## Proposed Changes

Adds tests for the eth2_libp2p crate.
2020-11-19 22:32:09 +00:00
blacktemplar
3408de8151 Avoid string initialization in network metrics and replace by &str where possible (#1898)
## Issue Addressed

NA

## Proposed Changes

Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895.

For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script.

## Additional Info

We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.
2020-11-18 23:31:37 +00:00
divma
398919b5d4 router: drop requests from peers that have dc'd (#1919)
## Issue Addressed

A peer might send a lot of requests that comply to the rate limit and the disconnect, this humongous pr makes sure we don't process them if the peer is not connected
2020-11-17 02:06:21 +00:00
Age Manning
49c4630045 Performance improvement for db reads (#1909)
This PR adds a number of improvements:
- Downgrade a warning log when we ignore blocks for gossipsub processing
- Revert a a correction to improve logging of peer score changes
- Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages
- Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.
2020-11-16 07:28:30 +00:00
divma
eb56140582 Update logs + do not downscore peers if WE time out (#1901)
## Issue Addressed

- RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one 
- The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include there info that is not relevant to the user
- The processor didn't have the service tag so add it
- Impl `KV` for status message
- Do not downscore peers if we are the ones that timed out

Other small improvements
2020-11-16 04:06:14 +00:00
divma
8a16548715 Misc Peer sync info adjustments (#1896)
## Issue Addressed
#1856 

## Proposed Changes
- For clarity, the router's processor now only decides if a peer is compatible and it disconnects it or sends it to sync accordingly. No logic here regarding how useful is the peer. 
- Update peer_sync_info's rules
- Add an `IrrelevantPeer` sync status to account for incompatible peers (maybe this should be "IncompatiblePeer" now that I think about it?) this state is update upon receiving an internal goodbye in the peer manager
- Misc code cleanups
- Reduce the need to create `StatusMessage`s (and thus, `Arc` accesses )
- Add missing calls to update the global sync state

The overall effect should be:
- More peers recognized as Behind, and less as Unknown
- Peers identified as incompatible
2020-11-13 09:00:10 +00:00
Age Manning
c00e6c2c6f Small network adjustments (#1884)
## Issue Addressed

- Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections
- Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected
- Improved logging

There is likely more to come - I'll leave this open as we investigate further testnets
2020-11-13 06:06:33 +00:00
Paul Hauner
8772c02fa0 Reduce temp allocations in network metrics (#1895)
## Issue Addressed

Using `heaptrack` I could see that ~75% of Lighthouse temporary allocations are caused by temporary string allocations here.

## Proposed Changes

Reduces temporary `String` allocations when updating metrics in the `network` crate. The solution isn't perfect since we rebuild our caches with each call, but it's a significant improvement.

## Additional Info

NA
2020-11-13 04:19:38 +00:00
blacktemplar
7404f1ce54 Gossipsub scoring (#1668)
## Issue Addressed

#1606 

## Proposed Changes

Uses dynamic gossipsub scoring parameters depending on the number of active validators as specified in https://gist.github.com/blacktemplar/5c1862cb3f0e32a1a7fb0b25e79e6e2c.

## Additional Info

Although the parameters got tested on Medalla, extensive testing using simulations on larger networks is still to be done and we expect that we need to change the parameters, although this might only affect constants within the dynamic parameter framework.
2020-11-12 01:48:28 +00:00
Age Manning
e2ae5010a6 Update libp2p (#1865)
Updates libp2p to the latest version. 

This adds tokio 0.3 support and brings back yamux support. 

This also updates some discv5 configuration parameters for leaner discovery queries
2020-11-06 04:14:14 +00:00
divma
6c0c050fbb Tweak head syncing (#1845)
## Issue Addressed

Fixes head syncing

## Proposed Changes

- Get back to statusing peers after removing chain segments and making the peer manager deal with status according to the Sync status, preventing an old known deadlock
- Also a bug where a chain would get removed if the optimistic batch succeeds being empty

## Additional Info

Tested on Medalla and looking good
2020-11-01 23:37:39 +00:00
divma
9f45ac2f5e More sync edge cases + prettify range (#1834)
## Issue Addressed
Sync edge case when we get an empty optimistic batch that passes validation and is inside the download buffer. Eventually the chain would reach the batch and treat it as an ugly state. 

## Proposed Changes
- Handle the edge case advancing the chain's target + code clarification
- Some largey changes for readability + ergonomics since rust has try ops
- Better handling of bad batch and chain states
2020-10-29 02:29:24 +00:00
Age Manning
7870b81ade Downgrade libp2p (#1817)
## Description

This downgrades the recent libp2p upgrade. 

There were issues with the RPC which prevented syncing of the chain and this upgrade needs to be further investigated.
2020-10-23 09:33:59 +00:00
Age Manning
ae96dab5d2 Increase UPnP logging and decrease batch sizes (#1812)
## Description

This increases the logging of the underlying UPnP tasks to inform the user of UPnP error/success. 

This also decreases the batch syncing size to two epochs per batch.
2020-10-23 03:01:33 +00:00
Age Manning
c49dd94e20 Update to latest libp2p (#1810)
## Description

Updates to the latest libp2p and includes gossipsub updates. 

Of particular note is the limitation of a single topic per gossipsub message.

Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-10-23 03:01:31 +00:00
divma
668513b67e Sync state adjustments (#1804)
check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads
2020-10-22 00:26:06 +00:00
divma
2acf75785c More sync updates (#1791)
## Issue Addressed
#1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager
2020-10-20 22:34:18 +00:00
Michael Sproul
703c33bdc7 Fix head tracker concurrency bugs (#1771)
## Issue Addressed

Closes #1557

## Proposed Changes

Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).

In the process of writing and testing this I also had to make a few other changes:

* Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on

And chose to make some clean-ups:

* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`

## Testing

To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:

https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557

That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:

```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```

It should pass, and the log output should show:

```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```

## Additional Info

This is a backwards-compatible change with no impact on consensus.
2020-10-19 05:58:39 +00:00
Pawan Dhananjay
97be2ca295 Simulator and attestation service fixes (#1747)
## Issue Addressed

#1729 #1730 

Which issue # does this PR address?

## Proposed Changes

1. Fixes a bug in the simulator where nodes can't find each other due to 0 udp ports in their enr.
2. Fixes bugs in attestation service where we are unsubscribing from a subnet prematurely.

More testing is needed for attestation service fixes.
2020-10-15 07:11:31 +00:00
blacktemplar
8248afa793 Updates the message-id according to the Networking Spec (#1752)
## Proposed Changes

Implement the new message id function (see https://github.com/ethereum/eth2.0-specs/pull/2089) using an additional fast message id function for better performance + caching decompressed data.
2020-10-14 06:51:58 +00:00
Paul Hauner
0e4cc50262
Remove unused deps 2020-10-09 15:58:20 +11:00
Paul Hauner
db3e0578e9
Merge branch 'v0.3.0-staging' into v3-master 2020-10-09 15:27:08 +11:00
Paul Hauner
da44821e39
Clean up obsolete TODOs (#1734)
Squashed commit of the following:

commit f99373cbaec9adb2bdbae3f7e903284327962083
Author: Age Manning <Age@AgeManning.com>
Date:   Mon Oct 5 18:44:09 2020 +1100

    Clean up obsolute TODOs
2020-10-05 21:08:14 +11:00
Paul Hauner
ee7c8a0b7e Update external deps (#1711)
## Issue Addressed

- Resolves #1706 

## Proposed Changes

Updates dependencies across the workspace. Any crate that was not able to be brought to the latest version is listed in #1712.

## Additional Info

NA
2020-10-05 08:22:19 +00:00
Age Manning
240181e840
Upgrade discovery and restructure task execution (#1693)
* Initial rebase

* Remove old code

* Correct release tests

* Rebase commit

* Remove eth2-testnet dep on eth2libp2p

* Remove crates lost in rebase

* Remove unused dep
2020-10-05 18:45:54 +11:00
Age Manning
bcb629564a
Improve error handling in network processing (#1654)
* Improve error handling in network processing

* Cargo fmt

* Cargo fmt

* Improve error handling for prior genesis

* Remove dep
2020-10-05 17:34:56 +11:00
divma
113758a4f5
From panic to crit (#1726)
## Issue Addressed
Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic.

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-10-05 17:34:49 +11:00
divma
6997776494
Sync fixes (#1716)
## Issue Addressed

chain state inconsistencies

## Proposed Changes
- a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those  requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer.
- if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch
2020-10-05 17:33:36 +11:00
Paul Hauner
e7eb99cb5e
Use Drop impl to send worker idle message (#1718)
## Issue Addressed

NA

## Proposed Changes

Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic.

## Additional Info

NA
2020-10-05 17:33:25 +11:00
Age Manning
fe07a3c21c
Improve error handling in network processing (#1654)
* Improve error handling in network processing

* Cargo fmt

* Cargo fmt

* Improve error handling for prior genesis

* Remove dep
2020-10-05 17:30:43 +11:00
divma
b1c121b880 From panic to crit (#1726)
## Issue Addressed
Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic.

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-10-05 04:02:09 +00:00
divma
86a18e72c4 Sync fixes (#1716)
## Issue Addressed

chain state inconsistencies

## Proposed Changes
- a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those  requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer.
- if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch
2020-10-04 23:49:14 +00:00
Paul Hauner
d72c026d32 Use Drop impl to send worker idle message (#1718)
## Issue Addressed

NA

## Proposed Changes

Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic.

## Additional Info

NA
2020-10-04 21:59:20 +00:00
Sean
6af3bc9ce2
Add UPnP support for Lighthouse (#1587)
This commit was modified by Paul H whilst rebasing master onto
v0.3.0-staging

Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers.

Using IGD library: https://docs.rs/igd/0.10.0/igd/

Adding the  the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-10-03 10:07:47 +10:00
realbigsean
255cc25623
Weak subjectivity start from genesis (#1675)
This commit was edited by Paul H when rebasing from master to
v0.3.0-staging.

Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639

- Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch`
- Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-10-03 10:00:28 +10:00
Sean
94b17ce02b Add UPnP support for Lighthouse (#1587)
Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers.

## Issue Addressed

#927 

## Proposed Changes

Using IGD library: https://docs.rs/igd/0.10.0/igd/

Adding the  the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on

## Additional Info



Co-authored-by: Age Manning <Age@AgeManning.com>
2020-10-02 08:47:00 +00:00
realbigsean
9d2d6239cd Weak subjectivity start from genesis (#1675)
## Issue Addressed
Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639

## Proposed Changes
- Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch`
- Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting.

## Additional Info


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-10-01 01:41:58 +00:00
Michael Sproul
22aedda1be
Add database schema versioning (#1688)
## Issue Addressed

Closes #673

## Proposed Changes

Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work.

The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673
2020-10-01 11:12:36 +10:00
Paul Hauner
cdec3cec18
Implement standard eth2.0 API (#1569)
- Resolves #1550
- Resolves #824
- Resolves #825
- Resolves #1131
- Resolves #1411
- Resolves #1256
- Resolve #1177

- Includes the `ShufflingId` struct initially defined in #1492. That PR is now closed and the changes are included here, with significant bug fixes.
- Implement the https://github.com/ethereum/eth2.0-APIs in a new `http_api` crate using `warp`. This replaces the `rest_api` crate.
- Add a new `common/eth2` crate which provides a wrapper around `reqwest`, providing the HTTP client that is used by the validator client and for testing. This replaces the `common/remote_beacon_node` crate.
- Create a `http_metrics` crate which is a dedicated server for Prometheus metrics (they are no longer served on the same port as the REST API). We now have flags for `--metrics`, `--metrics-address`, etc.
- Allow the `subnet_id` to be an optional parameter for `VerifiedUnaggregatedAttestation::verify`. This means it does not need to be provided unnecessarily by the validator client.
- Move `fn map_attestation_committee` in `mod beacon_chain::attestation_verification` to a new `fn with_committee_cache` on the `BeaconChain` so the same cache can be used for obtaining validator duties.
- Add some other helpers to `BeaconChain` to assist with common API duties (e.g., `block_root_at_slot`, `head_beacon_block_root`).
- Change the `NaiveAggregationPool` so it can index attestations by `hash_tree_root(attestation.data)`. This is a requirement of the API.
- Add functions to `BeaconChainHarness` to allow it to create slashings and exits.
- Allow for `eth1::Eth1NetworkId` to go to/from a `String`.
- Add functions to the `OperationPool` to allow getting all objects in the pool.
- Add function to `BeaconState` to check if a committee cache is initialized.
- Fix bug where `seconds_per_eth1_block` was not transferring over from `YamlConfig` to `ChainSpec`.
- Add the `deposit_contract_address` to `YamlConfig` and `ChainSpec`. We needed to be able to return it in an API response.
- Change some uses of serde `serialize_with` and `deserialize_with` to a single use of `with` (code quality).
- Impl `Display` and `FromStr` for several BLS fields.
- Check for clock discrepancy when VC polls BN for sync state (with +/- 1 slot tolerance). This is not intended to be comprehensive, it was just easy to do.

- See #1434 for a per-endpoint overview.
- Seeking clarity here: https://github.com/ethereum/eth2.0-APIs/issues/75

- [x] Add docs for prom port to close #1256
- [x] Follow up on this #1177
- [x] ~~Follow up with #1424~~ Will fix in future PR.
- [x] Follow up with #1411
- [x] ~~Follow up with  #1260~~ Will fix in future PR.
- [x] Add quotes to all integers.
- [x] Remove `rest_types`
- [x] Address missing beacon block error. (#1629)
- [x] ~~Add tests for lighthouse/peers endpoints~~ Wontfix
- [x] ~~Follow up with validator status proposal~~ Tracked in #1434
- [x] Unify graffiti structs
- [x] ~~Start server when waiting for genesis?~~ Will fix in future PR.
- [x] TODO in http_api tests
- [x] Move lighthouse endpoints off /eth/v1
- [x] Update docs to link to standard

- ~~Blocked on #1586~~

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-10-01 11:12:36 +10:00
blacktemplar
ae28773965
Networking bug fixes (#1684)
* call correct unsubscribe method for subnets

* correctly delegate closed connections in behaviour

* correct unsubscribe method name
2020-09-29 18:28:15 +10:00
Paul Hauner
1ef4f0ea12 Add gossip conditions from spec v0.12.3 (#1667)
## Issue Addressed

NA

## Proposed Changes

There are four new conditions introduced in v0.12.3:

 1. _[REJECT]_ The attestation's epoch matches its target -- i.e. `attestation.data.target.epoch ==
  compute_epoch_at_slot(attestation.data.slot)`
1. _[REJECT]_ The attestation's target block is an ancestor of the block named in the LMD vote -- i.e.
  `get_ancestor(store, attestation.data.beacon_block_root, compute_start_slot_at_epoch(attestation.data.target.epoch)) == attestation.data.target.root`
1. _[REJECT]_ The committee index is within the expected range -- i.e. `data.index < get_committee_count_per_slot(state, data.target.epoch)`.
1. _[REJECT]_ The number of aggregation bits matches the committee size -- i.e.
  `len(attestation.aggregation_bits) == len(get_beacon_committee(state, data.slot, data.index))`.

This PR implements new logic to suit (1) and (2). Tests are added for (3) and (4), although they were already implicitly enforced.

## Additional Info

- There's a bit of edge-case with target root verification that I raised here: https://github.com/ethereum/eth2.0-specs/pull/2001#issuecomment-699246659
- I've had to add an `--ignore` to `cargo audit` to get CI to pass. See https://github.com/sigp/lighthouse/issues/1669
2020-09-27 20:59:40 +00:00
divma
b8013b7b2c Super Silky Smooth Syncs, like a Sir (#1628)
## Issue Addressed
In principle.. closes #1551 but in general are improvements for performance, maintainability and readability. The logic for the optimistic sync in actually simple

## Proposed Changes
There are miscellaneous things here:
- Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic
- Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct
- Move most batch-related logic to the batch
- Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches)
- Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases
- Store batches in a sorted map (`BTreeMap`) access is not O(1) but since the number of _active_ batches is bounded this should be fast, and saves performing hashing ops. Batches are indexed by the epoch they start. Sorted, to easily handle chain advancements (range logic)
- Produce the chain Id from the identifying fields: target root and target slot. This, to guarantee there can't be duplicated chains and be able to consistently search chains by either Id or checkpoint
- Fix chain_id not being present in all chain loggers
- Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and waiting for a result that won't arrive, effectively stalling sync.
- When a batch imports blocks or the chain starts syncing with a local finalized epoch greater that the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it represents more accurately the progress of the chain.
- Batch status peers from the same chain to reduce Arc access.
- Handle a couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know.
- Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed)
- When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers.
- Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code)
- In chain_collection, store chains by their id in a map
- Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses
- Other stuff:
  - impl `slog::KV` for batches
  - impl `slog::KV` for syncing chains
  - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug`

### Optimistic syncing:
Try first the batch that contains the current head, if the batch imports any block, advance the chain. If not, if this optimistic batch is inside the current processing window leave it there for future use, if not drop it. The tolerance for this block is the same for downloading, but just once for processing



Co-authored-by: Age Manning <Age@AgeManning.com>
2020-09-23 06:29:55 +00:00
Age Manning
80e52a0263 Subscribe to core topics after sync (#1613)
## Issue Addressed

N/A

## Proposed Changes

Prevent subscribing to core gossipsub topics until after we have achieved a full sync. This prevents us censoring gossipsub channels, getting penalised in gossipsub 1.1 scoring and saves us computation time in attempting to validate gossipsub messages which we will be unable to do with a non-sync'd chain.
2020-09-23 03:26:33 +00:00
Pawan Dhananjay
a97ec318c4 Subscribe to subnets an epoch in advance (#1600)
## Issue Addressed

N/A

## Proposed Changes

Subscibe to subnet an epoch in advance of the attestation slot instead of 4 slots in advance.
2020-09-22 07:29:34 +00:00
Pawan Dhananjay
14ff38539c Add trusted peers (#1640)
## Issue Addressed

Closes #1581 

## Proposed Changes

Adds a new cli option for trusted peers who always have the maximum possible score.
2020-09-22 01:12:36 +00:00
Age Manning
1db8daae0c Shift metadata to the global network variables (#1631)
## Issue Addressed

N/A

## Proposed Changes

Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse.

## Additional Info

N/A
2020-09-21 02:00:38 +00:00
Age Manning
c9596fcf0e Temporary Sync Work-Around (#1615)
## Issue Addressed

#1590 

## Proposed Changes

This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.
2020-09-13 23:58:49 +00:00
Age Manning
c6abc56113 Prevent large step-size parameters (#1583)
## Issue Addressed

Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadraticaly weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups. 

## Proposed Changes

Weight the step parameter in the RPC filter and prevent any overflows that effect us in the step parameter.

## Additional Info
2020-09-11 02:33:36 +00:00
blacktemplar
7f1b936905 ignore too early / too late attestations instead of penalizing them (#1608)
## Issue Addressed

NA

## Proposed Changes

This ignores attestations that are too early or too late as it is specified in the spec (see https://github.com/ethereum/eth2.0-specs/blob/v0.12.1/specs/phase0/p2p-interface.md#global-topics first subpoint of `beacon_aggregate_and_proof`)
2020-09-11 01:43:15 +00:00
Age Manning
b19cf02d2d Penalise bad peer behaviour (#1602)
## Issue Addressed

#1386 

## Proposed Changes

Penalises peers in our scoring system that produce invalid attestations or blocks.
2020-09-10 03:51:06 +00:00
Age Manning
fb9d828e5e Extended Gossipsub metrics (#1577)
## Issue Addressed

N/A

## Proposed Changes

Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topics, subscriptions and peer scores. 

## Additional Info
2020-09-01 06:59:14 +00:00
blacktemplar
c18d37c202 Use Gossipsub 1.1 (#1516)
## Issue Addressed

#1172

## Proposed Changes

* updates the libp2p dependency
* small adaptions based on changes in libp2p
* report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-08-30 13:06:50 +00:00
Adam Szkoda
d9f4819fe0 Alternative (to BeaconChainHarness) BeaconChain testing API (#1380)
The PR:

* Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots):

![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png)

* New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots.
* Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner.
* Configures the test logger so that output is printed on the console in case a test fails.  The logger also plays well with `--nocapture`, contrary to the existing testing framework
* Rewrites existing fork pruning tests to use the new API
* Adds a tests that triggers finalization at a non epoch boundary slot
* Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing
* Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-08-26 09:24:55 +00:00
Pawan Dhananjay
bbed42f30c Refactor attestation service (#1415)
## Issue Addressed

N/A

## Proposed Changes

Refactor attestation service to send out requests to find peers for subnets as soon as we get attestation duties. 
Earlier, we had much more involved logic to send the discovery requests to the discovery service only 6 slots before the attestation slot. Now that discovery is much smarter with grouped queries, the complexity in attestation service can be reduced considerably.



Co-authored-by: Age Manning <Age@AgeManning.com>
2020-08-19 08:46:25 +00:00
divma
fdc6e2aa8e Shutdown like a Sir (#1545)
## Issue Addressed
#1494 

## Proposed Changes
- Give the TaskExecutor the sender side of a channel that a task can clone to request shutting down
- The receiver side of this channel is in environment and now we block until ctrl+c or an internal shutdown signal is received
- The swarm now informs when it has reached 0 listeners
- The network receives this message and requests the shutdown
2020-08-19 05:51:14 +00:00
Paul Hauner
8e7dd7b2b1 Add remaining network ops to queuing system (#1546)
## Issue Addressed

NA

## Proposed Changes

- Refactors the `BeaconProcessor` to remove some excessive nesting and file bloat
  - Sorry about the noise from this, it's all contained in 4d3f8c5 though.
- Adds exits, proposer slashings, attester slashings to the `BeaconProcessor` so we don't get overwhelmed with large amounts of slashings (which happened a few hours ago).

## Additional Info

NA
2020-08-19 05:09:53 +00:00
Age Manning
2d0b214b57 Clean up logs (#1541)
## Description

This PR improves some logging for the end-user. 

It downgrades some warning logs and removes the slots per second sync speed if we are syncing and the speed is 0. This is likely because we are syncing from a finalised checkpoint and the head doesn't change.
2020-08-18 08:11:39 +00:00
Age Manning
8311074d68 Purge out-dated head chains on chain completion (#1538)
## Description

There can be many head chains queued up to complete. Currently we try and process all of these to completion before we consider the node synced. 

In a chaotic network, there can be many of these and processing them to completion can be very expensive and slow. This PR removes any non-syncing head chains from the queue, and re-status's the peers. If, after we have synced to head on one chain, there is still a valid head chain to download, it will be re-established once the status has been returned. 

This should assist with getting nodes to sync on medalla faster.
2020-08-18 05:22:34 +00:00
Age Manning
3bb30754d9 Keep track of failed head chains and prevent re-lookups (#1534)
## Overview

There are forked chains which get referenced by blocks and attestations on a network. Typically if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, many peers are on many chains, the chains can be very deep and become time consuming performing lookups. 

This PR adds a cache to known failed chain lookups. This prevents us from starting a parent-lookup (or stopping one half way through) if we have attempted the chain lookup in the past.
2020-08-18 03:54:09 +00:00
Age Manning
cc44a64d15 Limit parallelism of head chain sync (#1527)
## Description

Currently lighthouse load-balances across peers a single finalized chain. The chain is selected via the most peers. Once synced to the latest finalized epoch Lighthouse creates chains amongst its peers and syncs them all in parallel amongst each peer (grouped by their current head block). 

This is typically fast and relatively efficient under normal operations. However if the chain has not finalized in a long time, the head chains can grow quite long. Peer's head chains will update every slot as new blocks are added to the head. Syncing all head chains in parallel is a bottleneck and highly inefficient in block duplication leads to RPC timeouts when attempting to handle all new heads chains at once. 

This PR limits the parallelism of head syncing chains to 2. We now sync at most two head chains at a time. This allows for the possiblity of sync progressing alongside a peer being slow and holding up one chain via RPC timeouts.
2020-08-18 02:49:24 +00:00
divma
46dbf027af Do not reset batch ids & redownload out of range batches (#1528)
The changes are somewhat simple but should solve two issues:
- When quickly changing between chains once and a second time back again, batchIds would collide and cause havoc. 
- If we got an out of range response from a peer, sync would remain in syncing but without advancing

Changes:
- remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH
- for the same reason, move the "to_be_downloaded_id" to be an epoch
- remove a sneaky line that dropped an out of range batch without downloading it
- bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed
2020-08-18 01:29:51 +00:00
Michael Sproul
719a69aee0 Ignore blocks that skip a large distance from their parent (#1530)
## Proposed Changes

To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-08-17 10:54:58 +00:00
Paul Hauner
f85485884f Process gossip blocks on the GossipProcessor (#1523)
## Issue Addressed

NA

## Proposed Changes

Moves beacon block processing over to the newly-added `GossipProcessor`. This moves the task off the core executor onto the blocking one.

## Additional Info

- With this PR, gossip blocks are being ignored during sync.
2020-08-17 09:20:27 +00:00
Age Manning
afdc4fea1d Correct logic for peer sync identification (#1525)
Fix a small sync bug which can mis-classify newly connected peers.
2020-08-17 03:00:10 +00:00
divma
113b40f321 Add multiaddr support in bootnodes (#1481)
## Issue Addressed
#1384 

Only catch, as currently implemented, when dialing the multiaddr nodes, there is no way to ask the peer manager if they are already connected or dialing
2020-08-17 02:13:26 +00:00
Paul Hauner
b0a3731fff Introduce a queue for attestations from the network (#1511)
## Issue Addressed

N/A

## Proposed Changes

Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`.

Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load.

## Additional Info

TBC
2020-08-14 04:38:45 +00:00
divma
138c0cf7f0 Remove block clone (#1448)
## Issue Addressed

#1028 

A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed
2020-08-06 04:29:17 +00:00
Age Manning
09a615b2c0 Lighthouse crate v0.2.0 bump (#1450)
## Description

This PR marks Lighthouse v0.2.0. 

This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.
2020-08-06 03:43:05 +00:00
Paul Hauner
5629126f45 Add reason to invalid attestation log (#1460)
## Issue Addressed

NA

## Proposed Changes

Adds an extra field to a debug log so we can see *why* an attestation was invalid.

## Additional Info

NA
2020-08-05 01:49:52 +00:00
Age Manning
31707ccf45 Shift author to sigma prime on some crates (#1440)
Shifts the author to sigma prime on some crates
2020-08-04 02:31:41 +00:00
Age Manning
f634f073a8 Correct issue with network message passing (#1439)
## Issue Addressed

Sync was breaking occasionally. The root cause appears to be identify crashing as events we being sent to the protocol after nodes were banned. Have not been able to reproduce sync issues since this update. 

## Proposed Changes

Only send messages to sub-behaviour protocols if the peer manager thinks the peer is connected. All other messages are dropped.
2020-08-03 09:35:53 +00:00
Age Manning
a37e75f44b
Downgrade sync and rpc warn logs (#1417)
* Downgrade sycn and rpc warn logs

* Correct warning
2020-07-30 13:52:44 +10:00
Age Manning
395d99ce03 Sync update (#1412)
## Issue Addressed

Recurring sync loop and invalid batch downloading

## Proposed Changes

Shifts the batches to include the first slot of each epoch. This ensures the finalized is always downloaded once a chain has completed syncing. 

Also add in logic to prevent re-dialing disconnected peers. Non-performant peers get disconnected during sync, this prevents re-connection to these during sync. 

## Additional Info

N/A
2020-07-29 05:25:10 +00:00
Age Manning
ba0f3daf9d Gossipsub update (#1400)
## Issue Addressed

N/A

## Proposed Changes

This provides a number of corrections and improvements to gossipsub. Specifically
- Enables options for greater privacy around the message author
- Provides greater flexibility on message validation
- Prevents unvalidated messages from being gossiped
- Shifts the duplicate cache to a time-based cache inside gossipsub
- Updates the message-id to handle bytes
- Bug fixes related to mesh maintenance and topic subscription. This should improve our attestation inclusion rate.
2020-07-29 03:40:22 +00:00
realbigsean
09b40b7a5e Discover query grouping (#1364)
## Issue Addressed

#1281

## Proposed Changes

Groups queries for specific subnets into groups of up to 3.

## Additional Info
2020-07-29 02:43:50 +00:00
divma
9ae9df806c Fix clippy lints rpc (#1401)
## Issue Addressed
#1388 partially (eth2_libp2p & network)

## Proposed Changes 
TLDR at the end
- *Complex types* are 3 on the handlers/Behaviours but the types are `Poll<ComplexType>` where `ComplexType` comes from the traits of libp2p. Those, I don't thing are worth an alias. A couple more were from using tokio combinators and were removed writing things the async way and using [`BoxFuture`](https://docs.rs/futures/0.3.5/futures/future/type.BoxFuture.html)
- The *cognitive complexity*.. I tried to address those before (they come from the poll functions too) and tbh they are cognitively simpler to understand the way they are now. Moving separate parts to functions doesn't add much since that code is not repeated and they all do early returns. If moved those returns would now need to be wrapped in an Option, probably, and checked to be returned again. I would leave them like that but that's just preference.
- *Too many arguments*: They are not easily put together in a wrapping struct since the parameters don't relate semantically (Ex: fn new with a log, a reference to the chain, a peer, etc) but some may differ.
- *Needless returns* were indeed needless

## Additional Info
TLDR: removed needless return, used BoxFuture and async, left the rest untouched since those lgtm
2020-07-28 01:39:42 +00:00
blacktemplar
23a8f31f83 Fix clippy warnings (#1385)
## Issue Addressed

NA

## Proposed Changes

Fixes most clippy warnings and ignores the rest of them, see issue #1388.
2020-07-23 14:18:00 +00:00
Pawan Dhananjay
b885d79ac3
Fix attestation propagation (#1360)
* Add `should_process` for conditional processing of Attestations

* Remove ATTESTATIONS_IGNORED metric
2020-07-20 12:55:32 +10:00
Age Manning
f500b24242
Update smallvec (#1339) 2020-07-07 16:57:27 +10:00
Age Manning
5bc8fea2e0
Activate peer scoring (#1284)
* Initial score structure

* Peer manager update

* Updates to dialing

* Correct tests

* Correct typos and remove unused function

* Integrate scoring into the network crate

* Clean warnings

* Formatting

* Shift core functionality into the behaviour

* Temp commit

* Shift disconnections into the behaviour

* Temp commit

* Update libp2p and gossipsub

* Remove gossipsub lru cache

* Correct merge conflicts

* Modify handler and correct tests

* Update enr network globals on socket update

* Apply clippy lints

* Add new prysm fingerprint

* More clippy fixes
2020-07-07 10:13:16 +10:00
Paul Hauner
e429c3eefe
Remove old block processing shim (#1327)
* Remove old block processing shim

* Run rustfmt

* Fix log formatting

* Swap peer ids over to display
2020-07-06 16:28:00 +10:00
Paul Hauner
25cd91ce26
Update deps (#1322)
* Run cargo update

* Upgrade prometheus

* Update hex

* Upgrade parking-lot

* Upgrade num-bigint

* Upgrade sha2

* Update dockerfile Rust version

* Run cargo update
2020-07-06 11:55:56 +10:00
Age Manning
9fc290a344
Add waker to attestation service (#1305)
* Add waker to attestation service

* Formatting
2020-06-28 22:29:27 +10:00
Paul Hauner
6e7d5c6a7c
Add metrics for validator subscriptions (#1302) 2020-06-28 10:47:03 +10:00
Michael Sproul
7688b5f1dd
Merge remote-tracking branch 'origin/master' into spec-v0.12 2020-06-26 12:57:56 +10:00
pscott
02174e21d8
Fix clippy's performance lints (#1286)
* Fix clippy perf lints

* Cargo fmt

* Add  and  to lint rule in Makefile

* Fix some leftover clippy lints
2020-06-26 00:04:08 +10:00
Paul Hauner
decea48c78
Merge branch 'master' into spec-v0.12 2020-06-21 10:33:02 +10:00
Age Manning
710409c2ba
Userland clean up (#1277)
* Improve logging, remove unused CLI and move discovery

* Correct tests

* Handle flag correctly
2020-06-20 09:34:28 +10:00
Age Manning
e379ad0f4e
Silky smooth discovery (#1274)
* Initial structural re-write

* Improving discovery update and correcting attestation service logic

* Rework discovery.mod

* Handling lifetimes of query futures

* Discovery update first draft

* format fixes

* Stabalise discv5 update

* Formatting corrections

* Limit FindPeers queries and bug correction

* Update to stable release discv5

* Remove unnecessary pin

* formatting
2020-06-19 14:13:23 +10:00
Michael Sproul
9450a0f30d
Merge remote-tracking branch 'origin/master' into spec-v0.12 2020-06-18 21:59:59 +10:00
Michael Sproul
bcb6afa0aa
Process exits and slashings off the network (#1253)
* Process exits and slashings off the network

* Fix rest_api tests

* Add op verification tests

* Add tests for pruning of slashings in the op pool

* Address Paul's review comments
2020-06-18 21:06:34 +10:00
Pawan Dhananjay
3199b1a6f2
Use all attestation subnets (#1257)
* Update `milagro_bls` to new release (#1183)

* Update milagro_bls to new release

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Tidy up fake cryptos

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* move SecretHash to bls and put plaintext back

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Update v0.12.0 to v0.12.1

* Add compute_subnet_for_attestation

* Replace CommitteeIndex topic with Attestation

* Fix warnings

* Fix attestation service tests

* fmt

* Appease clippy

* return error from validator_subscriptions

* move state out of loop

* Fix early break on error

* Get state from slot clock

* Fix beacon state in attestation tests

* Add failing test for lookahead > 1

* Minor change

* Address some review comments

* Add subnet verification to beacon chain

* Move subnet verification to processor

* Pass committee_count_at_slot to ValidatorDuty and ValidatorSubscription

* Pass subnet id for publishing attestations

* Fix attestation service tests

* Fix more tests

* Fix fork choice test

* Remove unused code

* Remove more unused and expensive code

Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-06-18 19:11:03 +10:00
divma
065251b701
Add DC/Shutdown capabilities to the behaviour handler (#1233)
* Remove ban event from the PM

* Fix dispatching of responses to peer's requests

* Disconnection logic
2020-06-18 11:53:08 +10:00
Michael Sproul
e6f97bf466
Merge remote-tracking branch 'origin/master' into spec-v0.12 2020-06-17 12:34:11 +10:00
Paul Hauner
764cb2d32a
v0.12 fork choice update (#1229)
* Incomplete scraps

* Add progress on new fork choice impl

* Further progress

* First complete compiling version

* Remove chain reference

* Add new lmd_ghost crate

* Start integrating into beacon chain

* Update `milagro_bls` to new release (#1183)

* Update milagro_bls to new release

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Tidy up fake cryptos

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* move SecretHash to bls and put plaintext back

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Update state processing for v0.12

* Fix EF test runners for v0.12

* Fix some tests

* Fix broken attestation verification test

* More test fixes

* Rough beacon chain impl working

* Remove fork_choice_2

* Remove checkpoint manager

* Half finished ssz impl

* Add missed file

* Add persistence

* Tidy, fix some compile errors

* Remove RwLock from ProtoArrayForkChoice

* Fix store-based compile errors

* Add comments, tidy

* Move function out of ForkChoice struct

* Start testing

* More testing

* Fix compile error

* Tidy beacon_chain::fork_choice

* Queue attestations from the current slot

* Allow fork choice to handle prior-to-genesis start

* Improve error granularity

* Test attestation dequeuing

* Process attestations during block

* Store target root in fork choice

* Move fork choice verification into new crate

* Update tests

* Consensus updates for v0.12 (#1228)

* Update state processing for v0.12

* Fix EF test runners for v0.12

* Fix some tests

* Fix broken attestation verification test

* More test fixes

* Fix typo found in review

* Add `Block` struct to ProtoArray

* Start fixing get_ancestor

* Add rough progress on testing

* Get fork choice tests working

* Progress with testing

* Fix partialeq impl

* Move slot clock from fc_store

* Improve testing

* Add testing for best justified

* Add clone back to SystemTimeSlotClock

* Add balances test

* Start adding balances cache again

* Wire-in balances cache

* Improve tests

* Remove commented-out tests

* Remove beacon_chain::ForkChoice

* Rename crates

* Update wider codebase to new fork_choice layout

* Move advance_slot in test harness

* Tidy ForkChoice::update_time

* Fix verification tests

* Fix compile error with iter::once

* Fix fork choice tests

* Ensure block attestations are processed

* Fix failing beacon_chain tests

* Add first invalid block check

* Add finalized block check

* Progress with testing, new store builder

* Add fixes to get_ancestor

* Fix old genesis justification test

* Fix remaining fork choice tests

* Change root iteration method

* Move on_verified_block

* Remove unused method

* Start adding attestation verification tests

* Add invalid ffg target test

* Add target epoch test

* Add queued attestation test

* Remove old fork choice verification tests

* Tidy, add test

* Move fork choice lock drop

* Rename BeaconForkChoiceStore

* Add comments, tidy BeaconForkChoiceStore

* Update metrics, rename fork_choice_store.rs

* Remove genesis_block_root from ForkChoice

* Tidy

* Update fork_choice comments

* Tidy, add comments

* Tidy, simplify ForkChoice, fix compile issue

* Tidy, removed dead file

* Increase http request timeout

* Fix failing rest_api test

* Set HTTP timeout back to 5s

* Apply fix to get_ancestor

* Address Michael's comments

* Fix typo

* Revert "Fix broken attestation verification test"

This reverts commit 722cdc903b12611de27916a57eeecfa3224f2279.

Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-06-17 11:10:22 +10:00
Adam Szkoda
9db0c28051
Make key value storage abstractions more accurate (#1267)
* Layer do_atomically() abstractions properly

* Reduce allocs and DRY get_key_for_col()

* Parameterize HotColdDB with hot and cold item stores

* -impl Store for MemoryStore

* Replace Store uses with HotColdDB

* Ditch Store trait

* cargo fmt

* Style fix

* Readd missing dep that broke the build
2020-06-16 11:34:04 +10:00
Michael Sproul
7818447fd2
Check for unused deps in CI (#1262)
* Check for unused deps in CI

* Bump slashing protection parking_lot version
2020-06-14 10:59:50 +10:00
Pawan Dhananjay
bb8b88edcf
Use SSZ types in rpc (#1244)
* Update `milagro_bls` to new release (#1183)

* Update milagro_bls to new release

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Tidy up fake cryptos

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* move SecretHash to bls and put plaintext back

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Update v0.12.0 to v0.12.1

* Use ssz types for Request and error types

* Fix errors

* Constrain BlocksByRangeRequest count to MAX_REQUEST_BLOCKS

* Fix issues after rebasing

* Address review comments

Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Age Manning <Age@AgeManning.com>
2020-06-12 10:04:50 +10:00
Age Manning
2dfe77a8f9
Handle syncing edge case (#1258) 2020-06-11 12:06:42 +10:00
Adam Szkoda
7f036a6e95
Add error handling to iterators (#1243)
* Add error handling to iterators

* Review feedback

* Leverage itertools::process_results() in few places
2020-06-10 09:55:44 +10:00
realbigsean
036096ef61
add retry logic to peer discovery and an expiration time for peers (#1203)
* add retry logic to peer discovery and an expiration time for peers

* Restructure discovery

* Add mac build to CI

* Always return an error for Health when not linux

* Change macos workflow

* Rename macos tests

* Update DiscoverPeers messages to pass Instants. Implement PartialEq for AttServiceMessage

* update discover peer queueing to always check existing messages and extend min_ttl as necessary

* update method name and comment

* Correct merge issues

* Add subnet id check to partialeq, fix discover peer message dups

* fix discover peer message dups

* fix discover peer message dups for real this time

Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-06-05 14:55:03 +10:00
divma
0e37a16927
Super tiny RPC refactor (#1187)
* wip: mwake the request id optional

* make the request_id optional

* cleanup

* address clippy lints inside rpc

* WIP: Separate sent RPC events from received ones

* WIP: Separate sent RPC events from received ones

* cleanup

* Separate request ids from substream ids

* Make RPC's message handling independent of RequestIds

* Change behaviour RPC events to be more outside-crate friendly

* Propage changes across the network + router + processor

* Propage changes across the network + router + processor

* fmt

* "tiny" refactor

* more tiny refactors

* fmt eth2-libp2p

* wip: propagating changes

* wip: propagating changes

* cleaning up

* more cleanup

* fmt

* tests HOT fix

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-06-05 13:07:59 +10:00
Pawan Dhananjay
042e80570c
Improve tokio task execution (#1181)
* Add logging on shutdown

* Replace tokio::spawn with handle.spawn

* Upgrade tokio

* Add a task executor

* Beacon chain tasks use task executor

* Validator client tasks use task executor

* Rename runtime_handle to executor

* Add duration histograms; minor fixes

* Cleanup

* Fix logs

* Fix tests

* Remove random file

* Get enr dependency instead of libp2p

* Address some review comments

* Libp2p takes a TaskExecutor

* Ugly fix libp2p tests

* Move TaskExecutor to own file

* Upgrade Dockerfile rust version

* Minor fixes

* Revert "Ugly fix libp2p tests"

This reverts commit 58d4bb690f52de28d893943b7504d2d0c6621429.

* Pretty fix libp2p tests

* Add spawn_without_exit; change Counter to Gauge

* Tidy

* Move log from RuntimeContext to TaskExecutor

* Fix errors

* Replace histogram with int_gauge for async tasks

* Fix todo

* Fix memory leak in test by exiting all spawned tasks at the end
2020-06-04 21:48:05 +10:00
Adam Szkoda
91cb14ac41
Clean up database abstractions (#1200)
* Remove redundant method

* Pull out a method out of a struct

* More precise db access abstractions

* Move fake trait method out of it

* cargo fmt

* Fix compilation error after refactoring

* Move another fake method out the Store trait

* Get rid of superfluous method

* Fix refactoring bug

* Rename: SimpleStoreItem -> StoreItem

* Get rid of the confusing DiskStore type alias

* Get rid of SimpleDiskStore type alias

* Correction: A method took both self and a ref to Self
2020-06-01 08:13:49 +10:00
realbigsean
ea56dcb179
fix attestation service tests (#1167) 2020-05-22 12:09:22 +10:00
Age Manning
dd51a72f1f
Client identification (#1158)
* Add logs and client identification

* Add client to RPC Error log

* Remove attestation service tests
2020-05-18 21:35:14 +10:00
Paul Hauner
4331834003
Directory Restructure (#1163)
* Move tests -> testing

* Directory restructure

* Update Cargo.toml during restructure

* Update Makefile during restructure

* Fix arbitrary path
2020-05-18 21:24:23 +10:00
Paul Hauner
90b3953dda
v0.1.2 (#1155)
* Version downgrade

* Start updating docs and version tags

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-05-18 15:05:23 +10:00
Age Manning
b6408805a2
Stable futures (#879)
* Port eth1 lib to use stable futures

* Port eth1_test_rig to stable futures

* Port eth1 tests to stable futures

* Port genesis service to stable futures

* Port genesis tests to stable futures

* Port beacon_chain to stable futures

* Port lcli to stable futures

* Fix eth1_test_rig (#1014)

* Fix lcli

* Port timer to stable futures

* Fix timer

* Port websocket_server to stable futures

* Port notifier to stable futures

* Add TODOS

* Update hashmap hashset to stable futures

* Adds panic test to hashset delay

* Port remote_beacon_node to stable futures

* Fix lcli merge conflicts

* Non rpc stuff compiles

* protocol.rs compiles

* Port websockets, timer and notifier to stable futures (#1035)

* Fix lcli

* Port timer to stable futures

* Fix timer

* Port websocket_server to stable futures

* Port notifier to stable futures

* Add TODOS

* Port remote_beacon_node to stable futures

* Partial eth2-libp2p stable future upgrade

* Finished first round of fighting RPC types

* Further progress towards porting eth2-libp2p adds caching to discovery

* Update behaviour

* RPC handler to stable futures

* Update RPC to master libp2p

* Network service additions

* Fix the fallback transport construction (#1102)

* Correct warning

* Remove hashmap delay

* Compiling version of eth2-libp2p

* Update all crates versions

* Fix conversion function and add tests (#1113)

* Port validator_client to stable futures (#1114)

* Add PH & MS slot clock changes

* Account for genesis time

* Add progress on duties refactor

* Add simple is_aggregator bool to val subscription

* Start work on attestation_verification.rs

* Add progress on ObservedAttestations

* Progress with ObservedAttestations

* Fix tests

* Add observed attestations to the beacon chain

* Add attestation observation to processing code

* Add progress on attestation verification

* Add first draft of ObservedAttesters

* Add more tests

* Add observed attesters to beacon chain

* Add observers to attestation processing

* Add more attestation verification

* Create ObservedAggregators map

* Remove commented-out code

* Add observed aggregators into chain

* Add progress

* Finish adding features to attestation verification

* Ensure beacon chain compiles

* Link attn verification into chain

* Integrate new attn verification in chain

* Remove old attestation processing code

* Start trying to fix beacon_chain tests

* Split adding into pools into two functions

* Add aggregation to harness

* Get test harness working again

* Adjust the number of aggregators for test harness

* Fix edge-case in harness

* Integrate new attn processing in network

* Fix compile bug in validator_client

* Update validator API endpoints

* Fix aggreagation in test harness

* Fix enum thing

* Fix attestation observation bug:

* Patch failing API tests

* Start adding comments to attestation verification

* Remove unused attestation field

* Unify "is block known" logic

* Update comments

* Supress fork choice errors for network processing

* Add todos

* Tidy

* Add gossip attn tests

* Disallow test harness to produce old attns

* Comment out in-progress tests

* Partially address pruning tests

* Fix failing store test

* Add aggregate tests

* Add comments about which spec conditions we check

* Dont re-aggregate

* Split apart test harness attn production

* Fix compile error in network

* Make progress on commented-out test

* Fix skipping attestation test

* Add fork choice verification tests

* Tidy attn tests, remove dead code

* Remove some accidentally added code

* Fix clippy lint

* Rename test file

* Add block tests, add cheap block proposer check

* Rename block testing file

* Add observed_block_producers

* Tidy

* Switch around block signature verification

* Finish block testing

* Remove gossip from signature tests

* First pass of self review

* Fix deviation in spec

* Update test spec tags

* Start moving over to hashset

* Finish moving observed attesters to hashmap

* Move aggregation pool over to hashmap

* Make fc attn borrow again

* Fix rest_api compile error

* Fix missing comments

* Fix monster test

* Uncomment increasing slots test

* Address remaining comments

* Remove unsafe, use cfg test

* Remove cfg test flag

* Fix dodgy comment

* Revert "Update hashmap hashset to stable futures"

This reverts commit d432378a3cc5cd67fc29c0b15b96b886c1323554.

* Revert "Adds panic test to hashset delay"

This reverts commit 281502396fc5b90d9c421a309c2c056982c9525b.

* Ported attestation_service

* Ported duties_service

* Ported fork_service

* More ports

* Port block_service

* Minor fixes

* VC compiles

* Update TODOS

* Borrow self where possible

* Ignore aggregates that are already known.

* Unify aggregator modulo logic

* Fix typo in logs

* Refactor validator subscription logic

* Avoid reproducing selection proof

* Skip HTTP call if no subscriptions

* Rename DutyAndState -> DutyAndProof

* Tidy logs

* Print root as dbg

* Fix compile errors in tests

* Fix compile error in test

* Re-Fix attestation and duties service

* Minor fixes

Co-authored-by: Paul Hauner <paul@paulhauner.com>

* Network crate update to stable futures

* Port account_manager to stable futures (#1121)

* Port account_manager to stable futures

* Run async fns in tokio environment

* Port rest_api crate to stable futures (#1118)

* Port rest_api lib to stable futures

* Reduce tokio features

* Update notifier to stable futures

* Builder update

* Further updates

* Convert self referential async functions

* stable futures fixes (#1124)

* Fix eth1 update functions

* Fix genesis and client

* Fix beacon node lib

* Return appropriate runtimes from environment

* Fix test rig

* Refactor eth1 service update

* Upgrade simulator to stable futures

* Lighthouse compiles on stable futures

* Remove println debugging statement

* Update libp2p service, start rpc test upgrade

* Update network crate for new libp2p

* Update tokio::codec to futures_codec (#1128)

* Further work towards RPC corrections

* Correct http timeout and network service select

* Use tokio runtime for libp2p

* Revert "Update tokio::codec to futures_codec (#1128)"

This reverts commit e57aea924acf5cbabdcea18895ac07e38a425ed7.

* Upgrade RPC libp2p tests

* Upgrade secio fallback test

* Upgrade gossipsub examples

* Clean up RPC protocol

* Test fixes (#1133)

* Correct websocket timeout and run on os thread

* Fix network test

* Clean up PR

* Correct tokio tcp move attestation service tests

* Upgrade attestation service tests

* Correct network test

* Correct genesis test

* Test corrections

* Log info when block is received

* Modify logs and update attester service events

* Stable futures: fixes to vc, eth1 and account manager (#1142)

* Add local testnet scripts

* Remove whiteblock script

* Rename local testnet script

* Move spawns onto handle

* Fix VC panic

* Initial fix to block production issue

* Tidy block producer fix

* Tidy further

* Add local testnet clean script

* Run cargo fmt

* Tidy duties service

* Tidy fork service

* Tidy ForkService

* Tidy AttestationService

* Tidy notifier

* Ensure await is not suppressed in eth1

* Ensure await is not suppressed in account_manager

* Use .ok() instead of .unwrap_or(())

* RPC decoding test for proto

* Update discv5 and eth2-libp2p deps

* Fix lcli double runtime issue (#1144)

* Handle stream termination and dialing peer errors

* Correct peer_info variant types

* Remove unnecessary warnings

* Handle subnet unsubscription removal and improve logigng

* Add logs around ping

* Upgrade discv5 and improve logging

* Handle peer connection status for multiple connections

* Improve network service logging

* Improve logging around peer manager

* Upgrade swarm poll centralise peer management

* Identify clients on error

* Fix `remove_peer` in sync (#1150)

* remove_peer removes from all chains

* Remove logs

* Fix early return from loop

* Improved logging, fix panic

* Partially correct tests

* Stable futures: Vc sync (#1149)

* Improve syncing heuristic

* Add comments

* Use safer method for tolerance

* Fix tests

* Stable futures: Fix VC bug, update agg pool, add more metrics (#1151)

* Expose epoch processing summary

* Expose participation metrics to prometheus

* Switch to f64

* Reduce precision

* Change precision

* Expose observed attesters metrics

* Add metrics for agg/unagg attn counts

* Add metrics for gossip rx

* Add metrics for gossip tx

* Adds ignored attns to prom

* Add attestation timing

* Add timer for aggregation pool sig agg

* Add write lock timer for agg pool

* Add more metrics to agg pool

* Change map lock code

* Add extra metric to agg pool

* Change lock handling in agg pool

* Change .write() to .read()

* Add another agg pool timer

* Fix for is_aggregator

* Fix pruning bug

Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-05-17 11:16:48 +00:00
Age Manning
1cb274008d
Handles BlocksByRange step parameter around skip slots (#1146) 2020-05-14 22:41:02 +10:00
realbigsean
2692c779a7
Attestation service test suite (#1070)
* add attestation service tests

* fix cargo fmt

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-05-10 19:56:31 +10:00
Paul Hauner
ad5bd6412a
Add attestation gossip pre-verification (#983)
* Add PH & MS slot clock changes

* Account for genesis time

* Add progress on duties refactor

* Add simple is_aggregator bool to val subscription

* Start work on attestation_verification.rs

* Add progress on ObservedAttestations

* Progress with ObservedAttestations

* Fix tests

* Add observed attestations to the beacon chain

* Add attestation observation to processing code

* Add progress on attestation verification

* Add first draft of ObservedAttesters

* Add more tests

* Add observed attesters to beacon chain

* Add observers to attestation processing

* Add more attestation verification

* Create ObservedAggregators map

* Remove commented-out code

* Add observed aggregators into chain

* Add progress

* Finish adding features to attestation verification

* Ensure beacon chain compiles

* Link attn verification into chain

* Integrate new attn verification in chain

* Remove old attestation processing code

* Start trying to fix beacon_chain tests

* Split adding into pools into two functions

* Add aggregation to harness

* Get test harness working again

* Adjust the number of aggregators for test harness

* Fix edge-case in harness

* Integrate new attn processing in network

* Fix compile bug in validator_client

* Update validator API endpoints

* Fix aggreagation in test harness

* Fix enum thing

* Fix attestation observation bug:

* Patch failing API tests

* Start adding comments to attestation verification

* Remove unused attestation field

* Unify "is block known" logic

* Update comments

* Supress fork choice errors for network processing

* Add todos

* Tidy

* Add gossip attn tests

* Disallow test harness to produce old attns

* Comment out in-progress tests

* Partially address pruning tests

* Fix failing store test

* Add aggregate tests

* Add comments about which spec conditions we check

* Dont re-aggregate

* Split apart test harness attn production

* Fix compile error in network

* Make progress on commented-out test

* Fix skipping attestation test

* Add fork choice verification tests

* Tidy attn tests, remove dead code

* Remove some accidentally added code

* Fix clippy lint

* Rename test file

* Add block tests, add cheap block proposer check

* Rename block testing file

* Add observed_block_producers

* Tidy

* Switch around block signature verification

* Finish block testing

* Remove gossip from signature tests

* First pass of self review

* Fix deviation in spec

* Update test spec tags

* Start moving over to hashset

* Finish moving observed attesters to hashmap

* Move aggregation pool over to hashmap

* Make fc attn borrow again

* Fix rest_api compile error

* Fix missing comments

* Fix monster test

* Uncomment increasing slots test

* Address remaining comments

* Remove unsafe, use cfg test

* Remove cfg test flag

* Fix dodgy comment

* Ignore aggregates that are already known.

* Unify aggregator modulo logic

* Fix typo in logs

* Refactor validator subscription logic

* Avoid reproducing selection proof

* Skip HTTP call if no subscriptions

* Rename DutyAndState -> DutyAndProof

* Tidy logs

* Print root as dbg

* Fix compile errors in tests

* Fix compile error in test
2020-05-06 21:42:56 +10:00
divma
b4a1a2e483
Better handling of RPC errors and RPC conn with the PeerManager (#1047) 2020-05-03 23:17:12 +10:00
realbigsean
dea01be00e
Improve aggregate validator logic (#1020)
* track whether we have aggregate validator subscriptions to exact subnets, so we know whether or not to drop incoming attestations

* fix is aggregator check

* fix CI

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-04-30 15:39:10 +10:00
Age Manning
500f6b53d1
Testnet corrections (#1050)
* Correct RPC ping request

* Add attestation verification

* Add discv5 bug fixes

* Reduce gossipsub heartbeat and update metadata

* Handle known chain of advanced peer
2020-04-27 14:18:30 +10:00
divma
fa8154e3da
Ensure batches align to epoch boundaries (#1021)
* Ensure batches align to epoch boundaries

* Clean up range_sync logs
2020-04-27 14:18:09 +10:00
Age Manning
79cc9473c1
Sync and multi-client updates (#1044)
* Update finalized/head sync logic

* Correct sync logging

* Handle status during sync gracefully
2020-04-23 19:01:29 +10:00
Age Manning
0b82e9f8a9
Update Syncing logic (#1042)
* Prevent duplicate parent block lookups

* Updates logic for handling re-status'd peers

* Allow block lookup if the block is close to head

* Correct ordering of sync logs

* Remove comments in block processer, clean up sim
2020-04-22 23:58:10 +10:00
divma
2469bde6b1
Add chain_id in range syncing to avoid wrong dispatching of batch results (#1037) 2020-04-22 21:17:56 +10:00
Age Manning
20b6baf11f
Sync corrections (#1034)
* Correct status re-request logic improve logging

* Prevent multiple dials of the same peer

* Discovery to obey max peers when connecting to new peers
2020-04-22 00:29:19 +10:00
Age Manning
7acb136974
Correct parent lookup (#1027)
* Correct parent-lookup with block gossip verification

* Further update port conflicts in tests
2020-04-20 16:54:37 +10:00
Age Manning
489ad90536
Various corrections pre-testnet (#1019)
* Correct sync log messaging

* Modify syncing logs

* Update discv5 bug

* Discv5 patch

* Re-word sync status message

* Correct discovery peer finding logic

* Remove debugging log

* Remove duplicates in CLI

* Correct fmt
2020-04-19 20:45:25 +10:00
Age Manning
f9e8dad1fb
Correct status fork digest (#1016)
* Correct status fork digest

* Correct port issues with tests
2020-04-18 11:45:52 +10:00
Age Manning
e5874f4565
Global Sync access (#994)
* Connect sync logic to network globals

* Add further sync info to sync status

* Build new syncing HTTP API methods

* Fix bug in updating sync state

* Highest slot is current slot

* Update book for syncing API
2020-04-14 18:17:35 +10:00
divma
fa9daa488d
add handling of failed batches that imported blocks (#996) 2020-04-13 23:23:44 +10:00
Age Manning
19b8c5a9e0
Small bug fixes from initial sim tests (#993)
* Debug logging and fixes

* Minor fixes

* Remove debugging statements
2020-04-09 14:28:37 +10:00
Age Manning
1779aa6a8a
Merge latest master in v0.2.0 2020-04-08 16:46:37 +10:00
Age Manning
b23f19272d
v0.11.1 Network update (#989)
* Minor log bumps

* Initial building of extended RPC methods

* Wire in extended RPC methods

* Merge initial peer management template

* Add a PeerDB and give the peer manager some basic functions

* Initial connection of peer manager

* Add peer manager to lighthouse

* Connect peer manager with new RPC methods

* Correct tests and metadata RPC

Co-authored-by: Diva <divma@protonmail.com>
2020-04-08 01:08:05 +10:00