Commit Graph

260 Commits

Author SHA1 Message Date
Pawan Dhananjay
bb5285ac6d
Remove BeaconBlockAndBlobsSidecar from core topics (#4016) 2023-02-22 09:45:38 +11:00
Paul Hauner
eed7d65ce7
Allow for withdrawals in max block size (#4011)
* Allow for withdrawals in max block size

* Ensure payload size is counted
2023-02-21 18:03:10 +11:00
Michael Sproul
066c27750a
Merge remote-tracking branch 'origin/staging' into capella-update 2023-02-17 12:05:36 +11:00
realbigsean
4d0b0f681d
merge self limiter 2023-02-15 14:25:58 -05:00
realbigsean
b805fa6279
merge with upstream 2023-02-15 14:20:12 -05:00
Michael Sproul
918b688f72
Simplify payload traits and reduce cloning (#3976)
* Simplify payload traits and reduce cloning

* Fix self limiter
2023-02-15 14:17:56 +11:00
Age Manning
8dd9249177 Enforce a timeout on peer disconnect (#3757)
On heavily crowded networks, we are seeing many attempted connections to our node every second. 

Often these connections come from peers that have just been disconnected. This can be for a number of reasons including: 
- We have deemed them to be not as useful as other peers
- They have performed poorly
- They have dropped the connection with us
- The connection was spontaneously lost
- They were randomly removed because we have too many peers

In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place. 

This PR adds a timeout, that prevents recently disconnected peers from reconnecting to us.

Technically we implement a ban at the swarm layer to prevent immediate re connections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers.

This cache is bounded in time. An extra space bound could be added should people consider this a risk.

Co-authored-by: Diva M <divma@protonmail.com>
2023-02-14 03:25:42 +00:00
Michael Sproul
d53ccf8fc7
Placeholder for BlobsByRange outbound rate limit 2023-02-14 12:08:14 +11:00
Michael Sproul
18c8cab4da
Merge remote-tracking branch 'origin/unstable' into capella-merge 2023-02-14 12:07:27 +11:00
realbigsean
ad9af6d8b1
complete match for has_context_bytes 2023-02-13 16:44:54 -05:00
Emilia Hane
43bf908e7a
Fix release tests 2023-02-10 15:34:59 +01:00
Emilia Hane
09370e70d9
Fix rebase conflicts 2023-02-10 09:41:19 +01:00
Divma
ceb986549d Self rate limiting dev flag (#3928)
## Issue Addressed
Adds self rate limiting options, mainly with the idea to comply with peer's rate limits in small testnets

## Proposed Changes
Add a hidden flag `self-limiter` this can take no value, or customs values to configure quotas per protocol

## Additional Info
### How to use
`--self-limiter` will turn on the self rate limiter applying the same params we apply to inbound requests (requests from other peers)
`--self-limiter "beacon_blocks_by_range:64/1"` will turn on the self rate limiter for ALL protocols, but change the quota for bbrange to 64 requested blocks per 1 second.
`--self-limiter "beacon_blocks_by_range:64/1;ping:1/10"` same as previous one, changing the quota for ping as well.

### Caveats
- The rate limiter is either on or off for all protocols. I added the custom values to be able to change the quotas per protocol so that some protocols can be given extremely loose or tight quotas. I think this should satisfy every need even if we can't technically turn off rate limits per protocol.
- This reuses the rate limiter struct for the inbound requests so there is this ugly part of the code in which we need to deal with the inbound only protocols (light client stuff) if this becomes too ugly as we add lc protocols, we might want to split the rate limiters. I've checked this and looks doable with const generics to avoid so much code duplication

### Knowing if this is on
```
Feb 06 21:12:05.493 DEBG Using self rate limiting params         config: OutboundRateLimiterConfig { ping: 2/10s, metadata: 1/15s, status: 5/15s, goodbye: 1/10s, blocks_by_range: 1024/10s, blocks_by_root: 128/10s }, service: libp2p_rpc, service: libp2p
```
2023-02-08 02:18:53 +00:00
Diva M
493784366f
self rate limiting 2023-02-07 13:00:35 -05:00
realbigsean
b7e20fb87a
Update beacon_node/lighthouse_network/src/rpc/protocol.rs 2023-01-27 19:03:43 +01:00
Diva M
9976d3bbbc
send stream terminators 2023-01-27 18:11:26 +01:00
realbigsean
5e8d79891b merge conflict resolution 2023-01-25 11:10:44 +01:00
realbigsean
5b4cd997d0
Update beacon_node/lighthouse_network/src/rpc/methods.rs 2023-01-24 12:20:40 +01:00
Diva M
2d2da92132
only support 4844 rpc methods if on 4844 2023-01-24 05:15:23 -05:00
realbigsean
b658cc7aaf simplify checking attester cache for block and blobs. use ResourceUnavailable according to the spec 2023-01-24 10:50:47 +01:00
Emilia Hane
81a754577d
fixup! Improve error handling 2023-01-21 15:47:33 +01:00
realbigsean
06f71e8cce
merge capella 2023-01-12 12:51:09 -05:00
Michael Sproul
2af8110529
Merge remote-tracking branch 'origin/unstable' into capella
Fixing the conflicts involved patching up some of the `block_hash` verification,
the rest will be done as part of https://github.com/sigp/lighthouse/issues/3870
2023-01-12 16:22:00 +11:00
realbigsean
438126f19a
merge upstream, fix compile errors 2023-01-11 13:52:58 -05:00
Age Manning
1d9a2022b4 Upgrade to libp2p v0.50.0 (#3764)
I've needed to do this work in order to do some episub testing. 

This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.

Co-authored-by: Diva M <divma@protonmail.com>
2023-01-06 15:59:33 +00:00
Age Manning
4e5e7ee1fc Restructure code for libp2p upgrade (#3850)
Our custom RPC implementation is lagging from the libp2p v50 version. 

We are going to need to change a bunch of function names and would be nice to have consistent ordering of function names inside the handlers. 

This is a precursor to the libp2p upgrade to minimize merge conflicts in function ordering.
2023-01-05 17:18:24 +00:00
realbigsean
d8f7277beb
cleanup 2022-12-30 11:00:14 -05:00
Mark Mackey
c188cde034
merge upstream/unstable 2022-12-28 14:43:25 -06:00
realbigsean
8a70d80a2f
Revert "Revert "renames, remove , wrap BlockWrapper enum to make descontruction private""
This reverts commit 1931a442dc.
2022-12-28 10:31:18 -05:00
realbigsean
1931a442dc
Revert "renames, remove , wrap BlockWrapper enum to make descontruction private"
This reverts commit 5b3b34a9d7.
2022-12-28 10:30:36 -05:00
realbigsean
5b3b34a9d7
renames, remove , wrap BlockWrapper enum to make descontruction private 2022-12-28 10:28:45 -05:00
Divma
240854750c
cleanup: remove unused imports, unusued fields (#3834) 2022-12-23 17:16:10 -05:00
realbigsean
5db0a88d4f
fix compilation errors from merge 2022-12-23 10:27:01 -05:00
realbigsean
d504d51dd9
merge with upstream add context bytes to error log 2022-12-22 14:06:28 -05:00
realbigsean
33d01a7911
miscelaneous fixes on syncing, rpc and responding to peer's sync related requests (#3827)
- there was a bug in responding range blob requests where we would incorrectly label the first slot of an epoch as a non-skipped slot if it were skipped. this bug did not exist in the code for responding to block range request because the logic error was mitigated by defensive coding elsewhere
- there was a bug where a block received during range sync without a corresponding blob (and vice versa) was incorrectly interpreted as a stream termination
- RPC size limit fixes.
- Our blob cache was dead locking so I removed use of it for now.
- Because of our change in finalized sync batch size from 2 to 1 and our transition to using exact epoch boundaries for batches (rather than one slot past the epoch boundary), we need to sync finalized sync to 2 epochs + 1 slot past our peer's finalized slot in order to finalize the chain locally.
- use fork context bytes in rpc methods on both the server and client side
2022-12-21 15:50:51 -05:00
realbigsean
ff772311fa
add context bytes to blob messages, fix rpc limits, sync past finalized checkpoint during finalized sync so we can advance our own finalization, fix stream termination bug in blobs by range 2022-12-21 13:56:52 -05:00
realbigsean
a67fa516c7
don't expect context bytes for blob messages 2022-12-20 19:32:54 -05:00
realbigsean
9c46a1cb21
fix rate limits, and a couple other bugs 2022-12-20 18:56:07 -05:00
realbigsean
5de4f5b8d0
handle parent blob request edge cases correctly. fix data availability boundary check 2022-12-19 11:39:09 -05:00
realbigsean
0349b104bf
add blob rpc protocols to 2022-12-16 14:28:14 -05:00
Divma
ffbf70e2d9 Clippy lints for rust 1.66 (#3810)
## Issue Addressed
Fixes the new clippy lints for rust 1.66

## Proposed Changes

Most of the changes come from:
- [unnecessary_cast](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_cast)
- [iter_kv_map](https://rust-lang.github.io/rust-clippy/master/index.html#iter_kv_map)
- [needless_borrow](https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow)

## Additional Info

na
2022-12-16 04:04:00 +00:00
realbigsean
1644978cdb
fix compilation 2022-12-15 10:26:10 -05:00
realbigsean
d893706e0e
merge with capella 2022-12-15 09:33:18 -05:00
Michael Sproul
991e4094f8
Merge remote-tracking branch 'origin/unstable' into capella-update 2022-12-14 13:00:41 +11:00
ethDreamer
b1c33361ea
Fixed Clippy Complaints & Some Failing Tests (#3791)
* Fixed Clippy Complaints & Some Failing Tests
* Update Dockerfile to Rust-1.65
* EF test file renamed
* Touch up comments based on feedback
2022-12-13 10:50:24 -06:00
GeemoCandama
1b28ef8a8d Adding light_client gossip topics (#3693)
## Issue Addressed
Implementing the light_client_gossip topics but I'm not there yet.

Which issue # does this PR address?
Partially #3651

## Proposed Changes
Add light client gossip topics.
Please list or describe the changes introduced by this PR.
I'm going to Implement light_client_finality_update and light_client_optimistic_update gossip topics. Currently I've attempted the former and I'm seeking feedback.

## Additional Info
I've only implemented the light_client_finality_update topic because I wanted to make sure I was on the correct path. Also checking that the gossiped LightClientFinalityUpdate is the same as the locally constructed one is not implemented because caching the updates will make this much easier. Could someone give me some feedback on this please? 

Please provide any additional information. For example, future considerations
or information useful for reviewers.

Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>
2022-12-13 06:24:51 +00:00
realbigsean
a0d4aecf30
requests block + blob always post eip4844 2022-12-07 15:30:08 -05:00
realbigsean
6c8b1b323b
merge upstream 2022-12-07 12:27:21 -05:00
ethDreamer
1a39976715
Fixed Compiler Warnings & Failing Tests (#3771) 2022-12-03 10:42:12 +11:00
realbigsean
8102a01085
merge with upstream 2022-12-01 11:13:07 -05:00
Mark Mackey
8a04c3428e Merged with unstable 2022-11-30 17:29:10 -06:00
Diva M
979a95d62f
handle unknown parents for block-blob pairs
wip

handle unknown parents for block-blob pairs
2022-11-30 17:21:54 -05:00
GeemoCandama
3534c85e30 Optimize finalized chain sync by skipping newPayload messages (#3738)
## Issue Addressed

#3704 

## Proposed Changes
Adds is_syncing_finalized: bool parameter for block verification functions. Sets the payload_verification_status to Optimistic if is_syncing_finalized is true. Uses SyncState in NetworkGlobals in BeaconProcessor to retrieve the syncing status.

## Additional Info
I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer.
2022-11-29 08:19:27 +00:00
Age Manning
2779017076 Gossipsub fast message id change (#3755)
For improved consistency, this mixes in the topic into our fast message id for more consistent tracking of messages across topics.
2022-11-28 07:36:52 +00:00
realbigsean
3c9e1abcb7
merge upstream 2022-11-26 10:01:57 -05:00
sean
07a79c8266 add block and blobs sidecar to whitelist 2022-11-25 14:44:57 +00:00
sean
f88caa7afc set quota for blobs by root 2022-11-25 14:30:51 +00:00
antondlr
e9bf7f7cc1 remove commas from comma-separated kv pairs (#3737)
## Issue Addressed

Logs are in comma separated kv list, but the values sometimes contain commas, which breaks parsing
2022-11-25 07:57:10 +00:00
Giulio rebuffo
d5a2de759b Added LightClientBootstrap V1 (#3711)
## Issue Addressed

Partially addresses #3651

## Proposed Changes

Adds server-side support for light_client_bootstrap_v1 topic

## Additional Info

This PR, creates each time a bootstrap without using cache, I do not know how necessary a cache is in this case as this topic is not supposed to be called frequently and IMHO we can just prevent abuse by using the limiter, but let me know what you think or if there is any caveat to this, or if it is necessary only for the sake of good practice.


Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2022-11-25 05:19:00 +00:00
Michael Sproul
788b337951
Op pool and gossip for BLS to execution changes (#3726) 2022-11-25 07:09:26 +11:00
realbigsean
48b2efce9f
merge with upstream 2022-11-22 18:38:30 -05:00
Michael Sproul
61b4bbf870
Fix BlocksByRoot response types (#3743) 2022-11-22 12:29:47 -05:00
Diva M
7ed2d35424
get it to compile 2022-11-21 14:53:33 -05:00
realbigsean
e7ee79185b
add blobs cache and fix some block production 2022-11-21 14:09:06 -05:00
realbigsean
dc87156641
block and blob handling progress 2022-11-19 16:53:34 -05:00
realbigsean
45897ad4e1
remove blob wrapper 2022-11-19 15:18:42 -05:00
realbigsean
7162e5e23b
add a bunch of blob coupling boiler plate, add a blobs by root request 2022-11-15 16:43:56 -05:00
Pawan Dhananjay
857ef25d28 Add metrics for subnet queries (#3721)
## Issue Addressed

N/A

## Proposed Changes

Add metrics for peers discovered in subnet discv5 queries.
2022-11-15 13:25:38 +00:00
Michael Sproul
713b6a18d4 Simplify GossipTopic -> String conversion (#3722)
## Proposed Changes

With a few different changes to the gossip topics in flight (light clients, Capella, 4844, etc) I think this simplification makes sense. I noticed it while plumbing through a new Capella topic.
2022-11-15 05:21:48 +00:00
Age Manning
230168deff Health Endpoints for UI (#3668)
This PR adds some health endpoints for the beacon node and the validator client.

Specifically it adds the endpoint:
`/lighthouse/ui/health`

These are not entirely stable yet. But provide a base for modification for our UI. 

These also may have issues with various platforms and may need modification.
2022-11-15 05:21:26 +00:00
Michael Sproul
0cdd049da9
Fixes to make EF Capella tests pass (#3719)
* Fixes to make EF Capella tests pass

* Clippy for state_processing
2022-11-14 13:14:31 -06:00
realbigsean
fe04d945cc
make signed block + sidecar consensus spec 2022-11-10 14:22:30 -05:00
realbigsean
bc0af72c74
fix topic name 2022-11-07 12:36:31 -05:00
realbigsean
1aec17b09c
Merge branch 'unstable' of https://github.com/sigp/lighthouse into eip4844 2022-11-04 13:23:55 -04:00
Divma
8600645f65 Fix rust 1.65 lints (#3682)
## Issue Addressed

New lints for rust 1.65

## Proposed Changes

Notable change is the identification or parameters that are only used in recursion

## Additional Info
na
2022-11-04 07:43:43 +00:00
realbigsean
8656d23327
merge with unstable 2022-11-01 13:18:00 -04:00
Pawan Dhananjay
29f2ec46d3
Couple blocks and blobs in gossip (#3670)
* Revert "Add more gossip verification conditions"

This reverts commit 1430b561c3.

* Revert "Add todos"

This reverts commit 91efb9d4c7.

* Revert "Reprocess blob sidecar messages"

This reverts commit 21bf3d37cd.

* Add the coupled topic

* Decode SignedBeaconBlockAndBlobsSidecar correctly

* Process Block and Blobs in beacon processor

* Remove extra blob publishing logic from vc

* Remove blob signing in vc

* Ugly hack to compile
2022-11-01 10:28:21 -04:00
Divma
46fbf5b98b Update discv5 (#3171)
## Issue Addressed

Updates discv5

Pending on
- [x] #3547 
- [x] Alex upgrades his deps

## Proposed Changes

updates discv5 and the enr crate. The only relevant change would be some clear indications of ipv4 usage in lighthouse

## Additional Info

Functionally, this should be equivalent to the prev version.
As draft pending a discv5 release
2022-10-28 05:40:06 +00:00
realbigsean
137f230344
Capella eip 4844 cleanup (#3652)
* add capella gossip boiler plate

* get everything compiling

Co-authored-by: realbigsean <sean@sigmaprime.io
Co-authored-by: Mark Mackey <mark@sigmaprime.io>

* small cleanup

* small cleanup

* cargo fix + some test cleanup

* improve block production

* add fixme for potential panic

Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-10-26 15:15:26 -04:00
Divma
3a5888e53d Ban and unban peers at the swarm level (#3653)
## Issue Addressed

I missed this from https://github.com/sigp/lighthouse/pull/3491. peers were being banned at the behaviour level only. The identify errors are explained by this as well

## Proposed Changes

Add banning and unbanning 

## Additional Info

Befor,e having tests that catch this was hard because the swarm was outside the behaviour. We could now have tests that prevent something like this in the future
2022-10-24 21:39:30 +00:00
Pawan Dhananjay
c55b28bf10
Minor fixes 2022-10-04 19:18:06 -05:00
realbigsean
7527c2b455
fix RPC limit add blob signing domain 2022-10-04 14:57:29 -04:00
realbigsean
ba16a037a3
cleanup 2022-10-04 09:34:05 -04:00
realbigsean
c0dc42ea07
cargo fmt 2022-10-04 08:21:46 -04:00
realbigsean
8d45e48775
cargo fix 2022-10-03 21:52:16 -04:00
realbigsean
e81dbbfea4
compile 2022-10-03 21:48:02 -04:00
realbigsean
88006735c4
compile 2022-10-03 10:06:04 -04:00
realbigsean
7520651515
cargo fix and some test fixes 2022-09-29 12:43:35 -04:00
realbigsean
fe6fc55449
fix compilation errors, rename capella -> shanghai, cleanup some rebase issues 2022-09-29 12:43:13 -04:00
realbigsean
3f1e5cee78
Some gossip work 2022-09-29 12:35:53 -04:00
realbigsean
4008da6c60
sync tx blobs 2022-09-29 12:32:55 -04:00
realbigsean
4cdf1b546d
add shanghai fork version and epoch 2022-09-29 12:28:58 -04:00
realbigsean
de44b300c0
add/update types 2022-09-29 12:25:56 -04:00
Age Manning
27bb9ff07d Handle Lodestar's new agent string (#3620)
## Issue Addressed

#3561 

## Proposed Changes

Recognize Lodestars new agent string and appropriately count these peers as lodestar peers.
2022-09-29 01:50:13 +00:00
Divma
b1d2510d1b Libp2p v0.48.0 upgrade (#3547)
## Issue Addressed

Upgrades libp2p to v.0.47.0. This is the compilation of
- [x] #3495 
- [x] #3497 
- [x] #3491 
- [x] #3546 
- [x] #3553 

Co-authored-by: Age Manning <Age@AgeManning.com>
2022-09-29 01:50:11 +00:00
Marius van der Wijden
8b71b978e0 new round of hacks (config etc) 2022-09-17 23:42:49 +02:00
Marius van der Wijden
aeb52ff186 network stuff 2022-09-17 16:10:42 +02:00
Marius van der Wijden
36a0add0cd network stuff 2022-09-17 15:23:28 +02:00
Daniel Knopik
0518665949 Merge remote-tracking branch 'fork/eip4844' into eip4844 2022-09-17 14:58:33 +02:00
Daniel Knopik
292a16a6eb gossip boilerplate 2022-09-17 14:58:27 +02:00
Marius van der Wijden
acace8ab31 network: blobs by range message 2022-09-17 14:55:18 +02:00
Daniel Knopik
bcc738cb9d progress on gossip stuff 2022-09-17 14:31:57 +02:00
Daniel Knopik
ca1e17b386 it compiles! 2022-09-17 12:23:03 +02:00
Divma
473abc14ca Subscribe to subnets only when needed (#3419)
## Issue Addressed

We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately. 

## Proposed Changes

- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.

## Additional Info

This is also an effort in making the code easier to understand
2022-09-05 00:22:48 +00:00
Pawan Dhananjay
f3439116da Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365)
## Issue Addressed

Resolves #3351 

## Proposed Changes

Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced.


## Additional Info

This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned.
cc @divagant-martian please let me know if i'm missing something here.
2022-07-27 03:20:00 +00:00
Justin Traglia
0f62d900fe Fix some typos (#3376)
## Proposed Changes

This PR fixes various minor typos in the project.
2022-07-27 00:51:06 +00:00
Akihito Nakano
98a9626ef5 Bump the MSRV to 1.62 and using #[derive(Default)] on enums (#3304)
## Issue Addressed

N/A

## Proposed Changes

Since Rust 1.62, we can use `#[derive(Default)]` on enums.  

https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants

There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.
2022-07-15 07:31:19 +00:00
Paul Hauner
be4e261e74 Use async code when interacting with EL (#3244)
## Overview

This rather extensive PR achieves two primary goals:

1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.

Additionally, it achieves:

- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
    - I had to do this to deal with sending blocks into spawned tasks.
    - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
    - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
    - Avoids cloning *all the blocks* in *every chain segment* during sync.
    - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.

For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273

## Changes to `canonical_head` and `fork_choice`

Previously, the `BeaconChain` had two separate fields:

```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```

Now, we have grouped these values under a single struct:

```
canonical_head: CanonicalHead {
  cached_head: RwLock<Arc<Snapshot>>,
  fork_choice: RwLock<BeaconForkChoice>
} 
```

Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.

## Breaking Changes

### The `state` (root) field in the `finalized_checkpoint` SSE event

Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:

1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.

Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).

I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.

## Notes for Reviewers

I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.

I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".

I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.

I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.

Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.

You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.

I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.

Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
Akihito Nakano
082ed35bdc Test the pruning of excess peers using randomly generated input (#3248)
## Issue Addressed

https://github.com/sigp/lighthouse/issues/3092


## Proposed Changes

Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores.


## Additional Info

I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969).

Co-authored-by: Diva M <divma@protonmail.com>
2022-06-25 22:22:34 +00:00
Divma
7af5742081 Deprecate step param in BlocksByRange RPC request (#3275)
## Issue Addressed

Deprecates the step parameter in the blocks by range request

## Proposed Changes

- Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before
- Adds a new type to still handle coding and decoding of requests that use the parameter

## Additional Info
I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward
2022-06-22 16:23:34 +00:00
Divma
3dd50bda11 Improve substream management (#3261)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Please list or describe the changes introduced by this PR.

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-06-10 06:58:50 +00:00
Akihito Nakano
a6d2ed6119 Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236)
## Issue Addressed

This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`.

I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, **`outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`**.

- [outbound only](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018))
- [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047))

As a result, PeerManager doesn't remove "outbound" peers which should be pruned.

Note: [`subnet_to_peer`](e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.
2022-06-06 05:51:10 +00:00
Akihito Nakano
695f415590 Tiny improvement: PeerManager and maximum discovery query (#3182)
## Issue Addressed

As [`Discovery` bounds the maximum discovery query](e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)), `PeerManager` no need to handle it.

e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)
2022-05-19 06:00:46 +00:00
François Garillot
3f9e83e840 [refactor] Refactor Option/Result combinators (#3180)
Code simplifications using `Option`/`Result` combinators to make pattern-matches a tad simpler. 
Opinions on these loosely held, happy to adjust in review.

Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).
2022-05-16 01:59:47 +00:00
Pawan Dhananjay
db0beb5178 Poll shutdown timeout in rpc handler (#3153)
## Issue Addressed

N/A

## Proposed Changes

Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.
2022-04-13 03:54:44 +00:00
Divma
580d2f7873 log upgrades + prevent dialing of disconnecting peers (#3148)
## Issue Addressed
We still ping peers that are considered in a disconnecting state

## Proposed Changes

Do not ping peers once we decide they are disconnecting
Upgrade logs about ignored rpc messages

## Additional Info
--
2022-04-13 03:54:43 +00:00
Pawan Dhananjay
fff4dd6311 Fix rpc limits version 2 (#3146)
## Issue Addressed

N/A

## Proposed Changes

https://github.com/sigp/lighthouse/pull/3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check.

This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`.

This PR fixes the issue by also accounting for forks earlier than current_fork in the rpc limits calculation in the  `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and can the limits can be easily checked in a review. 

Adds unit tests and modifies the syncing simulator to check the syncing from across fork boundaries. 
The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).
2022-04-07 23:45:38 +00:00
Pawan Dhananjay
ab434bc075 Fix merge rpc length limits (#3133)
## Issue Addressed

N/A

## Proposed Changes

Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair.
Further make the rpc response limits fork aware.
2022-04-04 00:26:15 +00:00
Michael Sproul
41e7a07c51 Add lighthouse db command (#3129)
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).
2022-04-01 00:58:59 +00:00
Divma
4bf1af4e85 Custom RPC request management for sync (#3029)
## Proposed Changes
Make `lighthouse_network` generic over request ids, now usable by sync
2022-03-02 22:07:17 +00:00
Age Manning
e88b18be09 Update libp2p (#3039)
Update libp2p. 

This corrects some gossipsub metrics.
2022-03-02 05:09:52 +00:00
Age Manning
f3c1dde898 Filter non global ips from discovery (#3023)
## Issue Addressed

#3006 

## Proposed Changes

This PR changes the default behaviour of lighthouse to ignore discovered IPs that are not globally routable. It adds a CLI flag, --enable-local-discovery to permit the non-global IPs in discovery.

NOTE: We should take care in merging this as I will break current set-ups that rely on local IP discovery. I made this the non-default behaviour because we dont really want to be wasting resources attempting to connect to non-routable addresses and we dont want to propagate these to others (on the chance we can connect to one of these local nodes), improving discoveries efficiency.
2022-03-02 03:14:27 +00:00
Age Manning
a1b730c043 Cleanup small issues (#3027)
Downgrades some excessive networking logs and corrects some metrics.
2022-03-01 01:49:22 +00:00
Michael Sproul
5e1f8a8480 Update to Rust 1.59 and 2021 edition (#3038)
## Proposed Changes

Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.

## Additional Info

We need this PR to unblock CI.
2022-02-25 00:10:17 +00:00
Age Manning
3ebb8b0244 Improved peer management (#2993)
## Issue Addressed

I noticed in some logs some excess and unecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots. 

This PR corrects this issue and advances the pruning mechanism to favour subnet peers. 

An overview the new logic added is:
- We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count.
- We only search for more peers if there is room to do so, and we are below the minimum outbound peer count not the target. So this gives us some buffer for peers to disconnect. The buffer is currently 10%

The modified pruning logic is documented in the code but for reference it should do the following:
- Prune peers with bad scores first
- If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet
- If we still need to prune peers, the prune peers that we have a higher density of on any given subnet which should drive for uniform peers across all subnets.

This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.
2022-02-18 02:36:43 +00:00
Paul Hauner
0a6a8ea3b0 Engine API v1.0.0.alpha.6 + interop tests (#3024)
## Issue Addressed

NA

## Proposed Changes

This PR extends #3018 to address my review comments there and add automated integration tests with Geth (and other implementations, in the future).

I've also de-duplicated the "unused port" logic by creating an  `common/unused_port` crate.

## Additional Info

I'm not sure if we want to merge this PR, or update #3018 and merge that. I don't mind, I'm primarily opening this PR to make sure CI works.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-02-17 21:47:06 +00:00
Divma
1306b2db96 libp2p upgrade + gossipsub interval fix (#3012)
## Issue Addressed
Lighthouse gossiping late messages

## Proposed Changes
Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected
Upgrade libp2p 

## Additional Info
https://github.com/libp2p/rust-libp2p/issues/2497
2022-02-10 04:12:03 +00:00
Divma
36fc887a40 Gossip cache timeout adjustments (#2997)
## Proposed Changes

- Do not retry to publish sync committee messages.
- Give a more lenient timeout to slashings and exits
2022-02-07 23:25:06 +00:00
Age Manning
675c7b7e26 Correct a dial race condition (#2992)
## Issue Addressed

On a network with few nodes, it is possible that the same node can be found from a subnet discovery and a normal peer discovery at the same time.

The network behaviour loads these peers into events and processes them when it has the chance. It can happen that the same peer can enter the event queue more than once and then attempt to be dialed twice. 

This PR shifts the registration of nodes in the peerdb as being dialed before they enter the NetworkBehaviour queue, preventing multiple attempts of the same peer being entered into the queue and avoiding the race condition.
2022-02-07 23:25:05 +00:00
Divma
48b7c8685b upgrade libp2p (#2933)
## Issue Addressed

Upgrades libp2p to v.0.42.0 pre release (https://github.com/libp2p/rust-libp2p/pull/2440)
2022-02-07 23:25:03 +00:00
Divma
615695776e Retry gossipsub messages when insufficient peers (#2964)
## Issue Addressed
#2947 

## Proposed Changes

Store messages that fail to be published due to insufficient peers for retry later. Messages expire after half an epoch and are retried if gossipsub informs us that an useful peer has connected. Currently running in Atlanta

## Additional Info
If on retry sending the messages fails they will not be tried again
2022-02-03 01:12:30 +00:00
Mac L
286996b090 Fix small typo in error log (#2975)
## Proposed Changes

Fixes a small typo I came across.
2022-01-31 22:55:07 +00:00
Age Manning
bdd70d7aef Reduce gossip history (#2969)
The gossipsub history was increased to a good portion of a slot from 2.1 seconds in the last release.

Although it shouldn't cause too much issue, it could be related to recieving later messages than usual and interacting with our scoring system penalizing peers. For consistency, this PR reduces the time we gossip messages back to the same values of the previous release.

It also adjusts the gossipsub heartbeat time for testing purposes with a developer flag but this should not effect end users.
2022-01-31 07:29:41 +00:00
Age Manning
ca29b580a2 Increase target subnet peers (#2948)
In the latest release we decreased the target number of subnet peers. 

It appears this could be causing issues in some cases and so reverting it back to the previous number it wise. A larger PR that follows this will address some other related discovery issues and peer management around subnet peer discovery.
2022-01-24 12:08:00 +00:00
Age Manning
fc7a1a7dc7 Allow disconnected states to introduce new peers without warning (#2922)
## Issue Addressed

We emit a warning to verify that all peer connection state information is consistent. A warning is given under one edge case;
We try to dial a peer with peer-id X and multiaddr Y. The peer responds to multiaddr Y with a different peer-id, Z. The dialing to the peer fails, but libp2p injects the failed attempt as peer-id Z. 

In this instance, our PeerDB tries to add a new peer in the disconnected state under a previously unknown peer-id. This is harmless and so this PR permits this behaviour without logging a warning.
2022-01-20 09:14:25 +00:00
Age Manning
1c667ad3ca PeerDB Status unknown bug fix (#2907)
## Issue Addressed

The PeerDB was getting out of sync with the number of disconnected peers compared to the actual count. As this value determines how many we store in our cache, over time the cache was depleting and we were removing peers immediately resulting in errors that manifest as unknown peers for some operations.

The error occurs when dialing a peer fails, we were not correctly updating the peerdb counter because the increment to the counter was placed in the wrong order and was therefore not incrementing the count. 

This PR corrects this.
2022-01-14 05:42:48 +00:00
Age Manning
6f4102aab6 Network performance tuning (#2608)
There is a pretty significant tradeoff between bandwidth and speed of gossipsub messages. 

We can reduce our bandwidth usage considerably at the cost of minimally delaying gossipsub messages. The impact of delaying messages has not been analyzed thoroughly yet, however this PR in conjunction with some gossipsub updates show considerable bandwidth reduction. 

This PR allows the user to set a CLI value (`network-load`) which is an integer in the range of 1 of 5 depending on their bandwidth appetite. 1 represents the least bandwidth but slowest message recieving and 5 represents the most bandwidth and fastest received message time. 

For low-bandwidth users it is likely to be more efficient to use a lower value. The default is set to 3, which currently represents a reduced bandwidth usage compared to previous version of this PR. The previous lighthouse versions are equivalent to setting the `network-load` CLI to 4.

This PR is awaiting a few gossipsub updates before we can get it into lighthouse.
2022-01-14 05:42:47 +00:00
Paul Hauner
aaa5344eab Add peer score adjustment msgs (#2901)
## Issue Addressed

N/A

## Proposed Changes

This PR adds the `msg` field to `Peer score adjusted` log messages. These `msg` fields help identify *why* a peer was banned.

Example:

```
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -27.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -28.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -29.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
```

There is also a `libp2p_report_peer_msgs_total` metrics which allows us to see count of reports per `msg` tag. 

## Additional Info

NA
2022-01-12 05:32:14 +00:00
Age Manning
81c667b58e Additional networking metrics (#2549)
Adds additional metrics for network monitoring and evaluation.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2021-12-22 06:17:14 +00:00
Divma
56d596ee42 Unban peers at the swarm level when purged (#2855)
## Issue Addressed
#2840
2021-12-20 23:45:21 +00:00
Divma
eee0260a68 do not count dialing peers in the connection limit (#2856)
## Issue Addressed
#2841 

## Proposed Changes
Not counting dialing peers while deciding if we have reached the target peers in case of outbound peers.

## Additional Info
Checked this running in nodes and bandwidth looks normal, peer count looks normal too
2021-12-15 05:48:45 +00:00
Lion - dapplion
2984f4b474 Remove wrong duplicated comment (#2751)
## Issue Addressed

Remove wrong duplicated comment. Comment was copied from ban_peer() but doesn't apply to unban_peer()
2021-12-06 05:34:15 +00:00
Pawan Dhananjay
f3c237cfa0
Restrict network limits based on merge fork epoch (#2839) 2021-12-02 14:32:31 +11:00
Paul Hauner
ab86b42874
Kintsugi Diva comments (#2836)
* Remove TODOs

* Fix typo
2021-12-02 14:29:59 +11:00
pawan
44a7b37ce3
Increase network limits (#2796)
Fix max packet sizes

Fix max_payload_size function

Add merge block test

Fix max size calculation; fix up test

Clear comments

Add a payload_size_function

Use safe arith for payload calculation

Return an error if block too big in block production

Separate test to check if block is over limit
2021-12-02 14:29:20 +11:00
Mark Mackey
5687c56d51
Initial merge changes
Added Execution Payload from Rayonism Fork

Updated new Containers to match Merge Spec

Updated BeaconBlockBody for Merge Spec

Completed updating BeaconState and BeaconBlockBody

Modified ExecutionPayload<T> to use Transaction<T>

Mostly Finished Changes for beacon-chain.md

Added some things for fork-choice.md

Update to match new fork-choice.md/fork.md changes

ran cargo fmt

Added Missing Pieces in eth2_libp2p for Merge

fix ef test

Various Changes to Conform Closer to Merge Spec
2021-12-02 14:26:50 +11:00
Age Manning
6625aa4afe Status'd Peer Not Found (#2761)
## Issue Addressed

Users are experiencing `Status'd peer not found` errors

## Proposed Changes

Although I cannot reproduce this error, this is only one connection state change that is not addressed in the peer manager (that I could see). The error occurs because the number of disconnected peers in the peerdb becomes out of sync with the actual number of disconnected peers. From what I can tell almost all possible connection state changes are handled, except for the case when a disconnected peer changes to be disconnecting. This can potentially happen at the peer connection limit, where a previously connected peer switches to disconnecting. 

This PR decrements the disconnected counter when this event occurs and from what I can tell, covers all possible disconnection state changes in the peer manager.
2021-11-28 22:46:17 +00:00
Pawan Dhananjay
9eedb6b888 Allow additional subnet peers (#2823)
## Issue Addressed

N/A

## Proposed Changes

1. Don't disconnect peer from dht on connection limit errors
2. Bump up `PRIORITY_PEER_EXCESS` to allow for dialing upto 60 peers by default.



Co-authored-by: Diva M <divma@protonmail.com>
2021-11-25 21:27:08 +00:00
Michael Sproul
2c07a72980 Revert peer DB changes from #2724 (#2828)
## Proposed Changes

This reverts commit 53562010ec from PR #2724

Hopefully this will restore the reliability of the sync simulator.
2021-11-25 03:45:52 +00:00
Age Manning
0b319d4926 Inform dialing via the behaviour (#2814)
I had this change but it seems to have been lost in chaos of network upgrades.

The swarm dialing event seems to miss some cases where we dial via the behaviour. This causes an error to be logged as the peer manager doesn't know about some dialing events. 

This shifts the logic to the behaviour to inform the peer manager.
2021-11-19 04:42:33 +00:00