Commit Graph

81 Commits

Author SHA1 Message Date
realbigsean
137f230344
Capella eip 4844 cleanup (#3652)
* add capella gossip boiler plate

* get everything compiling

Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: Mark Mackey <mark@sigmaprime.io>

* small cleanup

* small cleanup

* cargo fix + some test cleanup

* improve block production

* add fixme for potential panic

Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-10-26 15:15:26 -04:00
Pawan Dhananjay
c55b28bf10
Minor fixes 2022-10-04 19:18:06 -05:00
realbigsean
7527c2b455
fix RPC limit add blob signing domain 2022-10-04 14:57:29 -04:00
realbigsean
ba16a037a3
cleanup 2022-10-04 09:34:05 -04:00
realbigsean
c0dc42ea07
cargo fmt 2022-10-04 08:21:46 -04:00
realbigsean
8d45e48775
cargo fix 2022-10-03 21:52:16 -04:00
realbigsean
e81dbbfea4
compile 2022-10-03 21:48:02 -04:00
realbigsean
88006735c4
compile 2022-10-03 10:06:04 -04:00
realbigsean
7520651515
cargo fix and some test fixes 2022-09-29 12:43:35 -04:00
realbigsean
fe6fc55449
fix compilation errors, rename capella -> shanghai, cleanup some rebase issues 2022-09-29 12:43:13 -04:00
realbigsean
3f1e5cee78
Some gossip work 2022-09-29 12:35:53 -04:00
realbigsean
4008da6c60
sync tx blobs 2022-09-29 12:32:55 -04:00
realbigsean
4cdf1b546d
add shanghai fork version and epoch 2022-09-29 12:28:58 -04:00
realbigsean
de44b300c0
add/update types 2022-09-29 12:25:56 -04:00
Age Manning
27bb9ff07d Handle Lodestar's new agent string (#3620)
## Issue Addressed

#3561 

## Proposed Changes

Recognize Lodestar's new agent string and count these peers as Lodestar peers.
2022-09-29 01:50:13 +00:00
Divma
b1d2510d1b Libp2p v0.48.0 upgrade (#3547)
## Issue Addressed

Upgrades libp2p to v0.47.0. This is a compilation of:
- [x] #3495 
- [x] #3497 
- [x] #3491 
- [x] #3546 
- [x] #3553 

Co-authored-by: Age Manning <Age@AgeManning.com>
2022-09-29 01:50:11 +00:00
Marius van der Wijden
8b71b978e0 new round of hacks (config etc) 2022-09-17 23:42:49 +02:00
Marius van der Wijden
aeb52ff186 network stuff 2022-09-17 16:10:42 +02:00
Marius van der Wijden
36a0add0cd network stuff 2022-09-17 15:23:28 +02:00
Daniel Knopik
0518665949 Merge remote-tracking branch 'fork/eip4844' into eip4844 2022-09-17 14:58:33 +02:00
Daniel Knopik
292a16a6eb gossip boilerplate 2022-09-17 14:58:27 +02:00
Marius van der Wijden
acace8ab31 network: blobs by range message 2022-09-17 14:55:18 +02:00
Daniel Knopik
bcc738cb9d progress on gossip stuff 2022-09-17 14:31:57 +02:00
Daniel Knopik
ca1e17b386 it compiles! 2022-09-17 12:23:03 +02:00
Divma
473abc14ca Subscribe to subnets only when needed (#3419)
## Issue Addressed

We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance). This PR makes it so that subscriptions for future slots are scheduled instead of done immediately.

## Proposed Changes

- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are now rejected; previously we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.
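
A minimal sketch of the scheduling decision described in the list above. The `SubscriptionAction` enum, the `plan_subscription` function and the `lead_slots` parameter are hypothetical names for illustration; how far in advance the real service subscribes is an assumption here, not something stated in this PR.

```rust
/// Hypothetical outcome of handling a subnet subscription request.
#[derive(Debug, PartialEq)]
enum SubscriptionAction {
    /// The requested slot is already in the past.
    Reject,
    /// The requested slot is close enough that we subscribe right away.
    SubscribeNow,
    /// Wait this many slots before subscribing.
    ScheduleIn(u64),
}

/// Decide what to do with a subscription for `subscription_slot`.
/// `lead_slots` (an assumed parameter) is how many slots ahead of the
/// duty we want to be subscribed.
fn plan_subscription(
    current_slot: u64,
    subscription_slot: u64,
    lead_slots: u64,
) -> SubscriptionAction {
    if subscription_slot < current_slot {
        return SubscriptionAction::Reject;
    }
    let slots_until_duty = subscription_slot - current_slot;
    if slots_until_duty <= lead_slots {
        SubscriptionAction::SubscribeNow
    } else {
        SubscriptionAction::ScheduleIn(slots_until_duty - lead_slots)
    }
}
```

In a real service the `ScheduleIn` case would presumably be backed by a delay queue such as `delay_map`; the sketch only shows the branching.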

## Additional Info

This is also an effort to make the code easier to understand.
2022-09-05 00:22:48 +00:00
Pawan Dhananjay
f3439116da Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365)
## Issue Addressed

Resolves #3351 

## Proposed Changes

Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced.


## Additional Info

This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned.
cc @divagant-martian please let me know if I'm missing something here.
2022-07-27 03:20:00 +00:00
Justin Traglia
0f62d900fe Fix some typos (#3376)
## Proposed Changes

This PR fixes various minor typos in the project.
2022-07-27 00:51:06 +00:00
Akihito Nakano
98a9626ef5 Bump the MSRV to 1.62 and using #[derive(Default)] on enums (#3304)
## Issue Addressed

N/A

## Proposed Changes

Since Rust 1.62, we can use `#[derive(Default)]` on enums.  

https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants

There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.
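
For illustration, a small sketch of the replacement pattern. `SyncState` is a hypothetical enum, not a type taken from the codebase.

```rust
// Before Rust 1.62 the default had to be written by hand:
//
// impl Default for SyncState {
//     fn default() -> Self {
//         SyncState::Stalled
//     }
// }

// Since Rust 1.62 the default variant can be marked with `#[default]`
// and the trait derived. `SyncState` is a made-up example type.
#[derive(Debug, Default, PartialEq)]
enum SyncState {
    Synced,
    #[default]
    Stalled,
}
```

`SyncState::default()` then evaluates to `SyncState::Stalled`, the same as the hand-written implementation.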
2022-07-15 07:31:19 +00:00
Paul Hauner
be4e261e74 Use async code when interacting with EL (#3244)
## Overview

This rather extensive PR achieves two primary goals:

1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.

Additionally, it achieves:

- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
    - I had to do this to deal with sending blocks into spawned tasks.
    - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
    - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
    - Avoids cloning *all the blocks* in *every chain segment* during sync.
    - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.

For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273

## Changes to `canonical_head` and `fork_choice`

Previously, the `BeaconChain` had two separate fields:

```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```

Now, we have grouped these values under a single struct:

```
canonical_head: CanonicalHead {
  cached_head: RwLock<Arc<Snapshot>>,
  fork_choice: RwLock<BeaconForkChoice>
} 
```

Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
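
A rough sketch of the idea, using only the standard library. `Snapshot` and `ForkChoice` are simplified stand-ins, and the `cached_head()` / `set_head()` bodies are assumptions about how such an accessor could look, not the actual implementation.

```rust
use std::sync::{Arc, RwLock};

// Simplified stand-ins for the real snapshot and fork choice types.
struct Snapshot {
    head_slot: u64,
}
struct ForkChoice {
    justified_epoch: u64,
}

struct CanonicalHead {
    cached_head: RwLock<Arc<Snapshot>>,
    fork_choice: RwLock<ForkChoice>,
}

impl CanonicalHead {
    /// Clone the `Arc` and release the read lock when this function
    /// returns, so callers never hold `cached_head` while they work
    /// with the snapshot (or while they take the `fork_choice` lock).
    fn cached_head(&self) -> Arc<Snapshot> {
        let guard = self.cached_head.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Swap in a new snapshot. Readers holding an older `Arc` keep a
    /// consistent view of the previous head.
    fn set_head(&self, snapshot: Snapshot) {
        *self.cached_head.write().unwrap() = Arc::new(snapshot);
    }
}
```

Cloning an `Arc` only bumps a reference count, so the lock is held for a very short time regardless of how large the snapshot is.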

## Breaking Changes

### The `state` (root) field in the `finalized_checkpoint` SSE event

Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might use in the `finalized_checkpoint` SSE event:

1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
2. The state root at `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.

Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).

I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.

## Notes for Reviewers

I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.

I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".

I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values don't need to be wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.

I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.

Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.

You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be async `#[tokio::test]` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.

I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.

Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
Akihito Nakano
082ed35bdc Test the pruning of excess peers using randomly generated input (#3248)
## Issue Addressed

https://github.com/sigp/lighthouse/issues/3092


## Proposed Changes

Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores.


## Additional Info

I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969).

Co-authored-by: Diva M <divma@protonmail.com>
2022-06-25 22:22:34 +00:00
Divma
7af5742081 Deprecate step param in BlocksByRange RPC request (#3275)
## Issue Addressed

Deprecates the step parameter in the blocks by range request

## Proposed Changes

- Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before
- Adds a new type to still handle coding and decoding of requests that use the parameter

## Additional Info
I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward
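
A sketch of the approach described above: one type that still decodes the deprecated field, and a conversion into the new type that drops it. The field names are assumptions for illustration.

```rust
/// Legacy request shape that still carries the deprecated `step` field,
/// kept only so that old-format requests can be encoded and decoded.
#[derive(Debug, Clone, PartialEq)]
struct OldBlocksByRangeRequest {
    start_slot: u64,
    count: u64,
    step: u64,
}

/// New request shape used everywhere outside the wire codec.
#[derive(Debug, Clone, PartialEq)]
struct BlocksByRangeRequest {
    start_slot: u64,
    count: u64,
}

impl From<OldBlocksByRangeRequest> for BlocksByRangeRequest {
    fn from(old: OldBlocksByRangeRequest) -> Self {
        // `step` is accepted on the wire for compatibility and then ignored.
        BlocksByRangeRequest {
            start_slot: old.start_slot,
            count: old.count,
        }
    }
}
```

With this split, removing the legacy type after the deprecation period would only touch the codec, since no other code sees `step`.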
2022-06-22 16:23:34 +00:00
Divma
3dd50bda11 Improve substream management (#3261)
2022-06-10 06:58:50 +00:00
Akihito Nakano
a6d2ed6119 Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236)
## Issue Addressed

This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`.

I noticed an issue while reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer meets both of the following conditions, **the `outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`**:

- [outbound only](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018))
- [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047))

As a result, PeerManager doesn't remove "outbound" peers which should be pruned.

Note: [`subnet_to_peer`](e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)) (HashMap) doesn't guarantee a particular order of iteration, so whether the test fails depends on the order of iteration.
2022-06-06 05:51:10 +00:00
Akihito Nakano
695f415590 Tiny improvement: PeerManager and maximum discovery query (#3182)
## Issue Addressed

As [`Discovery` bounds the maximum discovery query](e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)), `PeerManager` does not need to handle it.

e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)
2022-05-19 06:00:46 +00:00
François Garillot
3f9e83e840 [refactor] Refactor Option/Result combinators (#3180)
Code simplifications using `Option`/`Result` combinators to make pattern-matches a tad simpler. 
Opinions on these loosely held, happy to adjust in review.

Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).
2022-05-16 01:59:47 +00:00
Pawan Dhananjay
db0beb5178 Poll shutdown timeout in rpc handler (#3153)
## Issue Addressed

N/A

## Proposed Changes

Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.
2022-04-13 03:54:44 +00:00
Divma
580d2f7873 log upgrades + prevent dialing of disconnecting peers (#3148)
## Issue Addressed
We still ping peers that are considered in a disconnecting state

## Proposed Changes

Do not ping peers once we decide they are disconnecting
Upgrade logs about ignored rpc messages

## Additional Info
--
2022-04-13 03:54:43 +00:00
Pawan Dhananjay
fff4dd6311 Fix rpc limits version 2 (#3146)
## Issue Addressed

N/A

## Proposed Changes

https://github.com/sigp/lighthouse/pull/3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check.

This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`.

This PR fixes the issue by also accounting for forks earlier than `current_fork` in the rpc limits calculation in the `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and the limits can be easily checked in a review.

Adds unit tests and modifies the syncing simulator to check syncing across fork boundaries.
The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).
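
A sketch of what hardcoded, fork-aware limits could look like. Only the 404-byte phase 0 minimum comes from the description above; the `Fork` enum, the maximum sizes and the `is_within_limits` helper are placeholders for illustration, not the real constants or signatures.

```rust
// Hypothetical fork enum for illustration.
#[derive(Clone, Copy)]
enum Fork {
    Phase0,
    Altair,
    Merge,
}

// The phase 0 minimum is taken from the PR text; the maxima below are
// placeholder values, NOT the real limits.
const PHASE0_MIN: usize = 404;
const PHASE0_MAX: usize = 1_000;
const ALTAIR_MAX: usize = 2_000;
const MERGE_MAX: usize = 3_000;

/// Returns `(min, max)` block sizes accepted over RPC for the current fork.
/// Blocks from earlier forks can still be served, so the lower bound stays
/// at the phase 0 minimum for every fork.
fn rpc_block_limits_by_fork(current_fork: Fork) -> (usize, usize) {
    match current_fork {
        Fork::Phase0 => (PHASE0_MIN, PHASE0_MAX),
        Fork::Altair => (PHASE0_MIN, ALTAIR_MAX),
        Fork::Merge => (PHASE0_MIN, MERGE_MAX),
    }
}

/// Bounds check applied to an incoming block of `len` bytes.
fn is_within_limits(len: usize, current_fork: Fork) -> bool {
    let (min, max) = rpc_block_limits_by_fork(current_fork);
    (min..=max).contains(&len)
}
```

Under these assumptions a 404-byte phase 0 block is accepted even when the wall-clock fork is Altair, which is the case the original bug rejected.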
2022-04-07 23:45:38 +00:00
Pawan Dhananjay
ab434bc075 Fix merge rpc length limits (#3133)
## Issue Addressed

N/A

## Proposed Changes

Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair.
Further make the rpc response limits fork aware.
2022-04-04 00:26:15 +00:00
Michael Sproul
41e7a07c51 Add lighthouse db command (#3129)
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).
2022-04-01 00:58:59 +00:00
Divma
4bf1af4e85 Custom RPC request management for sync (#3029)
## Proposed Changes
Make `lighthouse_network` generic over request ids, now usable by sync
2022-03-02 22:07:17 +00:00
Age Manning
e88b18be09 Update libp2p (#3039)
Update libp2p. 

This corrects some gossipsub metrics.
2022-03-02 05:09:52 +00:00
Age Manning
f3c1dde898 Filter non global ips from discovery (#3023)
## Issue Addressed

#3006 

## Proposed Changes

This PR changes the default behaviour of lighthouse to ignore discovered IPs that are not globally routable. It adds a CLI flag, --enable-local-discovery to permit the non-global IPs in discovery.

NOTE: We should take care in merging this as it will break current set-ups that rely on local IP discovery. I made this the non-default behaviour because we don't really want to be wasting resources attempting to connect to non-routable addresses and we don't want to propagate these to others (on the chance we can connect to one of these local nodes), improving discovery's efficiency.
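
A rough sketch of such a filter using only stable standard-library checks. `is_probably_global` and `accept_discovered_ip` are hypothetical helpers, and the set of ranges treated as non-global is an approximation chosen for this example rather than the exact rule used by discovery.

```rust
use std::net::IpAddr;

/// Approximate "globally routable" check built from stable std methods.
/// This helper is hypothetical and does not cover every reserved range.
fn is_probably_global(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            !(v4.is_private()
                || v4.is_loopback()
                || v4.is_link_local()
                || v4.is_unspecified()
                || v4.is_broadcast()
                || v4.is_documentation())
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            // fc00::/7 is unique-local, fe80::/10 is link-local.
            let unique_local = (first & 0xfe00) == 0xfc00;
            let link_local = (first & 0xffc0) == 0xfe80;
            !(v6.is_loopback() || v6.is_unspecified() || unique_local || link_local)
        }
    }
}

/// Mirrors the behaviour described above: non-global addresses are dropped
/// unless local discovery was explicitly enabled.
fn accept_discovered_ip(ip: &IpAddr, enable_local_discovery: bool) -> bool {
    enable_local_discovery || is_probably_global(ip)
}
```

Here `enable_local_discovery` stands in for the value of the `--enable-local-discovery` flag.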
2022-03-02 03:14:27 +00:00
Age Manning
a1b730c043 Cleanup small issues (#3027)
Downgrades some excessive networking logs and corrects some metrics.
2022-03-01 01:49:22 +00:00
Michael Sproul
5e1f8a8480 Update to Rust 1.59 and 2021 edition (#3038)
## Proposed Changes

Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.

## Additional Info

We need this PR to unblock CI.
2022-02-25 00:10:17 +00:00
Age Manning
3ebb8b0244 Improved peer management (#2993)
## Issue Addressed

I noticed some excess and unnecessary discovery queries in some logs. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots.

This PR corrects this issue and advances the pruning mechanism to favour subnet peers. 

An overview of the new logic added:
- We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count.
- We only search for more peers if there is room to do so and we are below the minimum outbound peer count, not the target. This gives us some buffer for peers to disconnect. The buffer is currently 10%.

The modified pruning logic is documented in the code but for reference it should do the following:
- Prune peers with bad scores first
- If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet
- If we still need to prune peers, then prune peers that we have a higher density of on any given subnet, which should drive towards uniform peers across all subnets.

This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.
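
The relationship between the two outbound thresholds can be sketched as follows. `PeerLimits` and its methods are hypothetical, and applying the 10% buffer as a rounded-up fraction of the minimum is an assumption made for this example.

```rust
/// Hypothetical peer-count configuration for illustration.
struct PeerLimits {
    /// Hard cap on connected peers.
    max_peers: usize,
    /// Minimum number of outbound peers we want to keep.
    min_outbound: usize,
}

impl PeerLimits {
    /// Outbound count we prune down to: the minimum plus a 10% buffer
    /// (rounded up), so a few disconnects do not trigger discovery.
    fn target_outbound(&self) -> usize {
        self.min_outbound + (self.min_outbound + 9) / 10
    }

    /// Only search for more peers when there is a free slot AND we have
    /// dropped below the minimum (not merely below the target).
    fn should_discover(&self, connected: usize, outbound: usize) -> bool {
        connected < self.max_peers && outbound < self.min_outbound
    }
}
```

With `max_peers = 50` and `min_outbound = 20`, the target is 22, so up to two outbound peers can disconnect after pruning before a discovery query is issued, and no query is issued at all while the node is at its peer limit.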
2022-02-18 02:36:43 +00:00
Paul Hauner
0a6a8ea3b0 Engine API v1.0.0.alpha.6 + interop tests (#3024)
## Issue Addressed

NA

## Proposed Changes

This PR extends #3018 to address my review comments there and add automated integration tests with Geth (and other implementations, in the future).

I've also de-duplicated the "unused port" logic by creating an  `common/unused_port` crate.

## Additional Info

I'm not sure if we want to merge this PR, or update #3018 and merge that. I don't mind, I'm primarily opening this PR to make sure CI works.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-02-17 21:47:06 +00:00
Divma
1306b2db96 libp2p upgrade + gossipsub interval fix (#3012)
## Issue Addressed
Lighthouse gossiping late messages

## Proposed Changes
Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected
Upgrade libp2p 

## Additional Info
https://github.com/libp2p/rust-libp2p/issues/2497
2022-02-10 04:12:03 +00:00
Divma
36fc887a40 Gossip cache timeout adjustments (#2997)
## Proposed Changes

- Do not retry to publish sync committee messages.
- Give a more lenient timeout to slashings and exits.
2022-02-07 23:25:06 +00:00
Age Manning
675c7b7e26 Correct a dial race condition (#2992)
## Issue Addressed

On a network with few nodes, it is possible that the same node can be found from a subnet discovery and a normal peer discovery at the same time.

The network behaviour loads these peers into events and processes them when it has the chance. It can happen that the same peer can enter the event queue more than once and then attempt to be dialed twice. 

This PR shifts the registration of nodes in the peerdb as being dialed before they enter the NetworkBehaviour queue, preventing multiple attempts of the same peer being entered into the queue and avoiding the race condition.
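
A minimal sketch of the ordering change, with a `HashSet` standing in for the peerdb's "being dialed" state. `Dialer`, `queue_dial` and the `u64` peer id are hypothetical simplifications.

```rust
use std::collections::{HashSet, VecDeque};

/// Stand-in for a real peer identifier.
type PeerId = u64;

/// Hypothetical dial queue: `dialing` plays the role of the peerdb state,
/// `queue` the role of the network behaviour's event queue.
struct Dialer {
    dialing: HashSet<PeerId>,
    queue: VecDeque<PeerId>,
}

impl Dialer {
    fn new() -> Self {
        Dialer {
            dialing: HashSet::new(),
            queue: VecDeque::new(),
        }
    }

    /// Register the peer as being dialed BEFORE it enters the queue.
    /// Returns `false` if the peer was already registered, in which case
    /// nothing is queued and the duplicate dial is avoided.
    fn queue_dial(&mut self, peer: PeerId) -> bool {
        // `HashSet::insert` returns false when the value was already present.
        if !self.dialing.insert(peer) {
            return false;
        }
        self.queue.push_back(peer);
        true
    }
}
```

If a subnet discovery and a normal peer discovery both return the same peer, the second `queue_dial` call is a no-op instead of a second dial attempt.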
2022-02-07 23:25:05 +00:00