Commit Graph

62 Commits

Author SHA1 Message Date
Diva M
3c1a22ceaf
Merge commit '1e029ce5384e911390a513e2d1885532f34a8b2b' into eip4844 2023-04-04 11:56:54 -05:00
Age Manning
311e69db65 Ban peer race condition (#4140)
It is possible that when we go to ban a peer, there is already an unbanned message in the queue. It could lead to the case that we ban and immediately unban a peer leaving us in a state where a should-be banned peer is unbanned. 

If this banned peer connects to us in this faulty state, we currently do not attempt to re-ban it. This PR does correct this also, so if we do see this error, it will now self-correct (although we shouldn't see the error in the first place). 

I have also incremented the severity of not supporting protocols as I see peers ultimately get banned in a few steps and it seems to make sense to just ban them outright, rather than have them linger.
2023-04-03 03:02:57 +00:00
Diva M
607242c127
Merge branch 'unstable' into eip4844 2023-03-17 16:26:51 -05:00
Age Manning
3d99ce25f8 Correct a race condition when dialing peers (#4056)
There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.

The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue. 
This ends up displaying an error to the user: 
```
ERRO Dialing an already dialing peer
```
Although this error is harmless it's not ideal. 

There are two solutions to resolving this:
1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue. 
2. We prevent duplicates in the dial queue

This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.
2023-03-16 05:44:54 +00:00
Diva M
d93753cc88
Merge branch 'unstable' into off-4844 2023-03-02 15:38:00 -05:00
Divma
047c7544e3 Clean capella (#4019)
## Issue Addressed

Cleans up all the remnants of 4844 in capella. This makes sure when 4844 is reviewed there is nothing we are missing because it got included here 

## Proposed Changes

drop a bomb on every 4844 thing 

## Additional Info

Merge process I did (locally) is as follows:
- squash merge to produce one commit
- in new branch off unstable with the squashed commit create a `git revert HEAD` commit
- merge that new branch onto 4844 with `--strategy ours`
- compare local 4844 to remote 4844 and make sure the diff is empty
- enjoy

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2023-03-01 03:19:02 +00:00
Age Manning
5c63d8758e Register disconnected peers when temporarily banned (#4001)
This is a correction to #3757. 

The correction registers a peer that is being disconnected in the local peer manager db to ensure we are tracking the correct state.
2023-02-21 23:45:44 +00:00
Michael Sproul
066c27750a
Merge remote-tracking branch 'origin/staging' into capella-update 2023-02-17 12:05:36 +11:00
Age Manning
8dd9249177 Enforce a timeout on peer disconnect (#3757)
On heavily crowded networks, we are seeing many attempted connections to our node every second. 

Often these connections come from peers that have just been disconnected. This can be for a number of reasons including: 
- We have deemed them to be not as useful as other peers
- They have performed poorly
- They have dropped the connection with us
- The connection was spontaneously lost
- They were randomly removed because we have too many peers

In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place. 

This PR adds a timeout, that prevents recently disconnected peers from reconnecting to us.

Technically we implement a ban at the swarm layer to prevent immediate re connections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers.

This cache is bounded in time. An extra space bound could be added should people consider this a risk.

Co-authored-by: Diva M <divma@protonmail.com>
2023-02-14 03:25:42 +00:00
realbigsean
b658cc7aaf simplify checking attester cache for block and blobs. use ResourceUnavailable according to the spec 2023-01-24 10:50:47 +01:00
realbigsean
06f71e8cce
merge capella 2023-01-12 12:51:09 -05:00
Michael Sproul
2af8110529
Merge remote-tracking branch 'origin/unstable' into capella
Fixing the conflicts involved patching up some of the `block_hash` verification,
the rest will be done as part of https://github.com/sigp/lighthouse/issues/3870
2023-01-12 16:22:00 +11:00
realbigsean
438126f19a
merge upstream, fix compile errors 2023-01-11 13:52:58 -05:00
Age Manning
1d9a2022b4 Upgrade to libp2p v0.50.0 (#3764)
I've needed to do this work in order to do some episub testing. 

This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.

Co-authored-by: Diva M <divma@protonmail.com>
2023-01-06 15:59:33 +00:00
Mark Mackey
c188cde034
merge upstream/unstable 2022-12-28 14:43:25 -06:00
realbigsean
5de4f5b8d0
handle parent blob request edge cases correctly. fix data availability boundary check 2022-12-19 11:39:09 -05:00
Divma
ffbf70e2d9 Clippy lints for rust 1.66 (#3810)
## Issue Addressed
Fixes the new clippy lints for rust 1.66

## Proposed Changes

Most of the changes come from:
- [unnecessary_cast](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_cast)
- [iter_kv_map](https://rust-lang.github.io/rust-clippy/master/index.html#iter_kv_map)
- [needless_borrow](https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow)

## Additional Info

na
2022-12-16 04:04:00 +00:00
realbigsean
8102a01085
merge with upstream 2022-12-01 11:13:07 -05:00
Mark Mackey
8a04c3428e Merged with unstable 2022-11-30 17:29:10 -06:00
antondlr
e9bf7f7cc1 remove commas from comma-separated kv pairs (#3737)
## Issue Addressed

Logs are in comma separated kv list, but the values sometimes contain commas, which breaks parsing
2022-11-25 07:57:10 +00:00
Giulio rebuffo
d5a2de759b Added LightClientBootstrap V1 (#3711)
## Issue Addressed

Partially addresses #3651

## Proposed Changes

Adds server-side support for light_client_bootstrap_v1 topic

## Additional Info

This PR, creates each time a bootstrap without using cache, I do not know how necessary a cache is in this case as this topic is not supposed to be called frequently and IMHO we can just prevent abuse by using the limiter, but let me know what you think or if there is any caveat to this, or if it is necessary only for the sake of good practice.


Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2022-11-25 05:19:00 +00:00
realbigsean
7162e5e23b
add a bunch of blob coupling boiler plate, add a blobs by root request 2022-11-15 16:43:56 -05:00
realbigsean
8656d23327
merge with unstable 2022-11-01 13:18:00 -04:00
Divma
46fbf5b98b Update discv5 (#3171)
## Issue Addressed

Updates discv5

Pending on
- [x] #3547 
- [x] Alex upgrades his deps

## Proposed Changes

updates discv5 and the enr crate. The only relevant change would be some clear indications of ipv4 usage in lighthouse

## Additional Info

Functionally, this should be equivalent to the prev version.
As draft pending a discv5 release
2022-10-28 05:40:06 +00:00
realbigsean
e81dbbfea4
compile 2022-10-03 21:48:02 -04:00
realbigsean
88006735c4
compile 2022-10-03 10:06:04 -04:00
realbigsean
4008da6c60
sync tx blobs 2022-09-29 12:32:55 -04:00
Age Manning
27bb9ff07d Handle Lodestar's new agent string (#3620)
## Issue Addressed

#3561 

## Proposed Changes

Recognize Lodestars new agent string and appropriately count these peers as lodestar peers.
2022-09-29 01:50:13 +00:00
Divma
b1d2510d1b Libp2p v0.48.0 upgrade (#3547)
## Issue Addressed

Upgrades libp2p to v.0.47.0. This is the compilation of
- [x] #3495 
- [x] #3497 
- [x] #3491 
- [x] #3546 
- [x] #3553 

Co-authored-by: Age Manning <Age@AgeManning.com>
2022-09-29 01:50:11 +00:00
Marius van der Wijden
36a0add0cd network stuff 2022-09-17 15:23:28 +02:00
Divma
473abc14ca Subscribe to subnets only when needed (#3419)
## Issue Addressed

We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately. 

## Proposed Changes

- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.

## Additional Info

This is also an effort in making the code easier to understand
2022-09-05 00:22:48 +00:00
Pawan Dhananjay
f3439116da Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365)
## Issue Addressed

Resolves #3351 

## Proposed Changes

Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced.


## Additional Info

This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned.
cc @divagant-martian please let me know if i'm missing something here.
2022-07-27 03:20:00 +00:00
Akihito Nakano
98a9626ef5 Bump the MSRV to 1.62 and using #[derive(Default)] on enums (#3304)
## Issue Addressed

N/A

## Proposed Changes

Since Rust 1.62, we can use `#[derive(Default)]` on enums.  

https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants

There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.
2022-07-15 07:31:19 +00:00
Akihito Nakano
082ed35bdc Test the pruning of excess peers using randomly generated input (#3248)
## Issue Addressed

https://github.com/sigp/lighthouse/issues/3092


## Proposed Changes

Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores.


## Additional Info

I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969).

Co-authored-by: Diva M <divma@protonmail.com>
2022-06-25 22:22:34 +00:00
Divma
3dd50bda11 Improve substream management (#3261)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Please list or describe the changes introduced by this PR.

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-06-10 06:58:50 +00:00
Akihito Nakano
a6d2ed6119 Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236)
## Issue Addressed

This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`.

I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, **`outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`**.

- [outbound only](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018))
- [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047))

As a result, PeerManager doesn't remove "outbound" peers which should be pruned.

Note: [`subnet_to_peer`](e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.
2022-06-06 05:51:10 +00:00
Akihito Nakano
695f415590 Tiny improvement: PeerManager and maximum discovery query (#3182)
## Issue Addressed

As [`Discovery` bounds the maximum discovery query](e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)), `PeerManager` no need to handle it.

e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)
2022-05-19 06:00:46 +00:00
Divma
580d2f7873 log upgrades + prevent dialing of disconnecting peers (#3148)
## Issue Addressed
We still ping peers that are considered in a disconnecting state

## Proposed Changes

Do not ping peers once we decide they are disconnecting
Upgrade logs about ignored rpc messages

## Additional Info
--
2022-04-13 03:54:43 +00:00
Pawan Dhananjay
ab434bc075 Fix merge rpc length limits (#3133)
## Issue Addressed

N/A

## Proposed Changes

Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair.
Further make the rpc response limits fork aware.
2022-04-04 00:26:15 +00:00
Michael Sproul
41e7a07c51 Add lighthouse db command (#3129)
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).
2022-04-01 00:58:59 +00:00
Divma
4bf1af4e85 Custom RPC request management for sync (#3029)
## Proposed Changes
Make `lighthouse_network` generic over request ids, now usable by sync
2022-03-02 22:07:17 +00:00
Age Manning
e88b18be09 Update libp2p (#3039)
Update libp2p. 

This corrects some gossipsub metrics.
2022-03-02 05:09:52 +00:00
Age Manning
a1b730c043 Cleanup small issues (#3027)
Downgrades some excessive networking logs and corrects some metrics.
2022-03-01 01:49:22 +00:00
Age Manning
3ebb8b0244 Improved peer management (#2993)
## Issue Addressed

I noticed in some logs some excess and unecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots. 

This PR corrects this issue and advances the pruning mechanism to favour subnet peers. 

An overview the new logic added is:
- We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count.
- We only search for more peers if there is room to do so, and we are below the minimum outbound peer count not the target. So this gives us some buffer for peers to disconnect. The buffer is currently 10%

The modified pruning logic is documented in the code but for reference it should do the following:
- Prune peers with bad scores first
- If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet
- If we still need to prune peers, the prune peers that we have a higher density of on any given subnet which should drive for uniform peers across all subnets.

This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.
2022-02-18 02:36:43 +00:00
Divma
1306b2db96 libp2p upgrade + gossipsub interval fix (#3012)
## Issue Addressed
Lighthouse gossiping late messages

## Proposed Changes
Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected
Upgrade libp2p 

## Additional Info
https://github.com/libp2p/rust-libp2p/issues/2497
2022-02-10 04:12:03 +00:00
Divma
48b7c8685b upgrade libp2p (#2933)
## Issue Addressed

Upgrades libp2p to v.0.42.0 pre release (https://github.com/libp2p/rust-libp2p/pull/2440)
2022-02-07 23:25:03 +00:00
Mac L
286996b090 Fix small typo in error log (#2975)
## Proposed Changes

Fixes a small typo I came across.
2022-01-31 22:55:07 +00:00
Age Manning
fc7a1a7dc7 Allow disconnected states to introduce new peers without warning (#2922)
## Issue Addressed

We emit a warning to verify that all peer connection state information is consistent. A warning is given under one edge case;
We try to dial a peer with peer-id X and multiaddr Y. The peer responds to multiaddr Y with a different peer-id, Z. The dialing to the peer fails, but libp2p injects the failed attempt as peer-id Z. 

In this instance, our PeerDB tries to add a new peer in the disconnected state under a previously unknown peer-id. This is harmless and so this PR permits this behaviour without logging a warning.
2022-01-20 09:14:25 +00:00
Age Manning
1c667ad3ca PeerDB Status unknown bug fix (#2907)
## Issue Addressed

The PeerDB was getting out of sync with the number of disconnected peers compared to the actual count. As this value determines how many we store in our cache, over time the cache was depleting and we were removing peers immediately resulting in errors that manifest as unknown peers for some operations.

The error occurs when dialing a peer fails, we were not correctly updating the peerdb counter because the increment to the counter was placed in the wrong order and was therefore not incrementing the count. 

This PR corrects this.
2022-01-14 05:42:48 +00:00
Age Manning
6f4102aab6 Network performance tuning (#2608)
There is a pretty significant tradeoff between bandwidth and speed of gossipsub messages. 

We can reduce our bandwidth usage considerably at the cost of minimally delaying gossipsub messages. The impact of delaying messages has not been analyzed thoroughly yet, however this PR in conjunction with some gossipsub updates show considerable bandwidth reduction. 

This PR allows the user to set a CLI value (`network-load`) which is an integer in the range of 1 of 5 depending on their bandwidth appetite. 1 represents the least bandwidth but slowest message recieving and 5 represents the most bandwidth and fastest received message time. 

For low-bandwidth users it is likely to be more efficient to use a lower value. The default is set to 3, which currently represents a reduced bandwidth usage compared to previous version of this PR. The previous lighthouse versions are equivalent to setting the `network-load` CLI to 4.

This PR is awaiting a few gossipsub updates before we can get it into lighthouse.
2022-01-14 05:42:47 +00:00