* update libp2p and address compiler errors
* remove bandwidth logging from transport
* use libp2p registry
* make clippy happy
* use rust 1.73
* correct rpc keep alive
* remove comments and obsolte code
* remove libp2p prefix
* make clippy happy
* use quic under facade
* remove fast msg id
* bubble up close statements
* fix wrong comment
## Issue Addressed
https://github.com/sigp/lighthouse/issues/4543
## Proposed Changes
- Removes `NotBanned` from `BanResult`, implements `Display` and `std::error::Error` for `BanResult` and changes `ban_result` return type to `Option<BanResult>` which helps returning `BanResult` on `handle_established_inbound_connection`
- moves the check from for banned peers from `on_connection_established` to `handle_established_inbound_connection` to start addressing #4543.
- Removes `allow_block_list` as it's now redundant? Not sure about this one but if `PeerManager` keeps track of the banned peers, no need to send a `Swarm` event for `alow_block_list` to also keep that list right?
## Questions
- #4543 refers:
> More specifically, implement the connection limit behaviour inside the peer manager.
@AgeManning do you mean copying `libp2p::connection_limits::Behaviour`'s code into `PeerManager`/ having it as an inner `NetworkBehaviour` of `PeerManager`/other? If it's the first two, I think it probably makes more sense to have it as it is as it's less code to maintain.
> Also implement the banning of peers inside the behaviour, rather than passing messages back up to the swarm.
I tried to achieve this, but we still need to pass the `PeerManagerEvent::Banned` swarm event as `DiscV5` handles it's node and ip management internally and I did not find a method to query if a peer is banned. Is there anything else we can do from here?
3397612160/beacon_node/lighthouse_network/src/discovery/mod.rs (L931-L940)
Same as the question above, I did not find a way to check if `DiscV5` has the peer banned, so that we could check here and avoid sending `Swarm` events
3397612160/beacon_node/lighthouse_network/src/peer_manager/network_behaviour.rs (L168-L178)
Is there a chance we try to dial a peer that has been banned previously?
Thanks!
## Issue Addressed
Synchronize dependencies and edition on the workspace `Cargo.toml`
## Proposed Changes
with https://github.com/rust-lang/cargo/issues/8415 merged it's now possible to synchronize details on the workspace `Cargo.toml` like the metadata and dependencies.
By only having dependencies that are shared between multiple crates aligned on the workspace `Cargo.toml` it's easier to not miss duplicate versions of the same dependency and therefore ease on the compile times.
## Additional Info
this PR also removes the no longer required direct dependency of the `serde_derive` crate.
should be reviewed after https://github.com/sigp/lighthouse/pull/4639 get's merged.
closes https://github.com/sigp/lighthouse/issues/4651
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
#4402
## Proposed Changes
This PR adds QUIC support to Lighthouse. As this is not officially spec'd this will only work between lighthouse <-> lighthouse connections. We attempt a QUIC connection (if the node advertises it) and if it fails we fallback to TCP.
This should be a backwards compatible modification. We want to test this functionality on live networks to observe any improvements in bandwidth/latency.
NOTE: This also removes the websockets transport as I believe no one is really using it. It should be mentioned in our release however.
Co-authored-by: João Oliveira <hello@jxs.pt>
## Issue Addressed
#4538
## Proposed Changes
add newtype wrapper around DialError that extracts error messages and logs them in a more readable format
## Additional Info
I was able to test Transport Dial Errors in the situation where a libp2p instance attempts to ping a nonexistent peer. That error message should look something like
`A transport level error has ocurred: Connection refused (os error 61)`
AgeManning mentioned we should try fetching only the most inner error (in situations where theres a nested error). I took a stab at implementing that
For non transport DialErrors, I wrote out the error messages explicitly (as per the docs). Could potentially clean things up here if thats not necessary
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
Upgrade libp2p to v0.52
## Proposed Changes
- **Workflows**: remove installation of `protoc`
- **Book**: remove installation of `protoc`
- **`Dockerfile`s and `cross`**: remove custom base `Dockerfile` for cross since it's no longer needed. Remove `protoc` from remaining `Dockerfiles`s
- **Upgrade `discv5` to `v0.3.1`:** we have some cool stuff in there: no longer needs `protoc` and faster ip updates on cold start
- **Upgrade `prometheus` to `0.21.0`**, now it no longer needs encoding checks
- **things that look like refactors:** bunch of api types were renamed and need to be accessed in a different (clearer) way
- **Lighthouse network**
- connection limits is now a behaviour
- banned peers no longer exist on the swarm level, but at the behaviour level
- `connection_event_buffer_size` now is handled per connection with a buffer size of 4
- `mplex` is deprecated and was removed
- rpc handler now logs the peer to which it belongs
## Additional Info
Tried to keep as much behaviour unchanged as possible. However, there is a great deal of improvements we can do _after_ this upgrade:
- Smart connection limits: Connection limits have been checked only based on numbers, we can now use information about the incoming peer to decide if we want it
- More powerful peer management: Dial attempts from other behaviours can be rejected early
- Incoming connections can be rejected early
- Banning can be returned exclusively to the peer management: We should not get connections to banned peers anymore making use of this
- TCP Nat updates: We might be able to take advantage of confirmed external addresses to check out tcp ports/ips
Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Akihito Nakano <sora.akatsuki@gmail.com>
## Issue Addressed
#4150
## Proposed Changes
Maintain trusted peers in the pruning logic. ~~In principle the changes here are not necessary as a trusted peer has a max score (100) and all other peers can have at most 0 (because we don't implement positive scores). This means that we should never prune trusted peers unless we have more trusted peers than the target peer count.~~
This change shifts this logic to explicitly never prune trusted peers which I expect is the intuitive behaviour.
~~I suspect the issue in #4150 arises when a trusted peer disconnects from us for one reason or another and then we remove that peer from our peerdb as it becomes stale. When it re-connects at some large time later, it is no longer a trusted peer.~~
Currently we do disconnect trusted peers, and this PR corrects this to maintain trusted peers in the pruning logic.
As suggested in #4150 we maintain trusted peers in the db and thus we remember them even if they disconnect from us.
## Issue Addressed
N/A
## Proposed Changes
Adds a flag for disabling peer scoring. This is useful for local testing and testing small networks for new features.
It is possible that when we go to ban a peer, there is already an unbanned message in the queue. It could lead to the case that we ban and immediately unban a peer leaving us in a state where a should-be banned peer is unbanned.
If this banned peer connects to us in this faulty state, we currently do not attempt to re-ban it. This PR does correct this also, so if we do see this error, it will now self-correct (although we shouldn't see the error in the first place).
I have also incremented the severity of not supporting protocols as I see peers ultimately get banned in a few steps and it seems to make sense to just ban them outright, rather than have them linger.
There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.
The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue.
This ends up displaying an error to the user:
```
ERRO Dialing an already dialing peer
```
Although this error is harmless it's not ideal.
There are two solutions to resolving this:
1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue.
2. We prevent duplicates in the dial queue
This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.
## Issue Addressed
Cleans up all the remnants of 4844 in capella. This makes sure when 4844 is reviewed there is nothing we are missing because it got included here
## Proposed Changes
drop a bomb on every 4844 thing
## Additional Info
Merge process I did (locally) is as follows:
- squash merge to produce one commit
- in new branch off unstable with the squashed commit create a `git revert HEAD` commit
- merge that new branch onto 4844 with `--strategy ours`
- compare local 4844 to remote 4844 and make sure the diff is empty
- enjoy
Co-authored-by: Paul Hauner <paul@paulhauner.com>
This is a correction to #3757.
The correction registers a peer that is being disconnected in the local peer manager db to ensure we are tracking the correct state.
On heavily crowded networks, we are seeing many attempted connections to our node every second.
Often these connections come from peers that have just been disconnected. This can be for a number of reasons including:
- We have deemed them to be not as useful as other peers
- They have performed poorly
- They have dropped the connection with us
- The connection was spontaneously lost
- They were randomly removed because we have too many peers
In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place.
This PR adds a timeout, that prevents recently disconnected peers from reconnecting to us.
Technically we implement a ban at the swarm layer to prevent immediate re connections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers.
This cache is bounded in time. An extra space bound could be added should people consider this a risk.
Co-authored-by: Diva M <divma@protonmail.com>
I've needed to do this work in order to do some episub testing.
This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.
Co-authored-by: Diva M <divma@protonmail.com>
## Issue Addressed
Partially addresses #3651
## Proposed Changes
Adds server-side support for light_client_bootstrap_v1 topic
## Additional Info
This PR, creates each time a bootstrap without using cache, I do not know how necessary a cache is in this case as this topic is not supposed to be called frequently and IMHO we can just prevent abuse by using the limiter, but let me know what you think or if there is any caveat to this, or if it is necessary only for the sake of good practice.
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
## Issue Addressed
Updates discv5
Pending on
- [x] #3547
- [x] Alex upgrades his deps
## Proposed Changes
updates discv5 and the enr crate. The only relevant change would be some clear indications of ipv4 usage in lighthouse
## Additional Info
Functionally, this should be equivalent to the prev version.
As draft pending a discv5 release