Commit Graph

8 Commits

Author SHA1 Message Date
Age Manning
8a1b77bf89 Ultra Fast Super Slick CI (#4755)
Attempting to improve our CI speeds as its recently been a pain point.

Major changes:

 - Use a github action to pull stable/nightly rust rather than building it each run
 - Shift test suite to `nexttest` https://github.com/nextest-rs/nextest for CI
 
 UPDATE:

So I've iterated on some changes, and although I think its still not optimal I think this is a good base to start from. Some extra things in this PR:
- Shifted where we pull rust from. We're now using this thing: https://github.com/moonrepo/setup-rust . It's got some interesting cache's built in, but was not seeing the gains that Jimmy managed to get. In either case tho, it can pull rust, cargofmt, clippy, cargo nexttest all in < 5s. So I think it's worthwhile. 
- I've grouped a few of the check-like tests into a single test called `code-test`. Although we were using github runners in parallel which may be faster, it just seems wasteful. There were like 4-5 tests, where we would pull lighthouse, compile it, then run an action, like clippy, cargo-audit or fmt. I've grouped these into a single action, so we only compile lighthouse once, then in each step we run the checks. This avoids compiling lighthouse like 5 times.
- Ive made doppelganger tests run on our local machines to avoid pulling foundry, building and making lcli which are all now baked into the images. 
- We have sccache and do not incremental compile lighthouse

Misc bonus things:
- Cargo update
- Fix web3 signer openssl keys which is required after a cargo update
- Use mock_instant in an LRU cache test to avoid non-deterministic test
- Remove race condition in building web3signer tests

There's still some things we could improve on. Such as downloading the EF tests every run and the web3-signer binary, but I've left these to be out of scope of this PR. I think the above are meaningful improvements.



Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: antondlr <anton@delaruelle.net>
2023-10-03 06:33:15 +00:00
João Oliveira
dcd69dfc62 Move dependencies to workspace (#4650)
## Issue Addressed

Synchronize dependencies and edition on the workspace `Cargo.toml`

## Proposed Changes

with https://github.com/rust-lang/cargo/issues/8415 merged it's now possible to synchronize details on the workspace `Cargo.toml` like the metadata and dependencies.
By only having dependencies that are shared between multiple crates aligned on the workspace `Cargo.toml` it's easier to not miss duplicate versions of the same dependency and therefore ease on the compile times.

## Additional Info
this PR also removes the no longer required direct dependency of the `serde_derive` crate.

should be reviewed after https://github.com/sigp/lighthouse/pull/4639 get's merged.
closes https://github.com/sigp/lighthouse/issues/4651


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-09-22 04:30:56 +00:00
Paul Hauner
d07c78bccf Appease clippy in Rust 1.70 (#4365)
## Issue Addressed

NA

## Proposed Changes

Fixes some new clippy lints raised after updating to Rust 1.70.

## Additional Info

NA
2023-06-02 03:17:40 +00:00
Age Manning
3d99ce25f8 Correct a race condition when dialing peers (#4056)
There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.

The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue. 
This ends up displaying an error to the user: 
```
ERRO Dialing an already dialing peer
```
Although this error is harmless it's not ideal. 

There are two solutions to resolving this:
1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue. 
2. We prevent duplicates in the dial queue

This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.
2023-03-16 05:44:54 +00:00
Age Manning
8dd9249177 Enforce a timeout on peer disconnect (#3757)
On heavily crowded networks, we are seeing many attempted connections to our node every second. 

Often these connections come from peers that have just been disconnected. This can be for a number of reasons including: 
- We have deemed them to be not as useful as other peers
- They have performed poorly
- They have dropped the connection with us
- The connection was spontaneously lost
- They were randomly removed because we have too many peers

In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place. 

This PR adds a timeout, that prevents recently disconnected peers from reconnecting to us.

Technically we implement a ban at the swarm layer to prevent immediate re connections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers.

This cache is bounded in time. An extra space bound could be added should people consider this a risk.

Co-authored-by: Diva M <divma@protonmail.com>
2023-02-14 03:25:42 +00:00
Divma
7366266bd1 keep failed finalized chains to avoid retries (#3142)
## Issue Addressed

In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway 

Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2022-04-13 01:10:55 +00:00
Michael Sproul
5e1f8a8480 Update to Rust 1.59 and 2021 edition (#3038)
## Proposed Changes

Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.

## Additional Info

We need this PR to unblock CI.
2022-02-25 00:10:17 +00:00
Age Manning
3bb30754d9 Keep track of failed head chains and prevent re-lookups (#1534)
## Overview

There are forked chains which get referenced by blocks and attestations on a network. Typically if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, many peers are on many chains, the chains can be very deep and become time consuming performing lookups. 

This PR adds a cache to known failed chain lookups. This prevents us from starting a parent-lookup (or stopping one half way through) if we have attempted the chain lookup in the past.
2020-08-18 03:54:09 +00:00