## Description
There can be many head chains queued up to complete. Currently we try to process all of these to completion before we consider the node synced.
In a chaotic network there can be many of these, and processing them all to completion can be very expensive and slow. This PR removes any non-syncing head chains from the queue and re-statuses the peers. If, after we have synced to head on one chain, there is still a valid head chain to download, it will be re-established once the status has been returned.
This should help nodes sync on Medalla faster.
## Overview
There are forked chains which get referenced by blocks and attestations on a network. Typically, if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, when many peers are on many different chains and those chains are very deep, performing these lookups becomes very time consuming.
This PR adds a cache of known failed chain lookups. This prevents us from starting a parent lookup (or lets us abandon one half way through) if we have already attempted that chain lookup in the past.
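As a rough illustration of the idea (the types and names here are hypothetical, not the actual Lighthouse code), the cache is essentially a small bounded set of block roots whose chains have already failed, consulted before starting a new parent lookup:

```rust
use std::collections::{HashSet, VecDeque};

/// Illustrative stand-in for a block root; Lighthouse uses a 32-byte hash type.
type BlockRoot = [u8; 32];

/// A small, bounded cache of block roots whose chain lookups have failed.
struct FailedChainCache {
    set: HashSet<BlockRoot>,
    order: VecDeque<BlockRoot>,
    capacity: usize,
}

impl FailedChainCache {
    fn new(capacity: usize) -> Self {
        Self { set: HashSet::new(), order: VecDeque::new(), capacity }
    }

    /// Record a failed chain lookup, evicting the oldest entry when full.
    fn insert(&mut self, root: BlockRoot) {
        if self.set.insert(root) {
            self.order.push_back(root);
            if self.order.len() > self.capacity {
                if let Some(old) = self.order.pop_front() {
                    self.set.remove(&old);
                }
            }
        }
    }

    /// Checked before starting (or continuing) a parent lookup for this root.
    fn contains(&self, root: &BlockRoot) -> bool {
        self.set.contains(root)
    }
}

fn main() {
    let mut cache = FailedChainCache::new(64);
    let bad_root = [0u8; 32];
    cache.insert(bad_root);
    // Skip chains we already know have failed.
    assert!(cache.contains(&bad_root));
}
```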
## Description
Currently Lighthouse load-balances a single finalized chain across peers, selecting the chain with the most peers. Once synced to the latest finalized epoch, Lighthouse builds head chains from its peers (grouped by their current head block) and syncs them all in parallel.
This is typically fast and relatively efficient under normal operation. However, if the chain has not finalized in a long time, the head chains can grow quite long, and peers' head chains update every slot as new blocks are added. Syncing all head chains in parallel then becomes a bottleneck: the heavy block duplication leads to RPC timeouts when attempting to handle all new head chains at once.
This PR limits the parallelism of head chain syncing to 2: we now sync at most two head chains at a time. This allows sync to keep progressing even when a slow peer holds up one chain via RPC timeouts.
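A minimal sketch of the scheduling idea (the constant, types and names are illustrative, not the real sync code): order head chains by peer count and mark at most two as actively syncing.

```rust
/// Illustrative cap on how many head chains sync in parallel.
const PARALLEL_HEAD_CHAINS: usize = 2;

struct HeadChain {
    id: u64,
    peer_count: usize,
    syncing: bool,
}

/// Mark at most `PARALLEL_HEAD_CHAINS` chains as syncing, preferring the
/// chains with the most peers; the rest stay queued.
fn schedule_head_chains(chains: &mut Vec<HeadChain>) {
    chains.sort_by(|a, b| b.peer_count.cmp(&a.peer_count));
    for (i, chain) in chains.iter_mut().enumerate() {
        chain.syncing = i < PARALLEL_HEAD_CHAINS;
    }
}

fn main() {
    let mut chains = vec![
        HeadChain { id: 1, peer_count: 5, syncing: false },
        HeadChain { id: 2, peer_count: 9, syncing: false },
        HeadChain { id: 3, peer_count: 2, syncing: false },
    ];
    schedule_head_chains(&mut chains);
    for chain in &chains {
        println!("chain {} syncing: {}", chain.id, chain.syncing);
    }
    assert_eq!(chains.iter().filter(|c| c.syncing).count(), 2);
}
```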
The changes are somewhat simple but should solve two issues:
- When quickly switching from one chain to another and then back again, batch IDs would collide and cause havoc.
- If we got an out-of-range response from a peer, sync would remain in the syncing state without advancing.
Changes:
- remove the batch id. Identify each batch (inside a chain) by its starting epoch; target epochs for downloading and processing now advance by `EPOCHS_PER_BATCH` (see the sketch after this list)
- for the same reason, move the "to_be_downloaded_id" to be an epoch
- remove a sneaky line that dropped an out of range batch without downloading it
- bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed
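A rough sketch of the batch-by-starting-epoch idea (names and the constant's value are illustrative, not the actual sync code):

```rust
/// Rough sketch: a batch is identified by its starting epoch rather than a
/// separate batch id.
type Epoch = u64;

const EPOCHS_PER_BATCH: Epoch = 2;

struct ChainProgress {
    to_be_downloaded: Epoch, // starting epoch of the next batch to request
    to_be_processed: Epoch,  // starting epoch of the next batch to process
}

impl ChainProgress {
    /// The batch "id" is simply its starting epoch, so the same epoch range
    /// always maps to the same key, even if the chain is dropped and re-created.
    fn next_batch_to_download(&mut self) -> Epoch {
        let start = self.to_be_downloaded;
        self.to_be_downloaded += EPOCHS_PER_BATCH;
        start
    }

    fn next_batch_to_process(&mut self) -> Epoch {
        let start = self.to_be_processed;
        self.to_be_processed += EPOCHS_PER_BATCH;
        start
    }
}

fn main() {
    let mut chain = ChainProgress { to_be_downloaded: 10, to_be_processed: 10 };
    assert_eq!(chain.next_batch_to_download(), 10);
    assert_eq!(chain.next_batch_to_download(), 12);
    assert_eq!(chain.next_batch_to_process(), 10);
}
```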
## Proposed Changes
To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
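For reference, a minimal sketch of the skip-slot check this flag implies (a hypothetical helper, not the actual Lighthouse implementation):

```rust
/// Illustrative sketch of the `--max-skip-slots` check.
type Slot = u64;

fn exceeds_max_skip_slots(parent_slot: Slot, block_slot: Slot, max_skip_slots: Option<Slot>) -> bool {
    match max_skip_slots {
        // `--max-skip-slots none` disables the check entirely.
        None => false,
        // Otherwise reject blocks whose parent is too far in the past.
        Some(max) => block_slot.saturating_sub(parent_slot) > max,
    }
}

fn main() {
    let max = Some(320); // 10 epochs at 32 slots per epoch, ~1 hour
    assert!(!exceeds_max_skip_slots(1_000, 1_100, max));
    assert!(exceeds_max_skip_slots(1_000, 1_400, max));
    assert!(!exceeds_max_skip_slots(1_000, 1_400, None));
}
```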
## Issue Addressed
NA
## Proposed Changes
Tells `cross` (used for cross-compiling) to read the `RUSTFLAGS` env var and pass it through during the build. This allows us to use `-g` and get debug info.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Moves beacon block processing over to the newly-added `GossipProcessor`. This moves the task off the core executor onto the blocking one.
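A minimal sketch of the pattern (assuming a Tokio runtime; this is not the actual `GossipProcessor` code): heavy block verification is handed to the blocking thread pool so it cannot stall the core executor.

```rust
use tokio::task;

/// Placeholder for an expensive, CPU-bound verification and import step.
fn verify_and_import_block(block_bytes: Vec<u8>) -> Result<(), String> {
    // ... signature checks, state transition, database writes ...
    if block_bytes.is_empty() { Err("empty block".into()) } else { Ok(()) }
}

#[tokio::main]
async fn main() {
    let block_bytes = vec![0u8; 1024];

    // Run the heavy work on the blocking pool so the async executor's threads
    // stay free for networking and timers.
    let result = task::spawn_blocking(move || verify_and_import_block(block_bytes))
        .await
        .expect("blocking task panicked");

    println!("block import result: {:?}", result);
}
```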
## Additional Info
- With this PR, gossip blocks are being ignored during sync.
## Issue Addressed
NA
## Proposed Changes
- Adds a new function to allow getting a state with a bad state root history for attestation verification. This reduces unnecessary tree hashing during attestation processing, which accounted for 23% of memory allocations (by bytes) in a recent `heaptrack` observation.
- Don't clone caches on intermediate epoch-boundary states during block processing.
- Reject blocks that are already known to fork choice earlier during gossip processing, instead of waiting until after the state has been loaded (this only happens in an edge case).
- Avoid multiple re-allocations by creating a "forced" exact size iterator.
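As an illustration of the last point (a hypothetical helper, not the code in this PR): wrapping an iterator and reporting a known length via `size_hint` lets `collect` allocate once instead of repeatedly growing the `Vec`.

```rust
/// Sketch of a "forced" exact-size iterator: wrap an iterator whose length we
/// already know but whose `size_hint` is too loose.
struct ForcedExactSize<I> {
    inner: I,
    remaining: usize,
}

impl<I: Iterator> Iterator for ForcedExactSize<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next();
        if item.is_some() {
            self.remaining = self.remaining.saturating_sub(1);
        }
        item
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Report the known length so collecting into a `Vec` reserves up front.
        (self.remaining, Some(self.remaining))
    }
}

impl<I: Iterator> ExactSizeIterator for ForcedExactSize<I> {}

fn main() {
    // A filtered iterator loses its exact size hint; if the caller knows the
    // true length, it can "force" it.
    let iter = (0u64..8).filter(|x| x % 2 == 0);
    let forced = ForcedExactSize { inner: iter, remaining: 4 };
    let v: Vec<u64> = forced.collect();
    assert_eq!(v, vec![0, 2, 4, 6]);
}
```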
## Additional Info
NA
## Issue Addressed
There is currently an issue with yamux when connecting to prysm peers. The source of the issue is currently unknown.
This PR removes yamux support to force mplex negotiation. We can add back yamux support once we have isolated and corrected the issue.
## Issue Addressed
Resolves #1489
## Proposed Changes
- Change starting metadata seq num to 0 according to the [spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#metadata).
- Remove metadata field from `NetworkGlobals`
- Persist metadata to disk on every update
- Load metadata seq number from disk on restart (see the sketch after this list)
- Persist enr to disk on update to ensure enr sequence number increments are persisted as well.
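A rough sketch of the persistence idea (the paths and on-disk format are illustrative; the real implementation may store this differently):

```rust
use std::fs;
use std::path::Path;

/// Persist the metadata sequence number on every update.
fn persist_seq_number(dir: &Path, seq: u64) -> std::io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::write(dir.join("metadata_seq"), seq.to_string())
}

/// Load the sequence number on restart, defaulting to 0 (the spec's starting
/// value) when nothing has been persisted yet.
fn load_seq_number(dir: &Path) -> u64 {
    fs::read_to_string(dir.join("metadata_seq"))
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(0)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("lh_metadata_sketch");
    persist_seq_number(&dir, 3)?; // bump on every metadata update
    assert_eq!(load_seq_number(&dir), 3); // survives a restart
    Ok(())
}
```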
## Additional info
Since we modified starting metadata seq num to 0 from 1, we might still see `Invalid Sequence number provided` like in #1489 from prysm nodes if they have our metadata cached.
## Issue Addressed
#1384
The only catch: as currently implemented, when dialing nodes by multiaddr there is no way to ask the peer manager whether they are already connected or being dialed.
## Discovery v5 update
In this update we remove the openssl dependency in favour of rust-crypto.
The update also removes a series of unnecessary async functions which may improve some of the issues we have been experiencing.
## Issue Addressed
NA
## Proposed Changes
This PR commits the `Cargo.lock` file so it does not indicate a dirty git tree in the version tag. This code should be used for the `v0.2.3` release.
Also, adds a `Makefile` command to produce tarballs for upload on release.
## Additional Info
NA
## Issue Addressed
- Resolves #1451
## Proposed Changes
- Restricts `contains_block` so it only indicates a block is present if it descends from the finalized root. This helps to ensure that fork choice never points to a block that has been pruned from the database.
- Resolves #1451
- Before importing a block, double-check that its parent is known and a descendant of the finalized root.
- Split a big, monolithic block verification test into smaller tests.
## Additional Notes
I suspect there would be a craftier way to do the `is_descendant_of_finalized` check, but we're a bit tight on time now and we can optimize later if it starts showing in benches.
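For illustration, a naive version of such a check over a child -> parent map (this is just the idea, not the fork choice code):

```rust
use std::collections::HashMap;

/// Illustrative stand-in for a 32-byte block root.
type Root = u64;

/// Walk the parent links until we either hit the finalized root or run out of
/// known ancestors (i.e. the block is on a pruned or unknown branch).
fn is_descendant_of_finalized(
    parents: &HashMap<Root, Root>,
    finalized_root: Root,
    mut block_root: Root,
) -> bool {
    loop {
        if block_root == finalized_root {
            return true;
        }
        match parents.get(&block_root) {
            Some(&parent) => block_root = parent,
            None => return false,
        }
    }
}

fn main() {
    // finalized (1) <- 2 <- 3, plus an orphaned branch 7 <- 8
    let parents: HashMap<Root, Root> = [(2, 1), (3, 2), (8, 7)].into_iter().collect();
    assert!(is_descendant_of_finalized(&parents, 1, 3));
    assert!(!is_descendant_of_finalized(&parents, 1, 8));
}
```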
## TODO
- [x] Tests
## Issue Addressed
N/A
## Proposed Changes
Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`.
Initial testing indicates that this massively improves system stability by (a) moving block tasks off the normal executor and (b) spreading out the attestation load.
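A rough sketch of the overall pattern (types, names and limits are illustrative, not the actual `GossipProcessor`): a bounded queue between the network and a capped set of worker tasks.

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

/// Stand-in for a gossip message that needs verification and import.
enum WorkEvent {
    Block(u64),
    Attestation(u64),
}

const MAX_WORKERS: usize = 4;

#[tokio::main]
async fn main() {
    // Bounded queue between the network and the processor: if verification
    // cannot keep up, we shed load instead of blocking network tasks.
    let (tx, mut rx) = mpsc::channel::<WorkEvent>(1024);
    let limit = Arc::new(Semaphore::new(MAX_WORKERS));

    let manager = tokio::spawn(async move {
        let mut workers = Vec::new();
        while let Some(event) = rx.recv().await {
            // Wait for a free worker slot; at most MAX_WORKERS events are
            // processed concurrently, spreading bursts of attestations out.
            let permit = limit.clone().acquire_owned().await.unwrap();
            workers.push(tokio::spawn(async move {
                match event {
                    WorkEvent::Block(slot) => println!("verify + import block at slot {slot}"),
                    WorkEvent::Attestation(slot) => println!("verify attestation for slot {slot}"),
                }
                drop(permit); // free the slot for the next queued event
            }));
        }
        for worker in workers {
            let _ = worker.await;
        }
    });

    for slot in 0..8 {
        // `try_send` never blocks the caller; a full queue means we drop work.
        let _ = tx.try_send(WorkEvent::Attestation(slot));
    }
    let _ = tx.try_send(WorkEvent::Block(8));

    drop(tx); // close the queue so the manager exits once it has drained
    manager.await.unwrap();
}
```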
## Additional Info
TBC
## Issue Addressed
N/A
## Proposed Changes
Earlier, to log to a file, the only options were to redirect stdout/stderr to a file or use JSON logging.
Redirecting stdout/stderr works well but makes it easy to mistakenly overwrite the file instead of appending to it, which has resulted in loss of precious logs for me on multiple occasions.
JSON logging creates a timestamped backup of the file if it already exists, but the JSON format itself is hugely annoying.
This PR modifies the `--logfile` option to write the same output as the terminal to a logfile.
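A minimal sketch of the append-only behaviour we want from the logfile (illustrative only, not the actual `slog` wiring):

```rust
use std::fs::OpenOptions;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Open the logfile for appending (creating it if needed) so an accidental
    // restart never truncates existing logs.
    let mut logfile = OpenOptions::new()
        .create(true)
        .append(true) // never overwrite: new entries go after the old ones
        .open("lighthouse.log")?;

    writeln!(logfile, "INFO Beacon node started")?;
    Ok(())
}
```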
These errors started to appear when I hit `^C` on `curl -N http://localhost:5052/beacon/fork/stream`: `Aug 12 13:00:01.539 ERRO Couldn't stream piece hyper::Error(ChannelClosed), service: http`
Something must have changed in hyper since SSE was implemented, because I'm sure I haven't seen these errors before.
This PR properly detects a closed SSE stream and cleans up.
## Issue Addressed
Consistent use of "wally"
## Proposed Changes
## Additional Info
## Issue Addressed
Minor documentation changes so that the command prompt and the description below it are identical.
## Proposed Changes
adjust description "wally" to align with command prompt
## Additional Info
devs might give it a thought whether command line should be "mywallet"
I personally prefer "wally" for minimization reasons =)
## Issue Addressed
NA
## Proposed Changes
Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images.
This provides quite clean and simple cross-compilation because all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group, though.
## Details
- Adds shortcut commands to `Makefile`
- Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared).
- Switches to a [commit](284f705964) of blst that has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21).
- Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`.
- Off-topic, but adds an extra message about Windows support, as suggested by a Discord user.
## Additional Info
- ~~Blocked on #1495~~
- There are no tests in CI for this yet for a few reasons:
- I'm hesitant to add more long-running tasks.
- Short-term bitrot should be avoided since we'll use it each release.
- In the long term I think it would be good to automate binary creation on a release.
- I observed the binaries increase in size from 50mb to 52mb after these changes.
Limit simultaneous outgoing connection attempts to a reasonable maximum as an extra layer of protection.
Also, shift the keep-alive logic of the RPC handler to avoid needing to update it by hand. I think in rare cases this could make shutting down a connection a bit faster.
## Issue Addressed
#1419
## Proposed Changes
Creates a `--graffiti` cli flag in the validator client. If the flag is set, it overrides graffiti in the beacon node.
## Additional Info
## Issue Addressed
#1483
## Proposed Changes
Upgrades the log to a critical if a listener fails. We are able to listen on many interfaces, so a single failure is not critical. We should, however, gracefully shut down the client if we have no listeners at all, although the client can still function solely on outgoing connections.
For now a critical is raised, and I leave #1494 for more sophisticated handling of this.
This also updates discv5 to handle errors of binding to a UDP socket such that lighthouse is now able to handle them.
## Issue Addressed
Some nodes not following head, high CPU usage and HTTP API delays
## Proposed Changes
Patches gossipsub. Gossipsub was using an `lru_time_cache` to check for duplicates. This contained an `O(N)` lookup for every gossipsub message to update the time cache. This was causing high cpu usage and blocking network threads.
This PR introduces a custom cache without `O(N)` inserts.
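A rough sketch of what an O(1)-insert duplicate cache looks like (names and types are illustrative; this is not the gossipsub patch itself):

```rust
use std::collections::{HashSet, VecDeque};
use std::time::{Duration, Instant};

/// Duplicate cache with amortised O(1) inserts: expired entries are popped
/// from the front of a queue instead of rescanning the whole cache each time.
struct DuplicateCache {
    seen: HashSet<u64>,                     // message ids seen recently
    expirations: VecDeque<(Instant, u64)>,  // oldest entries at the front
    ttl: Duration,
}

impl DuplicateCache {
    fn new(ttl: Duration) -> Self {
        Self { seen: HashSet::new(), expirations: VecDeque::new(), ttl }
    }

    /// Returns `true` if the message id was not a recent duplicate.
    fn insert(&mut self, id: u64) -> bool {
        let now = Instant::now();
        // Only entries that have actually expired are removed.
        while let Some(&(expires, old_id)) = self.expirations.front() {
            if expires > now {
                break;
            }
            self.expirations.pop_front();
            self.seen.remove(&old_id);
        }
        if self.seen.insert(id) {
            self.expirations.push_back((now + self.ttl, id));
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut cache = DuplicateCache::new(Duration::from_secs(60));
    assert!(cache.insert(42));  // first time: accepted
    assert!(!cache.insert(42)); // duplicate within the TTL: rejected
}
```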
This also adds built-in safety mechanisms to prevent gossipsub from excessively retrying connections upon failure. A maximum limit is set, after which we disconnect from a node due to too many failed substream connections.
## Issue Addressed
Peers that connected after the peer limit was reached may remain connected in some circumstances.
This ensures peers not in the peer manager's list get disconnected. Further logging is also added to track this behaviour.
## Issue Addressed
NA
## Proposed Changes
- When producing a block, go and ensure every attestation in the naive aggregation pool is included in the operation pool. This should help us increase the number of useful attestations in a block.
- Lift the `RwLock`s inside `NaiveAggregationPool` up into a single high-level lock. There were race conditions in the existing setup and it was hard to reason about.
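A simplified sketch of the locking change (the types here are stand-ins, not the real attestation types): the whole map sits behind one lock instead of many inner ones, so "check then insert" cannot interleave with another writer.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Simplified stand-ins for the real attestation types.
type AttestationData = u64;
type Attestation = (AttestationData, u8);

/// Instead of `HashMap<AttestationData, RwLock<Vec<Attestation>>>` (many inner
/// locks with racy interleavings), keep a single high-level lock over the map.
struct Pool {
    maps: RwLock<HashMap<AttestationData, Vec<Attestation>>>,
}

impl Pool {
    fn new() -> Self {
        Self { maps: RwLock::new(HashMap::new()) }
    }

    /// Insert under one write lock, so the whole operation is atomic with
    /// respect to other writers.
    fn insert(&self, att: Attestation) {
        let mut maps = self.maps.write().unwrap();
        maps.entry(att.0).or_default().push(att);
    }

    fn get(&self, data: &AttestationData) -> Option<Vec<Attestation>> {
        self.maps.read().unwrap().get(data).cloned()
    }
}

fn main() {
    let pool = Pool::new();
    pool.insert((7, 1));
    pool.insert((7, 2));
    assert_eq!(pool.get(&7).map(|v| v.len()), Some(2));
}
```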
## Additional Info
NA
Install cmake on macOS
## Issue Addressed
Installation error on macOS
## Proposed Changes
Add instructions for installing `cmake` on macOS via homebrew.
## Issue Addressed
#1028
A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed
## Issue Addressed
- Resolves #1461
## Proposed Changes
Copy the `.git` directory across when building docker so we can get commit information.
Unfortunately this means duplicating your `.git` directory, which might be quite large (mine is >100mb). Notably, this directory isn't contained in the final image, just the intermediate builder image.
## Additional Info
NA