Closes#1487Closes#1427
Directory restructure in accordance with #1487. Also has temporary migration code to move the old directories into new structure.
Also extracts all default directory names and utility functions into a `directory` crate to avoid repetitio.
~Since `validator_definition.yaml` stores absolute paths, users will have to manually change the keystore paths or delete the file to get the validators picked up by the vc.~. `validator_definition.yaml` is migrated as well from the default directories.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
#629
## Proposed Changes
This removes banned peers from the DHT and informs discovery to block the node_id and the known source IP's associated with this node. It has the capabilities of un banning this peer after a period of time.
This also corrects the logic about banning specific IP addresses. We now use seen_ip addresses from libp2p rather than those sent to us via identify (which also include local addresses).
This reverts commit 4fca306397.
Something in the BLST update is causing SIGILLs on aarch64 non-portable builds. While we debug the issue, I think it's best if we just revert the update.
## Issue Addressed
Closes#1504
Closes https://github.com/sigp/lighthouse/issues/1505
## Proposed Changes
* Update `blst` to the latest version, which is more portable and includes finer-grained compilation controls (see below).
* Detect the case where a binary has been explicitly compiled with ADX support but it's missing at runtime, and report a nicer error than `SIGILL`.
## Known Issues
* None. The previous issue with `make build-aarch64` (https://github.com/supranational/blst/issues/27), has been resolved.
## Additional Info
I think we should tweak our release process and our Docker builds so that we provide two options:
Binaries:
* `lighthouse`: compiled with `modern`/`force-adx`, for CPUs 2013 and newer
* `lighthouse-portable`: compiled with `portable` for older CPUs
Docker images:
* `sigp/lighthouse:latest`: multi-arch image with `modern` x86_64 and vanilla aarch64 binary
* `sigp/lighthouse:latest-portable`: multi-arch image with `portable` builds for x86_64 and aarch64
And relevant Docker images for the releases (as per https://github.com/sigp/lighthouse/pull/1574#issuecomment-687766141), tagged `v0.x.y` and `v0.x.y-portable`
## Issue Addressed
#1431
## Proposed Changes
Added an archived zip file with required files manually
## Additional Info
1) Used zip, instead of tar.gz to add a single dependency instead of two.
2) I left the download from github code for now, waiting to hear if you'd like it cleaned up or left to be used for some tooling needs.
## Issue Addressed
#1421
## Proposed Changes
Bounding the error_message that can be returned for RPC domain errors
Co-authored-by: Age Manning <Age@AgeManning.com>
## Proposed Changes
This is an extraction of the quoted int code from #1569, that I've come to rely on for #1544.
It allows us to parse integers from serde strings in YAML, JSON, etc. The main differences from the code in Paul's original PR are:
* Added a submodule that makes quoting mandatory (`require_quotes`).
* Decoding is generic over the type `T` being decoded. You can use `#[serde(with = "serde_utils::quoted_u64::require_quotes")]` on `Epoch` and `Slot` fields (this is what I do in my slashing protection PR).
I've turned on quoting for `Epoch` and `Slot` in this PR, but will leave the other `types` changes to you Paul.
I opted to put everything in the `conseus/serde_utils` module so that BLS can use it without a circular dependency. In future when we want to publish `types` I think we could publish `serde_utils` as `lighthouse_serde_utils` or something. Open to other ideas on this front too.
Converts the graffiti binary data to string before printing to logs.
## Issue Addressed
#1566
## Proposed Changes
Rather than converting graffiti to a vector the binary data less the last character is passed to String::from_utf_lossy(). This then allows us to call the to_string() function directly to give us the string
## Additional Info
Rust skills are fairly weak
## Issue Addressed
N/A
## Proposed Changes
Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topics, subscriptions and peer scores.
## Additional Info
## Issue Addressed
Fixes#1509
## Proposed Changes
Exit the beacon node if the eth1 endpoint points to an invalid eth1 network. Check the network id before every eth1 cache update and display an error log if the network id has changed to an invalid one.
## Issue Addressed
#1172
## Proposed Changes
* updates the libp2p dependency
* small adaptions based on changes in libp2p
* report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages
Co-authored-by: Age Manning <Age@AgeManning.com>
The PR:
* Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots):
![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png)
* New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots.
* Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner.
* Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework
* Rewrites existing fork pruning tests to use the new API
* Adds a tests that triggers finalization at a non epoch boundary slot
* Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing
* Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Closes#1488
## Proposed Changes
* Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain.
* Add `debug` logging to the pruning algorithm so we have so better chance of debugging future issues from logs.
* Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell).
* Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization`
* Delete useless and unused `max_finality_distance`
* Add tests that exercise pruning with shared states at skip slots
* Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that)
* Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?
## Additional Info
Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉
## Issue Addressed
NA
## Proposed Changes
Shift practically all HTTP endpoint handlers to the blocking executor (some very light tasks are left on the core executor).
## Additional Info
This PR covers the `rest_api` which will soon be refactored to suit the standard API. As such, I've cut a few corners and left some existing issues open in this patch. What I have done here should leave the API in state that is not necessary *exactly* the same, but good enough for us to run validators with. Specifically, the number of blocking workers that can be spawned is unbounded and I have not implemented a queue; this will need to be fixed when we implement the standard API.
## Issue Addressed
#1378
## Proposed Changes
Boot node reuses code from beacon_node to initialize network config. This also enables using the network directory to store/load the enr and the private key.
## Additional Info
Note that before this PR the port cli arguments were off (the argument was named `enr-port` but used as `boot-node-enr-port`).
Therefore as port always the cli port argument was used (for both enr and listening). Now the enr-port argument can be used to overwrite the listening port as the public port others should connect to.
Last but not least note, that this restructuring reuses `ethlibp2p::NetworkConfig` that has many more options than the ones used in the boot node. For example the network config has an own `discv5_config` field that gets never used in the boot node and instead another `Discv5Config` gets created later in the boot node process.
Co-authored-by: Age Manning <Age@AgeManning.com>
## Overview
There are forked chains which get referenced by blocks and attestations on a network. Typically if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, many peers are on many chains, the chains can be very deep and become time consuming performing lookups.
This PR adds a cache to known failed chain lookups. This prevents us from starting a parent-lookup (or stopping one half way through) if we have attempted the chain lookup in the past.
The changes are somewhat simple but should solve two issues:
- When quickly changing between chains once and a second time back again, batchIds would collide and cause havoc.
- If we got an out of range response from a peer, sync would remain in syncing but without advancing
Changes:
- remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH
- for the same reason, move the "to_be_downloaded_id" to be an epoch
- remove a sneaky line that dropped an out of range batch without downloading it
- bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed
## Issue Addressed
There is currently an issue with yamux when connecting to prysm peers. The source of the issue is currently unknown.
This PR removes yamux support to force mplex negotation. We can add back yamux support once we have isolated and corrected the issue.
## Issue Addressed
NA
## Proposed Changes
This PR commits the `Cargo.lock` file so it does not indicate a dirty git tree in the version tag. This code should be used for the `v0.2.3` release.
Also, adds a `Makefile` command to produce tarballs for upload on release.
## Additional Info
NA
## Issue Addressed
N/A
## Proposed Changes
Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`.
Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load.
## Additional Info
TBC
## Issue Addressed
NA
## Proposed Changes
Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images.
Provides quite clean and simple cross-compiles cause all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group though.
## Details
- Adds shortcut commands to `Makefile`
- Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared).
- Switches to a [commit](284f705964) of blst that has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21).
- Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`.
- Off-topic, but adds extra message about Windows support as suggested by Discord user.
## Additional Info
- ~~Blocked on #1495~~
- There are no tests in CI for this yet for a few reasons:
- I'm hesitant to add more long-running tasks.
- Short-term bitrot should be avoided since we'll use it each release.
- In the long term I think it would be good to automate binary creation on a release.
- I observed the binaries increase in size from 50mb to 52mb after these changes.
## Issue Addressed
#1483
## Proposed Changes
Upgrades the log to a critical if a listener fails. We are able to listen on many interfaces so a single instance is not critical. We should however gracefully shutdown the client if we have no listeners, although the client can still function solely on outgoing connections.
For now a critical is raised and I leave #1494 for more sophisticated handling of this.
This also updates discv5 to handle errors of binding to a UDP socket such that lighthouse is now able to handle them.
## Issue Addressed
Some nodes not following head, high CPU usage and HTTP API delays
## Proposed Changes
Patches gossipsub. Gossipsub was using an `lru_time_cache` to check for duplicates. This contained an `O(N)` lookup for every gossipsub message to update the time cache. This was causing high cpu usage and blocking network threads.
This PR introduces a custom cache without `O(N)` inserts.
This also adds built in safety mechanisms to prevent gossipsub from excessively retrying connections upon failure. A maximum limit is set after which we disconnect from the node from too many failed substream connections.
## Issue Addressed
NA
## Proposed Changes
- Moves the git-based versioning we were doing into the `lighthouse_version` crate in `common`.
- Removes the `beacon_node/version` crate, replacing it with `lighthouse_version`.
- Bumps the version to `v0.2.0`.
## Additional Info
There are now two types of version string:
1. `const VERSION: &str = Lighthouse/v0.2.0-1419501f2+`
1. `version_with_platform() = Lighthouse/v0.2.0-1419501f2+/x86_64-linux`
(1) is handy cause it's a `const` and shorter. (2) has platform info so it's more useful. Note that the plus-sign (`+`) indicates the the git commit is dirty (it used to be `(modified)` but I had to shorten it to fit into graffiti).
These version strings are now included on:
- `lighthouse --version`
- `lcli --version`
- `curl localhost:5052/node/version`
- p2p messages when we communicate our version
You can update the version by changing this constant (version is not related to a `Cargo.toml`):
b9ad7102d5/common/lighthouse_version/src/lib.rs (L4-L15)
## Issue Addressed
Sync was breaking occasionally. The root cause appears to be identify crashing as events we being sent to the protocol after nodes were banned. Have not been able to reproduce sync issues since this update.
## Proposed Changes
Only send messages to sub-behaviour protocols if the peer manager thinks the peer is connected. All other messages are dropped.
## Proposed Changes
In the continuing war against unportable binaries I figured we should have an option to enable building the Lighthouse binary itself with Milagro. This PR adds a `milagro` feature that can be used with `cargo install --path lighthouse --features milagro --force --locked`. The BLS library in-use will also show up under `lighthouse --version` like this:
```
Lighthouse 0.1.2-7d8acc20a(modified)
BLS Library: milagro
```
Future work: add other cool stuff like the compiler version and CPU target to `--version`.
## Issue Addressed
Closes#1395
## Proposed Changes
* Add a feature to `lighthouse` and `lcli` called `portable` which enables the `portable` feature on our fork of BLST. This feature turns off the `-march=native` C compiler flag that produces binaries highly targeted to the host CPU's instruction set.
* Tweak the `Makefile` so that when the `PORTABLE` environment variable is set to `true`, it compiles with this feature.
* Temporarily enable `PORTABLE=true` in the Docker build so that the image on Docker Hub is portable. Eventually I think we should enable `PORTABLE=true` _only on Docker Hub_, so that users building locally can take advantage of the tasty compiler magic. This seems to be possible by setting a Docker Hub environment variable: https://docs.docker.com/docker-hub/builds/#environment-variables-for-builds
## Additional Info
Tested by compiling on a very new CPU (Intel Core i7-8550U) and copying the binary to a very old CPU (Intel Core i3 530). Before the portability fix, this produced the SIGILL crash described in #1395, and after the fix, it worked smoothly.
I'm in the process of testing the Docker build and running some benches to confirm that the performance penalty isn't too severe.
## Issue Addressed
NA
## Proposed Changes
Allows for multiple "hardcoded" testnets.
## Additional Info
This PR is incomplete.
## TODO
- [x] Add flag to CLI, integrate with rest of Lighthouse.
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
N/A
## Proposed Changes
This provides a number of corrections and improvements to gossipsub. Specifically
- Enables options for greater privacy around the message author
- Provides greater flexibility on message validation
- Prevents unvalidated messages from being gossiped
- Shifts the duplicate cache to a time-based cache inside gossipsub
- Updates the message-id to handle bytes
- Bug fixes related to mesh maintenance and topic subscription. This should improve our attestation inclusion rate.
## Issue Addressed
NA
## Proposed Changes
- Refactor the `bls` crate to support multiple BLS "backends" (e.g., milagro, blst, etc).
- Removes some duplicate, unused code in `common/rest_types/src/validator.rs`.
- Removes the old "upgrade legacy keypairs" functionality (these were unencrypted keys that haven't been supported for a few testnets, no one should be using them anymore).
## Additional Info
Most of the files changed are just inconsequential changes to function names.
## TODO
- [x] Optimization levels
- [x] Infinity point: https://github.com/supranational/blst/issues/11
- [x] Ensure milagro *and* blst are tested via CI
- [x] What to do with unsafe code?
- [x] Test infinity point in signature sets
## Issue Addressed
#1112
The logic is slightly different but still valid wrt to error handling.
- Inbound state is either Busy with a future that return the subtream (and info about the processing)
- The state machine works as follows:
- `Idle` with pending responses => `Busy`
- `Busy` => finished ? if so and there are new pending responses then `Busy`, if not then `Idle`
=> not finished remains `Busy`
- Add an `InboundInfo` for readability
- Other stuff:
- Close inbound substreams when all expected responses are sent
- Remove the error variants from `RPCCodedResponse` and use the codes instead
- Fix various spelling mistakes because I got sloppy last time
Sorry for the delay
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
NA
## Proposed Changes
- Introduces the `valdiator_definitions.yml` file which serves as an explicit list of validators that should be run by the validator client.
- Removes `--strict` flag, split into `--strict-lockfiles` and `--disable-auto-discover`
- Adds a "Validator Management" page to the book.
- Adds the `common/account_utils` crate which contains some logic that was starting to duplicate across the codebase.
The new docs for this feature are the best description of it (apart from the code, I guess): 9cb87e93ce/book/src/validator-management.md
## API Changes
This change should be transparent for *most* existing users. If the `valdiator_definitions.yml` doesn't exist then it will be automatically generated using a method that will detect all the validators in their `validators_dir`.
Users will have issues if they are:
1. Using `--strict`.
1. Have keystores in their `~/.lighthouse/validators` directory that weren't being detected by the current keystore discovery method.
For users with (1), the VC will refuse to start because the `--strict` flag has been removed. They will be forced to review `--help` and choose an equivalent flag.
For users with (2), this seems fairly unlikely and since we're only in testnets there's no *real* value on the line here. I'm happy to take the risk, it would be a different case for mainnet.
## Additional Info
This PR adds functionality we will need for #1347.
## TODO
- [x] Reconsider flags
- [x] Move doc into a more reasonable chapter.
- [x] Check for compile warnings.
## Issue Addressed
https://github.com/sigp/lighthouse/issues/1177
## Proposed Changes
Add a command line option (`--http-allow-origin`) and a config item for configuring the `Access-Control-Allow-Origin` response header. This should unblock making XMLHttpRequests.
Downgrades libp2p and the gossipsub updates.
This looks to resolve the CPU usage issue we have been seeing.
The root cause is likely inside the latest gossipsub updates, which will be addressed in a later PR
* Add latest commit info to git version
* Testing docker build
* Use fallback; modify format
* Revert "Testing docker build"
This reverts commit 197140d6973e22a6b891ebc0f1798d60c87b182a.
* Modify fallback to have crate version
* Process exits and slashings off the network
* Fix rest_api tests
* Add op verification tests
* Add tests for pruning of slashings in the op pool
* Address Paul's review comments
* Layer do_atomically() abstractions properly
* Reduce allocs and DRY get_key_for_col()
* Parameterize HotColdDB with hot and cold item stores
* -impl Store for MemoryStore
* Replace Store uses with HotColdDB
* Ditch Store trait
* cargo fmt
* Style fix
* Readd missing dep that broke the build
* Update `milagro_bls` to new release (#1183)
* Update milagro_bls to new release
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* Tidy up fake cryptos
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* move SecretHash to bls and put plaintext back
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* Update v0.12.0 to v0.12.1
* Remove secio
* Remove ssz encoding for gossipsub
Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Age Manning <Age@AgeManning.com>
* Update `milagro_bls` to new release (#1183)
* Update milagro_bls to new release
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* Tidy up fake cryptos
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* move SecretHash to bls and put plaintext back
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* Update v0.12.0 to v0.12.1
* Use ssz types for Request and error types
* Fix errors
* Constrain BlocksByRangeRequest count to MAX_REQUEST_BLOCKS
* Fix issues after rebasing
* Address review comments
Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Age Manning <Age@AgeManning.com>
* add retry logic to peer discovery and an expiration time for peers
* Restructure discovery
* Add mac build to CI
* Always return an error for Health when not linux
* Change macos workflow
* Rename macos tests
* Update DiscoverPeers messages to pass Instants. Implement PartialEq for AttServiceMessage
* update discover peer queueing to always check existing messages and extend min_ttl as necessary
* update method name and comment
* Correct merge issues
* Add subnet id check to partialeq, fix discover peer message dups
* fix discover peer message dups
* fix discover peer message dups for real this time
Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Add logging on shutdown
* Replace tokio::spawn with handle.spawn
* Upgrade tokio
* Add a task executor
* Beacon chain tasks use task executor
* Validator client tasks use task executor
* Rename runtime_handle to executor
* Add duration histograms; minor fixes
* Cleanup
* Fix logs
* Fix tests
* Remove random file
* Get enr dependency instead of libp2p
* Address some review comments
* Libp2p takes a TaskExecutor
* Ugly fix libp2p tests
* Move TaskExecutor to own file
* Upgrade Dockerfile rust version
* Minor fixes
* Revert "Ugly fix libp2p tests"
This reverts commit 58d4bb690f52de28d893943b7504d2d0c6621429.
* Pretty fix libp2p tests
* Add spawn_without_exit; change Counter to Gauge
* Tidy
* Move log from RuntimeContext to TaskExecutor
* Fix errors
* Replace histogram with int_gauge for async tasks
* Fix todo
* Fix memory leak in test by exiting all spawned tasks at the end
* Update milagro_bls to new release
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* Tidy up fake cryptos
Signed-off-by: Kirk Baird <baird.k@outlook.com>
* move SecretHash to bls and put plaintext back
Signed-off-by: Kirk Baird <baird.k@outlook.com>