lighthouse

Author	SHA1	Message	Date
Age Manning	55eee18ebb	Version bump to 0.3.1 (#1813 ) ## Description Bumps Lighthouse to version 0.3.1.	2020-10-23 04:16:36 +00:00
Age Manning	64c5899d25	Adds colour help to bn and vc subcommands (#1811 ) Adds coloured help to the bn and vc subcommands	2020-10-23 04:16:34 +00:00
Age Manning	2c7f362908	Discovery v5.1 (#1786 ) ## Overview This updates lighthouse to discovery v5.1 Note: This makes lighthouse's discovery not compatible with any previous version. Lighthouse cannot discover peers or send/receive ENR's from any previous version. This is a breaking change. This resolves #1605	2020-10-23 04:16:33 +00:00
Age Manning	ae96dab5d2	Increase UPnP logging and decrease batch sizes (#1812 ) ## Description This increases the logging of the underlying UPnP tasks to inform the user of UPnP error/success. This also decreases the batch syncing size to two epochs per batch.	2020-10-23 03:01:33 +00:00
Age Manning	c49dd94e20	Update to latest libp2p (#1810 ) ## Description Updates to the latest libp2p and includes gossipsub updates. Of particular note is the limitation of a single topic per gossipsub message. Co-authored-by: blacktemplar <blacktemplar@a1.net>	2020-10-23 03:01:31 +00:00
Michael Sproul	acd49d988d	Implement database temp states to reduce memory usage (#1798 ) ## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.	2020-10-23 01:27:51 +00:00
Age Manning	66f0cf4430	Improve peer handling (#1796 ) ## Issue Addressed Potentially resolves #1647 and sync stalls. ## Proposed Changes The handling of the state of banned peers was inadequate for the complex peerdb data structure. We store a limited number of disconnected and banned peers in the db. We were not tracking intermediate "disconnecting" states and the in some circumstances we were updating the peer state without informing the peerdb. This lead to a number of inconsistencies in the peer state. Further, the peer manager could ban a peer changing a peer's state from being connected to banned. In this circumstance, if the peer then disconnected, we didn't inform the application layer, which lead to applications like sync not being informed of a peers disconnection. This could lead to sync stalling and having to require a lighthouse restart. Improved handling for peer states and interactions with the peerdb is made in this PR.	2020-10-23 01:27:48 +00:00
Paul Hauner	b829257cca	Ssz state (#1749 ) ## Issue Addressed NA ## Proposed Changes Adds a `lighthouse/beacon/states/:state_id/ssz` endpoint to allow us to pull the genesis state from the API. ## Additional Info NA	2020-10-22 06:05:49 +00:00
Michael Sproul	7f73dccebc	Refine op pool pruning (#1805 ) ## Issue Addressed Closes #1769 Closes #1708 ## Proposed Changes Tweaks the op pool pruning so that the attestation pool is pruned against the wall-clock epoch instead of the finalized state's epoch. This should reduce the unbounded growth that we've seen during periods without finality. Also fixes up the voluntary exit pruning as raised in #1708.	2020-10-22 04:47:29 +00:00
Paul Hauner	a3704b971e	Support pre-flight CORS check (#1772 ) ## Issue Addressed - Resolves #1766 ## Proposed Changes - Use the `warp::filters::cors` filter instead of our work-around. ## Additional Info It's not trivial to enable/disable `cors` using `warp`, since using `routes.with(cors)` changes the type of `routes`. This makes it difficult to apply/not apply cors at runtime. My solution has been to always use the `warp::filters::cors` wrapper but when cors should be disabled, just pass the HTTP server listen address as the only permissible origin.	2020-10-22 04:47:27 +00:00
realbigsean	a3552a4b70	Node endpoints (#1778 ) ## Issue Addressed `node` endpoints in #1434 ## Proposed Changes Implement these: ``` /eth/v1/node/health /eth/v1/node/peers/{peer_id} /eth/v1/node/peers ``` - Add an `Option<Enr>` to `PeerInfo` - Finish implementation of `/eth/v1/node/identity` ## Additional Info - should update the `peers` endpoints when #1764 is resolved Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-10-22 02:59:42 +00:00
Daniel Schonfeld	8f86baa48d	Optimize attester slashing (#1745 ) ## Issue Addressed Closes #1548 ## Proposed Changes Optimizes attester slashing choice by choosing the ones that cover the most amount of validators slashed, with the highest effective balances ## Additional Info Initial pass, need to write a test for it	2020-10-22 01:43:54 +00:00
divma	668513b67e	Sync state adjustments (#1804 ) check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads	2020-10-22 00:26:06 +00:00
realbigsean	628891df1d	fix genesis state root provided to HTTP server (#1783 ) ## Issue Addressed Resolves #1776 ## Proposed Changes The beacon chain builder was using the canonical head's state root for the `genesis_state_root` field. ## Additional Info	2020-10-21 23:15:30 +00:00
realbigsean	fdb9744759	use head slot instead of the target slot for the not_while_syncing fi… (#1802 ) ## Issue Addressed Resolves #1792 ## Proposed Changes Use `chain.best_slot()` instead of the sync state's target slot in the `not_while_syncing_filter` ## Additional Info N/A	2020-10-21 22:02:25 +00:00
divma	2acf75785c	More sync updates (#1791 ) ## Issue Addressed #1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager	2020-10-20 22:34:18 +00:00
Michael Sproul	703c33bdc7	Fix head tracker concurrency bugs (#1771 ) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.	2020-10-19 05:58:39 +00:00
blacktemplar	6ba997b88e	add direction information to PeerInfo (#1768 ) ## Issue Addressed NA ## Proposed Changes Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.	2020-10-16 05:24:21 +00:00
Herman Junge	d7b9d0dd9f	Implement matches! macro (#1777 ) Fix #1775	2020-10-15 21:42:43 +00:00
Pawan Dhananjay	97be2ca295	Simulator and attestation service fixes (#1747 ) ## Issue Addressed #1729 #1730 Which issue # does this PR address? ## Proposed Changes 1. Fixes a bug in the simulator where nodes can't find each other due to 0 udp ports in their enr. 2. Fixes bugs in attestation service where we are unsubscribing from a subnet prematurely. More testing is needed for attestation service fixes.	2020-10-15 07:11:31 +00:00
blacktemplar	a0634cc64f	Gossipsub topic filters (#1767 ) ## Proposed Changes Adds a gossipsub topic filter that only allows subscribing and incoming subscriptions from valid ETH2 topics. ## Additional Info Currently the preparation of the valid topic hashes uses only the current fork id but in the future it must also use all possible future fork ids for planned forks. This has to get added when hard coded forks get implemented. DO NOT MERGE: We first need to merge the libp2p changes (see https://github.com/sigp/rust-libp2p/pull/70) so that we can refer from here to a commit hash inside the lighthouse branch.	2020-10-14 10:12:57 +00:00
blacktemplar	8248afa793	Updates the message-id according to the Networking Spec (#1752 ) ## Proposed Changes Implement the new message id function (see https://github.com/ethereum/eth2.0-specs/pull/2089) using an additional fast message id function for better performance + caching decompressed data.	2020-10-14 06:51:58 +00:00
Pawan Dhananjay	99a02fd2ab	Limit snappy input stream (#1738 ) ## Issue Addressed N/A ## Proposed Changes This PR limits the length of the stream received by the snappy decoder to be the maximum allowed size for the received rpc message type. Also adds further checks to ensure that the length specified in the rpc [encoding-dependent header](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#encoding-strategies) is within the bounds for the rpc message type being decoded.	2020-10-11 22:45:33 +00:00
Paul Hauner	0e4cc50262	Remove unused deps	2020-10-09 15:58:20 +11:00
Paul Hauner	db3e0578e9	Merge branch 'v0.3.0-staging' into v3-master	2020-10-09 15:27:08 +11:00
Paul Hauner	72cc5e35af	Bump version to v0.3.0 (#1743 ) ## Issue Addressed NA ## Proposed Changes - Bump version to v0.3.0 - Run `cargo update` ## Additional Info NA	2020-10-09 02:05:30 +00:00
Paul Hauner	da44821e39	Clean up obsolete TODOs (#1734 ) Squashed commit of the following: commit f99373cbaec9adb2bdbae3f7e903284327962083 Author: Age Manning <Age@AgeManning.com> Date: Mon Oct 5 18:44:09 2020 +1100 Clean up obsolute TODOs	2020-10-05 21:08:14 +11:00
Paul Hauner	ee7c8a0b7e	Update external deps (#1711 ) ## Issue Addressed - Resolves #1706 ## Proposed Changes Updates dependencies across the workspace. Any crate that was not able to be brought to the latest version is listed in #1712. ## Additional Info NA	2020-10-05 08:22:19 +00:00
Age Manning	240181e840	Upgrade discovery and restructure task execution (#1693 ) * Initial rebase * Remove old code * Correct release tests * Rebase commit * Remove eth2-testnet dep on eth2libp2p * Remove crates lost in rebase * Remove unused dep	2020-10-05 18:45:54 +11:00
Age Manning	bcb629564a	Improve error handling in network processing (#1654 ) * Improve error handling in network processing * Cargo fmt * Cargo fmt * Improve error handling for prior genesis * Remove dep	2020-10-05 17:34:56 +11:00
divma	113758a4f5	From panic to crit (#1726 ) ## Issue Addressed Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-05 17:34:49 +11:00
Age Manning	a8c5af8874	Increase content-id length (#1725 ) ## Issue Addressed N/A ## Proposed Changes Increase gossipsub's content-id length to the full 32 byte hash. ## Additional Info N/A	2020-10-05 17:33:42 +11:00
divma	6997776494	Sync fixes (#1716 ) ## Issue Addressed chain state inconsistencies ## Proposed Changes - a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer. - if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch	2020-10-05 17:33:36 +11:00
Paul Hauner	e7eb99cb5e	Use Drop impl to send worker idle message (#1718 ) ## Issue Addressed NA ## Proposed Changes Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic. ## Additional Info NA	2020-10-05 17:33:25 +11:00
Age Manning	fe07a3c21c	Improve error handling in network processing (#1654 ) * Improve error handling in network processing * Cargo fmt * Cargo fmt * Improve error handling for prior genesis * Remove dep	2020-10-05 17:30:43 +11:00
Age Manning	47c921f326	Update libp2p (#1728 ) ## Issue Addressed N/A ## Proposed Changes Updates the libp2p dependency to the latest version ## Additional Info N/A	2020-10-05 05:16:27 +00:00
divma	b1c121b880	From panic to crit (#1726 ) ## Issue Addressed Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-05 04:02:09 +00:00
Age Manning	6b68c628df	Increase content-id length (#1725 ) ## Issue Addressed N/A ## Proposed Changes Increase gossipsub's content-id length to the full 32 byte hash. ## Additional Info N/A	2020-10-04 23:49:16 +00:00
divma	86a18e72c4	Sync fixes (#1716 ) ## Issue Addressed chain state inconsistencies ## Proposed Changes - a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer. - if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch	2020-10-04 23:49:14 +00:00
divma	e3c7b58657	Address a couple of TODOs (#1724 ) ## Issue Addressed couple of TODOs	2020-10-04 22:50:44 +00:00
Paul Hauner	d72c026d32	Use Drop impl to send worker idle message (#1718 ) ## Issue Addressed NA ## Proposed Changes Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic. ## Additional Info NA	2020-10-04 21:59:20 +00:00
Paul Hauner	c4bd9c86e6	Add check for head/target consistency (#1702 ) ## Issue Addressed NA ## Proposed Changes Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot. ## Additional Info NA	2020-10-03 10:08:06 +10:00
Sean	6af3bc9ce2	Add UPnP support for Lighthouse (#1587 ) This commit was modified by Paul H whilst rebasing master onto v0.3.0-staging Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers. Using IGD library: https://docs.rs/igd/0.10.0/igd/ Adding the the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-03 10:07:47 +10:00
realbigsean	255cc25623	Weak subjectivity start from genesis (#1675 ) This commit was edited by Paul H when rebasing from master to v0.3.0-staging. Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639 - Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch` - Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-03 10:00:28 +10:00
Paul Hauner	32338bcafa	Add check for head/target consistency (#1702 ) ## Issue Addressed NA ## Proposed Changes Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot. ## Additional Info NA	2020-10-02 10:46:37 +00:00
Paul Hauner	6ea3bc5e52	Implement VC API (#1657 ) ## Issue Addressed NA ## Proposed Changes - Implements a HTTP API for the validator client. - Creates EIP-2335 keystores with an empty `description` field, instead of a missing `description` field. Adds option to set name. - Be more graceful with setups without any validators (yet) - Remove an error log when there are no validators. - Create the `validator` dir if it doesn't exist. - Allow building a `ValidatorDir` without a withdrawal keystore (required for the API method where we only post a voting keystore). - Add optional `description` field to `validator_definitions.yml` ## TODO - [x] Signature header, as per https://github.com/sigp/lighthouse/issues/1269#issuecomment-649879855 - [x] Return validator descriptions - [x] Return deposit data - [x] Respect the mnemonic offset - [x] Check that mnemonic can derive returned keys - [x] Be strict about non-localhost - [x] Allow graceful start without any validators (+ create validator dir) - [x] Docs final pass - [x] Swap to EIP-2335 description field. - [x] Fix Zerioze TODO in VC api types. - [x] Zeroize secp256k1 key ## Endpoints - [x] `GET /lighthouse/version` - [x] `GET /lighthouse/health` - [x] `GET /lighthouse/validators` - [x] `POST /lighthouse/validators/hd` - [x] `POST /lighthouse/validators/keystore` - [x] `PATCH /lighthouse/validators/:validator_pubkey` - [ ] ~~`POST /lighthouse/validators/:validator_pubkey/exit/:epoch`~~ Future works ## Additional Info TBC	2020-10-02 09:42:19 +00:00
Sean	94b17ce02b	Add UPnP support for Lighthouse (#1587 ) Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers. ## Issue Addressed #927 ## Proposed Changes Using IGD library: https://docs.rs/igd/0.10.0/igd/ Adding the the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on ## Additional Info Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-02 08:47:00 +00:00
realbigsean	9d2d6239cd	Weak subjectivity start from genesis (#1675 ) ## Issue Addressed Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639 ## Proposed Changes - Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch` - Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting. ## Additional Info Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-01 01:41:58 +00:00
Michael Sproul	22aedda1be	Add database schema versioning (#1688 ) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673	2020-10-01 11:12:36 +10:00
Paul Hauner	cdec3cec18	Implement standard eth2.0 API (#1569 ) - Resolves #1550 - Resolves #824 - Resolves #825 - Resolves #1131 - Resolves #1411 - Resolves #1256 - Resolve #1177 - Includes the `ShufflingId` struct initially defined in #1492. That PR is now closed and the changes are included here, with significant bug fixes. - Implement the https://github.com/ethereum/eth2.0-APIs in a new `http_api` crate using `warp`. This replaces the `rest_api` crate. - Add a new `common/eth2` crate which provides a wrapper around `reqwest`, providing the HTTP client that is used by the validator client and for testing. This replaces the `common/remote_beacon_node` crate. - Create a `http_metrics` crate which is a dedicated server for Prometheus metrics (they are no longer served on the same port as the REST API). We now have flags for `--metrics`, `--metrics-address`, etc. - Allow the `subnet_id` to be an optional parameter for `VerifiedUnaggregatedAttestation::verify`. This means it does not need to be provided unnecessarily by the validator client. - Move `fn map_attestation_committee` in `mod beacon_chain::attestation_verification` to a new `fn with_committee_cache` on the `BeaconChain` so the same cache can be used for obtaining validator duties. - Add some other helpers to `BeaconChain` to assist with common API duties (e.g., `block_root_at_slot`, `head_beacon_block_root`). - Change the `NaiveAggregationPool` so it can index attestations by `hash_tree_root(attestation.data)`. This is a requirement of the API. - Add functions to `BeaconChainHarness` to allow it to create slashings and exits. - Allow for `eth1::Eth1NetworkId` to go to/from a `String`. - Add functions to the `OperationPool` to allow getting all objects in the pool. - Add function to `BeaconState` to check if a committee cache is initialized. - Fix bug where `seconds_per_eth1_block` was not transferring over from `YamlConfig` to `ChainSpec`. - Add the `deposit_contract_address` to `YamlConfig` and `ChainSpec`. We needed to be able to return it in an API response. - Change some uses of serde `serialize_with` and `deserialize_with` to a single use of `with` (code quality). - Impl `Display` and `FromStr` for several BLS fields. - Check for clock discrepancy when VC polls BN for sync state (with +/- 1 slot tolerance). This is not intended to be comprehensive, it was just easy to do. - See #1434 for a per-endpoint overview. - Seeking clarity here: https://github.com/ethereum/eth2.0-APIs/issues/75 - [x] Add docs for prom port to close #1256 - [x] Follow up on this #1177 - [x] ~~Follow up with #1424~~ Will fix in future PR. - [x] Follow up with #1411 - [x] ~~Follow up with #1260~~ Will fix in future PR. - [x] Add quotes to all integers. - [x] Remove `rest_types` - [x] Address missing beacon block error. (#1629) - [x] ~~Add tests for lighthouse/peers endpoints~~ Wontfix - [x] ~~Follow up with validator status proposal~~ Tracked in #1434 - [x] Unify graffiti structs - [x] ~~Start server when waiting for genesis?~~ Will fix in future PR. - [x] TODO in http_api tests - [x] Move lighthouse endpoints off /eth/v1 - [x] Update docs to link to standard - ~~Blocked on #1586~~ Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-10-01 11:12:36 +10:00
Pawan Dhananjay	8e20176337	Directory restructure (#1532 ) Closes #1487 Closes #1427 Directory restructure in accordance with #1487. Also has temporary migration code to move the old directories into new structure. Also extracts all default directory names and utility functions into a `directory` crate to avoid repetitio. ~Since `validator_definition.yaml` stores absolute paths, users will have to manually change the keystore paths or delete the file to get the validators picked up by the vc.~. `validator_definition.yaml` is migrated as well from the default directories. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-01 11:12:35 +10:00
Paul Hauner	dffc56ef1d	Fix validator lockfiles (#1586 ) ## Issue Addressed - Resolves #1313 ## Proposed Changes Changes the way we start the validator client and beacon node to ensure that we cleanly drop the validator keystores (which therefore ensures we cleanup their lockfiles). Previously we were holding the validator keystores in a tokio task that was being forcefully killed (i.e., without `Drop`). Now, we hold them in a task that can gracefully handle a shutdown. Also, switches the `--strict-lockfiles` flag to `--delete-lockfiles`. This means two things: 1. We are now strict on lockfiles by default (before we weren't). 1. There's a simple way for people delete the lockfiles if they experience a crash. ## Additional Info I've only given the option to ignore and delete lockfiles, not just ignore them. I can't see a strong need for ignore-only but could easily add it, if the need arises. I've flagged this as `api-breaking` since users that have lockfiles lingering around will be required to supply `--delete-lockfiles` next time they run.	2020-10-01 11:12:35 +10:00
Michael Sproul	fcf8419c90	Allow truncation of pubkey cache on creation (#1686 ) ## Issue Addressed Closes #1680 ## Proposed Changes This PR fixes a race condition in beacon node start-up whereby the pubkey cache could be created by the beacon chain builder before the `PersistedBeaconChain` was stored to disk. When the node restarted, it would find the persisted chain missing, and attempt to start from scratch, creating a new pubkey cache in the process. This call to `ValidatorPubkeyCache::new` would fail if the file already existed (which it did). I changed the behaviour so that pubkey cache initialization now doesn't care whether there's a file already in existence (it's only a cache after all). Instead it will truncate and recreate the file in the race scenario described.	2020-09-30 04:42:52 +00:00
Age Manning	c0e76d2c15	Version bump and cargo update (#1683 )	2020-09-29 18:29:04 +10:00
Age Manning	13cb642f39	Update boot-node and discovery (#1682 ) * Improve boot_node and upgrade discovery * Clippy lints	2020-09-29 18:28:29 +10:00
blacktemplar	ae28773965	Networking bug fixes (#1684 ) * call correct unsubscribe method for subnets * correctly delegate closed connections in behaviour * correct unsubscribe method name	2020-09-29 18:28:15 +10:00
Paul Hauner	1ef4f0ea12	Add gossip conditions from spec v0.12.3 (#1667 ) ## Issue Addressed NA ## Proposed Changes There are four new conditions introduced in v0.12.3: 1. _[REJECT]_ The attestation's epoch matches its target -- i.e. `attestation.data.target.epoch == compute_epoch_at_slot(attestation.data.slot)` 1. _[REJECT]_ The attestation's target block is an ancestor of the block named in the LMD vote -- i.e. `get_ancestor(store, attestation.data.beacon_block_root, compute_start_slot_at_epoch(attestation.data.target.epoch)) == attestation.data.target.root` 1. _[REJECT]_ The committee index is within the expected range -- i.e. `data.index < get_committee_count_per_slot(state, data.target.epoch)`. 1. _[REJECT]_ The number of aggregation bits matches the committee size -- i.e. `len(attestation.aggregation_bits) == len(get_beacon_committee(state, data.slot, data.index))`. This PR implements new logic to suit (1) and (2). Tests are added for (3) and (4), although they were already implicitly enforced. ## Additional Info - There's a bit of edge-case with target root verification that I raised here: https://github.com/ethereum/eth2.0-specs/pull/2001#issuecomment-699246659 - I've had to add an `--ignore` to `cargo audit` to get CI to pass. See https://github.com/sigp/lighthouse/issues/1669	2020-09-27 20:59:40 +00:00
Paul Hauner	f1180a8947	Prepare for v0.2.12 (#1672 ) ## Issue Addressed NA ## Proposed Changes - Bump versions - Run cargo update ## Additional Info NA	2020-09-26 06:35:45 +00:00
Age Manning	28b6d921c6	Remove banned peers from DHT and track IPs (#1656 ) ## Issue Addressed #629 ## Proposed Changes This removes banned peers from the DHT and informs discovery to block the node_id and the known source IP's associated with this node. It has the capabilities of un banning this peer after a period of time. This also corrects the logic about banning specific IP addresses. We now use seen_ip addresses from libp2p rather than those sent to us via identify (which also include local addresses).	2020-09-25 01:52:39 +00:00
Pawan Dhananjay	15638d1448	Beacon node does not quit on eth1 errors (#1663 ) ## Issue Addressed N/A ## Proposed Changes Log critical errors instead of quitting if eth1 node cannot be reached or is on wrong network id.	2020-09-25 00:43:45 +00:00
divma	b8013b7b2c	Super Silky Smooth Syncs, like a Sir (#1628 ) ## Issue Addressed In principle.. closes #1551 but in general are improvements for performance, maintainability and readability. The logic for the optimistic sync in actually simple ## Proposed Changes There are miscellaneous things here: - Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic - Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct - Move most batch-related logic to the batch - Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches) - Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases - Store batches in a sorted map (`BTreeMap`) access is not O(1) but since the number of _active_ batches is bounded this should be fast, and saves performing hashing ops. Batches are indexed by the epoch they start. Sorted, to easily handle chain advancements (range logic) - Produce the chain Id from the identifying fields: target root and target slot. This, to guarantee there can't be duplicated chains and be able to consistently search chains by either Id or checkpoint - Fix chain_id not being present in all chain loggers - Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and waiting for a result that won't arrive, effectively stalling sync. - When a batch imports blocks or the chain starts syncing with a local finalized epoch greater that the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it represents more accurately the progress of the chain. - Batch status peers from the same chain to reduce Arc access. - Handle a couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know. - Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed) - When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers. - Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code) - In chain_collection, store chains by their id in a map - Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses - Other stuff: - impl `slog::KV` for batches - impl `slog::KV` for syncing chains - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug` ### Optimistic syncing: Try first the batch that contains the current head, if the batch imports any block, advance the chain. If not, if this optimistic batch is inside the current processing window leave it there for future use, if not drop it. The tolerance for this block is the same for downloading, but just once for processing Co-authored-by: Age Manning <Age@AgeManning.com>	2020-09-23 06:29:55 +00:00
Age Manning	80e52a0263	Subscribe to core topics after sync (#1613 ) ## Issue Addressed N/A ## Proposed Changes Prevent subscribing to core gossipsub topics until after we have achieved a full sync. This prevents us censoring gossipsub channels, getting penalised in gossipsub 1.1 scoring and saves us computation time in attempting to validate gossipsub messages which we will be unable to do with a non-sync'd chain.	2020-09-23 03:26:33 +00:00
Pawan Dhananjay	80ecafaae4	Add `--staking` flag (#1641 ) ## Issue Addressed Closes #1472 ## Proposed Changes Add `--staking` ~~and`staking-with-eth1-endpoint`~~ flag to improve UX for stakers. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-09-23 01:19:58 +00:00
realbigsean	b75df29501	minimize the number of places we are calling `update_pubkey_cache` (#1626 ) ## Issue Addressed - Resolves #1080 ## Proposed Changes - Call `update_pubkey_cache` only in the `build_all_caches` method and `get_validator_index` method. ## Additional Info This does reduce the number of places the cache is updated, making it simpler. But the `get_validator_index` method is used a couple times when we are iterating through the entire validator registry (or set of active validators). Before, we would only call `update_pubkey_cache` once before iterating through all validators. So I'm not _totally_ sure this change is worth it.	2020-09-23 01:19:56 +00:00
Pawan Dhananjay	a97ec318c4	Subscribe to subnets an epoch in advance (#1600 ) ## Issue Addressed N/A ## Proposed Changes Subscibe to subnet an epoch in advance of the attestation slot instead of 4 slots in advance.	2020-09-22 07:29:34 +00:00
Paul Hauner	d85d5a435e	Bump to v0.2.11 (#1645 ) ## Issue Addressed NA ## Proposed Changes - Bump version to v0.2.11 - Run `cargo update`. ## Additional Info NA	2020-09-22 04:45:15 +00:00
Paul Hauner	bd39cc8e26	Apply hotfix for inconsistent head (#1639 ) ## Issue Addressed - Resolves #1616 ## Proposed Changes If we look at the function which persists fork choice and the canonical head to disk: `1db8daae0c/beacon_node/beacon_chain/src/beacon_chain.rs (L234-L280)` There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync. I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch. These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice. ## Additional Info This is hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions.	2020-09-22 02:06:10 +00:00
Pawan Dhananjay	14ff38539c	Add trusted peers (#1640 ) ## Issue Addressed Closes #1581 ## Proposed Changes Adds a new cli option for trusted peers who always have the maximum possible score.	2020-09-22 01:12:36 +00:00
Michael Sproul	5d17eb899f	Update LevelDB to v0.8.6, removing patch (#1636 ) Removes our dependency on a fork of LevelDB now that https://github.com/skade/leveldb-sys/pull/17 is merged	2020-09-21 11:53:53 +00:00
Age Manning	1db8daae0c	Shift metadata to the global network variables (#1631 ) ## Issue Addressed N/A ## Proposed Changes Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse. ## Additional Info N/A	2020-09-21 02:00:38 +00:00
Pawan Dhananjay	7b97c4ad30	Snappy additional sanity checks (#1625 ) ## Issue Addressed N/A ## Proposed Changes Adds the following check from the spec > A reader SHOULD NOT read more than max_encoded_len(n) bytes after reading the SSZ length-prefix n from the header.	2020-09-21 01:06:25 +00:00
Paul Hauner	371e1c1d5d	Bump version to v0.2.10 (#1630 ) ## Issue Addressed NA ## Proposed Changes Bump crate version so we can cut a new release with the fix from #1629. ## Additional Info NA	2020-09-18 06:41:29 +00:00
Paul Hauner	a17f74896a	Fix bad assumption when checking finalized descendant (#1629 ) ## Issue Addressed - Resolves #1616 ## Proposed Changes Fixes a bug where we are unable to read the finalized block from fork choice. ## Detail I had made an assumption that the finalized block always has a parent root of `None`: `e5fc6bab48/consensus/fork_choice/src/fork_choice.rs (L749-L752)` This was a faulty assumption, we don't set parent roots to `None`. Instead we sometimes set parent indices to `None`, depending if this pruning condition is satisfied: `e5fc6bab48/consensus/proto_array/src/proto_array.rs (L229-L232)` The bug manifested itself like this: 1. We attempt to get the finalized block from fork choice 1. We try to check that the block is descendant of the finalized block (note: they're the same block). 1. We expect the parent root to be `None`, but it's actually the parent root of the finalized root. 1. We therefore end up checking if the parent of the finalized root is a descendant of itself. (note: it's an ancestor not a descendant). 1. We therefore declare that the finalized block is not a descendant of (or eq to) the finalized block. Bad. ## Additional Info In reflection, I made a poor assumption in the quest to obtain a probably negligible performance gain. The performance gain wasn't worth the risk and we got burnt.	2020-09-18 05:14:31 +00:00
Age Manning	49ab414594	Shift gossipsub validation (#1612 ) ## Issue Addressed N/A ## Proposed Changes This will consider all gossipsub messages that have either the `from`, `seqno` or `signature` field as invalid. ## Additional Info We should not merge this until all other clients have been sending empty fields for a while. See https://github.com/ethereum/eth2.0-specs/issues/1981 for reference	2020-09-18 02:05:36 +00:00
Age Manning	2074beccdc	Gossipsub message id to shortened bytes (#1607 ) ## Issue Addressed https://github.com/ethereum/eth2.0-specs/pull/2044 ## Proposed Changes Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field. ## Additional Info We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT) messages from clients that also haven't made this update.	2020-09-18 02:05:34 +00:00
Age Manning	c9596fcf0e	Temporary Sync Work-Around (#1615 ) ## Issue Addressed #1590 ## Proposed Changes This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.	2020-09-13 23:58:49 +00:00
Age Manning	c6abc56113	Prevent large step-size parameters (#1583 ) ## Issue Addressed Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadraticaly weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups. ## Proposed Changes Weight the step parameter in the RPC filter and prevent any overflows that effect us in the step parameter. ## Additional Info	2020-09-11 02:33:36 +00:00
blacktemplar	7f1b936905	ignore too early / too late attestations instead of penalizing them (#1608 ) ## Issue Addressed NA ## Proposed Changes This ignores attestations that are too early or too late as it is specified in the spec (see https://github.com/ethereum/eth2.0-specs/blob/v0.12.1/specs/phase0/p2p-interface.md#global-topics first subpoint of `beacon_aggregate_and_proof`)	2020-09-11 01:43:15 +00:00
Pawan Dhananjay	0525876882	Dial cached enr's before making subnet discovery query (#1376 ) ## Issue Addressed Closes #1365 ## Proposed Changes Dial peers in the `cached_enrs` who aren't connected, aren't banned and satisfy the subnet predicate before making a subnet discovery query.	2020-09-11 00:52:27 +00:00
Age Manning	d79366c503	Prevent printing binary in RPC errors (#1604 ) ## Issue Addressed #1566 ## Proposed Changes Prevents printing binary characters in the RPC error response from peers.	2020-09-10 04:43:22 +00:00
Age Manning	b19cf02d2d	Penalise bad peer behaviour (#1602 ) ## Issue Addressed #1386 ## Proposed Changes Penalises peers in our scoring system that produce invalid attestations or blocks.	2020-09-10 03:51:06 +00:00
Paul Hauner	0821e6b39f	Bump version to v0.2.9 (#1598 ) ## Issue Addressed NA ## Proposed Changes - Bump version tags - Run `cargo update` ## Additional Info NA	2020-09-09 02:28:35 +00:00
Pawan Dhananjay	00cdc4bb35	Update state before producing attestation (#1596 ) ## Issue Addressed Partly addresses #1547 ## Proposed Changes This fix addresses the missing attestations at slot 0 of an epoch (also sometimes slot 1 when slot 0 was skipped). There are 2 cases: 1. BN receives the block for the attestation slot after 4 seconds (1/3rd of the slot). 2. No block is proposed for this slot. In both cases, when we produce the attestation, we pass the head state to the `produce_unaggregated_attestation_for_block` function here `9833eca024/beacon_node/beacon_chain/src/beacon_chain.rs (L845-L850)` Since we don't advance the state in this function, we set `attestation.data.source = state.current_justified_checkpoint` which is atleast 2 epochs lower than current_epoch(wall clock epoch). This attestation is invalid and cannot be included in a block because of this assert from the spec: ```python if data.target.epoch == get_current_epoch(state): assert data.source == state.current_justified_checkpoint state.current_epoch_attestations.append(pending_attestation) ``` https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestations This PR changes the `produce_unaggregated_attestation_for_block` function to ensure that it advances the state before producing the attestation at the new epoch. Running this on my node, have missed 0 attestations across all 8 of my validators in a 100 epoch period 🎉 To compare, I was missing ~14 attestations across all 8 validators in the same 100 epoch period before the fix. Will report missed attestations if any after running for another 100 epochs tomorrow.	2020-09-08 11:25:43 +00:00
Daniel Schonfeld	2a9a815f29	conforming to the p2p specs, requiring error_messages to be bound (#1593 ) ## Issue Addressed #1421 ## Proposed Changes Bounding the error_message that can be returned for RPC domain errors Co-authored-by: Age Manning <Age@AgeManning.com>	2020-09-07 06:47:05 +00:00
Age Manning	a6376b4585	Update discv5 to v10 (#1592 ) ## Issue Addressed Code improvements, dependency improvements and better async handling.	2020-09-07 05:53:20 +00:00
Sean	638daa87fe	Avoid Printing Binary String to Logs (#1576 ) Converts the graffiti binary data to string before printing to logs. ## Issue Addressed #1566 ## Proposed Changes Rather than converting graffiti to a vector the binary data less the last character is passed to String::from_utf_lossy(). This then allows us to call the to_string() function directly to give us the string ## Additional Info Rust skills are fairly weak	2020-09-05 05:46:25 +00:00
Age Manning	fb9d828e5e	Extended Gossipsub metrics (#1577 ) ## Issue Addressed N/A ## Proposed Changes Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topics, subscriptions and peer scores. ## Additional Info	2020-09-01 06:59:14 +00:00
Pawan Dhananjay	adea7992f8	Eth1 network exit on wrong network id (#1563 ) ## Issue Addressed Fixes #1509 ## Proposed Changes Exit the beacon node if the eth1 endpoint points to an invalid eth1 network. Check the network id before every eth1 cache update and display an error log if the network id has changed to an invalid one.	2020-08-31 02:36:17 +00:00
blacktemplar	c18d37c202	Use Gossipsub 1.1 (#1516 ) ## Issue Addressed #1172 ## Proposed Changes * updates the libp2p dependency * small adaptions based on changes in libp2p * report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-30 13:06:50 +00:00
Paul Hauner	967700c1ff	Bump version to v0.2.8 (#1572 ) ## Issue Addressed NA ## Proposed Changes - Bump versions - Run `cargo update` ## Additional Info NA	2020-08-27 07:04:12 +00:00
Adam Szkoda	d9f4819fe0	Alternative (to BeaconChainHarness) BeaconChain testing API (#1380 ) The PR: * Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots): ![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png) * New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots. * Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner. * Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework * Rewrites existing fork pruning tests to use the new API * Adds a tests that triggers finalization at a non epoch boundary slot * Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing * Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-26 09:24:55 +00:00
Michael Sproul	4763f03dcc	Fix bug in database pruning (#1564 ) ## Issue Addressed Closes #1488 ## Proposed Changes * Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain. * Add `debug` logging to the pruning algorithm so we have so better chance of debugging future issues from logs. * Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell). * Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization` * Delete useless and unused `max_finality_distance` * Add tests that exercise pruning with shared states at skip slots * Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that) * Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it? ## Additional Info Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉	2020-08-26 00:01:06 +00:00
Paul Hauner	dfd02d6179	Bump to v0.2.7 (#1561 ) ## Issue Addressed NA ## Proposed Changes - Update to v0.2.7 - Add script to make update easy. ## Additional Info NA	2020-08-24 08:25:34 +00:00
Paul Hauner	3569506acd	Remove rayon from rest_api (#1562 ) ## Issue Addressed NA ## Proposed Changes Addresses a deadlock condition described here: https://hackmd.io/ijQlqOdqSGaWmIo6zMVV-A?view ## Additional Info NA	2020-08-24 07:28:54 +00:00
Paul Hauner	c895dc8971	Shift HTTP server heavy-lifting to blocking executor (#1518 ) ## Issue Addressed NA ## Proposed Changes Shift practically all HTTP endpoint handlers to the blocking executor (some very light tasks are left on the core executor). ## Additional Info This PR covers the `rest_api` which will soon be refactored to suit the standard API. As such, I've cut a few corners and left some existing issues open in this patch. What I have done here should leave the API in state that is not necessary exactly the same, but good enough for us to run validators with. Specifically, the number of blocking workers that can be spawned is unbounded and I have not implemented a queue; this will need to be fixed when we implement the standard API.	2020-08-24 03:06:10 +00:00
blacktemplar	2bc9115a94	reuse beacon_node methods for initializing network configs in boot_node (#1520 ) ## Issue Addressed #1378 ## Proposed Changes Boot node reuses code from beacon_node to initialize network config. This also enables using the network directory to store/load the enr and the private key. ## Additional Info Note that before this PR the port cli arguments were off (the argument was named `enr-port` but used as `boot-node-enr-port`). Therefore as port always the cli port argument was used (for both enr and listening). Now the enr-port argument can be used to overwrite the listening port as the public port others should connect to. Last but not least note, that this restructuring reuses `ethlibp2p::NetworkConfig` that has many more options than the ones used in the boot node. For example the network config has an own `discv5_config` field that gets never used in the boot node and instead another `Discv5Config` gets created later in the boot node process. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-21 12:00:01 +00:00
blacktemplar	3f0a113c7f	ban IP addresses if too many banned peers for this IP address (#1543 ) ## Issue Addressed #1283 ## Proposed Changes All peers with the same IP will be considered banned as long as there are more than 5 (constant) peers with this IP that have a score below the ban threshold. As soon as some of those 5 peers get unbanned (through decay) and if there are then less than 5 peers with a score below the threshold the IP will be considered not banned anymore.	2020-08-21 01:41:12 +00:00
Paul Hauner	ebb25b5569	Bump version to v0.2.6 (#1549 ) ## Issue Addressed NA ## Proposed Changes See title. ## Additional Info NA	2020-08-19 09:31:01 +00:00
Pawan Dhananjay	bbed42f30c	Refactor attestation service (#1415 ) ## Issue Addressed N/A ## Proposed Changes Refactor attestation service to send out requests to find peers for subnets as soon as we get attestation duties. Earlier, we had much more involved logic to send the discovery requests to the discovery service only 6 slots before the attestation slot. Now that discovery is much smarter with grouped queries, the complexity in attestation service can be reduced considerably. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-19 08:46:25 +00:00
divma	fdc6e2aa8e	Shutdown like a Sir (#1545 ) ## Issue Addressed #1494 ## Proposed Changes - Give the TaskExecutor the sender side of a channel that a task can clone to request shutting down - The receiver side of this channel is in environment and now we block until ctrl+c or an internal shutdown signal is received - The swarm now informs when it has reached 0 listeners - The network receives this message and requests the shutdown	2020-08-19 05:51:14 +00:00
Paul Hauner	8e7dd7b2b1	Add remaining network ops to queuing system (#1546 ) ## Issue Addressed NA ## Proposed Changes - Refactors the `BeaconProcessor` to remove some excessive nesting and file bloat - Sorry about the noise from this, it's all contained in 4d3f8c5 though. - Adds exits, proposer slashings, attester slashings to the `BeaconProcessor` so we don't get overwhelmed with large amounts of slashings (which happened a few hours ago). ## Additional Info NA	2020-08-19 05:09:53 +00:00
Age Manning	33b2a3d0e0	Version bump to v0.2.5 (#1540 ) ## Description Version bumps lighthouse to v0.2.5	2020-08-18 11:23:08 +00:00
Paul Hauner	93b7c3b7ff	Set default max skips to 700 (#1542 ) ## Issue Addressed NA ## Proposed Changes Sets the default max skips to 700 so that it can cover the 693 slot skip from `80894 - 80201`. ## Additional Info NA	2020-08-18 09:27:04 +00:00
Age Manning	2d0b214b57	Clean up logs (#1541 ) ## Description This PR improves some logging for the end-user. It downgrades some warning logs and removes the slots per second sync speed if we are syncing and the speed is 0. This is likely because we are syncing from a finalised checkpoint and the head doesn't change.	2020-08-18 08:11:39 +00:00
Paul Hauner	d4f763bbae	Fix mistake with attestation skip slots (#1539 ) ## Issue Addressed NA ## Proposed Changes - Fixes a mistake I made in #1530 which resulted us in not rejecting attestations that we intended to reject. - Adds skip-slot checks for blocks earlier in import process, so it rejects gossip and RPC blocks. ## Additional Info NA	2020-08-18 06:28:26 +00:00
Age Manning	e1e5002d3c	Fingerprint Lodestar (#1536 ) Fingerprints the Lodestar client	2020-08-18 06:28:24 +00:00
Age Manning	8311074d68	Purge out-dated head chains on chain completion (#1538 ) ## Description There can be many head chains queued up to complete. Currently we try and process all of these to completion before we consider the node synced. In a chaotic network, there can be many of these and processing them to completion can be very expensive and slow. This PR removes any non-syncing head chains from the queue, and re-status's the peers. If, after we have synced to head on one chain, there is still a valid head chain to download, it will be re-established once the status has been returned. This should assist with getting nodes to sync on medalla faster.	2020-08-18 05:22:34 +00:00
Age Manning	3bb30754d9	Keep track of failed head chains and prevent re-lookups (#1534 ) ## Overview There are forked chains which get referenced by blocks and attestations on a network. Typically if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, many peers are on many chains, the chains can be very deep and become time consuming performing lookups. This PR adds a cache to known failed chain lookups. This prevents us from starting a parent-lookup (or stopping one half way through) if we have attempted the chain lookup in the past.	2020-08-18 03:54:09 +00:00
Age Manning	cc44a64d15	Limit parallelism of head chain sync (#1527 ) ## Description Currently lighthouse load-balances across peers a single finalized chain. The chain is selected via the most peers. Once synced to the latest finalized epoch Lighthouse creates chains amongst its peers and syncs them all in parallel amongst each peer (grouped by their current head block). This is typically fast and relatively efficient under normal operations. However if the chain has not finalized in a long time, the head chains can grow quite long. Peer's head chains will update every slot as new blocks are added to the head. Syncing all head chains in parallel is a bottleneck and highly inefficient in block duplication leads to RPC timeouts when attempting to handle all new heads chains at once. This PR limits the parallelism of head syncing chains to 2. We now sync at most two head chains at a time. This allows for the possiblity of sync progressing alongside a peer being slow and holding up one chain via RPC timeouts.	2020-08-18 02:49:24 +00:00
divma	46dbf027af	Do not reset batch ids & redownload out of range batches (#1528 ) The changes are somewhat simple but should solve two issues: - When quickly changing between chains once and a second time back again, batchIds would collide and cause havoc. - If we got an out of range response from a peer, sync would remain in syncing but without advancing Changes: - remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH - for the same reason, move the "to_be_downloaded_id" to be an epoch - remove a sneaky line that dropped an out of range batch without downloading it - bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed	2020-08-18 01:29:51 +00:00
Paul Hauner	9a97a0b14f	Prepare for v0.2.4 (#1533 ) ## Issue Addressed NA ## Proposed Changes NA ## Additional Info NA	2020-08-17 12:13:42 +00:00
Michael Sproul	719a69aee0	Ignore blocks that skip a large distance from their parent (#1530 ) ## Proposed Changes To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-08-17 10:54:58 +00:00
Paul Hauner	a58aa6ee55	Revert back to discv5 alpha 8 to maintain ARM support (#1531 ) ## Issue Addressed NA ## Proposed Changes See title. ## Additional Info NA	2020-08-17 10:06:08 +00:00
Paul Hauner	f85485884f	Process gossip blocks on the GossipProcessor (#1523 ) ## Issue Addressed NA ## Proposed Changes Moves beacon block processing over to the newly-added `GossipProcessor`. This moves the task off the core executor onto the blocking one. ## Additional Info - With this PR, gossip blocks are being ignored during sync.	2020-08-17 09:20:27 +00:00
Paul Hauner	61d5b592cb	Memory usage reduction (#1522 ) ## Issue Addressed NA ## Proposed Changes - Adds a new function to allow getting a state with a bad state root history for attestation verification. This reduces unnecessary tree hashing during attestation processing, which accounted for 23% of memory allocations (by bytes) in a recent `heaptrack` observation. - Don't clone caches on intermediate epoch-boundary states during block processing. - Reject blocks that are known to fork choice earlier during gossip processing, instead of waiting until after state has been loaded (this only happens in edge-case). - Avoid multiple re-allocations by creating a "forced" exact size iterator. ## Additional Info NA	2020-08-17 08:05:13 +00:00
Age Manning	3c689a6837	Remove yamux support (#1526 ) ## Issue Addressed There is currently an issue with yamux when connecting to prysm peers. The source of the issue is currently unknown. This PR removes yamux support to force mplex negotation. We can add back yamux support once we have isolated and corrected the issue.	2020-08-17 05:05:06 +00:00
Age Manning	afdc4fea1d	Correct logic for peer sync identification (#1525 ) Fix a small sync bug which can mis-classify newly connected peers.	2020-08-17 03:00:10 +00:00
Pawan Dhananjay	850a2d5985	Persist metadata and enr across restarts (#1513 ) ## Issue Addressed Resolves #1489 ## Proposed Changes - Change starting metadata seq num to 0 according to the [spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#metadata). - Remove metadata field from `NetworkGlobals` - Persist metadata to disk on every update - Load metadata seq number from disk on restart - Persist enr to disk on update to ensure enr sequence number increments are persisted as well. ## Additional info Since we modified starting metadata seq num to 0 from 1, we might still see `Invalid Sequence number provided` like in #1489 from prysm nodes if they have our metadata cached.	2020-08-17 02:13:28 +00:00
divma	113b40f321	Add multiaddr support in bootnodes (#1481 ) ## Issue Addressed #1384 Only catch, as currently implemented, when dialing the multiaddr nodes, there is no way to ask the peer manager if they are already connected or dialing	2020-08-17 02:13:26 +00:00
Age Manning	99acfb50f2	Update gossipsub duplicate cache (#1524 ) This potentially handles memory leak issues by preventing adding references to already seen gossipsub messages.	2020-08-17 01:27:33 +00:00
Age Manning	c75c06cf16	Update discv5 to alpha.9 (#1517 ) ## Discovery v5 update In this update we remove the openssl dependency in favour of rust-crypto. The update also removes a series of unnecessary async functions which may improve some of the issues we have been experiencing.	2020-08-15 04:02:14 +00:00
Paul Hauner	f4a7311008	Update to v0.2.3 (#1519 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v0.2.3. ## Additional Info NA	2020-08-14 08:32:31 +00:00
Paul Hauner	619ad106cf	Restrict fork choice getters to finalized blocks (#1475 ) ## Issue Addressed - Resolves #1451 ## Proposed Changes - Restricts the `contains_block` and `contains_block` so they only indicate a block is present if it descends from the finalized root. This helps to ensure that fork choice never points to a block that has been pruned from the database. - Resolves #1451 - Before importing a block, double-check that its parent is known and a descendant of the finalized root. - Split a big, monolithic block verification test into smaller tests. ## Additional Notes I suspect there would be a craftier way to do the `is_descendant_of_finalized` check, but we're a bit tight on time now and we can optimize later if it starts showing in benches. ## TODO - [x] Tests	2020-08-14 06:36:38 +00:00
Paul Hauner	b0a3731fff	Introduce a queue for attestations from the network (#1511 ) ## Issue Addressed N/A ## Proposed Changes Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`. Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load. ## Additional Info TBC	2020-08-14 04:38:45 +00:00
Adam Szkoda	05a8399769	Wind down the SSE thread when the client disconnects (#1514 ) These started to appear when I `^C` `curl -N http://localhost:5052/beacon/fork/stream`: `Aug 12 13:00:01.539 ERRO Couldn't stream piece hyper::Error(ChannelClosed), service: http` Something must have changed in hyper since SSE has been implemented because I'm sure I haven't seen those errors before. This PR properly detects a closed SSE stream and cleans up.	2020-08-13 06:12:18 +00:00
Adam Szkoda	8a1a4051cf	Fix a bug in fork pruning (#1507 ) Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-12 07:00:00 +00:00
Paul Hauner	b063df5bf9	Cross-compile to vendored x86_84, aarch64 (Raspberry Pi 4) (#1497 ) ## Issue Addressed NA ## Proposed Changes Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images. Provides quite clean and simple cross-compiles cause all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group though. ## Details - Adds shortcut commands to `Makefile` - Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared). - Switches to a [commit](`284f705964`) of blst that has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21). - Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`. - Off-topic, but adds extra message about Windows support as suggested by Discord user. ## Additional Info - ~~Blocked on #1495~~ - There are no tests in CI for this yet for a few reasons: - I'm hesitant to add more long-running tasks. - Short-term bitrot should be avoided since we'll use it each release. - In the long term I think it would be good to automate binary creation on a release. - I observed the binaries increase in size from 50mb to 52mb after these changes.	2020-08-11 05:16:30 +00:00
divma	1a67d15701	Mitigate too many outgoing connections (#1469 ) limit simultaneous outgoing connections attempts to a reasonable top as an extra layer of protection also shift the keep alive logic of the rpc handler to avoid needing to update it by hand. I think In rare cases this could make shutting down a connection a bit faster.	2020-08-11 02:16:31 +00:00
realbigsean	ec84183e05	Add graffiti cli flag to the validator client. (#1425 ) ## Issue Addressed #1419 ## Proposed Changes Creates a `--graffiti` cli flag in the validator client. If the flag is set, it overrides graffiti in the beacon node. ## Additional Info	2020-08-11 02:16:29 +00:00
divma	95b55d7170	Block error display (#1503 ) ## Issue Addressed #1486	2020-08-11 01:30:26 +00:00
Age Manning	134676fd6f	Version bump to v0.2.2 (#1496 ) Version bump to v0.2.2	2020-08-10 06:49:03 +00:00
Age Manning	cbfae87aa6	Upgrade logs (#1495 ) ## Issue Addressed #1483 ## Proposed Changes Upgrades the log to a critical if a listener fails. We are able to listen on many interfaces so a single instance is not critical. We should however gracefully shutdown the client if we have no listeners, although the client can still function solely on outgoing connections. For now a critical is raised and I leave #1494 for more sophisticated handling of this. This also updates discv5 to handle errors of binding to a UDP socket such that lighthouse is now able to handle them.	2020-08-10 05:19:51 +00:00
Age Manning	04e4389efe	Patch gossipsub (#1490 ) ## Issue Addressed Some nodes not following head, high CPU usage and HTTP API delays ## Proposed Changes Patches gossipsub. Gossipsub was using an `lru_time_cache` to check for duplicates. This contained an `O(N)` lookup for every gossipsub message to update the time cache. This was causing high cpu usage and blocking network threads. This PR introduces a custom cache without `O(N)` inserts. This also adds built in safety mechanisms to prevent gossipsub from excessively retrying connections upon failure. A maximum limit is set after which we disconnect from the node from too many failed substream connections.	2020-08-08 08:09:04 +00:00
Age Manning	08a31c5a1a	Disconnect peers (#1484 ) ## Issue Addressed Peers that connected after the peer limit may remain connected in some circumstances. This ensures peers not in the peer manager's list get disconnected. Further logging is also added to track this behaviour.	2020-08-08 06:08:44 +00:00
Age Manning	a1f9769040	Libp2p update (#1482 ) Updates to latest libp2p master. This now has native noise support. This PR - Removes secio support - Prioritises mplex over yamux	2020-08-08 02:17:32 +00:00
Paul Hauner	0b287f6ece	Push naive attestations into op pool (#1466 ) ## Issue Addressed NA ## Proposed Changes - When producing a block, go and ensure every attestation in the naive aggregation pool is included in the operation pool. This should help us increase the number of useful attestations in a block. - Lift the `RwLock`s inside `NaiveAggregationPool` up into a single high-level lock. There were race conditions in the existing setup and it was hard to reason about. ## Additional Info NA	2020-08-06 07:26:46 +00:00
divma	7d87e11e0f	Fix rpc coded response display (#1470 ) Prevent errors to be printed in debug mode	2020-08-06 04:29:23 +00:00
Pawan Dhananjay	983f768034	Remove ssz encoding support from rpc (#1457 ) ## Issue Addressed Partially resolves #1422 ## Proposed Changes Remove ssz encoding from req/resp in rpc.	2020-08-06 04:29:19 +00:00
divma	138c0cf7f0	Remove block clone (#1448 ) ## Issue Addressed #1028 A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed	2020-08-06 04:29:17 +00:00
Age Manning	09a615b2c0	Lighthouse crate v0.2.0 bump (#1450 ) ## Description This PR marks Lighthouse v0.2.0. This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.	2020-08-06 03:43:05 +00:00
divma	924ba66218	Update v0.12.2 gossip params (#1449 ) ## Issue Addressed #1422	2020-08-06 00:04:33 +00:00
Paul Hauner	5629126f45	Add reason to invalid attestation log (#1460 ) ## Issue Addressed NA ## Proposed Changes Adds an extra field to a debug log so we can see why an attestation was invalid. ## Additional Info NA	2020-08-05 01:49:52 +00:00
Paul Hauner	f26adc0a36	Lighthouse v0.2.0 (Medalla) (#1452 ) ## Issue Addressed NA ## Proposed Changes - Moves the git-based versioning we were doing into the `lighthouse_version` crate in `common`. - Removes the `beacon_node/version` crate, replacing it with `lighthouse_version`. - Bumps the version to `v0.2.0`. ## Additional Info There are now two types of version string: 1. `const VERSION: &str = Lighthouse/v0.2.0-1419501f2+` 1. `version_with_platform() = Lighthouse/v0.2.0-1419501f2+/x86_64-linux` (1) is handy cause it's a `const` and shorter. (2) has platform info so it's more useful. Note that the plus-sign (`+`) indicates the the git commit is dirty (it used to be `(modified)` but I had to shorten it to fit into graffiti). These version strings are now included on: - `lighthouse --version` - `lcli --version` - `curl localhost:5052/node/version` - p2p messages when we communicate our version You can update the version by changing this constant (version is not related to a `Cargo.toml`): `b9ad7102d5/common/lighthouse_version/src/lib.rs (L4-L15)`	2020-08-04 07:44:53 +00:00
divma	1bbecbcf26	Track gossip subscriptions as a metric (#1445 ) ## Issue Addressed #1399 ## Proposed Changes Set an Int gauge per topic and inc/dec when peers subscribe/unsubscribe	2020-08-04 04:18:10 +00:00
Age Manning	31707ccf45	Shift author to sigma prime on some crates (#1440 ) Shifts the author to sigma prime on some crates	2020-08-04 02:31:41 +00:00
Age Manning	1419501f2e	Update peerdb constants (#1444 ) Increases the cache for disconnected and banned peers.	2020-08-03 12:48:22 +00:00
Age Manning	37679b8898	Update score decay behaviour (#1442 )	2020-08-03 20:46:08 +10:00
Age Manning	f634f073a8	Correct issue with network message passing (#1439 ) ## Issue Addressed Sync was breaking occasionally. The root cause appears to be identify crashing as events we being sent to the protocol after nodes were banned. Have not been able to reproduce sync issues since this update. ## Proposed Changes Only send messages to sub-behaviour protocols if the peer manager thinks the peer is connected. All other messages are dropped.	2020-08-03 09:35:53 +00:00
Age Manning	3b5da8f35f	Gossipsub update (#1432 ) ## Issue Addressed The most recent gossipsub update had an issue where some privacy settings lead to not sending a sequence number with the message. Although Lighthouse treats these as valid (based on current configuration) other clients may not. This corrects gossipsub to send sequence numbers where expected and based on the configuration settings.	2020-08-02 13:19:56 +00:00
divma	4d77784bb8	Rate limit RPC requests (#1402 ) ## Issue Addressed #1056 ## Proposed Changes - Add a rate limiter to the RPC behaviour. This also means the rate limiting occurs just before the door to the application level, so the number of connections a peer opens does not affect this (this would happen in the future if put on the handler) - The algorithm used is the leaky bucket as a meter / token bucket implemented the GCRA way - Each protocol has its own limit. Due to the way the algorithm works, the "small" protocols have a hard limit, while bbrange and bbroot allow [burstiness](https://www.wikiwand.com/en/Burstiness). This is so that a peer can't request hundreds of individual requests expecting only one block in a short period of time, it also allows a peer to send two half size requests instead of one with max if they want to without getting limited, and.. it also allows a peer to request a batch of the maximum size and then send _appropriately spaced_ requests of really small sizes. From what I've seen in sync this is plausible when reaching the target slot. ## Additional Info Needs to be heavily tested	2020-07-31 05:47:09 +00:00

1 2 3 4 5 ...

1490 Commits