lighthouse

Author	SHA1	Message	Date
Paul Hauner	bf4e02e2cc	Return a specific error for frozen attn states (#2384 ) ## Issue Addressed NA ## Proposed Changes Return a very specific error when an attestation reads shuffling from a frozen `BeaconState`. Previously, this returned `MissingBeaconState`, which indicates a much more serious issue. ## Additional Info Since `get_inconsistent_state_for_attestation_verification_only` is only called once in `BeaconChain::with_committee_cache`, it is quite easy to reason about the impact of this change.	2021-06-01 06:59:43 +00:00
Paul Hauner	ba9c4c5eea	Return more detail in Eth1 HTTP errors (#2383 ) ## Issue Addressed NA ## Proposed Changes Whilst investigating #2372, I [learned](https://github.com/sigp/lighthouse/issues/2372#issuecomment-851725049) that the error messages returned from some failed Eth1 requests are always `NotReachable`. This makes debugging quite painful. This PR adds more detail to these errors. For example: - Bad Infura key: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK: 401 Unauthorized.\"))", retry_millis: 60000, service: eth1_rpc` - Unreachable server: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Request failed: reqwest::Error { kind: Request, url: Url { scheme: \\\"http\\\", cannot_be_a_base: false, username: \\\"\\\", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8545), path: \\\"/\\\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\\\"tcp connect error\\\", Os { code: 111, kind: ConnectionRefused, message: \\\"Connection refused\\\" })) }\"))", retry_millis: 60000, service: eth1_rpc` - Bad server: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK: 501 Not Implemented.\"))", retry_millis: 60000, service: eth1_rpc` ## Additional Info NA	2021-06-01 06:59:41 +00:00
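A minimal sketch of the idea behind this change, assuming a simplified `RpcError` enum and a `check_response_status` helper (both hypothetical; the eth1 crate may structure this differently). The point is that the HTTP status travels inside the error variant instead of being collapsed into a generic `NotReachable`; the message format mirrors the log lines above.

```rust
/// Hypothetical, simplified error type: the variant carries a human-readable
/// description of what actually went wrong.
#[derive(Debug, PartialEq)]
pub enum RpcError {
    RequestFailed(String),
}

/// Turn a non-200 response into an error that names the status code and reason.
pub fn check_response_status(status: u16, reason: &str) -> Result<(), RpcError> {
    if status == 200 {
        Ok(())
    } else {
        Err(RpcError::RequestFailed(format!(
            "Response HTTP status was not 200 OK: {} {}.",
            status, reason
        )))
    }
}
```

Because the detail lives in the error value, any caller that logs it with `{:?}` gets the status for free, as in the "Bad Infura key" example.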
Paul Hauner	4c7bb4984c	Use the forwards iterator more often (#2376 ) ## Issue Addressed NA ## Primary Change When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) that seemed to persist indefinitely. After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched `BeaconChain::block_root_at_slot` to use the forwards iterator instead of the reverse one. I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function. Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before, it felt like it took 1-2s; now it feels instant). ## Additional Changes In the process I also noticed that we have two functions for getting block roots: - `BeaconChain::block_root_at_slot`: returns `None` for a skip slot. - `BeaconChain::root_at_slot`: returns the previous root for a skip slot. I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way. Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it 🗑️. ## Additional Info I haven't done the same for state roots here. Whilst it's possible and a good idea, it's more work since the forwards iterators are presently specific to block roots. 
Whilst there are a few places where a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're nowhere near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here: https://github.com/sigp/lighthouse/issues/2377.	2021-05-31 04:18:20 +00:00
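A minimal sketch of how an explicit skip-slot policy can work, assuming roots are stored per slot the way `BeaconState::block_roots` does it (a skipped slot repeats the previous slot's root). The `Hash256` alias and the variant names `None`/`Prev` are assumptions for illustration, and a plain slice stands in for the forwards iterator over the database.

```rust
/// Hypothetical stand-in for a 32-byte block root.
pub type Hash256 = [u8; 32];

/// How the caller wants a skipped slot to be treated.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WhenSlotSkipped {
    /// Return `None` if no block was produced at the requested slot.
    None,
    /// Return the root of the most recent block at or before the slot.
    Prev,
}

/// `roots[slot]` is the root of the latest block at or before `slot`, so a
/// skipped slot repeats the root of the slot before it.
pub fn block_root_at_slot(roots: &[Hash256], slot: usize, skips: WhenSlotSkipped) -> Option<Hash256> {
    let root = *roots.get(slot)?;
    match skips {
        WhenSlotSkipped::Prev => Some(root),
        WhenSlotSkipped::None => {
            // A slot was skipped if its root equals the previous slot's root.
            let skipped = slot > 0 && roots[slot - 1] == root;
            if skipped {
                None
            } else {
                Some(root)
            }
        }
    }
}
```

Making the caller pick a variant means the two old behaviours (`None` for a skip vs. the previous root) can no longer be confused at the call site.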
Kevin Lu	320a683e72	Minimum Outbound-Only Peers Requirement (#2356 ) ## Issue Addressed #2325 ## Proposed Changes This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. ## Additional Info A unit test for the heartbeat function checks that the disconnection behavior is correct. Continually querying for peers while the outbound-only target hasn't been met is not tested directly, but it is tested indirectly through a unit test of the helper function that counts the number of outbound-only peers. EDIT: I am concerned about the behavior of `update_peer_scores`. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. ```rust let previous_state = info.score_state(); // Update scores info.score_update(); Self::handle_score_transitions( previous_state, peer_id, info, &mut to_ban_peers, &mut to_unban_peers, &mut self.events, &self.log, ); ``` `previous_state` will be set to Disconnected, and then because `handle_score_transitions` only changes the connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure how often this scenario would realistically occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. Co-authored-by: Kevin Lu <kevlu93@gmail.com>	2021-05-31 04:18:19 +00:00
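A minimal sketch of the pruning rule described above, assuming a hypothetical `Peer` record and hypothetical `peers_to_prune` / `needs_more_outbound` helpers rather than the peer manager's real types. Peers are considered worst-score first, and an outbound-only peer is skipped whenever removing it would take the outbound-only count below the minimum.

```rust
/// Hypothetical, trimmed-down peer record.
#[derive(Clone, Debug)]
pub struct Peer {
    pub id: u32,
    pub outbound_only: bool,
    pub score: f64,
}

/// Returns the ids of peers to disconnect so that at most `target` remain,
/// lowest score first, never dropping outbound-only peers below `min_outbound_only`.
pub fn peers_to_prune(peers: &[Peer], target: usize, min_outbound_only: usize) -> Vec<u32> {
    let mut excess = peers.len().saturating_sub(target);
    let mut outbound_only = peers.iter().filter(|p| p.outbound_only).count();
    let mut sorted: Vec<&Peer> = peers.iter().collect();
    // Worst-scoring peers are considered for disconnection first.
    sorted.sort_by(|a, b| a.score.total_cmp(&b.score));
    let mut pruned = Vec::new();
    for peer in sorted {
        if excess == 0 {
            break;
        }
        if peer.outbound_only {
            // Keep this peer if removing it would break the minimum.
            if outbound_only <= min_outbound_only {
                continue;
            }
            outbound_only -= 1;
        }
        pruned.push(peer.id);
        excess -= 1;
    }
    pruned
}

/// Discovery keeps querying while the outbound-only target is unmet.
pub fn needs_more_outbound(peers: &[Peer], min_outbound_only: usize) -> bool {
    peers.iter().filter(|p| p.outbound_only).count() < min_outbound_only
}
```

With a minimum of two outbound-only peers and only two connected, the worst-scoring peer survives because it is outbound-only, and the next two inbound peers are pruned instead.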
Mac L	0847986936	Reduce outbound requests to eth1 endpoints (#2340 ) ## Issue Addressed #2282 ## Proposed Changes Reduce the outbound requests made to eth1 endpoints by caching the results from `eth_chainId` and `net_version`. Further reduce the overall request count by increasing `auto_update_interval_millis` from `7_000` (7 seconds) to `60_000` (1 minute). During normal operation, this reduces the request count from ~2000 requests per hour to 360 requests per hour, a reduction of 82%. ## Additional Info If an endpoint fails, its state is dropped from the cache and the `eth_chainId` and `net_version` calls will be made for that endpoint again during the regular update cycle (once per minute) until it is back online. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-05-31 04:18:18 +00:00
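A minimal sketch of the per-endpoint cache, assuming a hypothetical `EndpointCache` keyed by URL string and a `fetch` closure that stands in for the `eth_chainId` and `net_version` calls (counted here as two requests). For reference, at one update per minute, 360 requests per hour works out to 6 requests per update cycle, and (2000 - 360) / 2000 = 0.82.

```rust
use std::collections::HashMap;

/// Hypothetical values cached per endpoint.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EndpointInfo {
    pub chain_id: u64,
    pub network_id: u64,
}

#[derive(Default)]
pub struct EndpointCache {
    entries: HashMap<String, EndpointInfo>,
    /// Counts outbound requests, to show the cache's effect.
    pub requests_made: u32,
}

impl EndpointCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the cached info, or call `fetch` (two requests) and cache the result.
    pub fn get_or_fetch<F>(&mut self, endpoint: &str, fetch: F) -> Result<EndpointInfo, String>
    where
        F: FnOnce() -> Result<EndpointInfo, String>,
    {
        if let Some(info) = self.entries.get(endpoint) {
            return Ok(*info);
        }
        self.requests_made += 2;
        let info = fetch()?;
        self.entries.insert(endpoint.to_string(), info);
        Ok(info)
    }

    /// When an endpoint fails, forget it so it is re-checked on the next update.
    pub fn invalidate(&mut self, endpoint: &str) {
        self.entries.remove(endpoint);
    }
}
```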
Age Manning	ec5cceba50	Correct issue with dialing peers (#2375 ) The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update. This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial. This makes #2374 obsolete.	2021-05-29 07:25:06 +00:00
Age Manning	d12e746b50	Network protocol upgrades (#2345 ) This provides a number of upgrades to gossipsub and discovery. The updates are extensive and this needs thorough testing.	2021-05-28 22:02:10 +00:00
Paul Hauner	456b313665	Tune GNU malloc (#2299 ) ## Issue Addressed NA ## Proposed Changes Modify the configuration of [GNU malloc](https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html) to reduce the memory footprint. - Set `M_ARENA_MAX` to 4. - This reduces memory fragmentation at the cost of contention between threads. - Set `M_MMAP_THRESHOLD` to 2mb - This means that any allocation >= 2mb is allocated via an anonymous mmap, instead of on the heap/arena. This reduces memory fragmentation since we don't need to keep growing the heap to find big contiguous slabs of free memory. - ~~Run `malloc_trim` every 60 seconds.~~ - ~~This shaves unused memory from the top of the heap, preventing the heap from constantly growing.~~ - Removed, see: https://github.com/sigp/lighthouse/pull/2299#issuecomment-825322646 Note: this only provides memory savings on the Linux (glibc) platform. ## Additional Info I'm going to close #2288 in favor of this for the following reasons: - I've managed to get the memory footprint smaller here than with jemalloc. - This PR seems to be a less dramatic change than bringing in the jemalloc dep. - The changes in this PR are strictly runtime changes, so we can create CLI flags which disable them completely. Since this change is wide-reaching and complex, it's nice to have an easy "escape hatch" if there are undesired consequences. ## TODO - [x] Allow configuration via CLI flags - [x] Test on Mac - [x] Test on RasPi. - [x] Determine if GNU malloc is present? - I'm not quite sure how to detect glibc. This issue suggests we can't really: https://github.com/rust-lang/rust/issues/33244 - [x] Make a clear argument regarding the effect of this on CPU utilization. - [x] Test with higher `M_ARENA_MAX` values. - [x] Test with longer trim intervals - [x] Add some stats about memory savings - [x] Remove `malloc_trim` calls & code	2021-05-28 05:59:45 +00:00
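A minimal sketch of applying these two settings from Rust through glibc's `mallopt`, which is declared by hand here so the example needs no crate. The parameter numbers (`-8` for `M_ARENA_MAX`, `-3` for `M_MMAP_THRESHOLD`) come from glibc's `<malloc.h>`. The `tune_glibc_malloc` function is hypothetical, not Lighthouse's actual API. It compiles to a no-op on anything other than Linux/glibc, which matches the note that the savings only apply there.

```rust
use std::os::raw::c_int;

#[cfg(all(target_os = "linux", target_env = "gnu"))]
extern "C" {
    // glibc: int mallopt(int param, int value); returns 1 on success, 0 on error.
    fn mallopt(param: c_int, value: c_int) -> c_int;
}

/// Hypothetical helper: set `M_ARENA_MAX` and `M_MMAP_THRESHOLD` on glibc.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
pub fn tune_glibc_malloc(arena_max: c_int, mmap_threshold: c_int) -> Result<(), String> {
    // Parameter numbers from glibc's <malloc.h>.
    const M_MMAP_THRESHOLD: c_int = -3;
    const M_ARENA_MAX: c_int = -8;

    for &(name, param, value) in &[
        ("M_ARENA_MAX", M_ARENA_MAX, arena_max),
        ("M_MMAP_THRESHOLD", M_MMAP_THRESHOLD, mmap_threshold),
    ] {
        // SAFETY: `mallopt` only adjusts allocator tunables and takes no pointers.
        if unsafe { mallopt(param, value) } != 1 {
            return Err(format!("mallopt({}, {}) failed", name, value));
        }
    }
    Ok(())
}

/// On other platforms (macOS, Windows, musl) there is nothing to tune.
#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
pub fn tune_glibc_malloc(_arena_max: c_int, _mmap_threshold: c_int) -> Result<(), String> {
    Ok(())
}
```

Calling `tune_glibc_malloc(4, 2 * 1024 * 1024)` corresponds to the values listed above. Keeping this a plain runtime call is also what makes a CLI "escape hatch" easy: the caller can simply skip it.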
Pawan Dhananjay	fdaeec631b	Monitoring service api (#2251 ) ## Issue Addressed N/A ## Proposed Changes Adds a client-side API for collecting system and process metrics and pushing them to a monitoring service.	2021-05-26 05:58:41 +00:00
Age Manning	55aada006f	More stringent dialing (#2363 ) * More stringent dialing * Cover cached enr dialing	2021-05-26 14:21:44 +10:00
ethDreamer	ba55e140ae	Enable Compatibility with Windows (#2333 ) ## Issue Addressed Windows incompatibility. ## Proposed Changes On Windows, Lighthouse needs to default to STDIN since a TTY doesn't exist. Windows also uses ACLs for file permissions, so to mirror `chmod 600`, we remove every entry in a file's ACL and add a single SID that is an alias for the file owner. Beyond that, several unit tests were changed because Windows has slightly different error messages, as well as frustrating nuances around killing a process :/ ## Additional Info Tested on my Windows VM and it appears to work; I also compiled and tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running a full validator client on Windows. Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com> Co-authored-by: Michael Sproul <micsproul@gmail.com>	2021-05-19 23:05:16 +00:00
ethDreamer	cb47388ad7	Updated to comply with new clippy formatting rules (#2336 ) ## Issue Addressed The latest version of Rust has new clippy rules & the codebase isn't up to date with them. ## Proposed Changes Small formatting changes that clippy tells me are functionally equivalent	2021-05-10 00:53:09 +00:00
Mac L	bacc38c3da	Add testing for beacon node and validator client CLI flags (#2311 ) ## Issue Addressed N/A ## Proposed Changes Add unit tests for the various CLI flags associated with the beacon node and validator client. These changes require the addition of two new flags: `dump-config` and `immediate-shutdown`. ## Additional Info Both `dump-config` and `immediate-shutdown` are marked as hidden since they should only be used in testing and other advanced use cases. Note: This requires changing `main.rs` so that the flags can adjust the program behavior as necessary. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-05-06 00:36:22 +00:00
Mac L	4cc613d644	Add `SensitiveUrl` to redact user secrets from endpoints (#2326 ) ## Issue Addressed #2276 ## Proposed Changes Add the `SensitiveUrl` struct which wraps `Url` and implements custom `Display` and `Debug` traits to redact user secrets from being logged in eth1 endpoints, beacon node endpoints and metrics. ## Additional Info This also includes a small rewrite of the eth1 crate to make requests using `Url` instead of `&str`. Some error messages have also been changed to remove `Url` data.	2021-05-04 01:59:51 +00:00
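A minimal sketch of the redaction idea, assuming a hand-rolled parser instead of the `url` crate that `SensitiveUrl` actually wraps. The `parse` and `full` names here are illustrative. The full URL is kept for making requests, while `Display` and `Debug` only print `scheme://host[:port]/`, hiding credentials, paths (where API keys often live) and query strings.

```rust
use std::fmt;

/// Hypothetical simplified `SensitiveUrl`.
pub struct SensitiveUrl {
    full: String,
    redacted: String,
}

impl SensitiveUrl {
    pub fn parse(s: &str) -> Result<Self, String> {
        let (scheme, rest) = s.split_once("://").ok_or("missing scheme")?;
        // The authority section ends at the first '/', '?' or '#'.
        let end = rest
            .find(|c: char| matches!(c, '/' | '?' | '#'))
            .unwrap_or(rest.len());
        let authority = &rest[..end];
        // Drop any `user:password@` prefix.
        let host_port = authority.rsplit('@').next().unwrap_or(authority);
        if scheme.is_empty() || host_port.is_empty() {
            return Err("missing scheme or host".to_string());
        }
        Ok(SensitiveUrl {
            full: s.to_string(),
            redacted: format!("{}://{}/", scheme, host_port),
        })
    }

    /// The unredacted URL, for actually making requests.
    pub fn full(&self) -> &str {
        &self.full
    }
}

impl fmt::Display for SensitiveUrl {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(&self.redacted)
    }
}

impl fmt::Debug for SensitiveUrl {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(&self.redacted)
    }
}
```

Implementing `Debug` as well as `Display` matters because structured loggers and error chains often format values with `{:?}`.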
ethDreamer	0aa8509525	Filter Disconnected Peers from Discv5 DHT (#2219 ) ## Issue Addressed #2107 ## Proposed Change The peer manager will mark peers as disconnected in the discv5 DHT when they disconnect or when dialing them fails. ## Additional Info The rationale for this particular change is explained in my comment on #2107.	2021-04-28 04:07:37 +00:00
realbigsean	2c2c443718	404's on API requests for slots that have been skipped or orphaned (#2272 ) ## Issue Addressed Resolves #2186 ## Proposed Changes 404 for any block-related information on a slot that was skipped or orphaned Affected endpoints: - `/eth/v1/beacon/blocks/{block_id}` - `/eth/v1/beacon/blocks/{block_id}/root` - `/eth/v1/beacon/blocks/{block_id}/attestations` - `/eth/v1/beacon/headers/{block_id}` ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-04-25 03:59:59 +00:00
Paul Hauner	3a24ca5f14	v1.3.0 (#2310 ) ## Issue Addressed NA ## Proposed Changes Bump versions. ## Additional Info This is a minor release (not patch) due to the very slight change introduced by #2291.	2021-04-13 22:46:34 +00:00
Michael Sproul	3b901dc5ec	Pack attestations into blocks in parallel (#2307 ) ## Proposed Changes Use two instances of max cover when packing attestations into blocks: one for the previous epoch, and one for the current epoch. This reduces the amount of computation done by roughly half due to the `O(n^2)` running time of max cover (`2 * (n/2)^2 = n^2/2`). This should help alleviate some load on block proposal, particularly on Prater.	2021-04-13 05:27:42 +00:00
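A minimal sketch of the splitting argument, using a simplified greedy max cover over hypothetical `Att` values (an epoch plus a set of validator indices). This is not Lighthouse's `max_cover` module. How the two per-epoch solutions are combined (here: keep the highest marginal gains up to the limit) is an assumption. Splitting is reasonable because an attestation only earns rewards for its own epoch, so previous- and current-epoch attestations can be covered independently. With n attestations split evenly, the quadratic work drops from n^2 to 2 * (n/2)^2 = n^2/2, and the two halves run on separate threads.

```rust
use std::collections::HashSet;
use std::thread;

/// Hypothetical attestation: its epoch and the validator indices it covers.
#[derive(Clone, Debug)]
pub struct Att {
    pub epoch: u64,
    pub validators: HashSet<u64>,
}

/// Greedy max cover: each round scans every remaining item for the most
/// not-yet-covered validators, so the cost grows as O(n^2).
/// Returns `(attestation, marginal gain)` in pick order.
pub fn max_cover(items: Vec<Att>, limit: usize) -> Vec<(Att, usize)> {
    let mut remaining = items;
    let mut covered: HashSet<u64> = HashSet::new();
    let mut picked: Vec<(Att, usize)> = Vec::new();
    while picked.len() < limit {
        let best = remaining
            .iter()
            .enumerate()
            .map(|(i, a)| (i, a.validators.difference(&covered).count()))
            .max_by_key(|&(_, gain)| gain);
        match best {
            Some((i, gain)) if gain > 0 => {
                let att = remaining.swap_remove(i);
                covered.extend(att.validators.iter().cloned());
                picked.push((att, gain));
            }
            _ => break,
        }
    }
    picked
}

/// Split by epoch, run one max cover per epoch in parallel, then keep the
/// highest-gain picks across both halves, up to `limit`.
pub fn pack_attestations(atts: Vec<Att>, current_epoch: u64, limit: usize) -> Vec<Att> {
    let (curr, prev): (Vec<Att>, Vec<Att>) =
        atts.into_iter().partition(|a| a.epoch == current_epoch);
    let (mut picks, prev_picks) = thread::scope(|s| {
        let prev_handle = s.spawn(move || max_cover(prev, limit));
        let curr_picks = max_cover(curr, limit);
        (curr_picks, prev_handle.join().expect("max cover thread panicked"))
    });
    picks.extend(prev_picks);
    // Stable sort: highest marginal gain first.
    picks.sort_by(|a, b| b.1.cmp(&a.1));
    picks.truncate(limit);
    picks.into_iter().map(|(att, _)| att).collect()
}
```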
Paul Hauner	c1203f5e52	Add specific log and metric for delayed blocks (#2308 ) ## Issue Addressed NA ## Proposed Changes - Adds a specific log and metric for when a block is enshrined as head with a delay that will cause bad attestations - We technically already expose this information, but it's a little tricky to determine during debugging. This makes it nice and explicit. - Fixes a minor reporting bug with the validator monitor where it was expecting agg. attestations too early (at half-slot rather than two-thirds-slot). ## Additional Info NA	2021-04-13 02:16:59 +00:00
Paul Hauner	0df7be1814	Add check for aggregate target (#2306 ) ## Issue Addressed NA ## Proposed Changes - Ensure that the [target consistency check](`b356f52c5c`) is always performed on aggregates. - Add a regression test. ## Additional Info NA	2021-04-13 00:24:39 +00:00
Age Manning	aaa14073ff	Clean up warnings (#2240 ) This is a small PR that cleans up compiler warnings. The most controversial change is removing the `data_dir` field from the `BeaconChainBuilder`. It was removed because it was never read. Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: Herman Junge <hermanjunge@protonmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-04-12 00:57:43 +00:00
Mac L	f6f64cf0f5	Correcting `disable-enr-auto-update` flag definition (#2303 ) ## Issue Addressed N/A ## Proposed Changes Correct the `disable-enr-auto-update` boolean flag so that it no longer requires a value. Previously it would require a value which was never used. ## Additional Info Flag is read here: https://github.com/sigp/lighthouse/blob/unstable/beacon_node/src/config.rs#L585-L587	2021-04-11 23:52:29 +00:00
Paul Hauner	e7e5878953	Avoid BeaconState clone during metrics scrape (#2298 ) ## Issue Addressed ## Proposed Changes Avoids cloning the `BeaconState` each time Prometheus scrapes our metrics (generally every 5s 😱). I think the original motivation behind this was "don't hold the lock on the head whilst we do computation on it"; however, I think this is flawed since our computation here is so small that it'll be quicker than the clone. The primary motivation here is to maintain a small memory footprint by holding less in memory (i.e., the cloned `BeaconState`) and to avoid the fragmentation-creep that occurs when cloning the big contiguous slabs of memory in the `BeaconState`. I also collapsed the active/slashed/withdrawn counters into a single loop to increase efficiency. ## Additional Info NA	2021-04-07 01:02:56 +00:00
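A minimal sketch of collapsing the three counters into one pass over a borrowed validator list, assuming a trimmed-down hypothetical `Validator` struct (field names follow the spec's `Validator` container). "Active" uses the spec's `is_active_validator` rule. "Withdrawn" is taken to mean `withdrawable_epoch <= epoch`, which is an assumption for this sketch rather than the metric's confirmed definition.

```rust
/// Hypothetical, trimmed-down validator record.
pub struct Validator {
    pub slashed: bool,
    pub activation_epoch: u64,
    pub exit_epoch: u64,
    pub withdrawable_epoch: u64,
}

#[derive(Debug, Default, PartialEq)]
pub struct ValidatorCounts {
    pub active: usize,
    pub slashed: usize,
    pub withdrawn: usize,
}

/// One pass over a borrowed slice: no clone of the state is needed.
pub fn count_validators(validators: &[Validator], epoch: u64) -> ValidatorCounts {
    let mut counts = ValidatorCounts::default();
    for v in validators {
        // Spec `is_active_validator`: activation_epoch <= epoch < exit_epoch.
        if v.activation_epoch <= epoch && epoch < v.exit_epoch {
            counts.active += 1;
        }
        if v.slashed {
            counts.slashed += 1;
        }
        // Assumed definition of "withdrawn" for this sketch.
        if v.withdrawable_epoch <= epoch {
            counts.withdrawn += 1;
        }
    }
    counts
}
```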
Pawan Dhananjay	95a362213d	Fix local testnet scripts (#2229 ) ## Issue Addressed Resolves #2094 ## Proposed Changes Fixes scripts for creating local testnets. Adds an option in `lighthouse boot_node` to run with a previously generated ENR.	2021-03-30 05:17:58 +00:00
Paul Hauner	9eb1945136	v1.2.2 (#2287 ) ## Issue Addressed NA ## Proposed Changes - Bump versions ## Additional Info NA	2021-03-30 04:07:03 +00:00
Paul Hauner	3d239b85ac	Allow for a clock disparity on the duties endpoints (#2283 ) ## Issue Addressed Resolves #2280 ## Proposed Changes Allows for API consumers to call the proposer/attester duties endpoints [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](`b34a79dc0b/beacon_node/beacon_chain/src/beacon_chain.rs (L99-L102)`) earlier than the current epoch. For additional reasoning, see https://github.com/sigp/lighthouse/issues/2280#issuecomment-805358897. ## Additional Info NA	2021-03-29 23:42:35 +00:00
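A minimal sketch of the tolerance for the proposer duties endpoint, assuming mainnet timing constants (12-second slots, 32 slots per epoch) and the phase0 value of 500ms for `MAXIMUM_GOSSIP_CLOCK_DISPARITY`. The `proposer_duties_epoch_ok` helper is hypothetical. A request is accepted for the current epoch, or for the epoch we would be in if our clock were running `MAXIMUM_GOSSIP_CLOCK_DISPARITY` ahead.

```rust
use std::time::Duration;

pub const SLOTS_PER_EPOCH: u64 = 32;
pub const SECONDS_PER_SLOT: u64 = 12;
/// 500ms in the phase0 p2p spec.
pub const MAXIMUM_GOSSIP_CLOCK_DISPARITY: Duration = Duration::from_millis(500);

fn epoch_at(since_genesis: Duration) -> u64 {
    since_genesis.as_secs() / SECONDS_PER_SLOT / SLOTS_PER_EPOCH
}

/// Hypothetical check: accept the current epoch, or the next one if we are
/// within `MAXIMUM_GOSSIP_CLOCK_DISPARITY` of its start.
pub fn proposer_duties_epoch_ok(requested: u64, since_genesis: Duration) -> bool {
    let current = epoch_at(since_genesis);
    let tolerant = epoch_at(since_genesis + MAXIMUM_GOSSIP_CLOCK_DISPARITY);
    requested == current || requested == tolerant
}
```

For example, 300ms before epoch 1 starts (384s after genesis), both epoch 0 and epoch 1 are accepted; a full second before, only epoch 0 is.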
Paul Hauner	03cefd0065	Expand observed attestations capacity (#2266 ) ## Issue Addressed NA ## Proposed Changes I noticed the following error on one of our nodes: ``` Mar 18 00:03:35 ip-xxxx lighthouse-bn[333503]: Mar 18 00:03:35.103 ERRO Unable to validate aggregate error: ObservedAttestersError(EpochTooLow { epoch: Epoch(23961), lowest_permissible_epoch: Epoch(23962) }), peer_id: 16Uiu2HAm5GL5KzPLhvfg9MBBFSpBqTVGRFSiTg285oezzWcZzwEv ``` The slot during this log was 766,815 (the last slot of the epoch). I believe this is due to an off-by-one error in `observed_attesters` where we were failing to provide enough capacity to store observations from the previous, current and next epochs. See code comments for further reasoning. Here's a link to the spec: https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md#beacon_aggregate_and_proof ## Additional Info NA	2021-03-29 23:42:34 +00:00
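A minimal sketch of why capacity matters, using a hypothetical `ObservedAttesters` store that keeps at most `capacity` consecutive epochs ending at the highest epoch observed. The log's slot 766,815 is the last slot of epoch 23962 (23962 * 32 + 31 = 766,815). The sketch assumes an observation for the next epoch, 23963, has already been stored, which is consistent with the "previous, current and next epochs" reasoning above. With room for only two epochs, an aggregate from 23961 is then rejected exactly as in the log. With room for three, it is accepted.

```rust
use std::collections::{BTreeMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Error {
    EpochTooLow { epoch: u64, lowest_permissible_epoch: u64 },
}

/// Hypothetical, simplified store of which validators attested in which epoch.
pub struct ObservedAttesters {
    capacity: u64,
    epochs: BTreeMap<u64, HashSet<u64>>,
}

impl ObservedAttesters {
    pub fn new(capacity: u64) -> Self {
        ObservedAttesters { capacity, epochs: BTreeMap::new() }
    }

    fn lowest_permissible_epoch(&self) -> u64 {
        let highest = self.epochs.keys().next_back().cloned().unwrap_or(0);
        (highest + 1).saturating_sub(self.capacity)
    }

    /// Records that `validator` attested in `epoch`. Returns `Ok(true)` if it
    /// had already been observed for that epoch.
    pub fn observe(&mut self, epoch: u64, validator: u64) -> Result<bool, Error> {
        let lowest = self.lowest_permissible_epoch();
        if epoch < lowest {
            return Err(Error::EpochTooLow { epoch, lowest_permissible_epoch: lowest });
        }
        let already_seen = !self.epochs.entry(epoch).or_insert_with(HashSet::new).insert(validator);
        // Prune epochs that have fallen out of the window.
        let lowest = self.lowest_permissible_epoch();
        self.epochs.retain(|&e, _| e >= lowest);
        Ok(already_seen)
    }
}
```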
Michael Sproul	f9d60f5436	VC: accept unknown fields in chain spec (#2277 ) ## Issue Addressed Closes #2274 ## Proposed Changes * Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard. * Log a debug message if there are extra fields returned to the VC from one of its BNs. This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)	2021-03-26 04:53:57 +00:00
Paul Hauner	b34a79dc0b	v1.2.1 (#2263 ) ## Issue Addressed NA ## Proposed Changes - Bump version. - Add some new ENRs for Prater - Afri: https://github.com/eth2-clients/eth2-testnets/pull/42 - Prysm: https://github.com/eth2-clients/eth2-testnets/pull/43 - Apply the fixes from #2181 to the no-eth1-sim to try to fix CI issues. ## Additional Info NA	2021-03-18 04:20:46 +00:00
Paul Hauner	015ab7d0a7	Optimize validator duties (#2243 ) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array; it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublicKeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](`072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)`). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. 
~~This PR reduces the size of the codebase 🎉~~ It used to reduce the size of the code base before I added more comments. ## Observations on Pyrmont - Proposer duties times down from peaks of 450ms to a consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](`072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)`) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-03-17 05:09:57 +00:00
Michael Sproul	3919737978	Release v1.2.0 (#2249 ) ## Proposed Changes Release v1.2.0 unchanged from the release candidate.	2021-03-10 01:28:32 +00:00
Michael Sproul	770a2ca030	Fix proposer cache priming upon state advance (#2252 ) ## Proposed Changes While investigating an incorrect head + target vote for the epoch boundary block 708544, I noticed that the state advance failed to prime the proposer cache, as per these logs: ``` Mar 09 21:42:47.448 DEBG Subscribing to subnet target_slot: 708544, subnet: Y, service: attestation_service Mar 09 21:49:08.063 DEBG Advanced head state one slot current_slot: 708543, state_slot: 708544, head_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: state_advance Mar 09 21:49:08.063 DEBG Completed state advance initial_slot: 708543, advanced_slot: 708544, head_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: state_advance Mar 09 21:49:14.787 DEBG Proposer shuffling cache miss block_slot: 708544, block_root: 0x9b14bf68667ab1d9c35e6fd2c95ff5d609aa9e8cf08e0071988ae4aa00b9f9fe, parent_slot: 708543, parent_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: beacon Mar 09 21:49:14.800 DEBG Successfully processed gossip block root: 0x9b14bf68667ab1d9c35e6fd2c95ff5d609aa9e8cf08e0071988ae4aa00b9f9fe, slot: 708544, graffiti: , service: beacon Mar 09 21:49:14.800 INFO New block received hash: 0x9b14…f9fe, slot: 708544 Mar 09 21:49:14.984 DEBG Head beacon block slot: 708544, root: 0x9b14…f9fe, finalized_epoch: 22140, finalized_root: 0x28ec…29a7, justified_epoch: 22141, justified_root: 0x59db…e451, service: beacon Mar 09 21:49:15.055 INFO Unaggregated attestation validator: XXXXX, src: api, slot: 708544, epoch: 22142, delay_ms: 53, index: Y, head: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: val_mon Mar 09 21:49:17.001 DEBG Slot timer sync_state: Synced, current_slot: 708544, head_slot: 708544, head_block: 0x9b14…f9fe, finalized_epoch: 22140, finalized_root: 0x28ec…29a7, peers: 55, service: slot_notifier ``` The reason for this is that the condition was backwards, 
so that whole block of code was unreachable. Looking at the attestations for this block that were included in the following block, we can see that lots of validators missed it. Some of them may be running Lighthouse v1.1.1-v1.2.0-rc.0, but it's probable that they would have missed it even with the proposer cache primed, given how late block 708544 arrived (the cache miss occurred 3.787s after the slot start): https://beaconcha.in/block/708545#attestations	2021-03-10 00:20:50 +00:00
Michael Sproul	786e25ea08	Release candidate v1.2.0-rc.0 (#2248 ) Prepare for v1.2.0 with this release candidate. To be merged after #2247 and #2246 Co-authored-by: Age Manning <Age@AgeManning.com>	2021-03-08 06:27:50 +00:00
Age Manning	babd153352	Prevent adding and dialing bootnodes when discovery is disabled (#2247 ) This is a small PR which prevents unwanted bootnodes from being added to the DHT and being dialed when the `--disable-discovery` flag is set. The main reason one would want to disable discovery is to connect to a fixed set of peers. Currently, regardless of what the user does, Lighthouse will populate its DHT with previously known peers and also fill it with the spec's bootnodes. It will then dial the bootnodes that are capable of being dialed. This prevents testing with a fixed peer list. This PR prevents these excess nodes from being added and dialed if the user has set `--disable-discovery`.	2021-03-08 06:27:49 +00:00
Paul Hauner	e4eb0eb168	Use advanced state for block production (#2241 ) ## Issue Addressed NA ## Proposed Changes - Use the pre-states from #2174 during block production. - Running this on Pyrmont shows block production times dropping from ~550ms to ~150ms. - Create `crit` and `warn` logs when a block is published to the API later than we expect. - On mainnet we are issuing a warn if the block is published more than 1s later than the slot start and a crit for more than 3s. - Rename some methods on the `SnapshotCache` for clarity. - Add the ability to pass the state root to `BeaconChain::produce_block_on_state` to avoid computing a state root. This is a very common LH optimization. - Add a metric that tracks how late we broadcast blocks received from the HTTP API. This is technically a duplicate of a `ValidatorMonitor` log, but I wanted to have it for the case where we aren't monitoring validators too.	2021-03-04 04:43:31 +00:00
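A minimal sketch of the late-publication classification, assuming a hypothetical `late_block_level` helper with the mainnet thresholds quoted above (warn above 1s, crit above 3s) hard-coded. The real implementation may derive them from the slot duration, since the thresholds are described as the mainnet values.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum LateBlockLevel {
    Normal,
    Warn,
    Crit,
}

/// `slot_start` and `published` are measured from the same origin (e.g. genesis).
pub fn late_block_level(slot_start: Duration, published: Duration) -> LateBlockLevel {
    // A block published before its slot starts counts as zero delay.
    let delay = published.saturating_sub(slot_start);
    if delay > Duration::from_secs(3) {
        LateBlockLevel::Crit
    } else if delay > Duration::from_secs(1) {
        LateBlockLevel::Warn
    } else {
        LateBlockLevel::Normal
    }
}
```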
Michael Sproul	363f15f362	Use the database to persist the pubkey cache (#2234 ) ## Issue Addressed Closes #1787 ## Proposed Changes * Abstract the `ValidatorPubkeyCache` over a "backing" which is either a file (legacy), or the database. * Implement a migration from schema v2 to schema v3, whereby the contents of the cache file are copied to the DB, and then the file is deleted. The next release to include this change must be a minor version bump, and we will need to warn users of the inability to downgrade (this is our first DB schema change since mainnet genesis). * Move the schema migration code from the `store` crate into the `beacon_chain` crate so that it can access the datadir and the `ValidatorPubkeyCache`, etc. It gets injected back into the `store` via a closure (similar to what we do in fork choice).	2021-03-04 01:25:12 +00:00
Age Manning	1c507c588e	Update to the latest libp2p (#2239 ) Updates to the latest libp2p and ignores RUSTSEC-2020-0146 from cargo-audit Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-03-02 05:59:49 +00:00
realbigsean	ed9b245de0	update tokio-stream to 0.1.3 and use `BroadcastStream` (#2212 ) ## Issue Addressed Resolves #2189 ## Proposed Changes use tokio's `BroadcastStream` ## Additional Info N/A Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-03-01 01:58:05 +00:00
Michael Sproul	2f077b11fe	Allow HTTP API to return SSZ blocks (#2209 ) ## Issue Addressed Implements https://github.com/ethereum/eth2.0-APIs/pull/125 ## Proposed Changes Optionally return SSZ bytes from the `beacon/blocks` endpoint.	2021-02-24 04:15:14 +00:00
realbigsean	5bc93869c8	Update ValidatorStatus to match the v1 API (#2149 ) ## Issue Addressed N/A ## Proposed Changes We are currently a bit off of the standard API spec because we have [this](https://hackmd.io/bQxMDRt1RbS1TLno8K4NPg?view) proposal implemented for validator status. Based on discussion [here](https://github.com/ethereum/eth2.0-APIs/pull/94), it looks like this won't be added to the spec until v2, so this PR implements [this](https://hackmd.io/ofFJ5gOmQpu1jjHilHbdQQ) validator status logic instead ## Additional Info N/A Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-02-24 04:15:13 +00:00
Paul Hauner	a764c3b247	Handle early blocks (#2155 ) ## Issue Addressed NA ## Problem this PR addresses There's an issue where Lighthouse is banning a lot of peers due to the following sequence of events: 1. Gossip block 0xabc arrives ~200ms early - It is propagated across the network, with respect to [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/p2p-interface.md#why-is-there-maximum_gossip_clock_disparity-when-validating-slot-ranges-of-messages-in-gossip-subnets). - However, it is not imported into our database since the block is early. 2. Attestations for 0xabc arrive, but the block was not imported. - The peer that sent the attestation is down-voted. - Each unknown-block attestation causes a score loss of 1 and the peer is banned at -100. - When the peer is on an attestation subnet there can be hundreds of attestations, so the peer is banned quickly (before the missed block can be obtained via RPC). ## Potential solutions I can think of three solutions to this: 1. Wait for attestation-queuing (#635) to arrive and solve this. - Easy - Not an immediate fix. - Whilst this would work, I don't think it's a perfect solution for this particular issue; rather, (3) is better. 1. Allow importing blocks with a tolerance of `MAXIMUM_GOSSIP_CLOCK_DISPARITY`. - Easy - ~~I have implemented this, for now.~~ 1. If a block is verified for gossip propagation (i.e., signature verified) and it's within `MAXIMUM_GOSSIP_CLOCK_DISPARITY`, then queue it to be processed at the start of the appropriate slot. - More difficult - Feels like the best solution; I will try to implement this. This PR takes approach (3). ## Changes included - Implement the `block_delay_queue`, based upon a [`DelayQueue`](https://docs.rs/tokio-util/0.6.3/tokio_util/time/delay_queue/struct.DelayQueue.html) which can store blocks until it's time to import them. - Add a new `DelayedImportBlock` variant to the `beacon_processor::WorkEvent` enum to handle this new event. 
- In the `BeaconProcessor`, refactor a `tokio::select!` to a struct with an explicit `Stream` implementation. I experienced some issues with `tokio::select!` in the block delay queue and I also found it hard to debug. I think this explicit implementation is nicer and functionally equivalent (apart from the fact that `tokio::select!` randomly chooses futures to poll, whereas now we're deterministic). - Add a testing framework to the `beacon_processor` module that tests this new block delay logic. I also tested a handful of other operations in the beacon processor (attns, slashings, exits) since it was super easy to copy-pasta the code from the `http_api` tester. - To implement these tests I added the concept of an optional `work_journal_tx` to the `BeaconProcessor` which will spit out a log of events. I used this in the tests to ensure that things were happening as I expected. - The tests are a little racy, but it's hard to avoid that when testing timing-based code. If we see CI failures I can revise. I haven't observed any failures due to races on my machine or on CI yet. - To assist with testing I allowed for directly setting the time on the `ManualSlotClock`. - I gave the `beacon_processor::Worker` a `Toolbox` for two reasons: (a) it avoids changing tons of function sigs when you want to pass a new object to the worker and (b) it seemed cute.	2021-02-24 03:08:52 +00:00
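A minimal sketch of approach (3), with one plain substitution: the PR builds on tokio-util's `DelayQueue`, but this sketch uses a std `BinaryHeap` min-heap that is polled with an explicit `now`, which keeps it synchronous and easy to test. Times are `Duration`s since genesis and blocks are plain `u64` ids; `BlockDelayQueue`, `Decision` and both methods are hypothetical names. A block within `MAXIMUM_GOSSIP_CLOCK_DISPARITY` (500ms in the phase0 spec) of its slot start is held until that slot begins.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// 500ms in the phase0 p2p spec.
pub const MAXIMUM_GOSSIP_CLOCK_DISPARITY: Duration = Duration::from_millis(500);

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// The block's slot has started: process it immediately.
    ImportNow,
    /// Early, but within the clock disparity: hold it until the slot starts.
    Queued,
    /// Too far in the future to accept.
    TooEarly,
}

#[derive(Default)]
pub struct BlockDelayQueue {
    // Min-heap of (slot start, block id): the earliest slot start is on top.
    heap: BinaryHeap<Reverse<(Duration, u64)>>,
}

impl BlockDelayQueue {
    pub fn on_gossip_block(&mut self, block_id: u64, slot_start: Duration, now: Duration) -> Decision {
        if now >= slot_start {
            Decision::ImportNow
        } else if slot_start - now <= MAXIMUM_GOSSIP_CLOCK_DISPARITY {
            self.heap.push(Reverse((slot_start, block_id)));
            Decision::Queued
        } else {
            Decision::TooEarly
        }
    }

    /// Pops every queued block whose slot has started by `now`.
    pub fn pop_ready(&mut self, now: Duration) -> Vec<u64> {
        let mut ready = Vec::new();
        loop {
            match self.heap.peek() {
                Some(&Reverse((start, _))) if start <= now => {}
                _ => break,
            }
            if let Some(Reverse((_, id))) = self.heap.pop() {
                ready.push(id);
            }
        }
        ready
    }
}
```

Passing `now` in explicitly plays a similar role to setting the time on `ManualSlotClock` in the tests described above: timing behaviour can be checked deterministically.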
Paul Hauner	46920a84e8	v1.1.3 (#2217 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2021-02-22 06:21:38 +00:00
Paul Hauner	4362ea4f98	Fix false positive "State advance too slow" logs (#2218 ) ## Issue Addressed - Resolves #2214 ## Proposed Changes Fix the false positive warning log described in #2214. ## Additional Info NA	2021-02-21 23:47:53 +00:00
Paul Hauner	8949ae7c4e	Address ENR update loop (#2216 ) ## Issue Addressed - Resolves #2215 ## Proposed Changes Addresses a potential loop when the majority of peers indicate that we are contactable via an IPv6 address. See https://github.com/sigp/discv5/pull/62 for further rationale. ## Additional Info The alternative to this PR is to use `--disable-enr-auto-update` and then manually supply an `--enr-address` and `--enr-udp-port`. However, that requires the user to know their IP addresses in order for discovery to work properly. This might not be practical/achievable for some users, hence this hotfix.	2021-02-21 23:47:52 +00:00
Paul Hauner	8c6537e71d	v1.1.2 (#2213 ) ## Issue Addressed NA ## Proposed Changes Bump versions ## Additional Info NA	2021-02-19 00:49:32 +00:00
Paul Hauner	f8cc82f2b1	Switch back to warp with cors wildcard support (#2211 ) ## Issue Addressed - Resolves #2204 - Resolves #2205 ## Proposed Changes Switches to my fork of `warp` which contains support for cors wildcards: https://github.com/paulhauner/warp/tree/cors-wildcard I have a PR open on the `warp` repo but it hasn't had any interest from the maintainers as of yet: https://github.com/seanmonstar/warp/pull/726. I think running from a fork is the best we can do for now. ## Additional Info NA	2021-02-18 22:33:12 +00:00
Lion - dapplion	613382f304	Add slot offset computing to be downloaded slot (#2198 ) The current implementation assumes that the offset of the range of slots downloaded in a batch is zero. This conflicts with the condition for considering this chain as synced. For finalized sync, it results in one extra batch being downloaded which can't be processed. CC @wemeetagain	2021-02-18 08:24:46 +00:00
Paul Hauner	f819ba5414	v1.1.1 (#2202 ) ## Issue Addressed NA ## Proposed Changes Bump versions	2021-02-16 00:09:02 +00:00
Pawan Dhananjay	4a357c9947	Upgrade rand_core (#2201 ) ## Issue Addressed N/A ## Proposed Changes Upgrade `rand_core` to latest version to fix https://rustsec.org/advisories/RUSTSEC-2021-0023	2021-02-15 20:34:49 +00:00
Paul Hauner	88cc222204	Advance state to next slot after importing block (#2174 ) ## Issue Addressed NA ## Proposed Changes Add an optimization to perform `per_slot_processing` from the leading-edge of block processing to the trailing-edge. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`. Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking). There were several ancillary changes required to achieve this: - Remove the `state_root` field of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`. - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block. - Rename `types::ShufflingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out not to be very useful. I thought this new name was more descriptive so I kept it. - Address https://github.com/ethereum/eth2.0-specs/pull/2196 - Add a debug log when we get a block with an unknown parent. There was previously no logging around this case. - Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot. ## Additional Info - ~~Blocked on #2173~~ - ~~Blocked on #2179~~ That PR was wrapped into this PR. - There are potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. 
These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later. ## TODO - [x] Tidy, improve comments. - [x] ~~Try avoid computing proposer index in `per_block_processing`?~~	2021-02-15 07:17:52 +00:00
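The "block proposer cache" described above stores every proposer for an epoch once they have been computed on the trailing edge of the previous slot, so block import does not have to recompute them. The sketch below is a minimal illustration under stated assumptions: the cache key (epoch plus the block root at which the shuffling was decided, so competing forks get separate entries) and the `ProposerCache`/`ProposerKey` names are hypothetical, and the proposer computation itself is left to the caller.

```rust
use std::collections::HashMap;

/// Hypothetical cache key. Proposers are assumed to depend on the epoch and on the
/// block root at which that epoch's shuffling was decided.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ProposerKey {
    pub epoch: u64,
    pub decision_root: [u8; 32],
}

/// Hypothetical block proposer cache: one proposer index per slot of an epoch.
pub struct ProposerCache {
    slots_per_epoch: u64,
    entries: HashMap<ProposerKey, Vec<u64>>,
}

impl ProposerCache {
    pub fn new(slots_per_epoch: u64) -> Self {
        Self { slots_per_epoch, entries: HashMap::new() }
    }

    /// Prime the cache with every proposer for an epoch, e.g. after advancing the
    /// state into that epoch on the trailing edge of the previous slot.
    pub fn prime(&mut self, key: ProposerKey, proposers: Vec<u64>) {
        debug_assert_eq!(proposers.len() as u64, self.slots_per_epoch);
        self.entries.insert(key, proposers);
    }

    /// Look up the proposer for `slot`, if it belongs to the keyed epoch and is cached.
    pub fn get(&self, key: &ProposerKey, slot: u64) -> Option<u64> {
        if slot / self.slots_per_epoch != key.epoch {
            return None;
        }
        let proposers = self.entries.get(key)?;
        proposers.get((slot % self.slots_per_epoch) as usize).copied()
    }
}
```

A miss (wrong epoch or an unknown fork) returns `None`, in which case a caller would fall back to computing the proposer from a state.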
Paul Hauner	3000f3e5da	Dht persistence on drop (v2) (#2200 ) ## Issue Addressed NA ## Proposed Changes This is simply #2177 with a merge conflict fixed. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-02-15 06:09:55 +00:00
Paul Hauner	8e5c20b6d1	Update for clippy 1.50 (#2193 ) ## Issue Addressed NA ## Proposed Changes Rust 1.50 has landed 🎉 The shiny new `clippy` peers down upon us mere mortals with disgust. Brutish peasants wrapping our `usize`s in superfluous `Option`s... tsk tsk. I've performed the goat sacrifice and corrected our evil ways in this PR. Tonight we shall pray that Github Actions bestows the almighty green tick upon us. ## Additional Info NA Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-02-15 00:09:12 +00:00
realbigsean	e20f64b21a	Update to tokio 1.1 (#2172 ) ## Issue Addressed resolves #2129 resolves #2099 addresses some of #1712 unblocks #2076 unblocks #2153 ## Proposed Changes - Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. - Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1. - We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. Edit: I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue` --> PR in discv5: https://github.com/sigp/discv5/pull/58 ## Additional Info tokio 1.0 changes that required some changes in lighthouse: - `interval.next().await.is_some()` -> `interval.tick().await` - `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028 - `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350 - stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> https://github.com/tokio-rs/tokio/issues/2870 - I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384 Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-02-10 23:29:49 +00:00
Paul Hauner	e383ef3e91	Avoid temp allocations with slog (#2183 ) ## Proposed Changes Replaces use of `format!` in `slog` logging with its special no-allocation `?` and `%` shortcuts. According to a `heaptrack` analysis performed today over a period of about an hour, this will reduce temporary allocations by at least 4%. ## Additional Info NA	2021-02-04 07:31:47 +00:00
Paul Hauner	ff35fbb121	Add metrics for beacon block propagation (#2173 ) ## Issue Addressed NA ## Proposed Changes Adds some metrics to track delays regarding: - LH processing of blocks - delays receiving blocks from other nodes. ## Additional Info NA	2021-02-04 05:33:56 +00:00
Akihito Nakano	1a22a096c6	Fix clippy errors on tests (#2160 ) ## Issue Addressed There are some clippy errors on tests. ## Proposed Changes Enable clippy check on tests and fix the errors. 💪	2021-01-28 23:31:06 +00:00
Paul Hauner	e4b62139d7	v1.1.0 (#2168 ) ## Issue Addressed NA ## Proposed Changes - Bump version - ~~Run `cargo update`~~ ## Additional Info NA	2021-01-21 02:37:08 +00:00
Paul Hauner	2b2a358522	Detailed validator monitoring (#2151 ) ## Issue Addressed - Resolves #2064 ## Proposed Changes Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators. Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint. Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored. See the new docs included in this PR for more info. ## TODO - [x] Track validator balance, `slashed` status, etc. - [x] ~~Register slashings in current epoch, not offense epoch~~ - [ ] Publish Grafana dashboard, update TODO link in docs - [x] ~~#2130 is merged into this branch, resolve that~~	2021-01-20 19:19:38 +00:00
Paul Hauner	1eb0915301	Fix bug from #2163 (#2165 ) ## Issue Addressed NA ## Proposed Changes Fixes a bug that I missed during a review in #2163. I found this bug by observing that nodes were receiving far fewer attestations (~1/2 of previous). I'm not certain exactly how this mistake manifested in a reduction in attestations, but the mistake touches so much code that I think it's reasonable to declare that this is the cause of the observed issue (drop in attestations). ## Additional Info NA	2021-01-20 10:28:12 +00:00
Paul Hauner	b06559ae97	Disallow attestation production earlier than head (#2130 ) ## Issue Addressed The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system killing the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals. Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state`, sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the host's physical memory and still allocated when `lighthouse bn` was killed by the OS. There was no CPU analysis (e.g., `perf`), but `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too. ## Proposed Changes `BeaconChain::produce_unaggregated_attestation` has two paths: 1. Fast path: attesting to the head slot or later. 2. Slow path: attesting to a slot earlier than the head block. Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage. This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`). This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge case and possibly a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md). 
It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour are that it: - Is not specified as "honest validator" attesting behaviour. - Is risky for slashing (although all validator clients should have slashing protection and will eventually fail if they do not). - Disguises clock-sync issues between a BN and VC. ## Additional Info It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2); instead, I think we should implement it if there is user demand.	2021-01-20 06:52:37 +00:00
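The behaviour change above can be sketched as follows: only the fast path survives, and a request for a slot earlier than the head returns an error instead of loading a historical state. This is a simplified illustration, not Lighthouse's code; the free function taking the head slot and root as arguments, the trimmed-down `AttestationData`, and the fields on the `AttestingPriorToHead` variant are assumptions (in Lighthouse this is a `BeaconChain` method and attestation data also carries the committee index and source/target checkpoints).

```rust
/// Simplified stand-in for attestation data (hypothetical; real data has more fields).
#[derive(Debug, PartialEq, Eq)]
pub struct AttestationData {
    pub slot: u64,
    pub beacon_block_root: [u8; 32],
}

/// Hypothetical error enum with a variant named after the one added in the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum BeaconChainError {
    AttestingPriorToHead { head_slot: u64, request_slot: u64 },
}

/// Fast path only: a request for the head slot or later votes for the head block;
/// anything earlier returns a static error instead of loading an old state.
pub fn produce_unaggregated_attestation(
    head_slot: u64,
    head_root: [u8; 32],
    request_slot: u64,
) -> Result<AttestationData, BeaconChainError> {
    if request_slot >= head_slot {
        Ok(AttestationData { slot: request_slot, beacon_block_root: head_root })
    } else {
        Err(BeaconChainError::AttestingPriorToHead { head_slot, request_slot })
    }
}
```

A validator client that tries to "catch up" on slot 9 while the head is at slot 10 now receives the error rather than triggering a full state load.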
Paul Hauner	d9f940613f	Represent slots in secs instead of millisecs (#2163 ) ## Issue Addressed NA ## Proposed Changes Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway). This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar. ## Additional Info NA Co-authored-by: blacktemplar <blacktemplar@a1.net>	2021-01-19 09:39:51 +00:00
Paul Hauner	805e152f66	Simplify enum -> str with strum (#2164 ) ## Issue Addressed NA ## Proposed Changes As per #2100, uses derives from the strum library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore, unifies all attestation error counters into a single `IntCounterVec`. This work was originally done by @blacktemplar; I've just created this PR so I can resolve some merge conflicts. ## Additional Info NA Co-authored-by: blacktemplar <blacktemplar@a1.net>	2021-01-19 06:33:58 +00:00
realbigsean	7a71977987	Clippy 1.49.0 updates and dht persistence test fix (#2156 ) ## Issue Addressed `test_dht_persistence` failing ## Proposed Changes Bind `NetworkService::start` to an underscore-prefixed variable rather than `_`. `_` was causing it to be dropped immediately. This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-01-19 00:34:28 +00:00
Pawan Dhananjay	28238d97b1	Disconnect from peers quicker on internet issues (#2147 ) ## Issue Addressed Fixes #2146 ## Proposed Changes Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.	2021-01-13 08:09:10 +00:00
realbigsean	423dea169c	update smallvec (#2152 ) ## Issue Addressed `cargo audit` is failing because of a potential for an overflow in the version of `smallvec` we're using ## Proposed Changes Update to the latest version of `smallvec`, which has the fix Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-01-11 23:32:11 +00:00
Arthur Woimbée	851a4dca3c	replace tempdir by tempfile (#2143 ) ## Issue Addressed Fixes #2141 Remove [tempdir](https://docs.rs/tempdir/0.3.7/tempdir/) in favor of [tempfile](https://docs.rs/tempfile/3.1.0/tempfile/). ## Proposed Changes `tempfile` has a slightly different api that makes creating temp folders with a name prefix a chore (`tempdir::TempDir::new("toto")` => `tempfile::Builder::new().prefix("toto").tempdir()`). So I removed temp folder name prefix where I deemed it not useful. Otherwise, the functionality is the same.	2021-01-06 06:36:11 +00:00
Age Manning	7e4b190df0	Reduce ping interval (#2132 ) ## Issue Addressed #2123 ## Description Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.	2021-01-06 04:35:52 +00:00
realbigsean	588b90157d	Ssz state api endpoint (#2111 ) ## Issue Addressed Catching up to a recently merged API spec PR: https://github.com/ethereum/eth2.0-APIs/pull/119 ## Proposed Changes - Return an SSZ beacon state on `/eth/v1/debug/beacon/states/{stateId}` when passed this header: `accept: application/octet-stream`. - Requests to this endpoint with no `accept` header, or with an `accept` header value of `application/json` or `*/*`, will result in a JSON response. ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-01-06 03:01:46 +00:00
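The content negotiation for the debug state endpoint can be sketched as a small function over the `Accept` header value. This is a minimal illustration, not the `warp` filter Lighthouse uses: it assumes the wildcard value is `*/*`, does not parse q-values or lists of media ranges, and treating any other value as an error is an assumption, since the commit does not say how unknown values are handled.

```rust
/// Encoding chosen for the `/eth/v1/debug/beacon/states/{stateId}` response.
#[derive(Debug, PartialEq, Eq)]
pub enum ResponseFormat {
    Json,
    Ssz,
}

/// Hypothetical negotiation: no header, JSON or the wildcard yields JSON, while
/// `application/octet-stream` yields SSZ bytes.
pub fn negotiate(accept: Option<&str>) -> Result<ResponseFormat, String> {
    match accept.map(str::trim) {
        None | Some("application/json") | Some("*/*") => Ok(ResponseFormat::Json),
        Some("application/octet-stream") => Ok(ResponseFormat::Ssz),
        // Assumption: other media types are rejected rather than defaulting to JSON.
        Some(other) => Err(format!("unsupported accept header: {}", other)),
    }
}
```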
Samuel E. Moelius	939fa717fd	`test_decode_malicious_status_message` improvements (#2104 ) ## Issue Addressed None ## Proposed Changes * Correct typo in one comment, elaborate some others. * Add asserts to ensure comments match code. * Eliminate one unnecessary `clone`. ## Additional Info None	2021-01-06 01:10:26 +00:00
Samuel E. Moelius	0245ddd37b	Fix typo in `ssz_snappy.rs` comment (#2103 ) ## Issue Addressed None ## Proposed Changes Correct a typo in `ssz_snappy.rs`. ## Additional Info Pedantry at its finest.	2021-01-06 01:10:24 +00:00
Paul Hauner	f183af20e3	Version v1.0.6 (#2126 ) ## Issue Addressed NA ## Proposed Changes - Bump versions - Run `cargo update` ## Additional Info NA	2020-12-28 23:38:02 +00:00
Akihito Nakano	78d17c3255	Tweak error messages for ease of investigation (#2122 ) ## Proposed Changes Tweaked the error message for ease of investigation as `Failed to update eth1 cache` is used in multiple places. 😃	2020-12-28 01:25:33 +00:00
Paul Hauner	9ed65a64f8	Version v1.0.5 (#2117 ) ## Issue Addressed NA ## Proposed Changes - Bump versions to `v1.0.5` - Run `cargo update` ## Additional Info NA	2020-12-23 18:52:48 +00:00
Age Manning	2931b05582	Update libp2p (#2101 ) This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. This needs a bit of thorough testing before merging. The primary code changes are: - General libp2p dependency update - Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-12-23 07:53:36 +00:00
Samuel E. Moelius	3381266998	Eliminate uses of `expect` in `ssz_snappy.rs` (#2105 ) ## Issue Addressed None ## Proposed Changes Eliminate three uses of `expect` in `ssz_snappy.rs`. ## Additional Info None	2020-12-22 02:28:37 +00:00
Michael Sproul	e5bf2576f1	Optimise tree hash caching for block production (#2106 ) ## Proposed Changes `@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it. I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client. ## Alternatives I experimented with 2 alternatives to this approach before deciding on it: * Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes). * Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block processing, imposing the +30ms penalty there _as well_ as in block production. In contrast, the approach in this PR should only impact block production, and it improves it! 
Yay for pareto improvements 🎉 ## Additional Info This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics. In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.	2020-12-21 06:29:39 +00:00
Paul Hauner	a62dc65ca4	BN Fallback v2 (#2080 ) ## Issue Addressed - Resolves #1883 ## Proposed Changes This follows on from @blacktemplar's work in #2018. - Allows the VC to connect to multiple BN for redundancy. - Update the simulator so some nodes always need to rely on their fallback. - Adds some extra deprecation warnings for `--eth1-endpoint` - Pass `SignatureBytes` as a reference instead of by value. ## Additional Info NA Co-authored-by: blacktemplar <blacktemplar@a1.net>	2020-12-18 09:17:03 +00:00
Pawan Dhananjay	f998eff7ce	Subnet discovery fixes (#2095 ) ## Issue Addressed N/A ## Proposed Changes Fixes multiple issues related to the discovery of subnet peers. 1. Subnet discovery retries after yielding no results 2. Metadata updates if a peer sends older metadata 3. peerdb stores the peer subscriptions from gossipsub	2020-12-17 00:39:15 +00:00
blacktemplar	3fcc517993	Fix Syncing Simulator (#2049 ) ## Issue Addressed NA ## Proposed Changes Fixes problems with slot times below 1 second which got revealed by running the syncing simulator with the default speedup time.	2020-12-16 05:37:38 +00:00
Michael Sproul	0c529b8d52	Add slasher broadcast (#2079 ) ## Issue Addressed Closes #2048 ## Proposed Changes * Broadcast slashings when the `--slasher-broadcast` flag is provided. * In the process of implementing this I refactored the slasher service into its own crate so that it could access the network code without creating a circular dependency. I moved the responsibility for putting slashings into the op pool into the service as well, as it makes sense for it to handle the whole slashing lifecycle.	2020-12-16 03:44:01 +00:00
Pawan Dhananjay	63eeb14a81	Improve eth1 fallback logging (#2096 ) ## Issue Addressed N/A ## Proposed Changes There seemed to be confusion among Discord users about the eth1 fallback logging ``` WARN Error connecting to eth1 node. Trying fallback ..., endpoint: http://127.0.0.1:8545/, service: eth1_rpc ``` The assumption users seem to be making here is that it is trying the fallback and that the fallback is the `endpoint` in the log. This PR improves the logging to be like ``` WARN Error connecting to eth1 node endpoint, endpoint: http://127.0.0.1:8545/, action: trying fallbacks, service: eth1_rpc ``` I think this makes it clearer that the endpoint that failed is the one in the log.	2020-12-16 02:39:09 +00:00
divma	11c299cbf6	impl Resource Unavailable RPC error (#2072 ) ## Issue Addressed Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131) ## Proposed Changes Implement the proposed error, banning peers that send it ## Additional Info NA	2020-12-15 00:17:32 +00:00
blacktemplar	701843aaa0	Update dependencies (#2084 ) ## Issue Addressed Partially addresses dependencies mentioned in issue #1712. ## Proposed Changes Updates dependencies (including an update avoiding a vulnerability) + add tokio compatibility to `remote_signer_test`	2020-12-14 02:28:19 +00:00
Michael Sproul	1abc70e815	Version v1.0.4 (#2073 ) ## Proposed Changes Run cargo update and bump version in prep for v1.0.4 release ## Additional Info Planning to merge this commit to `unstable`, test on Pyrmont and canary nodes, then push to `stable`.	2020-12-10 04:01:40 +00:00
Age Manning	dfb588e521	Softer penalties for missing blocks (#2075 ) ## Issue Addressed Users are reporting errors for sending attestations to peers. If the clock sync is a little out or we receive attestations before blocks, peers are being too harshly penalized. They can get scored many times per missing block and we typically need these peers on subnets. ## Proposed Changes This removes the penalization for missing blocks with attestations. The penalty should be handled when #635 gets built as it will allow us to group attestations per missing block and penalize once.	2020-12-10 00:40:12 +00:00
Michael Sproul	aa45fa3ff7	Revert fork choice if disk write fails (#2068 ) ## Issue Addressed Closes #2028 Replaces #2059 ## Proposed Changes If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028). ## Additional Info * This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067) * This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique; technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This raises the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).	2020-12-09 05:10:34 +00:00
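The revert-on-failure rule above can be sketched with in-memory stand-ins: fork choice is updated first, and if the block write fails, fork choice is replaced by the copy last persisted, so it can never point at a block missing from disk. The `ForkChoice` and `Store` types, their fields, and the simulated failure flag are hypothetical simplifications, not Lighthouse's types.

```rust
/// Hypothetical, heavily simplified fork choice: the head and the known block roots.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ForkChoice {
    pub head: u64,
    pub blocks: Vec<u64>,
}

/// Stand-in for the database: the last persisted fork choice plus stored blocks.
/// `fail_writes` simulates a disk error.
pub struct Store {
    pub persisted_fork_choice: ForkChoice,
    pub blocks: Vec<u64>,
    pub fail_writes: bool,
}

impl Store {
    fn put_block(&mut self, root: u64) -> Result<(), String> {
        if self.fail_writes {
            return Err("disk write failed".to_string());
        }
        self.blocks.push(root);
        Ok(())
    }
}

/// Apply the block to fork choice, then write it to disk. If the write fails,
/// revert fork choice to the persisted copy so it is never ahead of the disk.
pub fn import_block(fork_choice: &mut ForkChoice, store: &mut Store, root: u64) -> Result<(), String> {
    fork_choice.blocks.push(root);
    fork_choice.head = root;
    if let Err(e) = store.put_block(root) {
        *fork_choice = store.persisted_fork_choice.clone();
        return Err(e);
    }
    Ok(())
}
```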
Michael Sproul	82753f842d	Improve compile time (#1989 ) ## Issue Addressed Closes #1264 ## Proposed Changes * Milagro BLS: tweak the feature flags so that Milagro doesn't get compiled if we're using BLST. Profiling showed that it was consuming about 1 minute of CPU time out of 60 minutes of CPU time (real time ~15 mins). A 1.6% saving. * Reduce monomorphization: compiling for 3 different `EthSpec` types causes a heck of a lot of generic functions to be instantiated (monomorphized). Removing 2 of 3 cuts the LLVM+linking step from around 250 seconds to 180 seconds, a saving of 70 seconds (real time!). This applies only to `make` and not the CI build, because we test with the minimal spec on CI. * Update `web3` crate to v0.13. This is perhaps the most controversial change, because it requires axing some deposit contract tools from `lcli`. I suspect these tools weren't used much anyway, and could be maintained separately, but I'm also happy to revert this change. However, it does save us a lot of compile time. With #1839, we now have 3 versions of Tokio (and all of Tokio's deps). This change brings us down to 2 versions, but 1 should be achievable once web3 (and reqwest) move to Tokio 0.3. * Remove `lcli` from the Docker image. It's a dev tool and can be built from the repo if required.	2020-12-09 01:34:58 +00:00
Age Manning	4f85371ce8	Downgrades a valid log (#2057 ) ## Issue Addressed #2046 ## Proposed Changes The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to debug log.	2020-12-08 10:48:45 +00:00
divma	57489e620f	fix default network handling (#2029 ) ## Issue Addressed #1992 and #1987, and also to be considered a continuation of #1751 ## Proposed Changes Many changed files, but most are renames to align the code with the semantics of `--network` - remove the `--network` default value (in clap) and instead set it after checking the `network` and `testnet-dir` flags - move `eth2_testnet_config` crate to `eth2_network_config` - move `Eth2TestnetConfig` to `Eth2NetworkConfig` - move `DEFAULT_HARDCODED_TESTNET` to `DEFAULT_HARDCODED_NETWORK` - `beacon_node`'s `get_eth2_testnet_config` loads the `DEFAULT_HARDCODED_NETWORK` if neither a network nor a testnet is provided - `boot_node`'s config loads the config the same way as the `beacon_node`; it was using the configuration only for preconfigured networks (that code is ~1 year old, so I assume this was not intended) - removed a one-year-old comment stating we should try to emulate `https://github.com/eth2-clients/eth2-testnets/tree/master/nimbus/testnet1`, since it looks outdated (?) - remove `lighthouse`'s `load_testnet_config` in favor of `get_eth2_network_config` to centralize that logic (it had differences) - some spelling ## Additional Info Both the command of #1992 and the scripts of #1987 seem to work fine, same as `bn` and `vc`	2020-12-08 05:41:10 +00:00
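Resolving the network after flag parsing, rather than through a clap default, can be sketched as a match over the two optional flags. This is an illustration under assumptions: the `"mainnet"` value for `DEFAULT_HARDCODED_NETWORK`, the `NetworkConfigSource` enum, and rejecting `--network` together with `--testnet-dir` are hypothetical here, not taken from the commit.

```rust
/// Assumed value for illustration only.
pub const DEFAULT_HARDCODED_NETWORK: &str = "mainnet";

/// Where the network configuration comes from (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum NetworkConfigSource {
    /// A network compiled into the binary, selected by name.
    Hardcoded(String),
    /// A custom testnet loaded from a directory.
    TestnetDir(String),
}

/// Pick the network config source from the parsed `--network` and `--testnet-dir`
/// values, falling back to the hardcoded default only when neither is given.
pub fn resolve_network(
    network: Option<&str>,
    testnet_dir: Option<&str>,
) -> Result<NetworkConfigSource, String> {
    match (network, testnet_dir) {
        (Some(_), Some(_)) => Err("--network and --testnet-dir cannot both be set".to_string()),
        (Some(name), None) => Ok(NetworkConfigSource::Hardcoded(name.to_string())),
        (None, Some(dir)) => Ok(NetworkConfigSource::TestnetDir(dir.to_string())),
        (None, None) => Ok(NetworkConfigSource::Hardcoded(DEFAULT_HARDCODED_NETWORK.to_string())),
    }
}
```

Because the default is applied last, passing only `--testnet-dir` no longer collides with an implicit `--network` value.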
divma	f3200784b4	More metrics + RPC tweaks (#2041 ) ## Issue Addressed NA ## Proposed Changes This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here: - Add metrics for rpc errors per client, error type and direction - Add metrics for downscoring events per source type, client and penalty type - Add metrics for gossip validation results per client for non-accepted messages - Make the RPC handler return errors and requests/responses in the order we see them - Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds - Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors - Remove some unused code in the `PeerAction` and the rpc handler - Remove the unused variant `RateLimited`. This was never produced directly, since the only way to get the request's protocol is via the handler. Upon receiving from LH a response with an error (rate limited in this case), the handler emits this event with the missing info (it was always like this; just pointing out that we do downscore rate limiting errors regardless of the change) Metrics for Nimbus looked like this: Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])` ![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png) RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])` ![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png) Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])` ![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)	2020-12-08 03:55:50 +00:00
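Changing the Ping limit from 1 every 5 seconds to 2 every 10 seconds keeps the same average rate (0.2 pings per second) but lets two pings arrive back to back. The sketch below shows that effect with a token bucket; it is a swapped-in illustration, not Lighthouse's rate limiter (whose algorithm is not shown in this commit), and the `TokenBucket` type and explicit-timestamp API are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical token bucket allowing `capacity` requests per `period`. Tokens refill
/// continuously, so the average rate is `capacity / period` while up to `capacity`
/// requests may arrive back to back.
pub struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last: Duration,
}

impl TokenBucket {
    /// Starts full at time zero.
    pub fn new(capacity: u32, period: Duration) -> Self {
        let capacity = f64::from(capacity);
        Self {
            capacity,
            refill_per_sec: capacity / period.as_secs_f64(),
            tokens: capacity,
            last: Duration::ZERO,
        }
    }

    /// Returns `true` if a request at time `now` (measured from start) is allowed.
    pub fn allow(&mut self, now: Duration) -> bool {
        let elapsed = now.saturating_sub(self.last).as_secs_f64();
        self.last = self.last.max(now);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Under the old limit, a second ping one second after the first is refused; under the new limit it is accepted, and only a third ping within the window is refused.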
blacktemplar	a28e8decbf	update dependencies (#2032 ) ## Issue Addressed NA ## Proposed Changes Updates out-of-date dependencies. ## Additional Info See also https://github.com/sigp/lighthouse/issues/1712 for a list of dependencies that are still out of date and the reasons.	2020-12-07 08:20:33 +00:00
Michael Sproul	c1ec386d18	Pass failed gossip blocks to the slasher (#2047 ) ## Issue Addressed Closes #2042 ## Proposed Changes Pass blocks that fail gossip verification to the slasher. Blocks that are successfully verified are not passed immediately, but will be passed as part of full block verification.	2020-12-04 05:03:30 +00:00
Pawan Dhananjay	7933596c89	Add a purge-eth1-cache cli option (#2039 ) ## Issue Some eth1 clients are missing deposit logs on mainnet for multiple reasons (not fully synced, eth1 client issues) because of which we are getting `FailedToInsertDeposit` errors. Ideally, LH should pick up where it left off after pointing it to a nice eth1 client endpoint (which has all deposits). However, I have seen instances where LH keeps getting `FailedToInsertDeposit` even after switching to a good endpoint. Only deleting the beacon directory (which also wipes the eth1 cache) and resyncing the eth1 caches seems to be the solution. This wouldn't be great for mainnet if you have to sync your beacon node again as well. ## Proposed Changes Add a `--purge-eth1-db` option which just wipes the eth1 cache and doesn't touch the rest of the beacon db. Still need to investigate if and why LH isn't picking up where it left off for the deposit logs sync, but I think it would be good to have an option to just delete eth1 caches regardless.	2020-12-04 05:03:28 +00:00
realbigsean	fdfb81a74a	Server sent events (#1920 ) ## Issue Addressed Resolves #1434 (this is the last major feature in the standard spec. There are only a couple of places we may be off-spec due to recent spec changes or ongoing discussion) Partly addresses #1669 ## Proposed Changes - remove the websocket server - remove the `TeeEventHandler` and `NullEventHandler` - add server sent events according to the eth2 API spec ## Additional Info This is according to the currently unmerged PR here: https://github.com/ethereum/eth2.0-APIs/pull/117 Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-12-04 00:18:58 +00:00
realbigsean	2b5c0df9e5	Validators endpoint status code (#2040 ) ## Issue Addressed Resolves #2035 ## Proposed Changes Update 405's to 400's for failures when we are parsing path params. ## Additional Info Haven't updated the same for non-standard endpoints Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-12-03 23:10:08 +00:00
Age Manning	2682f46025	Fingerprint new client identify agent string (#2027 ) Nimbus have modified their identify agent string. This PR adds their new agent string to identify new nimbus peers.	2020-12-03 22:07:14 +00:00
Pawan Dhananjay	482695142a	Minor fixes (#2038 ) Fixes a couple of low hanging fruits. - Fixes #2037 - `validators-dir` and `secrets-dir` flags don't really need to depend upon each other - Fixes #2006 and Fixes #1995	2020-12-03 01:10:28 +00:00
blacktemplar	d8cda2d86e	Fix new clippy lints (#2036 ) ## Issue Addressed NA ## Proposed Changes Fixes new clippy lints in the whole project (mainly [manual_strip](https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip) and [unnecessary_lazy_evaluations](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations)). Furthermore, removes `to_string()` calls on literals when used with the `?`-operator.	2020-12-03 01:10:26 +00:00
Paul Hauner	b8bd80d2fb	Add Content-Type to metrics server (#2019 ) ## Issue Addressed - Resolves #2013 ## Proposed Changes Adds the `Content-Type text/plain` header as per #2013 ## Additional Info NA	2020-12-01 00:04:46 +00:00
Paul Hauner	65dcdc361b	Bump version to v1.0.3 (#2024 ) ## Issue Addressed NA ## Proposed Changes - Set version to `v1.0.3` - Run cargo update ## Additional Info - ~~Blocked on #2008~~	2020-11-30 22:55:10 +00:00
Age Manning	c718e81eaf	Add privacy option (#2016 ) Adds a `--privacy` CLI flag to the beacon node that users may opt into. This does two things: - Removes client identifying information from the identify libp2p protocol - Changes the default graffiti to "" if no graffiti is set.	2020-11-30 22:55:08 +00:00
Paul Hauner	77f3539654	Improve eth1 block sync (#2008 ) ## Issue Addressed NA ## Proposed Changes - Log about eth1 whilst waiting for genesis. - For the block and deposit caches, update them after each download instead of when all downloads are complete. - This prevents the case where a single timeout error can cause us to drop all previously downloaded blocks/deposits. - Set `max_log_requests_per_update` to avoid timeouts due to very large log counts in a response. - Set `max_blocks_per_update` to prevent a single update of the block cache from downloading an unreasonable number of blocks. - This shouldn't have any effect in normal use; it's just a safeguard against bugs. - Increase the timeout for eth1 calls from 15s to 60s, as per @pawanjay176's experience with Infura. ## Additional Info NA	2020-11-30 20:29:17 +00:00
divma	8fcd22992c	No string in slog (#2017 ) ## Issue Addressed Following slog's documentation, this should help a bit with string allocations. I left it running for two days and memory usage is lower. This is of course anecdotal, but it shouldn't do any harm ## Proposed Changes Remove `String` creation in logs when possible	2020-11-30 10:33:00 +00:00
Paul Hauner	85e69249e6	Drop discovery log to trace (#2007 ) ## Issue Addressed NA ## Proposed Changes This was causing: ``` Nov 28 21:56:08.154 ERRO slog-async: logger dropped messages due to channel overflow, count: 44, service: libp2p ``` ## Additional Info NA	2020-11-29 03:02:23 +00:00
Age Manning	f7183098ee	Bump to version v1.0.2 (#2001 ) Update lighthouse to version `v1.0.2`. There are two major updates in this version: - Updates to the task executor to tokio 0.3 and all sub-dependencies relying on core execution, including libp2p - Update BLST	2020-11-28 13:22:37 +00:00
Age Manning	a567f788bd	Upgrade to tokio 0.3 (#1839 ) ## Description This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks. This also brings with it a number of various improvements: - Discv5 update - Libp2p update - Fix for recompilation issues - Improved UPnP port mapping handling - Futures dependency update - Log downgrade to traces for rejecting peers when we've reached our max Co-authored-by: blacktemplar <blacktemplar@a1.net>	2020-11-28 05:30:57 +00:00
Paul Hauner	5a3b94cbb4	Update to v1.0.1, run cargo update	2020-11-27 21:16:59 +11:00
blacktemplar	38b15deccb	Fallback nodes for eth1 access (#1918 ) ## Issue Addressed part of #1883 ## Proposed Changes Adds a new cli argument `--eth1-endpoints` that can be used instead of `--eth1-endpoint` to specify a comma-separated list of endpoints. If the first endpoint returns an error for some request the other endpoints are tried in the given order. ## Additional Info Currently if the first endpoint fails the fallbacks are used silently (except for `try_fallback_test_endpoint` that is used in `do_update` which logs a `WARN` for each endpoint that is not reachable). A question is if we should add more logs so that the user gets warned if his main endpoint is for example just slow and sometimes hits timeouts.	2020-11-27 08:37:44 +00:00
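The fallback behaviour described in the commit above tries the endpoints in the given order and uses the first one that succeeds. The sketch below is synchronous and assumes a caller-supplied request closure. The function name `first_success` is hypothetical, and the real eth1 service is asynchronous with its own logging. It returns every endpoint's error when all of them fail, which makes the "fallbacks are used silently" concern easier to address at the call site.

```rust
/// Hypothetical helper: call `request` on each endpoint in order and return
/// the first success. If every endpoint fails, return all (endpoint, error)
/// pairs so the caller can log them together.
pub fn first_success<T, E, F>(endpoints: &[&str], mut request: F) -> Result<T, Vec<(String, E)>>
where
    F: FnMut(&str) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for &endpoint in endpoints {
        match request(endpoint) {
            // Stop at the first endpoint that answers successfully.
            Ok(value) => return Ok(value),
            // Remember the failure and fall through to the next endpoint.
            Err(e) => errors.push((endpoint.to_string(), e)),
        }
    }
    Err(errors)
}
```

Collecting errors rather than discarding them lets the caller emit a warning whenever the primary endpoint failed, even if a fallback later succeeded.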
Michael Sproul	1312844f29	Disable snappy in LevelDB to fix build issues (#1983 ) ## Proposed Changes A user on Discord reported build issues when trying to compile Lighthouse checked out to a path with spaces in it. I've fixed the issue upstream in `leveldb-sys` (https://github.com/skade/leveldb-sys/pull/22), but rather than waiting for a new release of the `leveldb` crate, we can also work around the issue by disabling Snappy in LevelDB, which we weren't using anyway. This may also have the side-effect of slightly improving compilation times, as LevelDB+Snappy was found to be a substantial contributor to build time (although I'm not sure how much was LevelDB and how much was Snappy).	2020-11-27 03:01:57 +00:00
Pawan Dhananjay	0589a14afe	Log better error message (#1981 ) ## Issue Addressed Fixes #1965 ## Proposed Changes Log an error and don't update eth1 caches if `chain_id = 0`	2020-11-26 23:13:46 +00:00
divma	fc07cc3fdf	Sync metrics (#1975 ) ## Issue Addressed - Add metrics to keep track of peer counts by sync type - Add metric to keep track of the number of syncing chains in range ## Proposed Changes Plug into the network metrics update interval and also update the counts of peers with respect to their sync status with us ## Additional Info For the peer counts - Because of the way it is implemented, the numbers won't always match the total peer count in the `libp2p` metric. - Updating the gauge with every change is messy because it would need to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate), on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. (If you think more accuracy is needed we can do it too, but it would be harder to maintain.) At the moment those look like this ![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)	2020-11-26 05:23:17 +00:00
Paul Hauner	26741944b1	Add metrics to VC (#1954 ) ## Issue Addressed NA ## Proposed Changes - Adds a HTTP server to the VC which provides Prometheus metrics. - Moves the health metrics into the `lighthouse_metrics` crate so it can be shared between BN/VC. - Sprinkle some metrics around the VC. - Update the book to indicate that we now have VC metrics. - Shifts the "waiting for genesis" logic later in the `ProductionValidatorClient::new_from_cli` - This is worth attention during the review. ## Additional Info - ~~`clippy` has some new lints that are failing. I'll deal with that in another PR.~~	2020-11-26 01:10:51 +00:00
divma	3b4afc27bf	Status race condition (#1967 ) ## Issue Addressed Sync stalls due to race conditions between dc notifications and status processing	2020-11-25 02:15:38 +00:00
Paul Hauner	c6baa0eed1	Bump to v1.0.0, run cargo update	2020-11-25 02:02:19 +11:00
Age Manning	a96893744c	Update bootnodes and boot_node cli (#1961 )	2020-11-25 02:01:37 +11:00
divma	6f890c398e	Sync Bug fixes (#1950 ) ## Issue Addressed Two issues related to empty batches - The chain target was not being advanced when the batch was successful, empty and the chain didn't have an optimistic batch - Not switching finalized chains. We now switch finalized chains, requiring a minimum amount of work first	2020-11-24 02:11:31 +00:00
Paul Hauner	21617aa87f	Change --testnet flag to --network (#1751 ) ## Issue Addressed - Resolves #1689 ## Proposed Changes TBC ## Additional Info NA	2020-11-23 23:54:03 +00:00
Michael Sproul	7d644103c6	Tweak slasher DB schema and pruning (#1948 ) ## Issue Addressed Resolves #1890 ## Proposed Changes Change the slasher database schema to key indexed attestations by `(target_epoch, indexed_attestation_root)` instead of just `indexed_attestation_root`. This allows more straight-forward pruning (linear scan), that is also "re-entrant". By re-entrant, we mean that a pruning pass that gets stuck because of a `MapFull` error can attempt to commit midway, and be resumed later without issue. The previous pruning strategy for indexed attestations did not have this property. There was also a flaw in the previous pruning that could leave "zombie" indexed attestations in the database (ones not referenced by any attester record), which could build up and contribute to bloat (although in practice I think they occur quite infrequently). ## Additional Info During testing I noticed that a `MapFull` error can still occur during the commit of the transaction itself, which is irritating, but not unbearable. This PR should at least reduce the frequency with which users need to manually resize their DB, and if the `MapFull` on commit rears its ugly head too often we could use a dynamic strategy (temporarily increase the size of the map until the transaction commits). The extra bytes for the epoch make the database a bit heavier, so the size estimate docs have been updated to reflect this. This is also a breaking schema change, so anyone using a v0 database from a few hours ago will need to drop it and update 😅	2020-11-23 21:33:51 +00:00
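The schema change in the commit above keys indexed attestations by `(target_epoch, indexed_attestation_root)`. If the epoch is encoded big-endian at the front of the key, byte order matches epoch order. All prunable entries then form a prefix of the key space, and a pass can delete them in bounded batches and resume later. This is a minimal sketch that assumes big-endian encoding and uses a `BTreeMap` in place of the real LMDB database. `attestation_key` and `prune_batch` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical key layout: 8-byte big-endian target epoch, then 32-byte root.
/// Big-endian makes lexicographic byte order equal numeric epoch order.
pub fn attestation_key(target_epoch: u64, root: [u8; 32]) -> [u8; 40] {
    let mut key = [0u8; 40];
    key[..8].copy_from_slice(&target_epoch.to_be_bytes());
    key[8..].copy_from_slice(&root);
    key
}

/// Delete at most `batch_size` entries with target epoch < `min_epoch` by
/// scanning linearly from the start of the key space. Returns how many were
/// deleted; calling it again simply resumes, so a pass interrupted midway
/// (e.g. to commit) loses no progress.
pub fn prune_batch(db: &mut BTreeMap<[u8; 40], Vec<u8>>, min_epoch: u64, batch_size: usize) -> usize {
    // The smallest possible key for `min_epoch` bounds the prunable prefix.
    let upper = attestation_key(min_epoch, [0u8; 32]);
    let doomed: Vec<[u8; 40]> = db.range(..upper).take(batch_size).map(|(k, _)| *k).collect();
    for k in &doomed {
        db.remove(k);
    }
    doomed.len()
}
```

Keying by root alone would scatter old attestations throughout the key space, so finding them would require a separate index. Putting the epoch first removes that need.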
Michael Sproul	5828ff1204	Implement slasher (#1567 ) This is an implementation of a slasher that lives inside the BN and can be enabled via `lighthouse bn --slasher`. Features included in this PR: - [x] Detection of attester slashing conditions (double votes, surrounds existing, surrounded by existing) - [x] Integration into Lighthouse's attestation verification flow - [x] Detection of proposer slashing conditions - [x] Extraction of attestations from blocks as they are verified - [x] Compression of chunks - [x] Configurable history length - [x] Pruning of old attestations and blocks - [x] More tests Future work: * Focus on a slice of history separate from the most recent N epochs (e.g. epochs `current - K` to `current - M`) * Run out-of-process * Ingest attestations from the chain without a resync Design notes are here https://hackmd.io/@sproul/HJSEklmPL	2020-11-23 03:43:22 +00:00
Paul Hauner	59b2247ab8	Improve UX whilst VC is waiting for genesis (#1915 ) ## Issue Addressed - Resolves #1424 ## Proposed Changes Add a `GET lighthouse/staking` that returns 200 if the node is ready to stake (i.e., `--eth1` flag is present) or a 404 otherwise. Whilst the VC is waiting for the genesis time to start (i.e., when the genesis state is known), check the `lighthouse/staking` endpoint and log an error if the node isn't configured for staking. ## Additional Info NA	2020-11-23 01:00:22 +00:00
Paul Hauner	65b1cf2af1	Add flag to import all attestations (#1941 ) ## Issue Addressed NA ## Proposed Changes Adds the `--import-all-attestations` flag which tells the `network::AttestationService` to import/aggregate all attestations after verification (instead of only ones for subnets that are relevant to local validators). This is useful for testing/debugging and also for creating back-up nodes that should be all cached up and ready for any validator. ## Additional Info NA	2020-11-22 23:58:25 +00:00
divma	d0cbf3111a	move sync state to the chains KV (#1940 ) ## Issue Addressed We have a log saying we add a peer to a chain, and another one in case the chain is not syncing. To avoid needing to pair these two (and to reduce log entries), simply log the chain's syncing state in the chain's KV	2020-11-22 23:58:23 +00:00
Michael Sproul	426b3001e0	Fix race condition in seen caches (#1937 ) ## Issue Addressed Closes #1719 ## Proposed Changes Lift the internal `RwLock`s and `Mutex`es from the `Observed*` data structures to resolve the race conditions described in #1719. Most of this work was done by @paulhauner on his `lift-locks` branch, I merely updated it for the current `master` and checked over it. ## Additional Info I think it would be prudent to test this on a testnet or two before mainnet launch, just to be sure that the extra lock contention doesn't negatively impact performance.	2020-11-22 23:02:51 +00:00
Paul Hauner	0b556c4405	Fix metrics http server error messages (#1946 ) ## Issue Addressed - Resolves #1945 ## Proposed Changes - As per #1945, fix a log message from the metrics server that was falsely claiming to be from the api server. - Ensure successful api request logs are published to debug, not trace. This is something I've wanted to do for a while. ## Additional Info NA	2020-11-22 03:39:13 +00:00
Paul Hauner	48f73b21e6	Expand eth1 block cache, add more logs (#1938 ) ## Issue Addressed NA ## Proposed Changes - Caches later blocks than is required by `ETH1_FOLLOW_DISTANCE`. - Adds logging to `warn` if the eth1 cache is insufficiently primed. - Use `max_by_key` instead of `max_by` in `BeaconChain::Eth1Chain` since it's simpler. - Rename `voting_period_start_timestamp` to `voting_target_timestamp` for accuracy. ## Additional Info The reason for eating into the `ETH1_FOLLOW_DISTANCE` and caching blocks that are closer to the head is due to possibility for `SECONDS_PER_ETH1_BLOCK` to be incorrect (as is the case for the Pyrmont testnet on Goerli). If `SECONDS_PER_ETH1_BLOCK` is too short, we'll skip back too far from the head and skip over blocks that would be valid [`is_candidate_block`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/validator.md#eth1-data) blocks. This was the case on the Pyrmont testnet and resulted in Lighthouse choosing blocks that were about 30 minutes older than is ideal.	2020-11-21 00:26:15 +00:00
Kirk Baird	3b405f10ea	Ensure deposit signatures do not use aggregate functions (#1935 ) ## Issue Addressed Resolves #1333 ## Proposed Changes - Remove `deposit_signature_set()` function - Prevent deposits from being in `SignatureSets` - Use `Signature.verify()` to verify deposit signatures rather than a signature set which uses `fast_aggregate_verify()` ## Additional Info n/a	2020-11-20 03:37:20 +00:00
divma	d727e55abe	Move some rpc processing to the beacon_processor (#1936 ) ## Issue Addressed `BlocksByRange` requests were the main culprit of a series of timeouts of peers' requests in general because they build up in the router's processor. Those were moved to the blocking executor but a task is being spawned for each; also not ideal since the amount of resources we give to those is not controlled ## Proposed Changes - Move `BlocksByRange` and `BlocksByRoots` to the `beacon_processor`. The processor crafts the responses and sends them. - Also move the processing of `StatusMessage`s from other peers. This is a fast operation but it can also build up and won't scale if we keep it in the router (processing one at a time). These don't need to send an answer, so there is no harm in processing them "later" if that were to happen. Sending responses to status requests is still in the router, so we answer as soon as we see them. - Some "extras" that are basically cleanup: - Split the `Worker` logic into sync methods (chain processing and rpc blocks), gossip methods (the majority of methods) and rpc methods (the new ones) - Move the `status_message` function previously provided by the router's processor to a more central place since it is used by the router, sync, network_context and beacon_processor - Some spelling ## Additional Info What's left to decide/test more thoroughly is the length of the queues and the priority rules. @paulhauner suggested at some point to put status above attestations, and @AgeManning had described the importance of "protecting gossipsub" so my solution is leaving status requests in the router and RPC methods below attestations. Slashings and Exits are at the end.	2020-11-19 23:33:44 +00:00
Pawan Dhananjay	e47739047d	Add additional libp2p tests (#1867 ) ## Issue Addressed N/A ## Proposed Changes Adds tests for the eth2_libp2p crate.	2020-11-19 22:32:09 +00:00
realbigsean	79fd9b32b9	Update pool/attestations and committees endpoints (#1899 ) ## Issue Addressed Catching up on a few eth2 spec updates: ## Proposed Changes - adding query params to the `GET pool/attestations` endpoint - allowing the `POST pool/attestations` endpoint to accept an array of attestations - batching attestation submission - moving `epoch` from a path param to a query param in the `committees` endpoint ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-11-18 23:31:39 +00:00
blacktemplar	3408de8151	Avoid string initialization in network metrics and replace by &str where possible (#1898 ) ## Issue Addressed NA ## Proposed Changes Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895. For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script. ## Additional Info We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.	2020-11-18 23:31:37 +00:00
Paul Hauner	bcc7f6b143	Add new flag to set blocks per eth1 query (#1931 ) ## Issue Addressed NA ## Proposed Changes Users on Discord (and @protolambda) have experienced this error (or variants of it): ``` Failed to update eth1 cache: GetDepositLogsFailed("Eth1 node returned error: {\"code\":-32005,\"message\":\"query returned more than 10000 results\"}") ``` This PR allows users to reduce the span of blocks searched for deposit logs and therefore reduce the size of the return result. Hopefully experimentation with this flag can lead to finding a better default value. ## Additional Info NA	2020-11-18 22:18:59 +00:00
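The flag added in the commit above reduces the span of blocks searched per deposit-log request. The idea is to split one large block range into consecutive sub-ranges that each stay under the node's result limit. This is a minimal sketch that assumes half-open `[start, end)` ranges. JSON-RPC `eth_getLogs` takes an inclusive `fromBlock`/`toBlock`, so a caller would pass `range.end - 1` as `toBlock`. `query_ranges` is a hypothetical name, not a Lighthouse function.

```rust
use std::ops::Range;

/// Hypothetical helper: split blocks `[start, end)` into consecutive
/// sub-ranges of at most `blocks_per_query` blocks each.
pub fn query_ranges(start: u64, end: u64, blocks_per_query: u64) -> Vec<Range<u64>> {
    assert!(blocks_per_query > 0, "blocks_per_query must be non-zero");
    let mut ranges = Vec::new();
    let mut from = start;
    while from < end {
        // The last chunk may be shorter than `blocks_per_query`.
        let to = from.saturating_add(blocks_per_query).min(end);
        ranges.push(from..to);
        from = to;
    }
    ranges
}
```

A smaller value means more requests, but each response is smaller and less likely to hit limits such as "query returned more than 10000 results".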
Paul Hauner	7e4ee58729	Bump to v0.3.5 (#1927 ) ## Issue Addressed NA ## Proposed Changes - Bump version to `v0.3.5` - Run `cargo update` ## Additional Info NA	2020-11-18 00:44:28 +00:00
Paul Hauner	103103e72e	Address queue congestion in migrator (#1923 ) ## Issue Addressed Should address #1917 ## Proposed Changes Stops the `BackgroundMigrator` rx channel from backing up with big `BeaconState` messages. Looking at some logs from my Medalla node, we can see a discrepancy between the head finalized epoch and the migrator finalized epoch: ``` Nov 17 16:50:21.606 DEBG Head beacon block slot: 129214, root: 0xbc7a…0b99, finalized_epoch: 4033, finalized_root: 0xf930…6562, justified_epoch: 4035, justified_root: 0x206b…9321, service: beacon Nov 17 16:50:21.626 DEBG Batch processed service: sync, processed_blocks: 43, last_block_slot: 129214, chain: 8274002112260436595, first_block_slot: 129153, batch_epoch: 4036 Nov 17 16:50:21.626 DEBG Chain advanced processing_target: 4036, new_start: 4036, previous_start: 4034, chain: 8274002112260436595, service: sync Nov 17 16:50:22.162 DEBG Completed batch received awaiting_batches: 5, blocks: 47, epoch: 4048, chain: 8274002112260436595, service: sync Nov 17 16:50:22.162 DEBG Requesting batch start_slot: 129601, end_slot: 129664, downloaded: 0, processed: 0, state: Downloading(16Uiu2HAmG3C3t1McaseReECjAF694tjVVjkDoneZEbxNhWm1nZaT, 0 blocks, 1273), epoch: 4050, chain: 8274002112260436595, service: sync Nov 17 16:50:22.654 DEBG Database compaction complete service: beacon Nov 17 16:50:22.655 INFO Starting database pruning new_finalized_epoch: 2193, old_finalized_epoch: 2192, service: beacon ``` I believe this indicates that the migrator rx has a backed-up queue of `MigrationNotification` items which each contain a `BeaconState`. ## TODO - [x] Remove finalized state requirement for op-pool	2020-11-17 23:11:26 +00:00
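One generic way to stop a consumer channel from backing up with large messages is to coalesce: after waking, drain everything already queued and act only on the newest item. The commit above does not say whether the migrator fix works this way. The sketch below is an assumption that uses `std::sync::mpsc` and plain epochs in place of `MigrationNotification`. The helper name `recv_latest` is hypothetical.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical helper: block for the next message, then drain any others
/// already queued and keep only the most recent. Returns `None` once the
/// channel is closed and empty.
pub fn recv_latest<T>(rx: &Receiver<T>) -> Option<T> {
    // Wait for at least one message (errors only when all senders are gone).
    let mut latest = rx.recv().ok()?;
    loop {
        match rx.try_recv() {
            // A newer message supersedes the one we were holding.
            Ok(newer) => latest = newer,
            // Nothing more queued right now (or senders dropped): use the newest.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => return Some(latest),
        }
    }
}
```

Coalescing only fits when later messages make earlier ones redundant, as a newer finalization notification plausibly does. Stale intermediate states are then dropped instead of being processed one by one.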
Michael Sproul	a60ab4eff2	Refine compaction (#1916 ) ## Proposed Changes In an attempt to fix OOM issues and database consistency issues observed by some users after the introduction of compaction in v0.3.4, this PR makes the following changes: * Run compaction less often: roughly every 1024 epochs, including after long periods of non-finality. I think the division check proposed by Paul is pretty solid, and ensures we don't miss any events where we should be compacting. LevelDB lacks an easy way to check the size of the DB, which would be another good trigger. * Make it possible to disable the compaction on finalization using `--auto-compact-db=false` * Make it possible to trigger a manual, single-threaded foreground compaction on start-up using `--compact-db` * Downgrade the pruning log to `DEBUG`, as it's particularly noisy during sync I would like to ship these changes to affected users ASAP, and will document them further in the Advanced Database section of the book if they prove effective.	2020-11-17 09:10:53 +00:00
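The commit above runs compaction "roughly every 1024 epochs" using a "division check" that should not miss events after long periods of non-finality. One plausible reading is to compare the quotients of the old and new finalized epochs by the period. The exact check Lighthouse uses is not shown here, so treat this as an assumption. `COMPACTION_PERIOD_EPOCHS` and `should_compact` are hypothetical names.

```rust
/// Hypothetical constant for the "roughly every 1024 epochs" cadence.
pub const COMPACTION_PERIOD_EPOCHS: u64 = 1024;

/// Compact when finalization crosses a multiple of the period. Comparing
/// quotients (instead of testing `new % period == 0`) still fires when
/// finality jumps over a boundary, e.g. after a long period of non-finality.
pub fn should_compact(old_finalized_epoch: u64, new_finalized_epoch: u64) -> bool {
    old_finalized_epoch / COMPACTION_PERIOD_EPOCHS != new_finalized_epoch / COMPACTION_PERIOD_EPOCHS
}
```

A modulo test would miss a jump from epoch 900 to epoch 3000, because neither value is a multiple of 1024. The quotient comparison catches it.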
divma	398919b5d4	router: drop requests from peers that have dc'd (#1919 ) ## Issue Addressed A peer might send a lot of requests that comply with the rate limit and then disconnect; this humongous PR makes sure we don't process them if the peer is not connected	2020-11-17 02:06:21 +00:00
Pawan Dhananjay	280334b1b0	Validate eth1 chain id (#1877 ) ## Issue Addressed Resolves #1815 ## Proposed Changes Adds extra validation for eth1 chain id apart from the existing check for eth1 network id.	2020-11-16 23:10:42 +00:00
Age Manning	49c4630045	Performance improvement for db reads (#1909 ) This PR adds a number of improvements: - Downgrade a warning log when we ignore blocks for gossipsub processing - Revert a correction to improve logging of peer score changes - Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages - Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.	2020-11-16 07:28:30 +00:00
divma	eb56140582	Update logs + do not downscore peers if WE time out (#1901 ) ## Issue Addressed - RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one - The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include info there that is not relevant to the user - The processor didn't have the service tag so add it - Impl `KV` for status message - Do not downscore peers if we are the ones that timed out Other small improvements	2020-11-16 04:06:14 +00:00
realbigsean	6a7d221f72	add slot validation to attestation_data endpoint (#1888 ) ## Issue Addressed Resolves #1801 ## Proposed Changes Verify queries to `attestation_data` are for no later than `current_slot + 1`. If they are later than this, return a 400. Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-11-16 02:59:35 +00:00
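The rule in the commit above accepts `attestation_data` queries for slots no later than `current_slot + 1` and answers anything later with a 400. It fits in a single comparison. This is a minimal sketch that uses plain `u64` slots and a bare status code in place of Lighthouse's `Slot` type and warp rejections. `validate_attestation_slot` is a hypothetical name.

```rust
/// Hypothetical check: Ok if `requested_slot <= current_slot + 1`,
/// otherwise the HTTP status code to return (400 Bad Request).
pub fn validate_attestation_slot(requested_slot: u64, current_slot: u64) -> Result<(), u16> {
    // `saturating_add` avoids overflow if `current_slot` were u64::MAX.
    if requested_slot > current_slot.saturating_add(1) {
        Err(400)
    } else {
        Ok(())
    }
}
```

Allowing one slot of lookahead tolerates a validator client whose clock runs slightly ahead of the beacon node's.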
divma	8a16548715	Misc Peer sync info adjustments (#1896 ) ## Issue Addressed #1856 ## Proposed Changes - For clarity, the router's processor now only decides if a peer is compatible and disconnects it or sends it to sync accordingly. No logic here regarding how useful the peer is. - Update peer_sync_info's rules - Add an `IrrelevantPeer` sync status to account for incompatible peers (maybe this should be "IncompatiblePeer" now that I think about it?). This state is updated upon receiving an internal goodbye in the peer manager - Misc code cleanups - Reduce the need to create `StatusMessage`s (and thus, `Arc` accesses) - Add missing calls to update the global sync state The overall effect should be: - More peers recognized as Behind, and fewer as Unknown - Peers identified as incompatible	2020-11-13 09:00:10 +00:00
Michael Sproul	46a06069c6	Release v0.3.4 (#1894 ) ## Proposed Changes Bump version to v0.3.4 and update dependencies with `cargo update`. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-11-13 06:06:35 +00:00
Age Manning	c00e6c2c6f	Small network adjustments (#1884 ) ## Issue Addressed - Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections - Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected - Improved logging There is likely more to come - I'll leave this open as we investigate further testnets	2020-11-13 06:06:33 +00:00
Paul Hauner	8772c02fa0	Reduce temp allocations in network metrics (#1895 ) ## Issue Addressed Using `heaptrack` I could see that ~75% of Lighthouse temporary allocations are caused by temporary string allocations here. ## Proposed Changes Reduces temporary `String` allocations when updating metrics in the `network` crate. The solution isn't perfect since we rebuild our caches with each call, but it's a significant improvement. ## Additional Info NA	2020-11-13 04:19:38 +00:00
blacktemplar	c7ac967d5a	handle peer state transitions on gossipsub score changes + refactoring (#1892 ) ## Issue Addressed NA ## Proposed Changes Correctly handles peer state transitions on gossipsub changes + refactors handling of peer state transitions into one function used for lighthouse score changes and gossipsub score changes. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-11-13 03:15:03 +00:00
realbigsean	cb26c15eb6	Peer endpoint updates (#1893 ) ## Issue Addressed N/A ## Proposed Changes - rename `address` -> `last_seen_p2p_address` - state and direction filters for `peers` endpoint - metadata count addition to `peers` endpoint - add `peer_count` endpoint Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-11-13 02:02:41 +00:00
blacktemplar	fcb4893f72	do subnet discoveries until we have MESH_N_LOW many peers (#1886 ) ## Issue Addressed NA ## Proposed Changes Increases the target peers for a subnet, so that subnet queries are executed until we have at least the minimum required peers for a mesh (`MESH_N_LOW`). We keep the limit of `6` target peers for aggregated subnet discovery queries, therefore the size (and the time needed) for a query doesn't change.	2020-11-13 00:56:05 +00:00
blacktemplar	7404f1ce54	Gossipsub scoring (#1668 ) ## Issue Addressed #1606 ## Proposed Changes Uses dynamic gossipsub scoring parameters depending on the number of active validators as specified in https://gist.github.com/blacktemplar/5c1862cb3f0e32a1a7fb0b25e79e6e2c. ## Additional Info Although the parameters got tested on Medalla, extensive testing using simulations on larger networks is still to be done and we expect that we need to change the parameters, although this might only affect constants within the dynamic parameter framework.	2020-11-12 01:48:28 +00:00
realbigsean	f8da151b0b	Standard beacon api updates (#1831 ) ## Issue Addressed Resolves #1809 Resolves #1824 Resolves #1818 Resolves #1828 (hopefully) ## Proposed Changes - add `validator_index` to the proposer duties endpoint - add the ability to query for historical proposer duties - `StateId` deserialization now fails with a 400 warp rejection - add the `validator_balances` endpoint - update the `aggregate_and_proofs` endpoint to accept an array - updates the attester duties endpoint from a `GET` to a `POST` - reduces the number of times we query for proposer duties from once per slot per validator to only once per slot Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-11-09 23:13:56 +00:00
Michael Sproul	556190ff46	Compact database on finalization (#1871 ) ## Issue Addressed Closes #1866 ## Proposed Changes * Compact the database on finalization. This removes the deleted states from disk completely. Because it happens in the background migrator, it doesn't block other database operations while it runs. On my Medalla node it took about 1 minute and shrank the database from 90GB to 9GB. * Fix an inefficiency in the pruning algorithm where it would always use the genesis checkpoint as the `old_finalized_checkpoint` when running for the first time after start-up. This would result in loading lots of states one-at-a-time back to genesis, and storing a lot of block roots in memory. The new code stores the old finalized checkpoint on disk and only uses genesis if no checkpoint is already stored. This makes it both backwards compatible _and_ forwards compatible -- no schema change required! * Introduce two new `INFO` logs to indicate when pruning has started and completed. Users seem to want to know this information without enabling debug logs!	2020-11-09 07:02:21 +00:00
Paul Hauner	2f9999752e	Add --testnet mainnet and start HTTP server before genesis (#1862 ) ## Issue Addressed NA ## Proposed Changes - Adds support for `--testnet mainnet` - Start HTTP server prior to genesis ## Additional Info Note: This is an incomplete work-in-progress. Use Lighthouse for mainnet at your own risk. With this PR, you can check the deposits: ```bash lighthouse --testnet mainnet bn --http ``` ```bash curl localhost:5052/lighthouse/eth1/deposit_cache | jq ``` ```json { "data": [ { "deposit_data": { "pubkey": "0x854980aa9bf2e84723e1fa6ef682e3537257984cc9cb1daea2ce6b268084b414f0bb43206e9fa6fd7a202357d6eb2b0d", "withdrawal_credentials": "0x00cacf703c658b802d55baa2a5c1777500ef5051fc084330d2761bcb6ab6182b", "amount": "32000000000", "signature": "0xace226cdfd9da6b1d827c3a6ab93f91f53e8e090eb6ca5ee7c7c5fe3acc75558240ca9291684a2a7af5cac67f0558d1109cc95309f5cdf8c125185ec9dcd22635f900d791316924aed7c40cff2ffccdac0d44cf496853db678c8c53745b3545b" }, "block_number": 3492981, "index": 0, "signature_is_valid": true }, { "deposit_data": { "pubkey": "0x93da03a71bc4ed163c2f91c8a54ea3ba2461383dd615388fd494670f8ce571b46e698fc8d04b49e4a8ffe653f581806b", "withdrawal_credentials": "0x006ebfbb7c8269a78018c8b810492979561d0404d74ba9c234650baa7524dcc4", "amount": "32000000000", "signature": "0x8d1f4a1683f798a76effcc6e2cdb8c3eed5a79123d201c5ecd4ab91f768a03c30885455b8a952aeec3c02110457f97ae0a60724187b6d4129d7c352f2e1ac19b4210daacd892fe4629ad3260ce2911dceae3890b04ed28267b2d8cb831f6a92d" }, "block_number": 3493427, "index": 1, "signature_is_valid": true }, ```	2020-11-09 05:04:03 +00:00
divma	b0e9e3dcef	Seen addresses store port (#1841 ) ## Issue Addressed #1764	2020-11-09 04:01:03 +00:00
Age Manning	e2ae5010a6	Update libp2p (#1865 ) Updates libp2p to the latest version. This adds tokio 0.3 support and brings back yamux support. This also updates some discv5 configuration parameters for leaner discovery queries	2020-11-06 04:14:14 +00:00
blacktemplar	7e7fad5734	Ignore RPC messages of disconnected peers and remove old peers based on disconnection time (#1854 ) ## Issue Addressed NA ## Proposed Changes Lets the networking behavior ignore messages of peers that are not connected. Furthermore, old peers are not removed from the peerdb based on score anymore but based on the disconnection time.	2020-11-03 23:43:10 +00:00
Age Manning	0a0f4daf9d	Prevent errors for stream termination race (#1853 ) Prevents an error being propagated on a race condition for RPC stream termination	2020-11-03 10:37:00 +00:00
Paul Hauner	0cde4e285c	Bump version to v0.3.3 (#1850 ) ## Issue Addressed NA ## Proposed Changes - Update versions - Run `cargo update` ## Additional Info - Blocked on #1846	2020-11-02 23:55:15 +00:00
Paul Hauner	7afbaa807e	Return eth1-related data via the API (#1797 ) ## Issue Addressed - Related to #1691 ## Proposed Changes Adds the following API endpoints: - `GET lighthouse/eth1/syncing`: status about how synced we are with Eth1. - `GET lighthouse/eth1/block_cache`: all locally cached eth1 blocks. - `GET lighthouse/eth1/deposit_cache`: all locally cached eth1 deposits. Additionally: - Moves some types from the `beacon_node/eth1` to the `common/eth2` crate, so they can be used in the API without duplication. - Allow `update_deposit_cache` and `update_block_cache` to take an optional head block number to avoid duplicate requests. ## Additional Info TBC	2020-11-02 00:37:30 +00:00
divma	6c0c050fbb	Tweak head syncing (#1845 ) ## Issue Addressed Fixes head syncing ## Proposed Changes - Get back to statusing peers after removing chain segments, and make the peer manager handle status according to the sync status, preventing an old known deadlock - Also fixes a bug where a chain would get removed if the optimistic batch succeeded while empty ## Additional Info Tested on Medalla and looking good	2020-11-01 23:37:39 +00:00
Paul Hauner	f64f8246db	Only run http_api tests in release (#1827 ) ## Issue Addressed NA ## Proposed Changes As raised by @hermanjunge in a DM, the `http_api` tests have been observed taking 100+ minutes on debug. This PR: - Moves the `http_api` tests to only run in release. - Groups some `http_api` tests to reduce test-setup overhead. ## Additional Info NA	2020-10-29 22:25:20 +00:00
realbigsean	ae0f025375	Beacon state validator id filter (#1803 ) ## Issue Addressed Michael's comment here: https://github.com/sigp/lighthouse/issues/1434#issuecomment-708834079 Resolves #1808 ## Proposed Changes - Add query param `id` and `status` to the `validators` endpoint - Add string serialization and deserialization for `ValidatorStatus` - Drop `Epoch` from `ValidatorStatus` variants ## Additional Info	2020-10-29 05:13:04 +00:00
divma	9f45ac2f5e	More sync edge cases + prettify range (#1834 ) ## Issue Addressed Sync edge case when we get an empty optimistic batch that passes validation and is inside the download buffer. Eventually the chain would reach the batch and treat it as an ugly state. ## Proposed Changes - Handle the edge case by advancing the chain's target + code clarification - Some larger changes for readability + ergonomics since rust has try ops - Better handling of bad batch and chain states	2020-10-29 02:29:24 +00:00
blacktemplar	2bd5b9182f	fix unbanning of peers (#1838 ) ## Issue Addressed NA ## Proposed Changes Currently a banned peer will remain banned indefinitely as long as update is called on the score struct regularly. This fixes the bug: score decay now starts `BANNED_BEFORE_DECAY` seconds after banning.	2020-10-29 01:25:02 +00:00
Michael Sproul	36bd4d87f0	Update to spec v1.0.0-rc.0 and BLSv4 (#1765 ) ## Issue Addressed Closes #1504 Closes #1505 Replaces #1703 Closes #1707 ## Proposed Changes * Update BLST and Milagro to versions compatible with BLSv4 spec * Update Lighthouse to spec v1.0.0-rc.0, and update EF test vectors * Use the v1.0.0 constants for `MainnetEthSpec`. * Rename `InteropEthSpec` -> `V012LegacyEthSpec` * Change all constants to suit the mainnet `v0.12.3` specification (i.e., Medalla). * Deprecate the `--spec` flag for the `lighthouse` binary * This value is now obtained from the `config_name` field of the `YamlConfig`. * Built in testnet YAML files have been updated. * Ignore the `--spec` value, if supplied, log a warning that it will be deprecated * `lcli` still has the spec flag, that's fine because it's dev tooling. * Remove the `E: EthSpec` from `YamlConfig` * This means we need to deser the genesis `BeaconState` on-demand, but this is fine. * Swap the old "minimal", "mainnet" strings over to the new `EthSpecId` enum. * Always require a `CONFIG_NAME` field in `YamlConfig` (it used to have a default). ## Additional Info Lots of breaking changes, do not merge! ~~We will likely need a Lighthouse v0.4.0 branch, and possibly a long-term v0.3.0 branch to keep Medalla alive~~. Co-authored-by: Kirk Baird <baird.k@outlook.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-28 22:19:38 +00:00
divma	ad846ad280	Inform peers of requests that exceed the maximum rate limit + log downgrade (#1830 ) ## Issue Addressed #1825 ## Proposed Changes Since blocks-by-range requests with large steps are penalized more heavily, it is possible to receive requests that will never be processed. We were not informing peers about these requests, and we were also logging a CRIT that is no longer relevant. Later we should check whether more sophisticated handling of these requests is needed.	2020-10-27 11:46:38 +00:00
Paul Hauner	92c8eba8ca	Ensure eth1 deposit/chain IDs are used from YamlConfig (#1829 ) ## Issue Addressed NA ## Proposed Changes Fixes a bug which causes the node to reject valid eth1 nodes. - Fix core bug: failure to apply `YamlConfig` values to `ChainSpec`. - Add a test to prevent regression in this specific case. - Fix an invalid log message ## Additional Info NA	2020-10-26 03:34:14 +00:00
Paul Hauner	f157d61cc7	Address clippy lints, panic in ssz_derive on overflow (#1714 ) ## Issue Addressed NA ## Proposed Changes - Panic or return error if we overflow `usize` in SSZ decoding/encoding derive macros. - I claim that the panics can only be triggered by a faulty type definition in lighthouse, they cannot be triggered externally on a validly defined struct. - Use `Ordering` instead of some `if` statements, as demanded by clippy. - Remove some old clippy `allow` that seem to no longer be required. - Add comments to interesting clippy statements that we're going to continue to ignore. - Create #1713 ## Additional Info NA	2020-10-25 23:27:39 +00:00
Paul Hauner	eba51f0973	Update testnet configs, change on-disk format (#1799 ) ## Issue Addressed - Related to #1691 ## Proposed Changes - Add `DEPOSIT_CHAIN_ID` and `DEPOSIT_NETWORK_ID` to `config.yaml`. - Pass the `DEPOSIT_NETWORK_ID` to the `eth1::Service`. - Remove the unused `MAX_EPOCHS_PER_CROSSLINK` from the `altona` and `medalla` configs (see [spec commit](`2befe90032 (diff-efb845ac2ebd4aafbc23df40f47ce25699255064e99d36d0406d0a14ca7953ec)`)). - Change from compressing the whole testnet directory, to only compressing the genesis state file. This is the only file we need to compress and not compressing the others makes them work nicely with git. - We can modify the boot nodes, configs, etc. without incurring an eternal binary-blob cost on our git history. - This change is backwards compatible (i.e., non-breaking). ## Additional Info NA	2020-10-25 22:15:46 +00:00
Age Manning	7453f39d68	Prevent unbanning of disconnected peers (#1822 ) ## Issue Addressed Further testing revealed another edge case where we attempt to unban a peer that may be in a disconnected state. Although this causes no real issue, it does log an error to the user. This PR adds a check that prevents this edge case and stops the error from being logged to the user.	2020-10-24 05:24:20 +00:00
Age Manning	a3cc1a1e0f	Call unban only when necessary (#1821 ) This PR prevents a user-facing error. Rather than optimistically unbanning a peer, it checks the peer's state before requesting that the peer be unbanned.	2020-10-24 03:24:19 +00:00
blacktemplar	1644289a08	Updates the libp2p to the second newest commit => Allow only one topic per message (#1819 ) As @AgeManning mentioned, the newest libp2p version had some problems and was downgraded again on lighthouse master. This is an intermediate version that causes no problems and only adds one small change: allowing only one topic per message.	2020-10-24 01:05:37 +00:00
Age Manning	7870b81ade	Downgrade libp2p (#1817 ) ## Description This downgrades the recent libp2p upgrade. There were issues with the RPC which prevented syncing of the chain and this upgrade needs to be further investigated.	2020-10-23 09:33:59 +00:00
Age Manning	55eee18ebb	Version bump to 0.3.1 (#1813 ) ## Description Bumps Lighthouse to version 0.3.1.	2020-10-23 04:16:36 +00:00
Age Manning	64c5899d25	Adds colour help to bn and vc subcommands (#1811 ) Adds coloured help to the bn and vc subcommands	2020-10-23 04:16:34 +00:00
Age Manning	2c7f362908	Discovery v5.1 (#1786 ) ## Overview This updates lighthouse to discovery v5.1. Note: This makes lighthouse's discovery incompatible with all previous versions. Lighthouse cannot discover peers or send/receive ENRs from any previous version. This is a breaking change. This resolves #1605	2020-10-23 04:16:33 +00:00
Age Manning	ae96dab5d2	Increase UPnP logging and decrease batch sizes (#1812 ) ## Description This increases the logging of the underlying UPnP tasks to inform the user of UPnP error/success. This also decreases the batch syncing size to two epochs per batch.	2020-10-23 03:01:33 +00:00
Age Manning	c49dd94e20	Update to latest libp2p (#1810 ) ## Description Updates to the latest libp2p and includes gossipsub updates. Of particular note is the limitation of a single topic per gossipsub message. Co-authored-by: blacktemplar <blacktemplar@a1.net>	2020-10-23 03:01:31 +00:00
Michael Sproul	acd49d988d	Implement database temp states to reduce memory usage (#1798 ) ## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. 
In this case the failure is transient: state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know. This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.	2020-10-23 01:27:51 +00:00
Age Manning	66f0cf4430	Improve peer handling (#1796 ) ## Issue Addressed Potentially resolves #1647 and sync stalls. ## Proposed Changes The handling of the state of banned peers was inadequate for the complex peerdb data structure. We store a limited number of disconnected and banned peers in the db. We were not tracking intermediate "disconnecting" states, and in some circumstances we were updating the peer state without informing the peerdb. This led to a number of inconsistencies in the peer state. Further, the peer manager could ban a peer, changing its state from connected to banned. In this circumstance, if the peer then disconnected, we didn't inform the application layer, so applications like sync were not informed of the peer's disconnection. This could cause sync to stall and require a lighthouse restart. This PR improves the handling of peer states and the interactions with the peerdb.	2020-10-23 01:27:48 +00:00
Paul Hauner	b829257cca	Ssz state (#1749 ) ## Issue Addressed NA ## Proposed Changes Adds a `lighthouse/beacon/states/:state_id/ssz` endpoint to allow us to pull the genesis state from the API. ## Additional Info NA	2020-10-22 06:05:49 +00:00
Michael Sproul	7f73dccebc	Refine op pool pruning (#1805 ) ## Issue Addressed Closes #1769 Closes #1708 ## Proposed Changes Tweaks the op pool pruning so that the attestation pool is pruned against the wall-clock epoch instead of the finalized state's epoch. This should reduce the unbounded growth that we've seen during periods without finality. Also fixes up the voluntary exit pruning as raised in #1708.	2020-10-22 04:47:29 +00:00
Paul Hauner	a3704b971e	Support pre-flight CORS check (#1772 ) ## Issue Addressed - Resolves #1766 ## Proposed Changes - Use the `warp::filters::cors` filter instead of our work-around. ## Additional Info It's not trivial to enable/disable `cors` using `warp`, since using `routes.with(cors)` changes the type of `routes`. This makes it difficult to apply/not apply cors at runtime. My solution has been to always use the `warp::filters::cors` wrapper but when cors should be disabled, just pass the HTTP server listen address as the only permissible origin.	2020-10-22 04:47:27 +00:00
realbigsean	a3552a4b70	Node endpoints (#1778 ) ## Issue Addressed `node` endpoints in #1434 ## Proposed Changes Implement these: ``` /eth/v1/node/health /eth/v1/node/peers/{peer_id} /eth/v1/node/peers ``` - Add an `Option<Enr>` to `PeerInfo` - Finish implementation of `/eth/v1/node/identity` ## Additional Info - should update the `peers` endpoints when #1764 is resolved Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-10-22 02:59:42 +00:00
Daniel Schonfeld	8f86baa48d	Optimize attester slashing (#1745 ) ## Issue Addressed Closes #1548 ## Proposed Changes Optimizes the choice of attester slashings by selecting the ones that cover the largest number of slashed validators with the highest effective balances ## Additional Info Initial pass, need to write a test for it	2020-10-22 01:43:54 +00:00
divma	668513b67e	Sync state adjustments (#1804 ) Check for advanced peers and the state of the chain with respect to the clock slot to decide whether a chain is synced or transitioning to a head sync. Also fixes a bug that prevented getting the right state while syncing heads.	2020-10-22 00:26:06 +00:00
realbigsean	628891df1d	fix genesis state root provided to HTTP server (#1783 ) ## Issue Addressed Resolves #1776 ## Proposed Changes The beacon chain builder was using the canonical head's state root for the `genesis_state_root` field. ## Additional Info	2020-10-21 23:15:30 +00:00
realbigsean	fdb9744759	use head slot instead of the target slot for the not_while_syncing fi… (#1802 ) ## Issue Addressed Resolves #1792 ## Proposed Changes Use `chain.best_slot()` instead of the sync state's target slot in the `not_while_syncing_filter` ## Additional Info N/A	2020-10-21 22:02:25 +00:00
divma	2acf75785c	More sync updates (#1791 ) ## Issue Addressed #1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager	2020-10-20 22:34:18 +00:00
Michael Sproul	703c33bdc7	Fix head tracker concurrency bugs (#1771 ) ## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.	2020-10-19 05:58:39 +00:00
blacktemplar	6ba997b88e	add direction information to PeerInfo (#1768 ) ## Issue Addressed NA ## Proposed Changes Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.	2020-10-16 05:24:21 +00:00
Herman Junge	d7b9d0dd9f	Implement matches! macro (#1777 ) Fix #1775	2020-10-15 21:42:43 +00:00
Pawan Dhananjay	97be2ca295	Simulator and attestation service fixes (#1747 ) ## Issue Addressed #1729 #1730 ## Proposed Changes 1. Fixes a bug in the simulator where nodes can't find each other because their ENRs contain a UDP port of 0. 2. Fixes bugs in the attestation service where we unsubscribe from a subnet prematurely. More testing is needed for the attestation service fixes.	2020-10-15 07:11:31 +00:00
blacktemplar	a0634cc64f	Gossipsub topic filters (#1767 ) ## Proposed Changes Adds a gossipsub topic filter that only allows subscribing and incoming subscriptions from valid ETH2 topics. ## Additional Info Currently the preparation of the valid topic hashes uses only the current fork id but in the future it must also use all possible future fork ids for planned forks. This has to get added when hard coded forks get implemented. DO NOT MERGE: We first need to merge the libp2p changes (see https://github.com/sigp/rust-libp2p/pull/70) so that we can refer from here to a commit hash inside the lighthouse branch.	2020-10-14 10:12:57 +00:00
blacktemplar	8248afa793	Updates the message-id according to the Networking Spec (#1752 ) ## Proposed Changes Implement the new message id function (see https://github.com/ethereum/eth2.0-specs/pull/2089) using an additional fast message id function for better performance + caching decompressed data.	2020-10-14 06:51:58 +00:00
Pawan Dhananjay	99a02fd2ab	Limit snappy input stream (#1738 ) ## Issue Addressed N/A ## Proposed Changes This PR limits the length of the stream received by the snappy decoder to be the maximum allowed size for the received rpc message type. Also adds further checks to ensure that the length specified in the rpc [encoding-dependent header](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#encoding-strategies) is within the bounds for the rpc message type being decoded.	2020-10-11 22:45:33 +00:00
Paul Hauner	0e4cc50262	Remove unused deps	2020-10-09 15:58:20 +11:00
Paul Hauner	db3e0578e9	Merge branch 'v0.3.0-staging' into v3-master	2020-10-09 15:27:08 +11:00
Paul Hauner	72cc5e35af	Bump version to v0.3.0 (#1743 ) ## Issue Addressed NA ## Proposed Changes - Bump version to v0.3.0 - Run `cargo update` ## Additional Info NA	2020-10-09 02:05:30 +00:00
Paul Hauner	da44821e39	Clean up obsolete TODOs (#1734 ) Squashed commit of the following: commit f99373cbaec9adb2bdbae3f7e903284327962083 Author: Age Manning <Age@AgeManning.com> Date: Mon Oct 5 18:44:09 2020 +1100 Clean up obsolute TODOs	2020-10-05 21:08:14 +11:00
Paul Hauner	ee7c8a0b7e	Update external deps (#1711 ) ## Issue Addressed - Resolves #1706 ## Proposed Changes Updates dependencies across the workspace. Any crate that was not able to be brought to the latest version is listed in #1712. ## Additional Info NA	2020-10-05 08:22:19 +00:00
Age Manning	240181e840	Upgrade discovery and restructure task execution (#1693 ) * Initial rebase * Remove old code * Correct release tests * Rebase commit * Remove eth2-testnet dep on eth2libp2p * Remove crates lost in rebase * Remove unused dep	2020-10-05 18:45:54 +11:00
Age Manning	bcb629564a	Improve error handling in network processing (#1654 ) * Improve error handling in network processing * Cargo fmt * Cargo fmt * Improve error handling for prior genesis * Remove dep	2020-10-05 17:34:56 +11:00
divma	113758a4f5	From panic to crit (#1726 ) ## Issue Addressed Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-05 17:34:49 +11:00
Age Manning	a8c5af8874	Increase content-id length (#1725 ) ## Issue Addressed N/A ## Proposed Changes Increase gossipsub's content-id length to the full 32 byte hash. ## Additional Info N/A	2020-10-05 17:33:42 +11:00
divma	6997776494	Sync fixes (#1716 ) ## Issue Addressed chain state inconsistencies ## Proposed Changes - a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer. - if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch	2020-10-05 17:33:36 +11:00
Paul Hauner	e7eb99cb5e	Use Drop impl to send worker idle message (#1718 ) ## Issue Addressed NA ## Proposed Changes Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic. ## Additional Info NA	2020-10-05 17:33:25 +11:00
Age Manning	fe07a3c21c	Improve error handling in network processing (#1654 ) * Improve error handling in network processing * Cargo fmt * Cargo fmt * Improve error handling for prior genesis * Remove dep	2020-10-05 17:30:43 +11:00
Age Manning	47c921f326	Update libp2p (#1728 ) ## Issue Addressed N/A ## Proposed Changes Updates the libp2p dependency to the latest version ## Additional Info N/A	2020-10-05 05:16:27 +00:00
divma	b1c121b880	From panic to crit (#1726 ) ## Issue Addressed Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-05 04:02:09 +00:00
Age Manning	6b68c628df	Increase content-id length (#1725 ) ## Issue Addressed N/A ## Proposed Changes Increase gossipsub's content-id length to the full 32 byte hash. ## Additional Info N/A	2020-10-04 23:49:16 +00:00
divma	86a18e72c4	Sync fixes (#1716 ) ## Issue Addressed chain state inconsistencies ## Proposed Changes - a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer. - if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch	2020-10-04 23:49:14 +00:00
divma	e3c7b58657	Address a couple of TODOs (#1724 ) ## Issue Addressed couple of TODOs	2020-10-04 22:50:44 +00:00
Paul Hauner	d72c026d32	Use Drop impl to send worker idle message (#1718 ) ## Issue Addressed NA ## Proposed Changes Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic. ## Additional Info NA	2020-10-04 21:59:20 +00:00
Paul Hauner	c4bd9c86e6	Add check for head/target consistency (#1702 ) ## Issue Addressed NA ## Proposed Changes Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot. ## Additional Info NA	2020-10-03 10:08:06 +10:00
Sean	6af3bc9ce2	Add UPnP support for Lighthouse (#1587 ) This commit was modified by Paul H whilst rebasing master onto v0.3.0-staging. Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP-supported routers. Uses the IGD library: https://docs.rs/igd/0.10.0/igd/ Adds the libp2p TCP port and the discovery UDP port. If this fails, it simply logs the attempt and moves on. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-03 10:07:47 +10:00
realbigsean	255cc25623	Weak subjectivity start from genesis (#1675 ) This commit was edited by Paul H when rebasing from master to v0.3.0-staging. Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639 - Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch` - Verifies that the given checkpoint exists in the chain, or that the chain syncs through this checkpoint. If not, shuts down and prompts the user to purge state before restarting. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-03 10:00:28 +10:00
Paul Hauner	32338bcafa	Add check for head/target consistency (#1702 ) ## Issue Addressed NA ## Proposed Changes Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot. ## Additional Info NA	2020-10-02 10:46:37 +00:00
Paul Hauner	6ea3bc5e52	Implement VC API (#1657 ) ## Issue Addressed NA ## Proposed Changes - Implements an HTTP API for the validator client. - Creates EIP-2335 keystores with an empty `description` field, instead of a missing `description` field. Adds option to set name. - Be more graceful with setups without any validators (yet) - Remove an error log when there are no validators. - Create the `validator` dir if it doesn't exist. - Allow building a `ValidatorDir` without a withdrawal keystore (required for the API method where we only post a voting keystore). - Add optional `description` field to `validator_definitions.yml` ## TODO - [x] Signature header, as per https://github.com/sigp/lighthouse/issues/1269#issuecomment-649879855 - [x] Return validator descriptions - [x] Return deposit data - [x] Respect the mnemonic offset - [x] Check that mnemonic can derive returned keys - [x] Be strict about non-localhost - [x] Allow graceful start without any validators (+ create validator dir) - [x] Docs final pass - [x] Swap to EIP-2335 description field. - [x] Fix Zeroize TODO in VC api types. - [x] Zeroize secp256k1 key ## Endpoints - [x] `GET /lighthouse/version` - [x] `GET /lighthouse/health` - [x] `GET /lighthouse/validators` - [x] `POST /lighthouse/validators/hd` - [x] `POST /lighthouse/validators/keystore` - [x] `PATCH /lighthouse/validators/:validator_pubkey` - [ ] ~~`POST /lighthouse/validators/:validator_pubkey/exit/:epoch`~~ Future works ## Additional Info TBC	2020-10-02 09:42:19 +00:00
Sean	94b17ce02b	Add UPnP support for Lighthouse (#1587 ) Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP-supported routers. ## Issue Addressed #927 ## Proposed Changes Uses the IGD library: https://docs.rs/igd/0.10.0/igd/ Adds the libp2p TCP port and the discovery UDP port. If this fails, it simply logs the attempt and moves on. ## Additional Info Co-authored-by: Age Manning <Age@AgeManning.com>	2020-10-02 08:47:00 +00:00
realbigsean	9d2d6239cd	Weak subjectivity start from genesis (#1675 ) ## Issue Addressed Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639 ## Proposed Changes - Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch` - Verifies that the given checkpoint exists in the chain, or that the chain syncs through this checkpoint. If not, shuts down and prompts the user to purge state before restarting. ## Additional Info Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-01 01:41:58 +00:00
Michael Sproul	22aedda1be	Add database schema versioning (#1688 ) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673	2020-10-01 11:12:36 +10:00
Paul Hauner	cdec3cec18	Implement standard eth2.0 API (#1569 ) - Resolves #1550 - Resolves #824 - Resolves #825 - Resolves #1131 - Resolves #1411 - Resolves #1256 - Resolve #1177 - Includes the `ShufflingId` struct initially defined in #1492. That PR is now closed and the changes are included here, with significant bug fixes. - Implement the https://github.com/ethereum/eth2.0-APIs in a new `http_api` crate using `warp`. This replaces the `rest_api` crate. - Add a new `common/eth2` crate which provides a wrapper around `reqwest`, providing the HTTP client that is used by the validator client and for testing. This replaces the `common/remote_beacon_node` crate. - Create a `http_metrics` crate which is a dedicated server for Prometheus metrics (they are no longer served on the same port as the REST API). We now have flags for `--metrics`, `--metrics-address`, etc. - Allow the `subnet_id` to be an optional parameter for `VerifiedUnaggregatedAttestation::verify`. This means it does not need to be provided unnecessarily by the validator client. - Move `fn map_attestation_committee` in `mod beacon_chain::attestation_verification` to a new `fn with_committee_cache` on the `BeaconChain` so the same cache can be used for obtaining validator duties. - Add some other helpers to `BeaconChain` to assist with common API duties (e.g., `block_root_at_slot`, `head_beacon_block_root`). - Change the `NaiveAggregationPool` so it can index attestations by `hash_tree_root(attestation.data)`. This is a requirement of the API. - Add functions to `BeaconChainHarness` to allow it to create slashings and exits. - Allow for `eth1::Eth1NetworkId` to go to/from a `String`. - Add functions to the `OperationPool` to allow getting all objects in the pool. - Add function to `BeaconState` to check if a committee cache is initialized. - Fix bug where `seconds_per_eth1_block` was not transferring over from `YamlConfig` to `ChainSpec`. - Add the `deposit_contract_address` to `YamlConfig` and `ChainSpec`. 
We needed to be able to return it in an API response. - Change some uses of serde `serialize_with` and `deserialize_with` to a single use of `with` (code quality). - Impl `Display` and `FromStr` for several BLS fields. - Check for clock discrepancy when VC polls BN for sync state (with +/- 1 slot tolerance). This is not intended to be comprehensive, it was just easy to do. - See #1434 for a per-endpoint overview. - Seeking clarity here: https://github.com/ethereum/eth2.0-APIs/issues/75 - [x] Add docs for prom port to close #1256 - [x] Follow up on this #1177 - [x] ~~Follow up with #1424~~ Will fix in future PR. - [x] Follow up with #1411 - [x] ~~Follow up with #1260~~ Will fix in future PR. - [x] Add quotes to all integers. - [x] Remove `rest_types` - [x] Address missing beacon block error. (#1629) - [x] ~~Add tests for lighthouse/peers endpoints~~ Wontfix - [x] ~~Follow up with validator status proposal~~ Tracked in #1434 - [x] Unify graffiti structs - [x] ~~Start server when waiting for genesis?~~ Will fix in future PR. - [x] TODO in http_api tests - [x] Move lighthouse endpoints off /eth/v1 - [x] Update docs to link to standard - ~~Blocked on #1586~~ Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-10-01 11:12:36 +10:00
Pawan Dhananjay	8e20176337	Directory restructure (#1532 ) Closes #1487 Closes #1427 Directory restructure in accordance with #1487. Also has temporary migration code to move the old directories into the new structure. Also extracts all default directory names and utility functions into a `directory` crate to avoid repetition. ~Since `validator_definition.yaml` stores absolute paths, users will have to manually change the keystore paths or delete the file to get the validators picked up by the vc.~. `validator_definition.yaml` is migrated as well from the default directories. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-10-01 11:12:35 +10:00
Paul Hauner	dffc56ef1d	Fix validator lockfiles (#1586 ) ## Issue Addressed - Resolves #1313 ## Proposed Changes Changes the way we start the validator client and beacon node to ensure that we cleanly drop the validator keystores (which therefore ensures we cleanup their lockfiles). Previously we were holding the validator keystores in a tokio task that was being forcefully killed (i.e., without `Drop`). Now, we hold them in a task that can gracefully handle a shutdown. Also, switches the `--strict-lockfiles` flag to `--delete-lockfiles`. This means two things: 1. We are now strict on lockfiles by default (before we weren't). 1. There's a simple way for people to delete the lockfiles if they experience a crash. ## Additional Info I've only given the option to ignore and delete lockfiles, not just ignore them. I can't see a strong need for ignore-only but could easily add it, if the need arises. I've flagged this as `api-breaking` since users that have lockfiles lingering around will be required to supply `--delete-lockfiles` next time they run.	2020-10-01 11:12:35 +10:00
Michael Sproul	fcf8419c90	Allow truncation of pubkey cache on creation (#1686 ) ## Issue Addressed Closes #1680 ## Proposed Changes This PR fixes a race condition in beacon node start-up whereby the pubkey cache could be created by the beacon chain builder before the `PersistedBeaconChain` was stored to disk. When the node restarted, it would find the persisted chain missing, and attempt to start from scratch, creating a new pubkey cache in the process. This call to `ValidatorPubkeyCache::new` would fail if the file already existed (which it did). I changed the behaviour so that pubkey cache initialization now doesn't care whether there's a file already in existence (it's only a cache after all). Instead it will truncate and recreate the file in the race scenario described.	2020-09-30 04:42:52 +00:00
Age Manning	c0e76d2c15	Version bump and cargo update (#1683 )	2020-09-29 18:29:04 +10:00
Age Manning	13cb642f39	Update boot-node and discovery (#1682 ) * Improve boot_node and upgrade discovery * Clippy lints	2020-09-29 18:28:29 +10:00
blacktemplar	ae28773965	Networking bug fixes (#1684 ) * call correct unsubscribe method for subnets * correctly delegate closed connections in behaviour * correct unsubscribe method name	2020-09-29 18:28:15 +10:00
Paul Hauner	1ef4f0ea12	Add gossip conditions from spec v0.12.3 (#1667 ) ## Issue Addressed NA ## Proposed Changes There are four new conditions introduced in v0.12.3: 1. _[REJECT]_ The attestation's epoch matches its target -- i.e. `attestation.data.target.epoch == compute_epoch_at_slot(attestation.data.slot)` 1. _[REJECT]_ The attestation's target block is an ancestor of the block named in the LMD vote -- i.e. `get_ancestor(store, attestation.data.beacon_block_root, compute_start_slot_at_epoch(attestation.data.target.epoch)) == attestation.data.target.root` 1. _[REJECT]_ The committee index is within the expected range -- i.e. `data.index < get_committee_count_per_slot(state, data.target.epoch)`. 1. _[REJECT]_ The number of aggregation bits matches the committee size -- i.e. `len(attestation.aggregation_bits) == len(get_beacon_committee(state, data.slot, data.index))`. This PR implements new logic to suit (1) and (2). Tests are added for (3) and (4), although they were already implicitly enforced. ## Additional Info - There's a bit of an edge case with target root verification that I raised here: https://github.com/ethereum/eth2.0-specs/pull/2001#issuecomment-699246659 - I've had to add an `--ignore` to `cargo audit` to get CI to pass. See https://github.com/sigp/lighthouse/issues/1669	2020-09-27 20:59:40 +00:00
Paul Hauner	f1180a8947	Prepare for v0.2.12 (#1672 ) ## Issue Addressed NA ## Proposed Changes - Bump versions - Run cargo update ## Additional Info NA	2020-09-26 06:35:45 +00:00
Age Manning	28b6d921c6	Remove banned peers from DHT and track IPs (#1656 ) ## Issue Addressed #629 ## Proposed Changes This removes banned peers from the DHT and informs discovery to block the node_id and the known source IPs associated with this node. It can also unban this peer after a period of time. This also corrects the logic about banning specific IP addresses. We now use seen_ip addresses from libp2p rather than those sent to us via identify (which also include local addresses).	2020-09-25 01:52:39 +00:00
Pawan Dhananjay	15638d1448	Beacon node does not quit on eth1 errors (#1663 ) ## Issue Addressed N/A ## Proposed Changes Log critical errors instead of quitting if eth1 node cannot be reached or is on wrong network id.	2020-09-25 00:43:45 +00:00
divma	b8013b7b2c	Super Silky Smooth Syncs, like a Sir (#1628 ) ## Issue Addressed In principle, this closes #1551, but in general these are improvements for performance, maintainability and readability. The logic for the optimistic sync is actually simple. ## Proposed Changes There are miscellaneous things here: - Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic - Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct - Move most batch-related logic to the batch - Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches) - Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases - Store batches in a sorted map (`BTreeMap`). Access is not O(1), but since the number of _active_ batches is bounded this should be fast, and it saves performing hashing ops. Batches are indexed by the epoch they start at. They are sorted to easily handle chain advancements (range logic) - Produce the chain Id from the identifying fields: target root and target slot. This guarantees there can't be duplicated chains and lets us consistently search chains by either Id or checkpoint - Fix chain_id not being present in all chain loggers - Handle a mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and wait for a result that won't arrive, effectively stalling sync. - When a batch imports blocks or the chain starts syncing with a local finalized epoch greater than the chain's start epoch, the chain is advanced instead of reset. 
This is to avoid losing download progress and to validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it more accurately represents the progress of the chain. - Batch status peers from the same chain to reduce Arc access. - A couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically, if we now forget to do it, we will know. - Do not send back the blocks from the processor to the batch. Instead, register the attempt before sending the blocks (does not count as failed) - When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers. - Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code) - In chain_collection, store chains by their id in a map - Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses - Other stuff: - impl `slog::KV` for batches - impl `slog::KV` for syncing chains - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug` ### Optimistic syncing: First try the batch that contains the current head; if the batch imports any block, advance the chain. If not, and this optimistic batch is inside the current processing window, leave it there for future use; otherwise, drop it. The tolerance for this block is the same for downloading, but just once for processing Co-authored-by: Age Manning <Age@AgeManning.com>	2020-09-23 06:29:55 +00:00
Age Manning	80e52a0263	Subscribe to core topics after sync (#1613 ) ## Issue Addressed N/A ## Proposed Changes Prevent subscribing to core gossipsub topics until after we have achieved a full sync. This prevents us from censoring gossipsub channels and getting penalised in gossipsub 1.1 scoring, and saves us the computation time spent attempting to validate gossipsub messages, which we are unable to do with a non-sync'd chain.	2020-09-23 03:26:33 +00:00
Pawan Dhananjay	80ecafaae4	Add `--staking` flag (#1641 ) ## Issue Addressed Closes #1472 ## Proposed Changes Add `--staking` ~~and`staking-with-eth1-endpoint`~~ flag to improve UX for stakers. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-09-23 01:19:58 +00:00
realbigsean	b75df29501	minimize the number of places we are calling `update_pubkey_cache` (#1626 ) ## Issue Addressed - Resolves #1080 ## Proposed Changes - Call `update_pubkey_cache` only in the `build_all_caches` method and `get_validator_index` method. ## Additional Info This does reduce the number of places the cache is updated, making it simpler. But the `get_validator_index` method is used a couple times when we are iterating through the entire validator registry (or set of active validators). Before, we would only call `update_pubkey_cache` once before iterating through all validators. So I'm not _totally_ sure this change is worth it.	2020-09-23 01:19:56 +00:00
Pawan Dhananjay	a97ec318c4	Subscribe to subnets an epoch in advance (#1600 ) ## Issue Addressed N/A ## Proposed Changes Subscribe to subnets an epoch in advance of the attestation slot instead of 4 slots in advance.	2020-09-22 07:29:34 +00:00
Paul Hauner	d85d5a435e	Bump to v0.2.11 (#1645 ) ## Issue Addressed NA ## Proposed Changes - Bump version to v0.2.11 - Run `cargo update`. ## Additional Info NA	2020-09-22 04:45:15 +00:00
Paul Hauner	bd39cc8e26	Apply hotfix for inconsistent head (#1639 ) ## Issue Addressed - Resolves #1616 ## Proposed Changes If we look at the function which persists fork choice and the canonical head to disk: `1db8daae0c/beacon_node/beacon_chain/src/beacon_chain.rs (L234-L280)` There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync. I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch. These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice. ## Additional Info This is a hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions.	2020-09-22 02:06:10 +00:00
Pawan Dhananjay	14ff38539c	Add trusted peers (#1640 ) ## Issue Addressed Closes #1581 ## Proposed Changes Adds a new cli option for trusted peers who always have the maximum possible score.	2020-09-22 01:12:36 +00:00
Michael Sproul	5d17eb899f	Update LevelDB to v0.8.6, removing patch (#1636 ) Removes our dependency on a fork of LevelDB now that https://github.com/skade/leveldb-sys/pull/17 is merged	2020-09-21 11:53:53 +00:00
Age Manning	1db8daae0c	Shift metadata to the global network variables (#1631 ) ## Issue Addressed N/A ## Proposed Changes Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse. ## Additional Info N/A	2020-09-21 02:00:38 +00:00
Pawan Dhananjay	7b97c4ad30	Snappy additional sanity checks (#1625 ) ## Issue Addressed N/A ## Proposed Changes Adds the following check from the spec > A reader SHOULD NOT read more than max_encoded_len(n) bytes after reading the SSZ length-prefix n from the header.	2020-09-21 01:06:25 +00:00
Paul Hauner	371e1c1d5d	Bump version to v0.2.10 (#1630 ) ## Issue Addressed NA ## Proposed Changes Bump crate version so we can cut a new release with the fix from #1629. ## Additional Info NA	2020-09-18 06:41:29 +00:00
Paul Hauner	a17f74896a	Fix bad assumption when checking finalized descendant (#1629 ) ## Issue Addressed - Resolves #1616 ## Proposed Changes Fixes a bug where we are unable to read the finalized block from fork choice. ## Detail I had made an assumption that the finalized block always has a parent root of `None`: `e5fc6bab48/consensus/fork_choice/src/fork_choice.rs (L749-L752)` This was a faulty assumption, we don't set parent roots to `None`. Instead we sometimes set parent indices to `None`, depending on whether this pruning condition is satisfied: `e5fc6bab48/consensus/proto_array/src/proto_array.rs (L229-L232)` The bug manifested itself like this: 1. We attempt to get the finalized block from fork choice 1. We try to check that the block is a descendant of the finalized block (note: they're the same block). 1. We expect the parent root to be `None`, but it's actually the parent root of the finalized root. 1. We therefore end up checking if the parent of the finalized root is a descendant of itself (note: it's an ancestor, not a descendant). 1. We therefore declare that the finalized block is not a descendant of (or eq to) the finalized block. Bad. ## Additional Info On reflection, I made a poor assumption in the quest to obtain a probably negligible performance gain. The performance gain wasn't worth the risk and we got burnt.	2020-09-18 05:14:31 +00:00
Age Manning	49ab414594	Shift gossipsub validation (#1612 ) ## Issue Addressed N/A ## Proposed Changes This will consider all gossipsub messages that have any of the `from`, `seqno` or `signature` fields as invalid. ## Additional Info We should not merge this until all other clients have been sending empty fields for a while. See https://github.com/ethereum/eth2.0-specs/issues/1981 for reference	2020-09-18 02:05:36 +00:00
Age Manning	2074beccdc	Gossipsub message id to shortened bytes (#1607 ) ## Issue Addressed https://github.com/ethereum/eth2.0-specs/pull/2044 ## Proposed Changes Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field. ## Additional Info We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT messages from clients that haven't also made this update).	2020-09-18 02:05:34 +00:00
Age Manning	c9596fcf0e	Temporary Sync Work-Around (#1615 ) ## Issue Addressed #1590 ## Proposed Changes This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.	2020-09-13 23:58:49 +00:00
Age Manning	c6abc56113	Prevent large step-size parameters (#1583 ) ## Issue Addressed Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadratically weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups. ## Proposed Changes Weight the step parameter in the RPC filter and prevent any overflows that affect us in the step parameter. ## Additional Info	2020-09-11 02:33:36 +00:00
blacktemplar	7f1b936905	ignore too early / too late attestations instead of penalizing them (#1608 ) ## Issue Addressed NA ## Proposed Changes This ignores attestations that are too early or too late as it is specified in the spec (see https://github.com/ethereum/eth2.0-specs/blob/v0.12.1/specs/phase0/p2p-interface.md#global-topics first subpoint of `beacon_aggregate_and_proof`)	2020-09-11 01:43:15 +00:00
Pawan Dhananjay	0525876882	Dial cached enr's before making subnet discovery query (#1376 ) ## Issue Addressed Closes #1365 ## Proposed Changes Dial peers in the `cached_enrs` who aren't connected, aren't banned and satisfy the subnet predicate before making a subnet discovery query.	2020-09-11 00:52:27 +00:00
Age Manning	d79366c503	Prevent printing binary in RPC errors (#1604 ) ## Issue Addressed #1566 ## Proposed Changes Prevents printing binary characters in the RPC error response from peers.	2020-09-10 04:43:22 +00:00
Age Manning	b19cf02d2d	Penalise bad peer behaviour (#1602 ) ## Issue Addressed #1386 ## Proposed Changes Penalises peers in our scoring system that produce invalid attestations or blocks.	2020-09-10 03:51:06 +00:00
Paul Hauner	0821e6b39f	Bump version to v0.2.9 (#1598 ) ## Issue Addressed NA ## Proposed Changes - Bump version tags - Run `cargo update` ## Additional Info NA	2020-09-09 02:28:35 +00:00
Pawan Dhananjay	00cdc4bb35	Update state before producing attestation (#1596 ) ## Issue Addressed Partly addresses #1547 ## Proposed Changes This fix addresses the missing attestations at slot 0 of an epoch (also sometimes slot 1 when slot 0 was skipped). There are 2 cases: 1. BN receives the block for the attestation slot after 4 seconds (1/3rd of the slot). 2. No block is proposed for this slot. In both cases, when we produce the attestation, we pass the head state to the `produce_unaggregated_attestation_for_block` function here: `9833eca024/beacon_node/beacon_chain/src/beacon_chain.rs (L845-L850)`. Since we don't advance the state in this function, we set `attestation.data.source = state.current_justified_checkpoint`, which is at least 2 epochs lower than current_epoch (the wall clock epoch). This attestation is invalid and cannot be included in a block because of this assert from the spec: ```python if data.target.epoch == get_current_epoch(state): assert data.source == state.current_justified_checkpoint state.current_epoch_attestations.append(pending_attestation) ``` https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestations This PR changes the `produce_unaggregated_attestation_for_block` function to ensure that it advances the state before producing the attestation at the new epoch. Running this on my node, I have missed 0 attestations across all 8 of my validators in a 100 epoch period 🎉 To compare, I was missing ~14 attestations across all 8 validators in the same 100 epoch period before the fix. Will report missed attestations if any after running for another 100 epochs tomorrow.	2020-09-08 11:25:43 +00:00
Daniel Schonfeld	2a9a815f29	conforming to the p2p specs, requiring error_messages to be bound (#1593 ) ## Issue Addressed #1421 ## Proposed Changes Bounding the error_message that can be returned for RPC domain errors Co-authored-by: Age Manning <Age@AgeManning.com>	2020-09-07 06:47:05 +00:00
Age Manning	a6376b4585	Update discv5 to v10 (#1592 ) ## Issue Addressed Code improvements, dependency improvements and better async handling.	2020-09-07 05:53:20 +00:00
Sean	638daa87fe	Avoid Printing Binary String to Logs (#1576 ) Converts the graffiti binary data to string before printing to logs. ## Issue Addressed #1566 ## Proposed Changes Rather than converting graffiti to a vector, the binary data, less the last character, is passed to `String::from_utf8_lossy()`. This then allows us to call the `to_string()` function directly to give us the string. ## Additional Info Rust skills are fairly weak	2020-09-05 05:46:25 +00:00
Age Manning	fb9d828e5e	Extended Gossipsub metrics (#1577 ) ## Issue Addressed N/A ## Proposed Changes Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topics, subscriptions and peer scores. ## Additional Info	2020-09-01 06:59:14 +00:00
Pawan Dhananjay	adea7992f8	Eth1 network exit on wrong network id (#1563 ) ## Issue Addressed Fixes #1509 ## Proposed Changes Exit the beacon node if the eth1 endpoint points to an invalid eth1 network. Check the network id before every eth1 cache update and display an error log if the network id has changed to an invalid one.	2020-08-31 02:36:17 +00:00
blacktemplar	c18d37c202	Use Gossipsub 1.1 (#1516 ) ## Issue Addressed #1172 ## Proposed Changes * updates the libp2p dependency * small adaptations based on changes in libp2p * report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-30 13:06:50 +00:00
Paul Hauner	967700c1ff	Bump version to v0.2.8 (#1572 ) ## Issue Addressed NA ## Proposed Changes - Bump versions - Run `cargo update` ## Additional Info NA	2020-08-27 07:04:12 +00:00
Adam Szkoda	d9f4819fe0	Alternative (to BeaconChainHarness) BeaconChain testing API (#1380 ) The PR: * Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots): ![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png) * New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots. * Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in an epoch-by-epoch manner. * Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework * Rewrites existing fork pruning tests to use the new API * Adds a test that triggers finalization at a non epoch boundary slot * Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing * Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-26 09:24:55 +00:00
Michael Sproul	4763f03dcc	Fix bug in database pruning (#1564 ) ## Issue Addressed Closes #1488 ## Proposed Changes * Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain. * Add `debug` logging to the pruning algorithm so we have a better chance of debugging future issues from logs. * Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell). * Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization` * Delete useless and unused `max_finality_distance` * Add tests that exercise pruning with shared states at skip slots * Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that) * Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it? ## Additional Info Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉	2020-08-26 00:01:06 +00:00
Paul Hauner	dfd02d6179	Bump to v0.2.7 (#1561 ) ## Issue Addressed NA ## Proposed Changes - Update to v0.2.7 - Add script to make update easy. ## Additional Info NA	2020-08-24 08:25:34 +00:00
Paul Hauner	3569506acd	Remove rayon from rest_api (#1562 ) ## Issue Addressed NA ## Proposed Changes Addresses a deadlock condition described here: https://hackmd.io/ijQlqOdqSGaWmIo6zMVV-A?view ## Additional Info NA	2020-08-24 07:28:54 +00:00
Paul Hauner	c895dc8971	Shift HTTP server heavy-lifting to blocking executor (#1518 ) ## Issue Addressed NA ## Proposed Changes Shift practically all HTTP endpoint handlers to the blocking executor (some very light tasks are left on the core executor). ## Additional Info This PR covers the `rest_api` which will soon be refactored to suit the standard API. As such, I've cut a few corners and left some existing issues open in this patch. What I have done here should leave the API in a state that is not necessarily exactly the same, but good enough for us to run validators with. Specifically, the number of blocking workers that can be spawned is unbounded and I have not implemented a queue; this will need to be fixed when we implement the standard API.	2020-08-24 03:06:10 +00:00
blacktemplar	2bc9115a94	reuse beacon_node methods for initializing network configs in boot_node (#1520 ) ## Issue Addressed #1378 ## Proposed Changes Boot node reuses code from beacon_node to initialize network config. This also enables using the network directory to store/load the enr and the private key. ## Additional Info Note that before this PR the port cli arguments were off (the argument was named `enr-port` but used as `boot-node-enr-port`). Therefore the cli port argument was always used as the port (for both enr and listening). Now the enr-port argument can be used to overwrite the listening port as the public port others should connect to. Last but not least, note that this restructuring reuses `ethlibp2p::NetworkConfig`, which has many more options than the ones used in the boot node. For example, the network config has its own `discv5_config` field that is never used in the boot node; instead, another `Discv5Config` gets created later in the boot node process. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-21 12:00:01 +00:00
blacktemplar	3f0a113c7f	ban IP addresses if too many banned peers for this IP address (#1543 ) ## Issue Addressed #1283 ## Proposed Changes All peers with the same IP will be considered banned as long as there are more than 5 (constant) peers with this IP that have a score below the ban threshold. As soon as some of those 5 peers get unbanned (through decay) and there are then fewer than 5 peers with a score below the threshold, the IP will no longer be considered banned.	2020-08-21 01:41:12 +00:00
Paul Hauner	ebb25b5569	Bump version to v0.2.6 (#1549 ) ## Issue Addressed NA ## Proposed Changes See title. ## Additional Info NA	2020-08-19 09:31:01 +00:00
Pawan Dhananjay	bbed42f30c	Refactor attestation service (#1415 ) ## Issue Addressed N/A ## Proposed Changes Refactor attestation service to send out requests to find peers for subnets as soon as we get attestation duties. Earlier, we had much more involved logic to send the discovery requests to the discovery service only 6 slots before the attestation slot. Now that discovery is much smarter with grouped queries, the complexity in attestation service can be reduced considerably. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-08-19 08:46:25 +00:00
divma	fdc6e2aa8e	Shutdown like a Sir (#1545 ) ## Issue Addressed #1494 ## Proposed Changes - Give the TaskExecutor the sender side of a channel that a task can clone to request shutting down - The receiver side of this channel is in environment and now we block until ctrl+c or an internal shutdown signal is received - The swarm now informs when it has reached 0 listeners - The network receives this message and requests the shutdown	2020-08-19 05:51:14 +00:00
Paul Hauner	8e7dd7b2b1	Add remaining network ops to queuing system (#1546 ) ## Issue Addressed NA ## Proposed Changes - Refactors the `BeaconProcessor` to remove some excessive nesting and file bloat - Sorry about the noise from this, it's all contained in 4d3f8c5 though. - Adds exits, proposer slashings, attester slashings to the `BeaconProcessor` so we don't get overwhelmed with large amounts of slashings (which happened a few hours ago). ## Additional Info NA	2020-08-19 05:09:53 +00:00
Age Manning	33b2a3d0e0	Version bump to v0.2.5 (#1540 ) ## Description Version bumps lighthouse to v0.2.5	2020-08-18 11:23:08 +00:00
Paul Hauner	93b7c3b7ff	Set default max skips to 700 (#1542 ) ## Issue Addressed NA ## Proposed Changes Sets the default max skips to 700 so that it can cover the 693 slot skip from `80894 - 80201`. ## Additional Info NA	2020-08-18 09:27:04 +00:00
Age Manning	2d0b214b57	Clean up logs (#1541 ) ## Description This PR improves some logging for the end-user. It downgrades some warning logs and removes the slots per second sync speed if we are syncing and the speed is 0. This is likely because we are syncing from a finalised checkpoint and the head doesn't change.	2020-08-18 08:11:39 +00:00
Paul Hauner	d4f763bbae	Fix mistake with attestation skip slots (#1539 ) ## Issue Addressed NA ## Proposed Changes - Fixes a mistake I made in #1530 which resulted in us not rejecting attestations that we intended to reject. - Adds skip-slot checks for blocks earlier in the import process, so that it rejects gossip and RPC blocks. ## Additional Info NA	2020-08-18 06:28:26 +00:00
Age Manning	e1e5002d3c	Fingerprint Lodestar (#1536 ) Fingerprints the Lodestar client	2020-08-18 06:28:24 +00:00
Age Manning	8311074d68	Purge out-dated head chains on chain completion (#1538 ) ## Description There can be many head chains queued up to complete. Currently we try and process all of these to completion before we consider the node synced. In a chaotic network, there can be many of these and processing them to completion can be very expensive and slow. This PR removes any non-syncing head chains from the queue, and re-statuses the peers. If, after we have synced to head on one chain, there is still a valid head chain to download, it will be re-established once the status has been returned. This should assist with getting nodes to sync on medalla faster.	2020-08-18 05:22:34 +00:00
Age Manning	3bb30754d9	Keep track of failed head chains and prevent re-lookups (#1534 ) ## Overview There are forked chains which get referenced by blocks and attestations on a network. Typically if these chains are very long, we stop looking up the chain and downvote the peer. In extreme circumstances, where many peers are on many chains, the chains can be very deep and lookups become time-consuming. This PR adds a cache of known failed chain lookups. This prevents us from starting a parent-lookup (or stops one half way through) if we have attempted the chain lookup in the past.	2020-08-18 03:54:09 +00:00
Age Manning	cc44a64d15	Limit parallelism of head chain sync (#1527 ) ## Description Currently Lighthouse load-balances a single finalized chain across peers. The chain is selected via the most peers. Once synced to the latest finalized epoch, Lighthouse creates chains amongst its peers and syncs them all in parallel amongst each peer (grouped by their current head block). This is typically fast and relatively efficient under normal operations. However, if the chain has not finalized in a long time, the head chains can grow quite long. Peers' head chains will update every slot as new blocks are added to the head. Syncing all head chains in parallel is a bottleneck and highly inefficient: the resulting block duplication leads to RPC timeouts when attempting to handle all new head chains at once. This PR limits the parallelism of head syncing chains to 2. We now sync at most two head chains at a time. This allows sync to keep progressing even when a slow peer holds up one chain via RPC timeouts.	2020-08-18 02:49:24 +00:00
divma	46dbf027af	Do not reset batch ids & redownload out of range batches (#1528 ) The changes are somewhat simple but should solve two issues: - When quickly changing between chains once and a second time back again, batchIds would collide and cause havoc. - If we got an out of range response from a peer, sync would remain in syncing but without advancing Changes: - remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH - for the same reason, move the "to_be_downloaded_id" to be an epoch - remove a sneaky line that dropped an out of range batch without downloading it - bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed	2020-08-18 01:29:51 +00:00
Paul Hauner	9a97a0b14f	Prepare for v0.2.4 (#1533 ) ## Issue Addressed NA ## Proposed Changes NA ## Additional Info NA	2020-08-17 12:13:42 +00:00
Michael Sproul	719a69aee0	Ignore blocks that skip a large distance from their parent (#1530 ) ## Proposed Changes To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-08-17 10:54:58 +00:00
Paul Hauner	a58aa6ee55	Revert back to discv5 alpha 8 to maintain ARM support (#1531 ) ## Issue Addressed NA ## Proposed Changes See title. ## Additional Info NA	2020-08-17 10:06:08 +00:00
Paul Hauner	f85485884f	Process gossip blocks on the GossipProcessor (#1523 ) ## Issue Addressed NA ## Proposed Changes Moves beacon block processing over to the newly-added `GossipProcessor`. This moves the task off the core executor onto the blocking one. ## Additional Info - With this PR, gossip blocks are being ignored during sync.	2020-08-17 09:20:27 +00:00
Paul Hauner	61d5b592cb	Memory usage reduction (#1522 ) ## Issue Addressed NA ## Proposed Changes - Adds a new function to allow getting a state with a bad state root history for attestation verification. This reduces unnecessary tree hashing during attestation processing, which accounted for 23% of memory allocations (by bytes) in a recent `heaptrack` observation. - Don't clone caches on intermediate epoch-boundary states during block processing. - Reject blocks that are known to fork choice earlier during gossip processing, instead of waiting until after state has been loaded (this only happens in an edge case). - Avoid multiple re-allocations by creating a "forced" exact size iterator. ## Additional Info NA	2020-08-17 08:05:13 +00:00
Age Manning	3c689a6837	Remove yamux support (#1526 ) ## Issue Addressed There is currently an issue with yamux when connecting to prysm peers. The source of the issue is currently unknown. This PR removes yamux support to force mplex negotiation. We can add back yamux support once we have isolated and corrected the issue.	2020-08-17 05:05:06 +00:00
Age Manning	afdc4fea1d	Correct logic for peer sync identification (#1525 ) Fix a small sync bug which can mis-classify newly connected peers.	2020-08-17 03:00:10 +00:00
Pawan Dhananjay	850a2d5985	Persist metadata and enr across restarts (#1513 ) ## Issue Addressed Resolves #1489 ## Proposed Changes - Change starting metadata seq num to 0 according to the [spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#metadata). - Remove metadata field from `NetworkGlobals` - Persist metadata to disk on every update - Load metadata seq number from disk on restart - Persist enr to disk on update to ensure enr sequence number increments are persisted as well. ## Additional info Since we modified starting metadata seq num to 0 from 1, we might still see `Invalid Sequence number provided` like in #1489 from prysm nodes if they have our metadata cached.	2020-08-17 02:13:28 +00:00
divma	113b40f321	Add multiaddr support in bootnodes (#1481 ) ## Issue Addressed #1384 The only catch: as currently implemented, when dialing the multiaddr nodes there is no way to ask the peer manager if they are already connected or dialing.	2020-08-17 02:13:26 +00:00
Age Manning	99acfb50f2	Update gossipsub duplicate cache (#1524 ) This potentially handles memory leak issues by preventing adding references to already seen gossipsub messages.	2020-08-17 01:27:33 +00:00
Age Manning	c75c06cf16	Update discv5 to alpha.9 (#1517 ) ## Discovery v5 update In this update we remove the openssl dependency in favour of rust-crypto. The update also removes a series of unnecessary async functions which may improve some of the issues we have been experiencing.	2020-08-15 04:02:14 +00:00
Paul Hauner	f4a7311008	Update to v0.2.3 (#1519 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v0.2.3. ## Additional Info NA	2020-08-14 08:32:31 +00:00
Paul Hauner	619ad106cf	Restrict fork choice getters to finalized blocks (#1475 ) ## Issue Addressed - Resolves #1451 ## Proposed Changes - Restricts the `contains_block` and `contains_block` so they only indicate a block is present if it descends from the finalized root. This helps to ensure that fork choice never points to a block that has been pruned from the database. - Resolves #1451 - Before importing a block, double-check that its parent is known and a descendant of the finalized root. - Split a big, monolithic block verification test into smaller tests. ## Additional Notes I suspect there would be a craftier way to do the `is_descendant_of_finalized` check, but we're a bit tight on time now and we can optimize later if it starts showing in benches. ## TODO - [x] Tests	2020-08-14 06:36:38 +00:00
Paul Hauner	b0a3731fff	Introduce a queue for attestations from the network (#1511 ) ## Issue Addressed N/A ## Proposed Changes Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`. Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load. ## Additional Info TBC	2020-08-14 04:38:45 +00:00
Adam Szkoda	05a8399769	Wind down the SSE thread when the client disconnects (#1514 ) These started to appear when I `^C` `curl -N http://localhost:5052/beacon/fork/stream`: `Aug 12 13:00:01.539 ERRO Couldn't stream piece hyper::Error(ChannelClosed), service: http` Something must have changed in hyper since SSE has been implemented because I'm sure I haven't seen those errors before. This PR properly detects a closed SSE stream and cleans up.	2020-08-13 06:12:18 +00:00
Adam Szkoda	8a1a4051cf	Fix a bug in fork pruning (#1507 ) Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-12 07:00:00 +00:00
Paul Hauner	b063df5bf9	Cross-compile to vendored x86_64, aarch64 (Raspberry Pi 4) (#1497 ) ## Issue Addressed NA ## Proposed Changes Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images. Provides quite clean and simple cross-compiles because all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group though. ## Details - Adds shortcut commands to `Makefile` - Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared). - Switches to commit `284f705964` of blst, which has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21). - Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`. - Off-topic, but adds an extra message about Windows support as suggested by a Discord user. ## Additional Info - ~~Blocked on #1495~~ - There are no tests in CI for this yet for a few reasons: - I'm hesitant to add more long-running tasks. - Short-term bitrot should be avoided since we'll use it each release. - In the long term I think it would be good to automate binary creation on a release. - I observed the binaries increase in size from 50 MB to 52 MB after these changes.	2020-08-11 05:16:30 +00:00
divma	1a67d15701	Mitigate too many outgoing connections (#1469 ) Limit simultaneous outgoing connection attempts to a reasonable maximum as an extra layer of protection. Also shift the keep-alive logic of the RPC handler to avoid needing to update it by hand. I think in rare cases this could make shutting down a connection a bit faster.	2020-08-11 02:16:31 +00:00
realbigsean	ec84183e05	Add graffiti cli flag to the validator client. (#1425 ) ## Issue Addressed #1419 ## Proposed Changes Creates a `--graffiti` cli flag in the validator client. If the flag is set, it overrides graffiti in the beacon node. ## Additional Info	2020-08-11 02:16:29 +00:00
divma	95b55d7170	Block error display (#1503 ) ## Issue Addressed #1486	2020-08-11 01:30:26 +00:00


1810 Commits