lighthouse

Author	SHA1	Message	Date
Paul Hauner	9656ffee7c	Metrics for sync aggregate fullness (#2439 ) ## Issue Addressed NA ## Proposed Changes Adds a metric to see how many set bits are in the sync aggregate for each beacon block being imported. ## Additional Info NA	2021-07-13 02:22:55 +00:00
Paul Hauner	27aec1962c	Add more detail to "Prior attestation known" log (#2447 ) ## Issue Addressed NA ## Proposed Changes Adds more detail to the log when an attestation is ignored due to a prior one being known. This will help identify which validators are causing the issue. ## Additional Info NA	2021-07-13 01:02:03 +00:00
Paul Hauner	a7b7134abb	Return more detail when invalid data is found in the DB during startup (#2445 ) ## Issue Addressed - Resolves #2444 ## Proposed Changes Adds some more detail to the error message returned when the `BeaconChainBuilder` is unable to access or decode block/state objects during startup. ## Additional Info NA	2021-07-12 07:31:27 +00:00
Michael Sproul	371c216ac3	Use read_recursive locks in database (#2417 ) ## Issue Addressed Closes #2245 ## Proposed Changes Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`. ## Additional Info * Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly which cannot compile Lighthouse (one of our deps uses `ptr::addr_of!` which is too new). A fun side-project at some point might be to update the deadlock detector. * The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us. * I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp * [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation.	2021-07-12 07:31:26 +00:00
Mac L	b3c7e59a5b	Adjust beacon node timeouts for validator client HTTP requests (#2352 ) ## Issue Addressed Resolves #2313 ## Proposed Changes Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct. This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration. Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available. Add a CLI flag called `use-long-timeouts` to revert to the old behavior. ## Additional Info Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-07-12 01:47:48 +00:00
Michael Sproul	b4689e20c6	Altair consensus changes and refactors (#2279 ) ## Proposed Changes Implement the consensus changes necessary for the upcoming Altair hard fork. ## Additional Info This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-07-09 06:15:32 +00:00
Paul Hauner	78e5c0c157	Capture a missed VC error (#2436 ) ## Issue Addressed Related to #2430, #2394 ## Proposed Changes As per https://github.com/sigp/lighthouse/issues/2430#issuecomment-875323615, ensure that the `ProductionValidatorClient::new` error raises a log and shuts down the VC. Also, I implemened `spawn_ignoring_error`, as per @michaelsproul's suggestion in https://github.com/sigp/lighthouse/pull/2436#issuecomment-876084419. I got unlucky and CI picked up a [new rustsec vuln](https://rustsec.org/advisories/RUSTSEC-2021-0072). To fix this, I had to update the following crates: - `tokio` - `web3` - `tokio-compat-02` ## Additional Info NA	2021-07-09 03:20:24 +00:00
Mac L	406e3921d9	Use forwards iterator for state root lookups (#2422 ) ## Issue Addressed #2377 ## Proposed Changes Implement the same code used for block root lookups (from #2376) to state root lookups in order to improve performance and reduce associated memory spikes (e.g. from certain HTTP API requests). ## Additional Changes - Tests using `rev_iter_state_roots` and `rev_iter_block_roots` have been refactored to use their `forwards` versions instead. - The `rev_iter_state_roots` and `rev_iter_block_roots` functions are now unused and have been removed. - The `state_at_slot` function has been changed to use the `forwards` iterator. ## Additional Info - Some tests still need to be refactored to use their `forwards_iter` versions. These tests start their iteration from a specific beacon state and thus use the `rev_iter_state_roots_from` and `rev_iter_block_roots_from` functions. If they can be refactored, those functions can also be removed.	2021-07-06 02:38:53 +00:00
Age Manning	73d002ef92	Update outdated dependencies (#2425 ) This updates some older dependencies to address a few cargo audit warnings. The majority of warnings come from network dependencies which will be addressed in #2389. This PR contains some minor dep updates that are not network related. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-07-05 00:54:17 +00:00
realbigsean	b84ff9f793	rust 1.53.0 updates (#2411 ) ## Issue Addressed `make lint` failing on rust 1.53.0. ## Proposed Changes 1.53.0 updates ## Additional Info I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-06-18 05:58:01 +00:00
Michael Sproul	3dc1eb5eb6	Ignore inactive validators in validator monitor (#2396 ) ## Proposed Changes A user on Discord (`@ChewsMacRibs`) reported that the validator monitor was logging `WARN Attested to an incorrect head` for their validator while it was awaiting activation. This PR modifies the monitor so that it ignores inactive validators, by the logic that they are either awaiting activation, or have already exited. Either way, there's no way for an inactive validator to have their attestations included on chain, so no need for the monitor to report on them. ## Additional Info To reproduce the bug requires registering validator keys manually with `--validator-monitor-pubkeys`. I don't think the bug will present itself with `--validator-monitor-auto`.	2021-06-17 02:10:48 +00:00
Jack	98ab00cc52	Handle Geth pre-EIP-155 block sync error condition (#2304 ) ## Issue Addressed #2293 ## Proposed Changes - Modify the handler for the `eth_chainId` RPC (i.e., `get_chain_id`) to explicitly match against the Geth error string returned for pre-EIP-155 synced Geth nodes - ~~Add a new helper function, `rpc_error_msg`, to aid in the above point~~ - Refactor `response_result` into `response_result_or_error` and patch reliant RPC handlers accordingly (thanks to @pawanjay176) ## Additional Info Geth, as of Pangaea Expanse (v1.10.0), returns an explicit error when it is not synced past the EIP-155 block (2675000). Previously, Geth simply returned a chain ID of 0 (which was obviously much easier to handle on Lighthouse's part). Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-06-17 02:10:47 +00:00
realbigsean	b1657a60e9	Reorg events (#2090 ) ## Issue Addressed Resolves #2088 ## Proposed Changes Add the `chain_reorg` SSE event topic ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-06-17 02:10:46 +00:00
divma	3261eff0bf	split outbound and inbound codecs encoded types (#2410 ) Splits the inbound and outbound requests, for maintainability.	2021-06-17 00:40:16 +00:00
Paul Hauner	3b600acdc5	v1.4.0 (#2402 ) ## Issue Addressed NA ## Proposed Changes - Bump versions and update `Cargo.lock` ## Additional Info NA ## TODO - [x] Ensure #2398 gets merged succesfully	2021-06-10 01:44:49 +00:00
Paul Hauner	93100f221f	Make less logs for attn with unknown head (#2395 ) ## Issue Addressed NA ## Proposed Changes I am starting to see a lot of slog-async overflows (i.e., too many logs) on Prater whenever we see attestations for an unknown block. Since these logs are identical (except for peer id) and we expose volume/count of these errors via `metrics::GOSSIP_ATTESTATION_ERRORS_PER_TYPE`, I took the following actions to remove them from `DEBUG` logs: - Push the "Attestation for unknown block" log to trace. - Add a debug log in `search_for_block`. In effect, this should serve as a de-duped version of the previous, downgraded log. ## Additional Info TBC	2021-06-07 02:34:09 +00:00
Pawan Dhananjay	502402c6b9	Fix options for `--eth1-endpoints` flag (#2392 ) ## Issue Addressed N/A ## Proposed Changes Set `config.sync_eth1_chain` to true when using just the `--eth1-endpoints` flag (without `--eth1`).	2021-06-04 00:10:59 +00:00
Paul Hauner	f6280aa663	v1.4.0-rc.0 (#2379 ) ## Issue Addressed NA ## Proposed Changes Bump versions. ## Additional Info This is not exactly the v1.4.0 release described in [Lighthouse Update #36](https://lighthouse.sigmaprime.io/update-36.html). Whilst it contains: - Beta Windows support - A reduction in Eth1 queries - A reduction in memory footprint It does not contain: - Altair - Doppelganger Protection - The remote signer We have decided to release some features early. This is primarily due to the desire to allow users to benefit from the memory saving improvements as soon as possible. ## TODO - [x] Wait for #2340, #2356 and #2376 to merge and then rebase on `unstable`. - [x] Ensure discovery issues are fixed (see #2388) - [x] Ensure https://github.com/sigp/lighthouse/pull/2382 is merged/removed. - [x] Ensure https://github.com/sigp/lighthouse/pull/2383 is merged/removed. - [x] Ensure https://github.com/sigp/lighthouse/pull/2384 is merged/removed. - [ ] Double-check eth1 cache is carried between boots	2021-06-03 00:13:02 +00:00
Paul Hauner	90ea075c62	Revert "Network protocol upgrades (#2345 )" (#2388 ) ## Issue Addressed NA ## Proposed Changes Reverts #2345 in the interests of getting v1.4.0 out this week. Once we have released that, we can go back to testing this again. ## Additional Info NA	2021-06-02 01:07:28 +00:00
Paul Hauner	d34f922c1d	Add early check for RPC block relevancy (#2289 ) ## Issue Addressed NA ## Proposed Changes When observing `jemallocator` heap profiles and Grafana, it became clear that Lighthouse is spending significant RAM/CPU on processing blocks from the RPC. On investigation, it seems that we are loading the parent of the block before we check to see if the block is already known. This is a big waste of resources. This PR adds an additional `check_block_relevancy` call as the first thing we do when we try to process a `SignedBeaconBlock` via the RPC (or other similar methods). Ultimately, `check_block_relevancy` will be called again later in the block processing flow. It's a very light function and I don't think trying to optimize it out is worth the risk of a bad block slipping through. Also adds a `New RPC block received` info log when we process a new RPC block. This seems like interesting and infrequent info. ## Additional Info NA	2021-06-02 01:07:27 +00:00
Paul Hauner	bf4e02e2cc	Return a specific error for frozen attn states (#2384 ) ## Issue Addressed NA ## Proposed Changes Return a very specific error when at attestation reads shuffling from a frozen `BeaconState`. Previously, this was returning `MissingBeaconState` which indicates a much more serious issue. ## Additional Info Since `get_inconsistent_state_for_attestation_verification_only` is only called once in `BeaconChain::with_committee_cache`, it is quite easy to reason about the impact of this change.	2021-06-01 06:59:43 +00:00
Paul Hauner	ba9c4c5eea	Return more detail in Eth1 HTTP errors (#2383 ) ## Issue Addressed NA ## Proposed Changes Whilst investigating #2372, I [learned](https://github.com/sigp/lighthouse/issues/2372#issuecomment-851725049) that the error message returned from some failed Eth1 requests are always `NotReachable`. This makes debugging quite painful. This PR adds more detail to these errors. For example: - Bad infura key: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK: 401 Unauthorized.\"))", retry_millis: 60000, service: eth1_rpc` - Unreachable server: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Request failed: reqwest::Error { kind: Request, url: Url { scheme: \\\"http\\\", cannot_be_a_base: false, username: \\\"\\\", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8545), path: \\\"/\\\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\\\"tcp connect error\\\", Os { code: 111, kind: ConnectionRefused, message: \\\"Connection refused\\\" })) }\"))", retry_millis: 60000, service: eth1_rpc` - Bad server: `ERRO Failed to update eth1 cache error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK: 501 Not Implemented.\"))", retry_millis: 60000, service: eth1_rpc` ## Additional Info NA	2021-06-01 06:59:41 +00:00
Paul Hauner	4c7bb4984c	Use the forwards iterator more often (#2376 ) ## Issue Addressed NA ## Primary Change When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) which seemed to be ever-lasting. After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched the `BeaconChain::block_root_at_slot` to use the forwards iterator, instead of the reverse one. I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function. Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before it felt like to took 1-2s, now it feels instant). ## Additional Changes In the process I also noticed that we have two functions for getting block roots: - `BeaconChain::block_root_at_slot`: returns `None` for a skip slot. - `BeaconChain::root_at_slot`: returns the previous root for a skip slot. I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way. Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it 🗑️. ## Additional Info I haven't also done the same for state roots here. Whilst it's possible and a good idea, it's more work since the fwds iterators are presently block-roots-specific. Whilst there's a few places a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're no where near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here https://github.com/sigp/lighthouse/issues/2377.	2021-05-31 04:18:20 +00:00
Kevin Lu	320a683e72	Minimum Outbound-Only Peers Requirement (#2356 ) ## Issue Addressed #2325 ## Proposed Changes This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. ## Additional Info Unit test for heartbeat function tests that disconnection behavior is correct. Continual querying for peers if outbound-only hasn't been met is not directly tested, but indirectly through unit testing of the helper function that counts the number of outbound-only peers. EDIT: Am concerned about the behavior of ```update_peer_scores```. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. ```rust let previous_state = info.score_state(); // Update scores info.score_update(); Self::handle_score_transitions( previous_state, peer_id, info, &mut to_ban_peers, &mut to_unban_peers, &mut self.events, &self.log, ); ``` ```previous_state``` will be set to Disconnected, and then because ```handle_score_transitions``` only changes connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure realistically how often this scenario would occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. Co-authored-by: Kevin Lu <kevlu93@gmail.com>	2021-05-31 04:18:19 +00:00
Mac L	0847986936	Reduce outbound requests to eth1 endpoints (#2340 ) ## Issue Addressed #2282 ## Proposed Changes Reduce the outbound requests made to eth1 endpoints by caching the results from `eth_chainId` and `net_version`. Further reduce the overall request count by increasing `auto_update_interval_millis` from `7_000` (7 seconds) to `60_000` (1 minute). This will result in a reduction from ~2000 requests per hour to 360 requests per hour (during normal operation). A reduction of 82%. ## Additional Info If an endpoint fails, its state is dropped from the cache and the `eth_chainId` and `net_version` calls will be made for that endpoint again during the regular update cycle (once per minute) until it is back online. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-05-31 04:18:18 +00:00
Age Manning	ec5cceba50	Correct issue with dialing peers (#2375 ) The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update. This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial. This makes #2374 obsolete.	2021-05-29 07:25:06 +00:00
Age Manning	d12e746b50	Network protocol upgrades (#2345 ) This provides a number of upgrades to gossipsub and discovery. The updates are extensive and this needs thorough testing.	2021-05-28 22:02:10 +00:00
Paul Hauner	456b313665	Tune GNU malloc (#2299 ) ## Issue Addressed NA ## Proposed Changes Modify the configuration of [GNU malloc](https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html) to reduce memory footprint. - Set `M_ARENA_MAX` to 4. - This reduces memory fragmentation at the cost of contention between threads. - Set `M_MMAP_THRESHOLD` to 2mb - This means that any allocation >= 2mb is allocated via an anonymous mmap, instead of on the heap/arena. This reduces memory fragmentation since we don't need to keep growing the heap to find big contiguous slabs of free memory. - ~~Run `malloc_trim` every 60 seconds.~~ - ~~This shaves unused memory from the top of the heap, preventing the heap from constantly growing.~~ - Removed, see: https://github.com/sigp/lighthouse/pull/2299#issuecomment-825322646 Note: this only provides memory savings on the Linux (glibc) platform. ## Additional Info I'm going to close #2288 in favor of this for the following reasons: - I've managed to get the memory footprint smaller here than with jemalloc. - This PR seems to be less of a dramatic change than bringing in the jemalloc dep. - The changes in this PR are strictly runtime changes, so we can create CLI flags which disable them completely. Since this change is wide-reaching and complex, it's nice to have an easy "escape hatch" if there are undesired consequences. ## TODO - [x] Allow configuration via CLI flags - [x] Test on Mac - [x] Test on RasPi. - [x] Determine if GNU malloc is present? - I'm not quite sure how to detect for glibc.. This issue suggests we can't really: https://github.com/rust-lang/rust/issues/33244 - [x] Make a clear argument regarding the affect of this on CPU utilization. - [x] Test with higher `M_ARENA_MAX` values. - [x] Test with longer trim intervals - [x] Add some stats about memory savings - [x] Remove `malloc_trim` calls & code	2021-05-28 05:59:45 +00:00
Pawan Dhananjay	fdaeec631b	Monitoring service api (#2251 ) ## Issue Addressed N/A ## Proposed Changes Adds a client side api for collecting system and process metrics and pushing it to a monitoring service.	2021-05-26 05:58:41 +00:00
Age Manning	55aada006f	More stringent dialing (#2363 ) * More stringent dialing * Cover cached enr dialing	2021-05-26 14:21:44 +10:00
ethDreamer	ba55e140ae	Enable Compatibility with Windows (#2333 ) ## Issue Addressed Windows incompatibility. ## Proposed Changes On windows, lighthouse needs to default to STDIN as tty doesn't exist. Also Windows uses ACLs for file permissions. So to mirror chmod 600, we will remove every entry in a file's ACL and add only a single SID that is an alias for the file owner. Beyond that, there were several changes made to different unit tests because windows has slightly different error messages as well as frustrating nuances around killing a process :/ ## Additional Info Tested on my Windows VM and it appears to work, also compiled & tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running full validator client on windows. Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com> Co-authored-by: Michael Sproul <micsproul@gmail.com>	2021-05-19 23:05:16 +00:00
ethDreamer	cb47388ad7	Updated to comply with new clippy formatting rules (#2336 ) ## Issue Addressed The latest version of Rust has new clippy rules & the codebase isn't up to date with them. ## Proposed Changes Small formatting changes that clippy tells me are functionally equivalent	2021-05-10 00:53:09 +00:00
Mac L	bacc38c3da	Add testing for beacon node and validator client CLI flags (#2311 ) ## Issue Addressed N/A ## Proposed Changes Add unit tests for the various CLI flags associated with the beacon node and validator client. These changes require the addition of two new flags: `dump-config` and `immediate-shutdown`. ## Additional Info Both `dump-config` and `immediate-shutdown` are marked as hidden since they should only be used in testing and other advanced use cases. Note: This requires changing `main.rs` so that the flags can adjust the program behavior as necessary. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2021-05-06 00:36:22 +00:00
Mac L	4cc613d644	Add `SensitiveUrl` to redact user secrets from endpoints (#2326 ) ## Issue Addressed #2276 ## Proposed Changes Add the `SensitiveUrl` struct which wraps `Url` and implements custom `Display` and `Debug` traits to redact user secrets from being logged in eth1 endpoints, beacon node endpoints and metrics. ## Additional Info This also includes a small rewrite of the eth1 crate to make requests using `Url` instead of `&str`. Some error messages have also been changed to remove `Url` data.	2021-05-04 01:59:51 +00:00
ethDreamer	0aa8509525	Filter Disconnected Peers from Discv5 DHT (#2219 ) ## Issue Addressed #2107 ## Proposed Change The peer manager will mark peers as disconnected in the discv5 DHT when they disconnect or dial fails ## Additional Info Rationale for this particular change is explained in my comment on #2107	2021-04-28 04:07:37 +00:00
realbigsean	2c2c443718	404's on API requests for slots that have been skipped or orphaned (#2272 ) ## Issue Addressed Resolves #2186 ## Proposed Changes 404 for any block-related information on a slot that was skipped or orphaned Affected endpoints: - `/eth/v1/beacon/blocks/{block_id}` - `/eth/v1/beacon/blocks/{block_id}/root` - `/eth/v1/beacon/blocks/{block_id}/attestations` - `/eth/v1/beacon/headers/{block_id}` ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-04-25 03:59:59 +00:00
Paul Hauner	3a24ca5f14	v1.3.0 (#2310 ) ## Issue Addressed NA ## Proposed Changes Bump versions. ## Additional Info This is a minor release (not patch) due to the very slight change introduced by #2291.	2021-04-13 22:46:34 +00:00
Michael Sproul	3b901dc5ec	Pack attestations into blocks in parallel (#2307 ) ## Proposed Changes Use two instances of max cover when packing attestations into blocks: one for the previous epoch, and one for the current epoch. This reduces the amount of computation done by roughly half due to the `O(n^2)` running time of max cover (`2 * (n/2)^2 = n^2/2`). This should help alleviate some load on block proposal, particularly on Prater.	2021-04-13 05:27:42 +00:00
Paul Hauner	c1203f5e52	Add specific log and metric for delayed blocks (#2308 ) ## Issue Addressed NA ## Proposed Changes - Adds a specific log and metric for when a block is enshrined as head with a delay that will caused bad attestations - We technically already expose this information, but it's a little tricky to determine during debugging. This makes it nice and explicit. - Fixes a minor reporting bug with the validator monitor where it was expecting agg. attestations too early (at half-slot rather than two-thirds-slot). ## Additional Info NA	2021-04-13 02:16:59 +00:00
Paul Hauner	0df7be1814	Add check for aggregate target (#2306 ) ## Issue Addressed NA ## Proposed Changes - Ensure that the [target consistency check](`b356f52c5c`) is always performed on aggregates. - Add a regression test. ## Additional Info NA	2021-04-13 00:24:39 +00:00
Age Manning	aaa14073ff	Clean up warnings (#2240 ) This is a small PR that cleans up compiler warnings. The most controversial change is removing the `data_dir` field from the `BeaconChainBuilder`. It was removed because it was never read. Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: Herman Junge <hermanjunge@protonmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-04-12 00:57:43 +00:00
Mac L	f6f64cf0f5	Correcting `disable-enr-auto-update` flag definition (#2303 ) ## Issue Addressed N/A ## Proposed Changes Correct the `disable-enr-auto-update` boolean flag so that it no longer requires a value. Previously it would require a value which was never used. ## Additional Info Flag is read here: https://github.com/sigp/lighthouse/blob/unstable/beacon_node/src/config.rs#L585-L587	2021-04-11 23:52:29 +00:00
Paul Hauner	e7e5878953	Avoid BeaconState clone during metrics scrape (#2298 ) ## Issue Addressed Which issue # does this PR address? ## Proposed Changes Avoids cloning the `BeaconState` each time Prometheus scrapes our metrics (generally every 5s 😱). I think the original motivation behind this was "don't hold the lock on the head whilst we do computation on it", however I think is flawed since our computation here is so small that it'll be quicker than the clone. The primary motivation here is to maintain a small memory footprint by holding less in memory (i.e., the cloned `BeaconState`) and to avoid the fragmentation-creep that occurs when cloning the big contiguous slabs of memory in the `BeaconState`. I also collapsed the active/slashed/withdrawn counters into a single loop to increase efficiency. ## Additional Info NA	2021-04-07 01:02:56 +00:00
Pawan Dhananjay	95a362213d	Fix local testnet scripts (#2229 ) ## Issue Addressed Resolves #2094 ## Proposed Changes Fixes scripts for creating local testnets. Adds an option in `lighthouse boot_node` to run with a previously generated enr.	2021-03-30 05:17:58 +00:00
Paul Hauner	9eb1945136	v1.2.2 (#2287 ) ## Issue Addressed NA ## Proposed Changes - Bump versions ## Additional Info NA	2021-03-30 04:07:03 +00:00
Paul Hauner	3d239b85ac	Allow for a clock disparity on the duties endpoints (#2283 ) ## Issue Addressed Resolves #2280 ## Proposed Changes Allows for API consumers to call the proposer/attester duties endpoints [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](`b34a79dc0b/beacon_node/beacon_chain/src/beacon_chain.rs (L99-L102)`) earlier than the current epoch. For additional reasoning, see https://github.com/sigp/lighthouse/issues/2280#issuecomment-805358897. ## Additional Info NA	2021-03-29 23:42:35 +00:00
Paul Hauner	03cefd0065	Expand observed attestations capacity (#2266 ) ## Issue Addressed NA ## Proposed Changes I noticed the following error on one of our nodes: ``` Mar 18 00:03:35 ip-xxxx lighthouse-bn[333503]: Mar 18 00:03:35.103 ERRO Unable to validate aggregate error: ObservedAttestersError(EpochTooLow { epoch: Epoch(23961), lowest_permissible_epoch: Epoch(23962) }), peer_id: 16Uiu2HAm5GL5KzPLhvfg9MBBFSpBqTVGRFSiTg285oezzWcZzwEv ``` The slot during this log was 766,815 (the last slot of the epoch). I believe this is due to an off-by-one error in `observed_attesters` where we were failing to provide enough capacity to store observations from the previous, current and next epochs. See code comments for further reasoning. Here's a link to the spec: https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md#beacon_aggregate_and_proof ## Additional Info NA	2021-03-29 23:42:34 +00:00
Michael Sproul	f9d60f5436	VC: accept unknown fields in chain spec (#2277 ) ## Issue Addressed Closes #2274 ## Proposed Changes * Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard. * Log a debug message if there are extra fields returned to the VC from one of its BNs. This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)	2021-03-26 04:53:57 +00:00
Paul Hauner	b34a79dc0b	v1.2.1 (#2263 ) ## Issue Addressed NA ## Proposed Changes - Bump version. - Add some new ENR for Prater - Afri: https://github.com/eth2-clients/eth2-testnets/pull/42 - Prysm: https://github.com/eth2-clients/eth2-testnets/pull/43 - Apply the fixes from #2181 to the no-eth1-sim to try fix CI issues. ## Additional Info NA	2021-03-18 04:20:46 +00:00
Paul Hauner	015ab7d0a7	Optimize validator duties (#2243 ) ## Issue Addressed Closes #2052 ## Proposed Changes - Refactor the attester/proposer duties endpoints in the BN - Performance improvements - Fixes some potential inconsistencies with the dependent root fields. - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead. - Move the code for the proposer/attester duties endpoints into separate files, for readability. - Refactor the `DutiesService` in the VC - Required to reduce the delay on broadcasting new blocks. - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API. - Separate block/attestation duty tasks so that they don't block each other when one is slow. - In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes. - Unfortunately this has created lots of dust changes. - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont). - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code. - This also fixes a bug with some functions which were failing to include a state root as per [this comment](`072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)`). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root. - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base. ~~This PR reduces the size of the codebase 🎉~~ It used to reduce the size of the code base before I added more comments. ## Observations on Prymont - Proposer duties times down from peaks of 450ms to consistent <1ms. - Current epoch attester duties times down from >1s peaks to a consistent 20-30ms. - Block production down from +600ms to 100-200ms. ## Additional Info - ~~Blocked on #2241~~ - ~~Blocked on #2234~~ ## TODO - [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now. - [x] Address `per_slot_processing` roots. - [x] Investigate slow next epoch times. Not getting added to cache on block processing? - [x] Consider [this](`072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)`) in the scenario of replacing the state roots Co-authored-by: pawan <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-03-17 05:09:57 +00:00

1 2 3 4 5 ...

1580 Commits