lighthouse

Author	SHA1	Message	Date
Pawan Dhananjay	b3ce8d0de9	Fix penalties in sync methods (#3384 ) ## Issue Addressed N/A ## Proposed Changes Uses the `penalize_peer` function added in #3350 in sync methods as well. The existing code in sync methods missed the `ExecutionPayloadError::UnverifiedNonOptimisticCandidate` case.	2022-07-30 00:22:39 +00:00
Paul Hauner	25f0e261cb	Don't return errors when fork choice fails (#3370 ) ## Issue Addressed NA ## Proposed Changes There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network. This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload. ## Reviewer Notes The following changes are included: 1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process. 1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run. - This should have practically no effect, since most callers were still continuing if recomputing the head failed. - The outlier is that the API will return 200 rather than a 500 when fork choice fails. 1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.	2022-07-28 13:57:09 +00:00
Pawan Dhananjay	f3439116da	Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365 ) ## Issue Addressed Resolves #3351 ## Proposed Changes Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced. ## Additional Info This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned. cc @divagant-martian please let me know if i'm missing something here.	2022-07-27 03:20:00 +00:00
Justin Traglia	0f62d900fe	Fix some typos (#3376 ) ## Proposed Changes This PR fixes various minor typos in the project.	2022-07-27 00:51:06 +00:00
realbigsean	20ebf1f3c1	Realized unrealized experimentation (#3322 ) ## Issue Addressed Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style. Co-authored-by: realbigsean <sean@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: sean <seananderson33@gmail.com> Co-authored-by: Mac L <mjladson@pm.me>	2022-07-25 23:53:26 +00:00
ethDreamer	7c3ff903ca	Fix Gossip Penalties During Optimistic Sync Window (#3350 ) ## Issue Addressed * #3344 ## Proposed Changes There are a number of cases during block processing where we might get an `ExecutionPayloadError` but we shouldn't penalize peers. We were forgetting to enumerate all of the non-penalizing errors in every single match statement where we are making that decision. I created a function to make it explicit when we should and should not penalize peers and I used that function in all places where this logic is needed. This way we won't make the same mistake if we add another variant of `ExecutionPayloadError` in the future.	2022-07-20 20:59:38 +00:00
Age Manning	2ed51c364d	Improve block-lookup functionality (#3287 ) Improves some of the functionality around single and parent block lookup. Gives extra information about whether failures for lookups are related to processing or downloading. This is entirely untested. Co-authored-by: Diva M <divma@protonmail.com>	2022-07-17 23:26:58 +00:00
Pawan Dhananjay	28b0ff27ff	Ignored sync jobs 2 (#3317 ) ## Issue Addressed Duplicate of #3269. Making this since @divagant-martian opened the previous PR and she can't approve her own PR 😄 Co-authored-by: Diva M <divma@protonmail.com>	2022-07-15 07:31:20 +00:00
Akihito Nakano	1cc8a97d4e	Remove unused method in HandlerNetworkContext (#3299 ) ## Issue Addressed N/A ## Proposed Changes Removed unused method in `HandlerNetworkContext`.	2022-07-04 02:56:14 +00:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
Divma	d40c76e667	Fix clippy lints for rust 1.62 (#3300 ) ## Issue Addressed Fixes some new clippy lints after the last rust release ### Lints fixed for the curious: - [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned) - [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) - [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value) - [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) - [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes) - [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)	2022-06-30 22:51:49 +00:00
Divma	7af5742081	Deprecate step param in BlocksByRange RPC request (#3275 ) ## Issue Addressed Deprecates the step parameter in the blocks by range request ## Proposed Changes - Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before - Adds a new type to still handle coding and decoding of requests that use the parameter ## Additional Info I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward	2022-06-22 16:23:34 +00:00
Pawan Dhananjay	f428719761	Do not penalize peers on execution layer offline errors (#3258 ) ## Issue Addressed Partly resolves https://github.com/sigp/lighthouse/issues/3032 ## Proposed Changes Extracts some of the functionality of #3094 into a separate PR as the original PR requires a bit more work. Do not unnecessarily penalize peers when we fail to validate received execution payloads because our execution layer is offline.	2022-06-19 23:13:40 +00:00
Divma	cfd26d25e0	do not count sync batch attempts when peer is not at fault (#3245 ) ## Issue Addressed currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 ## Proposed Changes If a batch fails but the peer is not at fault, do not count the attempt Also removes some annoying logs ## Additional Info We still get a counter on ignored attempts.. just in case	2022-06-07 02:35:56 +00:00
Divma	493c2c037c	reduce reprocess queue/channel sizes (#3239 ) ## Issue Addressed Reduces the effect of late blocks on overall node buildup ## Proposed Changes change the capacity of the channels used to send work for reprocessing in the beacon processor, and to send back to the main processor task, to be 75% of the capacity of the channel for receiving new events ## Additional Info The issues we've seen suggest we should still evaluate node performance under stress, with late blocks being a big factor. Other changes that could help: 1. right now we have a cap for queued attestations for reprocessing that applies to the sum of aggregated and unaggregated attestations. We could consider adding a separate cap that favors aggregated ones. 2. solving #2848	2022-06-06 23:52:31 +00:00
Michael Sproul	98c8ac1a87	Fix typo in peer state transition log (#3224 ) ## Issue Addressed We were logging `out_finalized_epoch` instead of `our_finalized_epoch`. I noticed this ages ago but only just got around to fixing it. ## Additional Info I also reformatted the log line to respect the line length limit (`rustfmt` won't do it because it gets confused by the `;` in slog's log macros).	2022-05-31 06:09:10 +00:00
Divma	a7896a58cc	move backfill sync jobs from highest priority to lowest (#3215 ) ## Issue Addressed #3212 ## Proposed Changes Move chain segments coming from back-fill syncing from highest priority to lowest ## Additional Info If this does not solve the issue, next steps would be lowering the batch size for back-fill sync, and as last resort throttling the processing of these chain segments	2022-05-26 02:05:17 +00:00
Michael Sproul	6eaeaa542f	Fix Rust 1.61 clippy lints (#3192 ) ## Issue Addressed This fixes the low-hanging Clippy lints introduced in Rust 1.61 (due any hour now). It _ignores_ one lint, because fixing it requires a structural refactor of the validator client that needs to be done delicately. I've started on that refactor and will create another PR that can be reviewed in more depth in the coming days. I think we should merge this PR in the meantime to unblock CI.	2022-05-20 05:02:13 +00:00
Paul Hauner	38050fa460	Allow `TaskExecutor` to be used in `async` tests (#3178 ) # Description Since the `TaskExecutor` currently requires a `Weak<Runtime>`, it's impossible to use it in an async test where the `Runtime` is created outside our scope. Whilst we could create a new `Runtime` instance inside the async test, dropping that `Runtime` would cause a panic (you can't drop a `Runtime` in an async context). To address this issue, this PR creates the `enum Handle`, which supports either: - A `Weak<Runtime>` (for use in our production code) - A `Handle` to a runtime (for use in testing) In theory, there should be no change to the behaviour of our production code (beyond some slightly different descriptions in HTTP 500 errors), or even our tests. If there is no change, you might ask "why bother?". There are two PRs (#3070 and #3175) that are waiting on these fixes to introduce some new tests. Since we've added the EL to the `BeaconChain` (for the merge), we are now doing more async stuff in tests. I've also added a `RuntimeExecutor` to the `BeaconChainTestHarness`. Whilst that's not immediately useful, it will become useful in the near future with all the new async testing.	2022-05-16 08:35:59 +00:00
Michael Sproul	bcdd960ab1	Separate execution payloads in the DB (#3157 ) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. ⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-05-12 00:42:17 +00:00
Divma	7366266bd1	keep failed finalized chains to avoid retries (#3142 ) ## Issue Addressed In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2022-04-13 01:10:55 +00:00
Michael Sproul	bac7c3fa54	v2.2.0 (#3139 ) ## Proposed Changes Cut release v2.2.0 including proposer boost. ## Additional Info I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).	2022-04-05 02:53:09 +00:00
Michael Sproul	4d0122444b	Update and consolidate dependencies (#3136 ) ## Proposed Changes I did some gardening 🌳 in our dependency tree: - Remove duplicate versions of `warp` (git vs patch) - Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`. - Update MDBX (should resolve #3028). I tested and Lighthouse compiles on Windows 11 now. - Restore `psutil` back to upstream - Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7. Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly. ## Additional Info Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see https://github.com/rust-lang/cargo/issues/6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version. I've also opened a PR to `blst` related to zeroize: https://github.com/supranational/blst/pull/111	2022-04-04 00:26:16 +00:00
Michael Sproul	41e7a07c51	Add `lighthouse db` command (#3129 ) ## Proposed Changes Add a `lighthouse db` command with three initial subcommands: - `lighthouse db version`: print the database schema version. - `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version. - `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`. This PR lays the groundwork for other changes, namely: - Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8). - My `tree-states` work, which already implements a downgrade (v10 to v8). - Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824. ## Additional Info I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods. Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).	2022-04-01 00:58:59 +00:00
Divma	788b6af3c4	Remove sync await points (#3036 ) ## Issue Addressed Removes the await points in sync waiting for a processor response for rpc block processing. Built on top of #3029 This also handles a couple of bugs in the previous code and adds a relatively comprehensive test suite.	2022-03-23 01:09:39 +00:00
Paul Hauner	267d8babc8	Prepare proposer (#3043 ) ## Issue Addressed Resolves #2936 ## Proposed Changes Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance. There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information. Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code. ## Additional Info - ~~Blocked on #2837~~ Co-authored-by: realbigsean <seananderson33@GMAIL.com>	2022-03-09 00:42:05 +00:00
tim gretler	cbda0a2f0a	Add log debounce to work processor (#3045 ) ## Issue Addressed #3010 ## Proposed Changes - move log debounce time latch to `./common/logging` - add timelatch to limit logging for `attestations_delay_queue` and `queued_block_roots` ## Additional Info - Is a separate crate for the time latch preferred? - `elapsed()` could take `LOG_DEBOUNCE_INTERVAL ` as an argument to allow for different granularity.	2022-03-07 06:30:17 +00:00
Michael Sproul	1829250ee4	Ignore attestations to finalized blocks (don't reject) (#3052 ) ## Issue Addressed Addresses spec changes from v1.1.0: - https://github.com/ethereum/consensus-specs/pull/2830 - https://github.com/ethereum/consensus-specs/pull/2846 ## Proposed Changes * Downgrade the REJECT for `HeadBlockFinalized` to an IGNORE. This applies to both unaggregated and aggregated attestations. ## Additional Info I thought about also changing the penalty for `UnknownTargetRoot` but I don't think it's reachable in practice.	2022-03-04 00:41:22 +00:00
Divma	4bf1af4e85	Custom RPC request management for sync (#3029 ) ## Proposed Changes Make `lighthouse_network` generic over request ids, now usable by sync	2022-03-02 22:07:17 +00:00
Age Manning	f3c1dde898	Filter non global ips from discovery (#3023 ) ## Issue Addressed #3006 ## Proposed Changes This PR changes the default behaviour of lighthouse to ignore discovered IPs that are not globally routable. It adds a CLI flag, --enable-local-discovery to permit the non-global IPs in discovery. NOTE: We should take care in merging this as I will break current set-ups that rely on local IP discovery. I made this the non-default behaviour because we dont really want to be wasting resources attempting to connect to non-routable addresses and we dont want to propagate these to others (on the chance we can connect to one of these local nodes), improving discoveries efficiency.	2022-03-02 03:14:27 +00:00
Paul Hauner	b6493d5e24	Enforce Optimistic Sync Conditions & CLI Tests (v2) (#3050 ) ## Description This PR adds a single, trivial commit (f5d2b27d78349d5a675a2615eba42cc9ae708094) atop #2986 to resolve a tests compile error. The original author (@ethDreamer) is AFK so I'm getting this one merged ☺️ Please see #2986 for more information about the other, significant changes in this PR. Co-authored-by: Mark Mackey <mark@sigmaprime.io> Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>	2022-03-01 22:56:47 +00:00
Age Manning	a1b730c043	Cleanup small issues (#3027 ) Downgrades some excessive networking logs and corrects some metrics.	2022-03-01 01:49:22 +00:00
Michael Sproul	5e1f8a8480	Update to Rust 1.59 and 2021 edition (#3038 ) ## Proposed Changes Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions. ## Additional Info We need this PR to unblock CI.	2022-02-25 00:10:17 +00:00
Age Manning	3ebb8b0244	Improved peer management (#2993 ) ## Issue Addressed I noticed in some logs some excess and unecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots. This PR corrects this issue and advances the pruning mechanism to favour subnet peers. An overview the new logic added is: - We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count. - We only search for more peers if there is room to do so, and we are below the minimum outbound peer count not the target. So this gives us some buffer for peers to disconnect. The buffer is currently 10% The modified pruning logic is documented in the code but for reference it should do the following: - Prune peers with bad scores first - If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet - If we still need to prune peers, the prune peers that we have a higher density of on any given subnet which should drive for uniform peers across all subnets. This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.	2022-02-18 02:36:43 +00:00
Divma	1306b2db96	libp2p upgrade + gossipsub interval fix (#3012 ) ## Issue Addressed Lighthouse gossiping late messages ## Proposed Changes Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected Upgrade libp2p ## Additional Info https://github.com/libp2p/rust-libp2p/issues/2497	2022-02-10 04:12:03 +00:00
Paul Hauner	fc37d51e10	Add checks to prevent fwding old messages (#2978 ) ## Issue Addressed NA ## Proposed Changes Checks to see if attestations or sync messages are still valid before "accepting" them for propagation. ## Additional Info NA	2022-02-01 01:04:24 +00:00
Paul Hauner	a6da87066b	Add strict penalties const bool (#2976 ) ## Issue Addressed NA ## Proposed Changes Adds `STRICT_LATE_MESSAGE_PENALTIES: bool` which allows for toggling penalties for late sync/attn messages. `STRICT_LATE_MESSAGE_PENALTIES` is set to `false`, since we're seeing a lot of late messages on the network which are causing peer drops. We can toggle the bool during testing to try and figure out what/who is the cause of these late messages. In effect, this PR relaxes peer downscoring for late attns and sync committee messages. ## Additional Info - ~~Blocked on #2974~~	2022-02-01 01:04:22 +00:00
Michael Sproul	99d2c33387	Avoid looking up pre-finalization blocks (#2909 ) ## Issue Addressed This PR fixes the unnecessary `WARN Single block lookup failed` messages described here: https://github.com/sigp/lighthouse/pull/2866#issuecomment-1008442640 ## Proposed Changes Add a new cache to the `BeaconChain` that tracks the block roots of blocks from before finalization. These could be blocks from the canonical chain (which might need to be read from disk), or old pre-finalization blocks that have been forked out. The cache also stores a set of block roots for in-progress single block lookups, which duplicates some of the information from sync's `single_block_lookups` hashmap: `a836e180f9/beacon_node/network/src/sync/manager.rs (L192-L196)` On a live node you can confirm that the cache is working by grepping logs for the message: `Rejected attestation to finalized block`.	2022-01-27 22:58:32 +00:00
Divma	f2b1e096b2	Code quality improvents to the network service (#2932 ) Checking how to priorize the polling of the network I moved most of the service code to functions. This change I think it's worth on it's own for code quality since inside the `tokio::select` many tools don't work (cargo fmt, sometimes clippy, and sometimes even the compiler's errors get wack). This is functionally equivalent to the previous code, just better organized	2022-01-26 23:14:23 +00:00
Divma	9964f5afe5	Document why we hash downloaded blocks for both sync algs (#2927 ) ## Proposed Changes Initially the idea was to remove hashing of blocks in backfill sync. After considering it more, we conclude that we need to do it in both (forward and backfill) anyway. But since we forgot why we were doing it in the first place, this PR documents this logic. Future us should find it useful Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>	2022-01-26 23:14:22 +00:00
Michael Sproul	ceeab02e3a	Lazy hashing for SignedBeaconBlock in sync (#2916 ) ## Proposed Changes Allocate less memory in sync by hashing the `SignedBeaconBlock`s in a batch directly, rather than going via SSZ bytes. Credit to @paulhauner for finding this source of temporary allocations.	2022-01-14 07:20:54 +00:00
Paul Hauner	2ce2ec9b62	Remove penalty for attesting to unknown head (#2903 ) ## Issue Addressed - Resolves https://github.com/sigp/lighthouse/issues/2902 ## Proposed Changes As documented in https://github.com/sigp/lighthouse/issues/2902, there are some cases where we will score peers very harshly for sending attestations to an unknown head. This PR removes the penalty when an attestation for an unknown head is received, queued for block look-up, then popped from the queue without the head block being known. This prevents peers from being penalized for an unknown block when that peer was never actually asked for the block. Peer penalties should still be applied to the peers who do get the request for the block and fail to respond with a valid block. As such, peers who send us attestations to non-existent heads should eventually be booted. ## Additional Info - [ ] Need to confirm that a timeout for a bbroot request will incur a penalty.	2022-01-13 03:08:38 +00:00
Paul Hauner	aaa5344eab	Add peer score adjustment msgs (#2901 ) ## Issue Addressed N/A ## Proposed Changes This PR adds the `msg` field to `Peer score adjusted` log messages. These `msg` fields help identify why a peer was banned. Example: ``` Jan 11 04:18:48.096 DEBG Peer score adjusted score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -27.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -28.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p Jan 11 04:18:48.096 DEBG Peer score adjusted score: -29.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p ``` There is also a `libp2p_report_peer_msgs_total` metrics which allows us to see count of reports per `msg` tag. ## Additional Info NA	2022-01-12 05:32:14 +00:00
Paul Hauner	61f60bdf03	Avoid penalizing peers for delays during processing (#2894 ) ## Issue Addressed NA ## Proposed Changes We have observed occasions were under-resourced nodes will receive messages that were valid at the time, but later become invalidated due to long waits for a `BeaconProcessor` worker. In this PR, we will check to see if the message was valid at the time of receipt. If it was initially valid but invalid now, we just ignore the message without penalizing the peer. ## Additional Info NA	2022-01-12 02:36:24 +00:00
Paul Hauner	4848e53155	Avoid peer penalties on internal errors for batch block import (#2898 ) ## Issue Addressed NA ## Proposed Changes I've observed some Prater nodes (and potentially some mainnet nodes) banning peers due to validator pubkey cache lock timeouts. For the `BeaconChainError`-type of errors, they're caused by internal faults and we can't necessarily tell if the peer is bad or not. I think this is causing us to ban peers unnecessarily when running on under-resourced machines. ## Additional Info NA	2022-01-11 05:33:28 +00:00
Paul Hauner	02e2fd2fb8	Add early attester cache (#2872 ) ## Issue Addressed NA ## Proposed Changes Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database). Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to: 1. Verify that attestation. 1. Respond to RPC requests for the `beacon_block_root`. Attestation verification (1) is partially covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error. Providing the block via RPC (2) is also partially covered, since we'll check the database and the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I think this is fine, since there's no guarantee that we return all blocks for those responses. Another important consideration is whether or not the parent of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that we will always have the parent of the early attester block in the database. This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.	2022-01-11 01:35:55 +00:00
Age Manning	81c667b58e	Additional networking metrics (#2549 ) Adds additional metrics for network monitoring and evaluation. Co-authored-by: Mark Mackey <mark@sigmaprime.io>	2021-12-22 06:17:14 +00:00
Divma	56d596ee42	Unban peers at the swarm level when purged (#2855 ) ## Issue Addressed #2840	2021-12-20 23:45:21 +00:00
eklm	9be3d4ecac	Downgrade AttestationStateIsFinalized error to debug (#2866 ) ## Issue Addressed #2834 ## Proposed Changes Change log message severity from error to debug in attestation verification when attestation state is finalized.	2021-12-17 07:59:46 +00:00
realbigsean	a80ccc3a33	1.57.0 lints (#2850 ) ## Issue Addressed New rust lints ## Proposed Changes - Boxing some enum variants - removing some unused fields (is the validator lockfile unused? seemed so to me) ## Additional Info - some error fields were marked as dead code but are logged out in areas - left some dead fields in our ef test code because I assume they are useful for debugging? Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-12-03 04:44:30 +00:00

1 2 3 4 5 ...

482 Commits