Commit Graph

690 Commits

Author SHA1 Message Date
Pawan Dhananjay
5b5cf9cfaa Log ttd (#3339)
## Issue Addressed

Resolves #3249 

## Proposed Changes

Log merge related parameters and EE status in the beacon notifier before the merge.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-07-20 23:16:54 +00:00
ethDreamer
7c3ff903ca Fix Gossip Penalties During Optimistic Sync Window (#3350)
## Issue Addressed
* #3344 

## Proposed Changes

There are a number of cases during block processing where we might get an `ExecutionPayloadError` but we shouldn't penalize peers. We were forgetting to enumerate all of the non-penalizing errors in every single match statement where we are making that decision. I created a function to make it explicit when we should and should not penalize peers and I used that function in all places where this logic is needed. This way we won't make the same mistake if we add another variant of `ExecutionPayloadError` in the future.
2022-07-20 20:59:38 +00:00
Pawan Dhananjay
e5e4e62758 Don't create a execution payload with same timestamp as terminal block (#3331)
## Issue Addressed

Resolves #3316 

## Proposed Changes

This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.
2022-07-18 23:15:41 +00:00
Pawan Dhananjay
f9b9658711 Add merge support to simulator (#3292)
## Issue Addressed

N/A

## Proposed Changes

Make simulator merge compatible. Adds a `--post_merge` flag to the eth1 simulator that enables a ttd and simulates the merge transition. Uses the `MockServer` in the execution layer test utils to simulate a dummy execution node.

Adds the merge transition simulation to CI.
2022-07-18 23:15:40 +00:00
Paul Hauner
be4e261e74 Use async code when interacting with EL (#3244)
## Overview

This rather extensive PR achieves two primary goals:

1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.

Additionally, it achieves:

- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
    - I had to do this to deal with sending blocks into spawned tasks.
    - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
    - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
    - Avoids cloning *all the blocks* in *every chain segment* during sync.
    - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.

For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273

## Changes to `canonical_head` and `fork_choice`

Previously, the `BeaconChain` had two separate fields:

```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```

Now, we have grouped these values under a single struct:

```
canonical_head: CanonicalHead {
  cached_head: RwLock<Arc<Snapshot>>,
  fork_choice: RwLock<BeaconForkChoice>
} 
```

Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.

## Breaking Changes

### The `state` (root) field in the `finalized_checkpoint` SSE event

Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:

1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.

Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).

I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.

## Notes for Reviewers

I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.

I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".

I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.

I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.

Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.

You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.

I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.

Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
realbigsean
a7da0677d5 Remove builder redundancy (#3294)
## Issue Addressed

This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets

## Proposed Changes

- Removes redundancy in "builders" (servers implementing the builder spec)
- Renames `payload-builder` flag to `builder`
- Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194)



Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-07-01 01:15:19 +00:00
Divma
d40c76e667 Fix clippy lints for rust 1.62 (#3300)
## Issue Addressed

Fixes some new clippy lints after the last rust release
### Lints fixed for the curious:
- [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned)
- [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) 
- [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value)
- [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) 
- [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes)
- [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)
2022-06-30 22:51:49 +00:00
Michael Sproul
53b2b500db Extend block reward APIs (#3290)
## Proposed Changes

Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them.

Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.
2022-06-29 04:50:37 +00:00
Michael Sproul
efebf712dd Avoid cloning snapshots during sync (#3271)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/2944

## Proposed Changes

Remove snapshots from the cache during sync rather than cloning them. This reduces unnecessary cloning and memory fragmentation during sync.

## Additional Info

This PR relies on the fact that the `block_delay` cache is not populated for blocks from sync. Relying on block delay may have the side effect that a change in `block_delay` calculation could lead to: a) more clones, if block delays are added for syncing blocks or b) less clones, if blocks near the head are erroneously provided without a `block_delay`. Case (a) would be a regression to the current status quo, and (b) is low-risk given we know that the snapshot cache is current susceptible to misses (hence `tree-states`).
2022-06-20 23:20:29 +00:00
eklm
a9e158663b Fix validator_monitor_prev_epoch_ metrics (#2911)
## Issue Addressed

#2820

## Proposed Changes

The problem is that validator_monitor_prev_epoch metrics are updated only if there is EpochSummary present in summaries map for the previous epoch and it is not the case for the offline validator. Ensure that EpochSummary is inserted into summaries map also for the offline validators.
2022-06-20 04:06:30 +00:00
Michael Sproul
229f883968 Avoid parallel fork choice runs during sync (#3217)
## Issue Addressed

Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync:

```
May 24 23:06:40.542 WARN Error signalling fork choice waiter     slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon
```

This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync.

Additionally, these parallel fork choice runs were causing issues in the database:

```
May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon
May 24 07:49:05.101 WARN Database migration failed               error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon
```

In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ 😳. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175).

## Proposed Changes

* Don't run per-slot fork choice during sync (if head is older than 4 slots)
* Don't run state-advance fork choice during sync (if head is older than 4 slots)
* Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
2022-05-25 03:27:30 +00:00
Paul Hauner
7a64994283 Call per_slot_task from a blocking thread (v2) (#3199)
*This PR was adapted from @pawanjay176's work in #3197.*

## Issue Addressed

Fixes a regression in https://github.com/sigp/lighthouse/pull/3168

## Proposed Changes

https://github.com/sigp/lighthouse/pull/3168 added calls to `fork_choice` in  `BeaconChain::per_slot_task` function. This leads to a panic as `per_slot_task` is called from an async context which calls fork choice, which then calls `block_on`.

This PR changes the timer to call the `per_slot_task` function in a blocking thread.

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2022-05-20 23:05:07 +00:00
Michael Sproul
6eaeaa542f Fix Rust 1.61 clippy lints (#3192)
## Issue Addressed

This fixes the low-hanging Clippy lints introduced in Rust 1.61 (due any hour now). It _ignores_ one lint, because fixing it requires a structural refactor of the validator client that needs to be done delicately. I've started on that refactor and will create another PR that can be reviewed in more depth in the coming days. I think we should merge this PR in the meantime to unblock CI.
2022-05-20 05:02:13 +00:00
Michael Sproul
8fa032c8ae Run fork choice before block proposal (#3168)
## Issue Addressed

Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878

## Proposed Changes

1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

## Additional Info

### Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

### Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

### Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

### Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

### Implementation

Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
Mac L
def9bc660e Remove DB migrations for legacy database schemas (#3181)
## Proposed Changes

Remove support for DB migrations that support upgrading from schema's below version 5. This is mostly for cosmetic/code quality reasons as in most circumstances upgrading from versions of Lighthouse this old will almost always require a re-sync.

## Additional Info

The minimum supported database schema is now version 5.
2022-05-17 04:54:39 +00:00
Paul Hauner
db8a6f81ea Prevent attestation to future blocks from early attester cache (#3183)
## Issue Addressed

N/A

## Proposed Changes

Prevents the early attester cache from producing attestations to future blocks. This bug could result in a missed head vote if the BN was requested to produce an attestation for an earlier slot than the head block during the (usually) short window of time between verifying a block and setting it as the head.

This bug was noticed in an [Antithesis](https://andreagrieser.com/) test and diagnosed by @realbigsean. 

## Additional Info

NA
2022-05-17 01:51:25 +00:00
Paul Hauner
38050fa460 Allow TaskExecutor to be used in async tests (#3178)
# Description

Since the `TaskExecutor` currently requires a `Weak<Runtime>`, it's impossible to use it in an async test where the `Runtime` is created outside our scope. Whilst we *could* create a new `Runtime` instance inside the async test, dropping that `Runtime` would cause a panic (you can't drop a `Runtime` in an async context).

To address this issue, this PR creates the `enum Handle`, which supports either:

- A `Weak<Runtime>` (for use in our production code)
- A `Handle` to a runtime (for use in testing)

In theory, there should be no change to the behaviour of our production code (beyond some slightly different descriptions in HTTP 500 errors), or even our tests. If there is no change, you might ask *"why bother?"*. There are two PRs (#3070 and #3175) that are waiting on these fixes to introduce some new tests. Since we've added the EL to the `BeaconChain` (for the merge), we are now doing more async stuff in tests.

I've also added a `RuntimeExecutor` to the `BeaconChainTestHarness`. Whilst that's not immediately useful, it will become useful in the near future with all the new async testing.
2022-05-16 08:35:59 +00:00
Michael Sproul
bcdd960ab1 Separate execution payloads in the DB (#3157)
## Proposed Changes

Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.

⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.

The main changes are:

- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
    - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
    - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
   - I've tested manually that it works on Kiln, using Geth and Nethermind.
   - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
   - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).

## Additional Info

- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-05-12 00:42:17 +00:00
Michael Sproul
ae47a93c42 Don't panic in forkchoiceUpdated handler (#3165)
## Issue Addressed

Fix a panic due to misuse of the Tokio executor when processing a forkchoiceUpdated response. We were previously calling `process_invalid_execution_payload` from the async function `update_execution_engine_forkchoice_async`, which resulted in a panic because `process_invalid_execution_payload` contains a call to fork choice, which ultimately calls `block_on`.

An example backtrace can be found here: https://gist.github.com/michaelsproul/ac5da03e203d6ffac672423eaf52fb20

## Proposed Changes

Wrap the call to `process_invalid_execution_payload` in a `spawn_blocking` so that `block_on` is no longer called from an async context.

## Additional Info

- I've been thinking about how to catch bugs like this with static analysis (a new Clippy lint).
- The payload validation tests have been re-worked to support distinct responses from the mock EE for newPayload and forkchoiceUpdated. Three new tests have been added covering the `Invalid`, `InvalidBlockHash` and `InvalidTerminalBlock` cases.
- I think we need a bunch more tests of different legal and illegal variations
2022-05-04 23:30:34 +00:00
Paul Hauner
b49b4291a3 Disallow attesting to optimistic head (#3140)
## Issue Addressed

NA

## Proposed Changes

Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end.

I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid).

## Additional Info

- ~~Blocked on #3126~~
2022-04-13 03:54:42 +00:00
Paul Hauner
c8edeaff29 Don't log crits for missing EE before Bellatrix (#3150)
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in #3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled.

## Additional Info

NA
2022-04-11 23:14:47 +00:00
ethDreamer
22002a4e68 Transition Block Proposer Preparation (#3088)
## Issue Addressed

- #3058 

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-04-07 14:03:34 +00:00
Paul Hauner
8a40763183 Ensure VALID response from fcU updates protoarray (#3126)
## Issue Addressed

NA

## Proposed Changes

Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`.

I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value.

## Additional Info

NA
2022-04-05 20:58:17 +00:00
Paul Hauner
42cdaf5840 Add tests for importing blocks on invalid parents (#3123)
## Issue Addressed

NA

## Proposed Changes

- Adds more checks to prevent importing blocks atop parent with invalid execution payloads.
- Adds a test for these conditions.

## Additional Info

NA
2022-04-05 20:58:16 +00:00
Michael Sproul
bac7c3fa54 v2.2.0 (#3139)
## Proposed Changes

Cut release v2.2.0 including proposer boost.

## Additional Info

I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).
2022-04-05 02:53:09 +00:00
realbigsean
ea783360d3 Kiln mev boost (#3062)
## Issue Addressed

MEV boost compatibility

## Proposed Changes

See #2987

## Additional Info

This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20).

Additional TODO's and outstanding questions

- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic. 

Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-03-31 07:52:23 +00:00
Michael Sproul
6efd95496b Optionally skip RANDAO verification during block production (#3116)
## Proposed Changes

Allow Lighthouse to speculatively create blocks via the `/eth/v1/validators/blocks` endpoint by optionally skipping the RANDAO verification that we introduced in #2740. When `verify_randao=false` is passed as a query parameter the `randao_reveal` is not required to be present, and if present will only be lightly checked (must be a valid BLS sig). If `verify_randao` is omitted it defaults to true and Lighthouse behaves exactly as it did previously, hence this PR is backwards-compatible.

I'd like to get this change into `unstable` pretty soon as I've got 3 projects building on top of it:

- [`blockdreamer`](https://github.com/michaelsproul/blockdreamer), which mocks block production every slot in order to fingerprint clients
- analysis of Lighthouse's block packing _optimality_, which uses `blockdreamer` to extract interesting instances of the attestation packing problem
- analysis of Lighthouse's block packing _performance_ (as in speed) on the `tree-states` branch

## Additional Info

Having tested `blockdreamer` with Prysm, Nimbus and Teku I noticed that none of them verify the randao signature on `/eth/v1/validator/blocks`. I plan to open a PR to the `beacon-APIs` repo anyway so that this parameter can be standardised in case the other clients add RANDAO verification by default in future.
2022-03-28 07:14:13 +00:00
Lucas Manuel
adca0efc64 feat: Update ASCII art (#3113)
## Issue Addressed

No issue, just updating merge ASCII art.

## Proposed Changes

Updating ASCII art for merge.

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-03-24 00:04:50 +00:00
ethDreamer
af50130e21 Add Proposer Cache Pruning & POS Activated Banner (#3109)
## Issue Addressed

The proposers cache wasn't being pruned. Also didn't have a celebratory banner for the merge 😄

## Banner
![pos_log_panda](https://user-images.githubusercontent.com/37123614/159528545-3aa54cbd-9362-49b1-830c-f4402f6ac341.png)
2022-03-22 21:33:38 +00:00
Paul Hauner
e4fa7d906f Fix post-merge checkpoint sync (#3065)
## Issue Addressed

This address an issue which was preventing checkpoint-sync.

When the node starts from checkpoint sync, the head block and the finalized block are the same value. We did not respect this when sending a `forkchoiceUpdated` (fcU) call to the EL and were expecting fork choice to hold the *finalized ancestor of the head* and returning an error when it didn't.

This PR uses *only fork choice* for sending fcU updates. This is actually quite nice and avoids some atomicity issues between `chain.canonical_head` and `chain.fork_choice`. Now, whenever `chain.fork_choice.get_head` returns a value we also cache the values required for the next fcU call.

## TODO

- [x] ~~Blocked on #3043~~
- [x] Ensure there isn't a warn message at startup.
2022-03-10 06:05:24 +00:00
Paul Hauner
267d8babc8 Prepare proposer (#3043)
## Issue Addressed

Resolves #2936

## Proposed Changes

Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance.

There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information.

Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code.

## Additional Info

- ~~Blocked on #2837~~

Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2022-03-09 00:42:05 +00:00
Pawan Dhananjay
381d0ece3c auth for engine api (#3046)
## Issue Addressed

Resolves #3015 

## Proposed Changes

Add JWT token based authentication to engine api requests. The jwt secret key is read from the provided file and is used to sign tokens that are used for authenticated communication with the EL node.

- [x] Interop with geth (synced `merge-devnet-4` with the `merge-kiln-v2` branch on geth)
- [x] Interop with other EL clients (nethermind on `merge-devnet-4`)
- [x] ~Implement `zeroize` for jwt secrets~
- [x] Add auth server tests with `mock_execution_layer`
- [x] Get auth working with the `execution_engine_integration` tests






Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-03-08 06:46:24 +00:00
Akihito Nakano
4186d117af Replace OpenOptions::new with File::options to be readable (#3059)
## Issue Addressed

Closes #3049 

This PR updates widely but this replace is safe as `File::options()` is equivelent to `OpenOptions::new()`.
ref: https://doc.rust-lang.org/stable/src/std/fs.rs.html#378-380
2022-03-07 06:30:18 +00:00
Michael Sproul
1829250ee4 Ignore attestations to finalized blocks (don't reject) (#3052)
## Issue Addressed

Addresses spec changes from v1.1.0:

- https://github.com/ethereum/consensus-specs/pull/2830
- https://github.com/ethereum/consensus-specs/pull/2846

## Proposed Changes

* Downgrade the REJECT for `HeadBlockFinalized` to an IGNORE. This applies to both unaggregated and aggregated attestations.

## Additional Info

I thought about also changing the penalty for `UnknownTargetRoot` but I don't think it's reachable in practice.
2022-03-04 00:41:22 +00:00
Paul Hauner
aea43b626b Rename random to prev_randao (#3040)
## Issue Addressed

As discussed on last-night's consensus call, the testnets next week will target the [Kiln Spec v2](https://hackmd.io/@n0ble/kiln-spec).

Presently, we support Kiln V1. V2 is backwards compatible, except for renaming `random` to `prev_randao` in:

- https://github.com/ethereum/execution-apis/pull/180
- https://github.com/ethereum/consensus-specs/pull/2835

With this PR we'll no longer be compatible with the existing Kintsugi and Kiln testnets, however we'll be ready for the testnets next week. I raised this breaking change in the call last night, we are all keen to move forward and break things.

We now target the [`merge-kiln-v2`](https://github.com/MariusVanDerWijden/go-ethereum/tree/merge-kiln-v2) branch for interop with Geth. This required adding the `--http.aauthport` to the tester to avoid a port conflict at startup.

### Changes to exec integration tests

There's some change in the `merge-kiln-v2` version of Geth that means it can't compile on a vanilla Github runner. Bumping the `go` version on the runner solved this issue.

Whilst addressing this, I refactored the `testing/execution_integration` crate to be a *binary* rather than a *library* with tests. This means that we don't need to run the `build.rs` and build Geth whenever someone runs `make lint` or `make test-release`. This is nice for everyday users, but it's also nice for CI so that we can have a specific runner for these tests and we don't need to ensure *all* runners support everything required to build all execution clients.

## More Info

- [x] ~~EF tests are failing since the rename has broken some tests that reference the old field name. I have been told there will be new tests released in the coming days (25/02/22 or 26/02/22).~~
2022-03-03 02:10:57 +00:00
Paul Hauner
b6493d5e24 Enforce Optimistic Sync Conditions & CLI Tests (v2) (#3050)
## Description

This PR adds a single, trivial commit (f5d2b27d78349d5a675a2615eba42cc9ae708094) atop #2986 to resolve a tests compile error. The original author (@ethDreamer) is AFK so I'm getting this one merged ☺️ 

Please see #2986 for more information about the other, significant changes in this PR.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
2022-03-01 22:56:47 +00:00
Paul Hauner
27e83b888c Retrospective invalidation of exec. payloads for opt. sync (#2837)
## Issue Addressed

NA

## Proposed Changes

Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means:

- Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates.
- Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct.
- Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs.
- Adding testing framework for optimistic imports.
- Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between *beacon block roots* and *execution payload hashes*.
- Add `merge` to [`FORKS`](c3a793fd73/Makefile (L17)) in the `Makefile` to ensure we test the beacon chain with merge settings.
    - Fix some tests here that were failing due to a missing execution layer.

## TODO

- [ ] Balance tests

Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-02-28 22:07:48 +00:00
Michael Sproul
5e1f8a8480 Update to Rust 1.59 and 2021 edition (#3038)
## Proposed Changes

Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.

## Additional Info

We need this PR to unblock CI.
2022-02-25 00:10:17 +00:00
Paul Hauner
0a6a8ea3b0 Engine API v1.0.0.alpha.6 + interop tests (#3024)
## Issue Addressed

NA

## Proposed Changes

This PR extends #3018 to address my review comments there and add automated integration tests with Geth (and other implementations, in the future).

I've also de-duplicated the "unused port" logic by creating an  `common/unused_port` crate.

## Additional Info

I'm not sure if we want to merge this PR, or update #3018 and merge that. I don't mind, I'm primarily opening this PR to make sure CI works.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-02-17 21:47:06 +00:00
Paul Hauner
2f8531dc60 Update to consensus-specs v1.1.9 (#3016)
## Issue Addressed

Closes #3014

## Proposed Changes

- Rename `receipt_root` to `receipts_root`
- Rename `execute_payload` to `notify_new_payload`
   - This is slightly weird since we modify everything except the actual HTTP call to the engine API. That change is expected to be implemented in #2985 (cc @ethDreamer)
- Enable "random" tests for Bellatrix.

## Notes

This will break *partially* compatibility with Kintusgi testnets in order to gain compatibility with [Kiln](https://hackmd.io/@n0ble/kiln-spec) testnets. I think it will only break the BN APIs due to the `receipts_root` change, however it might have some other effects too.

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2022-02-14 23:57:23 +00:00
Philipp K
5388183884 Allow per validator fee recipient via flag or file in validator client (similar to graffiti / graffiti-file) (#2924)
## Issue Addressed

#2883 

## Proposed Changes

* Added `suggested-fee-recipient` & `suggested-fee-recipient-file` flags to validator client (similar to graffiti / graffiti-file implementation).
* Added proposer preparation service to VC, which sends the fee-recipient of all known validators to the BN via [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api once per slot
* Added [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api endpoint and preparation data caching
* Added cleanup routine to remove cached proposer preparations when not updated for 2 epochs

## Additional Info

Changed the Implementation following the discussion in #2883.



Co-authored-by: pk910 <philipp@pk910.de>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Philipp K <philipp@pk910.de>
2022-02-08 19:52:20 +00:00
Michael Sproul
99d2c33387 Avoid looking up pre-finalization blocks (#2909)
## Issue Addressed

This PR fixes the unnecessary `WARN Single block lookup failed` messages described here:

https://github.com/sigp/lighthouse/pull/2866#issuecomment-1008442640

## Proposed Changes

Add a new cache to the `BeaconChain` that tracks the block roots of blocks from before finalization. These could be blocks from the canonical chain (which might need to be read from disk), or old pre-finalization blocks that have been forked out.

The cache also stores a set of block roots for in-progress single block lookups, which duplicates some of the information from sync's `single_block_lookups` hashmap:

a836e180f9/beacon_node/network/src/sync/manager.rs (L192-L196)

On a live node you can confirm that the cache is working by grepping logs for the message: `Rejected attestation to finalized block`.
2022-01-27 22:58:32 +00:00
Michael Sproul
e70daaa3b6 Implement API for block rewards (#2628)
## Proposed Changes

Add an API endpoint for retrieving detailed information about block rewards.

For information on usage see [the docs](https://github.com/sigp/lighthouse/blob/block-rewards-api/book/src/api-lighthouse.md#lighthouseblock_rewards), and the source.
2022-01-27 01:06:02 +00:00
Rishi Kumar Ray
f0f327af0c Removed all disable_forks (#2925)
#2923 

Which issue # does this PR address?
There's a redundant field on the BeaconChain called disabled_forks that was once part of our fork-aware networking (#953) but which is no longer used and could be deleted. so Removed all references to disabled_forks so that the code compiles and git grep disabled_forks returns no results.

## Proposed Changes

Please list or describe the changes introduced by this PR.
Removed all references of disabled_forks


Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2022-01-20 09:14:26 +00:00
Paul Hauner
c11253a82f Remove grandparents from snapshot cache (#2917)
## Issue Addressed

NA

## Proposed Changes

In https://github.com/sigp/lighthouse/pull/2832 we made some changes to the `SnapshotCache` to help deal with the one-block reorgs seen on mainnet (and testnets).

I believe the change in #2832 is good and we should keep it, but I think that in its present form it is causing the `SnapshotCache` to hold onto states that it doesn't need anymore. For example, a skip slot will result in one more `BeaconSnapshot` being stored in the cache.

This PR adds a new type of pruning that happens after a block is inserted to the cache. We will remove any snapshot from the cache that is a *grandparent* of the block being imported. Since we know the grandparent has two valid blocks built atop it, it is not at risk from a one-block re-org. 

## Additional Info

NA
2022-01-14 07:20:55 +00:00
Paul Hauner
aaa5344eab Add peer score adjustment msgs (#2901)
## Issue Addressed

N/A

## Proposed Changes

This PR adds the `msg` field to `Peer score adjusted` log messages. These `msg` fields help identify *why* a peer was banned.

Example:

```
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -27.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -28.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -29.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
```

There is also a `libp2p_report_peer_msgs_total` metrics which allows us to see count of reports per `msg` tag. 

## Additional Info

NA
2022-01-12 05:32:14 +00:00
Paul Hauner
61f60bdf03 Avoid penalizing peers for delays during processing (#2894)
## Issue Addressed

NA

## Proposed Changes

We have observed occasions were under-resourced nodes will receive messages that were valid *at the time*, but later become invalidated due to long waits for a `BeaconProcessor` worker.

In this PR, we will check to see if the message was valid *at the time of receipt*. If it was initially valid but invalid now, we just ignore the message without penalizing the peer.

## Additional Info

NA
2022-01-12 02:36:24 +00:00
Paul Hauner
02e2fd2fb8 Add early attester cache (#2872)
## Issue Addressed

NA

## Proposed Changes

Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database).

Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to:

1. Verify that attestation.
1. Respond to RPC requests for the `beacon_block_root`.

Attestation verification (1) is *partially* covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error.

Providing the block via RPC (2) is also partially covered, since we'll check the database *and* the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I *think* this is fine, since there's no guarantee that we return all blocks for those responses.

Another important consideration is whether or not the *parent* of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that *we will always have the parent of the early attester block in the database.* This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.
2022-01-11 01:35:55 +00:00
Paul Hauner
f6b5b1a8be Use ? debug formatting for block roots in beacon_chain.rs (#2890)
## Issue Addressed

NA

## Proposed Changes

Ensures full roots are printed, rather than shortened versions like `0x935b…d376`.

For example, it would be nice if we could do API queries based upon the roots shown in the `Beacon chain re-org` event:

```
Jan 05 12:36:52.224 WARN Beacon chain re-org                     reorg_distance: 2, new_slot: 2073184, new_head: 0x8a97…2dec, new_head_parent: 0xa985…7688, previous_slot: 2073183, previous_head: 0x935b…d376, service: beacon
Jan 05 13:35:05.832 WARN Beacon chain re-org                     reorg_distance: 1, new_slot: 2073475, new_head: 0x9207…c6b9, new_head_parent: 0xb2ce…839b, previous_slot: 2073474, previous_head: 0x8066…92f7, service: beacon
```

## Additional Info

We should eventually fix this project-wide, however this is a short-term patch.
2022-01-06 05:16:50 +00:00
Michael Sproul
fac117667b Update to superstruct v0.4.1 (#2886)
## Proposed Changes

Update `superstruct` to bring in @realbigsean's fixes necessary for MEV-compatible private beacon block types (a la #2795).

The refactoring is due to another change in superstruct that allows partial getters to be auto-generated.
2022-01-06 03:14:58 +00:00
Pawan Dhananjay
a0c5701e36 Only import blocks with valid execution payloads (#2869)
## Issue Addressed

N/A

## Proposed Changes

We are currently treating errors from the EL on `engine_executePayload` as `PayloadVerificationStatus::NotVerified`. This adds the block as a candidate head block in fork choice even if the EL explicitly rejected the block as invalid. 

`PayloadVerificationStatus::NotVerified` should be only returned when the EL explicitly returns "syncing" imo. This PR propagates an error instead of returning `NotVerified` on EL all EL errors.
2021-12-22 08:15:37 +00:00
Michael Sproul
a290a3c537 Add configurable block replayer (#2863)
## Issue Addressed

Successor to #2431

## Proposed Changes

* Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing.
* Add a variant of the forwards state root iterator that does not require an `end_state`.
* Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
* Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431).

Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).

## Additional Info

Required by https://github.com/sigp/lighthouse/pull/2628
2021-12-21 06:30:52 +00:00
Michael Sproul
a43d5e161f Optimise balances cache in case of skipped slots (#2849)
## Proposed Changes

Remove the `is_first_block_in_epoch` logic from the balances cache update logic, as it was incorrect in the case of skipped slots. The updated code is simpler because regardless of whether the block is the first in the epoch we can check if an entry for the epoch boundary root already exists in the cache, and update the cache accordingly.

Additionally, to assist with flip-flopping justified epochs, move to cloning the balance cache rather than moving it. This should still be very fast in practice because the balances cache is a ~1.6MB `Vec`, and this operation is expected to only occur infrequently.
2021-12-13 23:35:57 +00:00
realbigsean
b22ac95d7f v1.1.6 Fork Choice changes (#2822)
## Issue Addressed

Resolves: https://github.com/sigp/lighthouse/issues/2741
Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller

## Proposed Changes

- Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration.
- Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db.
- updates related to https://github.com/ethereum/consensus-specs/pull/2727
  -  We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one. 
  - AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that
- I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting.
- This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730

## Additional Info
I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727

It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch.

Todo:

- [x] Fix fork choice tests
- [x] Self review
- [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727
- [x] Rebase onto Kintusgi 
- [x] Fix `num_active_validators` calculation as @michaelsproul pointed out
- [x] Clean up db migrations

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-13 20:43:22 +00:00
Pawan Dhananjay
e391b32858 Merge devnet 3 (#2859)
## Issue Addressed

N/A

## Proposed Changes

Changes required for the `merge-devnet-3`. Added some more non substantive renames on top of @realbigsean 's commit. 
Note: this doesn't include the proposer boosting changes in kintsugi v3.

This devnet isn't running with the proposer boosting fork choice changes so if we are looking to merge https://github.com/sigp/lighthouse/pull/2822 into `unstable`, then I think we should just maintain this branch for the devnet temporarily. 


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-12 09:04:21 +00:00
Mac L
a7a7edb6cf Optimise snapshot cache for late blocks (#2832)
## Proposed Changes

In the event of a late block, keep the block in the snapshot cache by cloning it. This helps us process new blocks quickly in the event the late block was re-org'd.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-12-06 03:41:31 +00:00
realbigsean
b5f2764bae fix cache miss justified balances calculation (#2852)
## Issue Addressed

We were calculating justified balances incorrectly on cache misses in `set_justified_checkpoint`

## Proposed Changes

Use the `get_effective_balances` method as opposed to `state.balances`, which returns exact balances


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-03 16:58:10 +00:00
realbigsean
a80ccc3a33 1.57.0 lints (#2850)
## Issue Addressed

New rust lints

## Proposed Changes

- Boxing some enum variants
- removing some unused fields (is the validator lockfile unused? seemed so to me)

## Additional Info

- some error fields were marked as dead code but are logged out in areas
- left some dead fields in our ef test code because I assume they are useful for debugging?

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-03 04:44:30 +00:00
Paul Hauner
144978f8f8
Remove duplicate slot_clock method (#2842) 2021-12-02 14:29:59 +11:00
Paul Hauner
1b56ebf85e
Kintsugi review comments (#2831)
* Fix makefile

* Return on invalid finalized block

* Fix todo in gossip scoring

* Require --merge for --fee-recipient

* Bump eth2_serde_utils

* Change schema versions

* Swap hash/uint256 test_random impls

* Use default for ExecutionPayload::empty

* Check for DBs before removing

* Remove kintsugi docker image

* Fix CLI default value
2021-12-02 14:29:59 +11:00
ethDreamer
f6748537db
Removed PowBlock struct that never got used (#2813) 2021-12-02 14:29:20 +11:00
Paul Hauner
5f0fef2d1e
Kintsugi on_merge_block tests (#2811)
* Start v1.1.5 updates

* Implement new payload creation logic

* Tidy, add comments

* Remove unused error enums

* Add validate payload for gossip

* Refactor validate_merge_block

* Split payload verification in per block processing

* Add execute_payload

* Tidy

* Tidy

* Start working on new fork choice tests

* Fix failing merge block test

* Skip block_lookup_failed test

* Fix failing terminal block test

* Fixes from self-review

* Address review comments
2021-12-02 14:29:20 +11:00
pawan
44a7b37ce3
Increase network limits (#2796)
Fix max packet sizes

Fix max_payload_size function

Add merge block test

Fix max size calculation; fix up test

Clear comments

Add a payload_size_function

Use safe arith for payload calculation

Return an error if block too big in block production

Separate test to check if block is over limit
2021-12-02 14:29:20 +11:00
Paul Hauner
afe59afacd
Ensure difficulty/hash/epoch overrides change the ChainSpec (#2798)
* Unify loading of eth2_network_config

* Apply overrides at lighthouse binary level

* Remove duplicate override values

* Add merge values to existing net configs

* Make override flags global

* Add merge fields to testing config

* Add one to TTD

* Fix failing engine tests

* Fix test compile error

* Remove TTD flags

* Move get_eth2_network_config

* Fix warn

* Address review comments
2021-12-02 14:29:18 +11:00
Paul Hauner
47db682d7e
Implement engine API v1.0.0-alpha.4 (#2810)
* Added ForkchoiceUpdatedV1 & GetPayloadV1

* Added ExecutePayloadV1

* Added new geth test vectors

* Separated Json Object/Serialization Code into file

* Deleted code/tests for Requests Removed from spec

* Finally fixed serialization of null '0x'

* Made Naming of JSON Structs Consistent

* Fix clippy lints

* Remove u64 payload id

* Remove unused serde impls

* Swap to [u8; 8] for payload id

* Tidy

* Adjust some block gen return vals

* Tidy

* Add fallback when payload id is unknown

* Remove comment

Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2021-12-02 14:26:55 +11:00
Paul Hauner
cbd2201164
Fixes after rebasing Kintsugi onto unstable (#2799)
* Fix fork choice after rebase

* Remove paulhauner warp dep

* Fix fork choice test compile errors

* Assume fork choice payloads are valid

* Add comment

* Ignore new tests

* Fix error in test skipping
2021-12-02 14:26:55 +11:00
realbigsean
de49c7ddaa
1.1.5 merge spec tests (#2781)
* Fix arbitrary check kintsugi

* Add merge chain spec fields, and a function to determine which constant to use based on the state variant

* increment spec test version

* Remove `Transaction` enum wrapper

* Remove Transaction new-type

* Remove gas validations

* Add `--terminal-block-hash-epoch-override` flag

* Increment spec tests version to 1.1.5

* Remove extraneous gossip verification https://github.com/ethereum/consensus-specs/pull/2687

* - Remove unused Error variants
- Require both "terminal-block-hash-epoch-override" and "terminal-block-hash-override" when either flag is used

* - Remove a couple more unused Error variants

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-02 14:26:55 +11:00
realbigsean
d8eec16c5e
v1.1.1 spec updates (#2684)
* update initializing from eth1 for merge genesis

* read execution payload header from file lcli

* add `create-payload-header` command to `lcli`

* fix base fee parsing

* Apply suggestions from code review

* default `execution_payload_header` bool to false when deserializing `meta.yml` in EF tests

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-02 14:26:54 +11:00
Paul Hauner
6dde12f311
[Merge] Optimistic Sync: Stage 1 (#2686)
* Add payload verification status to fork choice

* Pass payload verification status to import_block

* Add valid back-propagation

* Add head safety status latch to API

* Remove ExecutionLayerStatus

* Add execution info to client notifier

* Update notifier logs

* Change use of "hash" to refer to beacon block

* Shutdown on invalid finalized block

* Tidy, add comments

* Fix failing FC tests

* Allow blocks with unsafe head

* Fix forkchoiceUpdate call on startup
2021-12-02 14:26:54 +11:00
Paul Hauner
67a6f91df6
[Merge] Optimistic EL verification (#2683)
* Ignore payload errors

* Only return payload handle on valid response

* Push some engine logs down to debug

* Push ee fork choice log to debug

* Push engine call failure to debug

* Push some more errors to debug

* Fix panic at startup
2021-12-02 14:26:53 +11:00
Paul Hauner
35350dff75
[Merge] Block validator duties when EL is not ready (#2672)
* Reject some HTTP endpoints when EL is not ready

* Restrict more endpoints

* Add watchdog task

* Change scheduling

* Update to new schedule

* Add "syncing" concept

* Remove RequireSynced

* Add is_merge_complete to head_info

* Cache latest_head in Engines

* Call consensus_forkchoiceUpdate on startup
2021-12-02 14:26:53 +11:00
Paul Hauner
d6fda44620
Disable notifier logging from dummy eth1 backend (#2680) 2021-12-02 14:26:53 +11:00
ethDreamer
52e5083502
Fixed bugs for m3 readiness (#2669)
* Fixed bugs for m3 readiness

* woops

* cargo fmt..
2021-12-02 14:26:53 +11:00
Paul Hauner
b162b067de
Misc changes for merge testnets (#2667)
* Thread eth1_block_hash into interop genesis state

* Add merge-fork-epoch flag

* Build LH with minimal spec by default

* Add verbose logs to execution_layer

* Add --http-allow-sync-stalled flag

* Update lcli new-testnet to create genesis state

* Fix http test

* Fix compile errors in tests
2021-12-02 14:26:52 +11:00
Paul Hauner
a1033a9247
Add BeaconChainHarness tests for The Merge (#2661)
* Start adding merge tests

* Expose MockExecutionLayer

* Add mock_execution_layer to BeaconChainHarness

* Progress with merge test

* Return more detailed errors with gas limit issues

* Use a better gas limit in block gen

* Ensure TTD is met in block gen

* Fix basic_merge tests

* Start geth testing

* Fix conflicts after rebase

* Remove geth tests

* Improve merge test

* Address clippy lints

* Make pow block gen a pure function

* Add working new test, breaking existing test

* Fix test names

* Add should_panic

* Don't run merge tests in debug

* Detect a tokio runtime when starting MockServer

* Fix clippy lint, include merge tests
2021-12-02 14:26:52 +11:00
Paul Hauner
d8623cfc4f
[Merge] Implement execution_layer (#2635)
* Checkout serde_utils from rayonism

* Make eth1::http functions pub

* Add bones of execution_layer

* Modify decoding

* Expose Transaction, cargo fmt

* Add executePayload

* Add all minimal spec endpoints

* Start adding json rpc wrapper

* Finish custom JSON response handler

* Switch to new rpc sending method

* Add first test

* Fix camelCase

* Finish adding tests

* Begin threading execution layer into BeaconChain

* Fix clippy lints

* Fix clippy lints

* Thread execution layer into ClientBuilder

* Add CLI flags

* Add block processing methods to ExecutionLayer

* Add block_on to execution_layer

* Integrate execute_payload

* Add extra_data field

* Begin implementing payload handle

* Send consensus valid/invalid messages

* Fix minor type in task_executor

* Call forkchoiceUpdated

* Add search for TTD block

* Thread TTD into execution layer

* Allow producing block with execution payload

* Add LRU cache for execution blocks

* Remove duplicate 0x on ssz_types serialization

* Add tests for block getter methods

* Add basic block generator impl

* Add is_valid_terminal_block to EL

* Verify merge block in block_verification

* Partially implement --terminal-block-hash-override

* Add terminal_block_hash to ChainSpec

* Remove Option from terminal_block_hash in EL

* Revert merge changes to consensus/fork_choice

* Remove commented-out code

* Add bones for handling RPC methods on test server

* Add first ExecutionLayer tests

* Add testing for finding terminal block

* Prevent infinite loops

* Add insert_merge_block to block gen

* Add block gen test for pos blocks

* Start adding payloads to block gen

* Fix clippy lints

* Add execution payload to block gen

* Add execute_payload to block_gen

* Refactor block gen

* Add all routes to mock server

* Use Uint256 for base_fee_per_gas

* Add working execution chain build

* Remove unused var

* Revert "Use Uint256 for base_fee_per_gas"

This reverts commit 6c88f19ac45db834dd4dbf7a3c6e7242c1c0f735.

* Fix base_fee_for_gas Uint256

* Update execute payload handle

* Improve testing, fix bugs

* Fix default fee-recipient

* Fix fee-recipient address (again)

* Add check for terminal block, add comments, tidy

* Apply suggestions from code review

Co-authored-by: realbigsean <seananderson33@GMAIL.com>

* Fix is_none on handle Drop

* Remove commented-out tests

Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2021-12-02 14:26:51 +11:00
ethDreamer
1563bce905
Finished Gossip Block Validation Conditions (#2640)
* Gossip Block Validation is Much More Efficient

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-02 14:26:51 +11:00
realbigsean
aa534f8989
Store execution block hash in fork choice (#2643)
* - Update the fork choice `ProtoNode` to include `is_merge_complete`
- Add database migration for the persisted fork choice

* update tests

* Small cleanup

* lints

* store execution block hash in fork choice rather than bool
2021-12-02 14:26:51 +11:00
Paul Hauner
c10e8ce955
Fix clippy lints on merge-f2f (#2626)
* Remove unchecked arith from ssz_derive

* Address clippy lints in block_verfication

* Use safe math for is_valid_gas_limit
2021-12-02 14:26:50 +11:00
Mark Mackey
5687c56d51
Initial merge changes
Added Execution Payload from Rayonism Fork

Updated new Containers to match Merge Spec

Updated BeaconBlockBody for Merge Spec

Completed updating BeaconState and BeaconBlockBody

Modified ExecutionPayload<T> to use Transaction<T>

Mostly Finished Changes for beacon-chain.md

Added some things for fork-choice.md

Update to match new fork-choice.md/fork.md changes

ran cargo fmt

Added Missing Pieces in eth2_libp2p for Merge

fix ef test

Various Changes to Conform Closer to Merge Spec
2021-12-02 14:26:50 +11:00
Paul Hauner
931daa40d7 Add fork choice EF tests (#2737)
## Issue Addressed

Resolves #2545

## Proposed Changes

Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too.

During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset.

## Failing Cases

There is one failing case which is presently marked as `SkippedKnownFailure`:

```
case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual:
head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } | Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 }
```

This failure is due to #2741. It's not a particularly high priority issue at the moment, so we fix it after merging this PR.
2021-11-08 07:29:04 +00:00
realbigsean
c4ad0e3fb3 Ensure dependent root consistency in head events (#2753)
## Issue Addressed

@paulhauner noticed that when we send head events, we use the block root from `new_head` in `fork_choice_internal`, but calculate `dependent_root` and `previous_dependent_root` using the `canonical_head`. This is normally fine because `new_head` updates the `canonical_head` in `fork_choice_internal`, but it's possible we have a reorg updating `canonical_head` before our head events are sent. So this PR ensures `dependent_root` and `previous_dependent_root` are always derived from the state associated with `new_head`.



Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-11-02 02:26:32 +00:00
Pawan Dhananjay
4499adc7fd Check proposer index during block production (#2740)
## Issue Addressed

Resolves #2612 

## Proposed Changes

Implements both the checks mentioned in the original issue. 
1. Verifies the `randao_reveal` in the beacon node
2. Cross checks the proposer index after getting back the block from the beacon node.

## Additional info
The block production time increases by ~10x because of the signature verification on the beacon node (based on the `beacon_block_production_process_seconds` metric) when running on a local testnet.
2021-11-01 07:44:40 +00:00
Michael Sproul
ffb04e1a9e Add op pool metrics for attestations (#2758)
## Proposed Changes

Add several metrics for the number of attestations in the op pool. These give us a way to observe the number of valid, non-trivial attestations during block packing rather than just the size of the entire op pool.
2021-11-01 05:52:31 +00:00
Michael Sproul
d2e3d4c6f1 Add flag to disable lock timeouts (#2714)
## Issue Addressed

Mitigates #1096

## Proposed Changes

Add a flag to the beacon node called `--disable-lock-timeouts` which allows opting out of lock timeouts.

The lock timeouts serve a dual purpose:

1. They prevent any single operation from hogging the lock for too long. When a timeout occurs it logs a nasty error which indicates that there's suboptimal lock use occurring, which we can then act on.
2. They allow deadlock detection. We're fairly sure there are no deadlocks left in Lighthouse anymore but the timeout locks offer a safeguard against that.

However, timeouts on locks are not without downsides:

They allow for the possibility of livelock, particularly on slower hardware. If lock timeouts keep failing spuriously the node can be prevented from making any progress, even if it would be able to make progress slowly without the timeout. One particularly concerning scenario which could occur would be if a DoS attack succeeded in slowing block signature verification times across the network, and all Lighthouse nodes got livelocked because they timed out repeatedly. This could also occur on just a subset of nodes (e.g. dual core VPSs or Raspberri Pis).

By making the behaviour runtime configurable this PR allows us to choose the behaviour we want depending on circumstance. I suspect that long term we could make the timeout-free approach the default (#2381 moves in this direction) and just enable the timeouts on our testnet nodes for debugging purposes. This PR conservatively leaves the default as-is so we can gain some more experience before switching the default.
2021-10-19 00:30:40 +00:00
Paul Hauner
a7b675460d Add Altair tests to op pool (#2723)
## Issue Addressed

NA

## Proposed Changes

Adds some more testing for Altair to the op pool. Credits to @michaelsproul for some appropriated efforts here.

## Additional Info

NA


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-10-16 05:07:23 +00:00
Michael Sproul
5cde3fc4da Reduce lock contention in backfill sync (#2716)
## Proposed Changes

Clone the proposer pubkeys during backfill signature verification to reduce the time that the pubkey cache lock is held for. Cloning such a small number of pubkeys has negligible impact on the total running time, but greatly reduces lock contention.

On a Ryzen 5950X, the setup step seems to take around 180us regardless of whether the key is cloned or not, while the verification takes 7ms. When Lighthouse is limited to 10% of one core using `sudo cpulimit --pid <pid> --limit 10` the total time jumps up to 800ms, but the setup step remains only 250us. This means that under heavy load this PR could cut the time the lock is held for from 800ms to 250us, which is a huge saving of 99.97%!
2021-10-15 03:28:03 +00:00
Paul Hauner
e2d09bb8ac Add BeaconChainHarness::builder (#2707)
## Issue Addressed

NA

## Proposed Changes

This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning.

I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs.

## Additional Info

NA
2021-10-14 02:58:10 +00:00
Pawan Dhananjay
34d22b5920 Reduce validator monitor logging verbosity (#2606)
## Issue Addressed

Resolves #2541

## Proposed Changes

Reduces verbosity of validator monitor per epoch logging by batching info logs for multiple validators.

Instead of a log for every validator managed by the validator monitor, we now batch logs for attestation records for previous epoch.

Before:
```log
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 1, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 2, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 3, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 4, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 5, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match head        validator: 5, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 6, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match head        validator: 6, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 7, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match target      validator: 7, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay             validator: 7, epoch: 65875, optimal: 1, delay: 2, service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 8, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match target      validator: 8, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay             validator: 8, epoch: 65875, optimal: 1, delay: 2, service: val_mon
Sep 20 06:53:08.239 ERRO Previous epoch attestation missing      validator: 9, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 ERRO Previous epoch attestation missing      validator: 10, epoch: 65875, service: val_mon
```

after
```
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validators: [1,2,3,4,5,6,7,8,9] , epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match head, validators: [5,6], epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match target, validators: [7,8], epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Previous epoch attestations had sub-optimal inclusion delay, validators: [7,8], epoch: 65875, service: val_mon
Sep 20 06:53:08.239 ERRO Previous epoch attestation missing      validators: [9,10], epoch: 65875, service: val_mon
```

The detailed individual logs are downgraded to debug logs.
2021-10-12 05:06:48 +00:00
Wink Saville
58870fc6d3 Add test_logger as feature to logging (#2586)
## Issue Addressed

Fix #2585

## Proposed Changes

Provide a canonical version of test_logger that can be used
throughout lighthouse.

## Additional Info

This allows tests to conditionally emit logging data by adding
test_logger as the default logger. And then when executing
`cargo test --features logging/test_logger` log output
will be visible:

  wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
  $ cargo test --features logging/test_logger
      Finished test [unoptimized + debuginfo] target(s) in 0.02s
       Running unittests (target/debug/deps/test_logger-e20115db6a5e3714)

  running 1 test
  Sep 10 12:53:45.212 INFO hi, module: test_logger:8
  test tests::test_fn_with_logging ... ok

  test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Doc-tests test-logger

  running 0 tests

  test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Or, in normal scenarios where logging isn't needed, executing
`cargo test` the log output will not be visible:

  wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
  $ cargo test
      Finished test [unoptimized + debuginfo] target(s) in 0.02s
       Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a)

  running 1 test
  test tests::test_fn_with_logging ... ok

  test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Doc-tests test-logger

  running 0 tests

  test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
2021-10-06 00:46:07 +00:00
Michael Sproul
ed1fc7cca6 Fix I/O atomicity issues with checkpoint sync (#2671)
## Issue Addressed

This PR addresses an issue found by @YorickDowne during testing of v2.0.0-rc.0.

Due to a lack of atomic database writes on checkpoint sync start-up, it was possible for the database to get into an inconsistent state from which it couldn't recover without `--purge-db`. The core of the issue was that the store's anchor info was being stored _before_ the `PersistedBeaconChain`. If a crash occured so that anchor info was stored but _not_ the `PersistedBeaconChain`, then on restart Lighthouse would think the database was unitialized and attempt to compare-and-swap a `None` value, but would actually find the stale info from the previous run.

## Proposed Changes

The issue is fixed by writing the anchor info, the split point, and the `PersistedBeaconChain` atomically on start-up. Some type-hinting ugliness was required, which could possibly be cleaned up in future refactors.
2021-10-05 03:53:17 +00:00
Mac L
4c510f8f6b Add BlockTimesCache to allow additional block delay metrics (#2546)
## Issue Addressed

Closes #2528

## Proposed Changes

- Add `BlockTimesCache` to provide block timing information to `BeaconChain`. This allows additional metrics to be calculated for blocks that are set as head too late.
- Thread the `seen_timestamp` of blocks received from RPC responses (except blocks from syncing) through to the sync manager, similar to what is done for blocks from gossip.

## Additional Info

This provides the following additional metrics:
- `BEACON_BLOCK_OBSERVED_SLOT_START_DELAY_TIME`
  - The delay between the start of the slot and when the block was first observed.
- `BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME`
   - The delay between when the block was first observed and when the block was imported.
- `BEACON_BLOCK_HEAD_IMPORTED_DELAY_TIME`
  - The delay between when the block was imported and when the block was set as head.

The metric `BEACON_BLOCK_IMPORTED_SLOT_START_DELAY_TIME` was removed.

A log is produced when a block is set as head too late, e.g.:
```
Aug 27 03:46:39.006 DEBG Delayed head block                      set_as_head_delay: Some(21.731066ms), imported_delay: Some(119.929934ms), observed_delay: Some(3.864596988s), block_delay: 4.006257988s, slot: 1931331, proposer_index: 24294, block_root: 0x937602c89d3143afa89088a44bdf4b4d0d760dad082abacb229495c048648a9e, service: beacon
```
2021-09-30 04:31:41 +00:00
Pawan Dhananjay
70441aa554 Improve valmon inclusion delay calculation (#2618)
## Issue Addressed

Resolves #2552 

## Proposed Changes

Offers some improvement in inclusion distance calculation in the validator monitor. 

When registering an attestation from a block, instead of doing `block.slot() - attesstation.data.slot()` to get the inclusion distance, we now pass the parent block slot from the beacon chain and do `parent_slot.saturating_sub(attestation.data.slot())`. This allows us to give best effort inclusion distance in scenarios where the attestation was included right after a skip slot. Note that this does not give accurate results in scenarios where the attestation was included few blocks after the skip slot.

In this case, if the attestation slot was `b1` and was included in block `b2` with a skip slot in between, we would get the inclusion delay as 0  (by ignoring the skip slot) which is the best effort inclusion delay.
```
b1 <- missed <- b2
``` 

Here, if the attestation slot was `b1` and was included in block `b3` with a skip slot and valid block `b2` in between, then we would get the inclusion delay as 2 instead of 1 (by ignoring the skip slot).
```
b1 <- missed <- b2 <- b3 
```
A solution for the scenario 2 would be to count number of slots between included slot and attestation slot ignoring the skip slots in the beacon chain and pass the value to the validator monitor. But I'm concerned that it could potentially lead to db accesses for older blocks in extreme cases.


This PR also uses the validator monitor data for logging per epoch inclusion distance. This is useful as we won't get inclusion data in post-altair summaries.


Co-authored-by: Michael Sproul <micsproul@gmail.com>
2021-09-30 01:22:43 +00:00
realbigsean
7d13e57d9f Add interop metrics (#2645)
## Issue Addressed

Resolves: #2644

## Proposed Changes

- Adds mandatory metrics mentioned here: https://github.com/ethereum/beacon-metrics/blob/master/metrics.md#interop-metrics

## Additional Info

Couldn't figure out how to alias metrics, so I created them all as new gauges/counters.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-29 23:44:24 +00:00
realbigsean
113ef74ef6 Add contribution and proof event (#2527)
## Issue Addressed

N/A

## Proposed Changes

Add the new ContributionAndProof event: https://github.com/ethereum/beacon-APIs/pull/158

## Additional Info

N/A

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-25 07:53:58 +00:00
Paul Hauner
be11437c27 Batch BLS verification for attestations (#2399)
## Issue Addressed

NA

## Proposed Changes

Adds the ability to verify batches of aggregated/unaggregated attestations from the network.

When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue:

- `== 1` verify the attestation individually.
- `>= 2` take up to 64 of those attestations and verify them in a batch.

Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try and force them into batches. 

### Batching Details

To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations:

- *Indexed* attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`.
- *Verified* attestations: those attestations which were indexed *and also* passed signature verification. These are well-formed, interesting messages which were signed by validators.

The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required.

When we batch verify attestations, we first try to map all those attestations to *indexed* attestations. If any of those attestations were able to be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into *verified* attestations, disabling individual signature checking. If the batch fails, we convert to verified attestations with individual signature checking enabled.

Ultimately, we optimistically try to do a batch verification of attestation signatures and fall-back to individual verification if it fails. This opens an attach vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and the typical-case gains massively outweigh the worst-case losses.

## Additional Info

Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference.

I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to *actual* production code would pass under the existing test suite.
2021-09-22 08:49:41 +00:00
Michael Sproul
9667dc2f03 Implement checkpoint sync (#2244)
## Issue Addressed

Closes #1891
Closes #1784

## Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

## Additional Info

- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)


Co-authored-by: Diva M <divma@protonmail.com>
2021-09-22 00:37:28 +00:00
Pawan Dhananjay
5a3bcd2904 Validator monitor support for sync committees (#2476)
## Issue Addressed

N/A

## Proposed Changes

Add functionality in the validator monitor to provide sync committee related metrics for monitored validators.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-08-31 23:31:36 +00:00
Paul Hauner
44fa54004c Persist to DB after setting canonical head (#2547)
## Issue Addressed

NA

## Proposed Changes

Missed head votes on attestations is a well-known issue. The primary cause is a block getting set as the head *after* the attestation deadline.

This PR aims to shorten the overall time between "block received" and "block set as head" by:

1. Persisting the head and fork choice *after* setting the canonical head
    - Informal measurements show this takes ~200ms
 1. Pruning the op pool *after* setting the canonical head.
 1. No longer persisting the op pool to disk during `BeaconChain::fork_choice`
     - Informal measurements show this can take up to 1.2s.
     
I also add some metrics to help measure the effect of these changes.
     
Persistence changes like this run the risk of breaking assumptions downstream. However, I have considered these risks and I think we're fine here. I will describe my reasoning for each change.

## Reasoning

### Change 1:  Persisting the head and fork choice *after* setting the canonical head

For (1), although the function is called `persist_head_and_fork_choice`, it only persists:

- Fork choice
- Head tracker
- Genesis block root

Since `BeaconChain::fork_choice_internal` does not modify these values between the original time we were persisting it and the current time, I assert that the change I've made is non-substantial in terms of what ends up on-disk. There's the possibility that some *other* thread has modified fork choice in the extra time we've given it, but that's totally fine.

Since the only time we *read* those values from disk is during startup, I assert that this has no impact during runtime. 

### Change 2: Pruning the op pool after setting the canonical head

Similar to the argument above, we don't modify the op pool during `BeaconChain::fork_choice_internal` so it shouldn't matter when we prune. This change should be non-substantial.

### Change 3: No longer persisting the op pool to disk during `BeaconChain::fork_choice`

This change *is* substantial. With the proposed changes, we'll only be persisting the op pool to disk when we shut down cleanly (i.e., the `BeaconChain` gets dropped). This means we'll save disk IO and time during usual operation, but a `kill -9` or similar "crash" will probably result in an out-of-date op pool when we reboot. An out-of-date op pool can only have an impact when producing blocks or aggregate attestations/sync committees.

I think it's pretty reasonable that a crash might result in an out-of-date op pool, since:

- Crashes are fairly rare. Practically the only time I see LH suffer a full crash is when the OOM killer shows up, and that's a very serious event.
- It's generally quite rare to produce a block/aggregate immediately after a reboot. Just a few slots of runtime is probably enough to have a decent-enough op pool again.

## Additional Info

Credits to @macladson for the timings referenced here.
2021-08-31 04:48:21 +00:00
Michael Sproul
10945e0619 Revert bad blocks on missed fork (#2529)
## Issue Addressed

Closes #2526

## Proposed Changes

If the head block fails to decode on start up, do two things:

1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`).
2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint.

## Additional Info

I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞
2021-08-30 06:41:31 +00:00