Commit Graph

2622 Commits

Author SHA1 Message Date
Pawan Dhananjay
1430b561c3
Add more gossip verification conditions 2022-10-06 21:16:59 -05:00
realbigsean
44515b8cbe
cargo fix 2022-10-05 17:20:54 -04:00
realbigsean
b5b4ce9509
blob production 2022-10-05 17:14:45 -04:00
Pawan Dhananjay
91efb9d4c7
Add todos 2022-10-05 02:56:57 -05:00
Pawan Dhananjay
21bf3d37cd
Reprocess blob sidecar messages 2022-10-05 02:52:26 -05:00
Pawan Dhananjay
c55b28bf10
Minor fixes 2022-10-04 19:18:06 -05:00
Pawan Dhananjay
12fe514550
Add more gossip verification functions for blobs 2022-10-04 19:17:53 -05:00
Pawan Dhananjay
9d99c784ea
Add gossip verification stub 2022-10-04 17:54:14 -05:00
realbigsean
7527c2b455
fix RPC limit add blob signing domain 2022-10-04 14:57:29 -04:00
realbigsean
ba16a037a3
cleanup 2022-10-04 09:34:05 -04:00
mariuspod
242ae21e5d Pass EL JWT secret key via cli flag (#3568)
## Proposed Changes

In this change I've added a new beacon_node cli flag `--execution-jwt-secret-key` for passing the JWT secret directly as string.

Without this flag, it was non-trivial to pass a secrets file containing a JWT secret key without compromising its contents into some management repo or fiddling around with manual file mounts for cloud-based deployments.

When used in combination with environment variables, the secret can be injected into container-based systems like docker & friends quite easily.

It's both possible to either specify the file_path to the JWT secret or pass the JWT secret directly.

I've modified the docs and attached a test as well.

## Additional Info

The logic has been adapted a bit so that either one of `--execution-jwt` or `--execution-jwt-secret-key` must be set when specifying `--execution-endpoint` so that it's still compatible with the semantics before this change and there's at least one secret provided.
2022-10-04 12:41:03 +00:00
realbigsean
c0dc42ea07
cargo fmt 2022-10-04 08:21:46 -04:00
Divma
4926e3967f [DEV FEATURE] Deterministic long lived subnets (#3453)
## Issue Addressed

#2847 

## Proposed Changes
Add under a feature flag the required changes to subscribe to long lived subnets in a deterministic way

## Additional Info

There is an additional required change that is actually searching for peers using the prefix, but I find that it's best to make this change in the future
2022-10-04 10:37:48 +00:00
GeemoCandama
6a92bf70e4 CLI tests for logging flags (#3609)
## Issue Addressed
Adding CLI tests for logging flags: log-color and disable-log-timestamp
Which issue # does this PR address?
#3588 
## Proposed Changes
Add CLI tests for logging flags as described in #3588 
Please list or describe the changes introduced by this PR.
Added logger_config to client::Config as suggested. Implemented Default for LoggerConfig based on what was being done elsewhere in the repo. Created 2 tests for each flag addressed.
## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-10-04 08:33:40 +00:00
Pawan Dhananjay
8728c40102 Remove fallback support from eth1 service (#3594)
## Issue Addressed

N/A

## Proposed Changes

With https://github.com/sigp/lighthouse/pull/3214 we made it such that you can either have 1 auth endpoint or multiple non auth endpoints. Now that we are post merge on all networks (testnets and mainnet), we cannot progress a chain without a dedicated auth execution layer connection so there is no point in having a non-auth eth1-endpoint for syncing deposit cache. 

This code removes all fallback related code in the eth1 service. We still keep the single non-auth endpoint since it's useful for testing.

## Additional Info

This removes all eth1 fallback related metrics that were relevant for the monitoring service, so we might need to change the api upstream.
2022-10-04 08:33:39 +00:00
realbigsean
8d45e48775
cargo fix 2022-10-03 21:52:16 -04:00
realbigsean
e81dbbfea4
compile 2022-10-03 21:48:02 -04:00
realbigsean
88006735c4
compile 2022-10-03 10:06:04 -04:00
realbigsean
7520651515
cargo fix and some test fixes 2022-09-29 12:43:35 -04:00
realbigsean
fe6fc55449
fix compilation errors, rename capella -> shanghai, cleanup some rebase issues 2022-09-29 12:43:13 -04:00
realbigsean
809b52715e
some block building updates 2022-09-29 12:38:00 -04:00
realbigsean
acaa340b41
add new beacon state variant for shanghai 2022-09-29 12:37:14 -04:00
realbigsean
203418ffc9
add engine_getBlobV1 2022-09-29 12:35:55 -04:00
realbigsean
3f1e5cee78
Some gossip work 2022-09-29 12:35:53 -04:00
realbigsean
ebc0ccd02a
some more sync boilerplate 2022-09-29 12:34:09 -04:00
realbigsean
4008da6c60
sync tx blobs 2022-09-29 12:32:55 -04:00
realbigsean
4cdf1b546d
add shanghai fork version and epoch 2022-09-29 12:28:58 -04:00
realbigsean
de44b300c0
add/update types 2022-09-29 12:25:56 -04:00
Age Manning
27bb9ff07d Handle Lodestar's new agent string (#3620)
## Issue Addressed

#3561 

## Proposed Changes

Recognize Lodestars new agent string and appropriately count these peers as lodestar peers.
2022-09-29 01:50:13 +00:00
Age Manning
01b6bf7a2d Improve logging a little (#3619)
Some of the logs in combination with others could be improved. 

It will save some time debugging by improving the wording slightly.
2022-09-29 01:50:12 +00:00
Divma
b1d2510d1b Libp2p v0.48.0 upgrade (#3547)
## Issue Addressed

Upgrades libp2p to v.0.47.0. This is the compilation of
- [x] #3495 
- [x] #3497 
- [x] #3491 
- [x] #3546 
- [x] #3553 

Co-authored-by: Age Manning <Age@AgeManning.com>
2022-09-29 01:50:11 +00:00
Paul Hauner
01e84b71f5 v3.1.2 (#3603)
## Issue Addressed

NA

## Proposed Changes

Bump versions to v3.1.2

## Additional Info

- ~~Blocked on several PRs.~~
- ~~Requires further testing.~~
2022-09-26 01:17:36 +00:00
Divma
bd873e7162 New rust lints for rustc 1.64.0 (#3602)
## Issue Addressed
fixes lints from the last rust release

## Proposed Changes
Fix the lints, most of the lints by `clippy::question-mark` are false positives in the form of https://github.com/rust-lang/rust-clippy/issues/9518 so it's allowed for now

## Additional Info
2022-09-23 03:52:46 +00:00
Divma
9bd384a573 send attnet unsubscription event on random subnet expiry (#3600)
## Issue Addressed
🐞 in which we don't actually unsubscribe from a random long lived subnet when it expires

## Proposed Changes

Remove code addressing a specific case in which we are subscribed to all subnets and handle the removal of the long lived subnet. I don't think the special case code is particularly important as, if someone is running with that many validators to be subscribed to all subnets, it should use `--subscribe-all-subnets` instead

## Additional Info

Noticed on some test nodes climbing bandwidth usage periodically (around 27hours, the time of subnet expirations) I'm running this code to test this does not happen anymore, but I think it should be good now
2022-09-23 03:52:45 +00:00
Paul Hauner
9246a92d76 Make garbage collection test less failure prone (#3599)
## Issue Addressed

NA

## Proposed Changes

This PR attempts to fix the following spurious CI failure:

```
---- store_tests::garbage_collect_temp_states_from_failed_block stdout ----
thread 'store_tests::garbage_collect_temp_states_from_failed_block' panicked at 'disk store should initialize: DBError { message: "Error { message: \"IO error: lock /tmp/.tmp6DcBQ9/cold_db/LOCK: already held by process\" }" }', beacon_node/beacon_chain/tests/store_tests.rs:59:10
```

I believe that some async task is taking a clone of the store and holding it in some other thread for a short time. This creates a race-condition when we try to open a new instance of the store.

## Additional Info

NA
2022-09-23 03:52:44 +00:00
Paul Hauner
fa6ad1a11a Deduplicate block root computation (#3590)
## Issue Addressed

NA

## Proposed Changes

This PR removes duplicated block root computation.

Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merke root of each transaction inside an `ExecutionPayload`.

Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations *after* the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times.

## Additional Info

NA
2022-09-23 03:52:42 +00:00
Paul Hauner
3128b5b430 v3.1.1 (#3585)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

- ~~Requires additional testing~~
- ~~Blocked on:~~
    - ~~#3589~~
    - ~~#3540~~
    - ~~#3587~~
2022-09-22 06:08:52 +00:00
Paul Hauner
96692b8e43 Impl oneshot_broadcast for committee promises (#3595)
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in #3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a *broadcast* queue. This is incorrect, each message will be received by *only one* receiver. The effect of this mistake is these logs:

```
Sep 20 06:56:17.001 INFO Synced                                  slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation          error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1
```

## Additional Info

NA
2022-09-21 01:01:50 +00:00
Paul Hauner
a95bcba2ab Avoid holding write-lock whilst waiting on shuffling cache promise (#3589)
## Issue Addressed

NA

## Proposed Changes

Fixes a bug which hogged the write-lock for the `shuffling_cache`.

## Additional Info

NA
2022-09-19 07:58:50 +00:00
Michael Sproul
507bb9dad4 Refined payload pruning (#3587)
## Proposed Changes

Improve the payload pruning feature in several ways:

- Payload pruning is now entirely optional. It is enabled by default but can be disabled with `--prune-payloads false`. The previous `--prune-payloads-on-startup` flag from #3565 is removed.
- Initial payload pruning on startup now runs in a background thread. This thread will always load the split state, which is a small fraction of its total work (up to ~300ms) and then backtrack from that state. This pruning process ran in 2m5s on one Prater node with good I/O and 16m on a node with slower I/O.
- To work with the optional payload pruning the database function `try_load_full_block` will now attempt to load execution payloads for finalized slots _if_ pruning is currently disabled. This gives users an opt-out for the extensive traffic between the CL and EL for reconstructing payloads.

## Additional Info

If the `prune-payloads` flag is toggled on and off then the on-startup check may not see any payloads to delete and fail to clean them up. In this case the `lighthouse db prune_payloads` command should be used to force a manual sweep of the database.
2022-09-19 07:58:49 +00:00
Michael Sproul
f2ac0738d8 Implement skip_randao_verification and blinded block rewards API (#3540)
## Issue Addressed

https://github.com/ethereum/beacon-APIs/pull/222

## Proposed Changes

Update Lighthouse's randao verification API to match the `beacon-APIs` spec. We implemented the API before spec stabilisation, and it changed slightly in the course of review.

Rather than a flag `verify_randao` taking a boolean value, the new API uses a `skip_randao_verification` flag which takes no argument. The new spec also requires the randao reveal to be present and equal to the point-at-infinity when `skip_randao_verification` is set.

I've also updated the `POST /lighthouse/analysis/block_rewards` API to take blinded blocks as input, as the execution payload is irrelevant and we may want to assess blocks produced by builders.

## Additional Info

This is technically a breaking change, but seeing as I suspect I'm the only one using these parameters/APIs, I think we're OK to include this in a patch release.
2022-09-19 07:58:48 +00:00
Marius van der Wijden
6f7d21c542 enable 4844 at epoch 3 2022-09-18 12:13:03 +02:00
Marius van der Wijden
285dbf43ed hacky hacks 2022-09-18 11:34:46 +02:00
Marius van der Wijden
8b71b978e0 new round of hacks (config etc) 2022-09-17 23:42:49 +02:00
Daniel Knopik
750c594f5f forgor something 2022-09-17 21:38:57 +02:00
Daniel Knopik
eab1fce0e5 Merge branch 'eip4844' of github.com:dknopik/lighthouse into eip4844 2022-09-17 20:55:36 +02:00
Daniel Knopik
76572db9d5 add network config 2022-09-17 20:55:21 +02:00
Marius van der Wijden
f43532d3de implement handle blobs by range req 2022-09-17 20:05:51 +02:00
Marius van der Wijden
f9209e2d08 more network stuff 2022-09-17 16:39:40 +02:00
Marius van der Wijden
aeb52ff186 network stuff 2022-09-17 16:10:42 +02:00
Daniel Knopik
d4d40be870 storable blobs 2022-09-17 15:58:52 +02:00
Marius van der Wijden
36a0add0cd network stuff 2022-09-17 15:23:28 +02:00
Daniel Knopik
0518665949 Merge remote-tracking branch 'fork/eip4844' into eip4844 2022-09-17 14:58:33 +02:00
Daniel Knopik
292a16a6eb gossip boilerplate 2022-09-17 14:58:27 +02:00
Marius van der Wijden
acace8ab31 network: blobs by range message 2022-09-17 14:55:18 +02:00
Daniel Knopik
bcc738cb9d progress on gossip stuff 2022-09-17 14:31:57 +02:00
Marius van der Wijden
8473f08d10 beacon: consensus: implement engine api getBlobs 2022-09-17 14:10:15 +02:00
Daniel Knopik
dcfae6c5cf implement From<FullPayload> for Payload 2022-09-17 13:29:20 +02:00
Marius van der Wijden
fe6be28e6b beacon: consensus: implement engine api getBlobs 2022-09-17 13:20:18 +02:00
Daniel Knopik
ca1e17b386 it compiles! 2022-09-17 12:23:03 +02:00
Daniel Knopik
95203c51d4 fix some bugx, adjust stucts 2022-09-17 11:26:18 +02:00
Michael Sproul
ca42ef2e5a Prune finalized execution payloads (#3565)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/3556

## Proposed Changes

Delete finalized execution payloads from the database in two places:

1. When running the finalization migration in `migrate_database`. We delete the finalized payloads between the last split point and the new updated split point. _If_ payloads are already pruned prior to this then this is sufficient to prune _all_ payloads as non-canonical payloads are already deleted by the head pruner, and all canonical payloads prior to the previous split will already have been pruned.
2. To address the fact that users will update to this code _after_ the merge on mainnet (and testnets), we need a one-off scan to delete the finalized payloads from the canonical chain. This is implemented in `try_prune_execution_payloads` which runs on startup and scans the chain back to the Bellatrix fork or the anchor slot (if checkpoint synced after Bellatrix). In the case where payloads are already pruned this check only imposes a single state load for the split state, which shouldn't be _too slow_. Even so, a flag `--prepare-payloads-on-startup=false` is provided to turn this off after it has run the first time, which provides faster start-up times.

There is also a new `lighthouse db prune_payloads` subcommand for users who prefer to run the pruning manually.

## Additional Info

The tests have been updated to not rely on finalized payloads in the database, instead using the `MockExecutionLayer` to reconstruct them. Additionally a check was added to `check_chain_dump` which asserts the non-existence or existence of payloads on disk depending on their slot.
2022-09-17 02:27:01 +00:00
Paul Hauner
2cd3e3a768 Avoid duplicate committee cache loads (#3574)
## Issue Addressed

NA

## Proposed Changes

I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.

This PR modifies the shuffling cache so that each entry can be either:

- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)

Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:

1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.

In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.

## Additional Info

NA
2022-09-16 08:54:03 +00:00
Paul Hauner
7d3948c8fe Add metric for re-org distance (#3566)
## Issue Addressed

NA

## Proposed Changes

Add a metric to track the re-org distance.

## Additional Info

NA
2022-09-13 17:19:27 +00:00
tim gretler
98815516a1 Support histogram buckets (#3391)
## Issue Addressed

#3285

## Proposed Changes

Adds support for specifying histogram with buckets and adds new metric buckets for metrics mentioned in issue.

## Additional Info

Need some help for the buckets.


Co-authored-by: Michael Sproul <micsproul@gmail.com>
2022-09-13 01:57:44 +00:00
Nils Effinghausen
f682df51a1 fix description for BALANCES_CACHE_MISSES metric (#3545)
## Issue Addressed

fixes metric description


Co-authored-by: Nils Effinghausen <nils.effinghausen@t-systems.com>
2022-09-10 01:35:10 +00:00
realbigsean
d1a8d6cf91 Pin mev rs deps (#3557)
## Issue Addressed

We were unable to update lighthouse by running `cargo update` because some of the `mev-build-rs` deps weren't pinned. But `mev-build-rs` is now pinned here and includes it's own pinned commits for `ssz-rs` and `etheruem-consensus`



Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-09-08 23:46:03 +00:00
Michael Sproul
9a7f7f1c1e Configurable monitoring endpoint frequency (#3530)
## Issue Addressed

Closes #3514

## Proposed Changes

- Change default monitoring endpoint frequency to 120 seconds to fit with 30k requests/month limit.
- Allow configuration of the monitoring endpoint frequency using `--monitoring-endpoint-frequency N` where `N` is a value in seconds.
2022-09-05 08:29:00 +00:00
realbigsean
177aef8f1e Builder profit threshold flag (#3534)
## Issue Addressed

Resolves https://github.com/sigp/lighthouse/issues/3517

## Proposed Changes

Adds a `--builder-profit-threshold <wei value>` flag to the BN. If an external payload's value field is less than this value, the local payload will be used. The value of the local payload will not be checked (it can't really be checked until the engine API is updated to support this).


Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-09-05 04:50:49 +00:00
realbigsean
cae40731a2 Strict count unrealized (#3522)
## Issue Addressed

Add a flag that can increase count unrealized strictness, defaults to false

## Proposed Changes

Please list or describe the changes introduced by this PR.

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: sean <seananderson33@gmail.com>
2022-09-05 04:50:47 +00:00
Mac L
80359d8ddb Fix attestation performance API InvalidValidatorIndex error (#3503)
## Issue Addressed

When requesting an index which is not active during `start_epoch`, Lighthouse returns: 
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
{
  "code": 500,
  "message": "INTERNAL_SERVER_ERROR: ParticipationCache(InvalidValidatorIndex(999999999))",
  "stacktraces": []
}
```

This error occurs even when the index in question becomes active before `end_epoch` which is undesirable as it can prevent larger queries from completing.

## Proposed Changes

In the event the index is out-of-bounds (has not yet been activated), simply return all fields as `false`:

```
-> curl "http://localhost:5052/lighthouse/analysis/attestation_performance/999999999?start_epoch=100000&end_epoch=100000"
```
```json
[
  {
    "index": 999999999,
    "epochs": {
      "100000": {
        "active": false,
        "head": false,
        "target": false,
        "source": false
      }
    }
  }
]
```

By doing this, we cover the case where a validator becomes active sometime between `start_epoch` and `end_epoch`.

## Additional Info

Note that this error only occurs for epochs after the Altair hard fork.
2022-09-05 04:50:45 +00:00
Divma
473abc14ca Subscribe to subnets only when needed (#3419)
## Issue Addressed

We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately. 

## Proposed Changes

- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.

## Additional Info

This is also an effort in making the code easier to understand
2022-09-05 00:22:48 +00:00
Paul Hauner
aa022f4685 v3.1.0 (#3525)
## Issue Addressed

NA

## Proposed Changes

- Bump versions

## Additional Info

- ~~Blocked on #3508~~
- ~~Blocked on #3526~~
- ~~Requires additional testing.~~
- Expected release date is 2022-09-01
2022-08-31 22:21:55 +00:00
Paul Hauner
661307dce1 Separate committee subscriptions queue (#3508)
## Issue Addressed

NA

## Proposed Changes

As we've seen on Prater, there seems to be a correlation between these messages

```
WARN Not enough time for a discovery search  subnet_id: ExactSubnet { subnet_id: SubnetId(19), slot: Slot(3742336) }, service: attestation_service
```

... and nodes falling 20-30 slots behind the head for short periods. These nodes are running ~20k Prater validators.

After running some metrics, I can see that the `network_recv` channel is processing ~250k `AttestationSubscribe` messages per minute. It occurred to me that perhaps the `AttestationSubscribe` messages are "washing out" the `SendRequest` and `SendResponse` messages. In this PR I separate the `AttestationSubscribe` and `SyncCommitteeSubscribe` messages into their own queue so the `tokio::select!` in the `NetworkService` can still process the other messages in the `network_recv` channel without necessarily having to clear all the subscription messages first.

~~I've also added filter to the HTTP API to prevent duplicate subscriptions going to the network service.~~

## Additional Info

- Currently being tested on Prater
2022-08-30 05:47:31 +00:00
Michael Sproul
7a50684741 Harden slot notifier against clock drift (#3519)
## Issue Addressed

Partly resolves #3518

## Proposed Changes

Change the slot notifier to use `duration_to_next_slot` rather than an interval timer. This makes it robust against underlying clock changes.
2022-08-29 14:34:43 +00:00
Paul Hauner
1a833ecc17 Add more logging for invalid payloads (#3515)
## Issue Addressed

NA

## Proposed Changes

Adds more `debug` logging to help troubleshoot invalid execution payload blocks. I was doing some of this recently and found it to be challenging.

With this PR we should be able to grep `Invalid execution payload` and get one-liners that will show the block, slot and details about the proposer.

I also changed the log in `process_invalid_execution_payload` since it was a little misleading; the `block_root` wasn't necessary the block which had an invalid payload.

## Additional Info

NA
2022-08-29 14:34:42 +00:00
Paul Hauner
8609cced0e Reset payload statuses when resuming fork choice (#3498)
## Issue Addressed

NA

## Proposed Changes

This PR is motivated by a recent consensus failure in Geth where it returned `INVALID` for an `VALID` block. Without this PR, the only way to recover is by re-syncing Lighthouse. Whilst ELs "shouldn't have consensus failures", in reality it's something that we can expect from time to time due to the complex nature of Ethereum. Being able to recover easily will help the network recover and EL devs to troubleshoot.

The risk introduced with this PR is that genuinely INVALID payloads get a "second chance" at being imported. I believe the DoS risk here is negligible since LH needs to be restarted in order to re-process the payload. Furthermore, there's no reason to think that a well-performing EL will accept a truly invalid payload the second-time-around.

## Additional Info

This implementation has the following intricacies:

1. Instead of just resetting *invalid* payloads to optimistic, we'll also reset *valid* payloads. This is an artifact of our existing implementation.
1. We will only reset payload statuses when we detect an invalid payload present in `proto_array`
    - This helps save us from forgetting that all our blocks are valid in the "best case scenario" where there are no invalid blocks.
1. If we fail to revert the payload statuses we'll log a `CRIT` and just continue with a `proto_array` that *does not* have reverted payload statuses.
    - The code to revert statuses needs to deal with balances and proposer-boost, so it's a failure point. This is a defensive measure to avoid introducing new show-stopping bugs to LH.
2022-08-29 14:34:41 +00:00
Michael Sproul
66eca1a882 Refactor op pool for speed and correctness (#3312)
## Proposed Changes

This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12).

Attestation packing is sped up by removing several inefficiencies: 

- No more recalculation of `attesting_indices` during packing.
- No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`.
- No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra).
- No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release).

So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms.

Verification of attester slashings, proposer slashings and voluntary exits is fixed by:

- Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message.
- Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot.

This is mostly contained in this commit 52bb1840ae5c4356a8fc3a51e5df23ed65ed2c7f.

## Additional Info

The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s.

This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency.

Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.
2022-08-29 09:10:26 +00:00
realbigsean
cb132c622d don't register exited or slashed validators with the builder api (#3473)
## Issue Addressed

#3465

## Proposed Changes

Filter out any validator registrations for validators that are not `active` or `pending`.  I'm adding this filtering the beacon node because all the information is readily available there. In other parts of the VC we are usually sending per-validator requests based on duties from the BN. And duties will only be provided for active validators so we don't have this type of filtering elsewhere in the VC.



Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-08-24 23:34:58 +00:00
Divma
8c69d57c2c Pause sync when EE is offline (#3428)
## Issue Addressed

#3032

## Proposed Changes

Pause sync when ee is offline. Changes include three main parts:
- Online/offline notification system
- Pause sync
- Resume sync

#### Online/offline notification system
- The engine state is now guarded behind a new struct `State` that ensures every change is correctly notified. Notifications are only sent if the state changes. The new `State` is behind a `RwLock` (as before) as the synchronization mechanism.
- The actual notification channel is a [tokio::sync::watch](https://docs.rs/tokio/latest/tokio/sync/watch/index.html) which ensures only the last value is in the receiver channel. This way we don't need to worry about message order etc.
- Sync waits for state changes concurrently with normal messages.

#### Pause Sync
Sync has four components, pausing is done differently in each:
- **Block lookups**: Disabled while in this state. We drop current requests and don't search for new blocks. Block lookups are infrequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it.
- **Parent lookups**: Disabled while in this state. We drop current requests and don't search for new parents. Parent lookups are even less frequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it.
- **Range**: Chains don't send batches for processing to the beacon processor. This is easily done by guarding the channel to the beacon processor and giving it access only if the ee is responsive. I find this the simplest and most powerful approach since we don't need to deal with new sync states and chain segments that are added while the ee is offline will follow the same logic without needing to synchronize a shared state among those. Another advantage of passive pause vs active pause is that we can still keep track of active advertised chain segments so that on resume we don't need to re-evaluate all our peers.
- **Backfill**: Not affected by ee states, we don't pause.

#### Resume Sync
- **Block lookups**: Enabled again.
- **Parent lookups**: Enabled again.
- **Range**: Active resume. Since the only real pause range does is not sending batches for processing, resume makes all chains that are holding read-for-processing batches send them.
- **Backfill**: Not affected by ee states, no need to resume.

## Additional Info

**QUESTION**: Originally I made this to notify and change on synced state, but @pawanjay176 on talks with @paulhauner concluded we only need to check online/offline states. The upcheck function mentions extra checks to have a very up to date sync status to aid the networking stack. However, the only need the networking stack would have is this one. I added a TODO to review if the extra check can be removed

Next gen of #3094

Will work best with #3439 

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2022-08-24 23:34:56 +00:00
Michael Sproul
aab4a8d2f2 Update docs for mainnet merge release (#3494)
## Proposed Changes

Update the merge migration docs to encourage updating mainnet configs _now_!

The docs are also updated to recommend _against_ `--suggested-fee-recipient` on the beacon node (https://github.com/sigp/lighthouse/issues/3432).

Additionally the `--help` for the CLI is updated to match with a few small semantic changes:

- `--execution-jwt` is no longer allowed without `--execution-endpoint`. We've ended up without a default for `--execution-endpoint`, so I think that's fine.
- The flags related to the JWT are only allowed if `--execution-jwt` is provided.
2022-08-23 03:50:58 +00:00
Paul Hauner
18c61a5e8b v3.0.0 (#3464)
## Issue Addressed

NA

## Proposed Changes

Bump versions to v3.0.0

## Additional Info

- ~~Blocked on #3439~~
- ~~Blocked on #3459~~
- ~~Blocked on #3463~~
- ~~Blocked on #3462~~
- ~~Requires further testing~~


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2022-08-22 03:43:08 +00:00
Paul Hauner
931153885c Run per-slot fork choice at a further distance from the head (#3487)
## Issue Addressed

NA

## Proposed Changes

Run fork choice when the head is 256 slots from the wall-clock slot, rather than 4.

The reason we don't *always* run FC is so that it doesn't slow us down during sync. As the comments state, setting the value to 256 means that we'd only have one interrupting fork-choice call if we were syncing at 20 slots/sec.

## Additional Info

NA
2022-08-19 04:27:24 +00:00
Paul Hauner
df358b864d Add metrics for EE PayloadStatus returns (#3486)
## Issue Addressed

NA

## Proposed Changes

Adds some metrics so we can track payload status responses from the EE. I think this will be useful for troubleshooting and alerting.

I also bumped the `BecaonChain::per_slot_task` to `debug` since it doesn't seem too noisy and would have helped us with some things we were debugging in the past.

## Additional Info

NA
2022-08-19 04:27:23 +00:00
Paul Hauner
043fa2153e Revise EE peer penalites (#3485)
## Issue Addressed

NA

## Proposed Changes

Don't penalize peers for errors that might be caused by an honest optimistic node.

## Additional Info

NA
2022-08-19 04:27:22 +00:00
Michael Sproul
8255c8682e Align engine API timeouts with spec (#3470)
## Proposed Changes

Match the timeouts from the `execution-apis` spec. Our existing values were already quite close so I don't imagine this change to be very disruptive.

The spec sets the timeout for `engine_getPayloadV1` to only 1 second, but we were already using a longer value of 2 seconds. I've kept the 2 second timeout as I don't think there's any need to fail faster when producing a payload.

There's no timeout specified for `eth_syncing` so I've matched it to the shortest timeout from the spec (1 second). I think the previous value of 250ms was likely too low and could have been contributing to spurious timeouts, particularly for remote ELs.

## Additional Info

The timeouts are defined on each endpoint in this document: https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md
2022-08-17 02:36:39 +00:00
Michael Sproul
e5fc9f26bc Log if no execution endpoint is configured (#3467)
## Issue Addressed

Fixes an issue whereby syncing a post-merge network without an execution endpoint would silently stall. Sync swallows the errors from block verification so previously there was no indication in the logs for why the node couldn't sync.

## Proposed Changes

Add an error log to the merge-readiness notifier for the case where the merge has already completed but no execution endpoint is configured.
2022-08-15 01:31:02 +00:00
Michael Sproul
25e3dc9300 Fix block verification and checkpoint sync caches (#3466)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/2962

## Proposed Changes

Build all caches on the checkpoint state before storing it in the database.

Additionally, fix a bug in `signature_verify_chain_segment` which prevented block verification from succeeding unless the previous epoch cache was already built. The previous epoch cache is required to verify the signatures of attestations included from previous epochs, even when all the blocks in the segment are from the same epoch.

The comments around `signature_verify_chain_segment` have also been updated to reflect the fact that it should only be used on a chain of blocks from a single epoch. I believe this restriction had already been added at some point in the past and that the current comments were just outdated (and I think because the proposer shuffling can change in the next epoch based on the blocks applied in the current epoch that this limitation is essential).
2022-08-15 01:31:00 +00:00
Paul Hauner
f03f9ba680 Increase merge-readiness lookhead (#3463)
## Issue Addressed

NA

## Proposed Changes

Start issuing merge-readiness logs 2 weeks before the Bellatrix fork epoch. Additionally, if the Bellatrix epoch is specified and the use has configured an EL, always log merge readiness logs, this should benefit pro-active users.

### Lookahead Reasoning

- Bellatrix fork is:
    - epoch 144896
    - slot 4636672
    - Unix timestamp: `1606824023 + (4636672 * 12) = 1662464087`
    - GMT: Tue Sep 06 2022 11:34:47 GMT+0000
- Warning start time is:
    - Unix timestamp: `1662464087 - 604800 * 2 = 1661254487`
    - GMT: Tue Aug 23 2022 11:34:47 GMT+0000

The [current expectation](https://discord.com/channels/595666850260713488/745077610685661265/1007445305198911569) is that EL and CL clients will releases out by Aug 22nd at the latest, then an EF announcement will go out on the 23rd. If all goes well, LH will start alerting users about merge-readiness just after the announcement.

## Additional Info

NA
2022-08-15 01:30:59 +00:00
Michael Sproul
92d597ad23 Modularise slasher backend (#3443)
## Proposed Changes

Enable multiple database backends for the slasher, either MDBX (default) or LMDB. The backend can be selected using `--slasher-backend={lmdb,mdbx}`.

## Additional Info

In order to abstract over the two library's different handling of database lifetimes I've used `Box::leak` to give the `Environment` type a `'static` lifetime. This was the only way I could think of using 100% safe code to construct a self-referential struct `SlasherDB`, where the `OpenDatabases` refers to the `Environment`. I think this is OK, as the `Environment` is expected to live for the life of the program, and both database engines leave the database in a consistent state after each write. The memory claimed for memory-mapping will be freed by the OS and appropriately flushed regardless of whether the `Environment` is actually dropped.

We are depending on two `sigp` forks of `libmdbx-rs` and `lmdb-rs`, to give us greater control over MDBX OS support and LMDB's version.
2022-08-15 01:30:56 +00:00
Pawan Dhananjay
71fd0b42f2 Fix lints for Rust 1.63 (#3459)
## Issue Addressed

N/A

## Proposed Changes

Fix clippy lints for latest rust version 1.63. I have allowed the [derive_partial_eq_without_eq](https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq) lint as satisfying this lint would result in more code that we might not want and I feel it's not required. 

Happy to fix this lint across lighthouse if required though.
2022-08-12 00:56:39 +00:00
Divma
f4ffa9e0b4 Handle processing results of non faulty batches (#3439)
## Issue Addressed
Solves #3390 

So after checking some logs @pawanjay176 got, we conclude that this happened because we blacklisted a chain after trying it "too much". Now here, in all occurrences it seems that "too much" means we got too many download failures. This happened very slowly, exactly because the batch is allowed to stay alive for very long times after not counting penalties when the ee is offline. The error here then was not that the batch failed because of offline ee errors, but that we blacklisted a chain because of download errors, which we can't pin on the chain but on the peer. This PR fixes that.

## Proposed Changes

Adds a missing piece of logic so that if a chain fails for errors that can't be attributed to an objectively bad behavior from the peer, it is not blacklisted. The issue at hand occurred when new peers arrived claiming a head that had wrongfully blacklisted, even if the original peers participating in the chain were not penalized.

Another notable change is that we need to consider a batch invalid if it processed correctly but its next non empty batch fails processing. Now since a batch can fail processing in non empty ways, there is no need to mark as invalid previous batches.

Improves some logging as well.

## Additional Info

We should do this regardless of pausing sync on ee offline/unsynced state. This is because I think it's almost impossible to ensure a processing result will reach in a predictable order with a synced notification from the ee. Doing this handles what I think are inevitable data races when we actually pause sync

This also fixes a return that reports which batch failed and caused us some confusion checking the logs
2022-08-12 00:56:38 +00:00
Paul Hauner
4fc0cb121c Remove some "wontfix" TODOs for the merge (#3449)
## Issue Addressed

NA

## Proposed Changes

Removes three types of TODOs:

1. `execution_layer/src/lib.rs`: It was [determined](https://github.com/ethereum/consensus-specs/issues/2636#issuecomment-988688742) that there is no action required here.
2. `beacon_processor/worker/gossip_methods.rs`: Removed TODOs relating to peer scoring that have already been addressed via `epe.penalize_peer()`.
    - It seems `cargo fmt` wanted to adjust some things here as well 🤷 
3. `proto_array_fork_choice.rs`: it would be nice to remove that useless `bool` for cleanliness, but I don't think it's something we need to do and the TODO just makes things look messier IMO.


## Additional Info

There should be no functional changes to the code in this PR.

There are still some TODOs lingering, those ones require actual changes or more thought.
2022-08-10 13:06:46 +00:00
Michael Sproul
4e05f19fb5 Serve Bellatrix preset in BN API (#3425)
## Issue Addressed

Resolves #3388
Resolves #2638

## Proposed Changes

- Return the `BellatrixPreset` on `/eth/v1/config/spec` by default.
- Allow users to opt out of this by providing `--http-spec-fork=altair` (unless there's a Bellatrix fork epoch set).
- Add the Altair constants from #2638 and make serving the constants non-optional (the `http-disable-legacy-spec` flag is deprecated).
- Modify the VC to only read the `Config` and not to log extra fields. This prevents it from having to muck around parsing the `ConfigAndPreset` fields it doesn't need.

## Additional Info

This change is backwards-compatible for the VC and the BN, but is marked as a breaking change for the removal of `--http-disable-legacy-spec`.

I tried making `Config` a `superstruct` too, but getting the automatic decoding to work was a huge pain and was going to require a lot of hacks, so I gave up in favour of keeping the default-based approach we have now.
2022-08-10 07:52:59 +00:00
Pawan Dhananjay
c25934956b Remove INVALID_TERMINAL_BLOCK (#3385)
## Issue Addressed

Resolves #3379 

## Proposed Changes

Remove instances of `InvalidTerminalBlock` in lighthouse and use 
`Invalid {latest_valid_hash: "0x0000000000000000000000000000000000000000000000000000000000000000"}` 
to represent that status.
2022-08-10 07:52:58 +00:00
Paul Hauner
2de26b20f8 Don't return errors on HTTP API for already-known messages (#3341)
## Issue Addressed

- Resolves #3266

## Proposed Changes

Return 200 OK rather than an error when a block, attestation or sync message is already known.

Presently, we will log return an error which causes a BN to go "offline" from the VCs perspective which causes the fallback mechanism to do work to try and avoid and upcheck offline nodes. This can be observed as instability in the `vc_beacon_nodes_available_count` metric.

The current behaviour also causes scary logs for the user. There's nothing to *actually* be concerned about when we see duplicate messages, this can happen on fallback systems (see code comments).

## Additional Info

NA
2022-08-10 07:52:57 +00:00
realbigsean
6f13727fbe Don't use the builder network if the head is optimistic (#3412)
## Issue Addressed

Resolves https://github.com/sigp/lighthouse/issues/3394

Adds a check in `is_healthy` about whether the head is optimistic when choosing whether to use the builder network. 



Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-08-09 06:05:16 +00:00
Paul Hauner
a688621919 Add support for beaconAPI in lcli functions (#3252)
## Issue Addressed

NA

## Proposed Changes

Modifies `lcli skip-slots` and `lcli transition-blocks` allow them to source blocks/states from a beaconAPI and also gives them some more features to assist with benchmarking.

## Additional Info

Breaks the current `lcli skip-slots` and `lcli transition-blocks` APIs by changing some flag names. It should be simple enough to figure out the changes via `--help`.

Currently blocked on #3263.
2022-08-09 06:05:13 +00:00
Michael Sproul
6bc4a2cc91 Update invalid head tests (#3400)
## Proposed Changes

Update the invalid head tests so that they work with the current default fork choice configuration.

Thanks @realbigsean for fixing the persistence test and the EF tests.

Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-08-05 23:41:09 +00:00
Mac L
5d317779bb Ensure validator/blinded_blocks/{slot} endpoint conforms to spec (#3429)
## Issue Addressed

#3418

## Proposed Changes

- Remove `eth/v2/validator/blinded_blocks/{slot}` as this endpoint does not exist in the spec.
- Return `version` in the `eth/v1/validator/blinded_blocks/{slot}` endpoint.

## Additional Info

Since this removes the `v2` endpoint, this is *technically* a breaking change, but as this does not exist in the spec users may or may not be relying on this.

Depending on what we feel is appropriate, I'm happy to edit this so we keep the `v2` endpoint for now but simply bring the `v1` endpoint in line with `v2`.
2022-08-05 06:46:58 +00:00
realbigsean
43ce0de73f Downgrade log for 204 from builder (#3411)
## Issue Addressed

A 204 from the connected builder just indicates there's no payload available from the builder, not that there's an issue. So I don't actually think this should be a warn. During the merge transition when we are pre-finalization a 204 will actually be expected. And maybe even longer if the relay chooses to delay providing payloads for a longer period post-merge.

Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-08-03 17:13:15 +00:00
Michael Sproul
df51a73272 Release v2.5.1 (#3406)
## Issue Addressed

Patch release to address fork choice issues in the presence of clock drift: https://github.com/sigp/lighthouse/pull/3402
2022-08-03 04:23:09 +00:00
Mac L
e24552d61a Restore backwards compatibility when using older BNs (#3410)
## Issue Addressed

https://github.com/status-im/nimbus-eth2/issues/3930

## Proposed Changes

We can trivially support beacon nodes which do not provide the `is_optimistic` field by wrapping the field in an `Option`.
2022-08-02 23:20:51 +00:00
Paul Hauner
d0beecca20 Make fork choice prune again (#3408)
## Issue Addressed

NA

## Proposed Changes

There was a regression in #3244 (released in v2.4.0) which stopped pruning fork choice (see [here](https://github.com/sigp/lighthouse/pull/3244#discussion_r935187485)).

This would form a very slow memory leak, using ~100mb per month. The release has been out for ~11 days, so users should not be seeing a dangerous increase in memory, *yet*.

Credits to @michaelsproul for noticing this 🎉 

## Additional Info

NA
2022-08-02 07:58:42 +00:00
Justin Traglia
807bc8b0b3 Fix a few typos in option help strings (#3401)
## Proposed Changes

Fixes a typo I noticed while looking at options.
2022-08-02 00:58:24 +00:00
Michael Sproul
18383a63b2 Tidy eth1/deposit contract logging (#3397)
## Issue Addressed

Fixes an issue identified by @remyroy whereby we were logging a recommendation to use `--eth1-endpoints` on merge-ready setups (when the execution layer was out of sync).

## Proposed Changes

I took the opportunity to clean up the other eth1-related logs, replacing "eth1" by "deposit contract" or "execution" as appropriate.

I've downgraded the severity of the `CRIT` log to `ERRO` and removed most of the recommendation text. The reason being that users lacking an execution endpoint will be informed by the new `WARN Not merge ready` log pre-Bellatrix, or the regular errors from block verification post-Bellatrix.
2022-08-01 07:20:43 +00:00
Paul Hauner
2983235650 v2.5.0 (#3392)
## Issue Addressed

NA

## Proposed Changes

Bump versions.

## Additional Info

- ~~Blocked on #3383~~
- ~~Awaiting further testing.~~
2022-08-01 03:41:08 +00:00
Paul Hauner
bcfde6e7df Indicate that invalid blocks are optimistic (#3383)
## Issue Addressed

NA

## Proposed Changes

This PR will make Lighthouse return blocks with invalid payloads via the API with `execution_optimistic = true`. This seems a bit awkward, however I think it's better than returning a 404 or some other error.

Let's consider the case where the only possible head is invalid (#3370 deals with this). In such a scenario all of the duties endpoints will start failing because the head is invalid. I think it would be better if the duties endpoints continue to work, because it's likely that even though the head is invalid the duties are still based upon valid blocks and we want the VC to have them cached. There's no risk to the VC here because we won't actually produce an attestation pointing to an invalid head.

Ultimately, I don't think it's particularly important for us to distinguish between optimistic and invalid blocks on the API. Neither should be trusted and the only *real* reason that we track this is so we can try and fork around the invalid blocks.


## Additional Info

- ~~Blocked on #3370~~
2022-07-30 05:08:57 +00:00
Michael Sproul
fdfdb9b57c Enable count-unrealized by default (#3389)
## Issue Addressed

Enable https://github.com/sigp/lighthouse/pull/3322 by default on all networks.

The feature can be opted out of using `--count-unrealized=false` (the CLI flag is updated to take a parameter).
2022-07-30 00:22:41 +00:00
Pawan Dhananjay
b3ce8d0de9 Fix penalties in sync methods (#3384)
## Issue Addressed

N/A

## Proposed Changes

Uses the `penalize_peer` function added in #3350 in sync methods as well. The existing code in sync methods missed the `ExecutionPayloadError::UnverifiedNonOptimisticCandidate` case.
2022-07-30 00:22:39 +00:00
ethDreamer
034260bd99 Initial Commit of Retrospective OTB Verification (#3372)
## Issue Addressed

* #2983 

## Proposed Changes

Basically followed the [instructions laid out here](https://github.com/sigp/lighthouse/issues/2983#issuecomment-1062494947)


Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
2022-07-30 00:22:38 +00:00
realbigsean
6c2d8b2262 Builder Specs v0.2.0 (#3134)
## Issue Addressed

https://github.com/sigp/lighthouse/issues/3091

Extends https://github.com/sigp/lighthouse/pull/3062, adding pre-bellatrix block support on blinded endpoints and allowing the normal proposal flow (local payload construction) on blinded endpoints. This resulted in better fallback logic because the VC will not have to switch endpoints on failure in the BN <> Builder API, the BN can just fallback immediately and without repeating block processing that it shouldn't need to. We can also keep VC fallback from the VC<>BN API's blinded endpoint to full endpoint.

## Proposed Changes

- Pre-bellatrix blocks on blinded endpoints
- Add a new `PayloadCache` to the execution layer
- Better fallback-from-builder logic

## Todos

- [x] Remove VC transition logic
- [x] Add logic to only enable builder flow after Merge transition finalization
- [x] Tests
- [x] Fix metrics
- [x] Rustdocs


Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-07-30 00:22:37 +00:00
Paul Hauner
25f0e261cb Don't return errors when fork choice fails (#3370)
## Issue Addressed

NA

## Proposed Changes

There are scenarios where the only viable head will have an invalid execution payload, in this scenario the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network.

This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload.

## Reviewer Notes

The following changes are included:

1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process.
1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run.
    - This should have practically no effect, since most callers were still continuing if recomputing the head failed.
    - The outlier is that the API will return 200 rather than a 500 when fork choice fails.
1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.
2022-07-28 13:57:09 +00:00
Michael Sproul
d04fde3ba9 Remove equivocating validators from fork choice (#3371)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/3241
Closes https://github.com/sigp/lighthouse/issues/3242

## Proposed Changes

* [x] Implement logic to remove equivocating validators from fork choice per https://github.com/ethereum/consensus-specs/pull/2845
* [x] Update tests to v1.2.0-rc.1. The new test which exercises `equivocating_indices` is passing.
* [x] Pull in some SSZ abstractions from the `tree-states` branch that make implementing Vec-compatible encoding for types like `BTreeSet` and `BTreeMap`.
* [x] Implement schema upgrades and downgrades for the database (new schema version is V11).
* [x] Apply attester slashings from blocks to fork choice

## Additional Info

* This PR doesn't need the `BTreeMap` impl, but `tree-states` does, and I don't think there's any harm in keeping it. But I could also be convinced to drop it.

Blocked on #3322.
2022-07-28 09:43:41 +00:00
Pawan Dhananjay
f3439116da Return ResourceUnavailable if we are unable to reconstruct execution payloads (#3365)
## Issue Addressed

Resolves #3351 

## Proposed Changes

Returns a `ResourceUnavailable` rpc error if we are unable to serve full payloads to blocks by root and range requests because the execution layer is not synced.


## Additional Info

This PR also changes the penalties such that a `ResourceUnavailable` error is only penalized if it is an outgoing request. If we are syncing and aren't getting full block responses, then we don't have use for the peer. However, this might not be true for the incoming request case. We let the peer decide in this case if we are still useful or if we should be banned.
cc @divagant-martian please let me know if i'm missing something here.
2022-07-27 03:20:00 +00:00
Justin Traglia
0f62d900fe Fix some typos (#3376)
## Proposed Changes

This PR fixes various minor typos in the project.
2022-07-27 00:51:06 +00:00
Mac L
44fae52cd7 Refuse to sign sync committee messages when head is optimistic (#3191)
## Issue Addressed

Resolves #3151 

## Proposed Changes

When fetching duties for sync committee contributions, check the value of `execution_optimistic` of the head block from the BN and refuse to sign any sync committee messages `if execution_optimistic == true`.

## Additional Info
- Is backwards compatible with older BNs
- Finding a way to add test coverage for this would be prudent. Open to suggestions.
2022-07-27 00:51:05 +00:00
Mac L
d316305411 Add is_optimistic to eth/v1/node/syncing response (#3374)
## Issue Addressed

As specified in the [Beacon Chain API specs](https://github.com/ethereum/beacon-APIs/blob/master/apis/node/syncing.yaml#L32-L35) we should return `is_optimistic` as part of the response to a query for the `eth/v1/node/syncing` endpoint.

## Proposed Changes

Compute the optimistic status of the head and add it to the `SyncingData` response.
2022-07-26 08:50:16 +00:00
realbigsean
904dd62524 Strict fee recipient (#3363)
## Issue Addressed

Resolves #3267
Resolves #3156 

## Proposed Changes

- Move the log for fee recipient checks from proposer cache insertion into block proposal so we are directly checking what we get from the EE
- Only log when there is a discrepancy with the local EE, not when using the builder API. In the `builder-api` branch there is an `info` log when there is a discrepancy, I think it is more likely there will be a difference in fee recipient with the builder api because proposer payments might be made via a transaction in the block. Not really sure what patterns will become commong.
- Upgrade the log from a `warn` to an `error` - not actually sure which we want, but I think this is worth an error because the local EE with default transaction ordering I think should pretty much always use the provided fee recipient
- add a `strict-fee-recipient` flag to the VC so we only sign blocks with matching fee recipients. Falls back from the builder API to the local API if there is a discrepancy .




Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-07-26 02:17:24 +00:00
ethDreamer
f7354abe0f Fix Block Cache Range Math for Faster Syncing (#3358)
## Issue Addressed

While messing with the deposit snapshot stuff, I had my proxy running and noticed the beacon node wasn't syncing the block cache continuously. There were long periods where it did nothing. I believe this was caused by a logical error introduced in #3234 that dealt with an issue that arose while syncing the block cache on Ropsten.

The problem is that when the block cache is initially syncing, it will trigger the logic that detects the cache is far behind the execution chain in time. This will trigger a batch syncing mechanism which is intended to sync further ahead than the chain would normally. But the batch syncing is actually slower than the range this function usually estimates (in this scenario).

## Proposed Changes

I believe I've fixed this function by taking the end of the range to be the maximum of (batch syncing range, usual range).
I've also renamed and restructured some things a bit. It's equivalent logic but I think it's more clear what's going on.
2022-07-26 02:17:21 +00:00
realbigsean
20ebf1f3c1 Realized unrealized experimentation (#3322)
## Issue Addressed

Add a flag that optionally enables unrealized vote tracking.  Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable to new vote tracking style.


Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-25 23:53:26 +00:00
Mac L
bb5a6d2cca Add execution_optimistic flag to HTTP responses (#3070)
## Issue Addressed

#3031 

## Proposed Changes

Updates the following API endpoints to conform with https://github.com/ethereum/beacon-APIs/pull/190 and https://github.com/ethereum/beacon-APIs/pull/196
- [x] `beacon/states/{state_id}/root` 
- [x] `beacon/states/{state_id}/fork`
- [x] `beacon/states/{state_id}/finality_checkpoints`
- [x] `beacon/states/{state_id}/validators`
- [x] `beacon/states/{state_id}/validators/{validator_id}`
- [x] `beacon/states/{state_id}/validator_balances`
- [x] `beacon/states/{state_id}/committees`
- [x] `beacon/states/{state_id}/sync_committees`
- [x] `beacon/headers`
- [x] `beacon/headers/{block_id}`
- [x] `beacon/blocks/{block_id}`
- [x] `beacon/blocks/{block_id}/root`
- [x] `beacon/blocks/{block_id}/attestations`
- [x] `debug/beacon/states/{state_id}`
- [x] `debug/beacon/heads`
- [x] `validator/duties/attester/{epoch}`
- [x] `validator/duties/proposer/{epoch}`
- [x] `validator/duties/sync/{epoch}`

Updates the following Server-Sent Events:
- [x]  `events?topics=head`
- [x]  `events?topics=block`
- [x]  `events?topics=finalized_checkpoint`
- [x]  `events?topics=chain_reorg`

## Backwards Incompatible
There is a very minor breaking change with the way the API now handles requests to `beacon/blocks/{block_id}/root` and `beacon/states/{state_id}/root` when `block_id` or `state_id` is the `Root` variant of `BlockId` and `StateId` respectively.

Previously a request to a non-existent root would simply echo the root back to the requester:
```
curl "http://localhost:5052/eth/v1/beacon/states/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root"
{"data":{"root":"0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}}
```
Now it will return a `404`:
```
curl "http://localhost:5052/eth/v1/beacon/blocks/0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/root"
{"code":404,"message":"NOT_FOUND: beacon block with root 0xaaaa…aaaa","stacktraces":[]}
```

In addition to this is the block root `0x0000000000000000000000000000000000000000000000000000000000000000` previously would return the genesis block. It will now return a `404`:
```
curl "http://localhost:5052/eth/v1/beacon/blocks/0x0000000000000000000000000000000000000000000000000000000000000000"
{"code":404,"message":"NOT_FOUND: beacon block with root 0x0000…0000","stacktraces":[]}
```

## Additional Info
- `execution_optimistic` is always set, and will return `false` pre-Bellatrix. I am also open to the idea of doing something like `#[serde(skip_serializing_if = "Option::is_none")]`.
- The value of `execution_optimistic` is set to `false` where possible. Any computation that is reliant on the `head` will simply use the `ExecutionStatus` of the head (unless the head block is pre-Bellatrix).

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-07-25 08:23:00 +00:00
Paul Hauner
21dec6f603 v2.4.0 (#3360)
## Issue Addressed

NA

## Proposed Changes

Bump versions to v2.4.0

## Additional Info

Blocked on:

- ~~#3349~~
- ~~#3347~~
2022-07-21 22:02:36 +00:00
Pawan Dhananjay
612cdb7092 Merge readiness endpoint (#3349)
## Issue Addressed

Resolves final task in https://github.com/sigp/lighthouse/issues/3260

## Proposed Changes

Adds a lighthouse http endpoint to indicate merge readiness.

Blocked on #3339
2022-07-21 05:45:39 +00:00
Michael Sproul
e32868458f Set safe block hash to justified (#3347)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/3189.

## Proposed Changes

- Always supply the justified block hash as the `safe_block_hash` when calling `forkchoiceUpdated` on the execution engine.
- Refactor the `get_payload` routine to use the new `ForkchoiceUpdateParameters` struct rather than just the `finalized_block_hash`. I think this is a nice simplification and that the old way of computing the `finalized_block_hash` was unnecessary, but if anyone sees reason to keep that approach LMK.
2022-07-21 05:45:37 +00:00
Pawan Dhananjay
5b5cf9cfaa Log ttd (#3339)
## Issue Addressed

Resolves #3249 

## Proposed Changes

Log merge related parameters and EE status in the beacon notifier before the merge.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-07-20 23:16:54 +00:00
ethDreamer
7c3ff903ca Fix Gossip Penalties During Optimistic Sync Window (#3350)
## Issue Addressed
* #3344 

## Proposed Changes

There are a number of cases during block processing where we might get an `ExecutionPayloadError` but we shouldn't penalize peers. We were forgetting to enumerate all of the non-penalizing errors in every single match statement where we are making that decision. I created a function to make it explicit when we should and should not penalize peers and I used that function in all places where this logic is needed. This way we won't make the same mistake if we add another variant of `ExecutionPayloadError` in the future.
2022-07-20 20:59:38 +00:00
Pawan Dhananjay
e5e4e62758 Don't create a execution payload with same timestamp as terminal block (#3331)
## Issue Addressed

Resolves #3316 

## Proposed Changes

This PR fixes an issue where lighthouse created a transition block with `block.execution_payload().timestamp == terminal_block.timestamp` if the terminal block was created at the slot boundary.
2022-07-18 23:15:41 +00:00
Pawan Dhananjay
f9b9658711 Add merge support to simulator (#3292)
## Issue Addressed

N/A

## Proposed Changes

Make simulator merge compatible. Adds a `--post_merge` flag to the eth1 simulator that enables a ttd and simulates the merge transition. Uses the `MockServer` in the execution layer test utils to simulate a dummy execution node.

Adds the merge transition simulation to CI.
2022-07-18 23:15:40 +00:00
Age Manning
2ed51c364d Improve block-lookup functionality (#3287)
Improves some of the functionality around single and parent block lookup. 

Gives extra information about whether failures for lookups are related to processing or downloading.

This is entirely untested.


Co-authored-by: Diva M <divma@protonmail.com>
2022-07-17 23:26:58 +00:00
Pawan Dhananjay
5243cc6c30 Add a u256_hex_be module to encode/decode U256 types (#3321)
## Issue Addressed

Resolves #3314 

## Proposed Changes

Add a module to encode/decode u256 types according to the execution layer encoding/decoding standards
https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#structures

Updates `JsonExecutionPayloadV1.base_fee_per_gas`, `JsonExecutionPayloadHeaderV1.base_fee_per_gas`  and `TransitionConfigurationV1.terminal_total_difficulty` to encode/decode according to standards

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2022-07-15 07:31:21 +00:00
Pawan Dhananjay
28b0ff27ff Ignored sync jobs 2 (#3317)
## Issue Addressed

Duplicate of #3269. Making this since @divagant-martian opened the previous PR and she can't approve her own PR 😄 


Co-authored-by: Diva M <divma@protonmail.com>
2022-07-15 07:31:20 +00:00
Akihito Nakano
98a9626ef5 Bump the MSRV to 1.62 and using #[derive(Default)] on enums (#3304)
## Issue Addressed

N/A

## Proposed Changes

Since Rust 1.62, we can use `#[derive(Default)]` on enums.  

https://blog.rust-lang.org/2022/06/30/Rust-1.62.0.html#default-enum-variants

There are no changes to functionality in this PR, just replaced the `Default` trait implementation with `#[derive(Default)]`.
2022-07-15 07:31:19 +00:00
Paul Hauner
1f54e10b7b Do not interpret "latest valid hash" as identifying a valid hash (#3327)
## Issue Addressed

NA

## Proposed Changes

After some discussion in Discord with @mkalinin it was raised that it was not the intention of the engine API to have CLs validate the `latest_valid_hash` (LVH) and all ancestors.

Whilst I believe the engine API is being updated such that the LVH *must* identify a valid hash or be set to some junk value, I'm not confident that we can rely upon the LVH as being valid (at least for now) due to the confusion surrounding it.

Being able to validate blocks via the LVH is a relatively minor optimisation; if the LVH value ends up becoming our head we'll send an fcU and get the VALID status there.

Falsely marking a block as valid has serious consequences and since it's a minor optimisation to use LVH I think that we don't take the risk.

For clarity, we will still *invalidate* the *descendants* of the LVH, we just wont *validate* the *ancestors*.

## Additional Info

NA
2022-07-13 23:07:49 +00:00
Paul Hauner
7a6e6928a3 Further remove EE redundancy (#3324)
## Issue Addressed

Resolves #3176

## Proposed Changes

Continues from PRs by @divagant-martian to gradually remove EL redundancy (see #3284, #3257).

This PR achieves:

- Removes the `broadcast` and `first_success` methods. The functional impact is that every request to the EE will always be tried immediately, regardless of the cached `EngineState` (this resolves #3176). Previously we would check the engine state before issuing requests, this doesn't make sense in a single-EE world; there's only one EE so we might as well try it for every request.
- Runs the upcheck/watchdog routine once per slot rather than thrice. When we had multiple EEs frequent polling was useful to try and detect when the primary EE had come back online and we could switch to it. That's not as relevant now.
- Always creates logs in the `Engines::upcheck` function. Previously we would mute some logs since they could get really noisy when one EE was down but others were functioning fine. Now we only have one EE and are upcheck-ing it less, it makes sense to always produce logs.

This PR purposefully does not achieve:

- Updating all occurances of "engines" to "engine". I'm trying to keep the diff small and manageable. We can come back for this.

## Additional Info

NA
2022-07-13 20:31:39 +00:00
Divma
6d42a09ff8 Merge Engines and Engine struct in one in the execution_layer crate (#3284)
## Issue Addressed

Part of https://github.com/sigp/lighthouse/issues/3118, continuation of https://github.com/sigp/lighthouse/pull/3257 and https://github.com/sigp/lighthouse/pull/3283

## Proposed Changes
- Merge the [`Engines`](9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)) struct and [`Engine` ](9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67))
- Remove unnecessary generics 

## Additional Info
There is more cleanup to do that will come in subsequent PRs
2022-07-11 01:44:41 +00:00
Michael Sproul
748475be1d Ensure caches are built for block_rewards POST API (#3305)
## Issue Addressed

Follow up to https://github.com/sigp/lighthouse/pull/3290 that fixes a caching bug

## Proposed Changes

Build the committee cache for the new `POST /lighthouse/analysis/block_rewards` API. Due to an unusual quirk of the total active balance cache the API endpoint would sometimes fail after loading a state from disk which had a current epoch cache _but not_  a total active balance cache. This PR adds calls to build the caches immediately before they're required, and has been running smoothly with `blockdreamer` the last few days.
2022-07-04 02:56:15 +00:00
Akihito Nakano
1cc8a97d4e Remove unused method in HandlerNetworkContext (#3299)
## Issue Addressed

N/A

## Proposed Changes

Removed unused method in `HandlerNetworkContext`.
2022-07-04 02:56:14 +00:00
Divma
1219da9a45 Simplify error handling after engines fallback removal (#3283)
## Issue Addressed
Part of #3118, continuation of #3257

## Proposed Changes
- the [ `first_success_without_retry` ](9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)) function returns a single error.
- the [`first_success`](9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)) function returns a single error.
- [ `EngineErrors` ](9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)) carries a single error.
- [`EngineError`](9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)) now does not need to carry an Id
- [`process_multiple_payload_statuses`](9c429d0764/beacon_node/execution_layer/src/payload_status.rs (L46-L50)) now doesn't need to receive an iterator of statuses and weight in different errors

## Additional Info
This is built on top of #3294
2022-07-04 02:56:13 +00:00
Michael Sproul
61ed5f0ec6 Optimize historic committee calculation for the HTTP API (#3272)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/3270

## Proposed Changes

Optimize the calculation of historic beacon committees in the HTTP API.

This is achieved by allowing committee caches to be constructed for historic epochs, and constructing these committee caches on the fly in the API. This is much faster than reconstructing the state at the requested epoch, which usually takes upwards of 20s, and sometimes minutes with SPRP=8192. The depth of the `randao_mixes` array allows us to look back 64K epochs/0.8 years from a single state, which is pretty awesome!

We always use the `state_id` provided by the caller, but will return a nice 400 error if the epoch requested is out of range for the state requested, e.g.

```bash
# Prater
curl "http://localhost:5052/eth/v1/beacon/states/3170304/committees?epoch=33538"
```

```json
{"code":400,"message":"BAD_REQUEST: epoch out of bounds, try state at slot 1081344","stacktraces":[]}
```

Queries will be fastest when aligned to `slot % SPRP == 0`, so the hint suggests a slot that is 0 mod 8192.
2022-07-04 02:56:11 +00:00
Paul Hauner
be4e261e74 Use async code when interacting with EL (#3244)
## Overview

This rather extensive PR achieves two primary goals:

1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.

Additionally, it achieves:

- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
    - I had to do this to deal with sending blocks into spawned tasks.
    - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
    - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
    - Avoids cloning *all the blocks* in *every chain segment* during sync.
    - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.

For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273

## Changes to `canonical_head` and `fork_choice`

Previously, the `BeaconChain` had two separate fields:

```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```

Now, we have grouped these values under a single struct:

```
canonical_head: CanonicalHead {
  cached_head: RwLock<Arc<Snapshot>>,
  fork_choice: RwLock<BeaconForkChoice>
} 
```

Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.

## Breaking Changes

### The `state` (root) field in the `finalized_checkpoint` SSE event

Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:

1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.

Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).

I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.

## Notes for Reviewers

I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.

I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".

I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.

I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.

Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.

You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.

I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.

Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
realbigsean
a7da0677d5 Remove builder redundancy (#3294)
## Issue Addressed

This PR is a subset of the changes in #3134. Unstable will still not function correctly with the new builder spec once this is merged, #3134 should be used on testnets

## Proposed Changes

- Removes redundancy in "builders" (servers implementing the builder spec)
- Renames `payload-builder` flag to `builder`
- Moves from old builder RPC API to new HTTP API, but does not implement the validator registration API (implemented in https://github.com/sigp/lighthouse/pull/3194)



Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-07-01 01:15:19 +00:00
Divma
d40c76e667 Fix clippy lints for rust 1.62 (#3300)
## Issue Addressed

Fixes some new clippy lints after the last rust release
### Lints fixed for the curious:
- [cast_abs_to_unsigned](https://rust-lang.github.io/rust-clippy/master/index.html#cast_abs_to_unsigned)
- [map_identity](https://rust-lang.github.io/rust-clippy/master/index.html#map_identity) 
- [let_unit_value](https://rust-lang.github.io/rust-clippy/master/index.html#let_unit_value)
- [crate_in_macro_def](https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def) 
- [extra_unused_lifetimes](https://rust-lang.github.io/rust-clippy/master/index.html#extra_unused_lifetimes)
- [format_push_string](https://rust-lang.github.io/rust-clippy/master/index.html#format_push_string)
2022-06-30 22:51:49 +00:00
realbigsean
f6ec44f0dd Register validator api (#3194)
## Issue Addressed

Lays the groundwork for builder API changes by implementing the beacon-API's new `register_validator` endpoint

## Proposed Changes

- Add a routine in the VC that runs on startup (re-try until success), once per epoch or whenever `suggested_fee_recipient` is updated, signing `ValidatorRegistrationData` and sending it to the BN.
  -  TODO: `gas_limit` config options https://github.com/ethereum/builder-specs/issues/17
-  BN only sends VC registration data to builders on demand, but VC registration data *does update* the BN's prepare proposer cache and send an updated fcU to  a local EE. This is necessary for fee recipient consistency between the blinded and full block flow in the event of fallback.  Having the BN only send registration data to builders on demand gives feedback directly to the VC about relay status. Also, since the BN has no ability to sign these messages anyways (so couldn't refresh them if it wanted), and validator registration is independent of the BN head, I think this approach makes sense. 
- Adds upcoming consensus spec changes for this PR https://github.com/ethereum/consensus-specs/pull/2884
  -  I initially applied the bit mask based on a configured application domain.. but I ended up just hard coding it here instead because that's how it's spec'd in the builder repo. 
  -  Should application mask appear in the api?



Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-06-30 00:49:21 +00:00
Pawan Dhananjay
5de00b7ee8 Unify execution layer endpoints (#3214)
## Issue Addressed

Resolves #3069 

## Proposed Changes

Unify the `eth1-endpoints` and `execution-endpoints` flags in a backwards compatible way as described in https://github.com/sigp/lighthouse/issues/3069#issuecomment-1134219221

Users have 2 options:
1. Use multiple non auth execution endpoints for deposit processing pre-merge
2. Use a single jwt authenticated execution endpoint for both execution layer and deposit processing post merge

Related https://github.com/sigp/lighthouse/issues/3118

To enable jwt authenticated deposit processing, this PR removes the calls to `net_version` as the `net` namespace is not exposed in the auth server in execution clients. 
Moving away from using `networkId` is a good step in my opinion as it doesn't provide us with any added guarantees over `chainId`. See https://github.com/ethereum/consensus-specs/issues/2163 and https://github.com/sigp/lighthouse/issues/2115


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-06-29 09:07:09 +00:00
Michael Sproul
53b2b500db Extend block reward APIs (#3290)
## Proposed Changes

Add a new HTTP endpoint `POST /lighthouse/analysis/block_rewards` which takes a vec of `BeaconBlock`s as input and outputs the `BlockReward`s for them.

Augment the `BlockReward` struct with the attestation data for attestations in the block, which simplifies access to this information from blockprint. Using attestation data I've been able to make blockprint up to 95% accurate across Prysm/Lighthouse/Teku/Nimbus. I hope to go even higher using a bunch of synthetic blocks produced for Prysm/Nimbus/Lodestar, which are underrepresented in the current training data.
2022-06-29 04:50:37 +00:00
Paul Hauner
45b2eb18bc v2.3.2-rc.0 (#3289)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2022-06-28 03:03:30 +00:00
Pawan Dhananjay
7acfbd89ee Recover from NonConsecutive eth1 errors (#3273)
## Issue Addressed

Fixes #1864 and a bunch of other closed but unresolved issues.

## Proposed Changes

Allows the deposit caching to recover from `NonConsecutive` deposit errors by resetting the last processed block to the last valid deposit's block number. Still not sure of the underlying cause of this error, but this should recover the cache so we don't need `--eth1-purge-cache` anymore 🎉 

A huge thanks to @one-three-three-seven for reproducing the error and providing the data that helped testing out the fix 🙌 

Still needs a few more tests.
2022-06-26 23:10:58 +00:00
Akihito Nakano
082ed35bdc Test the pruning of excess peers using randomly generated input (#3248)
## Issue Addressed

https://github.com/sigp/lighthouse/issues/3092


## Proposed Changes

Added property-based tests for the pruning implementation. A randomly generated input for the test contains connection direction, subnets, and scores.


## Additional Info

I left some comments on this PR, what I have tried, and [a question](https://github.com/sigp/lighthouse/pull/3248#discussion_r891981969).

Co-authored-by: Diva M <divma@protonmail.com>
2022-06-25 22:22:34 +00:00
Michael Sproul
d21f083777 Add more paths to HTTP API metrics (#3282)
## Proposed Changes

Expand the set of paths tracked by the HTTP API metrics to include all paths hit by the validator client.

These paths were only partially updated for Altair, so we were missing some of the sync committee and v2 APIs.
2022-06-23 05:19:21 +00:00
Paul Hauner
748658e32c Add some debug logs for checkpoint sync (#3281)
## Issue Addressed

NA

## Proposed Changes

I used these logs when debugging a spurious failure with Infura and thought they might be nice to have around permanently.

There's no changes to functionality in this PR, just some additional `debug!` logs.

## Additional Info

NA
2022-06-23 05:19:20 +00:00
Divma
7af5742081 Deprecate step param in BlocksByRange RPC request (#3275)
## Issue Addressed

Deprecates the step parameter in the blocks by range request

## Proposed Changes

- Modifies the BlocksByRangeRequest type to remove the step parameter and everywhere we took it into account before
- Adds a new type to still handle coding and decoding of requests that use the parameter

## Additional Info
I went with a deprecation over the type itself so that requests received outside `lighthouse_network` don't even need to deal with this parameter. After the deprecation period just removing the Old blocks by range request should be straightforward
2022-06-22 16:23:34 +00:00
Divma
2063c0fa0d Initial work to remove engines fallback from the execution_layer crate (#3257)
## Issue Addressed

Part of #3160 

## Proposed Changes
Use only the first url given in the execution engine, if more than one is provided log it.
This change only moves having multiple engines to one. The amount of code cleanup that can and should be done forward is not small and would interfere with ongoing PRs. I'm keeping the changes intentionally very very minimal.

## Additional Info

Future works:
- In [ `EngineError` ](9c429d0764/beacon_node/execution_layer/src/engines.rs (L173-L177)) the id is not needed since it now has no meaning.
- the [ `first_success_without_retry` ](9c429d0764/beacon_node/execution_layer/src/engines.rs (L348-L351)) function can return a single error.
- the [`first_success`](9c429d0764/beacon_node/execution_layer/src/engines.rs (L324)) function can return a single error.
- After the redundancy is removed for the builders, we can probably make the [ `EngineErrors` ](9c429d0764/beacon_node/execution_layer/src/lib.rs (L69)) carry a single error.
- Merge the [`Engines`](9c429d0764/beacon_node/execution_layer/src/engines.rs (L161-L165)) struct and [`Engine` ](9c429d0764/beacon_node/execution_layer/src/engines.rs (L62-L67))
- Fix the associated configurations and cli params. Not sure if both are done in https://github.com/sigp/lighthouse/pull/3214

In general I think those changes can be done incrementally and in individual pull requests.
2022-06-22 14:27:16 +00:00
Michael Sproul
efebf712dd Avoid cloning snapshots during sync (#3271)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/2944

## Proposed Changes

Remove snapshots from the cache during sync rather than cloning them. This reduces unnecessary cloning and memory fragmentation during sync.

## Additional Info

This PR relies on the fact that the `block_delay` cache is not populated for blocks from sync. Relying on block delay may have the side effect that a change in `block_delay` calculation could lead to: a) more clones, if block delays are added for syncing blocks or b) less clones, if blocks near the head are erroneously provided without a `block_delay`. Case (a) would be a regression to the current status quo, and (b) is low-risk given we know that the snapshot cache is current susceptible to misses (hence `tree-states`).
2022-06-20 23:20:29 +00:00
eklm
a9e158663b Fix validator_monitor_prev_epoch_ metrics (#2911)
## Issue Addressed

#2820

## Proposed Changes

The problem is that validator_monitor_prev_epoch metrics are updated only if there is EpochSummary present in summaries map for the previous epoch and it is not the case for the offline validator. Ensure that EpochSummary is inserted into summaries map also for the offline validators.
2022-06-20 04:06:30 +00:00
Pawan Dhananjay
f428719761 Do not penalize peers on execution layer offline errors (#3258)
## Issue Addressed

Partly resolves https://github.com/sigp/lighthouse/issues/3032

## Proposed Changes

Extracts some of the functionality of #3094 into a separate PR as the original PR requires a bit more work.
Do not unnecessarily penalize peers when we fail to validate received execution payloads because our execution layer is offline.
2022-06-19 23:13:40 +00:00
Paul Hauner
564d7da656 v2.3.1 (#3262)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2022-06-14 05:25:38 +00:00
Divma
3dd50bda11 Improve substream management (#3261)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Please list or describe the changes introduced by this PR.

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-06-10 06:58:50 +00:00
Paul Hauner
11d80a6a38 Optimise per_epoch_processing low-hanging-fruit (#3254)
## Issue Addressed

NA

## Proposed Changes

- Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage.
- Cache the result of `integer_sqrt` rather than recomputing for each validator.
- Cache `state.previous_epoch` rather than recomputing it for each validator.

### Benchmarks

Benchmarks on a recent mainnet state using #3252 to get timing.

#### Without this PR

```
lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Using mainnet spec
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Advancing 32 slots
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] Doing 10 runs
[2022-06-09T08:21:02Z INFO  lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz"
SSZ decoding /tmp/state-0x3cdc.ssz: 43ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 0: 245.718794ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 1: 245.364782ms
[2022-06-09T08:21:03Z INFO  lcli::skip_slots] Run 2: 255.866179ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 3: 243.838909ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 4: 250.431425ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 5: 248.68765ms
[2022-06-09T08:21:04Z INFO  lcli::skip_slots] Run 6: 262.051113ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 7: 264.293967ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 8: 293.202007ms
[2022-06-09T08:21:05Z INFO  lcli::skip_slots] Run 9: 264.552017ms
```

#### With this PR:

```
lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 0: 73.898678ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 1: 75.536978ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 2: 75.176104ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 3: 76.460828ms
[2022-06-09T08:57:59Z INFO  lcli::skip_slots] Run 4: 75.904195ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 5: 75.53077ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 6: 74.745572ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 7: 75.823489ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 8: 74.892055ms
[2022-06-09T08:58:00Z INFO  lcli::skip_slots] Run 9: 76.333569ms
```

## Additional Info

NA
2022-06-10 04:29:28 +00:00
Divma
56b4cd88ca minor libp2p upgrade (#3259)
## Issue Addressed

Upgrades libp2p
2022-06-09 23:48:51 +00:00
Divma
cfd26d25e0 do not count sync batch attempts when peer is not at fault (#3245)
## Issue Addressed
currently we count a failed attempt for a syncing chain even if the peer is not at fault. This makes us do more work if the chain fails, and heavily penalize peers, when we can simply retry. Inspired by a proposal I made to #3094 

## Proposed Changes
If a batch fails but the peer is not at fault, do not count the attempt
Also removes some annoying logs

## Additional Info
We still get a counter on ignored attempts.. just in case
2022-06-07 02:35:56 +00:00
Divma
58e223e429 update libp2p (#3233)
## Issue Addressed
na

## Proposed Changes
Updates libp2p to https://github.com/libp2p/rust-libp2p/pull/2662

## Additional Info
From comments on the relevant PRs listed, we should pay attention at peer management consistency, but I don't think anything weird will happen.
This is running in prater tok and sin
2022-06-07 02:35:55 +00:00
Michael Sproul
54cf94ea59 Fix per-slot timer in presence of clock changes (#3243)
## Issue Addressed

Fixes a timing issue that results in spurious fork choice notifier failures:

```
WARN Error signalling fork choice waiter     slot: 3962270, error: ForkChoiceSignalOutOfOrder { current: Slot(3962271), latest: Slot(3962270) }, service: beacon
```

There’s a fork choice run that is scheduled to run at the start of every slot by the `timer`, which creates a 12s interval timer when the beacon node starts up. The problem is that if there’s a bit of clock drift that gets corrected via NTP (or a leap second for that matter) then these 12s intervals will cease to line up with the start of the slot. This then creates the mismatch in slot number that we see above.

Lighthouse also runs fork choice 500ms before the slot begins, and these runs are what is conflicting with the start-of-slot runs. This means that the warning in current versions of Lighthouse is mostly cosmetic because fork choice is up to date with all but the most recent 500ms of attestations (which usually isn’t many).

## Proposed Changes

Fix the per-slot timer so that it continually re-calculates the duration to the start of the next slot and waits for that.

A side-effect of this change is that we may skip slots if the per-slot task takes >12s to run, but I think this is an unlikely scenario and an acceptable compromise.
2022-06-06 23:52:32 +00:00
Divma
493c2c037c reduce reprocess queue/channel sizes (#3239)
## Issue Addressed

Reduces the effect of late blocks on overall node buildup

## Proposed Changes

change the capacity of the channels used to send work for reprocessing in the beacon processor, and to send back to the main processor task, to be 75% of the capacity of the channel for receiving new events

## Additional Info

The issues we've seen suggest we should still evaluate node performance under stress, with late blocks being a big factor. 
Other changes that could help: 
1. right now we have a cap for queued attestations for reprocessing that applies to the sum of aggregated and unaggregated attestations. We could consider adding a separate cap that favors aggregated ones.
2. solving #2848
2022-06-06 23:52:31 +00:00
Akihito Nakano
a6d2ed6119 Fix: PeerManager doesn't remove "outbound only" peers which should be pruned (#3236)
## Issue Addressed

This is one step to address https://github.com/sigp/lighthouse/issues/3092 before introducing `quickcheck`.

I noticed an issue while I was reading the pruning implementation `PeerManager::prune_excess_peers()`. If a peer with the following condition, **`outbound_peers_pruned` counter increases but the peer is not pushed to `peers_to_prune`**.

- [outbound only](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1018))
- [min_subnet_count <= MIN_SYNC_COMMITTEE_PEERS](1e4ac8a4b9/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L1047))

As a result, PeerManager doesn't remove "outbound" peers which should be pruned.

Note: [`subnet_to_peer`](e0d673ea86/beacon_node/lighthouse_network/src/peer_manager/mod.rs (L999)) (HashMap) doesn't guarantee a particular order of iteration. So whether the test fails depend on the order of iteration.
2022-06-06 05:51:10 +00:00
Michael Sproul
47d57a290b Improve eth1 block cache sync (for Ropsten) (#3234)
## Issue Addressed

Fix for the eth1 cache sync issue observed on Ropsten.

## Proposed Changes

Ropsten blocks are so infrequent that they broke our algorithm for downloading eth1 blocks. We currently try to download forwards from the last block in our cache to the block with block number [`remote_highest_block - FOLLOW_DISTANCE + FOLLOW_DISTANCE / ETH1_BLOCK_TIME_TOLERANCE_FACTOR`](6f732986f1/beacon_node/eth1/src/service.rs (L489-L492)). With the tolerance set to 4 this is insufficient because we lag by 1536 blocks, which is more like ~14 hours on Ropsten. This results in us having an incomplete eth1 cache, because we should cache all blocks between -16h and -8h. Even if we were to set the tolerance to 2 for the largest allowance, we would only look back 1024 blocks which is still more than 8 hours.

For example consider this block https://ropsten.etherscan.io/block/12321390. The block from 1536 blocks earlier is 14 hours and 20 minutes before it: https://ropsten.etherscan.io/block/12319854. The block from 1024 blocks earlier is https://ropsten.etherscan.io/block/12320366, 8 hours and 48 minutes before.

- This PR introduces a new CLI flag called `--eth1-cache-follow-distance` which can be used to set the distance manually.
- A new dynamic catchup mechanism is added which detects when the cache is lagging the true eth1 chain and tries to download more blocks within the follow distance in order to catch up.
2022-06-03 06:05:03 +00:00
Mac L
55ac423872 Emit log when fee recipient values are inconsistent (#3202)
## Issue Addressed

#3156

## Proposed Changes

Emit a `WARN` log whenever the value of `fee_recipient` as returned from the EE is different from the value of `suggested_fee_recipient` as set on the BN, for example by the `--suggested-fee-recipient` CLI flag.

## Additional Info

I have set the log level to `WARN` since it is legal behaviour (meaning it isn't really an error but is important to know when it is occurring).

If we feel like this behaviour is almost always undesired (caused by a misconfiguration or malicious EE) then an `ERRO` log would be more appropriate. Happy to change it in that case.
2022-06-03 03:22:54 +00:00
Paul Hauner
16e49af8e1 Use genesis slot for node/syncing (#3226)
## Issue Addressed

NA

## Proposed Changes

Resolves this error log emitted from the VC prior to genesis:

```
WARN Unable connect to beacon node           error: ServerMessage(ErrorMessage { code: 500, message: "UNHANDLED_ERROR: UnableToReadSlot", stacktraces: [] })
```

## Additional Info

NA
2022-05-31 06:09:11 +00:00
Michael Sproul
98c8ac1a87 Fix typo in peer state transition log (#3224)
## Issue Addressed

We were logging `out_finalized_epoch` instead of `our_finalized_epoch`. I noticed this ages ago but only just got around to fixing it.

## Additional Info

I also reformatted the log line to respect the line length limit (`rustfmt` won't do it because it gets confused by the `;` in slog's log macros).
2022-05-31 06:09:10 +00:00
Paul Hauner
6f732986f1 v2.3.0 (#3222)
## Issue Addressed

NA

## Proposed Changes

Please list or describe the changes introduced by this PR.

## Additional Info

- Pending testing on our infra. **Please do not merge**
2022-05-30 01:35:10 +00:00
Divma
a7896a58cc move backfill sync jobs from highest priority to lowest (#3215)
## Issue Addressed
#3212 

## Proposed Changes
Move chain segments coming from back-fill syncing from highest priority to lowest

## Additional Info
If this does not solve the issue, next steps would be lowering the batch size for back-fill sync, and as last resort throttling the processing of these chain segments
2022-05-26 02:05:17 +00:00
Paul Hauner
f4aa17ef85 v2.3.0-rc.0 (#3218)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2022-05-25 05:29:26 +00:00
Michael Sproul
229f883968 Avoid parallel fork choice runs during sync (#3217)
## Issue Addressed

Fixes an issue that @paulhauner found with the v2.3.0 release candidate whereby the fork choice runs introduced by #3168 tripped over each other during sync:

```
May 24 23:06:40.542 WARN Error signalling fork choice waiter     slot: 3884129, error: ForkChoiceSignalOutOfOrder { current: Slot(3884131), latest: Slot(3884129) }, service: beacon
```

This can occur because fork choice is called from the state advance _and_ the per-slot task. When one of these runs takes a long time it can end up finishing after a run from a later slot, tripping the error above. The problem is resolved by not running either of these fork choice calls during sync.

Additionally, these parallel fork choice runs were causing issues in the database:

```
May 24 07:49:05.098 WARN Found a chain that should already have been pruned, head_slot: 92925, head_block_root: 0xa76c7bf1b98e54ed4b0d8686efcfdf853484e6c2a4c67e91cbf19e5ad1f96b17, service: beacon
May 24 07:49:05.101 WARN Database migration failed               error: HotColdDBError(FreezeSlotError { current_split_slot: Slot(92608), proposed_split_slot: Slot(92576) }), service: beacon
```

In this case, two fork choice calls triggering the finalization processing were being processed out of order due to differences in their processing time, causing the background migrator to try to advance finalization _backwards_ 😳. Removing the parallel fork choice runs from sync effectively addresses the issue, because these runs are most likely to have different finalized checkpoints (because of the speed at which fork choice advances during sync). In theory it's still possible to process updates out of order if any other fork choice runs end up completing out of order, but this should be much less common. Fixing out of order fork choice runs in general is difficult as it requires architectural changes like serialising fork choice updates through a single thread, or locking fork choice along with the head when it is mutated (https://github.com/sigp/lighthouse/pull/3175).

## Proposed Changes

* Don't run per-slot fork choice during sync (if head is older than 4 slots)
* Don't run state-advance fork choice during sync (if head is older than 4 slots)
* Check for monotonic finalization updates in the background migrator. This is a good defensive check to have, and I'm not sure why we didn't have it before (we may have had it and wrongly removed it).
2022-05-25 03:27:30 +00:00
Paul Hauner
7a64994283 Call per_slot_task from a blocking thread (v2) (#3199)
*This PR was adapted from @pawanjay176's work in #3197.*

## Issue Addressed

Fixes a regression in https://github.com/sigp/lighthouse/pull/3168

## Proposed Changes

https://github.com/sigp/lighthouse/pull/3168 added calls to `fork_choice` in  `BeaconChain::per_slot_task` function. This leads to a panic as `per_slot_task` is called from an async context which calls fork choice, which then calls `block_on`.

This PR changes the timer to call the `per_slot_task` function in a blocking thread.

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2022-05-20 23:05:07 +00:00
Michael Sproul
6eaeaa542f Fix Rust 1.61 clippy lints (#3192)
## Issue Addressed

This fixes the low-hanging Clippy lints introduced in Rust 1.61 (due any hour now). It _ignores_ one lint, because fixing it requires a structural refactor of the validator client that needs to be done delicately. I've started on that refactor and will create another PR that can be reviewed in more depth in the coming days. I think we should merge this PR in the meantime to unblock CI.
2022-05-20 05:02:13 +00:00
Michael Sproul
8fa032c8ae Run fork choice before block proposal (#3168)
## Issue Addressed

Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878

## Proposed Changes

1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

## Additional Info

### Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

### Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

### Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

### Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

### Implementation

Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
2022-05-20 05:02:11 +00:00
realbigsean
54b58fdc01 Log out response status when we hit PayloadIdUnavailable (#3190)
## Issue Addressed

@z3n-chada is currently getting a `PayloadIdUnavailable` error when connecting lighthouse to Erigon and it's difficult to discern why so this just logs out the response status from the EE when we hit an `PayloadIdUnavailable` error

Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-05-19 06:00:48 +00:00
Akihito Nakano
695f415590 Tiny improvement: PeerManager and maximum discovery query (#3182)
## Issue Addressed

As [`Discovery` bounds the maximum discovery query](e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)), `PeerManager` no need to handle it.

e88b18be09/beacon_node/lighthouse_network/src/discovery/mod.rs (L328)
2022-05-19 06:00:46 +00:00
Mac L
def9bc660e Remove DB migrations for legacy database schemas (#3181)
## Proposed Changes

Remove support for DB migrations that support upgrading from schema's below version 5. This is mostly for cosmetic/code quality reasons as in most circumstances upgrading from versions of Lighthouse this old will almost always require a re-sync.

## Additional Info

The minimum supported database schema is now version 5.
2022-05-17 04:54:39 +00:00
Paul Hauner
db8a6f81ea Prevent attestation to future blocks from early attester cache (#3183)
## Issue Addressed

N/A

## Proposed Changes

Prevents the early attester cache from producing attestations to future blocks. This bug could result in a missed head vote if the BN was requested to produce an attestation for an earlier slot than the head block during the (usually) short window of time between verifying a block and setting it as the head.

This bug was noticed in an [Antithesis](https://andreagrieser.com/) test and diagnosed by @realbigsean. 

## Additional Info

NA
2022-05-17 01:51:25 +00:00
Paul Hauner
38050fa460 Allow TaskExecutor to be used in async tests (#3178)
# Description

Since the `TaskExecutor` currently requires a `Weak<Runtime>`, it's impossible to use it in an async test where the `Runtime` is created outside our scope. Whilst we *could* create a new `Runtime` instance inside the async test, dropping that `Runtime` would cause a panic (you can't drop a `Runtime` in an async context).

To address this issue, this PR creates the `enum Handle`, which supports either:

- A `Weak<Runtime>` (for use in our production code)
- A `Handle` to a runtime (for use in testing)

In theory, there should be no change to the behaviour of our production code (beyond some slightly different descriptions in HTTP 500 errors), or even our tests. If there is no change, you might ask *"why bother?"*. There are two PRs (#3070 and #3175) that are waiting on these fixes to introduce some new tests. Since we've added the EL to the `BeaconChain` (for the merge), we are now doing more async stuff in tests.

I've also added a `RuntimeExecutor` to the `BeaconChainTestHarness`. Whilst that's not immediately useful, it will become useful in the near future with all the new async testing.
2022-05-16 08:35:59 +00:00
François Garillot
3f9e83e840 [refactor] Refactor Option/Result combinators (#3180)
Code simplifications using `Option`/`Result` combinators to make pattern-matches a tad simpler. 
Opinions on these loosely held, happy to adjust in review.

Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).
2022-05-16 01:59:47 +00:00
Michael Sproul
bcdd960ab1 Separate execution payloads in the DB (#3157)
## Proposed Changes

Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.

⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.

The main changes are:

- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
    - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
    - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
   - I've tested manually that it works on Kiln, using Geth and Nethermind.
   - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
   - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).

## Additional Info

- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-05-12 00:42:17 +00:00
Michael Sproul
ae47a93c42 Don't panic in forkchoiceUpdated handler (#3165)
## Issue Addressed

Fix a panic due to misuse of the Tokio executor when processing a forkchoiceUpdated response. We were previously calling `process_invalid_execution_payload` from the async function `update_execution_engine_forkchoice_async`, which resulted in a panic because `process_invalid_execution_payload` contains a call to fork choice, which ultimately calls `block_on`.

An example backtrace can be found here: https://gist.github.com/michaelsproul/ac5da03e203d6ffac672423eaf52fb20

## Proposed Changes

Wrap the call to `process_invalid_execution_payload` in a `spawn_blocking` so that `block_on` is no longer called from an async context.

## Additional Info

- I've been thinking about how to catch bugs like this with static analysis (a new Clippy lint).
- The payload validation tests have been re-worked to support distinct responses from the mock EE for newPayload and forkchoiceUpdated. Three new tests have been added covering the `Invalid`, `InvalidBlockHash` and `InvalidTerminalBlock` cases.
- I think we need a bunch more tests of different legal and illegal variations
2022-05-04 23:30:34 +00:00
Pawan Dhananjay
db0beb5178 Poll shutdown timeout in rpc handler (#3153)
## Issue Addressed

N/A

## Proposed Changes

Previously, we were using `Sleep::is_elapsed()` to check if the shutdown timeout had triggered without polling the sleep. This PR polls the sleep timer.
2022-04-13 03:54:44 +00:00
Divma
580d2f7873 log upgrades + prevent dialing of disconnecting peers (#3148)
## Issue Addressed
We still ping peers that are considered in a disconnecting state

## Proposed Changes

Do not ping peers once we decide they are disconnecting
Upgrade logs about ignored rpc messages

## Additional Info
--
2022-04-13 03:54:43 +00:00
Paul Hauner
b49b4291a3 Disallow attesting to optimistic head (#3140)
## Issue Addressed

NA

## Proposed Changes

Disallow the production of attestations and retrieval of unaggregated attestations when they reference an optimistic head. Add tests to this end.

I also moved `BeaconChain::produce_unaggregated_attestation_for_block` to the `BeaconChainHarness`. It was only being used during tests, so it's nice to stop pretending it's production code. I also needed something that could produce attestations to optimistic blocks in order to simulate scenarios where the justified checkpoint is determined invalid (if no one would attest to an optimistic block, we could never justify it and then flip it to invalid).

## Additional Info

- ~~Blocked on #3126~~
2022-04-13 03:54:42 +00:00
Divma
7366266bd1 keep failed finalized chains to avoid retries (#3142)
## Issue Addressed

In very rare occasions we've seen most if not all our peers in a chain with which we don't agree. Purging these peers can take a very long time: number of retries of the chain. Meanwhile sync is caught in a loop trying the chain again and again. This makes it so that we fast track purging peers via registering the failed chain to prevent retrying for some time (30 seconds). Longer times could be dangerous since a chain can fail if a batch fails to download for example. In this case, I think it's still acceptable to fast track purging peers since they are nor providing the required info anyway 

Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2022-04-13 01:10:55 +00:00
Michael Sproul
aa72088f8f v2.2.1 (#3149)
## Issue Addressed

Addresses sync stalls on v2.2.0 (i.e. https://github.com/sigp/lighthouse/issues/3147).

## Additional Info

I've avoided doing a full `cargo update` because I noticed there's a new patch version of libp2p and thought it could do with some more testing.



Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-04-12 02:52:12 +00:00
Paul Hauner
c8edeaff29 Don't log crits for missing EE before Bellatrix (#3150)
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in #3088 which was causing unnecessary `crit` logs on networks without Bellatrix enabled.

## Additional Info

NA
2022-04-11 23:14:47 +00:00
Pawan Dhananjay
fff4dd6311 Fix rpc limits version 2 (#3146)
## Issue Addressed

N/A

## Proposed Changes

https://github.com/sigp/lighthouse/pull/3133 changed the rpc type limits to be fork aware i.e. if our current fork based on wall clock slot is Altair, then we apply only altair rpc type limits. This is a bug because phase0 blocks can still be sent over rpc and phase 0 block minimum size is smaller than altair block minimum size. So a phase0 block with `size < SIGNED_BEACON_BLOCK_ALTAIR_MIN` will return an `InvalidData` error as it doesn't pass the rpc types bound check.

This error can be seen when we try syncing pre-altair blocks with size smaller than `SIGNED_BEACON_BLOCK_ALTAIR_MIN`.

This PR fixes the issue by also accounting for forks earlier than current_fork in the rpc limits calculation in the  `rpc_block_limits_by_fork` function. I decided to hardcode the limits in the function because that seemed simpler than calculating previous forks based on current fork and doing a min across forks. Adding a new fork variant is simple and can the limits can be easily checked in a review. 

Adds unit tests and modifies the syncing simulator to check the syncing from across fork boundaries. 
The syncing simulator's block 1 would always be of phase 0 minimum size (404 bytes) which is smaller than altair min block size (since block 1 contains no attestations).
2022-04-07 23:45:38 +00:00
ethDreamer
22002a4e68 Transition Block Proposer Preparation (#3088)
## Issue Addressed

- #3058 

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-04-07 14:03:34 +00:00
Aren49
5ff4013263 Fix SPRP default value in cli (#3145)
Changed SPRP to the correct default value of 8192.
2022-04-07 04:04:11 +00:00
Paul Hauner
8a40763183 Ensure VALID response from fcU updates protoarray (#3126)
## Issue Addressed

NA

## Proposed Changes

Ensures that a `VALID` response from a `forkchoiceUpdate` call will update that block in `ProtoArray`.

I also had to modify the mock execution engine so it wouldn't return valid when all payloads were supposed to be some other static value.

## Additional Info

NA
2022-04-05 20:58:17 +00:00
Paul Hauner
42cdaf5840 Add tests for importing blocks on invalid parents (#3123)
## Issue Addressed

NA

## Proposed Changes

- Adds more checks to prevent importing blocks atop parent with invalid execution payloads.
- Adds a test for these conditions.

## Additional Info

NA
2022-04-05 20:58:16 +00:00
Michael Sproul
bac7c3fa54 v2.2.0 (#3139)
## Proposed Changes

Cut release v2.2.0 including proposer boost.

## Additional Info

I also updated the clippy lints for the imminent release of Rust 1.60, although LH v2.2.0 will continue to compile using Rust 1.58 (our MSRV).
2022-04-05 02:53:09 +00:00
Michael Sproul
4d0122444b Update and consolidate dependencies (#3136)
## Proposed Changes

I did some gardening 🌳 in our dependency tree:

- Remove duplicate versions of `warp` (git vs patch)
- Remove duplicate versions of lots of small deps: `cpufeatures`, `ethabi`, `ethereum-types`, `bitvec`, `nix`, `libsecp256k1`.
- Update MDBX (should resolve #3028). I tested and Lighthouse compiles on Windows 11 now.
- Restore `psutil` back to upstream
- Make some progress updating everything to rand 0.8. There are a few crates stuck on 0.7.

Hopefully this puts us on a better footing for future `cargo audit` issues, and improves compile times slightly.

## Additional Info

Some crates are held back by issues with `zeroize`. libp2p-noise depends on [`chacha20poly1305`](https://crates.io/crates/chacha20poly1305) which depends on zeroize < v1.5, and we can only have one version of zeroize because it's post 1.0 (see https://github.com/rust-lang/cargo/issues/6584). The latest version of `zeroize` is v1.5.4, which is used by the new versions of many other crates (e.g. `num-bigint-dig`). Once a new version of chacha20poly1305 is released we can update libp2p-noise and upgrade everything to the latest `zeroize` version.

I've also opened a PR to `blst` related to zeroize: https://github.com/supranational/blst/pull/111
2022-04-04 00:26:16 +00:00
Pawan Dhananjay
ab434bc075 Fix merge rpc length limits (#3133)
## Issue Addressed

N/A

## Proposed Changes

Fix the upper bound for blocks by root responses to be equal to the max merge block size instead of altair.
Further make the rpc response limits fork aware.
2022-04-04 00:26:15 +00:00
Michael Sproul
375e2b49b3 Conserve disk space by raising default SPRP (#3137)
## Proposed Changes

Increase the default `--slots-per-restore-point` to 8192 for a 4x reduction in freezer DB disk usage.

Existing nodes that use the previous default of 2048 will be left unchanged. Newly synced nodes (with or without checkpoint sync) will use the new 8192 default. 

Long-term we could do away with the freezer DB entirely for validator-only nodes, but this change is much simpler and grants us some extra space in the short term. We can also roll it out gradually across our nodes by purging databases one by one, while keeping the Ansible config the same.

## Additional Info

We ignore a change from 2048 to 8192 if the user hasn't set the 8192 explicitly. We fire a debug log in the case where we do ignore:

```
DEBG Ignoring slots-per-restore-point config in favour of on-disk value, on_disk: 2048, config: 8192
```
2022-04-01 07:16:25 +00:00
Pawan Dhananjay
9ec072ff3b Strip newline from jwt secrets (#3132)
## Issue Addressed

Resolves #3128 

## Proposed Changes

Strip trailing newlines from jwt secret files.
2022-04-01 00:59:00 +00:00
Michael Sproul
41e7a07c51 Add lighthouse db command (#3129)
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).
2022-04-01 00:58:59 +00:00
realbigsean
ea783360d3 Kiln mev boost (#3062)
## Issue Addressed

MEV boost compatibility

## Proposed Changes

See #2987

## Additional Info

This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20).

Additional TODO's and outstanding questions

- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic. 

Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-03-31 07:52:23 +00:00
realbigsean
83234ee4ce json rpc id to value (#3110)
## Issue Addressed

N/A

## Proposed Changes

- Update the JSON-RPC id field for both our request and response objects to be a `serde_json::Value` rather than a `u32`. This field could be a string or a number according to the JSON-RPC 2.0 spec. We only ever set it to a number, but if, for example, we get a response that wraps this number in quotes, we would fail to deserialize it. I think because we're not doing any validation around this id otherwise, we should be less strict with it in this regard. 

## Additional Info



Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-03-29 22:59:55 +00:00
Paul Hauner
26e5281c68 Increase timeouts for EEs (#3125)
## Issue Addressed

NA

## Proposed Changes

In the first Goerli shadow-fork, Lighthouse was getting timeouts from Geth which prevented the LH+Geth pair from progressing.

There's not a whole lot of information I can use to set these timeouts. The most interesting pieces of information I have are quotes from Marius from Geth:

- *"Fcu also needs to construct the block which can take 2sec"* ([Discord](https://discord.com/channels/595666850260713488/910910348922589184/958006487052066836))
- *"2 sec should be ok for new payload, weird that it times out"* ([Discord](https://discord.com/channels/595666850260713488/910910348922589184/958006487052066836))

I don't think we should be so worried about getting these timeouts correct now. No one really knows how long the various EEs are going to take, it's a bit too early in development. With these changes I'm giving some headroom so that we don't fail just because EEs are quite optimized enough. I've set the value to 6s (half a mainnet slot), since I think anything beyond 6s is an interesting problem that we want to know about sooner rather than later.

## Additional Info

NA
2022-03-28 23:32:12 +00:00
Pawan Dhananjay
a42cb69f6e Update engine state in broadcast (#3071)
## Issue Addressed

N/A

## Proposed Changes

Set the engine state to `EngineState::Offline` if the engine api call fails during broadcast. This caused issues while pausing sync when the execution engine is offline because `EngineState` always returned `Synced`.
2022-03-28 23:32:11 +00:00
Michael Sproul
6efd95496b Optionally skip RANDAO verification during block production (#3116)
## Proposed Changes

Allow Lighthouse to speculatively create blocks via the `/eth/v1/validators/blocks` endpoint by optionally skipping the RANDAO verification that we introduced in #2740. When `verify_randao=false` is passed as a query parameter the `randao_reveal` is not required to be present, and if present will only be lightly checked (must be a valid BLS sig). If `verify_randao` is omitted it defaults to true and Lighthouse behaves exactly as it did previously, hence this PR is backwards-compatible.

I'd like to get this change into `unstable` pretty soon as I've got 3 projects building on top of it:

- [`blockdreamer`](https://github.com/michaelsproul/blockdreamer), which mocks block production every slot in order to fingerprint clients
- analysis of Lighthouse's block packing _optimality_, which uses `blockdreamer` to extract interesting instances of the attestation packing problem
- analysis of Lighthouse's block packing _performance_ (as in speed) on the `tree-states` branch

## Additional Info

Having tested `blockdreamer` with Prysm, Nimbus and Teku I noticed that none of them verify the randao signature on `/eth/v1/validator/blocks`. I plan to open a PR to the `beacon-APIs` repo anyway so that this parameter can be standardised in case the other clients add RANDAO verification by default in future.
2022-03-28 07:14:13 +00:00
Lucas Manuel
adca0efc64 feat: Update ASCII art (#3113)
## Issue Addressed

No issue, just updating merge ASCII art.

## Proposed Changes

Updating ASCII art for merge.

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-03-24 00:04:50 +00:00
Mac L
41b5af9b16 Support IPv6 in BN and VC HTTP APIs (#3104)
## Issue Addressed

#3103

## Proposed Changes

Parse `http-address` and `metrics-address` as `IpAddr` for both the beacon node and validator client to support IPv6 addresses.
Also adjusts parsing of CORS origins to allow for IPv6 addresses.

## Usage
You can now set  `http-address` and/or `metrics-address`  flags to IPv6 addresses.
For example, the following:
`lighthouse bn --http --http-address :: --metrics --metrics-address ::1`
will expose the beacon node HTTP server on `[::]` (equivalent of `0.0.0.0` in IPv4) and the metrics HTTP server on `localhost` (the equivalent of `127.0.0.1` in IPv4) 

The beacon node API can then be accessed by:
`curl "http://[server-ipv6-address]:5052/eth/v1/some_endpoint"`

And the metrics server api can be accessed by:
`curl "http://localhost:5054/metrics"` or by `curl "http://[::1]:5054/metrics"`

## Additional Info
On most Linux distributions the `v6only` flag is set to `false` by default (see the section for the `IPV6_V6ONLY` flag in https://www.man7.org/linux/man-pages/man7/ipv6.7.html) which means IPv4 connections will continue to function on a IPv6 address (providing it is appropriately mapped). This means that even if the Lighthouse API is running on `::` it is also possible to accept IPv4 connections.

However on Windows, this is not the case. The `v6only` flag is set to `true` so binding to `::` will only allow IPv6 connections.
2022-03-24 00:04:49 +00:00
Divma
788b6af3c4 Remove sync await points (#3036)
## Issue Addressed

Removes the await points in sync waiting for a processor response for rpc block processing. Built on top of #3029 
This also handles a couple of bugs in the previous code and adds a relatively comprehensive test suite.
2022-03-23 01:09:39 +00:00
ethDreamer
af50130e21 Add Proposer Cache Pruning & POS Activated Banner (#3109)
## Issue Addressed

The proposers cache wasn't being pruned. Also didn't have a celebratory banner for the merge 😄

## Banner
![pos_log_panda](https://user-images.githubusercontent.com/37123614/159528545-3aa54cbd-9362-49b1-830c-f4402f6ac341.png)
2022-03-22 21:33:38 +00:00
realbigsean
ae5b141dc4 Updates to tests and local testnet for Ganache 7 (#3056)
## Issue Addressed

#2961

## Proposed Changes

-- update `--chainId` -> `--chain.chainId`
-- remove `--keepAliveTimeout`
-- fix log to listen for
-- rename `ganache-cli` to `ganache` everywhere


Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-03-20 22:48:14 +00:00
Michael Sproul
9bc9527998 v2.1.5 (#3096)
## Issue Addressed

New release to address openssl vuln fixed in #3095

Closes #3093
2022-03-17 23:13:46 +00:00
Paul Hauner
28aceaa213 v2.1.4 (#3076)
## Issue Addressed

NA

## Proposed Changes

- Bump version to `v2.1.4`
- Run `cargo update`

## Additional Info

I think this release should be published around the 15th of March.

Presently `blocked` for testing on our infrastructure.
2022-03-14 23:11:40 +00:00
Paul Hauner
e4fa7d906f Fix post-merge checkpoint sync (#3065)
## Issue Addressed

This address an issue which was preventing checkpoint-sync.

When the node starts from checkpoint sync, the head block and the finalized block are the same value. We did not respect this when sending a `forkchoiceUpdated` (fcU) call to the EL and were expecting fork choice to hold the *finalized ancestor of the head* and returning an error when it didn't.

This PR uses *only fork choice* for sending fcU updates. This is actually quite nice and avoids some atomicity issues between `chain.canonical_head` and `chain.fork_choice`. Now, whenever `chain.fork_choice.get_head` returns a value we also cache the values required for the next fcU call.

## TODO

- [x] ~~Blocked on #3043~~
- [x] Ensure there isn't a warn message at startup.
2022-03-10 06:05:24 +00:00
Paul Hauner
c475499dfe Fix UnableToReadSlot at startup (#3066)
## Issue Addressed

Don't send an fcU message at startup if it's pre-genesis. The startup fcU message is not critical, not required by the spec, so it's fine to avoid it for networks that start post-Bellatrix fork.
2022-03-09 23:04:19 +00:00
Paul Hauner
267d8babc8 Prepare proposer (#3043)
## Issue Addressed

Resolves #2936

## Proposed Changes

Adds functionality for calling [`validator/prepare_beacon_proposer`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/prepareBeaconProposer) in advance.

There is a `BeaconChain::prepare_beacon_proposer` method which, which called, computes the proposer for the next slot. If that proposer has been registered via the `validator/prepare_beacon_proposer` API method, then the `beacon_chain.execution_layer` will be provided the `PayloadAttributes` for us in all future forkchoiceUpdated calls. An artificial forkchoiceUpdated call will be created 4s before each slot, when the head updates and when a validator updates their information.

Additionally, I added strict ordering for calls from the `BeaconChain` to the `ExecutionLayer`. I'm not certain the `ExecutionLayer` will always maintain this ordering, but it's a good start to have consistency from the `BeaconChain`. There are some deadlock opportunities introduced, they are documented in the code.

## Additional Info

- ~~Blocked on #2837~~

Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2022-03-09 00:42:05 +00:00
Divma
527dfa4893 cargo audit updates (#3063)
## Issue Addressed
Closes #3008 and updates `regex` to solve https://rustsec.org/advisories/RUSTSEC-2022-0013
2022-03-08 19:48:12 +00:00
Pawan Dhananjay
381d0ece3c auth for engine api (#3046)
## Issue Addressed

Resolves #3015 

## Proposed Changes

Add JWT token based authentication to engine api requests. The jwt secret key is read from the provided file and is used to sign tokens that are used for authenticated communication with the EL node.

- [x] Interop with geth (synced `merge-devnet-4` with the `merge-kiln-v2` branch on geth)
- [x] Interop with other EL clients (nethermind on `merge-devnet-4`)
- [x] ~Implement `zeroize` for jwt secrets~
- [x] Add auth server tests with `mock_execution_layer`
- [x] Get auth working with the `execution_engine_integration` tests






Co-authored-by: Paul Hauner <paul@paulhauner.com>
2022-03-08 06:46:24 +00:00
Paul Hauner
3b4865c3ae Poll the engine_exchangeTransitionConfigurationV1 endpoint (#3047)
## Issue Addressed

There has been an [`engine_exchangetransitionconfigurationv1`](https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#engine_exchangetransitionconfigurationv1) method added to the execution API specs.

The `engine_exchangetransitionconfigurationv1` will be polled every 60s as per this PR: https://github.com/ethereum/execution-apis/pull/189. If that PR is merged as-is, then we will be matching the spec. If that PR *is not* merged, we are still fully compatible with the spec, but just doing more than we are required.

## Additional Info

- [x] ~~Blocked on #2837~~
- [x] Add method to EE integration tests
2022-03-08 04:40:42 +00:00
Akihito Nakano
4186d117af Replace OpenOptions::new with File::options to be readable (#3059)
## Issue Addressed

Closes #3049 

This PR updates widely but this replace is safe as `File::options()` is equivelent to `OpenOptions::new()`.
ref: https://doc.rust-lang.org/stable/src/std/fs.rs.html#378-380
2022-03-07 06:30:18 +00:00
tim gretler
cbda0a2f0a Add log debounce to work processor (#3045)
## Issue Addressed

#3010 

## Proposed Changes

- move log debounce time latch to `./common/logging`
- add timelatch to limit logging for `attestations_delay_queue` and `queued_block_roots`

## Additional Info

- Is a separate crate for the time latch preferred? 
- `elapsed()` could take `LOG_DEBOUNCE_INTERVAL ` as an argument to allow for different granularity.
2022-03-07 06:30:17 +00:00
Michael Sproul
1829250ee4 Ignore attestations to finalized blocks (don't reject) (#3052)
## Issue Addressed

Addresses spec changes from v1.1.0:

- https://github.com/ethereum/consensus-specs/pull/2830
- https://github.com/ethereum/consensus-specs/pull/2846

## Proposed Changes

* Downgrade the REJECT for `HeadBlockFinalized` to an IGNORE. This applies to both unaggregated and aggregated attestations.

## Additional Info

I thought about also changing the penalty for `UnknownTargetRoot` but I don't think it's reachable in practice.
2022-03-04 00:41:22 +00:00
Paul Hauner
09d2187198 Lower debug! logs to trace! (#3053)
## Issue Addressed

These logs were very loud during sync.
2022-03-03 22:37:42 +00:00
Paul Hauner
aea43b626b Rename random to prev_randao (#3040)
## Issue Addressed

As discussed on last-night's consensus call, the testnets next week will target the [Kiln Spec v2](https://hackmd.io/@n0ble/kiln-spec).

Presently, we support Kiln V1. V2 is backwards compatible, except for renaming `random` to `prev_randao` in:

- https://github.com/ethereum/execution-apis/pull/180
- https://github.com/ethereum/consensus-specs/pull/2835

With this PR we'll no longer be compatible with the existing Kintsugi and Kiln testnets, however we'll be ready for the testnets next week. I raised this breaking change in the call last night, we are all keen to move forward and break things.

We now target the [`merge-kiln-v2`](https://github.com/MariusVanDerWijden/go-ethereum/tree/merge-kiln-v2) branch for interop with Geth. This required adding the `--http.aauthport` to the tester to avoid a port conflict at startup.

### Changes to exec integration tests

There's some change in the `merge-kiln-v2` version of Geth that means it can't compile on a vanilla Github runner. Bumping the `go` version on the runner solved this issue.

Whilst addressing this, I refactored the `testing/execution_integration` crate to be a *binary* rather than a *library* with tests. This means that we don't need to run the `build.rs` and build Geth whenever someone runs `make lint` or `make test-release`. This is nice for everyday users, but it's also nice for CI so that we can have a specific runner for these tests and we don't need to ensure *all* runners support everything required to build all execution clients.

## More Info

- [x] ~~EF tests are failing since the rename has broken some tests that reference the old field name. I have been told there will be new tests released in the coming days (25/02/22 or 26/02/22).~~
2022-03-03 02:10:57 +00:00
Divma
4bf1af4e85 Custom RPC request management for sync (#3029)
## Proposed Changes
Make `lighthouse_network` generic over request ids, now usable by sync
2022-03-02 22:07:17 +00:00
Age Manning
e88b18be09 Update libp2p (#3039)
Update libp2p. 

This corrects some gossipsub metrics.
2022-03-02 05:09:52 +00:00
Age Manning
f3c1dde898 Filter non global ips from discovery (#3023)
## Issue Addressed

#3006 

## Proposed Changes

This PR changes the default behaviour of lighthouse to ignore discovered IPs that are not globally routable. It adds a CLI flag, --enable-local-discovery to permit the non-global IPs in discovery.

NOTE: We should take care in merging this as I will break current set-ups that rely on local IP discovery. I made this the non-default behaviour because we dont really want to be wasting resources attempting to connect to non-routable addresses and we dont want to propagate these to others (on the chance we can connect to one of these local nodes), improving discoveries efficiency.
2022-03-02 03:14:27 +00:00
Age Manning
e34524be75 Increase default target-peer count to 80 (#3005)
Increase the default peer count from 50 to 80
2022-03-02 01:05:07 +00:00
Paul Hauner
b6493d5e24 Enforce Optimistic Sync Conditions & CLI Tests (v2) (#3050)
## Description

This PR adds a single, trivial commit (f5d2b27d78349d5a675a2615eba42cc9ae708094) atop #2986 to resolve a tests compile error. The original author (@ethDreamer) is AFK so I'm getting this one merged ☺️ 

Please see #2986 for more information about the other, significant changes in this PR.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
2022-03-01 22:56:47 +00:00
Age Manning
a1b730c043 Cleanup small issues (#3027)
Downgrades some excessive networking logs and corrects some metrics.
2022-03-01 01:49:22 +00:00
Paul Hauner
27e83b888c Retrospective invalidation of exec. payloads for opt. sync (#2837)
## Issue Addressed

NA

## Proposed Changes

Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means:

- Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates.
- Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct.
- Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs.
- Adding testing framework for optimistic imports.
- Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between *beacon block roots* and *execution payload hashes*.
- Add `merge` to [`FORKS`](c3a793fd73/Makefile (L17)) in the `Makefile` to ensure we test the beacon chain with merge settings.
    - Fix some tests here that were failing due to a missing execution layer.

## TODO

- [ ] Balance tests

Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-02-28 22:07:48 +00:00
Michael Sproul
5e1f8a8480 Update to Rust 1.59 and 2021 edition (#3038)
## Proposed Changes

Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.

## Additional Info

We need this PR to unblock CI.
2022-02-25 00:10:17 +00:00
Mac L
104e3104f9 Add API to compute block packing efficiency data (#2879)
## Issue Addressed
N/A

## Proposed Changes
Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs.

## Usage
### Request
```
curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732"
```
### Response
```
[
  {
    "slot": "1847360",
    "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f",
    "proposer_info": {
      "validator_index": 1686,
      "graffiti": ""
    },
    "available_attestations": 7096,
    "included_attestations": 6459,
    "prior_skip_slots": 0
  },
  ...
]
```
## Additional Info

This is notably different to the existing lcli code:
- Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method.
- Corrects the off-by-one #2878
- Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 
- Includes `prior_skip_slots`.
2022-02-21 23:21:02 +00:00
eklm
56b2ec6b29 Allow proposer duties request for the next epoch (#2963)
## Issue Addressed

Closes #2880 

## Proposed Changes

Support requests to the next epoch in proposer_duties api.

## Additional Info

Implemented with skipping proposer cache for this case because the cache for the future epoch will be missed every new slot as dependent_root is changed and we don't want to "wash it out" by saving additional values.
2022-02-18 05:32:00 +00:00
Age Manning
3ebb8b0244 Improved peer management (#2993)
## Issue Addressed

I noticed in some logs some excess and unecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots. 

This PR corrects this issue and advances the pruning mechanism to favour subnet peers. 

An overview the new logic added is:
- We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count.
- We only search for more peers if there is room to do so, and we are below the minimum outbound peer count not the target. So this gives us some buffer for peers to disconnect. The buffer is currently 10%

The modified pruning logic is documented in the code but for reference it should do the following:
- Prune peers with bad scores first
- If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet
- If we still need to prune peers, the prune peers that we have a higher density of on any given subnet which should drive for uniform peers across all subnets.

This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.
2022-02-18 02:36:43 +00:00
Paul Hauner
0a6a8ea3b0 Engine API v1.0.0.alpha.6 + interop tests (#3024)
## Issue Addressed

NA

## Proposed Changes

This PR extends #3018 to address my review comments there and add automated integration tests with Geth (and other implementations, in the future).

I've also de-duplicated the "unused port" logic by creating an  `common/unused_port` crate.

## Additional Info

I'm not sure if we want to merge this PR, or update #3018 and merge that. I don't mind, I'm primarily opening this PR to make sure CI works.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2022-02-17 21:47:06 +00:00
Paul Hauner
2f8531dc60 Update to consensus-specs v1.1.9 (#3016)
## Issue Addressed

Closes #3014

## Proposed Changes

- Rename `receipt_root` to `receipts_root`
- Rename `execute_payload` to `notify_new_payload`
   - This is slightly weird since we modify everything except the actual HTTP call to the engine API. That change is expected to be implemented in #2985 (cc @ethDreamer)
- Enable "random" tests for Bellatrix.

## Notes

This will break *partially* compatibility with Kintusgi testnets in order to gain compatibility with [Kiln](https://hackmd.io/@n0ble/kiln-spec) testnets. I think it will only break the BN APIs due to the `receipts_root` change, however it might have some other effects too.

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2022-02-14 23:57:23 +00:00
Paul Hauner
c3a793fd73 v2.1.3 (#3017)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2022-02-11 01:54:33 +00:00
Divma
1306b2db96 libp2p upgrade + gossipsub interval fix (#3012)
## Issue Addressed
Lighthouse gossiping late messages

## Proposed Changes
Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected
Upgrade libp2p 

## Additional Info
https://github.com/libp2p/rust-libp2p/issues/2497
2022-02-10 04:12:03 +00:00
Philipp K
5388183884 Allow per validator fee recipient via flag or file in validator client (similar to graffiti / graffiti-file) (#2924)
## Issue Addressed

#2883 

## Proposed Changes

* Added `suggested-fee-recipient` & `suggested-fee-recipient-file` flags to validator client (similar to graffiti / graffiti-file implementation).
* Added proposer preparation service to VC, which sends the fee-recipient of all known validators to the BN via [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api once per slot
* Added [/eth/v1/validator/prepare_beacon_proposer](https://github.com/ethereum/beacon-APIs/pull/178) api endpoint and preparation data caching
* Added cleanup routine to remove cached proposer preparations when not updated for 2 epochs

## Additional Info

Changed the Implementation following the discussion in #2883.



Co-authored-by: pk910 <philipp@pk910.de>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Philipp K <philipp@pk910.de>
2022-02-08 19:52:20 +00:00
Divma
36fc887a40 Gossip cache timeout adjustments (#2997)
## Proposed Changes

- Do not retry to publish sync committee messages.
- Give a more lenient timeout to slashings and exits
2022-02-07 23:25:06 +00:00
Age Manning
675c7b7e26 Correct a dial race condition (#2992)
## Issue Addressed

On a network with few nodes, it is possible that the same node can be found from a subnet discovery and a normal peer discovery at the same time.

The network behaviour loads these peers into events and processes them when it has the chance. It can happen that the same peer can enter the event queue more than once and then attempt to be dialed twice. 

This PR shifts the registration of nodes in the peerdb as being dialed before they enter the NetworkBehaviour queue, preventing multiple attempts of the same peer being entered into the queue and avoiding the race condition.
2022-02-07 23:25:05 +00:00
Divma
48b7c8685b upgrade libp2p (#2933)
## Issue Addressed

Upgrades libp2p to v.0.42.0 pre release (https://github.com/libp2p/rust-libp2p/pull/2440)
2022-02-07 23:25:03 +00:00
Divma
615695776e Retry gossipsub messages when insufficient peers (#2964)
## Issue Addressed
#2947 

## Proposed Changes

Store messages that fail to be published due to insufficient peers for retry later. Messages expire after half an epoch and are retried if gossipsub informs us that an useful peer has connected. Currently running in Atlanta

## Additional Info
If on retry sending the messages fails they will not be tried again
2022-02-03 01:12:30 +00:00
Paul Hauner
0177b9286e v2.1.2 (#2980)
## Issue Addressed

NA

## Proposed Changes

- Bump version to `v2.1.2`
- Run `cargo update`

## Additional Info

NA
2022-02-01 23:53:53 +00:00
Paul Hauner
fc37d51e10 Add checks to prevent fwding old messages (#2978)
## Issue Addressed

NA

## Proposed Changes

Checks to see if attestations or sync messages are still valid before "accepting" them for propagation.

## Additional Info

NA
2022-02-01 01:04:24 +00:00
Paul Hauner
a6da87066b Add strict penalties const bool (#2976)
## Issue Addressed

NA

## Proposed Changes

Adds `STRICT_LATE_MESSAGE_PENALTIES: bool` which allows for toggling penalties for late sync/attn messages.

`STRICT_LATE_MESSAGE_PENALTIES` is set to `false`, since we're seeing a lot of late messages on the network which are causing peer drops. We can toggle the bool during testing to try and figure out what/who is the cause of these late messages.

In effect, this PR *relaxes* peer downscoring for late attns and sync committee messages.

## Additional Info

- ~~Blocked on #2974~~
2022-02-01 01:04:22 +00:00
Mac L
286996b090 Fix small typo in error log (#2975)
## Proposed Changes

Fixes a small typo I came across.
2022-01-31 22:55:07 +00:00
Age Manning
bdd70d7aef Reduce gossip history (#2969)
The gossipsub history was increased to a good portion of a slot from 2.1 seconds in the last release.

Although it shouldn't cause too much issue, it could be related to recieving later messages than usual and interacting with our scoring system penalizing peers. For consistency, this PR reduces the time we gossip messages back to the same values of the previous release.

It also adjusts the gossipsub heartbeat time for testing purposes with a developer flag but this should not effect end users.
2022-01-31 07:29:41 +00:00
Michael Sproul
99d2c33387 Avoid looking up pre-finalization blocks (#2909)
## Issue Addressed

This PR fixes the unnecessary `WARN Single block lookup failed` messages described here:

https://github.com/sigp/lighthouse/pull/2866#issuecomment-1008442640

## Proposed Changes

Add a new cache to the `BeaconChain` that tracks the block roots of blocks from before finalization. These could be blocks from the canonical chain (which might need to be read from disk), or old pre-finalization blocks that have been forked out.

The cache also stores a set of block roots for in-progress single block lookups, which duplicates some of the information from sync's `single_block_lookups` hashmap:

a836e180f9/beacon_node/network/src/sync/manager.rs (L192-L196)

On a live node you can confirm that the cache is working by grepping logs for the message: `Rejected attestation to finalized block`.
2022-01-27 22:58:32 +00:00
Mac L
e05142b798 Add API to compute discrete validator attestation performance (#2874)
## Issue Addressed

N/A

## Proposed Changes

Add a HTTP API which can be used to compute the attestation performances of a validator (or all validators) over a discrete range of epochs.
Performances can be computed for a single validator, or for the global validator set. 

## Usage
### Request
The API can be used as follows:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/{validator_index}?start_epoch=57730&end_epoch=57732"
```
Alternatively, to compute performances for the global validator set:
```
curl "http://localhost:5052/lighthouse/analysis/attestation_performance/global?start_epoch=57730&end_epoch=57732"
```

### Response
The response is JSON formatted as follows:
```
[
  {
    "index": 72,
    "epochs": {
      "57730": {
        "active": true,
        "head": false,
        "target": false,
        "source": false
      },
      "57731": {
        "active": true,
        "head": true,
        "target": true,
        "source": true,
        "delay": 1
      },
      "57732": {
        "active": true,
        "head": true,
        "target": true,
        "source": true,
        "delay": 1
      },
    }
  }
]
```
> Note that the `"epochs"` are not guaranteed to be in ascending order. 

## Additional Info

- This API is intended to be used in our upcoming validator analysis tooling (#2873) and will likely not be very useful for regular users. Some advanced users or block explorers may find this API useful however.
- The request range is limited to 100 epochs (since the range is inclusive and it also computes the `end_epoch` it's actually 101 epochs) to prevent Lighthouse using exceptionally large amounts of memory.
2022-01-27 22:58:31 +00:00
Michael Sproul
e70daaa3b6 Implement API for block rewards (#2628)
## Proposed Changes

Add an API endpoint for retrieving detailed information about block rewards.

For information on usage see [the docs](https://github.com/sigp/lighthouse/blob/block-rewards-api/book/src/api-lighthouse.md#lighthouseblock_rewards), and the source.
2022-01-27 01:06:02 +00:00
Divma
f2b1e096b2 Code quality improvents to the network service (#2932)
Checking how to priorize the polling of the network I moved most of the service code to functions. This change I think it's worth on it's own for code quality since inside the `tokio::select` many tools don't work (cargo fmt, sometimes clippy, and sometimes even the compiler's errors get wack). This is functionally equivalent to the previous code, just better organized
2022-01-26 23:14:23 +00:00
Divma
9964f5afe5 Document why we hash downloaded blocks for both sync algs (#2927)
## Proposed Changes
Initially the idea was to remove hashing of blocks in backfill sync. After considering it more, we conclude that we need to do it in both (forward and backfill) anyway. But since we forgot why we were doing it in the first place, this PR documents this logic. 

Future us should find it useful 


Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2022-01-26 23:14:22 +00:00
Paul Hauner
5f628a71d4 v2.1.1 (#2951)
## Issue Addressed

NA

## Proposed Changes

- Bump Lighthouse version to v2.1.1
- Update `thread_local` from v1.1.3 to v1.1.4 to address https://rustsec.org/advisories/RUSTSEC-2022-0006

## Additional Info

- ~~Blocked on #2950~~
- ~~Blocked on #2952~~
2022-01-25 00:46:24 +00:00
Age Manning
ca29b580a2 Increase target subnet peers (#2948)
In the latest release we decreased the target number of subnet peers. 

It appears this could be causing issues in some cases and so reverting it back to the previous number it wise. A larger PR that follows this will address some other related discovery issues and peer management around subnet peer discovery.
2022-01-24 12:08:00 +00:00
Rishi Kumar Ray
f0f327af0c Removed all disable_forks (#2925)
#2923 

Which issue # does this PR address?
There's a redundant field on the BeaconChain called disabled_forks that was once part of our fork-aware networking (#953) but which is no longer used and could be deleted. so Removed all references to disabled_forks so that the code compiles and git grep disabled_forks returns no results.

## Proposed Changes

Please list or describe the changes introduced by this PR.
Removed all references of disabled_forks


Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2022-01-20 09:14:26 +00:00
Age Manning
fc7a1a7dc7 Allow disconnected states to introduce new peers without warning (#2922)
## Issue Addressed

We emit a warning to verify that all peer connection state information is consistent. A warning is given under one edge case;
We try to dial a peer with peer-id X and multiaddr Y. The peer responds to multiaddr Y with a different peer-id, Z. The dialing to the peer fails, but libp2p injects the failed attempt as peer-id Z. 

In this instance, our PeerDB tries to add a new peer in the disconnected state under a previously unknown peer-id. This is harmless and so this PR permits this behaviour without logging a warning.
2022-01-20 09:14:25 +00:00
Mac L
d06f87486a Support duplicate keys in HTTP API query strings (#2908)
## Issues Addressed

Closes #2739
Closes #2812

## Proposed Changes

Support the deserialization of query strings containing duplicate keys into their corresponding types.
As `warp` does not support this feature natively (as discussed in #2739), it relies on the external library [`serde_array_query`](https://github.com/sigp/serde_array_query) (written by @michaelsproul)

This is backwards compatible meaning that both of the following requests will produce the same output:
```
curl "http://localhost:5052/eth/v1/events?topics=head,block"
```
```
curl "http://localhost:5052/eth/v1/events?topics=head&topics=block"
```

## Additional Info

Certain error messages have changed slightly.  This only affects endpoints which accept multiple values.
For example:
```
{"code":400,"message":"BAD_REQUEST: invalid query: Invalid query string","stacktraces":[]}
```
is now
```
{"code":400,"message":"BAD_REQUEST: unable to parse query","stacktraces":[]}
```


The serve order of the endpoints `get_beacon_state_validators` and `get_beacon_state_validators_id` have flipped:
```rust
.or(get_beacon_state_validators_id.boxed())
.or(get_beacon_state_validators.boxed())
``` 
This is to ensure proper error messages when filter fallback occurs due to the use of the `and_then` filter.

## Future Work
- Cleanup / remove filter fallback behaviour by substituting `and_then` with `then` where appropriate.
- Add regression tests for HTTP API error messages.

## Credits
- @mooori for doing the ground work of investigating possible solutions within the existing Rust ecosystem.
- @michaelsproul for writing [`serde_array_query`](https://github.com/sigp/serde_array_query) and for helping debug the behaviour of the `warp` filter fallback leading to incorrect error messages.
2022-01-20 09:14:19 +00:00
Paul Hauner
79db2d4deb v2.1.0 (#2928)
## Issue Addressed

NA

## Proposed Changes

Bump to `v2.1.0`.

## Additional Info

NA
2022-01-20 03:39:41 +00:00
Michael Sproul
ef7351ddfe Update to spec v1.1.8 (#2893)
## Proposed Changes

Change the canonical fork name for the merge to Bellatrix. Keep other merge naming the same to avoid churn.

I've also fixed and enabled the `fork` and `transition` tests for Bellatrix, and the v1.1.7 fork choice tests.

Additionally, the `BellatrixPreset` has been added with tests. It gets served via the `/config/spec` API endpoint along with the other presets.
2022-01-19 00:24:19 +00:00
Michael Sproul
a836e180f9 Release v2.1.0-rc.1 (#2921)
## Proposed Changes

New release candidate to address Windows build failure for rc.0
2022-01-17 03:25:30 +00:00
Paul Hauner
a26b8802da Release v2.1.0-rc.0 (#2905)
## Issue Addressed

NA

## Proposed Changes

Bump version tags to `v2.1.0-rc.0`.

## Additional Info

NA
2022-01-16 23:25:25 +00:00
Paul Hauner
c11253a82f Remove grandparents from snapshot cache (#2917)
## Issue Addressed

NA

## Proposed Changes

In https://github.com/sigp/lighthouse/pull/2832 we made some changes to the `SnapshotCache` to help deal with the one-block reorgs seen on mainnet (and testnets).

I believe the change in #2832 is good and we should keep it, but I think that in its present form it is causing the `SnapshotCache` to hold onto states that it doesn't need anymore. For example, a skip slot will result in one more `BeaconSnapshot` being stored in the cache.

This PR adds a new type of pruning that happens after a block is inserted to the cache. We will remove any snapshot from the cache that is a *grandparent* of the block being imported. Since we know the grandparent has two valid blocks built atop it, it is not at risk from a one-block re-org. 

## Additional Info

NA
2022-01-14 07:20:55 +00:00
Michael Sproul
ceeab02e3a Lazy hashing for SignedBeaconBlock in sync (#2916)
## Proposed Changes

Allocate less memory in sync by hashing the `SignedBeaconBlock`s in a batch directly, rather than going via SSZ bytes.

Credit to @paulhauner for finding this source of temporary allocations.
2022-01-14 07:20:54 +00:00
Age Manning
1c667ad3ca PeerDB Status unknown bug fix (#2907)
## Issue Addressed

The PeerDB was getting out of sync with the number of disconnected peers compared to the actual count. As this value determines how many we store in our cache, over time the cache was depleting and we were removing peers immediately resulting in errors that manifest as unknown peers for some operations.

The error occurs when dialing a peer fails, we were not correctly updating the peerdb counter because the increment to the counter was placed in the wrong order and was therefore not incrementing the count. 

This PR corrects this.
2022-01-14 05:42:48 +00:00
Age Manning
6f4102aab6 Network performance tuning (#2608)
There is a pretty significant tradeoff between bandwidth and speed of gossipsub messages. 

We can reduce our bandwidth usage considerably at the cost of minimally delaying gossipsub messages. The impact of delaying messages has not been analyzed thoroughly yet, however this PR in conjunction with some gossipsub updates show considerable bandwidth reduction. 

This PR allows the user to set a CLI value (`network-load`) which is an integer in the range of 1 of 5 depending on their bandwidth appetite. 1 represents the least bandwidth but slowest message recieving and 5 represents the most bandwidth and fastest received message time. 

For low-bandwidth users it is likely to be more efficient to use a lower value. The default is set to 3, which currently represents a reduced bandwidth usage compared to previous version of this PR. The previous lighthouse versions are equivalent to setting the `network-load` CLI to 4.

This PR is awaiting a few gossipsub updates before we can get it into lighthouse.
2022-01-14 05:42:47 +00:00
Michael Sproul
e8887ffea0 Rust 1.58 lints (#2906)
## Issue Addressed

Closes #2616

## Proposed Changes

* Fixes for new Rust 1.58.0 lints
* Enable the `fn_to_numeric_cast_any` (#2616)
2022-01-13 22:39:58 +00:00
Paul Hauner
2ce2ec9b62 Remove penalty for attesting to unknown head (#2903)
## Issue Addressed

- Resolves https://github.com/sigp/lighthouse/issues/2902

## Proposed Changes

As documented in https://github.com/sigp/lighthouse/issues/2902, there are some cases where we will score peers very harshly for sending attestations to an unknown head.

This PR removes the penalty when an attestation for an unknown head is received, queued for block look-up, then popped from the queue without the head block being known. This prevents peers from being penalized for an unknown block when that peer was never actually asked for the block.

Peer penalties should still be applied to the peers who *do* get the request for the block and fail to respond with a valid block. As such, peers who send us attestations to non-existent heads should eventually be booted.

## Additional Info

- [ ] Need to confirm that a timeout for a bbroot request will incur a penalty.
2022-01-13 03:08:38 +00:00
Paul Hauner
aaa5344eab Add peer score adjustment msgs (#2901)
## Issue Addressed

N/A

## Proposed Changes

This PR adds the `msg` field to `Peer score adjusted` log messages. These `msg` fields help identify *why* a peer was banned.

Example:

```
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -27.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -100.00, peer_id: 16Uiu2HAmQskxKWWGYfginwZ51n5uDbhvjHYnvASK7PZ5gBdLmzWj, msg: attn_unknown_head, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -28.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
Jan 11 04:18:48.096 DEBG Peer score adjusted                     score: -29.86, peer_id: 16Uiu2HAmA7cCb3MemVDbK3MHZoSb7VN3cFUG3vuSZgnGesuVhPDE, msg: sync_past_slot, service: libp2p
```

There is also a `libp2p_report_peer_msgs_total` metrics which allows us to see count of reports per `msg` tag. 

## Additional Info

NA
2022-01-12 05:32:14 +00:00
Paul Hauner
61f60bdf03 Avoid penalizing peers for delays during processing (#2894)
## Issue Addressed

NA

## Proposed Changes

We have observed occasions were under-resourced nodes will receive messages that were valid *at the time*, but later become invalidated due to long waits for a `BeaconProcessor` worker.

In this PR, we will check to see if the message was valid *at the time of receipt*. If it was initially valid but invalid now, we just ignore the message without penalizing the peer.

## Additional Info

NA
2022-01-12 02:36:24 +00:00
Paul Hauner
4848e53155 Avoid peer penalties on internal errors for batch block import (#2898)
## Issue Addressed

NA

## Proposed Changes

I've observed some Prater nodes (and potentially some mainnet nodes) banning peers due to validator pubkey cache lock timeouts. For the `BeaconChainError`-type of errors, they're caused by internal faults and we can't necessarily tell if the peer is bad or not. I think this is causing us to ban peers unnecessarily when running on under-resourced machines.

## Additional Info

NA
2022-01-11 05:33:28 +00:00
Paul Hauner
02e2fd2fb8 Add early attester cache (#2872)
## Issue Addressed

NA

## Proposed Changes

Introduces a cache to attestation to produce atop blocks which will become the head, but are not fully imported (e.g., not inserted into the database).

Whilst attesting to a block before it's imported is rather easy, if we're going to produce that attestation then we also need to be able to:

1. Verify that attestation.
1. Respond to RPC requests for the `beacon_block_root`.

Attestation verification (1) is *partially* covered. Since we prime the shuffling cache before we insert the block into the early attester cache, we should be fine for all typical use-cases. However, it is possible that the cache is washed out before we've managed to insert the state into the database and then attestation verification will fail with a "missing beacon state"-type error.

Providing the block via RPC (2) is also partially covered, since we'll check the database *and* the early attester cache when responding a blocks-by-root request. However, we'll still omit the block from blocks-by-range requests (until the block lands in the DB). I *think* this is fine, since there's no guarantee that we return all blocks for those responses.

Another important consideration is whether or not the *parent* of the early attester block is available in the databse. If it were not, we might fail to respond to blocks-by-root request that are iterating backwards to collect a chain of blocks. I argue that *we will always have the parent of the early attester block in the database.* This is because we are holding the fork-choice write-lock when inserting the block into the early attester cache and we do not drop that until the block is in the database.
2022-01-11 01:35:55 +00:00
Philipp K
668477872e Allow value for beacon_node fee-recipient argument (#2884)
## Issue Addressed

The fee-recipient argument of the beacon node does not allow a value to be specified:

> $ lighthouse beacon_node --merge --fee-recipient "0x332E43696A505EF45b9319973785F837ce5267b9"
> error: Found argument '0x332E43696A505EF45b9319973785F837ce5267b9' which wasn't expected, or isn't valid in this context
> 
> USAGE:
>    lighthouse beacon_node --fee-recipient --merge
>
> For more information try --help

## Proposed Changes

Allow specifying a value for the fee-recipient argument in beacon_node/src/cli.rs

## Additional Info

I've added .takes_value(true) and successfully proposed a block in the kintsugi testnet with my own fee-recipient address instead of the hardcoded default. I think that was just missed as the argument does not make sense without a value :)


Co-authored-by: pk910 <philipp@pk910.de>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2022-01-07 01:21:42 +00:00
Paul Hauner
f6b5b1a8be Use ? debug formatting for block roots in beacon_chain.rs (#2890)
## Issue Addressed

NA

## Proposed Changes

Ensures full roots are printed, rather than shortened versions like `0x935b…d376`.

For example, it would be nice if we could do API queries based upon the roots shown in the `Beacon chain re-org` event:

```
Jan 05 12:36:52.224 WARN Beacon chain re-org                     reorg_distance: 2, new_slot: 2073184, new_head: 0x8a97…2dec, new_head_parent: 0xa985…7688, previous_slot: 2073183, previous_head: 0x935b…d376, service: beacon
Jan 05 13:35:05.832 WARN Beacon chain re-org                     reorg_distance: 1, new_slot: 2073475, new_head: 0x9207…c6b9, new_head_parent: 0xb2ce…839b, previous_slot: 2073474, previous_head: 0x8066…92f7, service: beacon
```

## Additional Info

We should eventually fix this project-wide, however this is a short-term patch.
2022-01-06 05:16:50 +00:00
Michael Sproul
fac117667b Update to superstruct v0.4.1 (#2886)
## Proposed Changes

Update `superstruct` to bring in @realbigsean's fixes necessary for MEV-compatible private beacon block types (a la #2795).

The refactoring is due to another change in superstruct that allows partial getters to be auto-generated.
2022-01-06 03:14:58 +00:00
Pawan Dhananjay
a0c5701e36 Only import blocks with valid execution payloads (#2869)
## Issue Addressed

N/A

## Proposed Changes

We are currently treating errors from the EL on `engine_executePayload` as `PayloadVerificationStatus::NotVerified`. This adds the block as a candidate head block in fork choice even if the EL explicitly rejected the block as invalid. 

`PayloadVerificationStatus::NotVerified` should be only returned when the EL explicitly returns "syncing" imo. This PR propagates an error instead of returning `NotVerified` on EL all EL errors.
2021-12-22 08:15:37 +00:00
Age Manning
81c667b58e Additional networking metrics (#2549)
Adds additional metrics for network monitoring and evaluation.


Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2021-12-22 06:17:14 +00:00
Michael Sproul
3b61ac9cbf Optimise slasher DB layout and switch to MDBX (#2776)
## Issue Addressed

Closes #2286
Closes #2538
Closes #2342

## Proposed Changes

Part II of major slasher optimisations after #2767

These changes will be backwards-incompatible due to the move to MDBX (and the schema change) 😱 

* [x] Shrink attester keys from 16 bytes to 7 bytes.
* [x] Shrink attester records from 64 bytes to 6 bytes.
* [x] Separate `DiskConfig` from regular `Config`.
* [x] Add configuration for the LRU cache size.
* [x] Add a "migration" that deletes any legacy LMDB database.
2021-12-21 08:23:17 +00:00
Michael Sproul
a290a3c537 Add configurable block replayer (#2863)
## Issue Addressed

Successor to #2431

## Proposed Changes

* Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing.
* Add a variant of the forwards state root iterator that does not require an `end_state`.
* Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
* Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431).

Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).

## Additional Info

Required by https://github.com/sigp/lighthouse/pull/2628
2021-12-21 06:30:52 +00:00
Divma
56d596ee42 Unban peers at the swarm level when purged (#2855)
## Issue Addressed
#2840
2021-12-20 23:45:21 +00:00
eklm
9be3d4ecac Downgrade AttestationStateIsFinalized error to debug (#2866)
## Issue Addressed

#2834 

## Proposed Changes

Change log message severity from error to debug in attestation verification when attestation state is finalized.
2021-12-17 07:59:46 +00:00
Divma
eee0260a68 do not count dialing peers in the connection limit (#2856)
## Issue Addressed
#2841 

## Proposed Changes
Not counting dialing peers while deciding if we have reached the target peers in case of outbound peers.

## Additional Info
Checked this running in nodes and bandwidth looks normal, peer count looks normal too
2021-12-15 05:48:45 +00:00
Michael Sproul
a43d5e161f Optimise balances cache in case of skipped slots (#2849)
## Proposed Changes

Remove the `is_first_block_in_epoch` logic from the balances cache update logic, as it was incorrect in the case of skipped slots. The updated code is simpler because regardless of whether the block is the first in the epoch we can check if an entry for the epoch boundary root already exists in the cache, and update the cache accordingly.

Additionally, to assist with flip-flopping justified epochs, move to cloning the balance cache rather than moving it. This should still be very fast in practice because the balances cache is a ~1.6MB `Vec`, and this operation is expected to only occur infrequently.
2021-12-13 23:35:57 +00:00
realbigsean
b22ac95d7f v1.1.6 Fork Choice changes (#2822)
## Issue Addressed

Resolves: https://github.com/sigp/lighthouse/issues/2741
Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller

## Proposed Changes

- Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration.
- Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db.
- updates related to https://github.com/ethereum/consensus-specs/pull/2727
  -  We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one. 
  - AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that
- I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting.
- This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730

## Additional Info
I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727

It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch.

Todo:

- [x] Fix fork choice tests
- [x] Self review
- [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727
- [x] Rebase onto Kintusgi 
- [x] Fix `num_active_validators` calculation as @michaelsproul pointed out
- [x] Clean up db migrations

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-13 20:43:22 +00:00
Pawan Dhananjay
e391b32858 Merge devnet 3 (#2859)
## Issue Addressed

N/A

## Proposed Changes

Changes required for the `merge-devnet-3`. Added some more non substantive renames on top of @realbigsean 's commit. 
Note: this doesn't include the proposer boosting changes in kintsugi v3.

This devnet isn't running with the proposer boosting fork choice changes so if we are looking to merge https://github.com/sigp/lighthouse/pull/2822 into `unstable`, then I think we should just maintain this branch for the devnet temporarily. 


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-12 09:04:21 +00:00
Lion - dapplion
2984f4b474 Remove wrong duplicated comment (#2751)
## Issue Addressed

Remove wrong duplicated comment. Comment was copied from ban_peer() but doesn't apply to unban_peer()
2021-12-06 05:34:15 +00:00
Mac L
a7a7edb6cf Optimise snapshot cache for late blocks (#2832)
## Proposed Changes

In the event of a late block, keep the block in the snapshot cache by cloning it. This helps us process new blocks quickly in the event the late block was re-org'd.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-12-06 03:41:31 +00:00
realbigsean
b5f2764bae fix cache miss justified balances calculation (#2852)
## Issue Addressed

We were calculating justified balances incorrectly on cache misses in `set_justified_checkpoint`

## Proposed Changes

Use the `get_effective_balances` method as opposed to `state.balances`, which returns exact balances


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-03 16:58:10 +00:00
realbigsean
a80ccc3a33 1.57.0 lints (#2850)
## Issue Addressed

New rust lints

## Proposed Changes

- Boxing some enum variants
- removing some unused fields (is the validator lockfile unused? seemed so to me)

## Additional Info

- some error fields were marked as dead code but are logged out in areas
- left some dead fields in our ef test code because I assume they are useful for debugging?

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-03 04:44:30 +00:00
Pawan Dhananjay
f3c237cfa0
Restrict network limits based on merge fork epoch (#2839) 2021-12-02 14:32:31 +11:00
Paul Hauner
144978f8f8
Remove duplicate slot_clock method (#2842) 2021-12-02 14:29:59 +11:00
Paul Hauner
94385fe17b
Support legacy data directories (#2846) 2021-12-02 14:29:59 +11:00
Paul Hauner
ab86b42874
Kintsugi Diva comments (#2836)
* Remove TODOs

* Fix typo
2021-12-02 14:29:59 +11:00
ethDreamer
c2f2813385
Cleanup Comments & Fix get_pow_block_hash_at_ttd() (#2835) 2021-12-02 14:29:59 +11:00
Paul Hauner
1b56ebf85e
Kintsugi review comments (#2831)
* Fix makefile

* Return on invalid finalized block

* Fix todo in gossip scoring

* Require --merge for --fee-recipient

* Bump eth2_serde_utils

* Change schema versions

* Swap hash/uint256 test_random impls

* Use default for ExecutionPayload::empty

* Check for DBs before removing

* Remove kintsugi docker image

* Fix CLI default value
2021-12-02 14:29:59 +11:00
Paul Hauner
82a81524e3
Bump crate versions (#2829) 2021-12-02 14:29:57 +11:00
ethDreamer
f6748537db
Removed PowBlock struct that never got used (#2813) 2021-12-02 14:29:20 +11:00
Paul Hauner
5f0fef2d1e
Kintsugi on_merge_block tests (#2811)
* Start v1.1.5 updates

* Implement new payload creation logic

* Tidy, add comments

* Remove unused error enums

* Add validate payload for gossip

* Refactor validate_merge_block

* Split payload verification in per block processing

* Add execute_payload

* Tidy

* Tidy

* Start working on new fork choice tests

* Fix failing merge block test

* Skip block_lookup_failed test

* Fix failing terminal block test

* Fixes from self-review

* Address review comments
2021-12-02 14:29:20 +11:00
pawan
44a7b37ce3
Increase network limits (#2796)
Fix max packet sizes

Fix max_payload_size function

Add merge block test

Fix max size calculation; fix up test

Clear comments

Add a payload_size_function

Use safe arith for payload calculation

Return an error if block too big in block production

Separate test to check if block is over limit
2021-12-02 14:29:20 +11:00
Paul Hauner
afe59afacd
Ensure difficulty/hash/epoch overrides change the ChainSpec (#2798)
* Unify loading of eth2_network_config

* Apply overrides at lighthouse binary level

* Remove duplicate override values

* Add merge values to existing net configs

* Make override flags global

* Add merge fields to testing config

* Add one to TTD

* Fix failing engine tests

* Fix test compile error

* Remove TTD flags

* Move get_eth2_network_config

* Fix warn

* Address review comments
2021-12-02 14:29:18 +11:00
Paul Hauner
47db682d7e
Implement engine API v1.0.0-alpha.4 (#2810)
* Added ForkchoiceUpdatedV1 & GetPayloadV1

* Added ExecutePayloadV1

* Added new geth test vectors

* Separated Json Object/Serialization Code into file

* Deleted code/tests for Requests Removed from spec

* Finally fixed serialization of null '0x'

* Made Naming of JSON Structs Consistent

* Fix clippy lints

* Remove u64 payload id

* Remove unused serde impls

* Swap to [u8; 8] for payload id

* Tidy

* Adjust some block gen return vals

* Tidy

* Add fallback when payload id is unknown

* Remove comment

Co-authored-by: Mark Mackey <mark@sigmaprime.io>
2021-12-02 14:26:55 +11:00
Paul Hauner
cdfd1304a5
Skip memory intensive engine test (#2809)
* Allocate less memory (3GB) in engine tests

* Run cargo format

* Remove tx too large test

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-12-02 14:26:55 +11:00
Paul Hauner
cbd2201164
Fixes after rebasing Kintsugi onto unstable (#2799)
* Fix fork choice after rebase

* Remove paulhauner warp dep

* Fix fork choice test compile errors

* Assume fork choice payloads are valid

* Add comment

* Ignore new tests

* Fix error in test skipping
2021-12-02 14:26:55 +11:00
Pawan Dhananjay
24966c059d
Fix Uint256 deserialization (#2786)
* Change base_fee_per_gas to Uint256

* Add custom (de)serialization to ExecutionPayload

* Fix errors

* Add a quoted_u256 module

* Remove unused function

* lint

* Add test

* Remove extra line

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-02 14:26:55 +11:00
realbigsean
de49c7ddaa
1.1.5 merge spec tests (#2781)
* Fix arbitrary check kintsugi

* Add merge chain spec fields, and a function to determine which constant to use based on the state variant

* increment spec test version

* Remove `Transaction` enum wrapper

* Remove Transaction new-type

* Remove gas validations

* Add `--terminal-block-hash-epoch-override` flag

* Increment spec tests version to 1.1.5

* Remove extraneous gossip verification https://github.com/ethereum/consensus-specs/pull/2687

* - Remove unused Error variants
- Require both "terminal-block-hash-epoch-override" and "terminal-block-hash-override" when either flag is used

* - Remove a couple more unused Error variants

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-02 14:26:55 +11:00
Paul Hauner
86e0c56a38
Kintsugi rebase patches (#2769)
* Freshen Cargo.lock

* Fix gossip worker

* Update map_fork_name_with
2021-12-02 14:26:54 +11:00
Paul Hauner
6b4cc63b57
Accept TTD override as decimal (#2676) 2021-12-02 14:26:54 +11:00
realbigsean
d8eec16c5e
v1.1.1 spec updates (#2684)
* update initializing from eth1 for merge genesis

* read execution payload header from file lcli

* add `create-payload-header` command to `lcli`

* fix base fee parsing

* Apply suggestions from code review

* default `execution_payload_header` bool to false when deserializing `meta.yml` in EF tests

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-12-02 14:26:54 +11:00
Paul Hauner
6dde12f311
[Merge] Optimistic Sync: Stage 1 (#2686)
* Add payload verification status to fork choice

* Pass payload verification status to import_block

* Add valid back-propagation

* Add head safety status latch to API

* Remove ExecutionLayerStatus

* Add execution info to client notifier

* Update notifier logs

* Change use of "hash" to refer to beacon block

* Shutdown on invalid finalized block

* Tidy, add comments

* Fix failing FC tests

* Allow blocks with unsafe head

* Fix forkchoiceUpdate call on startup
2021-12-02 14:26:54 +11:00
Pawan Dhananjay
aa1d57aa55
Fix db paths when datadir is relative (#2682) 2021-12-02 14:26:53 +11:00
Paul Hauner
67a6f91df6
[Merge] Optimistic EL verification (#2683)
* Ignore payload errors

* Only return payload handle on valid response

* Push some engine logs down to debug

* Push ee fork choice log to debug

* Push engine call failure to debug

* Push some more errors to debug

* Fix panic at startup
2021-12-02 14:26:53 +11:00
Paul Hauner
35350dff75
[Merge] Block validator duties when EL is not ready (#2672)
* Reject some HTTP endpoints when EL is not ready

* Restrict more endpoints

* Add watchdog task

* Change scheduling

* Update to new schedule

* Add "syncing" concept

* Remove RequireSynced

* Add is_merge_complete to head_info

* Cache latest_head in Engines

* Call consensus_forkchoiceUpdate on startup
2021-12-02 14:26:53 +11:00
Paul Hauner
d6fda44620
Disable notifier logging from dummy eth1 backend (#2680) 2021-12-02 14:26:53 +11:00
ethDreamer
52e5083502
Fixed bugs for m3 readiness (#2669)
* Fixed bugs for m3 readiness

* woops

* cargo fmt..
2021-12-02 14:26:53 +11:00
Paul Hauner
b162b067de
Misc changes for merge testnets (#2667)
* Thread eth1_block_hash into interop genesis state

* Add merge-fork-epoch flag

* Build LH with minimal spec by default

* Add verbose logs to execution_layer

* Add --http-allow-sync-stalled flag

* Update lcli new-testnet to create genesis state

* Fix http test

* Fix compile errors in tests
2021-12-02 14:26:52 +11:00
Paul Hauner
a1033a9247
Add BeaconChainHarness tests for The Merge (#2661)
* Start adding merge tests

* Expose MockExecutionLayer

* Add mock_execution_layer to BeaconChainHarness

* Progress with merge test

* Return more detailed errors with gas limit issues

* Use a better gas limit in block gen

* Ensure TTD is met in block gen

* Fix basic_merge tests

* Start geth testing

* Fix conflicts after rebase

* Remove geth tests

* Improve merge test

* Address clippy lints

* Make pow block gen a pure function

* Add working new test, breaking existing test

* Fix test names

* Add should_panic

* Don't run merge tests in debug

* Detect a tokio runtime when starting MockServer

* Fix clippy lint, include merge tests
2021-12-02 14:26:52 +11:00
Paul Hauner
801f6f7425
Disable autotests for beacon_chain (#2658) 2021-12-02 14:26:52 +11:00
Paul Hauner
01031931d9
[Merge] Add execution API test vectors from Geth (#2651)
* Add geth request vectors

* Add geth response vectors

* Fix clippy lints
2021-12-02 14:26:52 +11:00
Paul Hauner
20ca7a56ed
[Merge] Add serde impls for Transactions type (#2649)
* Start implemented serde for transactions

* Revise serde impl

* Add tests for transaction decoding
2021-12-02 14:26:51 +11:00
Paul Hauner
d8623cfc4f
[Merge] Implement execution_layer (#2635)
* Checkout serde_utils from rayonism

* Make eth1::http functions pub

* Add bones of execution_layer

* Modify decoding

* Expose Transaction, cargo fmt

* Add executePayload

* Add all minimal spec endpoints

* Start adding json rpc wrapper

* Finish custom JSON response handler

* Switch to new rpc sending method

* Add first test

* Fix camelCase

* Finish adding tests

* Begin threading execution layer into BeaconChain

* Fix clippy lints

* Fix clippy lints

* Thread execution layer into ClientBuilder

* Add CLI flags

* Add block processing methods to ExecutionLayer

* Add block_on to execution_layer

* Integrate execute_payload

* Add extra_data field

* Begin implementing payload handle

* Send consensus valid/invalid messages

* Fix minor type in task_executor

* Call forkchoiceUpdated

* Add search for TTD block

* Thread TTD into execution layer

* Allow producing block with execution payload

* Add LRU cache for execution blocks

* Remove duplicate 0x on ssz_types serialization

* Add tests for block getter methods

* Add basic block generator impl

* Add is_valid_terminal_block to EL

* Verify merge block in block_verification

* Partially implement --terminal-block-hash-override

* Add terminal_block_hash to ChainSpec

* Remove Option from terminal_block_hash in EL

* Revert merge changes to consensus/fork_choice

* Remove commented-out code

* Add bones for handling RPC methods on test server

* Add first ExecutionLayer tests

* Add testing for finding terminal block

* Prevent infinite loops

* Add insert_merge_block to block gen

* Add block gen test for pos blocks

* Start adding payloads to block gen

* Fix clippy lints

* Add execution payload to block gen

* Add execute_payload to block_gen

* Refactor block gen

* Add all routes to mock server

* Use Uint256 for base_fee_per_gas

* Add working execution chain build

* Remove unused var

* Revert "Use Uint256 for base_fee_per_gas"

This reverts commit 6c88f19ac45db834dd4dbf7a3c6e7242c1c0f735.

* Fix base_fee_for_gas Uint256

* Update execute payload handle

* Improve testing, fix bugs

* Fix default fee-recipient

* Fix fee-recipient address (again)

* Add check for terminal block, add comments, tidy

* Apply suggestions from code review

Co-authored-by: realbigsean <seananderson33@GMAIL.com>

* Fix is_none on handle Drop

* Remove commented-out tests

Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2021-12-02 14:26:51 +11:00
ethDreamer
1563bce905
Finished Gossip Block Validation Conditions (#2640)
* Gossip Block Validation is Much More Efficient

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-12-02 14:26:51 +11:00
realbigsean
aa534f8989
Store execution block hash in fork choice (#2643)
* - Update the fork choice `ProtoNode` to include `is_merge_complete`
- Add database migration for the persisted fork choice

* update tests

* Small cleanup

* lints

* store execution block hash in fork choice rather than bool
2021-12-02 14:26:51 +11:00
Paul Hauner
c10e8ce955
Fix clippy lints on merge-f2f (#2626)
* Remove unchecked arith from ssz_derive

* Address clippy lints in block_verfication

* Use safe math for is_valid_gas_limit
2021-12-02 14:26:50 +11:00
Mark Mackey
5687c56d51
Initial merge changes
Added Execution Payload from Rayonism Fork

Updated new Containers to match Merge Spec

Updated BeaconBlockBody for Merge Spec

Completed updating BeaconState and BeaconBlockBody

Modified ExecutionPayload<T> to use Transaction<T>

Mostly Finished Changes for beacon-chain.md

Added some things for fork-choice.md

Update to match new fork-choice.md/fork.md changes

ran cargo fmt

Added Missing Pieces in eth2_libp2p for Merge

fix ef test

Various Changes to Conform Closer to Merge Spec
2021-12-02 14:26:50 +11:00
Mac L
fe75a0a9a1 Add background file logging (#2762)
## Issue Addressed

Closes #1996 

## Proposed Changes

Run a second `Logger` via `sloggers` which logs to a file in the background with:
- separate `debug-level` for background and terminal logging
- the ability to limit log size
- rotation through a customizable number of log files
- an option to compress old log files (`.gz` format)

Add the following new CLI flags:
- `--logfile-debug-level`: The debug level of the log files
- `--logfile-max-size`: The maximum size of each log file
- `--logfile-max-number`: The number of old log files to store
- `--logfile-compress`: Whether to compress old log files

By default background logging uses the `debug` log level and saves logfiles to:
- Beacon Node:  `$HOME/.lighthouse/$network/beacon/logs/beacon.log`
- Validator Client:  `$HOME/.lighthouse/$network/validators/logs/validator.log`

Or, when using the `--datadir` flag:
`$datadir/beacon/logs/beacon.log` and `$datadir/validators/logs/validator.log`

Once rotated, old logs are stored like so: `beacon.log.1`, `beacon.log.2` etc. 
> Note: `beacon.log.1` is always newer than `beacon.log.2`.

## Additional Info

Currently the default value of `--logfile-max-size` is 200 (MB) and `--logfile-max-number` is 5.
This means that the maximum storage space that the logs will take up by default is 1.2GB. 
(200MB x 5 from old log files + <200MB the current logfile being written to)
Happy to adjust these default values to whatever people think is appropriate. 

It's also worth noting that when logging to a file, we lose our custom `slog` formatting. This means the logfile logs look like this:
```
Oct 27 16:02:50.305 INFO Lighthouse started, version: Lighthouse/v2.0.1-8edd9d4+, module: lighthouse:413
Oct 27 16:02:50.305 INFO Configured for network, name: prater, module: lighthouse:414
```
2021-11-30 03:25:32 +00:00
Age Manning
6625aa4afe Status'd Peer Not Found (#2761)
## Issue Addressed

Users are experiencing `Status'd peer not found` errors

## Proposed Changes

Although I cannot reproduce this error, this is only one connection state change that is not addressed in the peer manager (that I could see). The error occurs because the number of disconnected peers in the peerdb becomes out of sync with the actual number of disconnected peers. From what I can tell almost all possible connection state changes are handled, except for the case when a disconnected peer changes to be disconnecting. This can potentially happen at the peer connection limit, where a previously connected peer switches to disconnecting. 

This PR decrements the disconnected counter when this event occurs and from what I can tell, covers all possible disconnection state changes in the peer manager.
2021-11-28 22:46:17 +00:00
Divma
413b0b5b2b Correctly update range status when outdated chains are removed (#2827)
We were batch removing chains when purging, and then updating the status of the collection for each of those. This makes the range status be out of sync with the real status. This represented no harm to the global sync status, but I've changed it to comply with a correct debug assertion that I got triggered while doing some testing.
Also added tests and improved code quality as per @paulhauner 's suggestions.
2021-11-26 01:13:49 +00:00
Pawan Dhananjay
9eedb6b888 Allow additional subnet peers (#2823)
## Issue Addressed

N/A

## Proposed Changes

1. Don't disconnect peer from dht on connection limit errors
2. Bump up `PRIORITY_PEER_EXCESS` to allow for dialing upto 60 peers by default.



Co-authored-by: Diva M <divma@protonmail.com>
2021-11-25 21:27:08 +00:00
Michael Sproul
2c07a72980 Revert peer DB changes from #2724 (#2828)
## Proposed Changes

This reverts commit 53562010ec from PR #2724

Hopefully this will restore the reliability of the sync simulator.
2021-11-25 03:45:52 +00:00
Age Manning
0b319d4926 Inform dialing via the behaviour (#2814)
I had this change but it seems to have been lost in chaos of network upgrades.

The swarm dialing event seems to miss some cases where we dial via the behaviour. This causes an error to be logged as the peer manager doesn't know about some dialing events. 

This shifts the logic to the behaviour to inform the peer manager.
2021-11-19 04:42:33 +00:00
Divma
53562010ec Move peer db writes to eth2 libp2p (#2724)
## Issue Addressed
Part of a bigger effort to make the network globals read only. This moves all writes to the `PeerDB` to the `eth2_libp2p` crate. Limiting writes to the peer manager is a slightly more complicated issue for a next PR, to keep things reviewable.

## Proposed Changes
- Make the peers field in the globals a private field.
- Allow mutable access to the peers field to `eth2_libp2p` for now.
- Add a new network message to update the sync state.

Co-authored-by: Age Manning <Age@AgeManning.com>
2021-11-19 04:42:31 +00:00
Divma
31386277c3 Sync wrong dbg assertion (#2821)
## Issue Addressed

Running a beacon node I triggered a sync debug panic. And so finally the time to create tests for sync arrived. Fortunately, te bug was not in the sync algorithm itself but a wrong assertion

## Proposed Changes

- Split Range's impl from the BeaconChain via a trait. This is needed for testing. The TestingRig/Harness is way bigger than needed and does not provide the modification functionalities that are needed to test sync. I find this simpler, tho some could disagree.
- Add a regression test for sync that fails before the changes.
- Fix the wrong assertion.
2021-11-19 02:38:25 +00:00
Age Manning
e519af9012 Update Lighthouse Dependencies (#2818)
## Issue Addressed

Updates lighthouse dependencies to resolve audit issues in out-dated deps.
2021-11-18 05:08:42 +00:00
Pawan Dhananjay
e32c09bfda Fix decoding max length (#2816)
## Issue Addressed

N/A

## Proposed Changes

Fix encoder max length to the correct value (`MAX_RPC_SIZE`).
2021-11-16 22:23:39 +00:00
Age Manning
a43a2448b7 Investigate and correct RPC Response Timeouts (#2804)
RPC Responses are for some reason not removing their timeout when they are completing. 

As an example:

```
Nov 09 01:18:20.256 DEBG Received BlocksByRange Request          step: 1, start_slot: 728465, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:20.263 DEBG Received BlocksByRange Request          step: 1, start_slot: 728593, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:20.483 DEBG BlocksByRange Response sent             returned: 63, requested: 64, current_slot: 2466389, start_slot: 728465, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:20.500 DEBG BlocksByRange Response sent             returned: 64, requested: 64, current_slot: 2466389, start_slot: 728593, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:21.068 DEBG Received BlocksByRange Request          step: 1, start_slot: 728529, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:21.272 DEBG BlocksByRange Response sent             returned: 63, requested: 64, current_slot: 2466389, start_slot: 728529, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:23.434 DEBG Received BlocksByRange Request          step: 1, start_slot: 728657, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:23.665 DEBG BlocksByRange Response sent             returned: 64, requested: 64, current_slot: 2466390, start_slot: 728657, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:25.851 DEBG Received BlocksByRange Request          step: 1, start_slot: 728337, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:25.851 DEBG Received BlocksByRange Request          step: 1, start_slot: 728401, count: 64, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:26.094 DEBG BlocksByRange Response sent             returned: 62, requested: 64, current_slot: 2466390, start_slot: 728401, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:26.100 DEBG BlocksByRange Response sent             returned: 63, requested: 64, current_slot: 2466390, start_slot: 728337, msg: Failed to return all requested blocks, peer: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw
Nov 09 01:18:31.070 DEBG RPC Error                               direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p
Nov 09 01:18:31.070 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p
Nov 09 01:18:31.085 DEBG RPC Error                               direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p
Nov 09 01:18:31.085 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p
Nov 09 01:18:31.459 DEBG RPC Error                               direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p
Nov 09 01:18:31.459 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p
Nov 09 01:18:34.129 DEBG RPC Error                               direction: Incoming, score: 0, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, client: Prysm: version: a80b1c252a9b4773493b41999769bf3134ac373f, os_version: unknown, err: Stream Timeout, protocol: beacon_blocks_by_range, service: libp2p
Nov 09 01:18:34.130 WARN Timed out to a peer's request. Likely insufficient resources, reduce peer count, service: libp2p
Nov 09 01:18:35.686 DEBG Peer Manager disconnecting peer         reason: Too many peers, peer_id: 16Uiu2HAmEmBURejquBUMgKAqxViNoPnSptTWLA2CfgSPnnKENBNw, service: libp2p
```

This PR is to investigate and correct the issue. 

~~My current thoughts are that for some reason we are not closing the streams correctly, or fast enough, or the executor is not registering the closes and waking up.~~ - Pretty sure this is not the case, see message below for a more accurate reason.

~~I've currently added a timeout to stream closures in an attempt to force streams to close and the future to always complete.~~ I removed this
2021-11-16 03:42:25 +00:00
Paul Hauner
931daa40d7 Add fork choice EF tests (#2737)
## Issue Addressed

Resolves #2545

## Proposed Changes

Adds the long-overdue EF tests for fork choice. Although we had pretty good coverage via other implementations that closely followed our approach, it is nonetheless important for us to implement these tests too.

During testing I found that we were using a hard-coded `SAFE_SLOTS_TO_UPDATE_JUSTIFIED` value rather than one from the `ChainSpec`. This caused a failure during a minimal preset test. This doesn't represent a risk to mainnet or testnets, since the hard-coded value matched the mainnet preset.

## Failing Cases

There is one failing case which is presently marked as `SkippedKnownFailure`:

```
case 4 ("new_finalized_slot_is_justified_checkpoint_ancestor") from /home/paul/development/lighthouse/testing/ef_tests/consensus-spec-tests/tests/minimal/phase0/fork_choice/on_block/pyspec_tests/new_finalized_slot_is_justified_checkpoint_ancestor failed with NotEqual:
head check failed: Got Head { slot: Slot(40), root: 0x9183dbaed4191a862bd307d476e687277fc08469fc38618699863333487703e7 } | Expected Head { slot: Slot(24), root: 0x105b49b51bf7103c182aa58860b039550a89c05a4675992e2af703bd02c84570 }
```

This failure is due to #2741. It's not a particularly high priority issue at the moment, so we fix it after merging this PR.
2021-11-08 07:29:04 +00:00
Divma
fbafe416d1 Move the peer manager to be a behaviour (#2773)
This simply moves some functions that were "swarm notifications" to a network behaviour implementation.

Notes
------
- We could disconnect from the peer manager but we would lose the rpc shutdown message
- We still notify from the swarm since this is the most reliable way to get some events. Ugly but best for now
- Events need to be pushed with "add event" to wake the waker

Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2021-11-08 00:01:10 +00:00
Michael Sproul
df02639b71 De-duplicate attestations in the slasher (#2767)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/2112
Closes https://github.com/sigp/lighthouse/issues/1861

## Proposed Changes

Collect attestations by validator index in the slasher, and use the magic of reference counting to automatically discard redundant attestations. This results in us storing only 1-2% of the attestations observed when subscribed to all subnets, which carries over to a 50-100x reduction in data stored 🎉 

## Additional Info

There's some nuance to the configuration of the `slot-offset`. It has a profound effect on the effictiveness of de-duplication, see the docs added to the book for an explanation: 5442e695e5/book/src/slasher.md (slot-offset)
2021-11-08 00:01:09 +00:00
Divma
a683e0296a Peer manager cfg (#2766)
## Issue Addressed
I've done this change in a couple of WIPs already so I might as well submit it on its own. This changes no functionality but reduces coupling in a 0.0001%. It also helps new people who need to work in the peer manager to better understand what it actually needs from the outside

## Proposed Changes

Add a config to the peer manager
2021-11-03 23:44:44 +00:00
Divma
7502970a7d Do not compute metrics in the network service if the cli flag is not set (#2765)
## Issue Addressed

The computation of metrics in the network service can be expensive. This disables the computation unless the cli flag `metrics` is set.

## Additional Info
Metrics in other parts of the network are still updated, since most are simple metrics and checking if metrics are enabled each time each metric is updated doesn't seem like a gain.
2021-11-03 00:06:03 +00:00
realbigsean
c4ad0e3fb3 Ensure dependent root consistency in head events (#2753)
## Issue Addressed

@paulhauner noticed that when we send head events, we use the block root from `new_head` in `fork_choice_internal`, but calculate `dependent_root` and `previous_dependent_root` using the `canonical_head`. This is normally fine because `new_head` updates the `canonical_head` in `fork_choice_internal`, but it's possible we have a reorg updating `canonical_head` before our head events are sent. So this PR ensures `dependent_root` and `previous_dependent_root` are always derived from the state associated with `new_head`.



Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-11-02 02:26:32 +00:00
Pawan Dhananjay
4499adc7fd Check proposer index during block production (#2740)
## Issue Addressed

Resolves #2612 

## Proposed Changes

Implements both the checks mentioned in the original issue. 
1. Verifies the `randao_reveal` in the beacon node
2. Cross checks the proposer index after getting back the block from the beacon node.

## Additional info
The block production time increases by ~10x because of the signature verification on the beacon node (based on the `beacon_block_production_process_seconds` metric) when running on a local testnet.
2021-11-01 07:44:40 +00:00
Michael Sproul
ffb04e1a9e Add op pool metrics for attestations (#2758)
## Proposed Changes

Add several metrics for the number of attestations in the op pool. These give us a way to observe the number of valid, non-trivial attestations during block packing rather than just the size of the entire op pool.
2021-11-01 05:52:31 +00:00
Divma
e2c0650d16 Relax late sync committee penalty (#2752)
## Issue Addressed

Getting too many peers kicked due to slightly late sync committee messages as tested on.. under-performant hardware.

## Proposed Changes

Only penalize if the message is more than one slot late. Still ignore the message-

Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
2021-10-31 22:30:19 +00:00
Age Manning
1790010260 Upgrade to latest libp2p (#2605)
This is a pre-cursor to the next libp2p upgrade. 

It is currently being used for staging a number of PR upgrades which are contingent on the latest libp2p.
2021-10-29 01:59:29 +00:00
ethDreamer
2c4413454a Fixed Gossip Topics on Fork Boundary (#2619)
## Issue Addressed

The [p2p-interface section of the `altair` spec](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/p2p-interface.md#transitioning-the-gossip) says you should subscribe to the topics for a fork "In advance of the fork" and unsubscribe from old topics `2 Epochs` after the new fork is activated. We've chosen to subscribe to new fork topics `2 slots` before the fork is initiated.

This function is supposed to return the required fork digests at any given time but as it was currently written, it doesn't return the fork digest for a previous fork if you've switched to the current fork less than 2 epoch's ago. Also this function required modification for every new fork we add.

## Proposed Changes

Make this function fork-agnostic and correctly handle the previous fork topic digests when you've only just switched to the new fork.
2021-10-29 00:05:27 +00:00
Pawan Dhananjay
88063398f6 Prevent double import of blocks (#2647)
## Issue Addressed

Resolves #2611 

## Proposed Changes

Adds a duplicate block root cache to the `BeaconProcessor`. Adds the block root to the cache before calling `process_gossip_block` and `process_rpc_block`. Since `process_rpc_block` is called only for single block lookups, we don't have to worry about batched block imports.

The block is imported from the source(gossip/rpc) that arrives first. The block that arrives second is not imported to avoid the db access issue.
There are 2 cases:
1. Block that arrives second is from rpc: In this case, we return an optimistic `BlockError::BlockIsAlreadyKnown` to sync.
2. Block that arrives second is from gossip: In this case, we only do gossip verification and forwarding but don't import the block into the the beacon chain.

## Additional info
Splits up `process_gossip_block` function to `process_gossip_unverified_block` and `process_gossip_verified_block`.
2021-10-28 03:36:14 +00:00
Michael Sproul
2dc6163043 Add API version headers and map_fork_name! (#2745)
## Proposed Changes

* Add the `Eth-Consensus-Version` header to the HTTP API for the block and state endpoints. This is part of the v2.1.0 API that was recently released: https://github.com/ethereum/beacon-APIs/pull/170
* Add tests for the above. I refactored the `eth2` crate's helper functions to make this more straight-forward, and introduced some new mixin traits that I think greatly improve readability and flexibility.
* Add a new `map_with_fork!` macro which is useful for decoding a superstruct type without naming all its variants. It is now used for SSZ-decoding `BeaconBlock` and `BeaconState`, and for JSON-decoding `SignedBeaconBlock` in the API.

## Additional Info

The `map_with_fork!` changes will conflict with the Merge changes, but when resolving the conflict the changes from this branch should be preferred (it is no longer necessary to enumerate every fork). The merge fork _will_  need to be added to `map_fork_name_with`.
2021-10-28 01:18:04 +00:00
Mac L
8edd9d45ab Fix purge-db edge case (#2747)
## Issue Addressed

Currently, if you launch the beacon node with the `--purge-db` flag and the `beacon` directory exists, but one (or both) of the `chain_db` or `freezer-db` directories are missing, it will error unnecessarily with: 
```
Failed to remove chain_db: No such file or directory (os error 2)
```

This is an edge case which can occur in cases of manual intervention (a user deleted the directory) or if you had previously run with the `--purge-db` flag and Lighthouse errored before it could initialize the db directories.

## Proposed Changes

Check if the `chain_db`/`freezer_db` exists before attempting to remove them. This prevents unnecessary errors.
2021-10-25 22:11:28 +00:00
Divma
d4819bfd42 Add a waker to the RPC handler (#2721)
## Issue Addressed

Attempts to fix #2701 but I doubt this is the reason behind that.

## Proposed Changes

maintain a waker in the rpc handler and call it if an event is received
2021-10-21 06:14:36 +00:00
Pawan Dhananjay
de34001e78 Update next_fork_subscriptions correctly (#2688)
## Issue Addressed

N/A

## Proposed Changes

Update the `next_fork_subscriptions` timer only after a fork happens.
2021-10-21 04:38:44 +00:00
divma
99f7a7db58 remove double backfill sync state (#2733)
## Issue Addressed
In the backfill sync the state was maintained twice, once locally and also in the globals. This makes it so that it's maintained only once.

The only behavioral change is that when backfill sync in paused, the global backfill state is updated. I asked @AgeManning about this and he deemed it a bug, so this solves it.
2021-10-19 22:32:25 +00:00
Michael Sproul
aad397f00a Resolve Rust 1.56 lints and warnings (#2728)
## Issue Addressed

When compiling with Rust 1.56.0 the compiler generates 3 instances of this warning:

```
warning: trailing semicolon in macro used in expression position
   --> common/eth2_network_config/src/lib.rs:181:24
    |
181 |                     })?;
    |                        ^
...
195 |         let deposit_contract_deploy_block = load_from_file!(DEPLOY_BLOCK_FILE);
    |                                             ---------------------------------- in this macro invocation
    |
    = note: `#[warn(semicolon_in_expressions_from_macros)]` on by default
    = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
    = note: for more information, see issue #79813 <https://github.com/rust-lang/rust/issues/79813>
    = note: this warning originates in the macro `load_from_file` (in Nightly builds, run with -Z macro-backtrace for more info)
```

This warning is completely harmless, but will be visible to users compiling Lighthouse v2.0.1 (or earlier) with Rust 1.56.0 (to be released October 21st). It is **completely safe** to ignore this warning, it's just a superficial change to Rust's syntax.

## Proposed Changes

This PR removes the semi-colon as recommended, and fixes the new Clippy lints from 1.56.0
2021-10-19 00:30:42 +00:00
Akihito Nakano
efec60ee90 Tiny fix: wrong log level (#2720)
## Proposed Changes

If the `RemoveChain` is critical log level should be crit. 🙂
2021-10-19 00:30:41 +00:00
Michael Sproul
d2e3d4c6f1 Add flag to disable lock timeouts (#2714)
## Issue Addressed

Mitigates #1096

## Proposed Changes

Add a flag to the beacon node called `--disable-lock-timeouts` which allows opting out of lock timeouts.

The lock timeouts serve a dual purpose:

1. They prevent any single operation from hogging the lock for too long. When a timeout occurs it logs a nasty error which indicates that there's suboptimal lock use occurring, which we can then act on.
2. They allow deadlock detection. We're fairly sure there are no deadlocks left in Lighthouse anymore but the timeout locks offer a safeguard against that.

However, timeouts on locks are not without downsides:

They allow for the possibility of livelock, particularly on slower hardware. If lock timeouts keep failing spuriously the node can be prevented from making any progress, even if it would be able to make progress slowly without the timeout. One particularly concerning scenario which could occur would be if a DoS attack succeeded in slowing block signature verification times across the network, and all Lighthouse nodes got livelocked because they timed out repeatedly. This could also occur on just a subset of nodes (e.g. dual core VPSs or Raspberri Pis).

By making the behaviour runtime configurable this PR allows us to choose the behaviour we want depending on circumstance. I suspect that long term we could make the timeout-free approach the default (#2381 moves in this direction) and just enable the timeouts on our testnet nodes for debugging purposes. This PR conservatively leaves the default as-is so we can gain some more experience before switching the default.
2021-10-19 00:30:40 +00:00
Age Manning
df40700ddd Rename eth2_libp2p to lighthouse_network (#2702)
## Description

The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration. 

As of this writing it currently houses the following high-level lighthouse-specific logic:
- Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings
- Integration and handling of ENRs with respect to libp2p and eth2
- Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers. 
- Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc.
- Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api.
- Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks.
- Lighthouse specific types for managing gossipsub topics, sync status and ENR fields
- Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring
- Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p. 

Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network`

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-10-19 00:30:39 +00:00
Paul Hauner
fff01b24dd Release v2.0.1 (#2726)
## Issue Addressed

NA

## Proposed Changes

- Update versions to `v2.0.1` in anticipation for a release early next week.
- Add `--ignore` to `cargo audit`. See #2727.

## Additional Info

NA
2021-10-18 03:08:32 +00:00
Age Manning
180c90bf6d Correct peer connection transition logic (#2725)
## Description

This PR updates the peer connection transition logic. It is acceptable for a peer to immediately transition from a disconnected state to a disconnecting state. This can occur when we are at our peer limit and a new peer's dial us.
2021-10-17 04:04:36 +00:00
Paul Hauner
a7b675460d Add Altair tests to op pool (#2723)
## Issue Addressed

NA

## Proposed Changes

Adds some more testing for Altair to the op pool. Credits to @michaelsproul for some appropriated efforts here.

## Additional Info

NA


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-10-16 05:07:23 +00:00
Michael Sproul
5cde3fc4da Reduce lock contention in backfill sync (#2716)
## Proposed Changes

Clone the proposer pubkeys during backfill signature verification to reduce the time that the pubkey cache lock is held for. Cloning such a small number of pubkeys has negligible impact on the total running time, but greatly reduces lock contention.

On a Ryzen 5950X, the setup step seems to take around 180us regardless of whether the key is cloned or not, while the verification takes 7ms. When Lighthouse is limited to 10% of one core using `sudo cpulimit --pid <pid> --limit 10` the total time jumps up to 800ms, but the setup step remains only 250us. This means that under heavy load this PR could cut the time the lock is held for from 800ms to 250us, which is a huge saving of 99.97%!
2021-10-15 03:28:03 +00:00
Paul Hauner
9c5a8ab7f2 Change "too many resources" to "insufficient resources" in eth2_libp2p (#2713)
## Issue Addressed

NA

## Proposed Changes

Fixes what I assume is a typo in a log message. See the diff for details.

## Additional Info

NA
2021-10-15 00:07:12 +00:00
Age Manning
05040e68ec Update discovery (#2711)
## Issue Addressed

#2695 

## Proposed Changes

This updates discovery to the latest version which has patched a panic that occurred due to a race condition in the bucket logic.
2021-10-14 22:09:38 +00:00
Paul Hauner
e2d09bb8ac Add BeaconChainHarness::builder (#2707)
## Issue Addressed

NA

## Proposed Changes

This PR is near-identical to https://github.com/sigp/lighthouse/pull/2652, however it is to be merged into `unstable` instead of `merge-f2f`. Please see that PR for reasoning.

I'm making this duplicate PR to merge to `unstable` in an effort to shrink the diff between `unstable` and `merge-f2f` by doing smaller, lead-up PRs.

## Additional Info

NA
2021-10-14 02:58:10 +00:00
Pawan Dhananjay
34d22b5920 Reduce validator monitor logging verbosity (#2606)
## Issue Addressed

Resolves #2541

## Proposed Changes

Reduces verbosity of validator monitor per epoch logging by batching info logs for multiple validators.

Instead of a log for every validator managed by the validator monitor, we now batch logs for attestation records for previous epoch.

Before:
```log
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 1, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 2, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 3, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 4, epoch: 65875, matched_head: true, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 5, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match head        validator: 5, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 6, epoch: 65875, matched_head: false, matched_target: true, inclusion_lag: 0 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match head        validator: 6, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 7, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match target      validator: 7, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay             validator: 7, epoch: 65875, optimal: 1, delay: 2, service: val_mon
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validator: 8, epoch: 65875, matched_head: true, matched_target: false, inclusion_lag: 1 slot(s), service: val_mon
Sep 20 06:53:08.239 WARN Attestation failed to match target      validator: 8, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Sub-optimal inclusion delay             validator: 8, epoch: 65875, optimal: 1, delay: 2, service: val_mon
Sep 20 06:53:08.239 ERRO Previous epoch attestation missing      validator: 9, epoch: 65875, service: val_mon
Sep 20 06:53:08.239 ERRO Previous epoch attestation missing      validator: 10, epoch: 65875, service: val_mon
```

after
```
Sep 20 06:53:08.239 INFO Previous epoch attestation success      validators: [1,2,3,4,5,6,7,8,9] , epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match head, validators: [5,6], epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Previous epoch attestation failed to match target, validators: [7,8], epoch: 65875, service: val_mon
Sep 20 06:53:08.239 WARN Previous epoch attestations had sub-optimal inclusion delay, validators: [7,8], epoch: 65875, service: val_mon
Sep 20 06:53:08.239 ERRO Previous epoch attestation missing      validators: [9,10], epoch: 65875, service: val_mon
```

The detailed individual logs are downgraded to debug logs.
2021-10-12 05:06:48 +00:00
Mac L
a73d698e30 Add TLS capability to the beacon node HTTP API (#2668)
Currently, the beacon node has no ability to serve the HTTP API over TLS.
Adding this functionality would be helpful for certain use cases, such as when you need a validator client to connect to a backup beacon node which is outside your local network, and the use of an SSH tunnel or reverse proxy would be inappropriate.

## Proposed Changes

- Add three new CLI flags to the beacon node
  - `--http-enable-tls`: enables TLS
  - `--http-tls-cert`: to specify the path to the certificate file
  - `--http-tls-key`: to specify the path to the key file
- Update the HTTP API to optionally use `warp`'s [`TlsServer`](https://docs.rs/warp/0.3.1/warp/struct.TlsServer.html) depending on the presence of the `--http-enable-tls` flag
- Update tests and docs
- Use a custom branch for `warp` to ensure proper error handling

## Additional Info

Serving the API over TLS should currently be considered experimental. The reason for this is that it uses code from an [unmerged PR](https://github.com/seanmonstar/warp/pull/717). This commit provides the `try_bind_with_graceful_shutdown` method to `warp`, which is helpful for controlling error flow when the TLS configuration is invalid (cert/key files don't exist, incorrect permissions, etc). 
I've implemented the same code in my [branch here](https://github.com/macladson/warp/tree/tls).

Once the code has been reviewed and merged upstream into `warp`, we can remove the dependency on my branch and the feature can be considered more stable.

Currently, the private key file must not be password-protected in order to be read into Lighthouse.
2021-10-12 03:35:49 +00:00
Age Manning
0aee7ec873 Refactor Peerdb and PeerManager (#2660)
## Proposed Changes

This is a refactor of the PeerDB and PeerManager. A number of bugs have been surfacing around the connection state of peers and their interaction with the score state. 

This refactor tightens the mutability properties of peers such that only specific modules are able to modify the state of peer information preventing inadvertant state changes that can lead to our local peer manager db being out of sync with libp2p. 

Further, the logic around connection and scoring was quite convoluted and the distinction between the PeerManager and Peerdb was not well defined. Although these issues are not fully resolved, this PR is step to cleaning up this logic. The peerdb solely manages most mutability operations of peers leaving high-order logic to the peer manager. 

A single `update_connection_state()` function has been added to the peer-db making it solely responsible for modifying the peer's connection state. The way the peer's scores can be modified have been reduced to three simple functions (`update_scores()`, `update_gossipsub_scores()` and `report_peer()`). This prevents any add-hoc modifications of scores and only natural processes of score modification is allowed which simplifies the reasoning of score and state changes.
2021-10-11 02:45:06 +00:00
Michael Sproul
708557a473 Fix cargo audit warns for nix, psutil, time (#2699)
## Issue Addressed

Fix `cargo audit` failures on `unstable`

Closes #2698

## Proposed Changes

The main culprit is `nix`, which is vulnerable for versions below v0.23.0. We can't get by with a straight-forward `cargo update` because `psutil` depends on an old version of `nix` (cf. https://github.com/rust-psutil/rust-psutil/pull/93). Hence I've temporarily forked `psutil` under the `sigp` org, where I've included the update to `nix` v0.23.0.

Additionally, I took the chance to update the `time` dependency to v0.3, which removed a bunch of stale deps including `stdweb` which is no longer maintained. Lighthouse only uses the `time` crate in the notifier to do some pretty printing, and so wasn't affected by any of the breaking changes in v0.3 ([changelog here](https://github.com/time-rs/time/blob/main/CHANGELOG.md#030-2021-07-30)).
2021-10-11 00:10:35 +00:00
Pawan Dhananjay
7c7ba770de Update broken api links (#2665)
## Issue Addressed

Resolves #2563 
Replacement for #2653 as I'm not able to reopen that PR after force pushing.

## Proposed Changes

Fixes all broken api links. Cherry picked changes in #2590 and updated a few more links.

Co-authored-by: Mason Stallmo <masonstallmo@gmail.com>
2021-10-06 00:46:09 +00:00
Pawan Dhananjay
73ec29c267 Don't log errors on resubscription of gossip topics (#2613)
## Issue Addressed

Resolves #2555

## Proposed Changes

Don't log errors on resubscribing to topics. Also don't log errors if we are setting already set attnet/syncnet bits.
2021-10-06 00:46:08 +00:00
Wink Saville
58870fc6d3 Add test_logger as feature to logging (#2586)
## Issue Addressed

Fix #2585

## Proposed Changes

Provide a canonical version of test_logger that can be used
throughout lighthouse.

## Additional Info

This allows tests to conditionally emit logging data by adding
test_logger as the default logger. And then when executing
`cargo test --features logging/test_logger` log output
will be visible:

  wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
  $ cargo test --features logging/test_logger
      Finished test [unoptimized + debuginfo] target(s) in 0.02s
       Running unittests (target/debug/deps/test_logger-e20115db6a5e3714)

  running 1 test
  Sep 10 12:53:45.212 INFO hi, module: test_logger:8
  test tests::test_fn_with_logging ... ok

  test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Doc-tests test-logger

  running 0 tests

  test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Or, in normal scenarios where logging isn't needed, executing
`cargo test` the log output will not be visible:

  wink@3900x:~/lighthouse/common/logging/tests/test-feature-test_logger (Add-test_logger-as-feature-to-logging)
  $ cargo test
      Finished test [unoptimized + debuginfo] target(s) in 0.02s
       Running unittests (target/debug/deps/test_logger-02e02f8d41e8cf8a)

  running 1 test
  test tests::test_fn_with_logging ... ok

  test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Doc-tests test-logger

  running 0 tests

  test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
2021-10-06 00:46:07 +00:00
Michael Sproul
7c88f582d9 Release v2.0.0 (#2673)
## Proposed Changes

* Bump version to v2.0.0
* Update dependencies (obsoletes #2670). `tokio-macros` v1.4.0 had been yanked due to a bug.
2021-10-05 03:53:18 +00:00
Michael Sproul
ed1fc7cca6 Fix I/O atomicity issues with checkpoint sync (#2671)
## Issue Addressed

This PR addresses an issue found by @YorickDowne during testing of v2.0.0-rc.0.

Due to a lack of atomic database writes on checkpoint sync start-up, it was possible for the database to get into an inconsistent state from which it couldn't recover without `--purge-db`. The core of the issue was that the store's anchor info was being stored _before_ the `PersistedBeaconChain`. If a crash occured so that anchor info was stored but _not_ the `PersistedBeaconChain`, then on restart Lighthouse would think the database was unitialized and attempt to compare-and-swap a `None` value, but would actually find the stale info from the previous run.

## Proposed Changes

The issue is fixed by writing the anchor info, the split point, and the `PersistedBeaconChain` atomically on start-up. Some type-hinting ugliness was required, which could possibly be cleaned up in future refactors.
2021-10-05 03:53:17 +00:00
Kane Wallmann
28b79084cd Fix chain_id value in config/deposit_contract RPC method (#2659)
## Issue Addressed

This PR addresses issue #2657

## Proposed Changes

Changes `/eth/v1/config/deposit_contract` endpoint to return the chain ID from the loaded chain spec instead of eth1::DEFAULT_NETWORK_ID which is the Goerli chain ID of 5.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-10-01 06:32:38 +00:00
Michael Sproul
ea78315749 Release v2.0.0-rc.0 (#2634)
## Proposed Changes

Cut the first release candidate for v2.0.0, in preparation for testing and release this week

## Additional Info

Builds on #2632, which should either be merged first or in the same batch
2021-10-01 01:23:55 +00:00
Age Manning
29a8865d07 Consistent tracking of disconnected peers (#2650)
## Issue Addressed

N/A

## Proposed Changes

When peers switching to a disconnecting state, decrement the disconnected peers counter. This also downgrades some crit logs to errors. 

I've also added a re-sync point when peers get unbanned the disconnected peer count will match back to the number of disconnected peers if it has gone out of sync previously.
2021-09-30 04:31:43 +00:00
Squirrel
db4d72c4f1 Remove unused deps (#2592)
Found some deps you're possibly not using.

Please shout if you think they are indeed still needed.
2021-09-30 04:31:42 +00:00
Mac L
4c510f8f6b Add BlockTimesCache to allow additional block delay metrics (#2546)
## Issue Addressed

Closes #2528

## Proposed Changes

- Add `BlockTimesCache` to provide block timing information to `BeaconChain`. This allows additional metrics to be calculated for blocks that are set as head too late.
- Thread the `seen_timestamp` of blocks received from RPC responses (except blocks from syncing) through to the sync manager, similar to what is done for blocks from gossip.

## Additional Info

This provides the following additional metrics:
- `BEACON_BLOCK_OBSERVED_SLOT_START_DELAY_TIME`
  - The delay between the start of the slot and when the block was first observed.
- `BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME`
   - The delay between when the block was first observed and when the block was imported.
- `BEACON_BLOCK_HEAD_IMPORTED_DELAY_TIME`
  - The delay between when the block was imported and when the block was set as head.

The metric `BEACON_BLOCK_IMPORTED_SLOT_START_DELAY_TIME` was removed.

A log is produced when a block is set as head too late, e.g.:
```
Aug 27 03:46:39.006 DEBG Delayed head block                      set_as_head_delay: Some(21.731066ms), imported_delay: Some(119.929934ms), observed_delay: Some(3.864596988s), block_delay: 4.006257988s, slot: 1931331, proposer_index: 24294, block_root: 0x937602c89d3143afa89088a44bdf4b4d0d760dad082abacb229495c048648a9e, service: beacon
```
2021-09-30 04:31:41 +00:00
Pawan Dhananjay
70441aa554 Improve valmon inclusion delay calculation (#2618)
## Issue Addressed

Resolves #2552 

## Proposed Changes

Offers some improvement in inclusion distance calculation in the validator monitor. 

When registering an attestation from a block, instead of doing `block.slot() - attesstation.data.slot()` to get the inclusion distance, we now pass the parent block slot from the beacon chain and do `parent_slot.saturating_sub(attestation.data.slot())`. This allows us to give best effort inclusion distance in scenarios where the attestation was included right after a skip slot. Note that this does not give accurate results in scenarios where the attestation was included few blocks after the skip slot.

In this case, if the attestation slot was `b1` and was included in block `b2` with a skip slot in between, we would get the inclusion delay as 0  (by ignoring the skip slot) which is the best effort inclusion delay.
```
b1 <- missed <- b2
``` 

Here, if the attestation slot was `b1` and was included in block `b3` with a skip slot and valid block `b2` in between, then we would get the inclusion delay as 2 instead of 1 (by ignoring the skip slot).
```
b1 <- missed <- b2 <- b3 
```
A solution for the scenario 2 would be to count number of slots between included slot and attestation slot ignoring the skip slots in the beacon chain and pass the value to the validator monitor. But I'm concerned that it could potentially lead to db accesses for older blocks in extreme cases.


This PR also uses the validator monitor data for logging per epoch inclusion distance. This is useful as we won't get inclusion data in post-altair summaries.


Co-authored-by: Michael Sproul <micsproul@gmail.com>
2021-09-30 01:22:43 +00:00
realbigsean
7d13e57d9f Add interop metrics (#2645)
## Issue Addressed

Resolves: #2644

## Proposed Changes

- Adds mandatory metrics mentioned here: https://github.com/ethereum/beacon-metrics/blob/master/metrics.md#interop-metrics

## Additional Info

Couldn't figure out how to alias metrics, so I created them all as new gauges/counters.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-29 23:44:24 +00:00
realbigsean
113ef74ef6 Add contribution and proof event (#2527)
## Issue Addressed

N/A

## Proposed Changes

Add the new ContributionAndProof event: https://github.com/ethereum/beacon-APIs/pull/158

## Additional Info

N/A

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-25 07:53:58 +00:00
Paul Hauner
fe52322088 Implement SSZ union type (#2579)
## Issue Addressed

NA

## Proposed Changes

Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enums`:

https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union

The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.

### Crate Updates

This PR will (hopefully) cause CI to publish new versions for the following crates:

- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`

These these crates depend on each other, I've had to add a workspace-level `[patch]` for these crates. A follow-up PR will need to remove this patch, ones the new versions are published.

### Union Behaviors

We already had SSZ `Encode` and `TreeHash` derive for enums, however it just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec is in conflict with the transparent method, I've required that all `enum` have exactly one of the following enum-level attributes:

#### SSZ

-  `#[ssz(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[ssz(enum_behaviour = "transparent")]`
    - maintains existing functionality
    - not supported for `Decode` (never was)
    
#### TreeHash

-  `#[tree_hash(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[tree_hash(enum_behaviour = "transparent")]`
    - maintains existing functionality

This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.

### Legacy Option Encoding

Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the *old* SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector.

Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migrate script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migrate script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.

The `ssz::legacy::four_byte_impl!` macro allows a one-liner to define the module required for the `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.

Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.

### Crate Publish Ordering

I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon.

## TODO

- [ ] Queue a follow-up `[patch]`-removing PR.
2021-09-25 05:58:36 +00:00
Age Manning
00a7ef0036 Correct bug in sync (#2615)
A bug that causes failed batches to continually download in a loop is corrected.
2021-09-23 01:32:04 +00:00
Paul Hauner
be11437c27 Batch BLS verification for attestations (#2399)
## Issue Addressed

NA

## Proposed Changes

Adds the ability to verify batches of aggregated/unaggregated attestations from the network.

When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue:

- `== 1` verify the attestation individually.
- `>= 2` take up to 64 of those attestations and verify them in a batch.

Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try and force them into batches. 

### Batching Details

To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations:

- *Indexed* attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`.
- *Verified* attestations: those attestations which were indexed *and also* passed signature verification. These are well-formed, interesting messages which were signed by validators.

The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required.

When we batch verify attestations, we first try to map all those attestations to *indexed* attestations. If any of those attestations were able to be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into *verified* attestations, disabling individual signature checking. If the batch fails, we convert to verified attestations with individual signature checking enabled.

Ultimately, we optimistically try to do a batch verification of attestation signatures and fall-back to individual verification if it fails. This opens an attach vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and the typical-case gains massively outweigh the worst-case losses.

## Additional Info

Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference.

I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to *actual* production code would pass under the existing test suite.
2021-09-22 08:49:41 +00:00
Michael Sproul
9667dc2f03 Implement checkpoint sync (#2244)
## Issue Addressed

Closes #1891
Closes #1784

## Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

## Additional Info

- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)


Co-authored-by: Diva M <divma@protonmail.com>
2021-09-22 00:37:28 +00:00
Age Manning
280e4fe23d Increase connection limits and allow priority connections (#2604)
In previous network updates we have made our libp2p connections more lean by limiting the maximum number of connections a lighthouse node will accept before libp2p rejects new connections. 

However, we still maintain the logic that at maximum connections, we try to dial extra peers if they are needed by a validator client to publish messages on a specific subnet. The dials typically result in failures due the libp2p connection limits. 

This PR adds an extra factor, `PRIORITY_PEER_EXCESS` which sets aside a new allocation of peers we are able to dial in case we need these peers for the VC client.  This allocation sits along side the excess peer (which allows extra incoming peers on top of our target peer limit). 

The drawback here, is that libp2p now allows extra peers to connect to us (beyond the standard peer limit) which the peer manager should subsequently reject.
2021-09-21 07:45:13 +00:00
Age Manning
a73dcb7b6d Improved handling of IP Banning (#2530)
This PR in general improves the handling around peer banning. 

Specifically there were issues when multiple peers under a single IP connected to us after we banned the IP for poor behaviour.

This PR should now handle these peers gracefully as well as make some improvements around how we previously disconnected and banned peers. 

The logic now goes as follows:
- Once a peer gets banned, its gets registered with its known IP addresses
- Once enough banned peers exist under a single IP that IP is banned
- We retain connections with existing peers under this IP
- Any new connections under this IP are rejected
2021-09-17 04:02:31 +00:00
Pawan Dhananjay
64ad2af100 Subscribe to altair gossip topics 2 slots before fork (#2532)
## Issue Addressed

N/A

## Proposed Changes

Add a fork_digest to `ForkContext` only if it is set in the config.
Reject gossip messages on post fork topics before the fork happens.

Edit: Instead of rejecting gossip messages on post fork topics, we now subscribe to post fork topics 2 slots before the fork.

Co-authored-by: Age Manning <Age@AgeManning.com>
2021-09-17 01:11:16 +00:00
Age Manning
56e0615df8 Experimental discovery (#2577)
# Description

A few changes have been made to discovery. In particular a custom re-write of an LRU cache which previously was read/write O(N) for all our sessions ~5k, to a more reasonable hashmap-style O(1). 

Further there has been reported issues in the current discv5, so added error handling to help identify the issue has been added.
2021-09-16 04:45:05 +00:00
Age Manning
95b17137a8 Reduce network debug noise (#2593)
The identify network debug logs can get quite noisy and are unnecessary to print on every request/response. 

This PR reduces debug noise by only printing messages for identify messages that offer some new information.
2021-09-14 08:28:35 +00:00
Wink Saville
4755d4b236 Update sloggers to v2.0.2 (#2588)
fixes #2584
2021-09-14 06:48:26 +00:00
Paul Hauner
f9bba92db3 v1.5.2 (#2595)
## Issue Addressed

NA

## Proposed Changes

Version bump

## Additional Info

Please do not `bors` without my approval, I am still testing.
2021-09-13 23:01:19 +00:00
Paul Hauner
ddbd4e6965 v1.5.2-rc.0 (#2565)
## Issue Addressed

NA

## Proposed Changes

- Bump version
- Tidy some comments mangled by the version change regex.

## Additional Info

NA
2021-09-03 23:28:21 +00:00
Michael Sproul
9c785a9b33 Optimize process_attestation with active balance cache (#2560)
## Proposed Changes

Cache the total active balance for the current epoch in the `BeaconState`. Computing this value takes around 1ms, and this was negatively impacting block processing times on Prater, particularly when reconstructing states.

With a large number of attestations in each block, I saw the `process_attestations` function taking 150ms, which means that reconstructing hot states can take up to 4.65s (31 * 150ms), and reconstructing freezer states can take up to 307s (2047 * 150ms).

I opted to add the cache to the beacon state rather than computing the total active balance at the start of state processing and threading it through. Although this would be simpler in a way, it would waste time, particularly during block replay, as the total active balance doesn't change for the duration of an epoch. So we save ~32ms for hot states, and up to 8.1s for freezer states (using `--slots-per-restore-point 8192`).
2021-09-03 07:50:43 +00:00
realbigsean
50321c6671 Updates to make crates publishable (#2472)
## Issue Addressed

Related to: #2259

Made an attempt at all the necessary updates here to publish the crates to crates.io. I incremented the minor versions on all the crates that have been previously published. We still might run into some issues as we try to publish because I'm not able to test this out but I think it's a good starting point.

## Proposed Changes

- Add description and license to `ssz_types` and `serde_util`
- rename `serde_util` to `eth2_serde_util`
- increment minor versions
- remove path dependencies
- remove patch dependencies 

## Additional Info
Crates published: 

- [x] `tree_hash` -- need to publish `tree_hash_derive` and `eth2_hashing` first
- [x] `eth2_ssz_types` -- need to publish `eth2_serde_util` first
- [x] `tree_hash_derive`
- [x] `eth2_ssz`
- [x] `eth2_ssz_derive`
- [x] `eth2_serde_util`
- [x] `eth2_hashing`


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-03 01:10:25 +00:00
Pawan Dhananjay
5a3bcd2904 Validator monitor support for sync committees (#2476)
## Issue Addressed

N/A

## Proposed Changes

Add functionality in the validator monitor to provide sync committee related metrics for monitored validators.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-08-31 23:31:36 +00:00
Paul Hauner
44fa54004c Persist to DB after setting canonical head (#2547)
## Issue Addressed

NA

## Proposed Changes

Missed head votes on attestations is a well-known issue. The primary cause is a block getting set as the head *after* the attestation deadline.

This PR aims to shorten the overall time between "block received" and "block set as head" by:

1. Persisting the head and fork choice *after* setting the canonical head
    - Informal measurements show this takes ~200ms
 1. Pruning the op pool *after* setting the canonical head.
 1. No longer persisting the op pool to disk during `BeaconChain::fork_choice`
     - Informal measurements show this can take up to 1.2s.
     
I also add some metrics to help measure the effect of these changes.
     
Persistence changes like this run the risk of breaking assumptions downstream. However, I have considered these risks and I think we're fine here. I will describe my reasoning for each change.

## Reasoning

### Change 1:  Persisting the head and fork choice *after* setting the canonical head

For (1), although the function is called `persist_head_and_fork_choice`, it only persists:

- Fork choice
- Head tracker
- Genesis block root

Since `BeaconChain::fork_choice_internal` does not modify these values between the original time we were persisting it and the current time, I assert that the change I've made is non-substantial in terms of what ends up on-disk. There's the possibility that some *other* thread has modified fork choice in the extra time we've given it, but that's totally fine.

Since the only time we *read* those values from disk is during startup, I assert that this has no impact during runtime. 

### Change 2: Pruning the op pool after setting the canonical head

Similar to the argument above, we don't modify the op pool during `BeaconChain::fork_choice_internal` so it shouldn't matter when we prune. This change should be non-substantial.

### Change 3: No longer persisting the op pool to disk during `BeaconChain::fork_choice`

This change *is* substantial. With the proposed changes, we'll only be persisting the op pool to disk when we shut down cleanly (i.e., the `BeaconChain` gets dropped). This means we'll save disk IO and time during usual operation, but a `kill -9` or similar "crash" will probably result in an out-of-date op pool when we reboot. An out-of-date op pool can only have an impact when producing blocks or aggregate attestations/sync committees.

I think it's pretty reasonable that a crash might result in an out-of-date op pool, since:

- Crashes are fairly rare. Practically the only time I see LH suffer a full crash is when the OOM killer shows up, and that's a very serious event.
- It's generally quite rare to produce a block/aggregate immediately after a reboot. Just a few slots of runtime is probably enough to have a decent-enough op pool again.

## Additional Info

Credits to @macladson for the timings referenced here.
2021-08-31 04:48:21 +00:00
Pawan Dhananjay
b4dd98b3c6 Shutdown after sync (#2519)
## Issue Addressed

Resolves #2033 

## Proposed Changes

Adds a flag to enable shutting down beacon node right after sync is completed.

## Additional Info

Will need modification after weak subjectivity sync is enabled to change definition of a fully synced node.
2021-08-30 13:46:13 +00:00
Michael Sproul
10945e0619 Revert bad blocks on missed fork (#2529)
## Issue Addressed

Closes #2526

## Proposed Changes

If the head block fails to decode on start up, do two things:

1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`).
2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint.

## Additional Info

I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞
2021-08-30 06:41:31 +00:00
Mason Stallmo
bc14d1d73d Add more unix signal handlers (#2486)
## Issue Addressed

Resolves #2114 

Swapped out the ctrlc crate for tokio signals to hook register handlers for SIGPIPE and SIGHUP along with SIGTERM and SIGINT.

## Proposed Changes

- Swap out the ctrlc crate for tokio signals for unix signal handing
- Register signals for SIGPIPE and SHIGUP that trigger the same shutdown procedure as SIGTERM and SIGINT

## Additional Info

I tested these changes against the examples in the original issue and noticed some interesting behavior on my machine. When running `lighthouse bn --network pyrmont |& tee -a pyrmont_bn.log` or `lighthouse bn --network pyrmont 2>&1 | tee -a pyrmont_bn.log` none of the above signals are sent to the lighthouse program in a way I was able to observe. 

The only time it seems that the signal gets sent to the lighthouse program is if there is no redirection of stderr to stdout. I'm not as familiar with the details of how unix signals work in linux with a redirect like that so I'm not sure if this is a bug in the program or expected behavior.

Signals are correctly received without the redirection and if the above signals are sent directly to the program with something like `kill`.
2021-08-30 05:19:34 +00:00
Pawan Dhananjay
99737c551a Improve eth1 fallback logging (#2490)
## Issue Addressed

Resolves #2487 

## Proposed Changes

Logs a message once in every invocation of `Eth1Service::update` method if the primary endpoint is unavailable for some reason. 

e.g.
```log
Aug 03 00:09:53.517 WARN Error connecting to eth1 node endpoint  action: trying fallbacks, endpoint: http://localhost:8545/, service: eth1_rpc
Aug 03 00:09:56.959 INFO Fetched data from fallback              fallback_number: 1, service: eth1_rpc
```

The main aim of this PR is to have an accompanying message to the "action: trying fallbacks" error message that is returned when checking the endpoint for liveness. This is mainly to indicate to the user that the fallback was live and reachable. 

## Additional info
This PR is not meant to be a catch all for all cases where the primary endpoint failed. For instance, this won't log anything if the primary node was working fine during endpoint liveness checking and failed during deposit/block fetching. This is done intentionally to reduce number of logs while initial deposit/block sync and to avoid more complicated logic.
2021-08-30 00:51:26 +00:00
Paul Hauner
b0ac3464ca v1.5.1 (#2544)
## Issue Addressed

NA

## Proposed Changes

- Bump version

## Additional Info

NA
2021-08-27 01:58:19 +00:00
Paul Hauner
4405425726 Expand gossip duplicate cache time (#2542)
## Issue Addressed

NA

## Proposed Changes

This PR expands the time that entries exist in the gossip-sub duplicate cache. Recent investigations found that this cache is one slot (12s) shorter than the period for which an attestation is permitted to propagate on the gossip network.

Before #2540, this was causing peers to be unnecessarily down-scored for sending old attestations. Although that issue has been fixed, the duplicate cache time is increased here to avoid such messages from getting any further up the networking stack then required.

## Additional Info

NA
2021-08-26 23:25:50 +00:00
Paul Hauner
3fdad38eba Remove penality for duplicate attestation from same validator (#2540)
## Issue Addressed

NA

## Proposed Changes

A Discord user presented logs which indicated a drop in their peer count caused by a variety of peers sending attestations where we'd already seen an attestation for that validator. It's presently unclear how this case came about, but during our investigation I noticed that we are down-voting peers for sending such attestations.

There are three scenarios where we may receive duplicate unagg. attestations from the same validator:

1. The validator is committing a slashable offense.
2. The gossipsub message-deduping functionality is not working as expected.
3. We received the message via the HTTP prior to seeing it via gossip.

Scenario (1) would be so costly for an attacker that I don't think we need to add DoS protection for it.

Scenario (2) seems feasible. Our "seen message" caches in gossipsub might fill up/expire and let through these duplicates. There are also cases involving message ID mismatches with the other peers. In both these cases, I don't think we should be doing 1 attestation == -1 point down-voting.

Scenario (3) is not necessarily a fault of the peer and we shouldn't down-score them for it.

## Additional Info

NA
2021-08-26 08:00:50 +00:00
Age Manning
09545fe668 Increase maximum gossipsub subscriptions (#2531)
Due to the altair fork, in principle we can now subscribe to up to 148 topics. This bypasses our original limit and we can end up rejecting subscriptions. 

This PR increases the limit to account for the fork.
2021-08-26 02:01:10 +00:00
Pawan Dhananjay
d3b4cbed53 Packet filter cli option (#2523)
## Issue Addressed

N/A

## Proposed Changes

Adds a cli option to disable packet filter in `lighthouse bootnode`. This is useful in running local testnets as the bootnode bans requests from the same ip(localhost) if the packet filter is enabled.
2021-08-26 00:29:39 +00:00
realbigsean
5b8436e33f Fork schedule api (#2525)
## Issue Addressed

Resolves #2524

## Proposed Changes

- Return all known forks in the `/config/fork_schedule`, previously returned only the head of the chain's fork.
- Deleted the `StateId::head` method because it was only previously used in this endpoint.


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-24 01:36:27 +00:00
Pawan Dhananjay
02fd54bea7 Refactor discovery queries (#2518)
## Issue Addressed

N/A

## Proposed Changes

Refactor discovery queries such that only `QueryType::Subnet` queries are queued for grouping. `QueryType::FindPeers` is always made regardless of the number of active `Subnet` queries (max = 2). This prevents `FindPeers` queries from getting starved if `Subnet` queries start queuing up. 

Also removes `GroupedQueryType` struct and use `QueryType` for all queuing and processing of discovery requests.

## Additional Info

Currently, no distinction is made between subnet discovery queries for attestation and sync committee subnets when grouping queries. Potentially we could prioritise attestation queries over sync committee queries in the future.
2021-08-24 00:12:13 +00:00
Paul Hauner
90d5ab1566 v1.5.0 (#2535)
## Issue Addressed

NA

## Proposed Changes

- Version bump
- Increase queue sizes for aggregated attestations and re-queued attestations. 

## Additional Info

NA
2021-08-23 04:27:36 +00:00
Paul Hauner
f2a8c6229c Metrics and DEBG log for late gossip blocks (#2533)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

- Add a counter metric to log when a block is received late from gossip.
- Also push a `DEBG` log for the above condition.
- Use Debug (`?`) instead of Display (`%`) for a bunch of logs in the beacon processor, so we don't have to deal with concatenated block roots.
- Add new ERRO and CRIT to HTTP API to alert users when they're publishing late blocks.

## Additional Info

NA
2021-08-23 00:59:14 +00:00
Paul Hauner
c7379836a5 v1.5.0-rc.1 (#2516)
## Issue Addressed

NA

## Proposed Changes

- Bump version

## Additional Info

NA
2021-08-17 05:34:31 +00:00
Michael Sproul
c0a2f501d9 Upgrade dependencies (#2513)
## Proposed Changes

* Consolidate Tokio versions: everything now uses the latest v1.10.0, no more `tokio-compat`.
* Many semver-compatible changes via `cargo update`. Notably this upgrades from the yanked v0.8.0 version of crossbeam-deque which is present in v1.5.0-rc.0
* Many semver incompatible upgrades via `cargo upgrades` and `cargo upgrade --workspace pkg_name`. Notable ommissions:
    - Prometheus, to be handled separately: https://github.com/sigp/lighthouse/issues/2485
    - `rand`, `rand_xorshift`: the libsecp256k1 package requires 0.7.x, so we'll stick with that for now
    - `ethereum-types` is pinned at 0.11.0 because that's what `web3` is using and it seems nice to have just a single version
    
## Additional Info

We still have two versions of `libp2p-core` due to `discv5` depending on the v0.29.0 release rather than `master`. AFAIK it should be OK to release in this state (cc @AgeManning )
2021-08-17 01:00:24 +00:00
Pawan Dhananjay
d17350c0fa Lower penalty for past/future slot errors (#2510)
## Issue Addressed

N/A

## Proposed Changes

Reduce the penalties with past/future slot errors for sync committee messages.
2021-08-16 23:30:18 +00:00
Paul Hauner
4c4ebfbaa1 v1.5.0 rc.0 (#2506)
## Issue Addressed

NA

## Proposed Changes

- Bump to `v1.5.0-rc.0`.
- Increase attestation reprocessing queue size (I saw this filling up on Prater).
- Reduce error log for full attn reprocessing queue to warn.

## TODO

- [x] Manual testing
- [x] Resolve https://github.com/sigp/lighthouse/pull/2493
- [x] Include https://github.com/sigp/lighthouse/pull/2501
2021-08-12 04:02:46 +00:00
Paul Hauner
4af6fcfafd Bump libp2p to address inconsistency in mesh peer tracking (#2493)
## Issue Addressed

- Resolves #2457
- Resolves #2443

## Proposed Changes

Target the (presently unreleased) head of `libp2p/rust-libp2p:master` in order to obtain the fix from https://github.com/libp2p/rust-libp2p/pull/2175.

Additionally:

- `libsecp256k1` needed to be upgraded to satisfy the new version of `libp2p`.
- There were also a handful of minor changes to `eth2_libp2p` to suit some interface changes.
- Two `cargo audit --ignore` flags were remove due to libp2p upgrades.

## Additional Info
 
 NA
2021-08-12 01:59:20 +00:00
Paul Hauner
ceda27371d Ensure doppelganger detects attestations in blocks (#2495)
## Issue Addressed

NA

## Proposed Changes

When testing our (not-yet-released) Doppelganger implementation, I noticed that we aren't detecting attestations included in blocks (only those on the gossip network).

This is because during [block processing](e8c0d1f19b/beacon_node/beacon_chain/src/beacon_chain.rs (L2168)) we only update the `observed_attestations` cache with each attestation, but not the `observed_attesters` cache. This is the correct behaviour when we consider the [p2p spec](https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md):

> [IGNORE] There has been no other valid attestation seen on an attestation subnet that has an identical attestation.data.target.epoch and participating validator index.

We're doing the right thing here and still allowing attestations on gossip that we've seen in a block. However, this doesn't work so nicely for Doppelganger.

To resolve this, I've taken the following steps:

- Add a `observed_block_attesters` cache.
- Rename `observed_attesters` to `observed_gossip_attesters`.

## TODO

- [x] Add a test to ensure a validator that's been seen in a block attestation (but not a gossip attestation) returns `true` for `BeaconChain::validator_seen_at_epoch`.
- [x] Add a test to ensure `observed_block_attesters` isn't polluted via gossip attestations and vice versa. 


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-09 02:43:03 +00:00
Michael Sproul
17a2c778e3 Altair validator client and HTTP API (#2404)
## Proposed Changes

* Implement the validator client and HTTP API changes necessary to support Altair


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-08-06 00:47:31 +00:00
Pawan Dhananjay
e8c0d1f19b Altair networking (#2300)
## Issue Addressed

Resolves #2278 

## Proposed Changes

Implements the networking components for the Altair hard fork https://github.com/ethereum/eth2.0-specs/blob/dev/specs/altair/p2p-interface.md

## Additional Info

This PR acts as the base branch for networking changes and tracks https://github.com/sigp/lighthouse/pull/2279 . Changes to gossip, rpc and discovery can be separate PRs to be merged here for ease of review.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-04 01:44:57 +00:00
Michael Sproul
187425cdc1 Bump discv5 to v0.1.0-beta.9 (#2479)
Bump discv5 to fix the issues with IP filters and removing nodes.

~~Blocked on an upstream release, and more testnet data.~~
2021-08-03 01:05:06 +00:00
realbigsean
c5786a8821 Doppelganger detection (#2230)
## Issue Addressed

Resolves #2069 

## Proposed Changes

- Adds a `--doppelganger-detection` flag
- Adds a `lighthouse/seen_validators` endpoint, which will make it so the lighthouse VC is not interopable with other client beacon nodes if the `--doppelganger-detection` flag is used, but hopefully this will become standardized. Relevant Eth2 API repo issue: https://github.com/ethereum/eth2.0-APIs/issues/64
- If the `--doppelganger-detection` flag is used, the VC will wait until the beacon node is synced, and then wait an additional 2 epochs. The reason for this is to make sure the beacon node is able to subscribe to the subnets our validators should be attesting on. I think an alternative would be to have the beacon node subscribe to all subnets for 2+ epochs on startup by default.

## Additional Info

I'd like to add tests and would appreciate feedback. 

TODO:  handle validators started via the API, potentially make this default behavior

Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-07-31 03:50:52 +00:00
realbigsean
303deb9969 Rust 1.54.0 lints (#2483)
## Issue Addressed

N/A

## Proposed Changes

- Removing a bunch of unnecessary references
- Updated `Error::VariantError` to `Error::Variant`
- There were additional enum variant lints that I ignored, because I thought our variant names were fine
- removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index

## Additional Info



Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-30 01:11:47 +00:00
Paul Hauner
8efd9fc324 Add AttesterCache for attestation production (#2478)
## Issue Addressed

- Resolves #2169

## Proposed Changes

Adds the `AttesterCache` to allow validators to produce attestations for older slots. Presently, some arbitrary restrictions can force validators to receive an error when attesting to a slot earlier than the present one. This can cause attestation misses when there is excessive load on the validator client or time sync issues between the VC and BN.

## Additional Info

NA
2021-07-29 04:38:26 +00:00
Michael Sproul
923486f34c Use bulk verification for sync_aggregate signature (#2415)
## Proposed Changes

Add the `sync_aggregate` from `BeaconBlock` to the bulk signature verifier for blocks. This necessitates a new signature set constructor for the sync aggregate, which is different from the others due to the use of [`eth2_fast_aggregate_verify`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/bls.md#eth2_fast_aggregate_verify) for sync aggregates, per [`process_sync_aggregate`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/beacon-chain.md#sync-aggregate-processing). I made the choice to return an optional signature set, with `None` representing the case where the signature is valid on account of being the point at infinity (requires no further checking).

To "dogfood" the changes and prevent duplication, the consensus logic now uses the signature set approach as well whenever it is required to verify signatures (which should only be in testing AFAIK). The EF tests pass with the code as it exists currently, but failed before I adapted the `eth2_fast_aggregate_verify` changes (which is good).

As a result of this change Altair block processing should be a little faster, and importantly, we will no longer accidentally verify signatures when replaying blocks, e.g. when replaying blocks from the database.
2021-07-28 05:40:21 +00:00
Paul Hauner
6e3ca48cb9 Cache participating indices for Altair epoch processing (#2416)
## Issue Addressed

NA

## Proposed Changes

This PR addresses two things:

1. Allows the `ValidatorMonitor` to work with Altair states.
1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description)

## Breaking Changes

The breaking changes in this PR revolve around one premise:

*After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.*

Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification.

For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations.

The following section details where we've removed fields and provides replacement values.

### Breaking Changes: Prometheus Metrics

Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana).

The following metrics were added as replacements:

- `beacon_participation_prev_epoch_head_attesting_gwei_total`
- `beacon_participation_prev_epoch_target_attesting_gwei_total`
- `beacon_participation_prev_epoch_source_attesting_gwei_total`
- `beacon_participation_prev_epoch_active_gwei_total`

The following metrics were removed:

- `beacon_participation_prev_epoch_attester`
   - instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
- `beacon_participation_prev_epoch_target_attester`
   - instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
- `beacon_participation_prev_epoch_head_attester`
   - instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.

The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`. 

### Breaking Changes: HTTP API

The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields:

- `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead)
- `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead)

The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields:

- `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead)
- `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead)
- `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`.
- `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`.

## Additional Info

NA

## TODO

- [x] Deal with total balances
- [x] Update validator_inclusion API
- [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-27 07:01:01 +00:00
Michael Sproul
f5bdca09ff Update to spec v1.1.0-beta.1 (#2460)
## Proposed Changes

Update to the latest version of the Altair spec, which includes new tests and a tweak to the target sync aggregators.

## Additional Info

This change is _not_ required for the imminent Altair devnet, and is waiting on the merge of #2321 to unstable.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-07-27 05:43:35 +00:00
Michael Sproul
84e6d71950 Tree hash caching and optimisations for Altair (#2459)
## Proposed Changes

Remove the remaining Altair `FIXME`s from consensus land.

1. Implement tree hash caching for the participation lists. This required some light type manipulation, including removing the `TreeHash` bound from `CachedTreeHash` which was purely descriptive.
2. Plumb the proposer index through Altair attestation processing, to avoid calculating it for _every_ attestation (potentially 128ms on large networks). This duplicates some work from #2431, but with the aim of getting it in sooner, particularly for the Altair devnets.
3. Removes two FIXMEs related to `superstruct` and cloning, which are unlikely to be particularly detrimental and will be tracked here instead: https://github.com/sigp/superstruct/issues/5
2021-07-23 00:23:53 +00:00
Michael Sproul
63923eaa29 Bump discv5 to v0.1.0-beta.8 (#2471)
## Proposed Changes

Update discv5 to fix bugs seen on `altair-devnet-1`
2021-07-21 07:10:52 +00:00
Age Manning
08fedbfcba
Libp2p Connection Limit (#2455)
* Get libp2p to handle connection limits

* fmt
2021-07-15 16:43:18 +10:00
Age Manning
6818a94171
Discovery update (#2458) 2021-07-15 16:43:18 +10:00
Age Manning
381befbf82
Ensure disconnecting peers are added to the peerdb (#2451) 2021-07-15 16:43:18 +10:00
Age Manning
059d9ec1b1
Gossipsub scoring improvements (#2391)
* Tweak gossipsub parameters for improved scoring

* Modify gossip history

* Update settings

* Make mesh window constant

* Decrease the mesh message deliveries weight

* Fmt
2021-07-15 16:43:18 +10:00
Age Manning
c62810b408
Update to Libp2p to 39.1 (#2448)
* Adjust beacon node timeouts for validator client HTTP requests (#2352)

Resolves #2313

Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct.
This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration.

Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available.

Add a CLI flag called `use-long-timeouts` to revert to the old behavior.

Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications.

Co-authored-by: Paul Hauner <paul@paulhauner.com>

* Use read_recursive locks in database (#2417)

Closes #2245

Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`.

* Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly which cannot compile Lighthouse (one of our deps uses `ptr::addr_of!` which is too new). A fun side-project at some point might be to update the deadlock detector.
* The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us.
* I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp
* [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation.

* Return more detail when invalid data is found in the DB during startup (#2445)

- Resolves #2444

Adds some more detail to the error message returned when the `BeaconChainBuilder` is unable to access or decode block/state objects during startup.

NA

* Use hardware acceleration for SHA256 (#2426)

Modify the SHA256 implementation in `eth2_hashing` so that it switches between `ring` and `sha2` to take advantage of [x86_64 SHA extensions](https://en.wikipedia.org/wiki/Intel_SHA_extensions). The extensions are available on modern Intel and AMD CPUs, and seem to provide a considerable speed-up: on my Ryzen 5950X it dropped state tree hashing times by about 30% from 35ms to 25ms (on Prater).

The extensions became available in the `sha2` crate [last year](https://www.reddit.com/r/rust/comments/hf2vcx/ann_rustcryptos_sha1_and_sha2_now_support/), and are not available in Ring, which uses a [pure Rust implementation of sha2](https://github.com/briansmith/ring/blob/main/src/digest/sha2.rs). Ring is faster on CPUs that lack the extensions so I've implemented a runtime switch to use `sha2` only when the extensions are available. The runtime switching seems to impose a miniscule penalty (see the benchmarks linked below).

* Start a release checklist (#2270)

NA

Add a checklist to the release draft created by CI. I know @michaelsproul was also working on this and I suspect @realbigsean also might have useful input.

NA

* Serious banning

* fmt

Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-07-15 16:43:18 +10:00
Age Manning
3c0d3227ab
Global Network Behaviour Refactor (#2442)
* Network upgrades (#2345)

* Discovery patch (#2382)

* Upgrade libp2p and unstable gossip

* Network protocol upgrades

* Correct dependencies, reduce incoming bucket limit

* Clean up dirty DHT entries before repopulating

* Update cargo lock

* Update lockfile

* Update ENR dep

* Update deps to specific versions

* Update test dependencies

* Update docker rust, and remote signer tests

* More remote signer test fixes

* Temp commit

* Update discovery

* Remove cached enrs after dialing

* Increase the session capacity, for improved efficiency

* Bleeding edge discovery (#2435)

* Update discovery banning logic and tokio

* Update to latest discovery

* Shift to latest discovery

* Fmt

* Initial re-factor of the behaviour

* More progress

* Missed changes

* First draft

* Discovery as a behaviour

* Adding back event waker (not convinced its neccessary, but have made this many changes already)

* Corrections

* Speed up discovery

* Remove double log

* Fmt

* After disconnect inform swarm about ban

* More fmt

* Appease clippy

* Improve ban handling

* Update tests

* Update cargo.lock

* Correct tests

* Downgrade log
2021-07-15 16:43:17 +10:00
Pawan Dhananjay
64226321b3
Relax requirement for enr fork digest predicate (#2433) 2021-07-15 16:43:17 +10:00
Age Manning
c1d2e35c9e
Bleeding edge discovery (#2435)
* Update discovery banning logic and tokio

* Update to latest discovery

* Shift to latest discovery

* Fmt
2021-07-15 16:43:17 +10:00
Age Manning
f4bc9db16d
Change the window mode of yamux (#2390) 2021-07-15 16:43:17 +10:00
Age Manning
6fb48b45fa
Discovery patch (#2382)
* Upgrade libp2p and unstable gossip

* Network protocol upgrades

* Correct dependencies, reduce incoming bucket limit

* Clean up dirty DHT entries before repopulating

* Update cargo lock

* Update lockfile

* Update ENR dep

* Update deps to specific versions

* Update test dependencies

* Update docker rust, and remote signer tests

* More remote signer test fixes

* Temp commit

* Update discovery

* Remove cached enrs after dialing

* Increase the session capacity, for improved efficiency
2021-07-15 16:43:17 +10:00
Age Manning
4aa06c9555
Network upgrades (#2345) 2021-07-15 16:43:10 +10:00
Paul Hauner
b0f5c4c776 Clarify eth1 error message (#2461)
## Issue Addressed

- Closes #2452

## Proposed Changes

Addresses: https://github.com/sigp/lighthouse/issues/2452#issuecomment-879873511

## Additional Info

NA
2021-07-15 04:22:06 +00:00
realbigsean
a3a7f39b0d [Altair] Sync committee pools (#2321)
Add pools supporting sync committees:
- naive sync aggregation pool
- observed sync contributions pool
- observed sync contributors pool
- observed sync aggregators pool

Add SSZ types and tests related to sync committee signatures.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-15 00:52:02 +00:00
Paul Hauner
fc4c611476 Remove msg about longer sync with remote eth1 nodes (#2453)
## Issue Addressed

- Resolves #2452

## Proposed Changes

I've seen a few people confused by this and I don't think the message is really worth it.

## Additional Info

NA
2021-07-14 05:24:09 +00:00
divma
304fb05e44 Maintain attestations that reference unknown blocks (#2319)
## Issue Addressed

#635 

## Proposed Changes
- Keep attestations that reference a block we have not seen for 30secs before being re processed
- If we do import the block before that time elapses, it is reprocessed in that moment
- The first time it fails, do nothing wrt to gossipsub propagation or peer downscoring. If after being re processed it fails, downscore with a `LowToleranceError` and ignore the message.
2021-07-14 05:24:08 +00:00
Paul Hauner
9656ffee7c Metrics for sync aggregate fullness (#2439)
## Issue Addressed

NA

## Proposed Changes

Adds a metric to see how many set bits are in the sync aggregate for each beacon block being imported.

## Additional Info

NA
2021-07-13 02:22:55 +00:00
Paul Hauner
27aec1962c Add more detail to "Prior attestation known" log (#2447)
## Issue Addressed

NA

## Proposed Changes

Adds more detail to the log when an attestation is ignored due to a prior one being known. This will help identify which validators are causing the issue.

## Additional Info

NA
2021-07-13 01:02:03 +00:00
Paul Hauner
a7b7134abb Return more detail when invalid data is found in the DB during startup (#2445)
## Issue Addressed

- Resolves #2444

## Proposed Changes

Adds some more detail to the error message returned when the `BeaconChainBuilder` is unable to access or decode block/state objects during startup.

## Additional Info

NA
2021-07-12 07:31:27 +00:00
Michael Sproul
371c216ac3 Use read_recursive locks in database (#2417)
## Issue Addressed

Closes #2245

## Proposed Changes

Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`.

## Additional Info

* Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly which cannot compile Lighthouse (one of our deps uses `ptr::addr_of!` which is too new). A fun side-project at some point might be to update the deadlock detector.
* The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us.
* I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp
* [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation.
2021-07-12 07:31:26 +00:00
Mac L
b3c7e59a5b Adjust beacon node timeouts for validator client HTTP requests (#2352)
## Issue Addressed

Resolves #2313 

## Proposed Changes

Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct.
This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration.

Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available.

Add a CLI flag called `use-long-timeouts` to revert to the old behavior.

## Additional Info

Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-07-12 01:47:48 +00:00
Michael Sproul
b4689e20c6 Altair consensus changes and refactors (#2279)
## Proposed Changes

Implement the consensus changes necessary for the upcoming Altair hard fork.

## Additional Info

This is quite a heavy refactor, with pivotal types like the `BeaconState` and `BeaconBlock` changing from structs to enums. This ripples through the whole codebase with field accesses changing to methods, e.g. `state.slot` => `state.slot()`.


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-09 06:15:32 +00:00
Paul Hauner
78e5c0c157 Capture a missed VC error (#2436)
## Issue Addressed

Related to #2430, #2394

## Proposed Changes

As per https://github.com/sigp/lighthouse/issues/2430#issuecomment-875323615, ensure that the `ProductionValidatorClient::new` error raises a log and shuts down the VC. Also, I implemened `spawn_ignoring_error`, as per @michaelsproul's suggestion in https://github.com/sigp/lighthouse/pull/2436#issuecomment-876084419.

I got unlucky and CI picked up a [new rustsec vuln](https://rustsec.org/advisories/RUSTSEC-2021-0072). To fix this, I had to update the following crates:

- `tokio`
- `web3`
- `tokio-compat-02`

## Additional Info

NA
2021-07-09 03:20:24 +00:00
Mac L
406e3921d9 Use forwards iterator for state root lookups (#2422)
## Issue Addressed

#2377 

## Proposed Changes

Implement the same code used for block root lookups (from #2376) to state root lookups in order to improve performance and reduce associated memory spikes (e.g. from certain HTTP API requests).

## Additional Changes

- Tests using `rev_iter_state_roots` and `rev_iter_block_roots` have been refactored to use their `forwards` versions instead.
- The `rev_iter_state_roots` and `rev_iter_block_roots` functions are now unused and have been removed.
- The `state_at_slot` function has been changed to use the `forwards` iterator.

## Additional Info

- Some tests still need to be refactored to use their `forwards_iter` versions. These tests start their iteration from a specific beacon state and thus use the `rev_iter_state_roots_from` and `rev_iter_block_roots_from` functions. If they can be refactored, those functions can also be removed.
2021-07-06 02:38:53 +00:00
Age Manning
73d002ef92 Update outdated dependencies (#2425)
This updates some older dependencies to address a few cargo audit warnings.

The majority of warnings come from network dependencies which will be addressed in #2389. 

This PR contains some minor dep updates that are not network related.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-07-05 00:54:17 +00:00
realbigsean
b84ff9f793 rust 1.53.0 updates (#2411)
## Issue Addressed

`make lint` failing on rust 1.53.0.

## Proposed Changes

1.53.0 updates

## Additional Info

I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-06-18 05:58:01 +00:00
Michael Sproul
3dc1eb5eb6 Ignore inactive validators in validator monitor (#2396)
## Proposed Changes

A user on Discord (`@ChewsMacRibs`) reported that the validator monitor was logging `WARN Attested to an incorrect head` for their validator while it was awaiting activation.

This PR modifies the monitor so that it ignores inactive validators, by the logic that they are either awaiting activation, or have already exited. Either way, there's no way for an inactive validator to have their attestations included on chain, so no need for the monitor to report on them.

## Additional Info

To reproduce the bug requires registering validator keys manually with `--validator-monitor-pubkeys`. I don't think the bug will present itself with `--validator-monitor-auto`.
2021-06-17 02:10:48 +00:00
Jack
98ab00cc52 Handle Geth pre-EIP-155 block sync error condition (#2304)
## Issue Addressed

#2293 

## Proposed Changes

 - Modify the handler for the `eth_chainId` RPC (i.e., `get_chain_id`) to explicitly match against the Geth error string returned for pre-EIP-155 synced Geth nodes
 - ~~Add a new helper function, `rpc_error_msg`, to aid in the above point~~
 - Refactor `response_result` into `response_result_or_error` and patch reliant RPC handlers accordingly (thanks to @pawanjay176)

## Additional Info

Geth, as of Pangaea Expanse (v1.10.0), returns an explicit error when it is not synced past the EIP-155 block (2675000). Previously, Geth simply returned a chain ID of 0 (which was obviously much easier to handle on Lighthouse's part).


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-06-17 02:10:47 +00:00
realbigsean
b1657a60e9 Reorg events (#2090)
## Issue Addressed

Resolves #2088

## Proposed Changes

Add the `chain_reorg` SSE event topic

## Additional Info


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-06-17 02:10:46 +00:00
divma
3261eff0bf split outbound and inbound codecs encoded types (#2410)
Splits the inbound and outbound requests, for maintainability.
2021-06-17 00:40:16 +00:00
Paul Hauner
3b600acdc5 v1.4.0 (#2402)
## Issue Addressed

NA

## Proposed Changes

- Bump versions and update `Cargo.lock`

## Additional Info

NA

## TODO

- [x] Ensure #2398 gets merged succesfully
2021-06-10 01:44:49 +00:00
Paul Hauner
93100f221f Make less logs for attn with unknown head (#2395)
## Issue Addressed

NA

## Proposed Changes

I am starting to see a lot of slog-async overflows (i.e., too many logs) on Prater whenever we see attestations for an unknown block. Since these logs are identical (except for peer id) and we expose volume/count of these errors via `metrics::GOSSIP_ATTESTATION_ERRORS_PER_TYPE`, I took the following actions to remove them from `DEBUG` logs:

- Push the "Attestation for unknown block" log to trace.
- Add a debug log in `search_for_block`. In effect, this should serve as a de-duped version of the previous, downgraded log.

## Additional Info

TBC
2021-06-07 02:34:09 +00:00
Pawan Dhananjay
502402c6b9 Fix options for --eth1-endpoints flag (#2392)
## Issue Addressed

N/A

## Proposed Changes

Set `config.sync_eth1_chain` to true when using just the  `--eth1-endpoints` flag (without `--eth1`).
2021-06-04 00:10:59 +00:00
Paul Hauner
f6280aa663 v1.4.0-rc.0 (#2379)
## Issue Addressed

NA

## Proposed Changes

Bump versions.

## Additional Info

This is not exactly the v1.4.0 release described in [Lighthouse Update #36](https://lighthouse.sigmaprime.io/update-36.html).

Whilst it contains:

- Beta Windows support
- A reduction in Eth1 queries
- A reduction in memory footprint

It does not contain:

- Altair
- Doppelganger Protection
- The remote signer

We have decided to release some features early. This is primarily due to the desire to allow users to benefit from the memory saving improvements as soon as possible.

## TODO

- [x] Wait for #2340, #2356 and #2376 to merge and then rebase on `unstable`. 
- [x] Ensure discovery issues are fixed (see #2388)
- [x] Ensure https://github.com/sigp/lighthouse/pull/2382 is merged/removed.
- [x] Ensure https://github.com/sigp/lighthouse/pull/2383 is merged/removed.
- [x] Ensure https://github.com/sigp/lighthouse/pull/2384 is merged/removed.
- [ ] Double-check eth1 cache is carried between boots
2021-06-03 00:13:02 +00:00
Paul Hauner
90ea075c62 Revert "Network protocol upgrades (#2345)" (#2388)
## Issue Addressed

NA

## Proposed Changes

Reverts #2345 in the interests of getting v1.4.0 out this week. Once we have released that, we can go back to testing this again.

## Additional Info

NA
2021-06-02 01:07:28 +00:00
Paul Hauner
d34f922c1d Add early check for RPC block relevancy (#2289)
## Issue Addressed

NA

## Proposed Changes

When observing `jemallocator` heap profiles and Grafana, it became clear that Lighthouse is spending significant RAM/CPU on processing blocks from the RPC. On investigation, it seems that we are loading the parent of the block *before* we check to see if the block is already known. This is a big waste of resources.

This PR adds an additional `check_block_relevancy` call as the first thing we do when we try to process a `SignedBeaconBlock` via the RPC (or other similar methods). Ultimately, `check_block_relevancy` will be called again later in the block processing flow. It's a very light function and I don't think trying to optimize it out is worth the risk of a bad block slipping through. 

Also adds a `New RPC block received` info log when we process a new RPC block. This seems like interesting and infrequent info.

## Additional Info

NA
2021-06-02 01:07:27 +00:00
Paul Hauner
bf4e02e2cc Return a specific error for frozen attn states (#2384)
## Issue Addressed

NA

## Proposed Changes

Return a very specific error when at attestation reads shuffling from a frozen `BeaconState`. Previously, this was returning `MissingBeaconState` which indicates a much more serious issue.

## Additional Info

Since `get_inconsistent_state_for_attestation_verification_only` is only called once in `BeaconChain::with_committee_cache`, it is quite easy to reason about the impact of this change.
2021-06-01 06:59:43 +00:00
Paul Hauner
ba9c4c5eea Return more detail in Eth1 HTTP errors (#2383)
## Issue Addressed

NA

## Proposed Changes

Whilst investigating #2372, I [learned](https://github.com/sigp/lighthouse/issues/2372#issuecomment-851725049) that the error message returned from some failed Eth1 requests are always `NotReachable`. This makes debugging quite painful.

This PR adds more detail to these errors. For example:

- Bad infura key: `ERRO Failed to update eth1 cache             error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK:  401 Unauthorized.\"))", retry_millis: 60000, service: eth1_rpc`
- Unreachable server: `ERRO Failed to update eth1 cache             error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Request failed: reqwest::Error { kind: Request, url: Url { scheme: \\\"http\\\", cannot_be_a_base: false, username: \\\"\\\", password: None, host: Some(Ipv4(127.0.0.1)), port: Some(8545), path: \\\"/\\\", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError(\\\"tcp connect error\\\", Os { code: 111, kind: ConnectionRefused, message: \\\"Connection refused\\\" })) }\"))", retry_millis: 60000, service: eth1_rpc`
- Bad server: `ERRO Failed to update eth1 cache             error: Failed to update Eth1 service: "All fallback errored: http://127.0.0.1:8545/ => EndpointError(RequestFailed(\"Response HTTP status was not 200 OK:  501 Not Implemented.\"))", retry_millis: 60000, service: eth1_rpc`

## Additional Info

NA
2021-06-01 06:59:41 +00:00
Paul Hauner
4c7bb4984c Use the forwards iterator more often (#2376)
## Issue Addressed

NA

## Primary Change

When investigating memory usage, I noticed that retrieving a block from an early slot (e.g., slot 900) would cause a sharp increase in the memory footprint (from 400mb to 800mb+) which seemed to be ever-lasting.

After some investigation, I found that the reverse iteration from the head back to that slot was the likely culprit. To counter this, I've switched the `BeaconChain::block_root_at_slot` to use the forwards iterator, instead of the reverse one.

I also noticed that the networking stack is using `BeaconChain::root_at_slot` to check if a peer is relevant (`check_peer_relevance`). Perhaps the steep, seemingly-random-but-consistent increases in memory usage are caused by the use of this function.

Using the forwards iterator with the HTTP API alleviated the sharp increases in memory usage. It also made the response much faster (before it felt like to took 1-2s, now it feels instant).

## Additional Changes

In the process I also noticed that we have two functions for getting block roots:

- `BeaconChain::block_root_at_slot`: returns `None` for a skip slot.
- `BeaconChain::root_at_slot`: returns the previous root for a skip slot.

I unified these two functions into `block_root_at_slot` and added the `WhenSlotSkipped` enum. Now, the caller must be explicit about the skip-slot behaviour when requesting a root. 

Additionally, I replaced `vec![]` with `Vec::with_capacity` in `store::chunked_vector::range_query`. I stumbled across this whilst debugging and made this modification to see what effect it would have (not much). It seems like a decent change to keep around, but I'm not concerned either way.

Also, `BeaconChain::get_ancestor_block_root` is unused, so I got rid of it 🗑️.

## Additional Info

I haven't also done the same for state roots here. Whilst it's possible and a good idea, it's more work since the fwds iterators are presently block-roots-specific.

Whilst there's a few places a reverse iteration of state roots could be triggered (e.g., attestation production, HTTP API), they're no where near as common as the `check_peer_relevance` call. As such, I think we should get this PR merged first, then come back for the state root iters. I made an issue here https://github.com/sigp/lighthouse/issues/2377.
2021-05-31 04:18:20 +00:00
Kevin Lu
320a683e72 Minimum Outbound-Only Peers Requirement (#2356)
## Issue Addressed

#2325 

## Proposed Changes

This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. 

## Additional Info

Unit test for heartbeat function tests that disconnection behavior is correct. Continual querying for peers if outbound-only hasn't been met is not directly tested, but indirectly through unit testing of the helper function that counts the number of outbound-only peers.

EDIT: Am concerned about the behavior of ```update_peer_scores```. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. 

```rust
let previous_state = info.score_state();            
// Update scores            
info.score_update();
Self::handle_score_transitions(                
               previous_state,
                peer_id,
                info, 
               &mut to_ban_peers,
               &mut to_unban_peers,
               &mut self.events,
               &self.log,
);
```

```previous_state``` will be set to Disconnected, and then because ```handle_score_transitions``` only changes connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure realistically how often this scenario would occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. 

Co-authored-by: Kevin Lu <kevlu93@gmail.com>
2021-05-31 04:18:19 +00:00
Mac L
0847986936 Reduce outbound requests to eth1 endpoints (#2340)
## Issue Addressed

#2282 

## Proposed Changes

Reduce the outbound requests made to eth1 endpoints by caching the results from `eth_chainId` and `net_version`.
Further reduce the overall request count by increasing `auto_update_interval_millis` from `7_000` (7 seconds) to `60_000` (1 minute). 
This will result in a reduction from ~2000 requests per hour to 360 requests per hour (during normal operation). A reduction of 82%.

## Additional Info

If an endpoint fails, its state is dropped from the cache and the `eth_chainId` and `net_version` calls will be made for that endpoint again during the regular update cycle (once per minute) until it is back online.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-05-31 04:18:18 +00:00
Age Manning
ec5cceba50 Correct issue with dialing peers (#2375)
The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update.

This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial.

This makes #2374 obsolete.
2021-05-29 07:25:06 +00:00
Age Manning
d12e746b50 Network protocol upgrades (#2345)
This provides a number of upgrades to gossipsub and discovery. 

The updates are extensive and this needs thorough testing.
2021-05-28 22:02:10 +00:00
Paul Hauner
456b313665 Tune GNU malloc (#2299)
## Issue Addressed

NA

## Proposed Changes

Modify the configuration of [GNU malloc](https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html) to reduce memory footprint.

- Set `M_ARENA_MAX` to 4.
    - This reduces memory fragmentation at the cost of contention between threads.
- Set `M_MMAP_THRESHOLD` to 2mb
    - This means that any allocation >= 2mb is allocated via an anonymous mmap, instead of on the heap/arena. This reduces memory fragmentation since we don't need to keep growing the heap to find big contiguous slabs of free memory.
- ~~Run `malloc_trim` every 60 seconds.~~
    - ~~This shaves unused memory from the top of the heap, preventing the heap from constantly growing.~~
    - Removed, see: https://github.com/sigp/lighthouse/pull/2299#issuecomment-825322646

*Note: this only provides memory savings on the Linux (glibc) platform.*
    
## Additional Info

I'm going to close #2288 in favor of this for the following reasons:

- I've managed to get the memory footprint *smaller* here than with jemalloc.
- This PR seems to be less of a dramatic change than bringing in the jemalloc dep.
- The changes in this PR are strictly runtime changes, so we can create CLI flags which disable them completely. Since this change is wide-reaching and complex, it's nice to have an easy "escape hatch" if there are undesired consequences.

## TODO

- [x] Allow configuration via CLI flags
- [x] Test on Mac
- [x] Test on RasPi.
- [x] Determine if GNU malloc is present?
    - I'm not quite sure how to detect for glibc.. This issue suggests we can't really: https://github.com/rust-lang/rust/issues/33244
- [x] Make a clear argument regarding the affect of this on CPU utilization.
- [x] Test with higher `M_ARENA_MAX` values.
- [x] Test with longer trim intervals
- [x] Add some stats about memory savings
- [x] Remove `malloc_trim` calls & code
2021-05-28 05:59:45 +00:00
Pawan Dhananjay
fdaeec631b Monitoring service api (#2251)
## Issue Addressed

N/A

## Proposed Changes

Adds a client side api for collecting system and process metrics and pushing it to a monitoring service.
2021-05-26 05:58:41 +00:00
Age Manning
55aada006f
More stringent dialing (#2363)
* More stringent dialing

* Cover cached enr dialing
2021-05-26 14:21:44 +10:00
ethDreamer
ba55e140ae Enable Compatibility with Windows (#2333)
## Issue Addressed

Windows incompatibility.

## Proposed Changes

On windows, lighthouse needs to default to STDIN as tty doesn't exist. Also Windows uses ACLs for file permissions. So to mirror chmod 600, we will remove every entry in a file's ACL and add only a single SID that is an alias for the file owner.

Beyond that, there were several changes made to different unit tests because windows has slightly different error messages as well as frustrating nuances around killing a process :/

## Additional Info

Tested on my Windows VM and it appears to work, also compiled & tested on Linux with these changes. Permissions look correct on both platforms now. Just waiting for my validator to activate on Prater so I can test running full validator client on windows.

Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
2021-05-19 23:05:16 +00:00
ethDreamer
cb47388ad7 Updated to comply with new clippy formatting rules (#2336)
## Issue Addressed

The latest version of Rust has new clippy rules & the codebase isn't up to date with them.

## Proposed Changes

Small formatting changes that clippy tells me are functionally equivalent
2021-05-10 00:53:09 +00:00
Mac L
bacc38c3da Add testing for beacon node and validator client CLI flags (#2311)
## Issue Addressed

N/A

## Proposed Changes

Add unit tests for the various CLI flags associated with the beacon node and validator client. These changes require the addition of two new flags: `dump-config` and `immediate-shutdown`.

## Additional Info

Both `dump-config` and `immediate-shutdown` are marked as hidden since they should only be used in testing and other advanced use cases.
**Note:** This requires changing `main.rs` so that the flags can adjust the program behavior as necessary.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-05-06 00:36:22 +00:00
Mac L
4cc613d644 Add SensitiveUrl to redact user secrets from endpoints (#2326)
## Issue Addressed

#2276 

## Proposed Changes

Add the `SensitiveUrl` struct which wraps `Url` and implements custom `Display` and `Debug` traits to redact user secrets from being logged in eth1 endpoints, beacon node endpoints and metrics.

## Additional Info

This also includes a small rewrite of the eth1 crate to make requests using `Url` instead of `&str`. 
Some error messages have also been changed to remove `Url` data.
2021-05-04 01:59:51 +00:00
ethDreamer
0aa8509525 Filter Disconnected Peers from Discv5 DHT (#2219)
## Issue Addressed
#2107

## Proposed Change
The peer manager will mark peers as disconnected in the discv5 DHT when they disconnect or dial fails

## Additional Info
Rationale for this particular change is explained in my comment on #2107
2021-04-28 04:07:37 +00:00
realbigsean
2c2c443718 404's on API requests for slots that have been skipped or orphaned (#2272)
## Issue Addressed

Resolves #2186

## Proposed Changes

404 for any block-related information on a slot that was skipped or orphaned

Affected endpoints:
- `/eth/v1/beacon/blocks/{block_id}`
- `/eth/v1/beacon/blocks/{block_id}/root`
- `/eth/v1/beacon/blocks/{block_id}/attestations`
- `/eth/v1/beacon/headers/{block_id}`

## Additional Info



Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-04-25 03:59:59 +00:00
Paul Hauner
3a24ca5f14 v1.3.0 (#2310)
## Issue Addressed

NA

## Proposed Changes

Bump versions.

## Additional Info

This is a minor release (not patch) due to the very slight change introduced by #2291.
2021-04-13 22:46:34 +00:00
Michael Sproul
3b901dc5ec Pack attestations into blocks in parallel (#2307)
## Proposed Changes

Use two instances of max cover when packing attestations into blocks: one for the previous epoch, and one for the current epoch. This reduces the amount of computation done by roughly half due to the `O(n^2)` running time of max cover (`2 * (n/2)^2 = n^2/2`). This should help alleviate some load on block proposal, particularly on Prater.
2021-04-13 05:27:42 +00:00
Paul Hauner
c1203f5e52 Add specific log and metric for delayed blocks (#2308)
## Issue Addressed

NA

## Proposed Changes

- Adds a specific log and metric for when a block is enshrined as head with a delay that will caused bad attestations
    - We *technically* already expose this information, but it's a little tricky to determine during debugging. This makes it nice and explicit.
- Fixes a minor reporting bug with the validator monitor where it was expecting agg. attestations too early (at half-slot rather than two-thirds-slot).

## Additional Info

NA
2021-04-13 02:16:59 +00:00
Paul Hauner
0df7be1814 Add check for aggregate target (#2306)
## Issue Addressed
NA

## Proposed Changes

- Ensure that the [target consistency check](b356f52c5c) is always performed on aggregates.
- Add a regression test.

## Additional Info

NA
2021-04-13 00:24:39 +00:00
Age Manning
aaa14073ff Clean up warnings (#2240)
This is a small PR that cleans up compiler warnings. 

The most controversial change is removing the `data_dir` field from the `BeaconChainBuilder`. 

It was removed because it was never read.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Herman Junge <hermanjunge@protonmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-04-12 00:57:43 +00:00
Mac L
f6f64cf0f5 Correcting disable-enr-auto-update flag definition (#2303)
## Issue Addressed

N/A

## Proposed Changes

Correct the `disable-enr-auto-update` boolean flag so that it no longer requires a value.
Previously it would require a value which was never used.

## Additional Info

Flag is read here: https://github.com/sigp/lighthouse/blob/unstable/beacon_node/src/config.rs#L585-L587
2021-04-11 23:52:29 +00:00
Paul Hauner
e7e5878953 Avoid BeaconState clone during metrics scrape (#2298)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Avoids cloning the `BeaconState` each time Prometheus scrapes our metrics (generally every 5s 😱).

I think the original motivation behind this was *"don't hold the lock on the head whilst we do computation on it"*, however I think is flawed since our computation here is so small that it'll be quicker than the clone.

The primary motivation here is to maintain a small memory footprint by holding less in memory (i.e., the cloned `BeaconState`) and to avoid the fragmentation-creep that occurs when cloning the big contiguous slabs of memory in the `BeaconState`.

I also collapsed the active/slashed/withdrawn counters into a single loop to increase efficiency.

## Additional Info

NA
2021-04-07 01:02:56 +00:00
Pawan Dhananjay
95a362213d Fix local testnet scripts (#2229)
## Issue Addressed

Resolves #2094 

## Proposed Changes

Fixes scripts for creating local testnets. Adds an option in `lighthouse boot_node` to run with a previously generated enr.
2021-03-30 05:17:58 +00:00
Paul Hauner
9eb1945136 v1.2.2 (#2287)
## Issue Addressed

NA

## Proposed Changes

- Bump versions

## Additional Info

NA
2021-03-30 04:07:03 +00:00
Paul Hauner
3d239b85ac Allow for a clock disparity on the duties endpoints (#2283)
## Issue Addressed

Resolves #2280

## Proposed Changes

Allows for API consumers to call the proposer/attester duties endpoints [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](b34a79dc0b/beacon_node/beacon_chain/src/beacon_chain.rs (L99-L102)) earlier than the current epoch. For additional reasoning, see https://github.com/sigp/lighthouse/issues/2280#issuecomment-805358897.

## Additional Info

NA
2021-03-29 23:42:35 +00:00
Paul Hauner
03cefd0065 Expand observed attestations capacity (#2266)
## Issue Addressed

NA

## Proposed Changes

I noticed the following error on one of our nodes:

```
Mar 18 00:03:35 ip-xxxx lighthouse-bn[333503]: Mar 18 00:03:35.103 ERRO Unable to validate aggregate            error: ObservedAttestersError(EpochTooLow { epoch: Epoch(23961), lowest_permissible_epoch: Epoch(23962) }), peer_id: 16Uiu2HAm5GL5KzPLhvfg9MBBFSpBqTVGRFSiTg285oezzWcZzwEv
```

The slot during this log was 766,815 (the last slot of the epoch). I believe this is due to an off-by-one error in `observed_attesters` where we were failing to provide enough capacity to store observations from the previous, current and next epochs. See code comments for further reasoning.

Here's a link to the spec: https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md#beacon_aggregate_and_proof

## Additional Info

NA
2021-03-29 23:42:34 +00:00
Michael Sproul
f9d60f5436 VC: accept unknown fields in chain spec (#2277)
## Issue Addressed

Closes #2274

## Proposed Changes

* Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard.
* Log a debug message if there are extra fields returned to the VC from one of its BNs.

This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)
2021-03-26 04:53:57 +00:00
Paul Hauner
b34a79dc0b v1.2.1 (#2263)
## Issue Addressed

NA

## Proposed Changes

- Bump version.
- Add some new ENR for Prater
    - Afri: https://github.com/eth2-clients/eth2-testnets/pull/42
    - Prysm: https://github.com/eth2-clients/eth2-testnets/pull/43
- Apply the fixes from #2181 to the no-eth1-sim to try fix CI issues. 

## Additional Info

NA
2021-03-18 04:20:46 +00:00
Paul Hauner
015ab7d0a7 Optimize validator duties (#2243)
## Issue Addressed

Closes #2052

## Proposed Changes

- Refactor the attester/proposer duties endpoints in the BN
    - Performance improvements
    - Fixes some potential inconsistencies with the dependent root fields.
    - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
    - Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
    - Required to reduce the delay on broadcasting new blocks.
    - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
    - Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes.
    - Unfortunately this has created lots of dust changes.
 - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
 - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code.
    - This also fixes a bug with some functions which were failing to include a state root as per [this comment](072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
 - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base.
    
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments. 

## Observations on Prymont

- Proposer duties times down from peaks of 450ms to consistent <1ms.
- Current epoch attester duties times down from >1s peaks to a consistent 20-30ms.
- Block production down from +600ms to 100-200ms.

## Additional Info

- ~~Blocked on #2241~~
- ~~Blocked on #2234~~

## TODO

- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)) in the scenario of replacing the state roots


Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
Michael Sproul
3919737978 Release v1.2.0 (#2249)
## Proposed Changes

Release v1.2.0 unchanged from the release candidate.
2021-03-10 01:28:32 +00:00
Michael Sproul
770a2ca030 Fix proposer cache priming upon state advance (#2252)
## Proposed Changes

While investigating an incorrect head + target vote for the epoch boundary block 708544, I noticed that the state advance failed to prime the proposer cache, as per these logs:

```
Mar 09 21:42:47.448 DEBG Subscribing to subnet                   target_slot: 708544, subnet: Y, service: attestation_service
Mar 09 21:49:08.063 DEBG Advanced head state one slot            current_slot: 708543, state_slot: 708544, head_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: state_advance
Mar 09 21:49:08.063 DEBG Completed state advance                 initial_slot: 708543, advanced_slot: 708544, head_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: state_advance
Mar 09 21:49:14.787 DEBG Proposer shuffling cache miss           block_slot: 708544, block_root: 0x9b14bf68667ab1d9c35e6fd2c95ff5d609aa9e8cf08e0071988ae4aa00b9f9fe, parent_slot: 708543, parent_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: beacon
Mar 09 21:49:14.800 DEBG Successfully processed gossip block     root: 0x9b14bf68667ab1d9c35e6fd2c95ff5d609aa9e8cf08e0071988ae4aa00b9f9fe, slot: 708544, graffiti: , service: beacon
Mar 09 21:49:14.800 INFO New block received                      hash: 0x9b14…f9fe, slot: 708544
Mar 09 21:49:14.984 DEBG Head beacon block                       slot: 708544, root: 0x9b14…f9fe, finalized_epoch: 22140, finalized_root: 0x28ec…29a7, justified_epoch: 22141, justified_root: 0x59db…e451, service: beacon
Mar 09 21:49:15.055 INFO Unaggregated attestation                validator: XXXXX, src: api, slot: 708544, epoch: 22142, delay_ms: 53, index: Y, head: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: val_mon
Mar 09 21:49:17.001 DEBG Slot timer                              sync_state: Synced, current_slot: 708544, head_slot: 708544, head_block: 0x9b14…f9fe, finalized_epoch: 22140, finalized_root: 0x28ec…29a7, peers: 55, service: slot_notifier
```

The reason for this is that the condition was backwards, so that whole block of code was unreachable.

Looking at the attestations for the block included in the block after, we can see that lots of validators missed it. Some of them may be Lighthouse v1.1.1-v1.2.0-rc.0, but it's probable that they would have missed even with the proposer cache primed, given how late the block 708544 arrived (the cache miss occurred 3.787s after the slot start): https://beaconcha.in/block/708545#attestations
2021-03-10 00:20:50 +00:00
Michael Sproul
786e25ea08 Release candidate v1.2.0-rc.0 (#2248)
Prepare for v1.2.0 with this release candidate.

To be merged after #2247 and #2246

Co-authored-by: Age Manning <Age@AgeManning.com>
2021-03-08 06:27:50 +00:00
Age Manning
babd153352 Prevent adding and dialing bootnodes when discovery is disabled (#2247)
This is a small PR which prevents unwanted bootnodes from being added to the DHT and being dialed when the `--disable-discovery` flag is set. 

The main reason one would want to disable discovery is to connect to a fix set of peers. Currently, regardless of what the user does, Lighthouse will populate its DHT with previously known peers and also fill it with the spec's bootnodes. It will then dial the bootnodes that are capable of being dialed. This prevents testing with a fixed peer list.

This PR prevents these excess nodes from being added and dialed if the user has set `--disable-discovery`.
2021-03-08 06:27:49 +00:00
Paul Hauner
e4eb0eb168 Use advanced state for block production (#2241)
## Issue Addressed

NA

## Proposed Changes

- Use the pre-states from #2174 during block production.
    - Running this on Pyrmont shows block production times dropping from ~550ms to ~150ms.
- Create `crit` and `warn` logs when a block is published to the API later than we expect.
    - On mainnet we are issuing a warn if the block is published more than 1s later than the slot start and a crit for more than 3s.
- Rename some methods on the `SnapshotCache` for clarity.
- Add the ability to pass the state root to `BeaconChain::produce_block_on_state` to avoid computing a state root. This is a very common LH optimization.
- Add a metric that tracks how late we broadcast blocks received from the HTTP API. This is *technically* a duplicate of a `ValidatorMonitor` log, but I wanted to have it for the case where we aren't monitoring validators too.
2021-03-04 04:43:31 +00:00
Michael Sproul
363f15f362 Use the database to persist the pubkey cache (#2234)
## Issue Addressed

Closes #1787

## Proposed Changes

* Abstract the `ValidatorPubkeyCache` over a "backing" which is either a file (legacy), or the database.
* Implement a migration from schema v2 to schema v3, whereby the contents of the cache file are copied to the DB, and then the file is deleted. The next release to include this change must be a minor version bump, and we will need to warn users of the inability to downgrade (this is our first DB schema change since mainnet genesis).
* Move the schema migration code from the `store` crate into the `beacon_chain` crate so that it can access the datadir and the `ValidatorPubkeyCache`, etc. It gets injected back into the `store` via a closure (similar to what we do in fork choice).
2021-03-04 01:25:12 +00:00
Age Manning
1c507c588e Update to the latest libp2p (#2239)
Updates to the latest libp2p and ignores RUSTSEC-2020-0146 from cargo-audit


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-02 05:59:49 +00:00
realbigsean
ed9b245de0 update tokio-stream to 0.1.3 and use BroadcastStream (#2212)
## Issue Addressed

Resolves #2189 

## Proposed Changes

use tokio's `BroadcastStream`

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-03-01 01:58:05 +00:00
Michael Sproul
2f077b11fe Allow HTTP API to return SSZ blocks (#2209)
## Issue Addressed

Implements https://github.com/ethereum/eth2.0-APIs/pull/125

## Proposed Changes

Optionally return SSZ bytes from the `beacon/blocks` endpoint.
2021-02-24 04:15:14 +00:00
realbigsean
5bc93869c8 Update ValidatorStatus to match the v1 API (#2149)
## Issue Addressed

N/A

## Proposed Changes

We are currently a bit off of the standard API spec because we have [this](https://hackmd.io/bQxMDRt1RbS1TLno8K4NPg?view) proposal implemented for validator status.  Based on discussion [here](https://github.com/ethereum/eth2.0-APIs/pull/94), it looks like this won't be added to the spec until v2, so this PR implements [this](https://hackmd.io/ofFJ5gOmQpu1jjHilHbdQQ) validator status logic instead

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-24 04:15:13 +00:00
Paul Hauner
a764c3b247 Handle early blocks (#2155)
## Issue Addressed

NA

## Problem this PR addresses

There's an issue where Lighthouse is banning a lot of peers due to the following sequence of events:

1. Gossip block 0xabc arrives ~200ms early
    - It is propagated across the network, with respect to [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/p2p-interface.md#why-is-there-maximum_gossip_clock_disparity-when-validating-slot-ranges-of-messages-in-gossip-subnets).
    - However, it is not imported to our database since the block is early.
2. Attestations for 0xabc arrive, but the block was not imported.
    - The peer that sent the attestation is down-voted.
        - Each unknown-block attestation causes a score loss of 1, the peer is banned at -100.
        - When the peer is on an attestation subnet there can be hundreds of attestations, so the peer is banned quickly (before the missed block can be obtained via rpc).

## Potential solutions

I can think of three solutions to this:

1. Wait for attestation-queuing (#635) to arrive and solve this.
    - Easy
    - Not immediate fix.
    - Whilst this would work, I don't think it's a perfect solution for this particular issue, rather (3) is better.
1. Allow importing blocks with a tolerance of `MAXIMUM_GOSSIP_CLOCK_DISPARITY`.
    - Easy
    - ~~I have implemented this, for now.~~
1. If a block is verified for gossip propagation (i.e., signature verified) and it's within `MAXIMUM_GOSSIP_CLOCK_DISPARITY`, then queue it to be processed at the start of the appropriate slot.
    - More difficult
    - Feels like the best solution, I will try to implement this.
    
    
**This PR takes approach (3).**

## Changes included

- Implement the `block_delay_queue`, based upon a [`DelayQueue`](https://docs.rs/tokio-util/0.6.3/tokio_util/time/delay_queue/struct.DelayQueue.html) which can store blocks until it's time to import them.
- Add a new `DelayedImportBlock` variant to the `beacon_processor::WorkEvent` enum to handle this new event.
- In the `BeaconProcessor`, refactor a `tokio::select!` to a struct with an explicit `Stream` implementation. I experienced some issues with `tokio::select!` in the block delay queue and I also found it hard to debug. I think this explicit implementation is nicer and functionally equivalent (apart from the fact that `tokio::select!` randomly chooses futures to poll, whereas now we're deterministic).
- Add a testing framework to the `beacon_processor` module that tests this new block delay logic. I also tested a handful of other operations in the beacon processor (attns, slashings, exits) since it was super easy to copy-pasta the code from the `http_api` tester.
    - To implement these tests I added the concept of an optional `work_journal_tx` to the `BeaconProcessor` which will spit out a log of events. I used this in the tests to ensure that things were happening as I expect.
    - The tests are a little racey, but it's hard to avoid that when testing timing-based code. If we see CI failures I can revise. I haven't observed *any* failures due to races on my machine or on CI yet.
    - To assist with testing I allowed for directly setting the time on the `ManualSlotClock`.
- I gave the `beacon_processor::Worker` a `Toolbox` for two reasons; (a) it avoids changing tons of function sigs when you want to pass a new object to the worker and (b) it seemed cute.
2021-02-24 03:08:52 +00:00
Paul Hauner
46920a84e8 v1.1.3 (#2217)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2021-02-22 06:21:38 +00:00
Paul Hauner
4362ea4f98 Fix false positive "State advance too slow" logs (#2218)
## Issue Addressed

- Resolves #2214

## Proposed Changes

Fix the false positive warning log described in #2214.

## Additional Info

NA
2021-02-21 23:47:53 +00:00
Paul Hauner
8949ae7c4e Address ENR update loop (#2216)
## Issue Addressed

- Resolves #2215

## Proposed Changes

Addresses a potential loop when the majority of peers indicate that we are contactable via an IPv6 address.

See https://github.com/sigp/discv5/pull/62 for further rationale.

## Additional Info

The alternative to this PR is to use `--disable-enr-auto-update` and then manually supply an `--enr-address` and `--enr-upd-port`. However, that requires the user to know their IP addresses in order for discovery to work properly. This might not be practical/achievable for some users, hence this hotfix.
2021-02-21 23:47:52 +00:00
Paul Hauner
8c6537e71d v1.1.2 (#2213)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2021-02-19 00:49:32 +00:00
Paul Hauner
f8cc82f2b1 Switch back to warp with cors wildcard support (#2211)
## Issue Addressed

- Resolves #2204
- Resolves #2205

## Proposed Changes

Switches to my fork of `warp` which contains support for cors wildcards: https://github.com/paulhauner/warp/tree/cors-wildcard

I have a PR open on the `warp` repo but it hasn't had any interest from the maintainers as of yet: https://github.com/seanmonstar/warp/pull/726. I think running from a fork is the best we can do for now.

## Additional Info

NA
2021-02-18 22:33:12 +00:00
Lion - dapplion
613382f304 Add slot offset computing to be downloaded slot (#2198)
The current implementation assumes the range offset of slots downloaded on a batch to equal zero. This conflicts with the condition to consider this chain as sync. For finalized sync, it results in one extra batch being downloaded which can't be processed.

CC @wemeetagain
2021-02-18 08:24:46 +00:00
Paul Hauner
f819ba5414 v1.1.1 (#2202)
## Issue Addressed

NA

## Proposed Changes

Bump versions
2021-02-16 00:09:02 +00:00
Pawan Dhananjay
4a357c9947 Upgrade rand_core (#2201)
## Issue Addressed

N/A

## Proposed Changes

Upgrade `rand_core` to latest version to fix https://rustsec.org/advisories/RUSTSEC-2021-0023
2021-02-15 20:34:49 +00:00
Paul Hauner
88cc222204 Advance state to next slot after importing block (#2174)
## Issue Addressed

NA

## Proposed Changes

Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`.

Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking).

There were several ancillary changes required to achieve this: 

- Remove the `state_root` field  of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`.
    - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block.
- Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it.
- Address https://github.com/ethereum/eth2.0-specs/pull/2196
- Add a debug log when we get a block with an unknown parent. There was previously no logging around this case.
- Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot.

## Additional Info

- ~~Blocked on #2173~~
- ~~Blocked on #2179~~ That PR was wrapped into this PR.
- There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later.

## TODO

- [x] Tidy, improve comments.
- [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
Paul Hauner
3000f3e5da Dht persistence on drop (v2) (#2200)
## Issue Addressed

NA

## Proposed Changes

This is simply #2177 with a merge conflict fixed.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-15 06:09:55 +00:00
Paul Hauner
8e5c20b6d1 Update for clippy 1.50 (#2193)
## Issue Addressed

NA

## Proposed Changes

Rust 1.50 has landed 🎉

The shiny new `clippy` peers down upon us mere mortals with disgust. Brutish peasants wrapping our `usize`s in superfluous `Option`s... tsk tsk.

I've performed the goat sacrifice and corrected our evil ways in this PR. Tonight we shall pray that Github Actions bestows the almighty green tick upon us.

## Additional Info

NA


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-02-15 00:09:12 +00:00
realbigsean
e20f64b21a Update to tokio 1.1 (#2172)
## Issue Addressed

resolves #2129
resolves #2099 
addresses some of #1712
unblocks #2076
unblocks #2153 

## Proposed Changes

- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. 

- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.

- We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue`  --> PR in discv5:  https://github.com/sigp/discv5/pull/58

## Additional Info

tokio 1.0 changes that required some changes in lighthouse:

- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028
- `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350
- stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870
- I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-10 23:29:49 +00:00
Paul Hauner
e383ef3e91 Avoid temp allocations with slog (#2183)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Replaces use of `format!` in `slog` logging with it's special no-allocation `?` and `%` shortcuts. According to a `heaptrack` analysis today over about a period of an hour, this will reduce temporary allocations by at least 4%.

## Additional Info

NA
2021-02-04 07:31:47 +00:00
Paul Hauner
ff35fbb121 Add metrics for beacon block propagation (#2173)
## Issue Addressed

NA

## Proposed Changes

Adds some metrics to track delays regarding:

- LH processing of blocks
- delays receiving blocks from other nodes.

## Additional Info

NA
2021-02-04 05:33:56 +00:00
Akihito Nakano
1a22a096c6 Fix clippy errors on tests (#2160)
## Issue Addressed

There are some clippy error on tests.


## Proposed Changes

Enable clippy check on tests and fix the errors. 💪
2021-01-28 23:31:06 +00:00
Paul Hauner
e4b62139d7 v1.1.0 (#2168)
## Issue Addressed

NA

## Proposed Changes

- Bump version
- ~~Run `cargo update`~~

## Additional Info

NA
2021-01-21 02:37:08 +00:00
Paul Hauner
2b2a358522 Detailed validator monitoring (#2151)
## Issue Addressed

- Resolves #2064

## Proposed Changes

Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators.

Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint.

Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored.

See the new docs included in this PR for more info.

## TODO

- [x] Track validator balance, `slashed` status, etc.
- [x] ~~Register slashings in current epoch, not offense epoch~~
- [ ] Publish Grafana dashboard, update TODO link in docs
- [x] ~~#2130 is merged into this branch, resolve that~~
2021-01-20 19:19:38 +00:00
Paul Hauner
1eb0915301 Fix bug from #2163 (#2165)
## Issue Addressed

NA

## Proposed Changes

Fixes a bug that I missed during a review in #2163. I found this bug by observing that nodes were receiving far less attestations (~1/2 of previous).

I'm not certain on *exactly* how this mistake manifested in a reduction in attestations, but the mistake touches so much code that I think it's reasonable to declare that this it the cause of the observed issue (drop in attestations).

## Additional Info

NA
2021-01-20 10:28:12 +00:00
Paul Hauner
b06559ae97 Disallow attestation production earlier than head (#2130)
## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS.

There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage.

This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`).

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is:

- Not specified as "honest validator" attesting behaviour.
- Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not).
- It disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
Paul Hauner
d9f940613f Represent slots in secs instead of millisecs (#2163)
## Issue Addressed

NA

## Proposed Changes

Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).

This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 09:39:51 +00:00
Paul Hauner
805e152f66 Simplify enum -> str with strum (#2164)
## Issue Addressed

NA

## Proposed Changes

As per #2100, uses derives from the sturm library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counter into one IntCounterVec vector.

These works are originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 06:33:58 +00:00
realbigsean
7a71977987 Clippy 1.49.0 updates and dht persistence test fix (#2156)
## Issue Addressed

`test_dht_persistence` failing

## Proposed Changes

Bind `NetworkService::start` to an underscore prefixed variable rather than `_`.  `_` was causing it to be dropped immediately

This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 00:34:28 +00:00
Pawan Dhananjay
28238d97b1 Disconnect from peers quicker on internet issues (#2147)
## Issue Addressed

Fixes #2146 

## Proposed Changes

Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.
2021-01-13 08:09:10 +00:00
realbigsean
423dea169c update smallvec (#2152)
## Issue Addressed

`cargo audit` is failing because of a potential for an overflow in the version of `smallvec` we're using

## Proposed Changes

Update to the latest version of `smallvec`, which has the fix


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-11 23:32:11 +00:00
Arthur Woimbée
851a4dca3c replace tempdir by tempfile (#2143)
## Issue Addressed

Fixes #2141 
Remove [tempdir](https://docs.rs/tempdir/0.3.7/tempdir/) in favor of [tempfile](https://docs.rs/tempfile/3.1.0/tempfile/).

## Proposed Changes

`tempfile` has a slightly different api that makes creating temp folders with a name prefix a chore (`tempdir::TempDir::new("toto")` => `tempfile::Builder::new().prefix("toto").tempdir()`).

So I removed temp folder name prefix where I deemed it not useful.

Otherwise, the functionality is the same.
2021-01-06 06:36:11 +00:00
Age Manning
7e4b190df0 Reduce ping interval (#2132)
## Issue Addressed

#2123

## Description

Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.
2021-01-06 04:35:52 +00:00
realbigsean
588b90157d Ssz state api endpoint (#2111)
## Issue Addressed

Catching up to a recently merged API spec PR: https://github.com/ethereum/eth2.0-APIs/pull/119

## Proposed Changes

- Return an SSZ beacon state on `/eth/v1/debug/beacon/states/{stateId}` when passed this header: `accept: application/octet-stream`.
- requests to this endpoint with no  `accept` header or an `accept` header and a value of `application/json` or `*/*` , or will result in a JSON response

## Additional Info


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-06 03:01:46 +00:00
Samuel E. Moelius
939fa717fd test_decode_malicious_status_message improvements (#2104)
## Issue Addressed

None

## Proposed Changes

* Correct typo in one comment, elaborate some others.
* Add asserts to ensure comments match code.
* Eliminate one unnecessary `clone`.

## Additional Info

None
2021-01-06 01:10:26 +00:00
Samuel E. Moelius
0245ddd37b Fix typo in ssz_snappy.rs comment (#2103)
## Issue Addressed

None

## Proposed Changes

Correct a typo in `ssz_snappy.rs`.

## Additional Info

Pedantry at it finest.
2021-01-06 01:10:24 +00:00
Paul Hauner
f183af20e3 Version v1.0.6 (#2126)
## Issue Addressed

NA

## Proposed Changes

- Bump versions
- Run `cargo update`

## Additional Info

NA
2020-12-28 23:38:02 +00:00
Akihito Nakano
78d17c3255 Tweak error messages for ease of investigation (#2122)
## Proposed Changes

<!-- Please list or describe the changes introduced by this PR. -->

Tweaked the error message for ease of investigation as `Failed to update eth1 cache` is used in multiple places. 😃
2020-12-28 01:25:33 +00:00
Paul Hauner
9ed65a64f8 Version v1.0.5 (#2117)
## Issue Addressed

NA

## Proposed Changes

- Bump versions to `v1.0.5`
- Run `cargo update`

## Additional Info

NA
2020-12-23 18:52:48 +00:00
Age Manning
2931b05582 Update libp2p (#2101)
This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. 

This needs a bit of thorough testing before merging. 

The primary code changes are:
- General libp2p dependency update
- Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression



Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-12-23 07:53:36 +00:00
Samuel E. Moelius
3381266998 Eliminate uses of expect in ssz_snappy.rs (#2105)
## Issue Addressed

None

## Proposed Changes

Eliminate three uses of `expect` in `ssz_snappy.rs`.

## Additional Info

None
2020-12-22 02:28:37 +00:00
Michael Sproul
e5bf2576f1 Optimise tree hash caching for block production (#2106)
## Proposed Changes

`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.

I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.

## Alternatives

I experimented with 2 alternatives to this approach, before deciding on it:

* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.

In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉

## Additional Info

This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.

In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
2020-12-21 06:29:39 +00:00
Paul Hauner
a62dc65ca4 BN Fallback v2 (#2080)
## Issue Addressed

- Resolves #1883

## Proposed Changes

This follows on from @blacktemplar's work in #2018.

- Allows the VC to connect to multiple BN for redundancy.
  - Update the simulator so some nodes always need to rely on their fallback.
- Adds some extra deprecation warnings for `--eth1-endpoint`
- Pass `SignatureBytes` as a reference instead of by value.

## Additional Info

NA

Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-12-18 09:17:03 +00:00
Pawan Dhananjay
f998eff7ce Subnet discovery fixes (#2095)
## Issue Addressed

N/A

## Proposed Changes

Fixes multiple issues related to discovering of subnet peers.
1. Subnet discovery retries after yielding no results
2. Metadata updates if peer send older metadata
3. peerdb stores the peer subscriptions from gossipsub
2020-12-17 00:39:15 +00:00
blacktemplar
3fcc517993 Fix Syncing Simulator (#2049)
## Issue Addressed

NA

## Proposed Changes

Fixes problems with slot times below 1 second which got revealed by running the syncing simulator with the default speedup time.
2020-12-16 05:37:38 +00:00
Michael Sproul
0c529b8d52 Add slasher broadcast (#2079)
## Issue Addressed

Closes #2048

## Proposed Changes

* Broadcast slashings when the `--slasher-broadcast` flag is provided.
* In the process of implementing this I refactored the slasher service into its own crate so that it could access the network code without creating a circular dependency. I moved the responsibility for putting slashings into the op pool into the service as well, as it makes sense for it to handle the whole slashing lifecycle.
2020-12-16 03:44:01 +00:00
Pawan Dhananjay
63eeb14a81 Improve eth1 fallback logging (#2096)
## Issue Addressed

N/A

## Proposed Changes

There seemed to be confusion among discord users on the eth1 fallback logging
```
WARN Error connecting to eth1 node. Trying fallback ..., endpoint: http://127.0.0.1:8545/, service: eth1_rpc
```
The assumption users seem to be making here is that it is trying the fallback and fallback=endpoint in the log.

This PR improves the logging to be like
```
WARN Error connecting to eth1 node endpoint, endpoint: http://127.0.0.1:8545/, action: trying fallbacks, service: eth1_rpc
```

I think this is a bit more clear that the endpoint that failed is the one in the log.
2020-12-16 02:39:09 +00:00
divma
11c299cbf6 impl Resource Unavailable RPC error (#2072)
## Issue Addressed

Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131)

## Proposed Changes

Implement the proposed error, banning peers that send it

## Additional Info

NA
2020-12-15 00:17:32 +00:00
blacktemplar
701843aaa0 Update dependencies (#2084)
## Issue Addressed

Partially addresses dependencies mentioned in issue #1712.

## Proposed Changes

Updates dependencies (including an update avoiding a vulnerability) + add tokio compatibility to `remote_signer_test`
2020-12-14 02:28:19 +00:00
Michael Sproul
1abc70e815 Version v1.0.4 (#2073)
## Proposed Changes

Run cargo update and bump version in prep for v1.0.4 release

## Additional Info

Planning to merge this commit to `unstable`, test on Pyrmont and canary nodes, then push to `stable`.
2020-12-10 04:01:40 +00:00
Age Manning
dfb588e521 Softer penalties for missing blocks (#2075)
## Issue Addressed

Users are reporting errors for sending attestations to peers. If the clock sync is a little out or we receive attestations before blocks, peers are being too harshly penalized. They can get scored many times per missing block and we typically need these peers on subnets. 


## Proposed Changes

This removes the penalization for missing blocks with attestations. The penalty should be handled when #635 gets built as it will allow us to group attestations per missing block and penalize once.
2020-12-10 00:40:12 +00:00
Michael Sproul
aa45fa3ff7 Revert fork choice if disk write fails (#2068)
## Issue Addressed

Closes #2028
Replaces #2059

## Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

## Additional Info

* This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067)
* This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).
2020-12-09 05:10:34 +00:00
Michael Sproul
82753f842d Improve compile time (#1989)
## Issue Addressed

Closes #1264

## Proposed Changes

* Milagro BLS: tweak the feature flags so that Milagro doesn't get compiled if we're using BLST. Profiling showed that it was consuming about 1 minute of CPU time out of 60 minutes of CPU time (real time ~15 mins). A 1.6% saving.
* Reduce monomorphization: compiling for 3 different `EthSpec` types causes a heck of a lot of generic functions to be instantiated (monomorphized). Removing 2 of 3 cuts the LLVM+linking step from around 250 seconds to 180 seconds, a saving of 70 seconds (real time!). This applies only to `make` and not the CI build, because we test with the minimal spec on CI.
* Update `web3` crate to v0.13. This is perhaps the most controversial change, because it requires axing some deposit contract tools from `lcli`. I suspect these tools weren't used much anyway, and could be maintained separately, but I'm also happy to revert this change. However, it does save us a lot of compile time. With #1839, we now have 3 versions of Tokio (and all of Tokio's deps). This change brings us down to 2 versions, but 1 should be achievable once web3 (and reqwest) move to Tokio 0.3.
* Remove `lcli` from the Docker image. It's a dev tool and can be built from the repo if required.
2020-12-09 01:34:58 +00:00
Age Manning
4f85371ce8 Downgrades a valid log (#2057)
## Issue Addressed

#2046 

## Proposed Changes

The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to  debug log.
2020-12-08 10:48:45 +00:00
divma
57489e620f fix default network handling (#2029)
## Issue Addressed
#1992 and #1987, and also to be considered a continuation of #1751

## Proposed Changes
many changed files but most are renaming to align the code with the semantics of `--network` 
- remove the `--network` default value (in clap) and instead set it after checking the `network` and `testnet-dir` flags
- move `eth2_testnet_config` crate to `eth2_network_config`
- move `Eth2TestnetConfig` to `Eth2NetworkConfig`
- move `DEFAULT_HARDCODED_TESTNET` to `DEFAULT_HARDCODED_NETWORK`
- `beacon_node`s `get_eth2_testnet_config` loads the `DEFAULT_HARDCODED_NETWORK` if there is no network nor testnet provided
- `boot_node`s config loads the config same as the `beacon_node`, it was using the configuration only for preconfigured networks (That code is ~1year old so I asume it was not intended)
- removed a one year old comment stating we should try to emulate `https://github.com/eth2-clients/eth2-testnets/tree/master/nimbus/testnet1` it looks outdated (?)
- remove `lighthouse`s `load_testnet_config` in favor of `get_eth2_network_config` to centralize that logic (It had differences)
- some spelling

## Additional Info
Both the command of #1992 and the scripts of #1987 seem to work fine, same as `bn` and `vc`
2020-12-08 05:41:10 +00:00
divma
f3200784b4 More metrics + RPC tweaks (#2041)
## Issue Addressed

NA

## Proposed Changes
This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here
- Add metrics for rpc errors per client, error type and direction
- Add metrics for downscoring events per source type, client and penalty type
- Add metrics for gossip validation results per client for non-accepted messages
- Make the RPC handler return errors and requests/responses in the order we see them
- Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
- Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
- Remove some unused code in the `PeerAction` and the rpc handler
- Remove the unused variant `RateLimited`. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png)

RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png)

Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)
2020-12-08 03:55:50 +00:00
blacktemplar
a28e8decbf update dependencies (#2032)
## Issue Addressed

NA

## Proposed Changes

Updates out of date dependencies.

## Additional Info

See also https://github.com/sigp/lighthouse/issues/1712 for a list of dependencies that are still out of date and the resasons.
2020-12-07 08:20:33 +00:00
Michael Sproul
c1ec386d18 Pass failed gossip blocks to the slasher (#2047)
## Issue Addressed

Closes #2042

## Proposed Changes

Pass blocks that fail gossip verification to the slasher. Blocks that are successfully verified are not passed immediately, but will be passed as part of full block verification.
2020-12-04 05:03:30 +00:00
Pawan Dhananjay
7933596c89 Add a purge-eth1-cache cli option (#2039)
## Issue

Some eth1 clients are missing deposit logs on mainnet for multiple reasons (not fully synced, eth1 client issues) because of which we are getting `FailedToInsertDeposit` errors.
Ideally, LH should pick up where it left off after pointing it to a nice eth1 client endpoint (which has all deposits). 

However, I have seen instances where LH keeps getting `FailedToInsertDeposit` even after switching to a good endpoint. Only deleting the beacon directory (which also wipes the eth1 cache) and resyncing the eth1 caches seems to be the solution. This wouldn't be great for mainnet if you have to sync your beacon node again as well.

## Proposed Changes

Add a `--purge-eth1-db` option which just wipes the eth1 cache and doesn't touch the rest of the beacon db. 
Still need to investigate if and why LH isn't picking up where it left off for the deposit logs sync, but I think it would be good to have an option to just delete eth1 caches regardless.
2020-12-04 05:03:28 +00:00
realbigsean
fdfb81a74a Server sent events (#1920)
## Issue Addressed

Resolves #1434 (this is the last major feature in the standard spec. There are only a couple of places we may be off-spec due to recent spec changes or ongoing discussion)
Partly addresses #1669
 
## Proposed Changes

- remove the websocket server
- remove the `TeeEventHandler` and `NullEventHandler` 
- add server sent events according to the eth2 API spec

## Additional Info

This is according to the currently unmerged PR here: https://github.com/ethereum/eth2.0-APIs/pull/117


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-04 00:18:58 +00:00
realbigsean
2b5c0df9e5 Validators endpoint status code (#2040)
## Issue Addressed

Resolves #2035 

## Proposed Changes

Update 405's to 400's for failures when we are parsing path params.

## Additional Info

Haven't updated the same for non-standard endpoints

Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-03 23:10:08 +00:00
Age Manning
2682f46025 Fingerprint new client identify agent string (#2027)
Nimbus have modified their identify agent string. 

This PR adds their new agent string to identify new nimbus peers.
2020-12-03 22:07:14 +00:00
Pawan Dhananjay
482695142a Minor fixes (#2038)
Fixes a couple of low hanging fruits.

- Fixes #2037 
- `validators-dir` and `secrets-dir` flags don't really need to depend upon each other
- Fixes #2006 and Fixes #1995
2020-12-03 01:10:28 +00:00
blacktemplar
d8cda2d86e Fix new clippy lints (#2036)
## Issue Addressed

NA

## Proposed Changes

Fixes new clippy lints in the whole project (mainly [manual_strip](https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip) and [unnecessary_lazy_evaluations](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations)). Furthermore, removes `to_string()` calls on literals when used with the `?`-operator.
2020-12-03 01:10:26 +00:00
Paul Hauner
b8bd80d2fb Add Content-Type to metrics server (#2019)
## Issue Addressed

- Resolves #2013

## Proposed Changes

Adds the `Content-Type text/plain` header as per #2013

## Additional Info

NA
2020-12-01 00:04:46 +00:00
Paul Hauner
65dcdc361b Bump version to v1.0.3 (#2024)
## Issue Addressed

NA

## Proposed Changes

- Set version to `v1.0.3`
- Run cargo update

## Additional Info

- ~~Blocked on #2008~~
2020-11-30 22:55:10 +00:00
Age Manning
c718e81eaf Add privacy option (#2016)
Adds a `--privacy` CLI flag to the beacon node that users may opt into. 

This does two things:
- Removes client identifying information from the identify libp2p protocol
- Changes the default graffiti to "" if no graffiti is set.
2020-11-30 22:55:08 +00:00
Paul Hauner
77f3539654 Improve eth1 block sync (#2008)
## Issue Addressed

NA

## Proposed Changes

- Log about eth1 whilst waiting for genesis.
- For the block and deposit caches, update them after each download instead of when *all* downloads are complete.
  - This prevents the case where a single timeout error can cause us to drop *all* previously download blocks/deposits.
- Set `max_log_requests_per_update` to avoid timeouts due to very large log counts in a response.
- Set `max_blocks_per_update` to prevent a single update of the block cache to download an unreasonable number of blocks.
  - This shouldn't have any affect in normal use, it's just a safe-guard against bugs.
- Increase the timeout for eth1 calls from 15s to 60s, as per @pawanjay176's experience with Infura.

## Additional Info

NA
2020-11-30 20:29:17 +00:00
divma
8fcd22992c No string in slog (#2017)
## Issue Addressed

Following slog's documentation, this should help a bit with string allocations. I left it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't harm anyway 

## Proposed Changes

remove `String` creation in logs when possible
2020-11-30 10:33:00 +00:00
Paul Hauner
85e69249e6 Drop discovery log to trace (#2007)
## Issue Addressed

NA

## Proposed Changes

This was causing:

```
Nov 28 21:56:08.154 ERRO slog-async: logger dropped messages due to channel overflow, count: 44, service: libp2p
```

## Additional Info

NA
2020-11-29 03:02:23 +00:00
Age Manning
f7183098ee Bump to version v1.0.2 (#2001)
Update lighthouse to version `v1.0.2`. 

There are two major updates in this version:
- Updates to the task executor to tokio 0.3 and all sub-dependencies relying on core execution, including libp2p
- Update BLST
2020-11-28 13:22:37 +00:00
Age Manning
a567f788bd Upgrade to tokio 0.3 (#1839)
## Description

This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.

This also brings with it a number of various improvements:

- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max



Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-11-28 05:30:57 +00:00
Paul Hauner
5a3b94cbb4
Update to v1.0.1, run cargo update 2020-11-27 21:16:59 +11:00
blacktemplar
38b15deccb Fallback nodes for eth1 access (#1918)
## Issue Addressed

part of  #1883

## Proposed Changes

Adds a new cli argument `--eth1-endpoints` that can be used instead of `--eth1-endpoint` to specify a comma-separated list of endpoints. If the first endpoint returns an error for some request the other endpoints are tried in the given order.

## Additional Info

Currently if the first endpoint fails the fallbacks are used silently (except for `try_fallback_test_endpoint` that is used in `do_update` which logs a `WARN` for each endpoint that is not reachable). A question is if we should add more logs so that the user gets warned if his main endpoint is for example just slow and sometimes hits timeouts.
2020-11-27 08:37:44 +00:00
Michael Sproul
1312844f29 Disable snappy in LevelDB to fix build issues (#1983)
## Proposed Changes

A user on Discord reported build issues when trying to compile Lighthouse checked out to a path with spaces in it. I've fixed the issue upstream in `leveldb-sys` (https://github.com/skade/leveldb-sys/pull/22), but rather than waiting for a new release of the `leveldb` crate, we can also work around the issue by disabling Snappy in LevelDB, which we weren't using anyway.

This may also have the side-effect of slightly improving compilation times, as LevelDB+Snappy was found to be a substantial contributor to build time (although I'm not sure how much was LevelDB and how much was Snappy).
2020-11-27 03:01:57 +00:00
Pawan Dhananjay
0589a14afe Log better error message (#1981)
## Issue Addressed

Fixes #1965 

## Proposed Changes

Log an error and don't update eth1 caches if `chain_id = 0`
2020-11-26 23:13:46 +00:00
divma
fc07cc3fdf Sync metrics (#1975)
## Issue Addressed
- Add metrics to keep track of peer counts by sync type
- Add metric to keep track of the number of syncing chains in range

## Proposed Changes
Plugin to the network metrics update interval and update too the counts for peers wrt to their sync status with us

## Additional Info
For the peer counts
- By the way it is implemented the numbers won't always match to the total peer count in the `libp2p` metric.
- Updating the gauge with every change is messy because it requires to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate) on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. If you think more accuracy is needed we can do it too, but it would be harder to maintain)

ATM those look like this
![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)
2020-11-26 05:23:17 +00:00
Paul Hauner
26741944b1 Add metrics to VC (#1954)
## Issue Addressed

NA

## Proposed Changes

- Adds a HTTP server to the VC which provides Prometheus metrics.
- Moves the health metrics into the `lighthouse_metrics` crate so it can be shared between BN/VC.
- Sprinkle some metrics around the VC.
- Update the book to indicate that we now have VC metrics.
- Shifts the "waiting for genesis" logic later in the `ProductionValidatorClient::new_from_cli`
  - This is worth attention during the review.

## Additional Info

- ~~`clippy` has some new lints that are failing. I'll deal with that in another PR.~~
2020-11-26 01:10:51 +00:00
divma
3b4afc27bf Status race condition (#1967)
## Issue Addressed

Sync stalls due to race conditions between dc notifications and status processing
2020-11-25 02:15:38 +00:00
Paul Hauner
c6baa0eed1
Bump to v1.0.0, run cargo update 2020-11-25 02:02:19 +11:00
Age Manning
a96893744c
Update bootnodes and boot_node cli (#1961) 2020-11-25 02:01:37 +11:00
divma
6f890c398e Sync Bug fixes (#1950)
## Issue Addressed

Two issues related to empty batches
- Chain target's was not being advanced when the batch was successful, empty and the chain didn't have an optimistic batch
- Not switching finalized chains. We now switch finalized chains requiring a minimum work first
2020-11-24 02:11:31 +00:00
Paul Hauner
21617aa87f Change --testnet flag to --network (#1751)
## Issue Addressed

- Resolves #1689

## Proposed Changes

TBC

## Additional Info

NA
2020-11-23 23:54:03 +00:00
Michael Sproul
7d644103c6 Tweak slasher DB schema and pruning (#1948)
## Issue Addressed

Resolves #1890

## Proposed Changes

Change the slasher database schema to key indexed attestations by `(target_epoch, indexed_attestation_root)` instead of just `indexed_attestation_root`. This allows more straight-forward pruning (linear scan), that is also "re-entrant". By re-entrant, we mean that a pruning pass that gets stuck because of a `MapFull` error can attempt to commit midway, and be resumed later without issue. The previous pruning strategy for indexed attestations did not have this property. There was also a flaw in the previous pruning that could leave "zombie" indexed attestations in the database (ones not referenced by any attester record), which could build up and contribute to bloat (although in practice I think they occur quite infrequently).

## Additional Info

During testing I noticed that a `MapFull` error can still occur during the commit of the transaction itself, which is irritating, but not unbearable. This PR should at least reduce the frequency with which users need to manually resize their DB, and if the `MapFull` on commit rears its ugly head too often we could use a dynamic strategy (temporarily increase the size of the map until the transaction commits).

The extra bytes for the epoch make the database a bit heavier, so the size estimate docs have been updated to reflect this. This is also a breaking schema change, so anyone using a v0 database from a few hours ago will need to drop it and update 😅
2020-11-23 21:33:51 +00:00
Michael Sproul
5828ff1204 Implement slasher (#1567)
This is an implementation of a slasher that lives inside the BN and can be enabled via `lighthouse bn --slasher`.

Features included in this PR:

- [x] Detection of attester slashing conditions (double votes, surrounds existing, surrounded by existing)
- [x] Integration into Lighthouse's attestation verification flow
- [x] Detection of proposer slashing conditions
- [x] Extraction of attestations from blocks as they are verified
- [x] Compression of chunks
- [x] Configurable history length
- [x] Pruning of old attestations and blocks
- [x] More tests

Future work:

* Focus on a slice of history separate from the most recent N epochs (e.g. epochs `current - K` to `current - M`)
* Run out-of-process
* Ingest attestations from the chain without a resync

Design notes are here https://hackmd.io/@sproul/HJSEklmPL
2020-11-23 03:43:22 +00:00
Paul Hauner
59b2247ab8 Improve UX whilst VC is waiting for genesis (#1915)
## Issue Addressed

- Resolves #1424

## Proposed Changes

Add a `GET lighthouse/staking` that returns 200 if the node is ready to stake (i.e., `--eth1` flag is present) or a 404 otherwise.

Whilst the VC is waiting for the genesis time to start (i.e., when the genesis state is known), check the `lighthouse/staking` endpoint and log an error if the node isn't configured for staking.

## Additional Info

NA
2020-11-23 01:00:22 +00:00
Paul Hauner
65b1cf2af1 Add flag to import all attestations (#1941)
## Issue Addressed

NA

## Proposed Changes

Adds the `--import-all-attestations` flag which tells the `network::AttestationService` to import/aggregate all attestations after verification (instead of only ones for subnets that are relevant to local validators).

This is useful for testing/debugging and also for creating back-up nodes that should be all cached up and ready for any validator.

## Additional Info

NA
2020-11-22 23:58:25 +00:00
divma
d0cbf3111a move sync state to the chains KV (#1940)
## Issue Addressed
we have a log saying we add a peer to a chain, and an another one in case the chain is not syncing. To avoid needing to peer there two (and reduce log entries) simply log the chain's syncing state in the chain's KV
2020-11-22 23:58:23 +00:00
Michael Sproul
426b3001e0 Fix race condition in seen caches (#1937)
## Issue Addressed

Closes #1719

## Proposed Changes

Lift the internal `RwLock`s and `Mutex`es from the `Observed*` data structures to resolve the race conditions described in #1719.

Most of this work was done by @paulhauner on his `lift-locks` branch, I merely updated it for the current `master` and checked over it.

## Additional Info

I think it would be prudent to test this on a testnet or two before mainnet launch, just to be sure that the extra lock contention doesn't negatively impact performance.
2020-11-22 23:02:51 +00:00
Paul Hauner
0b556c4405 Fix metrics http server error messages (#1946)
## Issue Addressed

- Resolves #1945

## Proposed Changes

- As per #1945, fix a log message from the metrics server that was falsely claiming to be from the api server.
- Ensure successful api request logs are published to debug, not trace. This is something I've wanted to do for a while.

## Additional Info

NA
2020-11-22 03:39:13 +00:00
Paul Hauner
48f73b21e6 Expand eth1 block cache, add more logs (#1938)
## Issue Addressed

NA

## Proposed Changes

- Caches later blocks than is required by `ETH1_FOLLOW_DISTANCE`.
- Adds logging to `warn` if the eth1 cache is insufficiently primed.
- Use `max_by_key` instead of `max_by` in `BeaconChain::Eth1Chain` since it's simpler.
- Rename `voting_period_start_timestamp` to `voting_target_timestamp` for accuracy.

## Additional Info

The reason for eating into the `ETH1_FOLLOW_DISTANCE` and caching blocks that are closer to the head is due to possibility for `SECONDS_PER_ETH1_BLOCK` to be incorrect (as is the case for the Pyrmont testnet on Goerli).

If `SECONDS_PER_ETH1_BLOCK` is too short, we'll skip back too far from the head and skip over blocks that would be valid [`is_candidate_block`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/validator.md#eth1-data) blocks. This was the case on the Pyrmont testnet and resulted in Lighthouse choosing blocks that were about 30 minutes older than is ideal.
2020-11-21 00:26:15 +00:00
Kirk Baird
3b405f10ea Ensure deposit signatures do not use aggregate functions (#1935)
## Issue Addressed

Resolves #1333 

## Proposed Changes

- Remove `deposit_signature_set()` function
- Prevent deposits from being in `SignatureSets`
- User `Signature.verify()` to verify deposit signatures rather than a signature set which uses `fast_aggregate_verify()`

## Additional Info

n/a
2020-11-20 03:37:20 +00:00
divma
d727e55abe Move some rpc processing to the beacon_processor (#1936)
## Issue Addressed
`BlocksByRange` requests were the main culprit of a series of timeouts to peer's requests in general because they produce build up in the router's processor. Those were moved to the blocking executor but a task is being spawned for each; also not ideal since the amount of resources we give to those is not controlled

## Proposed Changes
- Move `BlocksByRange` and `BlocksByRoots` to the `beacon_processor`. The processor crafts the responses and sends them.
- Move too the processing of `StatusMessage`s from other peers. This is a fast operation but it can also build up and won't scale if we keep it in the router (processing one at the time). These don't need to send an answer, so there is no harm in processing them "later" if that were to happen. Sending responses to status requests is still in the router, so we answer as soon as we see them.
- Some "extras" that are basically clean up:
  - Split the `Worker` logic in sync methods (chain processing and rpc blocks), gossip methods (the majority of methods) and rpc methods (the new ones)
  - Move the `status_message` function previously provided by the router's processor to a more central place since it is used by the router, sync, network_context and beacon_processor
 - Some spelling

## Additional Info
What's left to decide/test more thoroughly is the length of the queues and the priority rules. @paulhauner suggested at some point to put status above attestations, and @AgeManning had described an importance of "protecting gossipsub" so my solution is leaving status requests in the router and RPC methods below attestations. Slashings and Exits are at the end.
2020-11-19 23:33:44 +00:00
Pawan Dhananjay
e47739047d Add additional libp2p tests (#1867)
## Issue Addressed

N/A

## Proposed Changes

Adds tests for the eth2_libp2p crate.
2020-11-19 22:32:09 +00:00
realbigsean
79fd9b32b9 Update pool/attestations and committees endpoints (#1899)
## Issue Addressed

Catching up on a few eth2 spec updates:

## Proposed Changes

- adding query params to the `GET pool/attestations` endpoint
- allowing the `POST pool/attestations` endpoint to accept an array of attestations
    - batching attestation submission
- moving `epoch` from a path param to a query param in the `committees` endpoint

## Additional Info


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-11-18 23:31:39 +00:00
blacktemplar
3408de8151 Avoid string initialization in network metrics and replace by &str where possible (#1898)
## Issue Addressed

NA

## Proposed Changes

Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895.

For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script.

## Additional Info

We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.
2020-11-18 23:31:37 +00:00
Paul Hauner
bcc7f6b143 Add new flag to set blocks per eth1 query (#1931)
## Issue Addressed

NA

## Proposed Changes

Users on Discord (and @protolambda) have experienced this error (or variants of it):

```
Failed to update eth1 cache: GetDepositLogsFailed("Eth1 node returned error: {\"code\":-32005,\"message\":\"query returned more than 10000 results\"}")
```

This PR allows users to reduce the span of blocks searched for deposit logs and therefore reduce the size of the return result. Hopefully experimentation with this flag can lead to finding a better default value.


## Additional Info

NA
2020-11-18 22:18:59 +00:00
Paul Hauner
7e4ee58729 Bump to v0.3.5 (#1927)
## Issue Addressed

NA

## Proposed Changes

- Bump version to `v0.3.5`
- Run `cargo update`

## Additional Info

NA
2020-11-18 00:44:28 +00:00
Paul Hauner
103103e72e Address queue congestion in migrator (#1923)
## Issue Addressed

*Should* address #1917

## Proposed Changes

Stops the `BackgroupMigrator` rx channel from backing up with big `BeaconState` messages.

Looking at some logs from my Medalla node, we can see a discrepancy between the head finalized epoch and the migrator finalized epoch:

```
Nov 17 16:50:21.606 DEBG Head beacon block                       slot: 129214, root: 0xbc7a…0b99, finalized_epoch: 4033, finalized_root: 0xf930…6562, justified_epoch: 4035, justified_root: 0x206b…9321, service: beacon
Nov 17 16:50:21.626 DEBG Batch processed                         service: sync, processed_blocks: 43, last_block_slot: 129214, chain: 8274002112260436595, first_block_slot: 129153, batch_epoch: 4036
Nov 17 16:50:21.626 DEBG Chain advanced                          processing_target: 4036, new_start: 4036, previous_start: 4034, chain: 8274002112260436595, service: sync
Nov 17 16:50:22.162 DEBG Completed batch received                awaiting_batches: 5, blocks: 47, epoch: 4048, chain: 8274002112260436595, service: sync
Nov 17 16:50:22.162 DEBG Requesting batch                        start_slot: 129601, end_slot: 129664, downloaded: 0, processed: 0, state: Downloading(16Uiu2HAmG3C3t1McaseReECjAF694tjVVjkDoneZEbxNhWm1nZaT, 0 blocks, 1273), epoch: 4050, chain: 8274002112260436595, service: sync
Nov 17 16:50:22.654 DEBG Database compaction complete            service: beacon
Nov 17 16:50:22.655 INFO Starting database pruning               new_finalized_epoch: 2193, old_finalized_epoch: 2192, service: beacon
```

I believe this indicates that the migrator rx has a backed-up queue of `MigrationNotification` items which each contain a `BeaconState`.

## TODO

- [x] Remove finalized state requirement for op-pool
2020-11-17 23:11:26 +00:00
Michael Sproul
a60ab4eff2 Refine compaction (#1916)
## Proposed Changes

In an attempt to fix OOM issues and database consistency issues observed by some users after the introduction of compaction in v0.3.4, this PR makes the following changes:

* Run compaction less often: roughly every 1024 epochs, including after long periods of non-finality. I think the division check proposed by Paul is pretty solid, and ensures we don't miss any events where we should be compacting. LevelDB lacks an easy way to check the size of the DB, which would be another good trigger.
* Make it possible to disable the compaction on finalization using `--auto-compact-db=false`
* Make it possible to trigger a manual, single-threaded foreground compaction on start-up using `--compact-db`
* Downgrade the pruning log to `DEBUG`, as it's particularly noisy during sync

I would like to ship these changes to affected users ASAP, and will document them further in the Advanced Database section of the book if they prove effective.
2020-11-17 09:10:53 +00:00
divma
398919b5d4 router: drop requests from peers that have dc'd (#1919)
## Issue Addressed

A peer might send a lot of requests that comply to the rate limit and the disconnect, this humongous pr makes sure we don't process them if the peer is not connected
2020-11-17 02:06:21 +00:00
Pawan Dhananjay
280334b1b0 Validate eth1 chain id (#1877)
## Issue Addressed

Resolves #1815 

## Proposed Changes

Adds extra validation for eth1 chain id apart from the existing check for eth1 network id.
2020-11-16 23:10:42 +00:00
Age Manning
49c4630045 Performance improvement for db reads (#1909)
This PR adds a number of improvements:
- Downgrade a warning log when we ignore blocks for gossipsub processing
- Revert a a correction to improve logging of peer score changes
- Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages
- Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.
2020-11-16 07:28:30 +00:00
divma
eb56140582 Update logs + do not downscore peers if WE time out (#1901)
## Issue Addressed

- RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one 
- The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include there info that is not relevant to the user
- The processor didn't have the service tag so add it
- Impl `KV` for status message
- Do not downscore peers if we are the ones that timed out

Other small improvements
2020-11-16 04:06:14 +00:00
realbigsean
6a7d221f72 add slot validation to attestation_data endpoint (#1888)
## Issue Addressed

Resolves #1801

## Proposed Changes

Verify queries to `attestation_data` are for no later than `current_slot + 1`. If they are later than this, return a 400.


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-11-16 02:59:35 +00:00
divma
8a16548715 Misc Peer sync info adjustments (#1896)
## Issue Addressed
#1856 

## Proposed Changes
- For clarity, the router's processor now only decides if a peer is compatible and it disconnects it or sends it to sync accordingly. No logic here regarding how useful is the peer. 
- Update peer_sync_info's rules
- Add an `IrrelevantPeer` sync status to account for incompatible peers (maybe this should be "IncompatiblePeer" now that I think about it?) this state is update upon receiving an internal goodbye in the peer manager
- Misc code cleanups
- Reduce the need to create `StatusMessage`s (and thus, `Arc` accesses )
- Add missing calls to update the global sync state

The overall effect should be:
- More peers recognized as Behind, and less as Unknown
- Peers identified as incompatible
2020-11-13 09:00:10 +00:00
Michael Sproul
46a06069c6 Release v0.3.4 (#1894)
## Proposed Changes

Bump version to v0.3.4 and update dependencies with `cargo update`.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-11-13 06:06:35 +00:00
Age Manning
c00e6c2c6f Small network adjustments (#1884)
## Issue Addressed

- Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections
- Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected
- Improved logging

There is likely more to come - I'll leave this open as we investigate further testnets
2020-11-13 06:06:33 +00:00
Paul Hauner
8772c02fa0 Reduce temp allocations in network metrics (#1895)
## Issue Addressed

Using `heaptrack` I could see that ~75% of Lighthouse temporary allocations are caused by temporary string allocations here.

## Proposed Changes

Reduces temporary `String` allocations when updating metrics in the `network` crate. The solution isn't perfect since we rebuild our caches with each call, but it's a significant improvement.

## Additional Info

NA
2020-11-13 04:19:38 +00:00
blacktemplar
c7ac967d5a handle peer state transitions on gossipsub score changes + refactoring (#1892)
## Issue Addressed

NA

## Proposed Changes

Correctly handles peer state transitions on gossipsub changes + refactors handling of peer state transitions into one function used for lighthouse score changes and gossipsub score changes.


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-11-13 03:15:03 +00:00
realbigsean
cb26c15eb6 Peer endpoint updates (#1893)
## Issue Addressed

N/A

## Proposed Changes

- rename `address` -> `last_seen_p2p_address`
- state and direction filters for `peers` endpoint
- metadata count addition to `peers` endpoint
- add `peer_count` endpoint


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-11-13 02:02:41 +00:00
blacktemplar
fcb4893f72 do subnet discoveries until we have MESH_N_LOW many peers (#1886)
## Issue Addressed

NA

## Proposed Changes

Increases the target peers for a subnet, so that subnet queries are executed until we have at least the minimum required peers for a mesh (`MESH_N_LOW`). We keep the limit of `6` target peers for aggregated subnet discovery queries, therefore the size (and the time needed) for a query doesn't change.
2020-11-13 00:56:05 +00:00
blacktemplar
7404f1ce54 Gossipsub scoring (#1668)
## Issue Addressed

#1606 

## Proposed Changes

Uses dynamic gossipsub scoring parameters depending on the number of active validators as specified in https://gist.github.com/blacktemplar/5c1862cb3f0e32a1a7fb0b25e79e6e2c.

## Additional Info

Although the parameters got tested on Medalla, extensive testing using simulations on larger networks is still to be done and we expect that we need to change the parameters, although this might only affect constants within the dynamic parameter framework.
2020-11-12 01:48:28 +00:00
realbigsean
f8da151b0b Standard beacon api updates (#1831)
## Issue Addressed

Resolves #1809 
Resolves #1824
Resolves #1818
Resolves #1828 (hopefully)

## Proposed Changes

- add `validator_index` to the proposer duties endpoint
- add the ability to query for historical proposer duties
- `StateId` deserialization now fails with a 400 warp rejection
- add the `validator_balances` endpoint
- update the `aggregate_and_proofs` endpoint to accept an array
- updates the attester duties endpoint from a `GET` to a `POST`
- reduces the number of times we query for proposer duties from once per slot per validator to only once per slot 


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-11-09 23:13:56 +00:00
Michael Sproul
556190ff46 Compact database on finalization (#1871)
## Issue Addressed

Closes #1866

## Proposed Changes

* Compact the database on finalization. This removes the deleted states from disk completely. Because it happens in the background migrator, it doesn't block other database operations while it runs. On my Medalla node it took about 1 minute and shrank the database from 90GB to 9GB.
* Fix an inefficiency in the pruning algorithm where it would always use the genesis checkpoint as the `old_finalized_checkpoint` when running for the first time after start-up. This would result in loading lots of states one-at-a-time back to genesis, and storing a lot of block roots in memory. The new code stores the old finalized checkpoint on disk and only uses genesis if no checkpoint is already stored. This makes it both backwards compatible _and_ forwards compatible -- no schema change required!
* Introduce two new `INFO` logs to indicate when pruning has started and completed. Users seem to want to know this information without enabling debug logs!
2020-11-09 07:02:21 +00:00
Paul Hauner
2f9999752e Add --testnet mainnet and start HTTP server before genesis (#1862)
## Issue Addressed

NA

## Proposed Changes

- Adds support for `--testnet mainnet`
- Start HTTP server prior to genesis

## Additional Info

**Note: This is an incomplete work-in-progress. Use Lighthouse for mainnet at your own risk.**

With this PR, you can check the deposits:

```bash
lighthouse --testnet mainnet bn --http
```
```bash
curl localhost:5052/lighthouse/eth1/deposit_cache | jq
```

```json
{
  "data": [
    {
      "deposit_data": {
        "pubkey": "0x854980aa9bf2e84723e1fa6ef682e3537257984cc9cb1daea2ce6b268084b414f0bb43206e9fa6fd7a202357d6eb2b0d",
        "withdrawal_credentials": "0x00cacf703c658b802d55baa2a5c1777500ef5051fc084330d2761bcb6ab6182b",
        "amount": "32000000000",
        "signature": "0xace226cdfd9da6b1d827c3a6ab93f91f53e8e090eb6ca5ee7c7c5fe3acc75558240ca9291684a2a7af5cac67f0558d1109cc95309f5cdf8c125185ec9dcd22635f900d791316924aed7c40cff2ffccdac0d44cf496853db678c8c53745b3545b"
      },
      "block_number": 3492981,
      "index": 0,
      "signature_is_valid": true
    },
    {
      "deposit_data": {
        "pubkey": "0x93da03a71bc4ed163c2f91c8a54ea3ba2461383dd615388fd494670f8ce571b46e698fc8d04b49e4a8ffe653f581806b",
        "withdrawal_credentials": "0x006ebfbb7c8269a78018c8b810492979561d0404d74ba9c234650baa7524dcc4",
        "amount": "32000000000",
        "signature": "0x8d1f4a1683f798a76effcc6e2cdb8c3eed5a79123d201c5ecd4ab91f768a03c30885455b8a952aeec3c02110457f97ae0a60724187b6d4129d7c352f2e1ac19b4210daacd892fe4629ad3260ce2911dceae3890b04ed28267b2d8cb831f6a92d"
      },
      "block_number": 3493427,
      "index": 1,
      "signature_is_valid": true
    },
```
2020-11-09 05:04:03 +00:00
divma
b0e9e3dcef Seen addresses store port (#1841)
## Issue Addressed
#1764
2020-11-09 04:01:03 +00:00
Age Manning
e2ae5010a6 Update libp2p (#1865)
Updates libp2p to the latest version. 

This adds tokio 0.3 support and brings back yamux support. 

This also updates some discv5 configuration parameters for leaner discovery queries
2020-11-06 04:14:14 +00:00
blacktemplar
7e7fad5734 Ignore RPC messages of disconnected peers and remove old peers based on disconnection time (#1854)
## Issue Addressed

NA

## Proposed Changes

Lets the networking behavior ignore messages of peers that are not connected. Furthermore, old peers are not removed from the peerdb based on score anymore but based on the disconnection time.
2020-11-03 23:43:10 +00:00
Age Manning
0a0f4daf9d Prevent errors for stream termination race (#1853)
Prevents an error being propagated on a race condition for RPC stream termination
2020-11-03 10:37:00 +00:00
Paul Hauner
0cde4e285c Bump version to v0.3.3 (#1850)
## Issue Addressed

NA

## Proposed Changes

- Update versions
- Run `cargo update`

## Additional Info

- Blocked on #1846
2020-11-02 23:55:15 +00:00
Paul Hauner
7afbaa807e Return eth1-related data via the API (#1797)
## Issue Addressed

- Related to #1691

## Proposed Changes

Adds the following API endpoints:

- `GET lighthouse/eth1/syncing`: status about how synced we are with Eth1.
- `GET lighthouse/eth1/block_cache`: all locally cached eth1 blocks.
- `GET lighthouse/eth1/deposit_cache`: all locally cached eth1 deposits.

Additionally:

- Moves some types from the `beacon_node/eth1` to the `common/eth2` crate, so they can be used in the API without duplication.
- Allow `update_deposit_cache` and `update_block_cache` to take an optional head block number to avoid duplicate requests.

## Additional Info

TBC
2020-11-02 00:37:30 +00:00
divma
6c0c050fbb Tweak head syncing (#1845)
## Issue Addressed

Fixes head syncing

## Proposed Changes

- Get back to statusing peers after removing chain segments and making the peer manager deal with status according to the Sync status, preventing an old known deadlock
- Also a bug where a chain would get removed if the optimistic batch succeeds being empty

## Additional Info

Tested on Medalla and looking good
2020-11-01 23:37:39 +00:00
Paul Hauner
f64f8246db Only run http_api tests in release (#1827)
## Issue Addressed

NA

## Proposed Changes

As raised by @hermanjunge in a DM, the `http_api` tests have been observed taking 100+ minutes on debug. This PR:

- Moves the `http_api` tests to only run in release.
- Groups some `http_api` tests to reduce test-setup overhead.

## Additional Info

NA
2020-10-29 22:25:20 +00:00
realbigsean
ae0f025375 Beacon state validator id filter (#1803)
## Issue Addressed

Michael's comment here: https://github.com/sigp/lighthouse/issues/1434#issuecomment-708834079
Resolves #1808

## Proposed Changes

- Add query param `id` and `status` to the `validators` endpoint
- Add string serialization and deserialization for `ValidatorStatus`
- Drop `Epoch` from `ValidatorStatus` variants

## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2020-10-29 05:13:04 +00:00
divma
9f45ac2f5e More sync edge cases + prettify range (#1834)
## Issue Addressed
Sync edge case when we get an empty optimistic batch that passes validation and is inside the download buffer. Eventually the chain would reach the batch and treat it as an ugly state. 

## Proposed Changes
- Handle the edge case advancing the chain's target + code clarification
- Some largey changes for readability + ergonomics since rust has try ops
- Better handling of bad batch and chain states
2020-10-29 02:29:24 +00:00
blacktemplar
2bd5b9182f fix unbanning of peers (#1838)
## Issue Addressed

NA

## Proposed Changes

Currently a banned peer will remain banned indefinitely as long as update is called on the score struct regularly. This fixes this bug and the score decay starts after `BANNED_BEFORE_DECAY` seconds after banning.
2020-10-29 01:25:02 +00:00
Michael Sproul
36bd4d87f0 Update to spec v1.0.0-rc.0 and BLSv4 (#1765)
## Issue Addressed

Closes #1504 
Closes #1505
Replaces #1703
Closes #1707

## Proposed Changes

* Update BLST and Milagro to versions compatible with BLSv4 spec
* Update Lighthouse to spec v1.0.0-rc.0, and update EF test vectors
* Use the v1.0.0 constants for `MainnetEthSpec`.
* Rename `InteropEthSpec` -> `V012LegacyEthSpec`
    * Change all constants to suit the mainnet `v0.12.3` specification (i.e., Medalla).
* Deprecate the `--spec` flag for the `lighthouse` binary
    * This value is now obtained from the `config_name` field of the `YamlConfig`.
        * Built in testnet YAML files have been updated.
    * Ignore the `--spec` value, if supplied, log a warning that it will be deprecated
    * `lcli` still has the spec flag, that's fine because it's dev tooling.
* Remove the `E: EthSpec` from `YamlConfig`
    * This means we need to deser the genesis `BeaconState` on-demand, but this is fine.
* Swap the old "minimal", "mainnet" strings over to the new `EthSpecId` enum.
* Always require a `CONFIG_NAME` field in `YamlConfig` (it used to have a default).

## Additional Info

Lots of breaking changes, do not merge! ~~We will likely need a Lighthouse v0.4.0 branch, and possibly a long-term v0.3.0 branch to keep Medalla alive~~.

Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-10-28 22:19:38 +00:00
divma
ad846ad280 Inform peers of requests that exceed the maximum rate limit + log downgrade (#1830)
## Issue Addressed

#1825 

## Proposed Changes

Since we penalize more blocks by range requests that have large steps, it is possible to get requests that will never be processed. We were not informing peers about this requests and also logging CRIT that is no longer relevant. Later we should check if more sophisticated handling for those requests is needed
2020-10-27 11:46:38 +00:00
Paul Hauner
92c8eba8ca Ensure eth1 deposit/chain IDs are used from YamlConfig (#1829)
## Issue Addressed

 NA

## Proposed Changes

Fixes a bug which causes the node to reject valid eth1 nodes.

- Fix core bug: failure to apply `YamlConfig` values to `ChainSpec`.
- Add a test to prevent regression in this specific case.
- Fix an invalid log message

## Additional Info

NA
2020-10-26 03:34:14 +00:00
Paul Hauner
f157d61cc7 Address clippy lints, panic in ssz_derive on overflow (#1714)
## Issue Addressed

NA

## Proposed Changes

- Panic or return error if we overflow `usize` in SSZ decoding/encoding derive macros.
  - I claim that the panics can only be triggered by a faulty type definition in lighthouse, they cannot be triggered externally on a validly defined struct.
- Use `Ordering` instead of some `if` statements, as demanded by clippy.
- Remove some old clippy `allow` that seem to no longer be required.
- Add comments to interesting clippy statements that we're going to continue to ignore.
- Create #1713

## Additional Info

NA
2020-10-25 23:27:39 +00:00
Paul Hauner
eba51f0973 Update testnet configs, change on-disk format (#1799)
## Issue Addressed

- Related to #1691

## Proposed Changes

- Add `DEPOSIT_CHAIN_ID` and `DEPOSIT_NETWORK_ID` to `config.yaml`.
    - Pass the `DEPOSIT_NETWORK_ID` to the `eth1::Service`.
- Remove the unused `MAX_EPOCHS_PER_CROSSLINK` from the `altona` and `medalla` configs (see [spec commit](2befe90032 (diff-efb845ac2ebd4aafbc23df40f47ce25699255064e99d36d0406d0a14ca7953ec))).
- Change from compressing the whole testnet directory, to only compressing the genesis state file. This is the only file we need to compress and *not* compressing the others makes them work nicely with git.
    - We can modify the boot nodes, configs, etc. without incurring an eternal binary-blob cost on our git history.
    - This change is backwards compatible (i.e., non-breaking).

## Additional Info

NA
2020-10-25 22:15:46 +00:00
Age Manning
7453f39d68 Prevent unbanning of disconnected peers (#1822)
## Issue Addressed

Further testing revealed another edge case where we attempt to unban a peer that can be in a disconnected start. Although this causes no real issue, it does log an error to the user. 

This PR adds a check to prevent this edge case and prevents the error being logged to the user.
2020-10-24 05:24:20 +00:00
Age Manning
a3cc1a1e0f Call unban only when necessary (#1821)
This PR prevents a user-facing error. 

It prevents optimistically unbanning a peer and instead checks the state of the peer before requesting the peers state to be unbanned.
2020-10-24 03:24:19 +00:00
blacktemplar
1644289a08 Updates the libp2p to the second newest commit => Allow only one topic per message (#1819)
As @AgeManning mentioned the newest libp2p version had some problems and got downgraded again on lighthouse master. This is an intermediate version that makes no problems and only adds a small change of allowing only one topic per message.
2020-10-24 01:05:37 +00:00
Age Manning
7870b81ade Downgrade libp2p (#1817)
## Description

This downgrades the recent libp2p upgrade. 

There were issues with the RPC which prevented syncing of the chain and this upgrade needs to be further investigated.
2020-10-23 09:33:59 +00:00
Age Manning
55eee18ebb Version bump to 0.3.1 (#1813)
## Description

Bumps Lighthouse to version 0.3.1.
2020-10-23 04:16:36 +00:00
Age Manning
64c5899d25 Adds colour help to bn and vc subcommands (#1811)
Adds coloured help to the bn and vc subcommands
2020-10-23 04:16:34 +00:00
Age Manning
2c7f362908 Discovery v5.1 (#1786)
## Overview 

This updates lighthouse to discovery v5.1

Note: This makes lighthouse's discovery not compatible with any previous version. Lighthouse cannot discover peers or send/receive ENR's from any previous version. This is a breaking change. 

This resolves #1605
2020-10-23 04:16:33 +00:00
Age Manning
ae96dab5d2 Increase UPnP logging and decrease batch sizes (#1812)
## Description

This increases the logging of the underlying UPnP tasks to inform the user of UPnP error/success. 

This also decreases the batch syncing size to two epochs per batch.
2020-10-23 03:01:33 +00:00
Age Manning
c49dd94e20 Update to latest libp2p (#1810)
## Description

Updates to the latest libp2p and includes gossipsub updates. 

Of particular note is the limitation of a single topic per gossipsub message.

Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-10-23 03:01:31 +00:00
Michael Sproul
acd49d988d Implement database temp states to reduce memory usage (#1798)
## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4.
    a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens...
    b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know

This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
2020-10-23 01:27:51 +00:00
Age Manning
66f0cf4430 Improve peer handling (#1796)
## Issue Addressed

Potentially resolves #1647 and sync stalls. 

## Proposed Changes

The handling of the state of banned peers was inadequate for the complex peerdb data structure. We store a limited number of disconnected and banned peers in the db. We were not tracking intermediate "disconnecting" states and the in some circumstances we were updating the peer state without informing the peerdb. This lead to a number of inconsistencies in the peer state. 

Further, the peer manager could ban a peer changing a peer's state from being connected to banned. In this circumstance, if the peer then disconnected, we didn't inform the application layer, which lead to applications like sync not being informed of a peers disconnection. This could lead to sync stalling and having to require a lighthouse restart. 

Improved handling for peer states and interactions with the peerdb is made in this PR.
2020-10-23 01:27:48 +00:00
Paul Hauner
b829257cca Ssz state (#1749)
## Issue Addressed

NA

## Proposed Changes

Adds a `lighthouse/beacon/states/:state_id/ssz` endpoint to allow us to pull the genesis state from the API.

## Additional Info

NA
2020-10-22 06:05:49 +00:00
Michael Sproul
7f73dccebc Refine op pool pruning (#1805)
## Issue Addressed

Closes #1769
Closes #1708

## Proposed Changes

Tweaks the op pool pruning so that the attestation pool is pruned against the wall-clock epoch instead of the finalized state's epoch. This should reduce the unbounded growth that we've seen during periods without finality.

Also fixes up the voluntary exit pruning as raised in #1708.
2020-10-22 04:47:29 +00:00
Paul Hauner
a3704b971e Support pre-flight CORS check (#1772)
## Issue Addressed

- Resolves #1766 

## Proposed Changes

- Use the `warp::filters::cors` filter instead of our work-around.

## Additional Info

It's not trivial to enable/disable `cors` using `warp`, since using `routes.with(cors)` changes the type of `routes`.  This makes it difficult to apply/not apply cors at runtime. My solution has been to *always* use the `warp::filters::cors` wrapper but when cors should be disabled, just pass the HTTP server listen address as the only permissible origin.
2020-10-22 04:47:27 +00:00
realbigsean
a3552a4b70 Node endpoints (#1778)
## Issue Addressed

`node` endpoints in #1434

## Proposed Changes

Implement these:
```
 /eth/v1/node/health
 /eth/v1/node/peers/{peer_id}
 /eth/v1/node/peers
```
- Add an `Option<Enr>` to `PeerInfo`
- Finish implementation of `/eth/v1/node/identity`

## Additional Info
- should update the `peers` endpoints when #1764 is resolved



Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-10-22 02:59:42 +00:00
Daniel Schonfeld
8f86baa48d Optimize attester slashing (#1745)
## Issue Addressed

Closes #1548 

## Proposed Changes

Optimizes attester slashing choice by choosing the ones that cover the most amount of validators slashed, with the highest effective balances 

## Additional Info

Initial pass, need to write a test for it
2020-10-22 01:43:54 +00:00
divma
668513b67e Sync state adjustments (#1804)
check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads
2020-10-22 00:26:06 +00:00
realbigsean
628891df1d fix genesis state root provided to HTTP server (#1783)
## Issue Addressed

Resolves #1776

## Proposed Changes

The beacon chain builder was using the canonical head's state root for the `genesis_state_root` field.

## Additional Info
2020-10-21 23:15:30 +00:00
realbigsean
fdb9744759 use head slot instead of the target slot for the not_while_syncing fi… (#1802)
## Issue Addressed

Resolves #1792

## Proposed Changes

Use `chain.best_slot()` instead of the sync state's target slot in the `not_while_syncing_filter`

## Additional Info

N/A
2020-10-21 22:02:25 +00:00
divma
2acf75785c More sync updates (#1791)
## Issue Addressed
#1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager
2020-10-20 22:34:18 +00:00
Michael Sproul
703c33bdc7 Fix head tracker concurrency bugs (#1771)
## Issue Addressed

Closes #1557

## Proposed Changes

Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).

In the process of writing and testing this I also had to make a few other changes:

* Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on

And chose to make some clean-ups:

* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`

## Testing

To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:

https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557

That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:

```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```

It should pass, and the log output should show:

```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```

## Additional Info

This is a backwards-compatible change with no impact on consensus.
2020-10-19 05:58:39 +00:00
blacktemplar
6ba997b88e add direction information to PeerInfo (#1768)
## Issue Addressed

NA

## Proposed Changes

Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.
2020-10-16 05:24:21 +00:00