Commit Graph

1646 Commits

Author SHA1 Message Date
Squirrel
db4d72c4f1 Remove unused deps (#2592)
Found some deps you're possibly not using.

Please shout if you think they are indeed still needed.
2021-09-30 04:31:42 +00:00
Mac L
4c510f8f6b Add BlockTimesCache to allow additional block delay metrics (#2546)
## Issue Addressed

Closes #2528

## Proposed Changes

- Add `BlockTimesCache` to provide block timing information to `BeaconChain`. This allows additional metrics to be calculated for blocks that are set as head too late.
- Thread the `seen_timestamp` of blocks received from RPC responses (except blocks from syncing) through to the sync manager, similar to what is done for blocks from gossip.

## Additional Info

This provides the following additional metrics:
- `BEACON_BLOCK_OBSERVED_SLOT_START_DELAY_TIME`
  - The delay between the start of the slot and when the block was first observed.
- `BEACON_BLOCK_IMPORTED_OBSERVED_DELAY_TIME`
   - The delay between when the block was first observed and when the block was imported.
- `BEACON_BLOCK_HEAD_IMPORTED_DELAY_TIME`
  - The delay between when the block was imported and when the block was set as head.

The metric `BEACON_BLOCK_IMPORTED_SLOT_START_DELAY_TIME` was removed.

A log is produced when a block is set as head too late, e.g.:
```
Aug 27 03:46:39.006 DEBG Delayed head block                      set_as_head_delay: Some(21.731066ms), imported_delay: Some(119.929934ms), observed_delay: Some(3.864596988s), block_delay: 4.006257988s, slot: 1931331, proposer_index: 24294, block_root: 0x937602c89d3143afa89088a44bdf4b4d0d760dad082abacb229495c048648a9e, service: beacon
```
2021-09-30 04:31:41 +00:00
Pawan Dhananjay
70441aa554 Improve valmon inclusion delay calculation (#2618)
## Issue Addressed

Resolves #2552 

## Proposed Changes

Offers some improvement in inclusion distance calculation in the validator monitor. 

When registering an attestation from a block, instead of doing `block.slot() - attesstation.data.slot()` to get the inclusion distance, we now pass the parent block slot from the beacon chain and do `parent_slot.saturating_sub(attestation.data.slot())`. This allows us to give best effort inclusion distance in scenarios where the attestation was included right after a skip slot. Note that this does not give accurate results in scenarios where the attestation was included few blocks after the skip slot.

In this case, if the attestation slot was `b1` and was included in block `b2` with a skip slot in between, we would get the inclusion delay as 0  (by ignoring the skip slot) which is the best effort inclusion delay.
```
b1 <- missed <- b2
``` 

Here, if the attestation slot was `b1` and was included in block `b3` with a skip slot and valid block `b2` in between, then we would get the inclusion delay as 2 instead of 1 (by ignoring the skip slot).
```
b1 <- missed <- b2 <- b3 
```
A solution for the scenario 2 would be to count number of slots between included slot and attestation slot ignoring the skip slots in the beacon chain and pass the value to the validator monitor. But I'm concerned that it could potentially lead to db accesses for older blocks in extreme cases.


This PR also uses the validator monitor data for logging per epoch inclusion distance. This is useful as we won't get inclusion data in post-altair summaries.


Co-authored-by: Michael Sproul <micsproul@gmail.com>
2021-09-30 01:22:43 +00:00
realbigsean
7d13e57d9f Add interop metrics (#2645)
## Issue Addressed

Resolves: #2644

## Proposed Changes

- Adds mandatory metrics mentioned here: https://github.com/ethereum/beacon-metrics/blob/master/metrics.md#interop-metrics

## Additional Info

Couldn't figure out how to alias metrics, so I created them all as new gauges/counters.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-29 23:44:24 +00:00
realbigsean
113ef74ef6 Add contribution and proof event (#2527)
## Issue Addressed

N/A

## Proposed Changes

Add the new ContributionAndProof event: https://github.com/ethereum/beacon-APIs/pull/158

## Additional Info

N/A

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-25 07:53:58 +00:00
Paul Hauner
fe52322088 Implement SSZ union type (#2579)
## Issue Addressed

NA

## Proposed Changes

Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enums`:

https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union

The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.

### Crate Updates

This PR will (hopefully) cause CI to publish new versions for the following crates:

- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`

These these crates depend on each other, I've had to add a workspace-level `[patch]` for these crates. A follow-up PR will need to remove this patch, ones the new versions are published.

### Union Behaviors

We already had SSZ `Encode` and `TreeHash` derive for enums, however it just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec is in conflict with the transparent method, I've required that all `enum` have exactly one of the following enum-level attributes:

#### SSZ

-  `#[ssz(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[ssz(enum_behaviour = "transparent")]`
    - maintains existing functionality
    - not supported for `Decode` (never was)
    
#### TreeHash

-  `#[tree_hash(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[tree_hash(enum_behaviour = "transparent")]`
    - maintains existing functionality

This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.

### Legacy Option Encoding

Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the *old* SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector.

Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migrate script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migrate script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.

The `ssz::legacy::four_byte_impl!` macro allows a one-liner to define the module required for the `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.

Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.

### Crate Publish Ordering

I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon.

## TODO

- [ ] Queue a follow-up `[patch]`-removing PR.
2021-09-25 05:58:36 +00:00
Age Manning
00a7ef0036 Correct bug in sync (#2615)
A bug that causes failed batches to continually download in a loop is corrected.
2021-09-23 01:32:04 +00:00
Paul Hauner
be11437c27 Batch BLS verification for attestations (#2399)
## Issue Addressed

NA

## Proposed Changes

Adds the ability to verify batches of aggregated/unaggregated attestations from the network.

When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue:

- `== 1` verify the attestation individually.
- `>= 2` take up to 64 of those attestations and verify them in a batch.

Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try and force them into batches. 

### Batching Details

To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations:

- *Indexed* attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`.
- *Verified* attestations: those attestations which were indexed *and also* passed signature verification. These are well-formed, interesting messages which were signed by validators.

The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required.

When we batch verify attestations, we first try to map all those attestations to *indexed* attestations. If any of those attestations were able to be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into *verified* attestations, disabling individual signature checking. If the batch fails, we convert to verified attestations with individual signature checking enabled.

Ultimately, we optimistically try to do a batch verification of attestation signatures and fall-back to individual verification if it fails. This opens an attach vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and the typical-case gains massively outweigh the worst-case losses.

## Additional Info

Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference.

I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to *actual* production code would pass under the existing test suite.
2021-09-22 08:49:41 +00:00
Michael Sproul
9667dc2f03 Implement checkpoint sync (#2244)
## Issue Addressed

Closes #1891
Closes #1784

## Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

## Additional Info

- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)


Co-authored-by: Diva M <divma@protonmail.com>
2021-09-22 00:37:28 +00:00
Age Manning
280e4fe23d Increase connection limits and allow priority connections (#2604)
In previous network updates we have made our libp2p connections more lean by limiting the maximum number of connections a lighthouse node will accept before libp2p rejects new connections. 

However, we still maintain the logic that at maximum connections, we try to dial extra peers if they are needed by a validator client to publish messages on a specific subnet. The dials typically result in failures due the libp2p connection limits. 

This PR adds an extra factor, `PRIORITY_PEER_EXCESS` which sets aside a new allocation of peers we are able to dial in case we need these peers for the VC client.  This allocation sits along side the excess peer (which allows extra incoming peers on top of our target peer limit). 

The drawback here, is that libp2p now allows extra peers to connect to us (beyond the standard peer limit) which the peer manager should subsequently reject.
2021-09-21 07:45:13 +00:00
Age Manning
a73dcb7b6d Improved handling of IP Banning (#2530)
This PR in general improves the handling around peer banning. 

Specifically there were issues when multiple peers under a single IP connected to us after we banned the IP for poor behaviour.

This PR should now handle these peers gracefully as well as make some improvements around how we previously disconnected and banned peers. 

The logic now goes as follows:
- Once a peer gets banned, its gets registered with its known IP addresses
- Once enough banned peers exist under a single IP that IP is banned
- We retain connections with existing peers under this IP
- Any new connections under this IP are rejected
2021-09-17 04:02:31 +00:00
Pawan Dhananjay
64ad2af100 Subscribe to altair gossip topics 2 slots before fork (#2532)
## Issue Addressed

N/A

## Proposed Changes

Add a fork_digest to `ForkContext` only if it is set in the config.
Reject gossip messages on post fork topics before the fork happens.

Edit: Instead of rejecting gossip messages on post fork topics, we now subscribe to post fork topics 2 slots before the fork.

Co-authored-by: Age Manning <Age@AgeManning.com>
2021-09-17 01:11:16 +00:00
Age Manning
56e0615df8 Experimental discovery (#2577)
# Description

A few changes have been made to discovery. In particular a custom re-write of an LRU cache which previously was read/write O(N) for all our sessions ~5k, to a more reasonable hashmap-style O(1). 

Further there has been reported issues in the current discv5, so added error handling to help identify the issue has been added.
2021-09-16 04:45:05 +00:00
Age Manning
95b17137a8 Reduce network debug noise (#2593)
The identify network debug logs can get quite noisy and are unnecessary to print on every request/response. 

This PR reduces debug noise by only printing messages for identify messages that offer some new information.
2021-09-14 08:28:35 +00:00
Wink Saville
4755d4b236 Update sloggers to v2.0.2 (#2588)
fixes #2584
2021-09-14 06:48:26 +00:00
Paul Hauner
f9bba92db3 v1.5.2 (#2595)
## Issue Addressed

NA

## Proposed Changes

Version bump

## Additional Info

Please do not `bors` without my approval, I am still testing.
2021-09-13 23:01:19 +00:00
Paul Hauner
ddbd4e6965 v1.5.2-rc.0 (#2565)
## Issue Addressed

NA

## Proposed Changes

- Bump version
- Tidy some comments mangled by the version change regex.

## Additional Info

NA
2021-09-03 23:28:21 +00:00
Michael Sproul
9c785a9b33 Optimize process_attestation with active balance cache (#2560)
## Proposed Changes

Cache the total active balance for the current epoch in the `BeaconState`. Computing this value takes around 1ms, and this was negatively impacting block processing times on Prater, particularly when reconstructing states.

With a large number of attestations in each block, I saw the `process_attestations` function taking 150ms, which means that reconstructing hot states can take up to 4.65s (31 * 150ms), and reconstructing freezer states can take up to 307s (2047 * 150ms).

I opted to add the cache to the beacon state rather than computing the total active balance at the start of state processing and threading it through. Although this would be simpler in a way, it would waste time, particularly during block replay, as the total active balance doesn't change for the duration of an epoch. So we save ~32ms for hot states, and up to 8.1s for freezer states (using `--slots-per-restore-point 8192`).
2021-09-03 07:50:43 +00:00
realbigsean
50321c6671 Updates to make crates publishable (#2472)
## Issue Addressed

Related to: #2259

Made an attempt at all the necessary updates here to publish the crates to crates.io. I incremented the minor versions on all the crates that have been previously published. We still might run into some issues as we try to publish because I'm not able to test this out but I think it's a good starting point.

## Proposed Changes

- Add description and license to `ssz_types` and `serde_util`
- rename `serde_util` to `eth2_serde_util`
- increment minor versions
- remove path dependencies
- remove patch dependencies 

## Additional Info
Crates published: 

- [x] `tree_hash` -- need to publish `tree_hash_derive` and `eth2_hashing` first
- [x] `eth2_ssz_types` -- need to publish `eth2_serde_util` first
- [x] `tree_hash_derive`
- [x] `eth2_ssz`
- [x] `eth2_ssz_derive`
- [x] `eth2_serde_util`
- [x] `eth2_hashing`


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-09-03 01:10:25 +00:00
Pawan Dhananjay
5a3bcd2904 Validator monitor support for sync committees (#2476)
## Issue Addressed

N/A

## Proposed Changes

Add functionality in the validator monitor to provide sync committee related metrics for monitored validators.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-08-31 23:31:36 +00:00
Paul Hauner
44fa54004c Persist to DB after setting canonical head (#2547)
## Issue Addressed

NA

## Proposed Changes

Missed head votes on attestations is a well-known issue. The primary cause is a block getting set as the head *after* the attestation deadline.

This PR aims to shorten the overall time between "block received" and "block set as head" by:

1. Persisting the head and fork choice *after* setting the canonical head
    - Informal measurements show this takes ~200ms
 1. Pruning the op pool *after* setting the canonical head.
 1. No longer persisting the op pool to disk during `BeaconChain::fork_choice`
     - Informal measurements show this can take up to 1.2s.
     
I also add some metrics to help measure the effect of these changes.
     
Persistence changes like this run the risk of breaking assumptions downstream. However, I have considered these risks and I think we're fine here. I will describe my reasoning for each change.

## Reasoning

### Change 1:  Persisting the head and fork choice *after* setting the canonical head

For (1), although the function is called `persist_head_and_fork_choice`, it only persists:

- Fork choice
- Head tracker
- Genesis block root

Since `BeaconChain::fork_choice_internal` does not modify these values between the original time we were persisting it and the current time, I assert that the change I've made is non-substantial in terms of what ends up on-disk. There's the possibility that some *other* thread has modified fork choice in the extra time we've given it, but that's totally fine.

Since the only time we *read* those values from disk is during startup, I assert that this has no impact during runtime. 

### Change 2: Pruning the op pool after setting the canonical head

Similar to the argument above, we don't modify the op pool during `BeaconChain::fork_choice_internal` so it shouldn't matter when we prune. This change should be non-substantial.

### Change 3: No longer persisting the op pool to disk during `BeaconChain::fork_choice`

This change *is* substantial. With the proposed changes, we'll only be persisting the op pool to disk when we shut down cleanly (i.e., the `BeaconChain` gets dropped). This means we'll save disk IO and time during usual operation, but a `kill -9` or similar "crash" will probably result in an out-of-date op pool when we reboot. An out-of-date op pool can only have an impact when producing blocks or aggregate attestations/sync committees.

I think it's pretty reasonable that a crash might result in an out-of-date op pool, since:

- Crashes are fairly rare. Practically the only time I see LH suffer a full crash is when the OOM killer shows up, and that's a very serious event.
- It's generally quite rare to produce a block/aggregate immediately after a reboot. Just a few slots of runtime is probably enough to have a decent-enough op pool again.

## Additional Info

Credits to @macladson for the timings referenced here.
2021-08-31 04:48:21 +00:00
Pawan Dhananjay
b4dd98b3c6 Shutdown after sync (#2519)
## Issue Addressed

Resolves #2033 

## Proposed Changes

Adds a flag to enable shutting down beacon node right after sync is completed.

## Additional Info

Will need modification after weak subjectivity sync is enabled to change definition of a fully synced node.
2021-08-30 13:46:13 +00:00
Michael Sproul
10945e0619 Revert bad blocks on missed fork (#2529)
## Issue Addressed

Closes #2526

## Proposed Changes

If the head block fails to decode on start up, do two things:

1. Revert all blocks between the head and the most recent hard fork (to `fork_slot - 1`).
2. Reset fork choice so that it contains the new head, and all blocks back to the new head's finalized checkpoint.

## Additional Info

I tweaked some of the beacon chain test harness stuff in order to make it generic enough to test with a non-zero slot clock on start-up. In the process I consolidated all the various `new_` methods into a single generic one which will hopefully serve all future uses 🤞
2021-08-30 06:41:31 +00:00
Mason Stallmo
bc14d1d73d Add more unix signal handlers (#2486)
## Issue Addressed

Resolves #2114 

Swapped out the ctrlc crate for tokio signals to hook register handlers for SIGPIPE and SIGHUP along with SIGTERM and SIGINT.

## Proposed Changes

- Swap out the ctrlc crate for tokio signals for unix signal handing
- Register signals for SIGPIPE and SHIGUP that trigger the same shutdown procedure as SIGTERM and SIGINT

## Additional Info

I tested these changes against the examples in the original issue and noticed some interesting behavior on my machine. When running `lighthouse bn --network pyrmont |& tee -a pyrmont_bn.log` or `lighthouse bn --network pyrmont 2>&1 | tee -a pyrmont_bn.log` none of the above signals are sent to the lighthouse program in a way I was able to observe. 

The only time it seems that the signal gets sent to the lighthouse program is if there is no redirection of stderr to stdout. I'm not as familiar with the details of how unix signals work in linux with a redirect like that so I'm not sure if this is a bug in the program or expected behavior.

Signals are correctly received without the redirection and if the above signals are sent directly to the program with something like `kill`.
2021-08-30 05:19:34 +00:00
Pawan Dhananjay
99737c551a Improve eth1 fallback logging (#2490)
## Issue Addressed

Resolves #2487 

## Proposed Changes

Logs a message once in every invocation of `Eth1Service::update` method if the primary endpoint is unavailable for some reason. 

e.g.
```log
Aug 03 00:09:53.517 WARN Error connecting to eth1 node endpoint  action: trying fallbacks, endpoint: http://localhost:8545/, service: eth1_rpc
Aug 03 00:09:56.959 INFO Fetched data from fallback              fallback_number: 1, service: eth1_rpc
```

The main aim of this PR is to have an accompanying message to the "action: trying fallbacks" error message that is returned when checking the endpoint for liveness. This is mainly to indicate to the user that the fallback was live and reachable. 

## Additional info
This PR is not meant to be a catch all for all cases where the primary endpoint failed. For instance, this won't log anything if the primary node was working fine during endpoint liveness checking and failed during deposit/block fetching. This is done intentionally to reduce number of logs while initial deposit/block sync and to avoid more complicated logic.
2021-08-30 00:51:26 +00:00
Paul Hauner
b0ac3464ca v1.5.1 (#2544)
## Issue Addressed

NA

## Proposed Changes

- Bump version

## Additional Info

NA
2021-08-27 01:58:19 +00:00
Paul Hauner
4405425726 Expand gossip duplicate cache time (#2542)
## Issue Addressed

NA

## Proposed Changes

This PR expands the time that entries exist in the gossip-sub duplicate cache. Recent investigations found that this cache is one slot (12s) shorter than the period for which an attestation is permitted to propagate on the gossip network.

Before #2540, this was causing peers to be unnecessarily down-scored for sending old attestations. Although that issue has been fixed, the duplicate cache time is increased here to avoid such messages from getting any further up the networking stack then required.

## Additional Info

NA
2021-08-26 23:25:50 +00:00
Paul Hauner
3fdad38eba Remove penality for duplicate attestation from same validator (#2540)
## Issue Addressed

NA

## Proposed Changes

A Discord user presented logs which indicated a drop in their peer count caused by a variety of peers sending attestations where we'd already seen an attestation for that validator. It's presently unclear how this case came about, but during our investigation I noticed that we are down-voting peers for sending such attestations.

There are three scenarios where we may receive duplicate unagg. attestations from the same validator:

1. The validator is committing a slashable offense.
2. The gossipsub message-deduping functionality is not working as expected.
3. We received the message via the HTTP prior to seeing it via gossip.

Scenario (1) would be so costly for an attacker that I don't think we need to add DoS protection for it.

Scenario (2) seems feasible. Our "seen message" caches in gossipsub might fill up/expire and let through these duplicates. There are also cases involving message ID mismatches with the other peers. In both these cases, I don't think we should be doing 1 attestation == -1 point down-voting.

Scenario (3) is not necessarily a fault of the peer and we shouldn't down-score them for it.

## Additional Info

NA
2021-08-26 08:00:50 +00:00
Age Manning
09545fe668 Increase maximum gossipsub subscriptions (#2531)
Due to the altair fork, in principle we can now subscribe to up to 148 topics. This bypasses our original limit and we can end up rejecting subscriptions. 

This PR increases the limit to account for the fork.
2021-08-26 02:01:10 +00:00
Pawan Dhananjay
d3b4cbed53 Packet filter cli option (#2523)
## Issue Addressed

N/A

## Proposed Changes

Adds a cli option to disable packet filter in `lighthouse bootnode`. This is useful in running local testnets as the bootnode bans requests from the same ip(localhost) if the packet filter is enabled.
2021-08-26 00:29:39 +00:00
realbigsean
5b8436e33f Fork schedule api (#2525)
## Issue Addressed

Resolves #2524

## Proposed Changes

- Return all known forks in the `/config/fork_schedule`, previously returned only the head of the chain's fork.
- Deleted the `StateId::head` method because it was only previously used in this endpoint.


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-24 01:36:27 +00:00
Pawan Dhananjay
02fd54bea7 Refactor discovery queries (#2518)
## Issue Addressed

N/A

## Proposed Changes

Refactor discovery queries such that only `QueryType::Subnet` queries are queued for grouping. `QueryType::FindPeers` is always made regardless of the number of active `Subnet` queries (max = 2). This prevents `FindPeers` queries from getting starved if `Subnet` queries start queuing up. 

Also removes `GroupedQueryType` struct and use `QueryType` for all queuing and processing of discovery requests.

## Additional Info

Currently, no distinction is made between subnet discovery queries for attestation and sync committee subnets when grouping queries. Potentially we could prioritise attestation queries over sync committee queries in the future.
2021-08-24 00:12:13 +00:00
Paul Hauner
90d5ab1566 v1.5.0 (#2535)
## Issue Addressed

NA

## Proposed Changes

- Version bump
- Increase queue sizes for aggregated attestations and re-queued attestations. 

## Additional Info

NA
2021-08-23 04:27:36 +00:00
Paul Hauner
f2a8c6229c Metrics and DEBG log for late gossip blocks (#2533)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

- Add a counter metric to log when a block is received late from gossip.
- Also push a `DEBG` log for the above condition.
- Use Debug (`?`) instead of Display (`%`) for a bunch of logs in the beacon processor, so we don't have to deal with concatenated block roots.
- Add new ERRO and CRIT to HTTP API to alert users when they're publishing late blocks.

## Additional Info

NA
2021-08-23 00:59:14 +00:00
Paul Hauner
c7379836a5 v1.5.0-rc.1 (#2516)
## Issue Addressed

NA

## Proposed Changes

- Bump version

## Additional Info

NA
2021-08-17 05:34:31 +00:00
Michael Sproul
c0a2f501d9 Upgrade dependencies (#2513)
## Proposed Changes

* Consolidate Tokio versions: everything now uses the latest v1.10.0, no more `tokio-compat`.
* Many semver-compatible changes via `cargo update`. Notably this upgrades from the yanked v0.8.0 version of crossbeam-deque which is present in v1.5.0-rc.0
* Many semver incompatible upgrades via `cargo upgrades` and `cargo upgrade --workspace pkg_name`. Notable ommissions:
    - Prometheus, to be handled separately: https://github.com/sigp/lighthouse/issues/2485
    - `rand`, `rand_xorshift`: the libsecp256k1 package requires 0.7.x, so we'll stick with that for now
    - `ethereum-types` is pinned at 0.11.0 because that's what `web3` is using and it seems nice to have just a single version
    
## Additional Info

We still have two versions of `libp2p-core` due to `discv5` depending on the v0.29.0 release rather than `master`. AFAIK it should be OK to release in this state (cc @AgeManning )
2021-08-17 01:00:24 +00:00
Pawan Dhananjay
d17350c0fa Lower penalty for past/future slot errors (#2510)
## Issue Addressed

N/A

## Proposed Changes

Reduce the penalties with past/future slot errors for sync committee messages.
2021-08-16 23:30:18 +00:00
Paul Hauner
4c4ebfbaa1 v1.5.0 rc.0 (#2506)
## Issue Addressed

NA

## Proposed Changes

- Bump to `v1.5.0-rc.0`.
- Increase attestation reprocessing queue size (I saw this filling up on Prater).
- Reduce error log for full attn reprocessing queue to warn.

## TODO

- [x] Manual testing
- [x] Resolve https://github.com/sigp/lighthouse/pull/2493
- [x] Include https://github.com/sigp/lighthouse/pull/2501
2021-08-12 04:02:46 +00:00
Paul Hauner
4af6fcfafd Bump libp2p to address inconsistency in mesh peer tracking (#2493)
## Issue Addressed

- Resolves #2457
- Resolves #2443

## Proposed Changes

Target the (presently unreleased) head of `libp2p/rust-libp2p:master` in order to obtain the fix from https://github.com/libp2p/rust-libp2p/pull/2175.

Additionally:

- `libsecp256k1` needed to be upgraded to satisfy the new version of `libp2p`.
- There were also a handful of minor changes to `eth2_libp2p` to suit some interface changes.
- Two `cargo audit --ignore` flags were remove due to libp2p upgrades.

## Additional Info
 
 NA
2021-08-12 01:59:20 +00:00
Paul Hauner
ceda27371d Ensure doppelganger detects attestations in blocks (#2495)
## Issue Addressed

NA

## Proposed Changes

When testing our (not-yet-released) Doppelganger implementation, I noticed that we aren't detecting attestations included in blocks (only those on the gossip network).

This is because during [block processing](e8c0d1f19b/beacon_node/beacon_chain/src/beacon_chain.rs (L2168)) we only update the `observed_attestations` cache with each attestation, but not the `observed_attesters` cache. This is the correct behaviour when we consider the [p2p spec](https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md):

> [IGNORE] There has been no other valid attestation seen on an attestation subnet that has an identical attestation.data.target.epoch and participating validator index.

We're doing the right thing here and still allowing attestations on gossip that we've seen in a block. However, this doesn't work so nicely for Doppelganger.

To resolve this, I've taken the following steps:

- Add a `observed_block_attesters` cache.
- Rename `observed_attesters` to `observed_gossip_attesters`.

## TODO

- [x] Add a test to ensure a validator that's been seen in a block attestation (but not a gossip attestation) returns `true` for `BeaconChain::validator_seen_at_epoch`.
- [x] Add a test to ensure `observed_block_attesters` isn't polluted via gossip attestations and vice versa. 


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-09 02:43:03 +00:00
Michael Sproul
17a2c778e3 Altair validator client and HTTP API (#2404)
## Proposed Changes

* Implement the validator client and HTTP API changes necessary to support Altair


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-08-06 00:47:31 +00:00
Pawan Dhananjay
e8c0d1f19b Altair networking (#2300)
## Issue Addressed

Resolves #2278 

## Proposed Changes

Implements the networking components for the Altair hard fork https://github.com/ethereum/eth2.0-specs/blob/dev/specs/altair/p2p-interface.md

## Additional Info

This PR acts as the base branch for networking changes and tracks https://github.com/sigp/lighthouse/pull/2279 . Changes to gossip, rpc and discovery can be separate PRs to be merged here for ease of review.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-04 01:44:57 +00:00
Michael Sproul
187425cdc1 Bump discv5 to v0.1.0-beta.9 (#2479)
Bump discv5 to fix the issues with IP filters and removing nodes.

~~Blocked on an upstream release, and more testnet data.~~
2021-08-03 01:05:06 +00:00
realbigsean
c5786a8821 Doppelganger detection (#2230)
## Issue Addressed

Resolves #2069 

## Proposed Changes

- Adds a `--doppelganger-detection` flag
- Adds a `lighthouse/seen_validators` endpoint, which will make it so the lighthouse VC is not interopable with other client beacon nodes if the `--doppelganger-detection` flag is used, but hopefully this will become standardized. Relevant Eth2 API repo issue: https://github.com/ethereum/eth2.0-APIs/issues/64
- If the `--doppelganger-detection` flag is used, the VC will wait until the beacon node is synced, and then wait an additional 2 epochs. The reason for this is to make sure the beacon node is able to subscribe to the subnets our validators should be attesting on. I think an alternative would be to have the beacon node subscribe to all subnets for 2+ epochs on startup by default.

## Additional Info

I'd like to add tests and would appreciate feedback. 

TODO:  handle validators started via the API, potentially make this default behavior

Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-07-31 03:50:52 +00:00
realbigsean
303deb9969 Rust 1.54.0 lints (#2483)
## Issue Addressed

N/A

## Proposed Changes

- Removing a bunch of unnecessary references
- Updated `Error::VariantError` to `Error::Variant`
- There were additional enum variant lints that I ignored, because I thought our variant names were fine
- removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index

## Additional Info



Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-30 01:11:47 +00:00
Paul Hauner
8efd9fc324 Add AttesterCache for attestation production (#2478)
## Issue Addressed

- Resolves #2169

## Proposed Changes

Adds the `AttesterCache` to allow validators to produce attestations for older slots. Presently, some arbitrary restrictions can force validators to receive an error when attesting to a slot earlier than the present one. This can cause attestation misses when there is excessive load on the validator client or time sync issues between the VC and BN.

## Additional Info

NA
2021-07-29 04:38:26 +00:00
Michael Sproul
923486f34c Use bulk verification for sync_aggregate signature (#2415)
## Proposed Changes

Add the `sync_aggregate` from `BeaconBlock` to the bulk signature verifier for blocks. This necessitates a new signature set constructor for the sync aggregate, which is different from the others due to the use of [`eth2_fast_aggregate_verify`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/bls.md#eth2_fast_aggregate_verify) for sync aggregates, per [`process_sync_aggregate`](https://github.com/ethereum/eth2.0-specs/blob/v1.1.0-alpha.7/specs/altair/beacon-chain.md#sync-aggregate-processing). I made the choice to return an optional signature set, with `None` representing the case where the signature is valid on account of being the point at infinity (requires no further checking).

To "dogfood" the changes and prevent duplication, the consensus logic now uses the signature set approach as well whenever it is required to verify signatures (which should only be in testing AFAIK). The EF tests pass with the code as it exists currently, but failed before I adapted the `eth2_fast_aggregate_verify` changes (which is good).

As a result of this change Altair block processing should be a little faster, and importantly, we will no longer accidentally verify signatures when replaying blocks, e.g. when replaying blocks from the database.
2021-07-28 05:40:21 +00:00
Paul Hauner
6e3ca48cb9 Cache participating indices for Altair epoch processing (#2416)
## Issue Addressed

NA

## Proposed Changes

This PR addresses two things:

1. Allows the `ValidatorMonitor` to work with Altair states.
1. Optimizes `altair::process_epoch` (see [code](https://github.com/paulhauner/lighthouse/blob/participation-cache/consensus/state_processing/src/per_epoch_processing/altair/participation_cache.rs) for description)

## Breaking Changes

The breaking changes in this PR revolve around one premise:

*After the Altair fork, it's not longer possible (given only a `BeaconState`) to identify if a validator had *any* attestation included during some epoch. The best we can do is see if that validator made the "timely" source/target/head flags.*

Whilst this seems annoying, it's not actually too bad. Finalization is based upon "timely target" attestations, so that's really the most important thing. Although there's *some* value in knowing if a validator had *any* attestation included, it's far more important to know about "timely target" participation, since this is what affects finality and justification.

For simplicity and consistency, I've also removed the ability to determine if *any* attestation was included from metrics and API endpoints. Now, all Altair and non-Altair states will simply report on the head/target attestations.

The following section details where we've removed fields and provides replacement values.

### Breaking Changes: Prometheus Metrics

Some participation metrics have been removed and replaced. Some were removed since they are no longer relevant to Altair (e.g., total attesting balance) and others replaced with gwei values instead of pre-computed values. This provides more flexibility at display-time (e.g., Grafana).

The following metrics were added as replacements:

- `beacon_participation_prev_epoch_head_attesting_gwei_total`
- `beacon_participation_prev_epoch_target_attesting_gwei_total`
- `beacon_participation_prev_epoch_source_attesting_gwei_total`
- `beacon_participation_prev_epoch_active_gwei_total`

The following metrics were removed:

- `beacon_participation_prev_epoch_attester`
   - instead use `beacon_participation_prev_epoch_source_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
- `beacon_participation_prev_epoch_target_attester`
   - instead use `beacon_participation_prev_epoch_target_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.
- `beacon_participation_prev_epoch_head_attester`
   - instead use `beacon_participation_prev_epoch_head_attesting_gwei_total / beacon_participation_prev_epoch_active_gwei_total`.

The `beacon_participation_prev_epoch_attester` endpoint has been removed. Users should instead use the pre-existing `beacon_participation_prev_epoch_target_attester`. 

### Breaking Changes: HTTP API

The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint loses the following fields:

- `current_epoch_attesting_gwei` (use `current_epoch_target_attesting_gwei` instead)
- `previous_epoch_attesting_gwei` (use `previous_epoch_target_attesting_gwei` instead)

The `/lighthouse/validator_inclusion/{epoch}/{validator_id}` endpoint lose the following fields:

- `is_current_epoch_attester` (use `is_current_epoch_target_attester` instead)
- `is_previous_epoch_attester` (use `is_previous_epoch_target_attester` instead)
- `is_active_in_current_epoch` becomes `is_active_unslashed_in_current_epoch`.
- `is_active_in_previous_epoch` becomes `is_active_unslashed_in_previous_epoch`.

## Additional Info

NA

## TODO

- [x] Deal with total balances
- [x] Update validator_inclusion API
- [ ] Ensure `beacon_participation_prev_epoch_target_attester` and `beacon_participation_prev_epoch_head_attester` work before Altair

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-07-27 07:01:01 +00:00
Michael Sproul
f5bdca09ff Update to spec v1.1.0-beta.1 (#2460)
## Proposed Changes

Update to the latest version of the Altair spec, which includes new tests and a tweak to the target sync aggregators.

## Additional Info

This change is _not_ required for the imminent Altair devnet, and is waiting on the merge of #2321 to unstable.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2021-07-27 05:43:35 +00:00
Michael Sproul
84e6d71950 Tree hash caching and optimisations for Altair (#2459)
## Proposed Changes

Remove the remaining Altair `FIXME`s from consensus land.

1. Implement tree hash caching for the participation lists. This required some light type manipulation, including removing the `TreeHash` bound from `CachedTreeHash` which was purely descriptive.
2. Plumb the proposer index through Altair attestation processing, to avoid calculating it for _every_ attestation (potentially 128ms on large networks). This duplicates some work from #2431, but with the aim of getting it in sooner, particularly for the Altair devnets.
3. Removes two FIXMEs related to `superstruct` and cloning, which are unlikely to be particularly detrimental and will be tracked here instead: https://github.com/sigp/superstruct/issues/5
2021-07-23 00:23:53 +00:00