Commit Graph

1540 Commits

Author SHA1 Message Date
Age Manning
aaa14073ff Clean up warnings (#2240)
This is a small PR that cleans up compiler warnings. 

The most controversial change is removing the `data_dir` field from the `BeaconChainBuilder`. 

It was removed because it was never read.


Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: Herman Junge <hermanjunge@protonmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-04-12 00:57:43 +00:00
Mac L
f6f64cf0f5 Correcting disable-enr-auto-update flag definition (#2303)
## Issue Addressed

N/A

## Proposed Changes

Correct the `disable-enr-auto-update` boolean flag so that it no longer requires a value.
Previously it would require a value which was never used.

## Additional Info

Flag is read here: https://github.com/sigp/lighthouse/blob/unstable/beacon_node/src/config.rs#L585-L587
2021-04-11 23:52:29 +00:00
Paul Hauner
e7e5878953 Avoid BeaconState clone during metrics scrape (#2298)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Avoids cloning the `BeaconState` each time Prometheus scrapes our metrics (generally every 5s 😱).

I think the original motivation behind this was *"don't hold the lock on the head whilst we do computation on it"*, however I think is flawed since our computation here is so small that it'll be quicker than the clone.

The primary motivation here is to maintain a small memory footprint by holding less in memory (i.e., the cloned `BeaconState`) and to avoid the fragmentation-creep that occurs when cloning the big contiguous slabs of memory in the `BeaconState`.

I also collapsed the active/slashed/withdrawn counters into a single loop to increase efficiency.

## Additional Info

NA
2021-04-07 01:02:56 +00:00
Pawan Dhananjay
95a362213d Fix local testnet scripts (#2229)
## Issue Addressed

Resolves #2094 

## Proposed Changes

Fixes scripts for creating local testnets. Adds an option in `lighthouse boot_node` to run with a previously generated enr.
2021-03-30 05:17:58 +00:00
Paul Hauner
9eb1945136 v1.2.2 (#2287)
## Issue Addressed

NA

## Proposed Changes

- Bump versions

## Additional Info

NA
2021-03-30 04:07:03 +00:00
Paul Hauner
3d239b85ac Allow for a clock disparity on the duties endpoints (#2283)
## Issue Addressed

Resolves #2280

## Proposed Changes

Allows for API consumers to call the proposer/attester duties endpoints [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](b34a79dc0b/beacon_node/beacon_chain/src/beacon_chain.rs (L99-L102)) earlier than the current epoch. For additional reasoning, see https://github.com/sigp/lighthouse/issues/2280#issuecomment-805358897.

## Additional Info

NA
2021-03-29 23:42:35 +00:00
Paul Hauner
03cefd0065 Expand observed attestations capacity (#2266)
## Issue Addressed

NA

## Proposed Changes

I noticed the following error on one of our nodes:

```
Mar 18 00:03:35 ip-xxxx lighthouse-bn[333503]: Mar 18 00:03:35.103 ERRO Unable to validate aggregate            error: ObservedAttestersError(EpochTooLow { epoch: Epoch(23961), lowest_permissible_epoch: Epoch(23962) }), peer_id: 16Uiu2HAm5GL5KzPLhvfg9MBBFSpBqTVGRFSiTg285oezzWcZzwEv
```

The slot during this log was 766,815 (the last slot of the epoch). I believe this is due to an off-by-one error in `observed_attesters` where we were failing to provide enough capacity to store observations from the previous, current and next epochs. See code comments for further reasoning.

Here's a link to the spec: https://github.com/ethereum/eth2.0-specs/blob/v1.0.1/specs/phase0/p2p-interface.md#beacon_aggregate_and_proof

## Additional Info

NA
2021-03-29 23:42:34 +00:00
Michael Sproul
f9d60f5436 VC: accept unknown fields in chain spec (#2277)
## Issue Addressed

Closes #2274

## Proposed Changes

* Modify the `YamlConfig` to collect unknown fields into an `extra_fields` map, instead of failing hard.
* Log a debug message if there are extra fields returned to the VC from one of its BNs.

This restores Lighthouse's compatibility with Teku beacon nodes (and therefore Infura)
2021-03-26 04:53:57 +00:00
Paul Hauner
b34a79dc0b v1.2.1 (#2263)
## Issue Addressed

NA

## Proposed Changes

- Bump version.
- Add some new ENR for Prater
    - Afri: https://github.com/eth2-clients/eth2-testnets/pull/42
    - Prysm: https://github.com/eth2-clients/eth2-testnets/pull/43
- Apply the fixes from #2181 to the no-eth1-sim to try fix CI issues. 

## Additional Info

NA
2021-03-18 04:20:46 +00:00
Paul Hauner
015ab7d0a7 Optimize validator duties (#2243)
## Issue Addressed

Closes #2052

## Proposed Changes

- Refactor the attester/proposer duties endpoints in the BN
    - Performance improvements
    - Fixes some potential inconsistencies with the dependent root fields.
    - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
    - Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
    - Required to reduce the delay on broadcasting new blocks.
    - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
    - Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes.
    - Unfortunately this has created lots of dust changes.
 - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
 - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code.
    - This also fixes a bug with some functions which were failing to include a state root as per [this comment](072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
 - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base.
    
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments. 

## Observations on Prymont

- Proposer duties times down from peaks of 450ms to consistent <1ms.
- Current epoch attester duties times down from >1s peaks to a consistent 20-30ms.
- Block production down from +600ms to 100-200ms.

## Additional Info

- ~~Blocked on #2241~~
- ~~Blocked on #2234~~

## TODO

- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)) in the scenario of replacing the state roots


Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-17 05:09:57 +00:00
Michael Sproul
3919737978 Release v1.2.0 (#2249)
## Proposed Changes

Release v1.2.0 unchanged from the release candidate.
2021-03-10 01:28:32 +00:00
Michael Sproul
770a2ca030 Fix proposer cache priming upon state advance (#2252)
## Proposed Changes

While investigating an incorrect head + target vote for the epoch boundary block 708544, I noticed that the state advance failed to prime the proposer cache, as per these logs:

```
Mar 09 21:42:47.448 DEBG Subscribing to subnet                   target_slot: 708544, subnet: Y, service: attestation_service
Mar 09 21:49:08.063 DEBG Advanced head state one slot            current_slot: 708543, state_slot: 708544, head_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: state_advance
Mar 09 21:49:08.063 DEBG Completed state advance                 initial_slot: 708543, advanced_slot: 708544, head_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: state_advance
Mar 09 21:49:14.787 DEBG Proposer shuffling cache miss           block_slot: 708544, block_root: 0x9b14bf68667ab1d9c35e6fd2c95ff5d609aa9e8cf08e0071988ae4aa00b9f9fe, parent_slot: 708543, parent_root: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: beacon
Mar 09 21:49:14.800 DEBG Successfully processed gossip block     root: 0x9b14bf68667ab1d9c35e6fd2c95ff5d609aa9e8cf08e0071988ae4aa00b9f9fe, slot: 708544, graffiti: , service: beacon
Mar 09 21:49:14.800 INFO New block received                      hash: 0x9b14…f9fe, slot: 708544
Mar 09 21:49:14.984 DEBG Head beacon block                       slot: 708544, root: 0x9b14…f9fe, finalized_epoch: 22140, finalized_root: 0x28ec…29a7, justified_epoch: 22141, justified_root: 0x59db…e451, service: beacon
Mar 09 21:49:15.055 INFO Unaggregated attestation                validator: XXXXX, src: api, slot: 708544, epoch: 22142, delay_ms: 53, index: Y, head: 0xaf5e69de09f384ee3b4fb501458b7000c53bb6758a48817894ec3d2b030e3e6f, service: val_mon
Mar 09 21:49:17.001 DEBG Slot timer                              sync_state: Synced, current_slot: 708544, head_slot: 708544, head_block: 0x9b14…f9fe, finalized_epoch: 22140, finalized_root: 0x28ec…29a7, peers: 55, service: slot_notifier
```

The reason for this is that the condition was backwards, so that whole block of code was unreachable.

Looking at the attestations for the block included in the block after, we can see that lots of validators missed it. Some of them may be Lighthouse v1.1.1-v1.2.0-rc.0, but it's probable that they would have missed even with the proposer cache primed, given how late the block 708544 arrived (the cache miss occurred 3.787s after the slot start): https://beaconcha.in/block/708545#attestations
2021-03-10 00:20:50 +00:00
Michael Sproul
786e25ea08 Release candidate v1.2.0-rc.0 (#2248)
Prepare for v1.2.0 with this release candidate.

To be merged after #2247 and #2246

Co-authored-by: Age Manning <Age@AgeManning.com>
2021-03-08 06:27:50 +00:00
Age Manning
babd153352 Prevent adding and dialing bootnodes when discovery is disabled (#2247)
This is a small PR which prevents unwanted bootnodes from being added to the DHT and being dialed when the `--disable-discovery` flag is set. 

The main reason one would want to disable discovery is to connect to a fix set of peers. Currently, regardless of what the user does, Lighthouse will populate its DHT with previously known peers and also fill it with the spec's bootnodes. It will then dial the bootnodes that are capable of being dialed. This prevents testing with a fixed peer list.

This PR prevents these excess nodes from being added and dialed if the user has set `--disable-discovery`.
2021-03-08 06:27:49 +00:00
Paul Hauner
e4eb0eb168 Use advanced state for block production (#2241)
## Issue Addressed

NA

## Proposed Changes

- Use the pre-states from #2174 during block production.
    - Running this on Pyrmont shows block production times dropping from ~550ms to ~150ms.
- Create `crit` and `warn` logs when a block is published to the API later than we expect.
    - On mainnet we are issuing a warn if the block is published more than 1s later than the slot start and a crit for more than 3s.
- Rename some methods on the `SnapshotCache` for clarity.
- Add the ability to pass the state root to `BeaconChain::produce_block_on_state` to avoid computing a state root. This is a very common LH optimization.
- Add a metric that tracks how late we broadcast blocks received from the HTTP API. This is *technically* a duplicate of a `ValidatorMonitor` log, but I wanted to have it for the case where we aren't monitoring validators too.
2021-03-04 04:43:31 +00:00
Michael Sproul
363f15f362 Use the database to persist the pubkey cache (#2234)
## Issue Addressed

Closes #1787

## Proposed Changes

* Abstract the `ValidatorPubkeyCache` over a "backing" which is either a file (legacy), or the database.
* Implement a migration from schema v2 to schema v3, whereby the contents of the cache file are copied to the DB, and then the file is deleted. The next release to include this change must be a minor version bump, and we will need to warn users of the inability to downgrade (this is our first DB schema change since mainnet genesis).
* Move the schema migration code from the `store` crate into the `beacon_chain` crate so that it can access the datadir and the `ValidatorPubkeyCache`, etc. It gets injected back into the `store` via a closure (similar to what we do in fork choice).
2021-03-04 01:25:12 +00:00
Age Manning
1c507c588e Update to the latest libp2p (#2239)
Updates to the latest libp2p and ignores RUSTSEC-2020-0146 from cargo-audit


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-03-02 05:59:49 +00:00
realbigsean
ed9b245de0 update tokio-stream to 0.1.3 and use BroadcastStream (#2212)
## Issue Addressed

Resolves #2189 

## Proposed Changes

use tokio's `BroadcastStream`

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-03-01 01:58:05 +00:00
Michael Sproul
2f077b11fe Allow HTTP API to return SSZ blocks (#2209)
## Issue Addressed

Implements https://github.com/ethereum/eth2.0-APIs/pull/125

## Proposed Changes

Optionally return SSZ bytes from the `beacon/blocks` endpoint.
2021-02-24 04:15:14 +00:00
realbigsean
5bc93869c8 Update ValidatorStatus to match the v1 API (#2149)
## Issue Addressed

N/A

## Proposed Changes

We are currently a bit off of the standard API spec because we have [this](https://hackmd.io/bQxMDRt1RbS1TLno8K4NPg?view) proposal implemented for validator status.  Based on discussion [here](https://github.com/ethereum/eth2.0-APIs/pull/94), it looks like this won't be added to the spec until v2, so this PR implements [this](https://hackmd.io/ofFJ5gOmQpu1jjHilHbdQQ) validator status logic instead

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-24 04:15:13 +00:00
Paul Hauner
a764c3b247 Handle early blocks (#2155)
## Issue Addressed

NA

## Problem this PR addresses

There's an issue where Lighthouse is banning a lot of peers due to the following sequence of events:

1. Gossip block 0xabc arrives ~200ms early
    - It is propagated across the network, with respect to [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/p2p-interface.md#why-is-there-maximum_gossip_clock_disparity-when-validating-slot-ranges-of-messages-in-gossip-subnets).
    - However, it is not imported to our database since the block is early.
2. Attestations for 0xabc arrive, but the block was not imported.
    - The peer that sent the attestation is down-voted.
        - Each unknown-block attestation causes a score loss of 1, the peer is banned at -100.
        - When the peer is on an attestation subnet there can be hundreds of attestations, so the peer is banned quickly (before the missed block can be obtained via rpc).

## Potential solutions

I can think of three solutions to this:

1. Wait for attestation-queuing (#635) to arrive and solve this.
    - Easy
    - Not immediate fix.
    - Whilst this would work, I don't think it's a perfect solution for this particular issue, rather (3) is better.
1. Allow importing blocks with a tolerance of `MAXIMUM_GOSSIP_CLOCK_DISPARITY`.
    - Easy
    - ~~I have implemented this, for now.~~
1. If a block is verified for gossip propagation (i.e., signature verified) and it's within `MAXIMUM_GOSSIP_CLOCK_DISPARITY`, then queue it to be processed at the start of the appropriate slot.
    - More difficult
    - Feels like the best solution, I will try to implement this.
    
    
**This PR takes approach (3).**

## Changes included

- Implement the `block_delay_queue`, based upon a [`DelayQueue`](https://docs.rs/tokio-util/0.6.3/tokio_util/time/delay_queue/struct.DelayQueue.html) which can store blocks until it's time to import them.
- Add a new `DelayedImportBlock` variant to the `beacon_processor::WorkEvent` enum to handle this new event.
- In the `BeaconProcessor`, refactor a `tokio::select!` to a struct with an explicit `Stream` implementation. I experienced some issues with `tokio::select!` in the block delay queue and I also found it hard to debug. I think this explicit implementation is nicer and functionally equivalent (apart from the fact that `tokio::select!` randomly chooses futures to poll, whereas now we're deterministic).
- Add a testing framework to the `beacon_processor` module that tests this new block delay logic. I also tested a handful of other operations in the beacon processor (attns, slashings, exits) since it was super easy to copy-pasta the code from the `http_api` tester.
    - To implement these tests I added the concept of an optional `work_journal_tx` to the `BeaconProcessor` which will spit out a log of events. I used this in the tests to ensure that things were happening as I expect.
    - The tests are a little racey, but it's hard to avoid that when testing timing-based code. If we see CI failures I can revise. I haven't observed *any* failures due to races on my machine or on CI yet.
    - To assist with testing I allowed for directly setting the time on the `ManualSlotClock`.
- I gave the `beacon_processor::Worker` a `Toolbox` for two reasons; (a) it avoids changing tons of function sigs when you want to pass a new object to the worker and (b) it seemed cute.
2021-02-24 03:08:52 +00:00
Paul Hauner
46920a84e8 v1.1.3 (#2217)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2021-02-22 06:21:38 +00:00
Paul Hauner
4362ea4f98 Fix false positive "State advance too slow" logs (#2218)
## Issue Addressed

- Resolves #2214

## Proposed Changes

Fix the false positive warning log described in #2214.

## Additional Info

NA
2021-02-21 23:47:53 +00:00
Paul Hauner
8949ae7c4e Address ENR update loop (#2216)
## Issue Addressed

- Resolves #2215

## Proposed Changes

Addresses a potential loop when the majority of peers indicate that we are contactable via an IPv6 address.

See https://github.com/sigp/discv5/pull/62 for further rationale.

## Additional Info

The alternative to this PR is to use `--disable-enr-auto-update` and then manually supply an `--enr-address` and `--enr-upd-port`. However, that requires the user to know their IP addresses in order for discovery to work properly. This might not be practical/achievable for some users, hence this hotfix.
2021-02-21 23:47:52 +00:00
Paul Hauner
8c6537e71d v1.1.2 (#2213)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

NA
2021-02-19 00:49:32 +00:00
Paul Hauner
f8cc82f2b1 Switch back to warp with cors wildcard support (#2211)
## Issue Addressed

- Resolves #2204
- Resolves #2205

## Proposed Changes

Switches to my fork of `warp` which contains support for cors wildcards: https://github.com/paulhauner/warp/tree/cors-wildcard

I have a PR open on the `warp` repo but it hasn't had any interest from the maintainers as of yet: https://github.com/seanmonstar/warp/pull/726. I think running from a fork is the best we can do for now.

## Additional Info

NA
2021-02-18 22:33:12 +00:00
Lion - dapplion
613382f304 Add slot offset computing to be downloaded slot (#2198)
The current implementation assumes the range offset of slots downloaded on a batch to equal zero. This conflicts with the condition to consider this chain as sync. For finalized sync, it results in one extra batch being downloaded which can't be processed.

CC @wemeetagain
2021-02-18 08:24:46 +00:00
Paul Hauner
f819ba5414 v1.1.1 (#2202)
## Issue Addressed

NA

## Proposed Changes

Bump versions
2021-02-16 00:09:02 +00:00
Pawan Dhananjay
4a357c9947 Upgrade rand_core (#2201)
## Issue Addressed

N/A

## Proposed Changes

Upgrade `rand_core` to latest version to fix https://rustsec.org/advisories/RUSTSEC-2021-0023
2021-02-15 20:34:49 +00:00
Paul Hauner
88cc222204 Advance state to next slot after importing block (#2174)
## Issue Addressed

NA

## Proposed Changes

Add an optimization to perform `per_slot_processing` from the *leading-edge* of block processing to the *trailing-edge*. Ultimately, this allows us to import the block at slot `n` faster because we used the tail-end of slot `n - 1` to perform `per_slot_processing`.

Additionally, add a "block proposer cache" which allows us to cache the block proposer for some epoch. Since we're now doing trailing-edge `per_slot_processing`, we can prime this cache with the values for the next epoch before those blocks arrive (assuming those blocks don't have some weird forking).

There were several ancillary changes required to achieve this: 

- Remove the `state_root` field  of `BeaconSnapshot`, since there's no need to know it on a `pre_state` and in all other cases we can just read it from `block.state_root()`.
    - This caused some "dust" changes of `snapshot.beacon_state_root` to `snapshot.beacon_state_root()`, where the `BeaconSnapshot::beacon_state_root()` func just reads the state root from the block.
- Rename `types::ShuffingId` to `AttestationShufflingId`. I originally did this because I added a `ProposerShufflingId` struct which turned out to be not so useful. I thought this new name was more descriptive so I kept it.
- Address https://github.com/ethereum/eth2.0-specs/pull/2196
- Add a debug log when we get a block with an unknown parent. There was previously no logging around this case.
- Add a function to `BeaconState` to compute all proposers for an epoch without re-computing the active indices for each slot.

## Additional Info

- ~~Blocked on #2173~~
- ~~Blocked on #2179~~ That PR was wrapped into this PR.
- There's potentially some places where we could avoid computing the proposer indices in `per_block_processing` but I haven't done this here. These would be an optimization beyond the issue at hand (improving block propagation times) and I think this PR is already doing enough. We can come back for that later.

## TODO

- [x] Tidy, improve comments.
- [x] ~~Try avoid computing proposer index in `per_block_processing`?~~
2021-02-15 07:17:52 +00:00
Paul Hauner
3000f3e5da Dht persistence on drop (v2) (#2200)
## Issue Addressed

NA

## Proposed Changes

This is simply #2177 with a merge conflict fixed.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-15 06:09:55 +00:00
Paul Hauner
8e5c20b6d1 Update for clippy 1.50 (#2193)
## Issue Addressed

NA

## Proposed Changes

Rust 1.50 has landed 🎉

The shiny new `clippy` peers down upon us mere mortals with disgust. Brutish peasants wrapping our `usize`s in superfluous `Option`s... tsk tsk.

I've performed the goat sacrifice and corrected our evil ways in this PR. Tonight we shall pray that Github Actions bestows the almighty green tick upon us.

## Additional Info

NA


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-02-15 00:09:12 +00:00
realbigsean
e20f64b21a Update to tokio 1.1 (#2172)
## Issue Addressed

resolves #2129
resolves #2099 
addresses some of #1712
unblocks #2076
unblocks #2153 

## Proposed Changes

- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. 

- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.

- We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue`  --> PR in discv5:  https://github.com/sigp/discv5/pull/58

## Additional Info

tokio 1.0 changes that required some changes in lighthouse:

- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028
- `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350
- stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870
- I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-10 23:29:49 +00:00
Paul Hauner
e383ef3e91 Avoid temp allocations with slog (#2183)
## Issue Addressed

Which issue # does this PR address?

## Proposed Changes

Replaces use of `format!` in `slog` logging with it's special no-allocation `?` and `%` shortcuts. According to a `heaptrack` analysis today over about a period of an hour, this will reduce temporary allocations by at least 4%.

## Additional Info

NA
2021-02-04 07:31:47 +00:00
Paul Hauner
ff35fbb121 Add metrics for beacon block propagation (#2173)
## Issue Addressed

NA

## Proposed Changes

Adds some metrics to track delays regarding:

- LH processing of blocks
- delays receiving blocks from other nodes.

## Additional Info

NA
2021-02-04 05:33:56 +00:00
Akihito Nakano
1a22a096c6 Fix clippy errors on tests (#2160)
## Issue Addressed

There are some clippy error on tests.


## Proposed Changes

Enable clippy check on tests and fix the errors. 💪
2021-01-28 23:31:06 +00:00
Paul Hauner
e4b62139d7 v1.1.0 (#2168)
## Issue Addressed

NA

## Proposed Changes

- Bump version
- ~~Run `cargo update`~~

## Additional Info

NA
2021-01-21 02:37:08 +00:00
Paul Hauner
2b2a358522 Detailed validator monitoring (#2151)
## Issue Addressed

- Resolves #2064

## Proposed Changes

Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators.

Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint.

Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored.

See the new docs included in this PR for more info.

## TODO

- [x] Track validator balance, `slashed` status, etc.
- [x] ~~Register slashings in current epoch, not offense epoch~~
- [ ] Publish Grafana dashboard, update TODO link in docs
- [x] ~~#2130 is merged into this branch, resolve that~~
2021-01-20 19:19:38 +00:00
Paul Hauner
1eb0915301 Fix bug from #2163 (#2165)
## Issue Addressed

NA

## Proposed Changes

Fixes a bug that I missed during a review in #2163. I found this bug by observing that nodes were receiving far less attestations (~1/2 of previous).

I'm not certain on *exactly* how this mistake manifested in a reduction in attestations, but the mistake touches so much code that I think it's reasonable to declare that this it the cause of the observed issue (drop in attestations).

## Additional Info

NA
2021-01-20 10:28:12 +00:00
Paul Hauner
b06559ae97 Disallow attestation production earlier than head (#2130)
## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS.

There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage.

This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`).

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is:

- Not specified as "honest validator" attesting behaviour.
- Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not).
- It disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
Paul Hauner
d9f940613f Represent slots in secs instead of millisecs (#2163)
## Issue Addressed

NA

## Proposed Changes

Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).

This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 09:39:51 +00:00
Paul Hauner
805e152f66 Simplify enum -> str with strum (#2164)
## Issue Addressed

NA

## Proposed Changes

As per #2100, uses derives from the sturm library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counter into one IntCounterVec vector.

These works are originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 06:33:58 +00:00
realbigsean
7a71977987 Clippy 1.49.0 updates and dht persistence test fix (#2156)
## Issue Addressed

`test_dht_persistence` failing

## Proposed Changes

Bind `NetworkService::start` to an underscore prefixed variable rather than `_`.  `_` was causing it to be dropped immediately

This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 00:34:28 +00:00
Pawan Dhananjay
28238d97b1 Disconnect from peers quicker on internet issues (#2147)
## Issue Addressed

Fixes #2146 

## Proposed Changes

Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.
2021-01-13 08:09:10 +00:00
realbigsean
423dea169c update smallvec (#2152)
## Issue Addressed

`cargo audit` is failing because of a potential for an overflow in the version of `smallvec` we're using

## Proposed Changes

Update to the latest version of `smallvec`, which has the fix


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-11 23:32:11 +00:00
Arthur Woimbée
851a4dca3c replace tempdir by tempfile (#2143)
## Issue Addressed

Fixes #2141 
Remove [tempdir](https://docs.rs/tempdir/0.3.7/tempdir/) in favor of [tempfile](https://docs.rs/tempfile/3.1.0/tempfile/).

## Proposed Changes

`tempfile` has a slightly different api that makes creating temp folders with a name prefix a chore (`tempdir::TempDir::new("toto")` => `tempfile::Builder::new().prefix("toto").tempdir()`).

So I removed temp folder name prefix where I deemed it not useful.

Otherwise, the functionality is the same.
2021-01-06 06:36:11 +00:00
Age Manning
7e4b190df0 Reduce ping interval (#2132)
## Issue Addressed

#2123

## Description

Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.
2021-01-06 04:35:52 +00:00
realbigsean
588b90157d Ssz state api endpoint (#2111)
## Issue Addressed

Catching up to a recently merged API spec PR: https://github.com/ethereum/eth2.0-APIs/pull/119

## Proposed Changes

- Return an SSZ beacon state on `/eth/v1/debug/beacon/states/{stateId}` when passed this header: `accept: application/octet-stream`.
- requests to this endpoint with no  `accept` header or an `accept` header and a value of `application/json` or `*/*` , or will result in a JSON response

## Additional Info


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-06 03:01:46 +00:00
Samuel E. Moelius
939fa717fd test_decode_malicious_status_message improvements (#2104)
## Issue Addressed

None

## Proposed Changes

* Correct typo in one comment, elaborate some others.
* Add asserts to ensure comments match code.
* Eliminate one unnecessary `clone`.

## Additional Info

None
2021-01-06 01:10:26 +00:00
Samuel E. Moelius
0245ddd37b Fix typo in ssz_snappy.rs comment (#2103)
## Issue Addressed

None

## Proposed Changes

Correct a typo in `ssz_snappy.rs`.

## Additional Info

Pedantry at it finest.
2021-01-06 01:10:24 +00:00