Commit Graph

4924 Commits

Author SHA1 Message Date
Paul Hauner
77f3539654 Improve eth1 block sync (#2008)
## Issue Addressed

NA

## Proposed Changes

- Log about eth1 whilst waiting for genesis.
- For the block and deposit caches, update them after each download instead of when *all* downloads are complete.
  - This prevents the case where a single timeout error can cause us to drop *all* previously download blocks/deposits.
- Set `max_log_requests_per_update` to avoid timeouts due to very large log counts in a response.
- Set `max_blocks_per_update` to prevent a single update of the block cache to download an unreasonable number of blocks.
  - This shouldn't have any affect in normal use, it's just a safe-guard against bugs.
- Increase the timeout for eth1 calls from 15s to 60s, as per @pawanjay176's experience with Infura.

## Additional Info

NA
2020-11-30 20:29:17 +00:00
divma
8fcd22992c No string in slog (#2017)
## Issue Addressed

Following slog's documentation, this should help a bit with string allocations. I left it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't harm anyway 

## Proposed Changes

remove `String` creation in logs when possible
2020-11-30 10:33:00 +00:00
Mehdi Zerouali
3f036fd193 Update PGP key in README (#1986)
## Proposed Changes

Update Sigma Prime's PGP key.
2020-11-30 09:28:54 +00:00
Paul Hauner
85e69249e6 Drop discovery log to trace (#2007)
## Issue Addressed

NA

## Proposed Changes

This was causing:

```
Nov 28 21:56:08.154 ERRO slog-async: logger dropped messages due to channel overflow, count: 44, service: libp2p
```

## Additional Info

NA
2020-11-29 03:02:23 +00:00
Age Manning
f7183098ee Bump to version v1.0.2 (#2001)
Update lighthouse to version `v1.0.2`. 

There are two major updates in this version:
- Updates to the task executor to tokio 0.3 and all sub-dependencies relying on core execution, including libp2p
- Update BLST
2020-11-28 13:22:37 +00:00
Justin
cadcc9a76b Fix possible typo in build from source instructions (#1990) 2020-11-28 06:41:34 +00:00
Sean Gulley
9a37f356a9 Update blst to official crate and incorporate subgroup changes (#1979)
## Issue Addressed

Move to latest official version of blst (v0.3.1).  Incorporate all the subgroup check API changes.

## Proposed Changes

Update Cargo.toml to use official blst crate 0.3.1
Modifications to blst.rs wrapper for subgroup check API changes

## Additional Info

The overall subgroup check methodology is public keys should be check for validity using key_validate() at time of first seeing them.  This will check for infinity and in group.  Those keys can then be cached for future usage.  All calls into blst set the pk_validate boolean to false to indicate there is no need for on the fly checking of public keys in the library.  Additionally the public keys are supposed to be validated for proof of possession outside of blst.

For signatures the subgroup check can be done at time of deserialization, prior to being used in aggregation or verification, or in the blst aggregation or verification functions themselves.  In the interface wrapper the call to subgroup_check has been left for one instance, although that could be moved into the 
verify_multiple_aggregate_signatures() call if wanted.  Checking beforehand does save some compute resources in the scenario a bad signature is received.  Elsewhere the subgroup check is being done inside the higher level operations.  See comments in the code.

All checks on signature are done for subgroup only.  There are no checks for infinity.  The rationale is an aggregate signature could technically equal infinity.  If any individual signature was infinity (invalid) then it would fail at time of verification.  A loss of compute resources, although safety would be preserved.
2020-11-28 06:41:32 +00:00
Age Manning
a567f788bd Upgrade to tokio 0.3 (#1839)
## Description

This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.

This also brings with it a number of various improvements:

- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max



Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-11-28 05:30:57 +00:00
Paul Hauner
5a3b94cbb4
Update to v1.0.1, run cargo update 2020-11-27 21:16:59 +11:00
blacktemplar
38b15deccb Fallback nodes for eth1 access (#1918)
## Issue Addressed

part of  #1883

## Proposed Changes

Adds a new cli argument `--eth1-endpoints` that can be used instead of `--eth1-endpoint` to specify a comma-separated list of endpoints. If the first endpoint returns an error for some request the other endpoints are tried in the given order.

## Additional Info

Currently if the first endpoint fails the fallbacks are used silently (except for `try_fallback_test_endpoint` that is used in `do_update` which logs a `WARN` for each endpoint that is not reachable). A question is if we should add more logs so that the user gets warned if his main endpoint is for example just slow and sometimes hits timeouts.
2020-11-27 08:37:44 +00:00
Michael Sproul
1312844f29 Disable snappy in LevelDB to fix build issues (#1983)
## Proposed Changes

A user on Discord reported build issues when trying to compile Lighthouse checked out to a path with spaces in it. I've fixed the issue upstream in `leveldb-sys` (https://github.com/skade/leveldb-sys/pull/22), but rather than waiting for a new release of the `leveldb` crate, we can also work around the issue by disabling Snappy in LevelDB, which we weren't using anyway.

This may also have the side-effect of slightly improving compilation times, as LevelDB+Snappy was found to be a substantial contributor to build time (although I'm not sure how much was LevelDB and how much was Snappy).
2020-11-27 03:01:57 +00:00
Pawan Dhananjay
0589a14afe Log better error message (#1981)
## Issue Addressed

Fixes #1965 

## Proposed Changes

Log an error and don't update eth1 caches if `chain_id = 0`
2020-11-26 23:13:46 +00:00
Michael Sproul
3486d6a809 Use OS file locks in validator client (#1958)
## Issue Addressed

Closes #1823

## Proposed Changes

* Use OS-level file locking for validator keystores, eliminating problems with lockfiles lingering after ungraceful shutdowns (`SIGKILL`, power outage). I'm using the `fs2` crate because it's cross-platform (unlike `file-lock`), and it seems to have the most downloads on crates.io.
* Deprecate + disable `--delete-lockfiles` CLI param, it's no longer necessary
* Delete the `validator_dir::Manager`, as it was mostly dead code and was only used in the `validator list` command, which has been rewritten to read the validator definitions YAML instead.

## Additional Info

Tested on:

- [x] Linux
- [x] macOS
- [x] Docker Linux
- [x] Docker macOS
- [ ] Windows
2020-11-26 11:25:46 +00:00
divma
fc07cc3fdf Sync metrics (#1975)
## Issue Addressed
- Add metrics to keep track of peer counts by sync type
- Add metric to keep track of the number of syncing chains in range

## Proposed Changes
Plugin to the network metrics update interval and update too the counts for peers wrt to their sync status with us

## Additional Info
For the peer counts
- By the way it is implemented the numbers won't always match to the total peer count in the `libp2p` metric.
- Updating the gauge with every change is messy because it requires to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate) on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. If you think more accuracy is needed we can do it too, but it would be harder to maintain)

ATM those look like this
![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)
2020-11-26 05:23:17 +00:00
Paul Hauner
26741944b1 Add metrics to VC (#1954)
## Issue Addressed

NA

## Proposed Changes

- Adds a HTTP server to the VC which provides Prometheus metrics.
- Moves the health metrics into the `lighthouse_metrics` crate so it can be shared between BN/VC.
- Sprinkle some metrics around the VC.
- Update the book to indicate that we now have VC metrics.
- Shifts the "waiting for genesis" logic later in the `ProductionValidatorClient::new_from_cli`
  - This is worth attention during the review.

## Additional Info

- ~~`clippy` has some new lints that are failing. I'll deal with that in another PR.~~
2020-11-26 01:10:51 +00:00
SjonHortensius
50558e61f7 Fix #1964: remove mainnet warnings which no longer apply (#1970)
## Issue Addressed

#1964

## Proposed Changes

* remove two mainnet warnings
* reword `testnet` in logmessage
* update test
2020-11-25 23:56:21 +00:00
Age Manning
198c4a873d Update ENR construction and mainnet bootnodes (#1968)
## Issue Addressed

Boot nodes were being successfully created and publishing valid ENRs however the `eth2` field was not being saved to disk leading to a discrepancy between published ENR and disk ENR. 

If the `eth2` field is known, it is now constructed in the initial ENR and saved to disk. 

Previous mainnet bootnodes did not contain the `eth2` field and these have also been updated.
2020-11-25 22:48:07 +00:00
realbigsean
7b6a97e73c FAQ/Doc updates (#1966)
## Issue Addressed

N/A

## Proposed Changes

Adding a few FAQ's, updating some formatting


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-11-25 05:51:10 +00:00
Paul Hauner
7020f5df40 Update docs whenever unstable changes (#1969)
## Issue Addressed

NA

## Proposed Changes

Presently `master` is stable (and will be sunsetted) which means our docs only update after a release. This PR sets the docs to build on the `unstable` branch, which is equivalent to what what we've always had. 

## Additional Info

This does raise the question of whether or not docs should target `stable` or `unstable`, but I'd prefer to maintain current functionality and merge #1966 for now. I think having two versions might be handy, one for stable and one for unstable; I don't imagine this very difficult to achieve.
2020-11-25 03:20:23 +00:00
divma
3b4afc27bf Status race condition (#1967)
## Issue Addressed

Sync stalls due to race conditions between dc notifications and status processing
2020-11-25 02:15:38 +00:00
Paul Hauner
c6baa0eed1
Bump to v1.0.0, run cargo update 2020-11-25 02:02:19 +11:00
Age Manning
a96893744c
Update bootnodes and boot_node cli (#1961) 2020-11-25 02:01:37 +11:00
Paul Hauner
11c4968ea0
DO spec check before waiting for genesis (#1962) 2020-11-25 02:00:11 +11:00
Age Manning
b6eff50ffa
Add lighthouse boot nodes (#1960) 2020-11-25 00:05:53 +11:00
Paul Hauner
61277e3a72
Add mainnet genesis state (#1959)
* Add mainnet genesis state

* Add compressed, remove uncompressed
2020-11-24 23:21:00 +11:00
Mehdi Zerouali
ead6be074e Remove experimental software warning (#1957)
## Proposed Changes

Remove warning message on startup.
2020-11-24 10:29:41 +00:00
Mehdi Zerouali
011cea93b3 Update security details in README (#1956)
## Proposed Changes

Introduces a few minor changes to the README, mainly updating mentions about security.
2020-11-24 10:29:39 +00:00
Michael Sproul
20339ade01 Refine and test slashing protection semantics (#1885)
## Issue Addressed

Closes #1873

## Proposed Changes

Fixes the bug in slashing protection import (#1873) by pruning the database upon import.

Also expands the test generator to cover this case and a few others which are under discussion here:

https://ethereum-magicians.org/t/eip-3076-validator-client-interchange-format-slashing-protection/4883

## Additional Info

Depending on the outcome of the discussion on Eth Magicians, we can either wait for consensus before merging, or merge our preferred solution and patch things later.
2020-11-24 07:21:14 +00:00
Paul Hauner
84b3387d09 Add Prysm and Teku boot nodes (#1953)
## Issue Addressed

NA

## Proposed Changes

- Adds Prysm and Teku's boot nodes.

The boot ENR were collected from [this Prysm PR](https://github.com/prysmaticlabs/prysm/pull/7925/files#diff-c20494db2dc1354ad056bcacaa192681386854bf036fdeef375dfe57336f27a7R42).

## Additional Info

NA
2020-11-24 06:02:28 +00:00
Paul Hauner
e504645767 Update validator guide for mainnet (#1951)
## Issue Addressed

NA

## Proposed Changes

Updates the validator guide to provide instructions for mainnet users.

## Additional Info

- ~~Blocked on #1751~~
2020-11-24 04:42:17 +00:00
realbigsean
a171fb8843 check if the slashing protection database is locked before creating keys (#1949)
## Issue Addressed

Closes #1790

## Proposed Changes

Make a new method that creates an empty transaction with `TransactionBehavior::Exclusive` to check whether the slashing protection is locked. Call this method before attempting to create or import new validator keystores.  

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-11-24 03:25:40 +00:00
divma
6f890c398e Sync Bug fixes (#1950)
## Issue Addressed

Two issues related to empty batches
- Chain target's was not being advanced when the batch was successful, empty and the chain didn't have an optimistic batch
- Not switching finalized chains. We now switch finalized chains requiring a minimum work first
2020-11-24 02:11:31 +00:00
Paul Hauner
21617aa87f Change --testnet flag to --network (#1751)
## Issue Addressed

- Resolves #1689

## Proposed Changes

TBC

## Additional Info

NA
2020-11-23 23:54:03 +00:00
Michael Sproul
7d644103c6 Tweak slasher DB schema and pruning (#1948)
## Issue Addressed

Resolves #1890

## Proposed Changes

Change the slasher database schema to key indexed attestations by `(target_epoch, indexed_attestation_root)` instead of just `indexed_attestation_root`. This allows more straight-forward pruning (linear scan), that is also "re-entrant". By re-entrant, we mean that a pruning pass that gets stuck because of a `MapFull` error can attempt to commit midway, and be resumed later without issue. The previous pruning strategy for indexed attestations did not have this property. There was also a flaw in the previous pruning that could leave "zombie" indexed attestations in the database (ones not referenced by any attester record), which could build up and contribute to bloat (although in practice I think they occur quite infrequently).

## Additional Info

During testing I noticed that a `MapFull` error can still occur during the commit of the transaction itself, which is irritating, but not unbearable. This PR should at least reduce the frequency with which users need to manually resize their DB, and if the `MapFull` on commit rears its ugly head too often we could use a dynamic strategy (temporarily increase the size of the map until the transaction commits).

The extra bytes for the epoch make the database a bit heavier, so the size estimate docs have been updated to reflect this. This is also a breaking schema change, so anyone using a v0 database from a few hours ago will need to drop it and update 😅
2020-11-23 21:33:51 +00:00
Michael Sproul
5828ff1204 Implement slasher (#1567)
This is an implementation of a slasher that lives inside the BN and can be enabled via `lighthouse bn --slasher`.

Features included in this PR:

- [x] Detection of attester slashing conditions (double votes, surrounds existing, surrounded by existing)
- [x] Integration into Lighthouse's attestation verification flow
- [x] Detection of proposer slashing conditions
- [x] Extraction of attestations from blocks as they are verified
- [x] Compression of chunks
- [x] Configurable history length
- [x] Pruning of old attestations and blocks
- [x] More tests

Future work:

* Focus on a slice of history separate from the most recent N epochs (e.g. epochs `current - K` to `current - M`)
* Run out-of-process
* Ingest attestations from the chain without a resync

Design notes are here https://hackmd.io/@sproul/HJSEklmPL
2020-11-23 03:43:22 +00:00
Paul Hauner
59b2247ab8 Improve UX whilst VC is waiting for genesis (#1915)
## Issue Addressed

- Resolves #1424

## Proposed Changes

Add a `GET lighthouse/staking` that returns 200 if the node is ready to stake (i.e., `--eth1` flag is present) or a 404 otherwise.

Whilst the VC is waiting for the genesis time to start (i.e., when the genesis state is known), check the `lighthouse/staking` endpoint and log an error if the node isn't configured for staking.

## Additional Info

NA
2020-11-23 01:00:22 +00:00
Paul Hauner
65b1cf2af1 Add flag to import all attestations (#1941)
## Issue Addressed

NA

## Proposed Changes

Adds the `--import-all-attestations` flag which tells the `network::AttestationService` to import/aggregate all attestations after verification (instead of only ones for subnets that are relevant to local validators).

This is useful for testing/debugging and also for creating back-up nodes that should be all cached up and ready for any validator.

## Additional Info

NA
2020-11-22 23:58:25 +00:00
divma
d0cbf3111a move sync state to the chains KV (#1940)
## Issue Addressed
we have a log saying we add a peer to a chain, and an another one in case the chain is not syncing. To avoid needing to peer there two (and reduce log entries) simply log the chain's syncing state in the chain's KV
2020-11-22 23:58:23 +00:00
Michael Sproul
426b3001e0 Fix race condition in seen caches (#1937)
## Issue Addressed

Closes #1719

## Proposed Changes

Lift the internal `RwLock`s and `Mutex`es from the `Observed*` data structures to resolve the race conditions described in #1719.

Most of this work was done by @paulhauner on his `lift-locks` branch, I merely updated it for the current `master` and checked over it.

## Additional Info

I think it would be prudent to test this on a testnet or two before mainnet launch, just to be sure that the extra lock contention doesn't negatively impact performance.
2020-11-22 23:02:51 +00:00
Paul Hauner
0b556c4405 Fix metrics http server error messages (#1946)
## Issue Addressed

- Resolves #1945

## Proposed Changes

- As per #1945, fix a log message from the metrics server that was falsely claiming to be from the api server.
- Ensure successful api request logs are published to debug, not trace. This is something I've wanted to do for a while.

## Additional Info

NA
2020-11-22 03:39:13 +00:00
Paul Hauner
48f73b21e6 Expand eth1 block cache, add more logs (#1938)
## Issue Addressed

NA

## Proposed Changes

- Caches later blocks than is required by `ETH1_FOLLOW_DISTANCE`.
- Adds logging to `warn` if the eth1 cache is insufficiently primed.
- Use `max_by_key` instead of `max_by` in `BeaconChain::Eth1Chain` since it's simpler.
- Rename `voting_period_start_timestamp` to `voting_target_timestamp` for accuracy.

## Additional Info

The reason for eating into the `ETH1_FOLLOW_DISTANCE` and caching blocks that are closer to the head is due to possibility for `SECONDS_PER_ETH1_BLOCK` to be incorrect (as is the case for the Pyrmont testnet on Goerli).

If `SECONDS_PER_ETH1_BLOCK` is too short, we'll skip back too far from the head and skip over blocks that would be valid [`is_candidate_block`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/validator.md#eth1-data) blocks. This was the case on the Pyrmont testnet and resulted in Lighthouse choosing blocks that were about 30 minutes older than is ideal.
2020-11-21 00:26:15 +00:00
Kirk Baird
3b405f10ea Ensure deposit signatures do not use aggregate functions (#1935)
## Issue Addressed

Resolves #1333 

## Proposed Changes

- Remove `deposit_signature_set()` function
- Prevent deposits from being in `SignatureSets`
- User `Signature.verify()` to verify deposit signatures rather than a signature set which uses `fast_aggregate_verify()`

## Additional Info

n/a
2020-11-20 03:37:20 +00:00
divma
d727e55abe Move some rpc processing to the beacon_processor (#1936)
## Issue Addressed
`BlocksByRange` requests were the main culprit of a series of timeouts to peer's requests in general because they produce build up in the router's processor. Those were moved to the blocking executor but a task is being spawned for each; also not ideal since the amount of resources we give to those is not controlled

## Proposed Changes
- Move `BlocksByRange` and `BlocksByRoots` to the `beacon_processor`. The processor crafts the responses and sends them.
- Move too the processing of `StatusMessage`s from other peers. This is a fast operation but it can also build up and won't scale if we keep it in the router (processing one at the time). These don't need to send an answer, so there is no harm in processing them "later" if that were to happen. Sending responses to status requests is still in the router, so we answer as soon as we see them.
- Some "extras" that are basically clean up:
  - Split the `Worker` logic in sync methods (chain processing and rpc blocks), gossip methods (the majority of methods) and rpc methods (the new ones)
  - Move the `status_message` function previously provided by the router's processor to a more central place since it is used by the router, sync, network_context and beacon_processor
 - Some spelling

## Additional Info
What's left to decide/test more thoroughly is the length of the queues and the priority rules. @paulhauner suggested at some point to put status above attestations, and @AgeManning had described an importance of "protecting gossipsub" so my solution is leaving status requests in the router and RPC methods below attestations. Slashings and Exits are at the end.
2020-11-19 23:33:44 +00:00
Pawan Dhananjay
e47739047d Add additional libp2p tests (#1867)
## Issue Addressed

N/A

## Proposed Changes

Adds tests for the eth2_libp2p crate.
2020-11-19 22:32:09 +00:00
Michael Sproul
37369c6a56 Document system requirements (#1934)
## Proposed Changes

Document some minimal and recommended system specs for running Lighthouse on mainnet with a modest number of validators.
2020-11-19 21:23:56 +00:00
Kirk Baird
c5e97b9bf7 Add validation to kdf parameters (#1930)
## Issue Addressed

Closes #1906 
Closes #1907 

## Proposed Changes

- Emits warnings when the KDF parameters are two low.
- Returns errors when the KDF parameters are high enough to pose a potential DoS threat.
- Validates AES IV length is 128 bits, errors if empty, warnings otherwise.

## Additional Info

NIST advice used for PBKDF2 ranges https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-132.pdf. 
Scrypt ranges are based on the maximum value of the `u32` (i.e 4GB of memory)

The minimum range has been set to anything below the default fields.
2020-11-19 08:52:51 +00:00
Herman Junge
1a530e5a93 [Remote signer] Add signer consumer lib (#1763)
Adds a library `common/remote_signer_consumer`
2020-11-19 04:04:52 +00:00
Kirk Baird
3db9072fee Reject invalid utf-8 characters during encryption (#1928)
## Issue Addressed

Closes #1889 

## Proposed Changes

- Error when passwords which use invalid UTF-8 characters during encryption. 
- Add some tests

## Additional Info

I've decided to error when bad characters are used to create/encrypt a keystore but think we should allow them during decryption since either the keystore was created
-  with invalid UTF-8 characters (possibly by another client or someone whose password is random bytes) in which case we'd want them to be able to decrypt their keystore using the right key.
-  without invalid characters then the password checksum would almost certainly fail.

Happy to add them to decryption if we want to make the decryption more trigger happy 😋 , it would only be a one line change and would tell the user which character index is causing the issue.

See https://eips.ethereum.org/EIPS/eip-2335#password-requirements
2020-11-19 00:37:43 +00:00
realbigsean
79fd9b32b9 Update pool/attestations and committees endpoints (#1899)
## Issue Addressed

Catching up on a few eth2 spec updates:

## Proposed Changes

- adding query params to the `GET pool/attestations` endpoint
- allowing the `POST pool/attestations` endpoint to accept an array of attestations
    - batching attestation submission
- moving `epoch` from a path param to a query param in the `committees` endpoint

## Additional Info


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-11-18 23:31:39 +00:00
blacktemplar
3408de8151 Avoid string initialization in network metrics and replace by &str where possible (#1898)
## Issue Addressed

NA

## Proposed Changes

Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895.

For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script.

## Additional Info

We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.
2020-11-18 23:31:37 +00:00