## Description
This increases the logging of the underlying UPnP tasks to inform the user of UPnP error/success.
This also decreases the batch syncing size to two epochs per batch.
## Description
Updates to the latest libp2p and includes gossipsub updates.
Of particular note is the limitation of a single topic per gossipsub message.
Co-authored-by: blacktemplar <blacktemplar@a1.net>
## Issue Addressed
Closes#800Closes#1713
## Proposed Changes
Implement the temporary state storage algorithm described in #800. Specifically:
* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)
## Additional Info
There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.
### Race 1: Permanent state marked temporary
EDIT: this has been fixed by the addition of a lock around the relevant critical section
There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:
1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4.
a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens...
b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running.
I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know
This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).
### Race 2: Temporary state returned from `get_state`
I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).
This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
## Issue Addressed
Potentially resolves#1647 and sync stalls.
## Proposed Changes
The handling of the state of banned peers was inadequate for the complex peerdb data structure. We store a limited number of disconnected and banned peers in the db. We were not tracking intermediate "disconnecting" states and the in some circumstances we were updating the peer state without informing the peerdb. This lead to a number of inconsistencies in the peer state.
Further, the peer manager could ban a peer changing a peer's state from being connected to banned. In this circumstance, if the peer then disconnected, we didn't inform the application layer, which lead to applications like sync not being informed of a peers disconnection. This could lead to sync stalling and having to require a lighthouse restart.
Improved handling for peer states and interactions with the peerdb is made in this PR.
## Issue Addressed
NA
## Proposed Changes
Adds a `lighthouse/beacon/states/:state_id/ssz` endpoint to allow us to pull the genesis state from the API.
## Additional Info
NA
## Issue Addressed
Closes#1769Closes#1708
## Proposed Changes
Tweaks the op pool pruning so that the attestation pool is pruned against the wall-clock epoch instead of the finalized state's epoch. This should reduce the unbounded growth that we've seen during periods without finality.
Also fixes up the voluntary exit pruning as raised in #1708.
## Issue Addressed
- Resolves#1766
## Proposed Changes
- Use the `warp::filters::cors` filter instead of our work-around.
## Additional Info
It's not trivial to enable/disable `cors` using `warp`, since using `routes.with(cors)` changes the type of `routes`. This makes it difficult to apply/not apply cors at runtime. My solution has been to *always* use the `warp::filters::cors` wrapper but when cors should be disabled, just pass the HTTP server listen address as the only permissible origin.
## Issue Addressed
`node` endpoints in #1434
## Proposed Changes
Implement these:
```
/eth/v1/node/health
/eth/v1/node/peers/{peer_id}
/eth/v1/node/peers
```
- Add an `Option<Enr>` to `PeerInfo`
- Finish implementation of `/eth/v1/node/identity`
## Additional Info
- should update the `peers` endpoints when #1764 is resolved
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
Closes#1548
## Proposed Changes
Optimizes attester slashing choice by choosing the ones that cover the most amount of validators slashed, with the highest effective balances
## Additional Info
Initial pass, need to write a test for it
check for advanced peers and the state of the chain wrt the clock slot to decide if a chain is or not synced /transitioning to a head sync. Also a fix that prevented getting the right state while syncing heads
## Issue Addressed
Resolves#1776
## Proposed Changes
The beacon chain builder was using the canonical head's state root for the `genesis_state_root` field.
## Additional Info
## Issue Addressed
Resolves#1792
## Proposed Changes
Use `chain.best_slot()` instead of the sync state's target slot in the `not_while_syncing_filter`
## Additional Info
N/A
## Issue Addressed
#1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager
## Issue Addressed
Closes#1557
## Proposed Changes
Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).
In the process of writing and testing this I also had to make a few other changes:
* Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on
And chose to make some clean-ups:
* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`
## Testing
To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:
https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557
That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:
```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```
It should pass, and the log output should show:
```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```
## Additional Info
This is a backwards-compatible change with no impact on consensus.
## Issue Addressed
NA
## Proposed Changes
Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.
## Issue Addressed
#1729#1730
Which issue # does this PR address?
## Proposed Changes
1. Fixes a bug in the simulator where nodes can't find each other due to 0 udp ports in their enr.
2. Fixes bugs in attestation service where we are unsubscribing from a subnet prematurely.
More testing is needed for attestation service fixes.
## Proposed Changes
Adds a gossipsub topic filter that only allows subscribing and incoming subscriptions from valid ETH2 topics.
## Additional Info
Currently the preparation of the valid topic hashes uses only the current fork id but in the future it must also use all possible future fork ids for planned forks. This has to get added when hard coded forks get implemented.
DO NOT MERGE: We first need to merge the libp2p changes (see https://github.com/sigp/rust-libp2p/pull/70) so that we can refer from here to a commit hash inside the lighthouse branch.
## Proposed Changes
Implement the new message id function (see https://github.com/ethereum/eth2.0-specs/pull/2089) using an additional fast message id function for better performance + caching decompressed data.
## Issue Addressed
N/A
## Proposed Changes
This PR limits the length of the stream received by the snappy decoder to be the maximum allowed size for the received rpc message type. Also adds further checks to ensure that the length specified in the rpc [encoding-dependent header](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#encoding-strategies) is within the bounds for the rpc message type being decoded.
Squashed commit of the following:
commit f99373cbaec9adb2bdbae3f7e903284327962083
Author: Age Manning <Age@AgeManning.com>
Date: Mon Oct 5 18:44:09 2020 +1100
Clean up obsolute TODOs
## Issue Addressed
- Resolves#1706
## Proposed Changes
Updates dependencies across the workspace. Any crate that was not able to be brought to the latest version is listed in #1712.
## Additional Info
NA
* Initial rebase
* Remove old code
* Correct release tests
* Rebase commit
* Remove eth2-testnet dep on eth2libp2p
* Remove crates lost in rebase
* Remove unused dep
## Issue Addressed
Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic.
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
chain state inconsistencies
## Proposed Changes
- a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer.
- if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch
## Issue Addressed
NA
## Proposed Changes
Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic.
## Additional Info
NA
## Issue Addressed
Downgrade inconsistent chain segment states from `panic` to `crit`. I don't love this solution but since range can always bounce back from any of those, we don't panic.
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
chain state inconsistencies
## Proposed Changes
- a batch can be fake-failed by Range if it needs to move a peer to another chain. The peer will still send blocks/ errors / produce timeouts for those requests, so check when we get a response from the RPC that the request id matches, instead of only the peer, since a re-request can be directed to the same peer.
- if an optimistic batch succeeds, store the attempt to avoid trying it again when quickly switching chains. Also, use it only if ahead of our current target, instead of the segment's start epoch
## Issue Addressed
NA
## Proposed Changes
Uses a `Drop` implementation to help ensure that `BeaconProcessor` workers are freed. This will help prevent against regression, if someone happens to add an early return and it will also help in the case of a panic.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot.
## Additional Info
NA
This commit was modified by Paul H whilst rebasing master onto
v0.3.0-staging
Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers.
Using IGD library: https://docs.rs/igd/0.10.0/igd/
Adding the the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on
Co-authored-by: Age Manning <Age@AgeManning.com>
This commit was edited by Paul H when rebasing from master to
v0.3.0-staging.
Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639
- Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch`
- Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
NA
## Proposed Changes
Addresses an interesting DoS vector raised by @protolambda by verifying that the head and target are consistent when processing aggregate attestations. This check prevents us from loading very old target blocks and doing lots of work to skip them to the current slot.
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
- Implements a HTTP API for the validator client.
- Creates EIP-2335 keystores with an empty `description` field, instead of a missing `description` field. Adds option to set name.
- Be more graceful with setups without any validators (yet)
- Remove an error log when there are no validators.
- Create the `validator` dir if it doesn't exist.
- Allow building a `ValidatorDir` without a withdrawal keystore (required for the API method where we only post a voting keystore).
- Add optional `description` field to `validator_definitions.yml`
## TODO
- [x] Signature header, as per https://github.com/sigp/lighthouse/issues/1269#issuecomment-649879855
- [x] Return validator descriptions
- [x] Return deposit data
- [x] Respect the mnemonic offset
- [x] Check that mnemonic can derive returned keys
- [x] Be strict about non-localhost
- [x] Allow graceful start without any validators (+ create validator dir)
- [x] Docs final pass
- [x] Swap to EIP-2335 description field.
- [x] Fix Zerioze TODO in VC api types.
- [x] Zeroize secp256k1 key
## Endpoints
- [x] `GET /lighthouse/version`
- [x] `GET /lighthouse/health`
- [x] `GET /lighthouse/validators`
- [x] `POST /lighthouse/validators/hd`
- [x] `POST /lighthouse/validators/keystore`
- [x] `PATCH /lighthouse/validators/:validator_pubkey`
- [ ] ~~`POST /lighthouse/validators/:validator_pubkey/exit/:epoch`~~ Future works
## Additional Info
TBC
Adding UPnP support will help grow the DHT by allowing NAT traversal for peers with UPnP supported routers.
## Issue Addressed
#927
## Proposed Changes
Using IGD library: https://docs.rs/igd/0.10.0/igd/
Adding the the libp2p tcp port and discovery udp port. If this fails it simply logs the attempt and moves on
## Additional Info
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639
## Proposed Changes
- Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch`
- Verify that the given checkpoint exists in the chain, or that the the chain syncs through this checkpoint. If not, shutdown and prompt the user to purge state before restarting.
## Additional Info
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Closes#673
## Proposed Changes
Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work.
The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes#673
- Resolves#1550
- Resolves#824
- Resolves#825
- Resolves#1131
- Resolves#1411
- Resolves#1256
- Resolve#1177
- Includes the `ShufflingId` struct initially defined in #1492. That PR is now closed and the changes are included here, with significant bug fixes.
- Implement the https://github.com/ethereum/eth2.0-APIs in a new `http_api` crate using `warp`. This replaces the `rest_api` crate.
- Add a new `common/eth2` crate which provides a wrapper around `reqwest`, providing the HTTP client that is used by the validator client and for testing. This replaces the `common/remote_beacon_node` crate.
- Create a `http_metrics` crate which is a dedicated server for Prometheus metrics (they are no longer served on the same port as the REST API). We now have flags for `--metrics`, `--metrics-address`, etc.
- Allow the `subnet_id` to be an optional parameter for `VerifiedUnaggregatedAttestation::verify`. This means it does not need to be provided unnecessarily by the validator client.
- Move `fn map_attestation_committee` in `mod beacon_chain::attestation_verification` to a new `fn with_committee_cache` on the `BeaconChain` so the same cache can be used for obtaining validator duties.
- Add some other helpers to `BeaconChain` to assist with common API duties (e.g., `block_root_at_slot`, `head_beacon_block_root`).
- Change the `NaiveAggregationPool` so it can index attestations by `hash_tree_root(attestation.data)`. This is a requirement of the API.
- Add functions to `BeaconChainHarness` to allow it to create slashings and exits.
- Allow for `eth1::Eth1NetworkId` to go to/from a `String`.
- Add functions to the `OperationPool` to allow getting all objects in the pool.
- Add function to `BeaconState` to check if a committee cache is initialized.
- Fix bug where `seconds_per_eth1_block` was not transferring over from `YamlConfig` to `ChainSpec`.
- Add the `deposit_contract_address` to `YamlConfig` and `ChainSpec`. We needed to be able to return it in an API response.
- Change some uses of serde `serialize_with` and `deserialize_with` to a single use of `with` (code quality).
- Impl `Display` and `FromStr` for several BLS fields.
- Check for clock discrepancy when VC polls BN for sync state (with +/- 1 slot tolerance). This is not intended to be comprehensive, it was just easy to do.
- See #1434 for a per-endpoint overview.
- Seeking clarity here: https://github.com/ethereum/eth2.0-APIs/issues/75
- [x] Add docs for prom port to close#1256
- [x] Follow up on this #1177
- [x] ~~Follow up with #1424~~ Will fix in future PR.
- [x] Follow up with #1411
- [x] ~~Follow up with #1260~~ Will fix in future PR.
- [x] Add quotes to all integers.
- [x] Remove `rest_types`
- [x] Address missing beacon block error. (#1629)
- [x] ~~Add tests for lighthouse/peers endpoints~~ Wontfix
- [x] ~~Follow up with validator status proposal~~ Tracked in #1434
- [x] Unify graffiti structs
- [x] ~~Start server when waiting for genesis?~~ Will fix in future PR.
- [x] TODO in http_api tests
- [x] Move lighthouse endpoints off /eth/v1
- [x] Update docs to link to standard
- ~~Blocked on #1586~~
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Closes#1487Closes#1427
Directory restructure in accordance with #1487. Also has temporary migration code to move the old directories into new structure.
Also extracts all default directory names and utility functions into a `directory` crate to avoid repetitio.
~Since `validator_definition.yaml` stores absolute paths, users will have to manually change the keystore paths or delete the file to get the validators picked up by the vc.~. `validator_definition.yaml` is migrated as well from the default directories.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
- Resolves#1313
## Proposed Changes
Changes the way we start the validator client and beacon node to ensure that we cleanly drop the validator keystores (which therefore ensures we cleanup their lockfiles).
Previously we were holding the validator keystores in a tokio task that was being forcefully killed (i.e., without `Drop`). Now, we hold them in a task that can gracefully handle a shutdown.
Also, switches the `--strict-lockfiles` flag to `--delete-lockfiles`. This means two things:
1. We are now strict on lockfiles by default (before we weren't).
1. There's a simple way for people delete the lockfiles if they experience a crash.
## Additional Info
I've only given the option to ignore *and* delete lockfiles, not just ignore them. I can't see a strong need for ignore-only but could easily add it, if the need arises.
I've flagged this as `api-breaking` since users that have lockfiles lingering around will be required to supply `--delete-lockfiles` next time they run.
## Issue Addressed
Closes#1680
## Proposed Changes
This PR fixes a race condition in beacon node start-up whereby the pubkey cache could be created by the beacon chain builder before the `PersistedBeaconChain` was stored to disk. When the node restarted, it would find the persisted chain missing, and attempt to start from scratch, creating a new pubkey cache in the process. This call to `ValidatorPubkeyCache::new` would fail if the file already existed (which it did). I changed the behaviour so that pubkey cache initialization now doesn't care whether there's a file already in existence (it's only a cache after all). Instead it will truncate and recreate the file in the race scenario described.
## Issue Addressed
NA
## Proposed Changes
There are four new conditions introduced in v0.12.3:
1. _[REJECT]_ The attestation's epoch matches its target -- i.e. `attestation.data.target.epoch ==
compute_epoch_at_slot(attestation.data.slot)`
1. _[REJECT]_ The attestation's target block is an ancestor of the block named in the LMD vote -- i.e.
`get_ancestor(store, attestation.data.beacon_block_root, compute_start_slot_at_epoch(attestation.data.target.epoch)) == attestation.data.target.root`
1. _[REJECT]_ The committee index is within the expected range -- i.e. `data.index < get_committee_count_per_slot(state, data.target.epoch)`.
1. _[REJECT]_ The number of aggregation bits matches the committee size -- i.e.
`len(attestation.aggregation_bits) == len(get_beacon_committee(state, data.slot, data.index))`.
This PR implements new logic to suit (1) and (2). Tests are added for (3) and (4), although they were already implicitly enforced.
## Additional Info
- There's a bit of edge-case with target root verification that I raised here: https://github.com/ethereum/eth2.0-specs/pull/2001#issuecomment-699246659
- I've had to add an `--ignore` to `cargo audit` to get CI to pass. See https://github.com/sigp/lighthouse/issues/1669
## Issue Addressed
#629
## Proposed Changes
This removes banned peers from the DHT and informs discovery to block the node_id and the known source IP's associated with this node. It has the capabilities of un banning this peer after a period of time.
This also corrects the logic about banning specific IP addresses. We now use seen_ip addresses from libp2p rather than those sent to us via identify (which also include local addresses).
## Issue Addressed
In principle.. closes#1551 but in general are improvements for performance, maintainability and readability. The logic for the optimistic sync in actually simple
## Proposed Changes
There are miscellaneous things here:
- Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic
- Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct
- Move most batch-related logic to the batch
- Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches)
- Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases
- Store batches in a sorted map (`BTreeMap`) access is not O(1) but since the number of _active_ batches is bounded this should be fast, and saves performing hashing ops. Batches are indexed by the epoch they start. Sorted, to easily handle chain advancements (range logic)
- Produce the chain Id from the identifying fields: target root and target slot. This, to guarantee there can't be duplicated chains and be able to consistently search chains by either Id or checkpoint
- Fix chain_id not being present in all chain loggers
- Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and waiting for a result that won't arrive, effectively stalling sync.
- When a batch imports blocks or the chain starts syncing with a local finalized epoch greater that the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it represents more accurately the progress of the chain.
- Batch status peers from the same chain to reduce Arc access.
- Handle a couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know.
- Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed)
- When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers.
- Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code)
- In chain_collection, store chains by their id in a map
- Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses
- Other stuff:
- impl `slog::KV` for batches
- impl `slog::KV` for syncing chains
- PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug`
### Optimistic syncing:
Try first the batch that contains the current head, if the batch imports any block, advance the chain. If not, if this optimistic batch is inside the current processing window leave it there for future use, if not drop it. The tolerance for this block is the same for downloading, but just once for processing
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
N/A
## Proposed Changes
Prevent subscribing to core gossipsub topics until after we have achieved a full sync. This prevents us censoring gossipsub channels, getting penalised in gossipsub 1.1 scoring and saves us computation time in attempting to validate gossipsub messages which we will be unable to do with a non-sync'd chain.
## Issue Addressed
Closes#1472
## Proposed Changes
Add `--staking` ~~and`staking-with-eth1-endpoint`~~ flag to improve UX for stakers.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
- Resolves#1080
## Proposed Changes
- Call `update_pubkey_cache` only in the `build_all_caches` method and `get_validator_index` method.
## Additional Info
This does reduce the number of places the cache is updated, making it simpler. But the `get_validator_index` method is used a couple times when we are iterating through the entire validator registry (or set of active validators). Before, we would only call `update_pubkey_cache` once before iterating through all validators. So I'm not _totally_ sure this change is worth it.
## Issue Addressed
- Resolves#1616
## Proposed Changes
If we look at the function which persists fork choice and the canonical head to disk:
1db8daae0c/beacon_node/beacon_chain/src/beacon_chain.rs (L234-L280)
There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync.
I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch.
These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice.
## Additional Info
This is hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions.
## Issue Addressed
N/A
## Proposed Changes
Shifts the local `metadata` to `network_globals` making it accessible to the HTTP API and other areas of lighthouse.
## Additional Info
N/A
## Issue Addressed
N/A
## Proposed Changes
Adds the following check from the spec
> A reader SHOULD NOT read more than max_encoded_len(n) bytes after reading the SSZ length-prefix n from the header.
## Issue Addressed
- Resolves#1616
## Proposed Changes
Fixes a bug where we are unable to read the finalized block from fork choice.
## Detail
I had made an assumption that the finalized block always has a parent root of `None`:
e5fc6bab48/consensus/fork_choice/src/fork_choice.rs (L749-L752)
This was a faulty assumption, we don't set parent *roots* to `None`. Instead we *sometimes* set parent *indices* to `None`, depending if this pruning condition is satisfied:
e5fc6bab48/consensus/proto_array/src/proto_array.rs (L229-L232)
The bug manifested itself like this:
1. We attempt to get the finalized block from fork choice
1. We try to check that the block is descendant of the finalized block (note: they're the same block).
1. We expect the parent root to be `None`, but it's actually the parent root of the finalized root.
1. We therefore end up checking if the parent of the finalized root is a descendant of itself. (note: it's an *ancestor* not a *descendant*).
1. We therefore declare that the finalized block is not a descendant of (or eq to) the finalized block. Bad.
## Additional Info
In reflection, I made a poor assumption in the quest to obtain a probably negligible performance gain. The performance gain wasn't worth the risk and we got burnt.
## Issue Addressed
N/A
## Proposed Changes
This will consider all gossipsub messages that have either the `from`, `seqno` or `signature` field as invalid.
## Additional Info
We should not merge this until all other clients have been sending empty fields for a while.
See https://github.com/ethereum/eth2.0-specs/issues/1981 for reference
## Issue Addressed
https://github.com/ethereum/eth2.0-specs/pull/2044
## Proposed Changes
Shifts the gossipsub message id to use the first 8 bytes of the SHA256 hash of the gossipsub message data field.
## Additional Info
We should merge this in once the spec has been decided on. It will cause issues with gossipsub scoring and gossipsub propagation rates (as we won't receive IWANT) messages from clients that also haven't made this update.
## Issue Addressed
#1590
## Proposed Changes
This is a temporary workaround that prevents finalized chain sync from swapping chains. I'm merging this in now until the full solution is ready.
## Issue Addressed
Malicious users could request very large block ranges, more than we expect. Although technically legal, we are now quadraticaly weighting large step sizes in the filter. Therefore users may request large skips, but not a large number of blocks, to prevent requests forcing us to do long chain lookups.
## Proposed Changes
Weight the step parameter in the RPC filter and prevent any overflows that effect us in the step parameter.
## Additional Info
## Issue Addressed
Closes#1365
## Proposed Changes
Dial peers in the `cached_enrs` who aren't connected, aren't banned and satisfy the subnet predicate before making a subnet discovery query.
## Issue Addressed
Partly addresses #1547
## Proposed Changes
This fix addresses the missing attestations at slot 0 of an epoch (also sometimes slot 1 when slot 0 was skipped).
There are 2 cases:
1. BN receives the block for the attestation slot after 4 seconds (1/3rd of the slot).
2. No block is proposed for this slot.
In both cases, when we produce the attestation, we pass the head state to the
`produce_unaggregated_attestation_for_block` function here
9833eca024/beacon_node/beacon_chain/src/beacon_chain.rs (L845-L850)
Since we don't advance the state in this function, we set `attestation.data.source = state.current_justified_checkpoint` which is atleast 2 epochs lower than current_epoch(wall clock epoch).
This attestation is invalid and cannot be included in a block because of this assert from the spec:
```python
if data.target.epoch == get_current_epoch(state):
assert data.source == state.current_justified_checkpoint
state.current_epoch_attestations.append(pending_attestation)
```
https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/beacon-chain.md#attestations
This PR changes the `produce_unaggregated_attestation_for_block` function to ensure that it advances the state before producing the attestation at the new epoch.
Running this on my node, have missed 0 attestations across all 8 of my validators in a 100 epoch period 🎉
To compare, I was missing ~14 attestations across all 8 validators in the same 100 epoch period before the fix.
Will report missed attestations if any after running for another 100 epochs tomorrow.
## Issue Addressed
#1421
## Proposed Changes
Bounding the error_message that can be returned for RPC domain errors
Co-authored-by: Age Manning <Age@AgeManning.com>
Converts the graffiti binary data to string before printing to logs.
## Issue Addressed
#1566
## Proposed Changes
Rather than converting graffiti to a vector the binary data less the last character is passed to String::from_utf_lossy(). This then allows us to call the to_string() function directly to give us the string
## Additional Info
Rust skills are fairly weak
## Issue Addressed
N/A
## Proposed Changes
Adds extended metrics to get a better idea of what is happening at the gossipsub layer of lighthouse. This provides information about mesh statistics per topics, subscriptions and peer scores.
## Additional Info
## Issue Addressed
Fixes#1509
## Proposed Changes
Exit the beacon node if the eth1 endpoint points to an invalid eth1 network. Check the network id before every eth1 cache update and display an error log if the network id has changed to an invalid one.
## Issue Addressed
#1172
## Proposed Changes
* updates the libp2p dependency
* small adaptions based on changes in libp2p
* report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages
Co-authored-by: Age Manning <Age@AgeManning.com>
The PR:
* Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots):
![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png)
* New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots.
* Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner.
* Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework
* Rewrites existing fork pruning tests to use the new API
* Adds a tests that triggers finalization at a non epoch boundary slot
* Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing
* Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Closes#1488
## Proposed Changes
* Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain.
* Add `debug` logging to the pruning algorithm so we have so better chance of debugging future issues from logs.
* Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell).
* Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization`
* Delete useless and unused `max_finality_distance`
* Add tests that exercise pruning with shared states at skip slots
* Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that)
* Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?
## Additional Info
Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉
## Issue Addressed
NA
## Proposed Changes
Shift practically all HTTP endpoint handlers to the blocking executor (some very light tasks are left on the core executor).
## Additional Info
This PR covers the `rest_api` which will soon be refactored to suit the standard API. As such, I've cut a few corners and left some existing issues open in this patch. What I have done here should leave the API in state that is not necessary *exactly* the same, but good enough for us to run validators with. Specifically, the number of blocking workers that can be spawned is unbounded and I have not implemented a queue; this will need to be fixed when we implement the standard API.
## Issue Addressed
#1378
## Proposed Changes
Boot node reuses code from beacon_node to initialize network config. This also enables using the network directory to store/load the enr and the private key.
## Additional Info
Note that before this PR the port cli arguments were off (the argument was named `enr-port` but used as `boot-node-enr-port`).
Therefore as port always the cli port argument was used (for both enr and listening). Now the enr-port argument can be used to overwrite the listening port as the public port others should connect to.
Last but not least note, that this restructuring reuses `ethlibp2p::NetworkConfig` that has many more options than the ones used in the boot node. For example the network config has an own `discv5_config` field that gets never used in the boot node and instead another `Discv5Config` gets created later in the boot node process.
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
#1283
## Proposed Changes
All peers with the same IP will be considered banned as long as there are more than 5 (constant) peers with this IP that have a score below the ban threshold. As soon as some of those 5 peers get unbanned (through decay) and if there are then less than 5 peers with a score below the threshold the IP will be considered not banned anymore.
## Issue Addressed
N/A
## Proposed Changes
Refactor attestation service to send out requests to find peers for subnets as soon as we get attestation duties.
Earlier, we had much more involved logic to send the discovery requests to the discovery service only 6 slots before the attestation slot. Now that discovery is much smarter with grouped queries, the complexity in attestation service can be reduced considerably.
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
#1494
## Proposed Changes
- Give the TaskExecutor the sender side of a channel that a task can clone to request shutting down
- The receiver side of this channel is in environment and now we block until ctrl+c or an internal shutdown signal is received
- The swarm now informs when it has reached 0 listeners
- The network receives this message and requests the shutdown
## Issue Addressed
NA
## Proposed Changes
- Refactors the `BeaconProcessor` to remove some excessive nesting and file bloat
- Sorry about the noise from this, it's all contained in 4d3f8c5 though.
- Adds exits, proposer slashings, attester slashings to the `BeaconProcessor` so we don't get overwhelmed with large amounts of slashings (which happened a few hours ago).
## Additional Info
NA
## Issue Addressed
NA
## Proposed Changes
Sets the default max skips to 700 so that it can cover the 693 slot skip from `80894 - 80201`.
## Additional Info
NA