Commit Graph

92 Commits

Author SHA1 Message Date
Michael Sproul
acd49d988d Implement database temp states to reduce memory usage (#1798)
## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4.
    a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens...
    b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know

This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
2020-10-23 01:27:51 +00:00
Michael Sproul
703c33bdc7 Fix head tracker concurrency bugs (#1771)
## Issue Addressed

Closes #1557

## Proposed Changes

Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).

In the process of writing and testing this I also had to make a few other changes:

* Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on

And chose to make some clean-ups:

* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`

## Testing

To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:

https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557

That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:

```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```

It should pass, and the log output should show:

```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```

## Additional Info

This is a backwards-compatible change with no impact on consensus.
2020-10-19 05:58:39 +00:00
Paul Hauner
ee7c8a0b7e Update external deps (#1711)
## Issue Addressed

- Resolves #1706 

## Proposed Changes

Updates dependencies across the workspace. Any crate that was not able to be brought to the latest version is listed in #1712.

## Additional Info

NA
2020-10-05 08:22:19 +00:00
Michael Sproul
22aedda1be
Add database schema versioning (#1688)
## Issue Addressed

Closes #673

## Proposed Changes

Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work.

The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673
2020-10-01 11:12:36 +10:00
Michael Sproul
5d17eb899f Update LevelDB to v0.8.6, removing patch (#1636)
Removes our dependency on a fork of LevelDB now that https://github.com/skade/leveldb-sys/pull/17 is merged
2020-09-21 11:53:53 +00:00
Adam Szkoda
d9f4819fe0 Alternative (to BeaconChainHarness) BeaconChain testing API (#1380)
The PR:

* Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots):

![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png)

* New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots.
* Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner.
* Configures the test logger so that output is printed on the console in case a test fails.  The logger also plays well with `--nocapture`, contrary to the existing testing framework
* Rewrites existing fork pruning tests to use the new API
* Adds a tests that triggers finalization at a non epoch boundary slot
* Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing
* Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-08-26 09:24:55 +00:00
Michael Sproul
4763f03dcc Fix bug in database pruning (#1564)
## Issue Addressed

Closes #1488

## Proposed Changes

* Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain.
* Add `debug` logging to the pruning algorithm so we have so better chance of debugging future issues from logs.
* Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell).
* Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization`
* Delete useless and unused `max_finality_distance`
* Add tests that exercise pruning with shared states at skip slots
* Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that)
* Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?

## Additional Info

Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉
2020-08-26 00:01:06 +00:00
Paul Hauner
61d5b592cb Memory usage reduction (#1522)
## Issue Addressed

NA

## Proposed Changes

- Adds a new function to allow getting a state with a bad state root history for attestation verification. This reduces unnecessary tree hashing during attestation processing, which accounted for 23% of memory allocations (by bytes) in a recent `heaptrack` observation.
- Don't clone caches on intermediate epoch-boundary states during block processing.
- Reject blocks that are known to fork choice earlier during gossip processing, instead of waiting until after state has been loaded (this only happens in edge-case).
- Avoid multiple re-allocations by creating a "forced" exact size iterator.

## Additional Info

NA
2020-08-17 08:05:13 +00:00
Adam Szkoda
8a1a4051cf Fix a bug in fork pruning (#1507)
Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-08-12 07:00:00 +00:00
Age Manning
09a615b2c0 Lighthouse crate v0.2.0 bump (#1450)
## Description

This PR marks Lighthouse v0.2.0. 

This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.
2020-08-06 03:43:05 +00:00
blacktemplar
23a8f31f83 Fix clippy warnings (#1385)
## Issue Addressed

NA

## Proposed Changes

Fixes most clippy warnings and ignores the rest of them, see issue #1388.
2020-07-23 14:18:00 +00:00
Paul Hauner
25cd91ce26
Update deps (#1322)
* Run cargo update

* Upgrade prometheus

* Update hex

* Upgrade parking-lot

* Upgrade num-bigint

* Upgrade sha2

* Update dockerfile Rust version

* Run cargo update
2020-07-06 11:55:56 +10:00
Adam Szkoda
c7f47af9fb
Harden the freezing procedure against failures (#1323)
* Enable logging in tests

* Migrate states to the freezer atomically
2020-07-03 09:47:31 +10:00
Adam Szkoda
536728b975
Write new blocks and states to the database atomically (#1285)
* Mostly atomic put_state()
* Reduce number of vec allocations
* Make crucial db operations atomic
* Save restore points
* Remove StateBatch
* Merge two HotColdDB impls
* Further reduce allocations
* Review feedback
* Silence clippy warning
2020-07-01 12:45:57 +10:00
Michael Sproul
305724770d
Bump all spec tags to v0.12.1 (#1275) 2020-06-19 11:18:27 +10:00
Michael Sproul
81c9fe3817
Apply store refactor to new fork choice 2020-06-17 15:20:44 +10:00
Michael Sproul
e6f97bf466
Merge remote-tracking branch 'origin/master' into spec-v0.12 2020-06-17 12:34:11 +10:00
Paul Hauner
764cb2d32a
v0.12 fork choice update (#1229)
* Incomplete scraps

* Add progress on new fork choice impl

* Further progress

* First complete compiling version

* Remove chain reference

* Add new lmd_ghost crate

* Start integrating into beacon chain

* Update `milagro_bls` to new release (#1183)

* Update milagro_bls to new release

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Tidy up fake cryptos

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* move SecretHash to bls and put plaintext back

Signed-off-by: Kirk Baird <baird.k@outlook.com>

* Update state processing for v0.12

* Fix EF test runners for v0.12

* Fix some tests

* Fix broken attestation verification test

* More test fixes

* Rough beacon chain impl working

* Remove fork_choice_2

* Remove checkpoint manager

* Half finished ssz impl

* Add missed file

* Add persistence

* Tidy, fix some compile errors

* Remove RwLock from ProtoArrayForkChoice

* Fix store-based compile errors

* Add comments, tidy

* Move function out of ForkChoice struct

* Start testing

* More testing

* Fix compile error

* Tidy beacon_chain::fork_choice

* Queue attestations from the current slot

* Allow fork choice to handle prior-to-genesis start

* Improve error granularity

* Test attestation dequeuing

* Process attestations during block

* Store target root in fork choice

* Move fork choice verification into new crate

* Update tests

* Consensus updates for v0.12 (#1228)

* Update state processing for v0.12

* Fix EF test runners for v0.12

* Fix some tests

* Fix broken attestation verification test

* More test fixes

* Fix typo found in review

* Add `Block` struct to ProtoArray

* Start fixing get_ancestor

* Add rough progress on testing

* Get fork choice tests working

* Progress with testing

* Fix partialeq impl

* Move slot clock from fc_store

* Improve testing

* Add testing for best justified

* Add clone back to SystemTimeSlotClock

* Add balances test

* Start adding balances cache again

* Wire-in balances cache

* Improve tests

* Remove commented-out tests

* Remove beacon_chain::ForkChoice

* Rename crates

* Update wider codebase to new fork_choice layout

* Move advance_slot in test harness

* Tidy ForkChoice::update_time

* Fix verification tests

* Fix compile error with iter::once

* Fix fork choice tests

* Ensure block attestations are processed

* Fix failing beacon_chain tests

* Add first invalid block check

* Add finalized block check

* Progress with testing, new store builder

* Add fixes to get_ancestor

* Fix old genesis justification test

* Fix remaining fork choice tests

* Change root iteration method

* Move on_verified_block

* Remove unused method

* Start adding attestation verification tests

* Add invalid ffg target test

* Add target epoch test

* Add queued attestation test

* Remove old fork choice verification tests

* Tidy, add test

* Move fork choice lock drop

* Rename BeaconForkChoiceStore

* Add comments, tidy BeaconForkChoiceStore

* Update metrics, rename fork_choice_store.rs

* Remove genesis_block_root from ForkChoice

* Tidy

* Update fork_choice comments

* Tidy, add comments

* Tidy, simplify ForkChoice, fix compile issue

* Tidy, removed dead file

* Increase http request timeout

* Fix failing rest_api test

* Set HTTP timeout back to 5s

* Apply fix to get_ancestor

* Address Michael's comments

* Fix typo

* Revert "Fix broken attestation verification test"

This reverts commit 722cdc903b12611de27916a57eeecfa3224f2279.

Co-authored-by: Kirk Baird <baird.k@outlook.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-06-17 11:10:22 +10:00
Adam Szkoda
9db0c28051
Make key value storage abstractions more accurate (#1267)
* Layer do_atomically() abstractions properly

* Reduce allocs and DRY get_key_for_col()

* Parameterize HotColdDB with hot and cold item stores

* -impl Store for MemoryStore

* Replace Store uses with HotColdDB

* Ditch Store trait

* cargo fmt

* Style fix

* Readd missing dep that broke the build
2020-06-16 11:34:04 +10:00
Michael Sproul
7818447fd2
Check for unused deps in CI (#1262)
* Check for unused deps in CI

* Bump slashing protection parking_lot version
2020-06-14 10:59:50 +10:00
Adam Szkoda
7f036a6e95
Add error handling to iterators (#1243)
* Add error handling to iterators

* Review feedback

* Leverage itertools::process_results() in few places
2020-06-10 09:55:44 +10:00
Paul Hauner
d9d00cc05d
Update lru, leveldb. Run cargo update (#1249)
* Update lru, leveldb. Run cargo update

* Add cmake to docker image

* Move cmake dep in dockerfile
2020-06-06 16:39:42 +10:00
Adam Szkoda
ce10db15da
Remove code duplicating stdlib (#1239)
* Get rid of superfluous ReverseBlockRootIterator

* Get rid of superfluous ReverseStateRootIterator and ReverseChainIterator

* cargo fmt
2020-06-02 10:41:42 +10:00
Adam Szkoda
91cb14ac41
Clean up database abstractions (#1200)
* Remove redundant method

* Pull out a method out of a struct

* More precise db access abstractions

* Move fake trait method out of it

* cargo fmt

* Fix compilation error after refactoring

* Move another fake method out the Store trait

* Get rid of superfluous method

* Fix refactoring bug

* Rename: SimpleStoreItem -> StoreItem

* Get rid of the confusing DiskStore type alias

* Get rid of SimpleDiskStore type alias

* Correction: A method took both self and a ref to Self
2020-06-01 08:13:49 +10:00
Adam Szkoda
919c81fe7d
Ditch StoreItem trait (#1185) 2020-05-25 10:26:54 +10:00
Adam Szkoda
d79e07902e
Relax PartialEq constraint on error enums (#1179) 2020-05-21 10:21:44 +10:00
Paul Hauner
4331834003
Directory Restructure (#1163)
* Move tests -> testing

* Directory restructure

* Update Cargo.toml during restructure

* Update Makefile during restructure

* Fix arbitrary path
2020-05-18 21:24:23 +10:00
Paul Hauner
90b3953dda
v0.1.2 (#1155)
* Version downgrade

* Start updating docs and version tags

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-05-18 15:05:23 +10:00
Age Manning
b6408805a2
Stable futures (#879)
* Port eth1 lib to use stable futures

* Port eth1_test_rig to stable futures

* Port eth1 tests to stable futures

* Port genesis service to stable futures

* Port genesis tests to stable futures

* Port beacon_chain to stable futures

* Port lcli to stable futures

* Fix eth1_test_rig (#1014)

* Fix lcli

* Port timer to stable futures

* Fix timer

* Port websocket_server to stable futures

* Port notifier to stable futures

* Add TODOS

* Update hashmap hashset to stable futures

* Adds panic test to hashset delay

* Port remote_beacon_node to stable futures

* Fix lcli merge conflicts

* Non rpc stuff compiles

* protocol.rs compiles

* Port websockets, timer and notifier to stable futures (#1035)

* Fix lcli

* Port timer to stable futures

* Fix timer

* Port websocket_server to stable futures

* Port notifier to stable futures

* Add TODOS

* Port remote_beacon_node to stable futures

* Partial eth2-libp2p stable future upgrade

* Finished first round of fighting RPC types

* Further progress towards porting eth2-libp2p adds caching to discovery

* Update behaviour

* RPC handler to stable futures

* Update RPC to master libp2p

* Network service additions

* Fix the fallback transport construction (#1102)

* Correct warning

* Remove hashmap delay

* Compiling version of eth2-libp2p

* Update all crates versions

* Fix conversion function and add tests (#1113)

* Port validator_client to stable futures (#1114)

* Add PH & MS slot clock changes

* Account for genesis time

* Add progress on duties refactor

* Add simple is_aggregator bool to val subscription

* Start work on attestation_verification.rs

* Add progress on ObservedAttestations

* Progress with ObservedAttestations

* Fix tests

* Add observed attestations to the beacon chain

* Add attestation observation to processing code

* Add progress on attestation verification

* Add first draft of ObservedAttesters

* Add more tests

* Add observed attesters to beacon chain

* Add observers to attestation processing

* Add more attestation verification

* Create ObservedAggregators map

* Remove commented-out code

* Add observed aggregators into chain

* Add progress

* Finish adding features to attestation verification

* Ensure beacon chain compiles

* Link attn verification into chain

* Integrate new attn verification in chain

* Remove old attestation processing code

* Start trying to fix beacon_chain tests

* Split adding into pools into two functions

* Add aggregation to harness

* Get test harness working again

* Adjust the number of aggregators for test harness

* Fix edge-case in harness

* Integrate new attn processing in network

* Fix compile bug in validator_client

* Update validator API endpoints

* Fix aggreagation in test harness

* Fix enum thing

* Fix attestation observation bug:

* Patch failing API tests

* Start adding comments to attestation verification

* Remove unused attestation field

* Unify "is block known" logic

* Update comments

* Supress fork choice errors for network processing

* Add todos

* Tidy

* Add gossip attn tests

* Disallow test harness to produce old attns

* Comment out in-progress tests

* Partially address pruning tests

* Fix failing store test

* Add aggregate tests

* Add comments about which spec conditions we check

* Dont re-aggregate

* Split apart test harness attn production

* Fix compile error in network

* Make progress on commented-out test

* Fix skipping attestation test

* Add fork choice verification tests

* Tidy attn tests, remove dead code

* Remove some accidentally added code

* Fix clippy lint

* Rename test file

* Add block tests, add cheap block proposer check

* Rename block testing file

* Add observed_block_producers

* Tidy

* Switch around block signature verification

* Finish block testing

* Remove gossip from signature tests

* First pass of self review

* Fix deviation in spec

* Update test spec tags

* Start moving over to hashset

* Finish moving observed attesters to hashmap

* Move aggregation pool over to hashmap

* Make fc attn borrow again

* Fix rest_api compile error

* Fix missing comments

* Fix monster test

* Uncomment increasing slots test

* Address remaining comments

* Remove unsafe, use cfg test

* Remove cfg test flag

* Fix dodgy comment

* Revert "Update hashmap hashset to stable futures"

This reverts commit d432378a3cc5cd67fc29c0b15b96b886c1323554.

* Revert "Adds panic test to hashset delay"

This reverts commit 281502396fc5b90d9c421a309c2c056982c9525b.

* Ported attestation_service

* Ported duties_service

* Ported fork_service

* More ports

* Port block_service

* Minor fixes

* VC compiles

* Update TODOS

* Borrow self where possible

* Ignore aggregates that are already known.

* Unify aggregator modulo logic

* Fix typo in logs

* Refactor validator subscription logic

* Avoid reproducing selection proof

* Skip HTTP call if no subscriptions

* Rename DutyAndState -> DutyAndProof

* Tidy logs

* Print root as dbg

* Fix compile errors in tests

* Fix compile error in test

* Re-Fix attestation and duties service

* Minor fixes

Co-authored-by: Paul Hauner <paul@paulhauner.com>

* Network crate update to stable futures

* Port account_manager to stable futures (#1121)

* Port account_manager to stable futures

* Run async fns in tokio environment

* Port rest_api crate to stable futures (#1118)

* Port rest_api lib to stable futures

* Reduce tokio features

* Update notifier to stable futures

* Builder update

* Further updates

* Convert self referential async functions

* stable futures fixes (#1124)

* Fix eth1 update functions

* Fix genesis and client

* Fix beacon node lib

* Return appropriate runtimes from environment

* Fix test rig

* Refactor eth1 service update

* Upgrade simulator to stable futures

* Lighthouse compiles on stable futures

* Remove println debugging statement

* Update libp2p service, start rpc test upgrade

* Update network crate for new libp2p

* Update tokio::codec to futures_codec (#1128)

* Further work towards RPC corrections

* Correct http timeout and network service select

* Use tokio runtime for libp2p

* Revert "Update tokio::codec to futures_codec (#1128)"

This reverts commit e57aea924acf5cbabdcea18895ac07e38a425ed7.

* Upgrade RPC libp2p tests

* Upgrade secio fallback test

* Upgrade gossipsub examples

* Clean up RPC protocol

* Test fixes (#1133)

* Correct websocket timeout and run on os thread

* Fix network test

* Clean up PR

* Correct tokio tcp move attestation service tests

* Upgrade attestation service tests

* Correct network test

* Correct genesis test

* Test corrections

* Log info when block is received

* Modify logs and update attester service events

* Stable futures: fixes to vc, eth1 and account manager (#1142)

* Add local testnet scripts

* Remove whiteblock script

* Rename local testnet script

* Move spawns onto handle

* Fix VC panic

* Initial fix to block production issue

* Tidy block producer fix

* Tidy further

* Add local testnet clean script

* Run cargo fmt

* Tidy duties service

* Tidy fork service

* Tidy ForkService

* Tidy AttestationService

* Tidy notifier

* Ensure await is not suppressed in eth1

* Ensure await is not suppressed in account_manager

* Use .ok() instead of .unwrap_or(())

* RPC decoding test for proto

* Update discv5 and eth2-libp2p deps

* Fix lcli double runtime issue (#1144)

* Handle stream termination and dialing peer errors

* Correct peer_info variant types

* Remove unnecessary warnings

* Handle subnet unsubscription removal and improve logigng

* Add logs around ping

* Upgrade discv5 and improve logging

* Handle peer connection status for multiple connections

* Improve network service logging

* Improve logging around peer manager

* Upgrade swarm poll centralise peer management

* Identify clients on error

* Fix `remove_peer` in sync (#1150)

* remove_peer removes from all chains

* Remove logs

* Fix early return from loop

* Improved logging, fix panic

* Partially correct tests

* Stable futures: Vc sync (#1149)

* Improve syncing heuristic

* Add comments

* Use safer method for tolerance

* Fix tests

* Stable futures: Fix VC bug, update agg pool, add more metrics (#1151)

* Expose epoch processing summary

* Expose participation metrics to prometheus

* Switch to f64

* Reduce precision

* Change precision

* Expose observed attesters metrics

* Add metrics for agg/unagg attn counts

* Add metrics for gossip rx

* Add metrics for gossip tx

* Adds ignored attns to prom

* Add attestation timing

* Add timer for aggregation pool sig agg

* Add write lock timer for agg pool

* Add more metrics to agg pool

* Change map lock code

* Add extra metric to agg pool

* Change lock handling in agg pool

* Change .write() to .read()

* Add another agg pool timer

* Fix for is_aggregator

* Fix pruning bug

Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-05-17 11:16:48 +00:00
Adam Szkoda
59ead67f76
Race condition fix + Reliability improvements around forks pruning (#1132)
* Improve error handling in block iteration

* Introduce atomic DB operations

* Fix race condition

An invariant was violated:  For every block hash in head_tracker, that
block is accessible from the store.
2020-05-16 13:23:32 +10:00
Thor Kamphefner
01f42a4d17
removed state-cache-size flag from beacon_node/src (#1120)
* removed state-cache-size flag from beacon_node/src
* removed state-cache-size related lines from store/src/config.rs
2020-05-14 22:34:24 +10:00
Age Manning
9e416a9bcd
Merge latest master 2020-04-22 01:05:46 +10:00
Adam Szkoda
9c3f76a33b
Prune abandoned forks (#916)
* Address compiler warning

* Prune abandoned fork choice forks

* New approach to pruning

* Wrap some block hashes in a newtype pattern

For increased type safety.

* Add Graphviz chain dump emitter for debugging

* Fix broken test case

* Make prunes_abandoned_forks use real DiskStore

* Mark finalized blocks in the GraphViz output

* Refine debug stringification of Slot and Epoch

Before this commit: print!("{:?}", Slot(123)) == "Slot(\n123\n)".
After this commit: print!("{:?", Slot(123)) == "Slot(123)".

* Simplify build_block()

* Rewrite test case using more composable test primitives

* Working rewritten test case

* Tighten fork prunning test checks

* Add another pruning test case

* Bugfix: Finalized blocks weren't always properly detected

* Pruning: Add pruning_does_not_touch_blocks_prior_to_finalization test case

* Tighten pruning tests: check if heads are tracked properly

* Add a failing test case for a buggy scenario

* Change name of function to a more accurate one

* Fix failing test case

* Test case: Were skipped slots' states pruned?

* Style fix: Simplify dereferencing

* Tighten pruning tests: check if abandoned states are deleted

* Towards atomicity of db ops

* Correct typo

* Prune also skipped slots' states

* New logic for handling skipped states

* Make skipped slots test pass

* Post conflict resolution fixes

* Formatting fixes

* Tests passing

* Block hashes in Graphviz node labels

* Removed unused changes

* Fix bug with states having < SlotsPerHistoricalRoot roots

* Consolidate State/BlockRootsIterator for pruning

* Address review feedback

* Fix a bug in pruning tests

* Detach prune_abandoned_forks() from its object

* Move migrate.rs from store to beacon_chain

* Move forks pruning onto a background thread

* Bugfix: Heads weren't pruned when prune set contained only the head

* Rename: freeze_to_state() -> process_finalization()

* Eliminate redundant function parameter

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-04-20 19:59:56 +10:00
Age Manning
1779aa6a8a
Merge latest master in v0.2.0 2020-04-08 16:46:37 +10:00
Paul Hauner
2fb6b7c793
Add no-copy block processing cache (#863)
* Add state cache, remove store cache

* Only build the head committee cache

* Fix compile error

* Fix compile error from merge

* Rename state_cache -> checkpoint_cache

* Rename Checkpoint -> Snapshot

* Tidy, add comments

* Tidy up find_head function

* Change some checkpoint -> snapshot

* Add tests

* Expose max_len

* Remove dead code

* Tidy

* Fix bug
2020-04-06 10:53:33 +10:00
Michael Sproul
26bdc2927b
Update to spec v0.11 (#959)
* Update process_final_updates() hysteresis computation

* Update core to v0.11.1

* Bump tags to v0.11.1

* Update docs and deposit contract

* Add compute_fork_digest

* Address review comments

Co-authored-by: Herman Alonso Junge <alonso.junge@gmail.com>
2020-04-01 22:03:03 +11:00
Age Manning
95c8e476bc
Initial work towards v0.2.0 (#924)
* Remove ping protocol

* Initial renaming of network services

* Correct rebasing relative to latest master

* Start updating types

* Adds HashMapDelay struct to utils

* Initial network restructure

* Network restructure. Adds new types for v0.2.0

* Removes build artefacts

* Shift validation to beacon chain

* Temporarily remove gossip validation

This is to be updated to match current optimisation efforts.

* Adds AggregateAndProof

* Begin rebuilding pubsub encoding/decoding

* Signature hacking

* Shift gossipsup decoding into eth2_libp2p

* Existing EF tests passing with fake_crypto

* Shifts block encoding/decoding into RPC

* Delete outdated API spec

* All release tests passing bar genesis state parsing

* Update and test YamlConfig

* Update to spec v0.10 compatible BLS

* Updates to BLS EF tests

* Add EF test for AggregateVerify

And delete unused hash2curve tests for uncompressed points

* Update EF tests to v0.10.1

* Use optional block root correctly in block proc

* Use genesis fork in deposit domain. All tests pass

* Fast aggregate verify test

* Update REST API docs

* Fix unused import

* Bump spec tags to v0.10.1

* Add `seconds_per_eth1_block` to chainspec

* Update to timestamp based eth1 voting scheme

* Return None from `get_votes_to_consider` if block cache is empty

* Handle overflows in `is_candidate_block`

* Revert to failing tests

* Fix eth1 data sets test

* Choose default vote according to spec

* Fix collect_valid_votes tests

* Fix `get_votes_to_consider` to choose all eligible blocks

* Uncomment winning_vote tests

* Add comments; remove unused code

* Reduce seconds_per_eth1_block for simulation

* Addressed review comments

* Add test for default vote case

* Fix logs

* Remove unused functions

* Meter default eth1 votes

* Fix comments

* Progress on attestation service

* Address review comments; remove unused dependency

* Initial work on removing libp2p lock

* Add LRU caches to store (rollup)

* Update attestation validation for DB changes (WIP)

* Initial version of should_forward_block

* Scaffold

* Progress on attestation validation

Also, consolidate prod+testing slot clocks so that they share much
of the same implementation and can both handle sub-slot time changes.

* Removes lock from libp2p service

* Completed network lock removal

* Finish(?) attestation processing

* Correct network termination future

* Add slot check to block check

* Correct fmt issues

* Remove Drop implementation for network service

* Add first attempt at attestation proc. re-write

* Add version 2 of attestation processing

* Minor fixes

* Add validator pubkey cache

* Make get_indexed_attestation take a committee

* Link signature processing into new attn verification

* First working version

* Ensure pubkey cache is updated

* Add more metrics, slight optimizations

* Clone committee cache during attestation processing

* Update shuffling cache during block processing

* Remove old commented-out code

* Fix shuffling cache insert bug

* Used indexed attestation in fork choice

* Restructure attn processing, add metrics

* Add more detailed metrics

* Tidy, fix failing tests

* Fix failing tests, tidy

* Address reviewers suggestions

* Disable/delete two outdated tests

* Modification of validator for subscriptions

* Add slot signing to validator client

* Further progress on validation subscription

* Adds necessary validator subscription functionality

* Add new Pubkeys struct to signature_sets

* Refactor with functional approach

* Update beacon chain

* Clean up validator <-> beacon node http types

* Add aggregator status to ValidatorDuty

* Impl Clone for manual slot clock

* Fix minor errors

* Further progress validator client subscription

* Initial subscription and aggregation handling

* Remove decompressed member from pubkey bytes

* Progress to modifying val client for attestation aggregation

* First draft of validator client upgrade for aggregate attestations

* Add hashmap for indices lookup

* Add state cache, remove store cache

* Only build the head committee cache

* Removes lock on a network channel

* Partially implement beacon node subscription http api

* Correct compilation issues

* Change `get_attesting_indices` to use Vec

* Fix failing test

* Partial implementation of timer

* Adds timer, removes exit_future, http api to op pool

* Partial multiple aggregate attestation handling

* Permits bulk messages accross gossipsub network channel

* Correct compile issues

* Improve gosispsub messaging and correct rest api helpers

* Added global gossipsub subscriptions

* Update validator subscriptions data structs

* Tidy

* Re-structure validator subscriptions

* Initial handling of subscriptions

* Re-structure network service

* Add pubkey cache persistence file

* Add more comments

* Integrate persistence file into builder

* Add pubkey cache tests

* Add HashSetDelay and introduce into attestation service

* Handles validator subscriptions

* Add data_dir to beacon chain builder

* Remove Option in pubkey cache persistence file

* Ensure consistency between datadir/data_dir

* Fix failing network test

* Peer subnet discovery gets queued for future subscriptions

* Reorganise attestation service functions

* Initial wiring of attestation service

* First draft of attestation service timing logic

* Correct minor typos

* Tidy

* Fix todos

* Improve tests

* Add PeerInfo to connected peers mapping

* Fix compile error

* Fix compile error from merge

* Split up block processing metrics

* Tidy

* Refactor get_pubkey_from_state

* Remove commented-out code

* Rename state_cache -> checkpoint_cache

* Rename Checkpoint -> Snapshot

* Tidy, add comments

* Tidy up find_head function

* Change some checkpoint -> snapshot

* Add tests

* Expose max_len

* Remove dead code

* Tidy

* Fix bug

* Add sync-speed metric

* Add first attempt at VerifiableBlock

* Start integrating into beacon chain

* Integrate VerifiableBlock

* Rename VerifableBlock -> PartialBlockVerification

* Add start of typed methods

* Add progress

* Add further progress

* Rename structs

* Add full block verification to block_processing.rs

* Further beacon chain integration

* Update checks for gossip

* Add todo

* Start adding segement verification

* Add passing chain segement test

* Initial integration with batch sync

* Minor changes

* Tidy, add more error checking

* Start adding chain_segment tests

* Finish invalid signature tests

* Include single and gossip verified blocks in tests

* Add gossip verification tests

* Start adding docs

* Finish adding comments to block_processing.rs

* Rename block_processing.rs -> block_verification

* Start removing old block processing code

* Fixes beacon_chain compilation

* Fix project-wide compile errors

* Remove old code

* Correct code to pass all tests

* Fix bug with beacon proposer index

* Fix shim for BlockProcessingError

* Only process one epoch at a time

* Fix loop in chain segment processing

* Correct tests from master merge

* Add caching for state.eth1_data_votes

* Add BeaconChain::validator_pubkey

* Revert "Add caching for state.eth1_data_votes"

This reverts commit cd73dcd6434fb8d8e6bf30c5356355598ea7b78e.

Co-authored-by: Grant Wuerker <gwuerker@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-03-17 17:24:44 +11:00
Michael Sproul
6b2e9ff246
Less noisy logs for unaligned finalized blocks (#901) 2020-03-12 12:11:46 +11:00
Paul Hauner
8c5bcfe53a
Optimise beacon chain persistence (#851)
* Unfinished progress

* Update more persistence code

* Start fixing tests

* Combine persist head and fork choice

* Persist head on reorg

* Gracefully handle op pool and eth1 cache missing

* Fix test failure

* Address Michael's comments
2020-03-06 16:09:41 +11:00
Michael Sproul
1f16d8fe4d
Add methods to delete blocks and states from disk (#843)
Closes #833
2020-03-04 16:48:35 +11:00
Paul Hauner
fbb630793e
Attempt to remove a tree hash from block replaying (#862)
* Attempt to remove a tree hash from block replaying

* Add missed thing
2020-03-02 13:40:58 +11:00
Michael Sproul
371e5adcf8
Update to Spec v0.10 (#817)
* Start updating types

* WIP

* Signature hacking

* Existing EF tests passing with fake_crypto

* Updates

* Delete outdated API spec

* The refactor continues

* It compiles

* WIP test fixes

* All release tests passing bar genesis state parsing

* Update and test YamlConfig

* Update to spec v0.10 compatible BLS

* Updates to BLS EF tests

* Add EF test for AggregateVerify

And delete unused hash2curve tests for uncompressed points

* Update EF tests to v0.10.1

* Use optional block root correctly in block proc

* Use genesis fork in deposit domain. All tests pass

* Cargo fmt

* Fast aggregate verify test

* Update REST API docs

* Cargo fmt

* Fix unused import

* Bump spec tags to v0.10.1

* Add `seconds_per_eth1_block` to chainspec

* Update to timestamp based eth1 voting scheme

* Return None from `get_votes_to_consider` if block cache is empty

* Handle overflows in `is_candidate_block`

* Revert to failing tests

* Fix eth1 data sets test

* Choose default vote according to spec

* Fix collect_valid_votes tests

* Fix `get_votes_to_consider` to choose all eligible blocks

* Uncomment winning_vote tests

* Add comments; remove unused code

* Reduce seconds_per_eth1_block for simulation

* Addressed review comments

* Add test for default vote case

* Fix logs

* Remove unused functions

* Meter default eth1 votes

* Fix comments

* Address review comments; remove unused dependency

* Disable/delete two outdated tests

* Bump eth1 default vote warn to error

* Delete outdated eth1 test

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2020-02-11 10:19:36 +11:00
Michael Sproul
e0b9fa599f
Add LRU cache to database (#837)
* Add LRU caches to store

* Improvements to LRU caches

* Take state by value in `Store::put_state`

* Store blocks by value, configurable cache sizes

* Use a StateBatch to efficiently store skip states

* Fix store tests

* Add CloneConfig test, remove unused metrics

* Use Mutexes instead of RwLocks for LRU caches
2020-02-10 11:30:21 +11:00
Paul Hauner
b771bbb60c
Add proto_array fork choice (#804)
* Start implementing proto_array

* Add progress

* Add unfinished progress

* Add further progress

* Add progress

* Add tree filtering

* Add half-finished modifications

* Add refactored version

* Tidy, add incomplete LmdGhost impl

* Move impls in LmdGhost trait def

* Remove old reduced_tree fork choice

* Combine two functions in to `compute_deltas`

* Start testing

* Add more compute_deltas tests

* Add fork choice testing

* Add more fork choice testing

* Add more fork choice tests

* Add more testing to proto-array

* Remove old tests

* Modify tests

* Add more tests

* Add more testing

* Add comments and fixes

* Re-organise crate

* Tidy, finish pruning tests

* Add ssz encoding, other pub fns

* Rename lmd_ghost > proto_array_fork_choice

* Integrate proto_array into lighthouse

* Add first pass at fixing filter

* Clean out old comments

* Add more comments

* Attempt to fix prune error

* Adjust TODO

* Fix test compile errors

* Add extra justification change check

* Update cargo.lock

* Fix fork choice test compile errors

* Most remove ffg_update_required

* Fix bug with epoch of attestation votes

* Start adding new test format

* Make fork choice tests declarative

* Create test def concept

* Move test defs into crate

* Add binary, re-org crate

* Shuffle files

* Start adding ffg tests

* Add more fork choice tests

* Add fork choice JSON dumping

* Add more detail to best node error

* Ensure fin+just checkpoints from from same block

* Rename JustificationManager

* Move checkpoint manager into own file

* Tidy

* Add targetted logging for sneaky sync bug

* Fix justified balances bug

* Add cache metrics

* Add metrics for log levels

* Fix bug in checkpoint manager

* Fix compile error in fork choice tests

* Ignore duplicate blocks in fork choice

* Add block to fock choice before db

* Rename on_new_block fn

* Fix spec inconsistency in `CheckpointManager`

* Remove BlockRootTree

* Remove old reduced_tree code fragment

* Add API endpoint for fork choice

* Add more ffg tests

* Remove block_root_tree reminents

* Ensure effective balances are used

* Remove old debugging code, fix API fault

* Add check to ensure parent block is in fork choice

* Update readme dates

* Fix readme

* Tidy checkpoint manager

* Remove fork choice yaml files from repo

* Remove fork choice yaml from repo

* General tidy

* Address majority of Michael's comments

* Tidy bin/lib business

* Remove dangling file

* Undo changes for rpc/handler from master

* Revert "Undo changes for rpc/handler from master"

This reverts commit 876edff0e4a501aafbb47113454852826dcc24e8.

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-01-29 15:05:00 +11:00
Pawan Dhananjay
23a35c3767 Persist/load DHT on shutdown/startup (#659)
* Store dht enrs on shutdown

* Load enrs on startup and add tests

* Remove enr_entries from behavior

* Move all dht persisting logic to `NetworkService`

* Move `PersistedDht` from eth2-libp2p to network crate

* Add test to confirm dht persistence

* Add logging

* Remove extra call to beacon_chain persist

* Expose only mutable `add_enr` method from behaviour

* Fix tests

* Fix merge errors
2020-01-23 18:16:11 +11:00
pscott
7396cd2cab Fix clippy warnings (#813)
* Clippy account manager

* Clippy account_manager

* Clippy beacon_node/beacon_chain

* Clippy beacon_node/client

* Clippy beacon_node/eth1

* Clippy beacon_node/eth2-libp2p

* Clippy beacon_node/genesis

* Clippy beacon_node/network

* Clippy beacon_node/rest_api

* Clippy beacon_node/src

* Clippy beacon_node/store

* Clippy eth2/lmd_ghost

* Clippy eth2/operation_pool

* Clippy eth2/state_processing

* Clippy eth2/types

* Clippy eth2/utils/bls

* Clippy eth2/utils/cahced_tree_hash

* Clippy eth2/utils/deposit_contract

* Clippy eth2/utils/eth2_interop_keypairs

* Clippy eth2/utils/eth2_testnet_config

* Clippy eth2/utils/lighthouse_metrics

* Clippy eth2/utils/ssz

* Clippy eth2/utils/ssz_types

* Clippy eth2/utils/tree_hash_derive

* Clippy lcli

* Clippy tests/beacon_chain_sim

* Clippy validator_client

* Cargo fmt
2020-01-21 18:38:56 +11:00
Michael Sproul
5a8f2dd961
Increase default slots per restore point to 2048 (#790)
This should reduce disk usage by 32x while keeping historical state queries to
less than 10s. If historical states are required quickly, the minimum SPRP of 32
can be set on the CLI.
2020-01-10 14:42:49 +11:00
Michael Sproul
95fc840e2c
Fix off-by-one error in get_latest_restore_point (#787)
* Fix off-by-one error in get_latest_restore_point

* Tighten SPRP checks for succinct hot DB change
2020-01-09 21:05:56 +11:00
Michael Sproul
d9e9c17d3b
Avoid building caches during block replay (#783)
Also, make the ExitCache safe.
2020-01-09 11:43:11 +11:00
Michael Sproul
f36a5a15d6
Store states efficiently in the hot database (#746)
* Sparse hot DB and block root tree

* Fix store_tests

* Ensure loads of hot states on boundaries are fast

* Milder error for unaligned finalized blocks
2020-01-08 13:58:01 +11:00