## Issue Addressed
Closes#1557
## Proposed Changes
Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).
In the process of writing and testing this I also had to make a few other changes:
* Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on
And chose to make some clean-ups:
* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`
## Testing
To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:
https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557
That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:
```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```
It should pass, and the log output should show:
```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```
## Additional Info
This is a backwards-compatible change with no impact on consensus.
## Issue Addressed
- Resolves#1616
## Proposed Changes
If we look at the function which persists fork choice and the canonical head to disk:
1db8daae0c/beacon_node/beacon_chain/src/beacon_chain.rs (L234-L280)
There is a race-condition which might cause the canonical head and fork choice values to be out-of-sync.
I believe this is the cause of #1616. I managed to recreate the issue and produce a database that was unable to sync under the `master` branch but able to sync with this branch.
These new changes solve the issue by ignoring the persisted `canonical_head_block_root` value and instead getting fork choice to generate it. This ensures that the canonical head is in-sync with fork choice.
## Additional Info
This is hotfix method that leaves some crusty code hanging around. Once this PR is merged (to satisfy the v0.2.x users) we should later update and merge #1638 so we can have a clean fix for the v0.3.x versions.
The PR:
* Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots):
![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png)
* New testing API: Instead of repeatedly calling add_block(), you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots.
* Jumping backwards to an earlier epoch is a hard error, so that tests necessarily generate blocks in a epoch-by-epoch manner.
* Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, contrary to the existing testing framework
* Rewrites existing fork pruning tests to use the new API
* Adds a tests that triggers finalization at a non epoch boundary slot
* Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing
* Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
* Layer do_atomically() abstractions properly
* Reduce allocs and DRY get_key_for_col()
* Parameterize HotColdDB with hot and cold item stores
* -impl Store for MemoryStore
* Replace Store uses with HotColdDB
* Ditch Store trait
* cargo fmt
* Style fix
* Readd missing dep that broke the build
* Remove redundant method
* Pull out a method out of a struct
* More precise db access abstractions
* Move fake trait method out of it
* cargo fmt
* Fix compilation error after refactoring
* Move another fake method out the Store trait
* Get rid of superfluous method
* Fix refactoring bug
* Rename: SimpleStoreItem -> StoreItem
* Get rid of the confusing DiskStore type alias
* Get rid of SimpleDiskStore type alias
* Correction: A method took both self and a ref to Self
* Unfinished progress
* Update more persistence code
* Start fixing tests
* Combine persist head and fork choice
* Persist head on reorg
* Gracefully handle op pool and eth1 cache missing
* Fix test failure
* Address Michael's comments
* Start updating types
* WIP
* Signature hacking
* Existing EF tests passing with fake_crypto
* Updates
* Delete outdated API spec
* The refactor continues
* It compiles
* WIP test fixes
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Cargo fmt
* Fast aggregate verify test
* Update REST API docs
* Cargo fmt
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Address review comments; remove unused dependency
* Add first attempt at attestation proc. re-write
* Add version 2 of attestation processing
* Minor fixes
* Add validator pubkey cache
* Make get_indexed_attestation take a committee
* Link signature processing into new attn verification
* First working version
* Ensure pubkey cache is updated
* Add more metrics, slight optimizations
* Clone committee cache during attestation processing
* Update shuffling cache during block processing
* Remove old commented-out code
* Fix shuffling cache insert bug
* Used indexed attestation in fork choice
* Restructure attn processing, add metrics
* Add more detailed metrics
* Tidy, fix failing tests
* Fix failing tests, tidy
* Disable/delete two outdated tests
* Tidy
* Add pubkey cache persistence file
* Add more comments
* Integrate persistence file into builder
* Add pubkey cache tests
* Add data_dir to beacon chain builder
* Remove Option in pubkey cache persistence file
* Ensure consistency between datadir/data_dir
* Fix failing network test
* Tidy
* Fix todos
* Add attestation processing tests
* Add another test
* Only run attestation tests in release
* Make attestation tests MainnetEthSpec
* Address Michael's comments
* Remove redundant check
* Fix warning
* Fix failing test
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* Add LRU caches to store
* Improvements to LRU caches
* Take state by value in `Store::put_state`
* Store blocks by value, configurable cache sizes
* Use a StateBatch to efficiently store skip states
* Fix store tests
* Add CloneConfig test, remove unused metrics
* Use Mutexes instead of RwLocks for LRU caches
* Add TimeoutRwLock to BeaconChain
* Update network crate
* Update rest api
* Fix beacon chain tests
* Fix rest api tests
* Set test back to !debug_assertions
* Draft of checkpoint freezer DB
* Fix bugs
* Adjust root iterators for checkpoint database
* Fix freezer state lookups with no slot hint
* Fix split comment
* Use "restore point" to refer to frozen states
* Resolve some FIXMEs
* Configurable slots per restore point
* Document new freezer DB functions
* Fix up StoreConfig
* Fix new test for merge
* Document SPRP default CLI flag, clarify tests
* Update to spec v0.9.0
* Update to v0.9.1
* Bump spec tags for v0.9.1
* Formatting, fix CI failures
* Resolve accidental KeyPair merge conflict
* Document new BeaconState functions
* Add `validator` changes from `validator-to-rest`
* Add initial (failing) REST api tests
* Fix signature parsing
* Add more tests
* Refactor http router
* Add working tests for publish beacon block
* Add validator duties tests
* Move account_manager under `lighthouse` binary
* Unify logfile handling in `environment` crate.
* Fix incorrect cache drops in `advance_caches`
* Update fork choice for v0.9.1
* Add `deposit_contract` crate
* Add progress on validator onboarding
* Add unfinished attesation code
* Update account manager CLI
* Write eth1 data file as hex string
* Integrate ValidatorDirectory with validator_client
* Move ValidatorDirectory into validator_client
* Clean up some FIXMEs
* Add beacon_chain_sim
* Fix a few docs/logs
* Expand `beacon_chain_sim`
* Fix spec for `beacon_chain_sim
* More testing for api
* Start work on attestation endpoint
* Reject empty attestations
* Allow attestations to genesis block
* Add working tests for `rest_api` validator endpoint
* Remove grpc from beacon_node
* Start heavy refactor of validator client
- Block production is working
* Prune old validator client files
* Start works on attestation service
* Add attestation service to validator client
* Use full pubkey for validator directories
* Add validator duties post endpoint
* Use par_iter for keypair generation
* Use bulk duties request in validator client
* Add version http endpoint tests
* Add interop keys and startup wait
* Ensure a prompt exit
* Add duties pruning
* Fix compile error in beacon node tests
* Add github workflow
* Modify rust.yaml
* Modify gitlab actions
* Add to CI file
* Add sudo to CI npm install
* Move cargo fmt to own job in tests
* Fix cargo fmt in CI
* Add rustup update before cargo fmt
* Change name of CI job
* Make other CI jobs require cargo fmt
* Add CI badge
* Remove gitlab and travis files
* Add different http timeout for debug
* Update docker file, use makefile in CI
* Use make in the dockerfile, skip the test
* Use the makefile for debug GI test
* Update book
* Tidy grpc and misc things
* Apply discv5 fixes
* Address other minor issues
* Fix warnings
* Attempt fix for addr parsing
* Tidy validator config, CLIs
* Tidy comments
* Tidy signing, reduce ForkService duplication
* Fail if skipping too many slots
* Set default recent genesis time to 0
* Add custom http timeout to validator
* Fix compile bug in node_test_rig
* Remove old bootstrap flag from val CLI
* Update docs
* Tidy val client
* Change val client log levels
* Add comments, more validity checks
* Fix compile error, add comments
* Undo changes to eth2-libp2p/src
* Reduce duplication of keypair generation
* Add more logging for validator duties
* Fix beacon_chain_sim, nitpicks
* Fix compile error, minor nits
* Update to use v0.9.2 version of deposit contract
* Add efforts to automate eth1 testnet deployment
* Fix lcli testnet deployer
* Modify bn CLI to parse eth2_testnet_dir
* Progress with account_manager deposit tools
* Make account manager submit deposits
* Add password option for submitting deposits
* Allow custom deposit amount
* Add long names to lcli clap
* Add password option to lcli deploy command
* Fix minor bugs whilst testing
* Address Michael's comments
* Add refund-deposit-contract to lcli
* Use time instead of skip count for denying long skips
* Improve logging for eth1
* Fix bug with validator services exiting on error
* Drop the block cache after genesis
* Modify eth1 testnet config
* Improve eth1 logging
* Make validator wait until genesis time
* Fix bug in eth1 voting
* Add more logging to eth1 voting
* Handle errors in eth1 http module
* Set SECONDS_PER_DAY to sensible minimum
* Shorten delay before testnet start
* Ensure eth1 block is produced without any votes
* Improve eth1 logging
* Fix broken tests in eth1
* Tidy code in rest_api
* Fix failing test in deposit_contract
* Make CLI args more consistent
* Change validator/duties endpoint
* Add time-based skip slot limiting
* Add new error type missed in previous commit
* Add log when waiting for genesis
* Refactor beacon node CLI
* Remove unused dep
* Add lcli eth1-genesis command
* Fix bug in master merge
* Apply clippy lints to beacon node
* Add support for YamlConfig in Eth2TestnetDir
* Upgrade tesnet deposit contract version
* Remove unnecessary logging and correct formatting
* Add a hardcoded eth2 testnet config
* Ensure http server flag works. Overwrite configs with flags.
* Ensure boot nodes are loaded from testnet dir
* Fix account manager CLI bugs
* Fix bugs with beacon node cli
* Allow testnet dir without boot nodes
* Write genesis state as SSZ
* Remove ---/n from the start of testnet_dir files
* Set default libp2p address
* Tidy account manager CLI, add logging
* Add check to see if testnet dir exists
* Apply reviewers suggestions
* Add HeadTracker struct
* Add fork choice persistence
* Shorten slot time for simulator
* Add the /beacon/heads API endpoint
* Update hardcoded testnet
* Add tests for BeaconChain persistence + fix bugs
* Extend BeaconChain persistence testing
* Ensure chain is finalized b4 persistence tests
* Ensure boot_enr.yaml is include in binary
* Refactor beacon_chain_sim
* Move files about in beacon sim
* Update beacon_chain_sim
* Fix bug with deposit inclusion
* Increase log in genesis service, fix todo
* Tidy sim, fix broken rest_api tests
* Fix more broken tests
* Update testnet
* Fix broken rest api test
* Tidy account manager CLI
* Use tempdir for account manager
* Stop hardcoded testnet dir from creating dir
* Rename Eth2TestnetDir to Eth2TestnetConfig
* Change hardcoded -> hard_coded
* Tidy account manager
* Add log to account manager
* Tidy, ensure head tracker is loaded from disk
* Tidy beacon chain builder
* Tidy eth1_chain
* Adds log support for simulator
* Revert "Adds log support for simulator"
This reverts commit ec77c66a052350f551db145cf20f213823428dd3.
* Adds log support for simulator
* Tidy after self-review
* Change default log level
* Address Michael's delicious PR comments
* Fix off-by-one in tests