## Issue Addressed
Closes#1719
## Proposed Changes
Lift the internal `RwLock`s and `Mutex`es from the `Observed*` data structures to resolve the race conditions described in #1719.
Most of this work was done by @paulhauner on his `lift-locks` branch, I merely updated it for the current `master` and checked over it.
## Additional Info
I think it would be prudent to test this on a testnet or two before mainnet launch, just to be sure that the extra lock contention doesn't negatively impact performance.
## Issue Addressed
Closes#1557
## Proposed Changes
Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).
In the process of writing and testing this I also had to make a few other changes:
* Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on
And chose to make some clean-ups:
* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`
## Testing
To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:
https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557
That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:
```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```
It should pass, and the log output should show:
```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```
## Additional Info
This is a backwards-compatible change with no impact on consensus.
* Port eth1 lib to use stable futures
* Port eth1_test_rig to stable futures
* Port eth1 tests to stable futures
* Port genesis service to stable futures
* Port genesis tests to stable futures
* Port beacon_chain to stable futures
* Port lcli to stable futures
* Fix eth1_test_rig (#1014)
* Fix lcli
* Port timer to stable futures
* Fix timer
* Port websocket_server to stable futures
* Port notifier to stable futures
* Add TODOS
* Update hashmap hashset to stable futures
* Adds panic test to hashset delay
* Port remote_beacon_node to stable futures
* Fix lcli merge conflicts
* Non rpc stuff compiles
* protocol.rs compiles
* Port websockets, timer and notifier to stable futures (#1035)
* Fix lcli
* Port timer to stable futures
* Fix timer
* Port websocket_server to stable futures
* Port notifier to stable futures
* Add TODOS
* Port remote_beacon_node to stable futures
* Partial eth2-libp2p stable future upgrade
* Finished first round of fighting RPC types
* Further progress towards porting eth2-libp2p adds caching to discovery
* Update behaviour
* RPC handler to stable futures
* Update RPC to master libp2p
* Network service additions
* Fix the fallback transport construction (#1102)
* Correct warning
* Remove hashmap delay
* Compiling version of eth2-libp2p
* Update all crates versions
* Fix conversion function and add tests (#1113)
* Port validator_client to stable futures (#1114)
* Add PH & MS slot clock changes
* Account for genesis time
* Add progress on duties refactor
* Add simple is_aggregator bool to val subscription
* Start work on attestation_verification.rs
* Add progress on ObservedAttestations
* Progress with ObservedAttestations
* Fix tests
* Add observed attestations to the beacon chain
* Add attestation observation to processing code
* Add progress on attestation verification
* Add first draft of ObservedAttesters
* Add more tests
* Add observed attesters to beacon chain
* Add observers to attestation processing
* Add more attestation verification
* Create ObservedAggregators map
* Remove commented-out code
* Add observed aggregators into chain
* Add progress
* Finish adding features to attestation verification
* Ensure beacon chain compiles
* Link attn verification into chain
* Integrate new attn verification in chain
* Remove old attestation processing code
* Start trying to fix beacon_chain tests
* Split adding into pools into two functions
* Add aggregation to harness
* Get test harness working again
* Adjust the number of aggregators for test harness
* Fix edge-case in harness
* Integrate new attn processing in network
* Fix compile bug in validator_client
* Update validator API endpoints
* Fix aggreagation in test harness
* Fix enum thing
* Fix attestation observation bug:
* Patch failing API tests
* Start adding comments to attestation verification
* Remove unused attestation field
* Unify "is block known" logic
* Update comments
* Supress fork choice errors for network processing
* Add todos
* Tidy
* Add gossip attn tests
* Disallow test harness to produce old attns
* Comment out in-progress tests
* Partially address pruning tests
* Fix failing store test
* Add aggregate tests
* Add comments about which spec conditions we check
* Dont re-aggregate
* Split apart test harness attn production
* Fix compile error in network
* Make progress on commented-out test
* Fix skipping attestation test
* Add fork choice verification tests
* Tidy attn tests, remove dead code
* Remove some accidentally added code
* Fix clippy lint
* Rename test file
* Add block tests, add cheap block proposer check
* Rename block testing file
* Add observed_block_producers
* Tidy
* Switch around block signature verification
* Finish block testing
* Remove gossip from signature tests
* First pass of self review
* Fix deviation in spec
* Update test spec tags
* Start moving over to hashset
* Finish moving observed attesters to hashmap
* Move aggregation pool over to hashmap
* Make fc attn borrow again
* Fix rest_api compile error
* Fix missing comments
* Fix monster test
* Uncomment increasing slots test
* Address remaining comments
* Remove unsafe, use cfg test
* Remove cfg test flag
* Fix dodgy comment
* Revert "Update hashmap hashset to stable futures"
This reverts commit d432378a3cc5cd67fc29c0b15b96b886c1323554.
* Revert "Adds panic test to hashset delay"
This reverts commit 281502396fc5b90d9c421a309c2c056982c9525b.
* Ported attestation_service
* Ported duties_service
* Ported fork_service
* More ports
* Port block_service
* Minor fixes
* VC compiles
* Update TODOS
* Borrow self where possible
* Ignore aggregates that are already known.
* Unify aggregator modulo logic
* Fix typo in logs
* Refactor validator subscription logic
* Avoid reproducing selection proof
* Skip HTTP call if no subscriptions
* Rename DutyAndState -> DutyAndProof
* Tidy logs
* Print root as dbg
* Fix compile errors in tests
* Fix compile error in test
* Re-Fix attestation and duties service
* Minor fixes
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Network crate update to stable futures
* Port account_manager to stable futures (#1121)
* Port account_manager to stable futures
* Run async fns in tokio environment
* Port rest_api crate to stable futures (#1118)
* Port rest_api lib to stable futures
* Reduce tokio features
* Update notifier to stable futures
* Builder update
* Further updates
* Convert self referential async functions
* stable futures fixes (#1124)
* Fix eth1 update functions
* Fix genesis and client
* Fix beacon node lib
* Return appropriate runtimes from environment
* Fix test rig
* Refactor eth1 service update
* Upgrade simulator to stable futures
* Lighthouse compiles on stable futures
* Remove println debugging statement
* Update libp2p service, start rpc test upgrade
* Update network crate for new libp2p
* Update tokio::codec to futures_codec (#1128)
* Further work towards RPC corrections
* Correct http timeout and network service select
* Use tokio runtime for libp2p
* Revert "Update tokio::codec to futures_codec (#1128)"
This reverts commit e57aea924acf5cbabdcea18895ac07e38a425ed7.
* Upgrade RPC libp2p tests
* Upgrade secio fallback test
* Upgrade gossipsub examples
* Clean up RPC protocol
* Test fixes (#1133)
* Correct websocket timeout and run on os thread
* Fix network test
* Clean up PR
* Correct tokio tcp move attestation service tests
* Upgrade attestation service tests
* Correct network test
* Correct genesis test
* Test corrections
* Log info when block is received
* Modify logs and update attester service events
* Stable futures: fixes to vc, eth1 and account manager (#1142)
* Add local testnet scripts
* Remove whiteblock script
* Rename local testnet script
* Move spawns onto handle
* Fix VC panic
* Initial fix to block production issue
* Tidy block producer fix
* Tidy further
* Add local testnet clean script
* Run cargo fmt
* Tidy duties service
* Tidy fork service
* Tidy ForkService
* Tidy AttestationService
* Tidy notifier
* Ensure await is not suppressed in eth1
* Ensure await is not suppressed in account_manager
* Use .ok() instead of .unwrap_or(())
* RPC decoding test for proto
* Update discv5 and eth2-libp2p deps
* Fix lcli double runtime issue (#1144)
* Handle stream termination and dialing peer errors
* Correct peer_info variant types
* Remove unnecessary warnings
* Handle subnet unsubscription removal and improve logigng
* Add logs around ping
* Upgrade discv5 and improve logging
* Handle peer connection status for multiple connections
* Improve network service logging
* Improve logging around peer manager
* Upgrade swarm poll centralise peer management
* Identify clients on error
* Fix `remove_peer` in sync (#1150)
* remove_peer removes from all chains
* Remove logs
* Fix early return from loop
* Improved logging, fix panic
* Partially correct tests
* Stable futures: Vc sync (#1149)
* Improve syncing heuristic
* Add comments
* Use safer method for tolerance
* Fix tests
* Stable futures: Fix VC bug, update agg pool, add more metrics (#1151)
* Expose epoch processing summary
* Expose participation metrics to prometheus
* Switch to f64
* Reduce precision
* Change precision
* Expose observed attesters metrics
* Add metrics for agg/unagg attn counts
* Add metrics for gossip rx
* Add metrics for gossip tx
* Adds ignored attns to prom
* Add attestation timing
* Add timer for aggregation pool sig agg
* Add write lock timer for agg pool
* Add more metrics to agg pool
* Change map lock code
* Add extra metric to agg pool
* Change lock handling in agg pool
* Change .write() to .read()
* Add another agg pool timer
* Fix for is_aggregator
* Fix pruning bug
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
* Add PH & MS slot clock changes
* Account for genesis time
* Add progress on duties refactor
* Add simple is_aggregator bool to val subscription
* Start work on attestation_verification.rs
* Add progress on ObservedAttestations
* Progress with ObservedAttestations
* Fix tests
* Add observed attestations to the beacon chain
* Add attestation observation to processing code
* Add progress on attestation verification
* Add first draft of ObservedAttesters
* Add more tests
* Add observed attesters to beacon chain
* Add observers to attestation processing
* Add more attestation verification
* Create ObservedAggregators map
* Remove commented-out code
* Add observed aggregators into chain
* Add progress
* Finish adding features to attestation verification
* Ensure beacon chain compiles
* Link attn verification into chain
* Integrate new attn verification in chain
* Remove old attestation processing code
* Start trying to fix beacon_chain tests
* Split adding into pools into two functions
* Add aggregation to harness
* Get test harness working again
* Adjust the number of aggregators for test harness
* Fix edge-case in harness
* Integrate new attn processing in network
* Fix compile bug in validator_client
* Update validator API endpoints
* Fix aggreagation in test harness
* Fix enum thing
* Fix attestation observation bug:
* Patch failing API tests
* Start adding comments to attestation verification
* Remove unused attestation field
* Unify "is block known" logic
* Update comments
* Supress fork choice errors for network processing
* Add todos
* Tidy
* Add gossip attn tests
* Disallow test harness to produce old attns
* Comment out in-progress tests
* Partially address pruning tests
* Fix failing store test
* Add aggregate tests
* Add comments about which spec conditions we check
* Dont re-aggregate
* Split apart test harness attn production
* Fix compile error in network
* Make progress on commented-out test
* Fix skipping attestation test
* Add fork choice verification tests
* Tidy attn tests, remove dead code
* Remove some accidentally added code
* Fix clippy lint
* Rename test file
* Add block tests, add cheap block proposer check
* Rename block testing file
* Add observed_block_producers
* Tidy
* Switch around block signature verification
* Finish block testing
* Remove gossip from signature tests
* First pass of self review
* Fix deviation in spec
* Update test spec tags
* Start moving over to hashset
* Finish moving observed attesters to hashmap
* Move aggregation pool over to hashmap
* Make fc attn borrow again
* Fix rest_api compile error
* Fix missing comments
* Fix monster test
* Uncomment increasing slots test
* Address remaining comments
* Remove unsafe, use cfg test
* Remove cfg test flag
* Fix dodgy comment
* Ignore aggregates that are already known.
* Unify aggregator modulo logic
* Fix typo in logs
* Refactor validator subscription logic
* Avoid reproducing selection proof
* Skip HTTP call if no subscriptions
* Rename DutyAndState -> DutyAndProof
* Tidy logs
* Print root as dbg
* Fix compile errors in tests
* Fix compile error in test
* Start updating types
* WIP
* Signature hacking
* Existing EF tests passing with fake_crypto
* Updates
* Delete outdated API spec
* The refactor continues
* It compiles
* WIP test fixes
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Cargo fmt
* Fast aggregate verify test
* Update REST API docs
* Cargo fmt
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Address review comments; remove unused dependency
* Add first attempt at attestation proc. re-write
* Add version 2 of attestation processing
* Minor fixes
* Add validator pubkey cache
* Make get_indexed_attestation take a committee
* Link signature processing into new attn verification
* First working version
* Ensure pubkey cache is updated
* Add more metrics, slight optimizations
* Clone committee cache during attestation processing
* Update shuffling cache during block processing
* Remove old commented-out code
* Fix shuffling cache insert bug
* Used indexed attestation in fork choice
* Restructure attn processing, add metrics
* Add more detailed metrics
* Tidy, fix failing tests
* Fix failing tests, tidy
* Disable/delete two outdated tests
* Add new Pubkeys struct to signature_sets
* Refactor with functional approach
* Update beacon chain
* Remove decompressed member from pubkey bytes
* Add hashmap for indices lookup
* Change `get_attesting_indices` to use Vec
* Fix failing test
* Tidy
* Add pubkey cache persistence file
* Add more comments
* Integrate persistence file into builder
* Add pubkey cache tests
* Add data_dir to beacon chain builder
* Remove Option in pubkey cache persistence file
* Ensure consistency between datadir/data_dir
* Fix failing network test
* Tidy
* Fix todos
* Improve tests
* Split up block processing metrics
* Tidy
* Refactor get_pubkey_from_state
* Remove commented-out code
* Add BeaconChain::validator_pubkey
* Use Option::filter
* Remove Box
* Comment out tests that fail due to hard-coded
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
* Unfinished progress
* Update more persistence code
* Start fixing tests
* Combine persist head and fork choice
* Persist head on reorg
* Gracefully handle op pool and eth1 cache missing
* Fix test failure
* Address Michael's comments
* Start updating types
* WIP
* Signature hacking
* Existing EF tests passing with fake_crypto
* Updates
* Delete outdated API spec
* The refactor continues
* It compiles
* WIP test fixes
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Cargo fmt
* Fast aggregate verify test
* Update REST API docs
* Cargo fmt
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Address review comments; remove unused dependency
* Add first attempt at attestation proc. re-write
* Add version 2 of attestation processing
* Minor fixes
* Add validator pubkey cache
* Make get_indexed_attestation take a committee
* Link signature processing into new attn verification
* First working version
* Ensure pubkey cache is updated
* Add more metrics, slight optimizations
* Clone committee cache during attestation processing
* Update shuffling cache during block processing
* Remove old commented-out code
* Fix shuffling cache insert bug
* Used indexed attestation in fork choice
* Restructure attn processing, add metrics
* Add more detailed metrics
* Tidy, fix failing tests
* Fix failing tests, tidy
* Disable/delete two outdated tests
* Tidy
* Add pubkey cache persistence file
* Add more comments
* Integrate persistence file into builder
* Add pubkey cache tests
* Add data_dir to beacon chain builder
* Remove Option in pubkey cache persistence file
* Ensure consistency between datadir/data_dir
* Fix failing network test
* Tidy
* Fix todos
* Add attestation processing tests
* Add another test
* Only run attestation tests in release
* Make attestation tests MainnetEthSpec
* Address Michael's comments
* Remove redundant check
* Fix warning
* Fix failing test
Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* Start updating types
* WIP
* Signature hacking
* Existing EF tests passing with fake_crypto
* Updates
* Delete outdated API spec
* The refactor continues
* It compiles
* WIP test fixes
* All release tests passing bar genesis state parsing
* Update and test YamlConfig
* Update to spec v0.10 compatible BLS
* Updates to BLS EF tests
* Add EF test for AggregateVerify
And delete unused hash2curve tests for uncompressed points
* Update EF tests to v0.10.1
* Use optional block root correctly in block proc
* Use genesis fork in deposit domain. All tests pass
* Cargo fmt
* Fast aggregate verify test
* Update REST API docs
* Cargo fmt
* Fix unused import
* Bump spec tags to v0.10.1
* Add `seconds_per_eth1_block` to chainspec
* Update to timestamp based eth1 voting scheme
* Return None from `get_votes_to_consider` if block cache is empty
* Handle overflows in `is_candidate_block`
* Revert to failing tests
* Fix eth1 data sets test
* Choose default vote according to spec
* Fix collect_valid_votes tests
* Fix `get_votes_to_consider` to choose all eligible blocks
* Uncomment winning_vote tests
* Add comments; remove unused code
* Reduce seconds_per_eth1_block for simulation
* Addressed review comments
* Add test for default vote case
* Fix logs
* Remove unused functions
* Meter default eth1 votes
* Fix comments
* Address review comments; remove unused dependency
* Disable/delete two outdated tests
* Bump eth1 default vote warn to error
* Delete outdated eth1 test
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* Add LRU caches to store
* Improvements to LRU caches
* Take state by value in `Store::put_state`
* Store blocks by value, configurable cache sizes
* Use a StateBatch to efficiently store skip states
* Fix store tests
* Add CloneConfig test, remove unused metrics
* Use Mutexes instead of RwLocks for LRU caches
* Start adding interop genesis state to lcli
* Use more efficient method to generate genesis state
* Remove duplicate int_to_bytes32
* Add lcli command to change state genesis time
* Add option to allow VC to start with unsynced BN
* Set VC to do parallel key loading
* Don't default to dummy eth1 backend
* Add endpoint to dump operation pool
* Add metrics for op pool
* Remove state clone for slot notifier
* Add mem size approximation for tree hash cache
* Avoid cloning tree hash when getting head
* Fix failing API tests
* Address Michael's comments
* Add HashMap::from_par_iter
* Add TimeoutRwLock to BeaconChain
* Update network crate
* Update rest api
* Fix beacon chain tests
* Fix rest api tests
* Set test back to !debug_assertions
* Add basic block/state caching on beacon chain
* Adds checkpoint cache
* Stop storing the tree hash cache in the db
* Remove dedunant beacon state write
* Use caching get methods in fork choice
* Use caching state getting in state_by_slot
* Add state.cacheless_clone
* Attempt to improve attestation processing times
* Introduce HeadInfo struct
* Used cache tree hash for block processing
* Use cached tree hash for block production too
* Update to spec v0.9.0
* Update to v0.9.1
* Bump spec tags for v0.9.1
* Formatting, fix CI failures
* Resolve accidental KeyPair merge conflict
* Document new BeaconState functions
* Fix incorrect cache drops in `advance_caches`
* Update fork choice for v0.9.1
* Clean up some FIXMEs
* Fix a few docs/logs
* Change into_iter to iter
* Fix clippy 'easy' warnings
* Clippy eth2/utils
* Add struct NetworkInfo
* Clippy for types, utils, and beacon_node/store/src/iters.rs
* Cargo fmt
* Change foo to my_foo
* Remove complex signature
* suppress clippy warning for unit_value in benches
* Use enumerate instead of iterating over range
* Allow trivially_copy_pass_by_ref in serde_utils
* Renamed fork_choice::process_attestation_from_block
* Processing attestation in fork choice
* Retrieving state from store and checking signature
* Looser check on beacon state validity.
* Cleaned up get_attestation_state
* Expanded fork choice api to provide latest validator message.
* Checking if the an attestation contains a latest message
* Correct process_attestation error handling.
* Copy paste error in comment fixed.
* Tidy ancestor iterators
* Getting attestation slot via helper method
* Refactored attestation creation in test utils
* Revert "Refactored attestation creation in test utils"
This reverts commit 4d277fe4239a7194758b18fb5c00dfe0b8231306.
* Integration tests for free attestation processing
* Implicit conflicts resolved.
* formatting
* Do first pass on Grants code
* Add another attestation processing test
* Tidy attestation processing
* Remove old code fragment
* Add non-compiling half finished changes
* Simplify, fix bugs, add tests for chain iters
* Remove attestation processing from op pool
* Fix bug with fork choice, tidy
* Fix overly restrictive check in fork choice.
* Ensure committee cache is build during attn proc
* Ignore unknown blocks at fork choice
* Various minor fixes
* Make fork choice write lock in to read lock
* Remove unused method
* Tidy comments
* Fix attestation prod. target roots change
* Fix compile error in store iters
* Reject any attestation prior to finalization
* Begin metrics refactor
* Move beacon_chain to new metrics structure.
* Make metrics not panic if already defined
* Use global prometheus gather at rest api
* Unify common metric fns into a crate
* Add heavy metering to block processing
* Remove hypen from prometheus metric name
* Add more beacon chain metrics
* Add beacon chain persistence metric
* Prune op pool on finalization
* Add extra prom beacon chain metrics
* Prefix BeaconChain metrics with "beacon_"
* Add more store metrics
* Add basic metrics to libp2p
* Add metrics to HTTP server
* Remove old `http_server` crate
* Update metrics names to be more like standard
* Fix broken beacon chain metrics, add slot clock metrics
* Add lighthouse_metrics gather fn
* Remove http args
* Fix wrong state given to op pool prune
* Make prom metric names more consistent
* Add more metrics, tidy existing metrics
* Fix store block read metrics
* Tidy attestation metrics
* Fix minor PR comments
* Allow travis failures on beta (see desc)
There's a non-backward compatible change in `cargo fmt`. Stable and beta
do not agree.
* Tidy `lighthouse_metrics` docs
* Fix typo