lighthouse/beacon_node
Michael Sproul 371c216ac3 Use read_recursive locks in database (#2417)
## Issue Addressed

Closes #2245

## Proposed Changes

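Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`. Unlike `read`, `read_recursive` does not block when a writer is waiting, so a thread that already holds a read lock can acquire it again without deadlocking.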

## Additional Info

* Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly that cannot compile Lighthouse (one of our deps uses `ptr::addr_of!`, which is too new). A fun side-project at some point might be to update the deadlock detector.
* The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us.
* I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp
* [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation.
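
The interleaving discussed above can be sketched with a toy, single-threaded model. This is not the `store` crate's code or `parking_lot`'s implementation: `FairRwLockModel` and all of its methods are hypothetical names, and the admission rules are an assumption based on the linked `read_recursive` docs (a plain `read` waits behind a queued writer, `read_recursive` does not). The model only records whether a request would be admitted; it never blocks.

```rust
/// Toy, single-threaded model of a fair (writer-preferring) read-write lock.
/// Hypothetical type for illustration only; it tracks admission, never blocks.
#[derive(Debug, Default)]
struct FairRwLockModel {
    readers: usize,         // read guards currently held
    writer_active: bool,    // a writer currently holds the lock
    writers_waiting: usize, // writers queued behind the current readers
}

impl FairRwLockModel {
    /// Models `read`: refused while a writer holds the lock or is waiting.
    fn try_read(&mut self) -> bool {
        if self.writer_active || self.writers_waiting > 0 {
            return false;
        }
        self.readers += 1;
        true
    }

    /// Models `read_recursive`: refused only while a writer holds the lock.
    fn try_read_recursive(&mut self) -> bool {
        if self.writer_active {
            return false;
        }
        self.readers += 1;
        true
    }

    /// Models `write`: granted if the lock is free, otherwise the writer queues.
    fn request_write(&mut self) -> bool {
        if self.readers == 0 && !self.writer_active {
            self.writer_active = true;
            true
        } else {
            self.writers_waiting += 1;
            false
        }
    }

    /// Drops one read guard; a queued writer takes over once no readers remain.
    fn release_read(&mut self) {
        self.readers -= 1;
        if self.readers == 0 && self.writers_waiting > 0 {
            self.writers_waiting -= 1;
            self.writer_active = true;
        }
    }
}

/// Replays the interleaving: thread A holds a read guard, thread B queues a
/// write, then A tries to read again. Returns whether A's nested read is admitted.
fn nested_read_admitted(use_recursive: bool) -> bool {
    let mut lock = FairRwLockModel::default();
    assert!(lock.try_read()); // thread A: outer read guard
    assert!(!lock.request_write()); // thread B: write must wait behind A
    if use_recursive {
        lock.try_read_recursive()
    } else {
        lock.try_read()
    }
}
```

In a real lock, the refused nested `read` would block thread A while it still holds the outer guard that thread B is waiting on, which is the deadlock. The nested `read_recursive` is admitted, at the cost that the queued writer keeps waiting until every reader has released, which is the starvation trade-off the docs warn about.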
2021-07-12 07:31:26 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| beacon_chain | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| client | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| eth1 | Capture a missed VC error (#2436) | 2021-07-09 03:20:24 +00:00 |
| eth2_libp2p | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| genesis | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| http_api | Adjust beacon node timeouts for validator client HTTP requests (#2352) | 2021-07-12 01:47:48 +00:00 |
| http_metrics | Tune GNU malloc (#2299) | 2021-05-28 05:59:45 +00:00 |
| network | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| operation_pool | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| src | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| store | Use read_recursive locks in database (#2417) | 2021-07-12 07:31:26 +00:00 |
| tests | Altair consensus changes and refactors (#2279) | 2021-07-09 06:15:32 +00:00 |
| timer | Update to tokio 1.1 (#2172) | 2021-02-10 23:29:49 +00:00 |
| websocket_server | Server sent events (#1920) | 2020-12-04 00:18:58 +00:00 |
| Cargo.toml | v1.4.0 (#2402) | 2021-06-10 01:44:49 +00:00 |