lighthouse/beacon_node/beacon_chain/src
Michael Sproul acd49d988d Implement database temp states to reduce memory usage (#1798)
## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but it puts in place some infrastructure that we can use for future migrations (e.g. #1784).
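
As a rough illustration of the flow in the list above, here is a minimal in-memory sketch. It is not the Lighthouse implementation: `ToyStore`, `put_temporary_state`, `make_permanent` and `garbage_collect` are hypothetical names, and two hash collections stand in for the state column and the `DBColumn::BeaconStateTemporary` column.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical 32-byte state root; a stand-in for the real hash type.
type StateRoot = [u8; 32];

/// Toy in-memory store with two "columns": states and temporary markers.
#[derive(Default)]
struct ToyStore {
    states: HashMap<StateRoot, Vec<u8>>,
    // Models the temporary-marker column: the presence of the key is the
    // flag, and no value is stored alongside it.
    temporary: HashSet<StateRoot>,
}

impl ToyStore {
    /// Store an intermediate state and mark it temporary in one step.
    fn put_temporary_state(&mut self, root: StateRoot, bytes: Vec<u8>) {
        self.states.insert(root, bytes);
        self.temporary.insert(root);
    }

    /// Called once the block has been processed successfully.
    fn make_permanent(&mut self, root: &StateRoot) {
        self.temporary.remove(root);
    }

    /// States still flagged temporary are treated as absent by readers.
    fn get_state(&self, root: &StateRoot) -> Option<&Vec<u8>> {
        if self.temporary.contains(root) {
            None
        } else {
            self.states.get(root)
        }
    }

    /// Start-up garbage collection: delete every state still flagged
    /// temporary. Returns the number of states deleted.
    fn garbage_collect(&mut self) -> usize {
        let leftovers: Vec<StateRoot> = self.temporary.drain().collect();
        for root in &leftovers {
            self.states.remove(root);
        }
        leftovers.len()
    }
}
```

In this model, a state whose block never finished processing keeps its flag and is removed by the next `garbage_collect` call, which is the behaviour the start-up clean-up is meant to provide.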

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: This has been fixed by adding a lock around the relevant critical section.

Two threads are trying to store two different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4. One of two things happens:
    a) The process is killed, or thread 1's block fails verification, and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain!
    b) Thread 1 finishes and re-deletes the temporary flag. In this case the failure is transient: state `s` disappears temporarily, but comes back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know.
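
To make the lock-based fix concrete, here is one possible shape for it, sketched against a toy in-memory store. The description above only says that a lock was added around the relevant critical section, so everything here is an assumption for illustration: `LockedStore`, `put_temporary_if_absent` and the single-mutex layout are hypothetical. The point is that the existence check and the commit of `(s, s_temporary_flag)` happen under the same lock, so the interleaving in steps 1-3 cannot occur.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Hypothetical 32-byte state root.
type StateRoot = [u8; 32];

#[derive(Default)]
struct Columns {
    states: HashMap<StateRoot, Vec<u8>>,
    temporary: HashSet<StateRoot>,
}

/// Toy store whose columns sit behind a single mutex (an assumption made
/// for this sketch, not a description of the real store).
#[derive(Default)]
struct LockedStore {
    inner: Mutex<Columns>,
}

impl LockedStore {
    /// Check-then-commit under one lock.
    /// Returns true if this call stored the state.
    fn put_temporary_if_absent(&self, root: StateRoot, bytes: Vec<u8>) -> bool {
        let mut cols = self.inner.lock().unwrap();
        if cols.states.contains_key(&root) {
            // Another thread already stored `s`; leave its flag untouched.
            return false;
        }
        cols.states.insert(root, bytes);
        cols.temporary.insert(root);
        true
    }

    /// Delete the temporary flag after successful block processing.
    fn make_permanent(&self, root: &StateRoot) {
        self.inner.lock().unwrap().temporary.remove(root);
    }

    fn is_temporary(&self, root: &StateRoot) -> bool {
        self.inner.lock().unwrap().temporary.contains(root)
    }
}
```

With this shape, a late-arriving thread that finds `s` already present does not re-mark it temporary. The sketch covers only that one interleaving and says nothing about how the real critical section is scoped.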

This once again raises the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
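
For illustration only, here is how a snapshot-style read would close that window in a toy model. Nothing below uses the LevelDB API: `ToyHotDb`, `snapshot` and `load_hot_summary` are hypothetical, a cloned copy of the columns stands in for a read snapshot, and a `u64` slot stands in for a hot state summary.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Hypothetical 32-byte state root.
type StateRoot = [u8; 32];

#[derive(Clone, Default)]
struct Columns {
    // root -> slot, standing in for a hot state summary
    summaries: HashMap<StateRoot, u64>,
    temporary: HashSet<StateRoot>,
}

#[derive(Default)]
struct ToyHotDb {
    cols: RwLock<Columns>,
}

impl ToyHotDb {
    /// Stand-in for a read snapshot: a consistent copy of both columns.
    fn snapshot(&self) -> Columns {
        self.cols.read().unwrap().clone()
    }

    /// Store a summary together with its temporary flag.
    fn put_temporary(&self, root: StateRoot, slot: u64) {
        let mut cols = self.cols.write().unwrap();
        cols.summaries.insert(root, slot);
        cols.temporary.insert(root);
    }
}

/// Both lookups read from the same snapshot, so a state stored after the
/// snapshot was taken is invisible to the flag check and the summary load.
fn load_hot_summary(snap: &Columns, root: &StateRoot) -> Option<u64> {
    if snap.temporary.contains(root) {
        return None;
    }
    snap.summaries.get(root).copied()
}
```

Cloning whole columns is obviously not how a real database would do this; it only shows why reading the flag and the summary from one consistent view removes the race.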
2020-10-23 01:27:51 +00:00

| File | Last commit | Date |
| --- | --- | --- |
| attestation_verification.rs | Add check for head/target consistency (#1702) | 2020-10-03 10:08:06 +10:00 |
| beacon_chain.rs | Implement database temp states to reduce memory usage (#1798) | 2020-10-23 01:27:51 +00:00 |
| beacon_fork_choice_store.rs | Apply store refactor to new fork choice | 2020-06-17 15:20:44 +10:00 |
| beacon_snapshot.rs | Add no-copy block processing cache (#863) | 2020-04-06 10:53:33 +10:00 |
| block_verification.rs | Implement database temp states to reduce memory usage (#1798) | 2020-10-23 01:27:51 +00:00 |
| builder.rs | fix genesis state root provided to HTTP server (#1783) | 2020-10-21 23:15:30 +00:00 |
| chain_config.rs | Implement database temp states to reduce memory usage (#1798) | 2020-10-23 01:27:51 +00:00 |
| errors.rs | Weak subjectivity start from genesis (#1675) | 2020-10-03 10:00:28 +10:00 |
| eth1_chain.rs | Upgrade discovery and restructure task execution (#1693) | 2020-10-05 18:45:54 +11:00 |
| events.rs | Fix clippy warnings (#1385) | 2020-07-23 14:18:00 +00:00 |
| head_tracker.rs | Fix head tracker concurrency bugs (#1771) | 2020-10-19 05:58:39 +00:00 |
| lib.rs | Fix head tracker concurrency bugs (#1771) | 2020-10-19 05:58:39 +00:00 |
| metrics.rs | Fix head tracker concurrency bugs (#1771) | 2020-10-19 05:58:39 +00:00 |
| migrate.rs | Implement database temp states to reduce memory usage (#1798) | 2020-10-23 01:27:51 +00:00 |
| naive_aggregation_pool.rs | Implement standard eth2.0 API (#1569) | 2020-10-01 11:12:36 +10:00 |
| observed_attestations.rs | Move long-running tests to dbg (#1137) | 2020-05-13 10:55:02 +10:00 |
| observed_attesters.rs | Fix clippy warnings (#1385) | 2020-07-23 14:18:00 +00:00 |
| observed_block_producers.rs | Add attestation gossip pre-verification (#983) | 2020-05-06 21:42:56 +10:00 |
| observed_operations.rs | Process exits and slashings off the network (#1253) | 2020-06-18 21:06:34 +10:00 |
| persisted_beacon_chain.rs | Fix head tracker concurrency bugs (#1771) | 2020-10-19 05:58:39 +00:00 |
| persisted_fork_choice.rs | v0.12 fork choice update (#1229) | 2020-06-17 11:10:22 +10:00 |
| shuffling_cache.rs | Implement standard eth2.0 API (#1569) | 2020-10-01 11:12:36 +10:00 |
| snapshot_cache.rs | Support multiple BLS implementations (#1335) | 2020-07-25 02:03:18 +00:00 |
| test_utils.rs | Implement database temp states to reduce memory usage (#1798) | 2020-10-23 01:27:51 +00:00 |
| timeout_rw_lock.rs | Add timeouts to canonical head rwlock (#759) | 2020-01-06 17:30:37 +11:00 |
| validator_pubkey_cache.rs | Allow truncation of pubkey cache on creation (#1686) | 2020-09-30 04:42:52 +00:00 |