//! Storage functionality for Lighthouse.
//!
//! Provides the following stores:
//!
//! - `HotColdDB`: an on-disk store backed by LevelDB. Used in production.
//! - `MemoryStore`: an in-memory store backed by a hash-map. Used for testing.
//!
//! Provides a simple API for storing/retrieving all types; it sometimes requires type hints.
//! See the tests for implementation examples.
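//!
//! A minimal usage sketch (`MyItem` and `my_item` are illustrative placeholders, not part of
//! this crate; the `tests` module at the bottom of this file has a compiling equivalent):
//!
//! ```ignore
//! let store = MemoryStore::<MinimalEthSpec>::open();
//! let key = Hash256::random();
//!
//! // `put`/`get` come from the `ItemStore` trait.
//! store.put(&key, &my_item)?;
//! let retrieved: Option<MyItem> = store.get(&key)?;
//! ```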

#[macro_use]
extern crate lazy_static;

pub mod chunked_iter;
pub mod chunked_vector;
pub mod config;
pub mod errors;
mod forwards_iter;
mod garbage_collection;
pub mod hot_cold_store;
mod impls;
mod leveldb_store;
mod memory_store;
pub mod metadata;
pub mod metrics;
mod partial_beacon_state;

pub mod iter;

pub use self::config::StoreConfig;
pub use self::hot_cold_store::{BlockReplay, HotColdDB, HotStateSummary, Split};
pub use self::leveldb_store::LevelDB;
pub use self::memory_store::MemoryStore;
pub use self::partial_beacon_state::PartialBeaconState;
pub use errors::Error;
pub use impls::beacon_state::StorageContainer as BeaconStateStorageContainer;
pub use metrics::scrape_for_metrics;
use parking_lot::MutexGuard;
pub use types::*;

pub trait KeyValueStore<E: EthSpec>: Sync + Send + Sized + 'static {
    /// Retrieve some bytes in `column` with `key`.
    fn get_bytes(&self, column: &str, key: &[u8]) -> Result<Option<Vec<u8>>, Error>;

    /// Store some `value` in `column`, indexed with `key`.
    fn put_bytes(&self, column: &str, key: &[u8], value: &[u8]) -> Result<(), Error>;

    /// Same as `put_bytes()` but also forces a flush to disk.
    fn put_bytes_sync(&self, column: &str, key: &[u8], value: &[u8]) -> Result<(), Error>;

    /// Flush to disk. See
    /// <https://chromium.googlesource.com/external/leveldb/+/HEAD/doc/index.md#synchronous-writes>
    /// for details.
    fn sync(&self) -> Result<(), Error>;

    /// Return `true` if `key` exists in `column`.
    fn key_exists(&self, column: &str, key: &[u8]) -> Result<bool, Error>;

    /// Removes `key` from `column`.
    fn key_delete(&self, column: &str, key: &[u8]) -> Result<(), Error>;

    /// Execute either all of the operations in `batch` or none at all, returning an error.
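    ///
    /// A sketch of a batch write, assuming `store` implements this trait (the column, keys
    /// and values are illustrative only):
    ///
    /// ```ignore
    /// let ops = vec![
    ///     KeyValueStoreOp::PutKeyValue(
    ///         get_key_for_col(DBColumn::BeaconMeta.as_str(), b"key"),
    ///         b"value".to_vec(),
    ///     ),
    ///     KeyValueStoreOp::DeleteKey(get_key_for_col(DBColumn::BeaconMeta.as_str(), b"stale")),
    /// ];
    /// store.do_atomically(ops)?;
    /// ```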
    fn do_atomically(&self, batch: Vec<KeyValueStoreOp>) -> Result<(), Error>;

    /// Return a mutex guard that can be used to synchronize sensitive transactions.
    ///
    /// This doesn't prevent other threads writing to the DB unless they also use
    /// this method. In future we may implement a safer mandatory locking scheme.
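    ///
    /// A sketch of the intended check-then-write pattern, assuming `store` implements this
    /// trait (`column`, `key` and `batch` are placeholders):
    ///
    /// ```ignore
    /// let _guard = store.begin_rw_transaction();
    /// if !store.key_exists(column, key)? {
    ///     store.do_atomically(batch)?;
    /// }
    /// // The lock is released when `_guard` is dropped.
    /// ```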
    fn begin_rw_transaction(&self) -> MutexGuard<()>;

    /// Compact the database, freeing space used by deleted items.
    fn compact(&self) -> Result<(), Error>;
}

/// Build a compound database key by prefixing `key` with the column's identifier bytes.
pub fn get_key_for_col(column: &str, key: &[u8]) -> Vec<u8> {
    let mut result = column.as_bytes().to_vec();
    result.extend_from_slice(key);
    result
}

/// A raw key-value write operation, applied atomically as part of a batch.
pub enum KeyValueStoreOp {
    PutKeyValue(Vec<u8>, Vec<u8>),
    DeleteKey(Vec<u8>),
}

/// An item-oriented store that serializes `StoreItem`s to bytes and delegates to a
/// `KeyValueStore`.
pub trait ItemStore<E: EthSpec>: KeyValueStore<E> + Sync + Send + Sized + 'static {
    /// Store an item in `Self`.
    fn put<I: StoreItem>(&self, key: &Hash256, item: &I) -> Result<(), Error> {
        let column = I::db_column().into();
        let key = key.as_bytes();

        self.put_bytes(column, key, &item.as_store_bytes())
            .map_err(Into::into)
    }

    /// Store an item in `Self`, forcing a flush to disk (see `put_bytes_sync`).
    fn put_sync<I: StoreItem>(&self, key: &Hash256, item: &I) -> Result<(), Error> {
        let column = I::db_column().into();
        let key = key.as_bytes();

        self.put_bytes_sync(column, key, &item.as_store_bytes())
            .map_err(Into::into)
    }

    /// Retrieve an item from `Self`.
    fn get<I: StoreItem>(&self, key: &Hash256) -> Result<Option<I>, Error> {
        let column = I::db_column().into();
        let key = key.as_bytes();

        match self.get_bytes(column, key)? {
            Some(bytes) => Ok(Some(I::from_store_bytes(&bytes[..])?)),
            None => Ok(None),
        }
    }

    /// Returns `true` if the given key represents an item in `Self`.
    fn exists<I: StoreItem>(&self, key: &Hash256) -> Result<bool, Error> {
        let column = I::db_column().into();
        let key = key.as_bytes();

        self.key_exists(column, key)
    }

    /// Remove an item from `Self`.
    fn delete<I: StoreItem>(&self, key: &Hash256) -> Result<(), Error> {
        let column = I::db_column().into();
        let key = key.as_bytes();

        self.key_delete(column, key)
    }
}

/// Reified key-value storage operation. Helps in modifying the storage atomically.
/// See also <https://github.com/sigp/lighthouse/issues/692>.
pub enum StoreOp<'a, E: EthSpec> {
    PutBlock(Hash256, Box<SignedBeaconBlock<E>>),
    PutState(Hash256, &'a BeaconState<E>),
    PutStateSummary(Hash256, HotStateSummary),
    PutStateTemporaryFlag(Hash256),
    DeleteStateTemporaryFlag(Hash256),
    DeleteBlock(Hash256),
    DeleteState(Hash256, Option<Slot>),
}

/// A unique column identifier.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DBColumn {
    /// For data related to the database itself.
    BeaconMeta,
    BeaconBlock,
    BeaconState,
    /// For persisting in-memory state to the database.
    BeaconChain,
    OpPool,
    Eth1Cache,
    ForkChoice,
    PubkeyCache,
    /// For the table mapping restore point numbers to state roots.
    BeaconRestorePoint,
    /// For the mapping from state roots to their slots or summaries.
    BeaconStateSummary,
    /// For the list of temporary states stored during block import,
    /// and then made non-temporary by the deletion of their state root from this column.
    BeaconStateTemporary,
    BeaconBlockRoots,
    BeaconStateRoots,
    BeaconHistoricalRoots,
    BeaconRandaoMixes,
    DhtEnrs,
}

impl Into<&'static str> for DBColumn {
    /// Returns a `&str` prefix to be added to keys before they hit the key-value database.
    fn into(self) -> &'static str {
        match self {
            DBColumn::BeaconMeta => "bma",
            DBColumn::BeaconBlock => "blk",
            DBColumn::BeaconState => "ste",
            DBColumn::BeaconChain => "bch",
            DBColumn::OpPool => "opo",
            DBColumn::Eth1Cache => "etc",
            DBColumn::ForkChoice => "frk",
            DBColumn::PubkeyCache => "pkc",
            DBColumn::BeaconRestorePoint => "brp",
            DBColumn::BeaconStateSummary => "bss",
            DBColumn::BeaconStateTemporary => "bst",
            DBColumn::BeaconBlockRoots => "bbr",
            DBColumn::BeaconStateRoots => "bsr",
            DBColumn::BeaconHistoricalRoots => "bhr",
            DBColumn::BeaconRandaoMixes => "brm",
            DBColumn::DhtEnrs => "dht",
        }
    }
}

impl DBColumn {
    pub fn as_str(self) -> &'static str {
        self.into()
    }

    pub fn as_bytes(self) -> &'static [u8] {
        self.as_str().as_bytes()
    }
}

/// An item that may be stored in a `Store` by serializing and deserializing from bytes.
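///
/// A minimal sketch of an implementation (`MyItem` and its SSZ encoding are illustrative;
/// the `tests` module below contains a compiling equivalent):
///
/// ```ignore
/// impl StoreItem for MyItem {
///     fn db_column() -> DBColumn {
///         DBColumn::BeaconMeta
///     }
///
///     fn as_store_bytes(&self) -> Vec<u8> {
///         self.as_ssz_bytes()
///     }
///
///     fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error> {
///         Self::from_ssz_bytes(bytes).map_err(Into::into)
///     }
/// }
/// ```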
pub trait StoreItem: Sized {
    /// Identifies which column this item should be placed in.
    fn db_column() -> DBColumn;

    /// Serialize `self` as bytes.
    fn as_store_bytes(&self) -> Vec<u8>;

    /// De-serialize `self` from bytes.
    ///
    /// Return an instance of the type.
    fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error>;

    /// Convert `self` into a `KeyValueStoreOp` that writes it under `key` in its column.
    fn as_kv_store_op(&self, key: Hash256) -> KeyValueStoreOp {
        let db_key = get_key_for_col(Self::db_column().into(), key.as_bytes());
        KeyValueStoreOp::PutKeyValue(db_key, self.as_store_bytes())
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use ssz::{Decode, Encode};
    use ssz_derive::{Decode, Encode};
    use tempfile::tempdir;

    #[derive(PartialEq, Debug, Encode, Decode)]
    struct StorableThing {
        a: u64,
        b: u64,
    }

    impl StoreItem for StorableThing {
        fn db_column() -> DBColumn {
            DBColumn::BeaconBlock
        }

        fn as_store_bytes(&self) -> Vec<u8> {
            self.as_ssz_bytes()
        }

        fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error> {
            Self::from_ssz_bytes(bytes).map_err(Into::into)
        }
    }

    fn test_impl(store: impl ItemStore<MinimalEthSpec>) {
        let key = Hash256::random();
        let item = StorableThing { a: 1, b: 42 };

        assert!(!store.exists::<StorableThing>(&key).unwrap());

        store.put(&key, &item).unwrap();

        assert!(store.exists::<StorableThing>(&key).unwrap());

        let retrieved = store.get(&key).unwrap().unwrap();
        assert_eq!(item, retrieved);

        store.delete::<StorableThing>(&key).unwrap();

        assert!(!store.exists::<StorableThing>(&key).unwrap());

        assert_eq!(store.get::<StorableThing>(&key).unwrap(), None);
    }

    #[test]
    fn simplediskdb() {
        let dir = tempdir().unwrap();
        let path = dir.path();
        let store = LevelDB::open(&path).unwrap();

        test_impl(store);
    }

    #[test]
    fn memorydb() {
        let store = MemoryStore::open();

        test_impl(store);
    }

    #[test]
    fn exists() {
        let store = MemoryStore::<MinimalEthSpec>::open();
        let key = Hash256::random();
        let item = StorableThing { a: 1, b: 42 };

        assert!(!store.exists::<StorableThing>(&key).unwrap());

        store.put(&key, &item).unwrap();

        assert!(store.exists::<StorableThing>(&key).unwrap());

        store.delete::<StorableThing>(&key).unwrap();

        assert!(!store.exists::<StorableThing>(&key).unwrap());
    }
}