lighthouse/beacon_node/store/src/metadata.rs

use crate::{DBColumn, Error, StoreItem};
use ssz::{Decode, Encode};
use types::{Checkpoint, Hash256};

pub const CURRENT_SCHEMA_VERSION: SchemaVersion = SchemaVersion(2);

// All the keys that get stored under the `BeaconMeta` column.
//
// We use `repeat_byte` because it's a const fn.
pub const SCHEMA_VERSION_KEY: Hash256 = Hash256::repeat_byte(0);
pub const CONFIG_KEY: Hash256 = Hash256::repeat_byte(1);
pub const SPLIT_KEY: Hash256 = Hash256::repeat_byte(2);
pub const PRUNING_CHECKPOINT_KEY: Hash256 = Hash256::repeat_byte(3);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct SchemaVersion(pub u64);

impl SchemaVersion {
    pub fn as_u64(self) -> u64 {
        self.0
    }
}

impl StoreItem for SchemaVersion {
    fn db_column() -> DBColumn {
        DBColumn::BeaconMeta
    }

    fn as_store_bytes(&self) -> Vec<u8> {
        self.0.as_ssz_bytes()
    }

    fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error> {
        Ok(SchemaVersion(u64::from_ssz_bytes(bytes)?))
    }
}

/// The checkpoint used for pruning the database.
///
/// Updated whenever pruning is successful.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PruningCheckpoint {
    pub checkpoint: Checkpoint,
}

impl StoreItem for PruningCheckpoint {
    fn db_column() -> DBColumn {
        DBColumn::BeaconMeta
    }

    fn as_store_bytes(&self) -> Vec<u8> {
        self.checkpoint.as_ssz_bytes()
    }

    fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error> {
        Ok(PruningCheckpoint {
            checkpoint: Checkpoint::from_ssz_bytes(bytes)?,
        })
    }
}
Add database schema versioning (#1688) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673 2020-09-30 02:36:07 +00:00			`use crate::{DBColumn, Error, StoreItem};`
			`use ssz::{Decode, Encode};`
Compact database on finalization (#1871) ## Issue Addressed Closes #1866 ## Proposed Changes * Compact the database on finalization. This removes the deleted states from disk completely. Because it happens in the background migrator, it doesn't block other database operations while it runs. On my Medalla node it took about 1 minute and shrank the database from 90GB to 9GB. * Fix an inefficiency in the pruning algorithm where it would always use the genesis checkpoint as the `old_finalized_checkpoint` when running for the first time after start-up. This would result in loading lots of states one-at-a-time back to genesis, and storing a lot of block roots in memory. The new code stores the old finalized checkpoint on disk and only uses genesis if no checkpoint is already stored. This makes it both backwards compatible _and_ forwards compatible -- no schema change required! * Introduce two new `INFO` logs to indicate when pruning has started and completed. Users seem to want to know this information without enabling debug logs! 2020-11-09 07:02:21 +00:00			`use types::{Checkpoint, Hash256};`
Add database schema versioning (#1688) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673 2020-09-30 02:36:07 +00:00
Implement database temp states to reduce memory usage (#1798) ## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now. 2020-10-23 01:27:51 +00:00			`pub const CURRENT_SCHEMA_VERSION: SchemaVersion = SchemaVersion(2);`
Add database schema versioning (#1688) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673 2020-09-30 02:36:07 +00:00
			// All the keys that get stored under the `BeaconMeta` column.
			`//`
			// We use `repeat_byte` because it's a const fn.
			`pub const SCHEMA_VERSION_KEY: Hash256 = Hash256::repeat_byte(0);`
			`pub const CONFIG_KEY: Hash256 = Hash256::repeat_byte(1);`
			`pub const SPLIT_KEY: Hash256 = Hash256::repeat_byte(2);`
Compact database on finalization (#1871) ## Issue Addressed Closes #1866 ## Proposed Changes * Compact the database on finalization. This removes the deleted states from disk completely. Because it happens in the background migrator, it doesn't block other database operations while it runs. On my Medalla node it took about 1 minute and shrank the database from 90GB to 9GB. * Fix an inefficiency in the pruning algorithm where it would always use the genesis checkpoint as the `old_finalized_checkpoint` when running for the first time after start-up. This would result in loading lots of states one-at-a-time back to genesis, and storing a lot of block roots in memory. The new code stores the old finalized checkpoint on disk and only uses genesis if no checkpoint is already stored. This makes it both backwards compatible _and_ forwards compatible -- no schema change required! * Introduce two new `INFO` logs to indicate when pruning has started and completed. Users seem to want to know this information without enabling debug logs! 2020-11-09 07:02:21 +00:00			`pub const PRUNING_CHECKPOINT_KEY: Hash256 = Hash256::repeat_byte(3);`
Add database schema versioning (#1688) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673 2020-09-30 02:36:07 +00:00
			`#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]`
			`pub struct SchemaVersion(pub u64);`

Implement database temp states to reduce memory usage (#1798) ## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now. 2020-10-23 01:27:51 +00:00			`impl SchemaVersion {`
			`pub fn as_u64(self) -> u64 {`
			`self.0`
			`}`
			`}`

Add database schema versioning (#1688) ## Issue Addressed Closes #673 ## Proposed Changes Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work. The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673 2020-09-30 02:36:07 +00:00			`impl StoreItem for SchemaVersion {`
			`fn db_column() -> DBColumn {`
			`DBColumn::BeaconMeta`
			`}`

			`fn as_store_bytes(&self) -> Vec<u8> {`
			`self.0.as_ssz_bytes()`
			`}`

			`fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error> {`
			`Ok(SchemaVersion(u64::from_ssz_bytes(bytes)?))`
			`}`
			`}`
Compact database on finalization (#1871) ## Issue Addressed Closes #1866 ## Proposed Changes * Compact the database on finalization. This removes the deleted states from disk completely. Because it happens in the background migrator, it doesn't block other database operations while it runs. On my Medalla node it took about 1 minute and shrank the database from 90GB to 9GB. * Fix an inefficiency in the pruning algorithm where it would always use the genesis checkpoint as the `old_finalized_checkpoint` when running for the first time after start-up. This would result in loading lots of states one-at-a-time back to genesis, and storing a lot of block roots in memory. The new code stores the old finalized checkpoint on disk and only uses genesis if no checkpoint is already stored. This makes it both backwards compatible _and_ forwards compatible -- no schema change required! * Introduce two new `INFO` logs to indicate when pruning has started and completed. Users seem to want to know this information without enabling debug logs! 2020-11-09 07:02:21 +00:00
			`/// The checkpoint used for pruning the database.`
			`///`
			`/// Updated whenever pruning is successful.`
			`#[derive(Debug, Clone, Copy, PartialEq, Eq)]`
			`pub struct PruningCheckpoint {`
			`pub checkpoint: Checkpoint,`
			`}`

			`impl StoreItem for PruningCheckpoint {`
			`fn db_column() -> DBColumn {`
			`DBColumn::BeaconMeta`
			`}`

			`fn as_store_bytes(&self) -> Vec<u8> {`
			`self.checkpoint.as_ssz_bytes()`
			`}`

			`fn from_store_bytes(bytes: &[u8]) -> Result<Self, Error> {`
			`Ok(PruningCheckpoint {`
			`checkpoint: Checkpoint::from_ssz_bytes(bytes)?,`
			`})`
			`}`
			`}`