lighthouse/beacon_node/beacon_chain/src/schema_change
ethDreamer e8604757a2 Deposit Cache Finalization & Fast WS Sync (#2915)
## Summary

The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.

This has three benefits:

1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.

## More Details

The image below illustrates how the deposit contract Merkle tree evolves and finalizes, along with the resulting `DepositTreeSnapshot`:
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)

## Other Considerations

I've changed the structure of the `SszDepositCache`, so once you load and save your database with this version of Lighthouse, older versions will no longer be able to load it.

Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
2022-10-30 04:04:24 +00:00
migration_schema_v6.rs v1.1.6 Fork Choice changes (#2822) 2021-12-13 20:43:22 +00:00
migration_schema_v7.rs New rust lints for rustc 1.64.0 (#3602) 2022-09-23 03:52:46 +00:00
migration_schema_v8.rs Separate execution payloads in the DB (#3157) 2022-05-12 00:42:17 +00:00
migration_schema_v9.rs Separate execution payloads in the DB (#3157) 2022-05-12 00:42:17 +00:00
migration_schema_v10.rs Realized unrealized experimentation (#3322) 2022-07-25 23:53:26 +00:00
migration_schema_v11.rs Remove equivocating validators from fork choice (#3371) 2022-07-28 09:43:41 +00:00
migration_schema_v12.rs Refactor op pool for speed and correctness (#3312) 2022-08-29 09:10:26 +00:00
migration_schema_v13.rs Deposit Cache Finalization & Fast WS Sync (#2915) 2022-10-30 04:04:24 +00:00
README.md v1.1.6 Fork Choice changes (#2822) 2021-12-13 20:43:22 +00:00
types.rs Realized unrealized experimentation (#3322) 2022-07-25 23:53:26 +00:00

Database Schema Migrations

This document is an attempt to record some best practices and design conventions for applying database schema migrations within Lighthouse.

General Structure

If you make a breaking change to an on-disk data structure, you need to increment the SCHEMA_VERSION in beacon_node/store/src/metadata.rs and add a migration from the previous version to the new version.

The entry-point for database migrations is in schema_change.rs, not migrate.rs (which deals with finalization). Supporting code for a specific migration may be added in schema_change/migration_schema_vX.rs, where X is the version being migrated to.
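As a rough illustration of what such an entry point can look like, the sketch below dispatches on a (from, to) pair of schema versions and applies multi-version upgrades one step at a time. It is not the actual contents of schema_change.rs: the SchemaVersion wrapper, the migrate_schema signature, and the log of applied steps are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the schema version stored in the database.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SchemaVersion(pub u64);

/// Hypothetical entry point: migrate from `from` to `to`, recording each
/// applied step in `log` instead of touching a real database.
pub fn migrate_schema(
    from: SchemaVersion,
    to: SchemaVersion,
    log: &mut Vec<String>,
) -> Result<(), String> {
    match (from, to) {
        // Already at the requested version: nothing to do.
        (_, _) if from == to => Ok(()),
        // Upgrade across several versions by chaining single-step migrations.
        (_, _) if from.0 + 1 < to.0 => {
            let next = SchemaVersion(from.0 + 1);
            migrate_schema(from, next, log)?;
            migrate_schema(next, to, log)
        }
        // Single-step migrations. Real supporting code would live in
        // schema_change/migration_schema_vX.rs, where X is the target version.
        (SchemaVersion(11), SchemaVersion(12)) => {
            log.push("v11 -> v12".to_string());
            Ok(())
        }
        (SchemaVersion(12), SchemaVersion(13)) => {
            log.push("v12 -> v13".to_string());
            Ok(())
        }
        // Anything else (including downgrades) is rejected in this sketch.
        _ => Err(format!(
            "unsupported schema migration: {:?} -> {:?}",
            from, to
        )),
    }
}
```

Keeping each single-step arm explicit means that adding a new schema version only requires one new arm plus its supporting module.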

Combining Schema Changes

Schema changes may be combined if they are part of the same pull request to unstable. Once a schema version is defined in unstable we should not apply changes to it without incrementing the version. This prevents conflicts between versions that appear to be the same. This allows us to deploy unstable to nodes without having to worry about needing to resync because of a sneaky schema change.

Changing the on-disk structure for a version before it is merged to unstable is OK. You will just have to handle manually resyncing any test nodes (use checkpoint sync).

Naming Conventions

Prefer to name versions of structs by the version at which the change was introduced. For example, if you add a field to Foo in v9, call the previous version FooV1 (assuming this is Foo's first migration) and write a schema change that migrates from FooV1 to FooV9.

Prefer to use explicit version names in schema_change.rs and the schema_change module. To interface with code outside the module, either:

  1. Define a type alias to the latest version, e.g. pub type Foo = FooV9, or
  2. Define a mapping from the latest version to the version used elsewhere, e.g.
    impl From<FooV9> for Foo {}
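The two options can be sketched side by side as follows. The fields a and b and the module names alias_style and mapping_style are hypothetical; they exist only so that both options fit in one self-contained example.

```rust
/// Hypothetical on-disk layout of `Foo` introduced in schema v9.
#[derive(Clone, Debug, PartialEq)]
pub struct FooV9 {
    pub a: u64,
    pub b: u64,
}

/// Option 1: the type used elsewhere is simply an alias for the latest version.
pub mod alias_style {
    pub type Foo = super::FooV9;
}

/// Option 2: the type used elsewhere is distinct and is built from the latest
/// on-disk version through an explicit mapping.
pub mod mapping_style {
    use super::FooV9;

    pub struct Foo {
        pub a: u64,
        pub b: u64,
    }

    impl From<FooV9> for Foo {
        fn from(v: FooV9) -> Self {
            Foo { a: v.a, b: v.b }
        }
    }
}
```

With the alias, a FooV9 value is already a Foo. With the mapping, callers convert explicitly, for example with `let foo: mapping_style::Foo = on_disk.into();`.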
    

Avoid names like:

  • LegacyFoo
  • OldFoo
  • FooWithoutX

First-version vs Last-version

Previously, the schema migration code would name types by the last version at which they were valid. For example, if Foo changed in v9, then we would name the two variants FooV8 and FooV9. The problem with this scheme is that if Foo changes again in the future, say at v12, then FooV9 would need to be renamed to FooV11, which is annoying. Using the first valid version as described above does not have this issue.

Using SuperStruct

If possible, consider using superstruct to handle data structure changes between versions.

  • Use superstruct(no_enum) to avoid generating an unnecessary top-level enum.

Example

A field is added to Foo in v9, and there are two variants: FooV1 and FooV9. There is a migration from FooV1 to FooV9. Foo is aliased to FooV9.

Some time later, another field is added to Foo in v12. A new FooV12 is created, along with a migration from FooV9 to FooV12. The primary Foo type gets re-aliased to FooV12. The previous migration from FooV1 to FooV9 shouldn't break because the schema migration refers to FooV9 explicitly rather than Foo. Due to the re-aliasing (or re-mapping), the compiler will check every usage of Foo to make sure that it still makes sense with FooV12.
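A minimal sketch of this scenario, assuming hypothetical fields a, b and c, hypothetical default values for the new fields, and hypothetical function names upgrade_foo_to_v9 and upgrade_foo_to_v12:

```rust
/// Original layout of `Foo`, before its first migration.
#[derive(Clone, Debug, PartialEq)]
pub struct FooV1 {
    pub a: u64,
}

/// Layout introduced in schema v9 (adds the hypothetical field `b`).
#[derive(Clone, Debug, PartialEq)]
pub struct FooV9 {
    pub a: u64,
    pub b: u64,
}

/// Layout introduced in schema v12 (adds the hypothetical field `c`).
#[derive(Clone, Debug, PartialEq)]
pub struct FooV12 {
    pub a: u64,
    pub b: u64,
    pub c: bool,
}

/// The primary type is re-aliased to the newest version (previously `FooV9`).
pub type Foo = FooV12;

/// Written when v9 was introduced. It names `FooV9` explicitly, so
/// re-aliasing `Foo` to `FooV12` later does not change its meaning.
pub fn upgrade_foo_to_v9(old: FooV1) -> FooV9 {
    FooV9 { a: old.a, b: 0 }
}

/// Added together with v12.
pub fn upgrade_foo_to_v12(old: FooV9) -> FooV12 {
    FooV12 { a: old.a, b: old.b, c: false }
}
```

Had upgrade_foo_to_v9 returned Foo instead of FooV9, re-aliasing Foo to FooV12 would have caused a compile error in the old migration, which is exactly the breakage that explicit version names avoid.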