lighthouse/beacon_node/beacon_chain/src/schema_change/README.md

Database Schema Migrations
====

This document is an attempt to record some best practices and design conventions for applying
database schema migrations within Lighthouse.

## General Structure

If you make a breaking change to an on-disk data structure you need to increment the
`SCHEMA_VERSION` in `beacon_node/store/src/metadata.rs` and add a migration from the previous
version to the new version.

The entry-point for database migrations is in `schema_change.rs`, _not_ `migrate.rs` (which deals
with finalization). Supporting code for a specific migration may be added in
`schema_change/migration_schema_vX.rs`, where `X` is the version being migrated _to_.

## Combining Schema Changes

Schema changes may be combined if they are part of the same pull request to
`unstable`. Once a schema version is defined in `unstable` we should not apply changes to it
without incrementing the version. This prevents conflicts between versions that appear to be the
same. This allows us to deploy `unstable` to nodes without having to worry about needing to resync
because of a sneaky schema change.

Changing the on-disk structure for a version _before_ it is merged to `unstable` is OK. You will
just have to handle manually resyncing any test nodes (use checkpoint sync).

## Naming Conventions

Prefer to name versions of structs by _the version at which the change was introduced_. For example
if you add a field to `Foo` in v9, call the previous version `FooV1` (assuming this is `Foo`'s first
migration) and write a schema change that migrates from `FooV1` to `FooV9`.

Prefer to use explicit version names in `schema_change.rs` and the `schema_change` module. To
interface with the outside either:

1. Define a type alias to the latest version, e.g. `pub type Foo = FooV9`, or
2. Define a mapping from the latest version to the version used elsewhere, e.g.
   ```rust
   impl From<FooV9> for Foo {}
   ```

Avoid names like:

* `LegacyFoo`
* `OldFoo`
* `FooWithoutX`

## First-version vs Last-version

Previously the schema migration code would name types by the _last_ version at which they were
valid. For example if `Foo` changed in `V9` then we would name the two variants `FooV8` and `FooV9`.
The problem with this scheme is that if `Foo` changes again in the future at say v12 then `FooV9` would
need to be renamed to `FooV11`, which is annoying. Using the _first_ valid version as described
above does not have this issue.

## Using SuperStruct

If possible, consider using [`superstruct`](https://crates.io/crates/superstruct) to handle data
structure changes between versions.

* Use `superstruct(no_enum)` to avoid generating an unnecessary top-level enum.

## Example

A field is added to `Foo` in v9, and there are two variants: `FooV1` and `FooV9`. There is a
migration from `FooV1` to `FooV9`. `Foo` is aliased to `FooV9`.

Some time later another field is added to `Foo` in v12. A new `FooV12` is created, along with a
migration from `FooV9` to `FooV12`. The primary `Foo` type gets re-aliased to `FooV12`. The previous
migration from V1 to V9 shouldn't break because the schema migration refers to `FooV9` explicitly
rather than `Foo`. Due to the re-aliasing (or re-mapping) the compiler will check every usage
of `Foo` to make sure that it still makes sense with `FooV12`.
v1.1.6 Fork Choice changes (#2822) ## Issue Addressed Resolves: https://github.com/sigp/lighthouse/issues/2741 Includes: https://github.com/sigp/lighthouse/pull/2853 so that we can get ssz static tests passing here on v1.1.6. If we want to merge that first, we can make this diff slightly smaller ## Proposed Changes - Changes the `justified_epoch` and `finalized_epoch` in the `ProtoArrayNode` each to an `Option<Checkpoint>`. The `Option` is necessary only for the migration, so not ideal. But does allow us to add a default logic to `None` on these fields during the database migration. - Adds a database migration from a legacy fork choice struct to the new one, search for all necessary block roots in fork choice by iterating through blocks in the db. - updates related to https://github.com/ethereum/consensus-specs/pull/2727 - We will have to update the persisted forkchoice to make sure the justified checkpoint stored is correct according to the updated fork choice logic. This boils down to setting the forkchoice store's justified checkpoint to the justified checkpoint of the block that advanced the finalized checkpoint to the current one. - AFAICT there's no migration steps necessary for the update to allow applying attestations from prior blocks, but would appreciate confirmation on that - I updated the consensus spec tests to v1.1.6 here, but they will fail until we also implement the proposer score boost updates. I confirmed that the previously failing scenario `new_finalized_slot_is_justified_checkpoint_ancestor` will now pass after the boost updates, but haven't confirmed _all_ tests will pass because I just quickly stubbed out the proposer boost test scenario formatting. - This PR now also includes proposer boosting https://github.com/ethereum/consensus-specs/pull/2730 ## Additional Info I realized checking justified and finalized roots in fork choice makes it more likely that we trigger this bug: https://github.com/ethereum/consensus-specs/pull/2727 It's possible the combination of justified checkpoint and finalized checkpoint in the forkchoice store is different from in any block in fork choice. So when trying to startup our store's justified checkpoint seems invalid to the rest of fork choice (but it should be valid). When this happens we get an `InvalidBestNode` error and fail to start up. So I'm including that bugfix in this branch. Todo: - [x] Fix fork choice tests - [x] Self review - [x] Add fix for https://github.com/ethereum/consensus-specs/pull/2727 - [x] Rebase onto Kintusgi - [x] Fix `num_active_validators` calculation as @michaelsproul pointed out - [x] Clean up db migrations Co-authored-by: realbigsean <seananderson33@gmail.com> 2021-12-13 20:43:22 +00:00			`Database Schema Migrations`
			`====`

			`This document is an attempt to record some best practices and design conventions for applying`
			`database schema migrations within Lighthouse.`

			`## General Structure`

			`If you make a breaking change to an on-disk data structure you need to increment the`
			`SCHEMA_VERSION` in `beacon_node/store/src/metadata.rs` and add a migration from the previous
			`version to the new version.`

			The entry-point for database migrations is in `schema_change.rs`, _not_ `migrate.rs` (which deals
			`with finalization). Supporting code for a specific migration may be added in`
			`schema_change/migration_schema_vX.rs`, where `X` is the version being migrated _to_.

			`## Combining Schema Changes`

			`Schema changes may be combined if they are part of the same pull request to`
			`unstable`. Once a schema version is defined in `unstable` we should not apply changes to it
			`without incrementing the version. This prevents conflicts between versions that appear to be the`
			same. This allows us to deploy `unstable` to nodes without having to worry about needing to resync
			`because of a sneaky schema change.`

			Changing the on-disk structure for a version _before_ it is merged to `unstable` is OK. You will
			`just have to handle manually resyncing any test nodes (use checkpoint sync).`

			`## Naming Conventions`

			`Prefer to name versions of structs by _the version at which the change was introduced_. For example`
			if you add a field to `Foo` in v9, call the previous version `FooV1` (assuming this is `Foo`'s first
			migration) and write a schema change that migrates from `FooV1` to `FooV9`.

			Prefer to use explicit version names in `schema_change.rs` and the `schema_change` module. To
			`interface with the outside either:`

			1. Define a type alias to the latest version, e.g. `pub type Foo = FooV9`, or
			`2. Define a mapping from the latest version to the version used elsewhere, e.g.`
			```rust
			`impl From<FooV9> for Foo {}`
			```

			`Avoid names like:`

			* `LegacyFoo`
			* `OldFoo`
			* `FooWithoutX`

			`## First-version vs Last-version`

			`Previously the schema migration code would name types by the _last_ version at which they were`
			valid. For example if `Foo` changed in `V9` then we would name the two variants `FooV8` and `FooV9`.
			The problem with this scheme is that if `Foo` changes again in the future at say v12 then `FooV9` would
			need to be renamed to `FooV11`, which is annoying. Using the _first_ valid version as described
			`above does not have this issue.`

			`## Using SuperStruct`

			If possible, consider using [`superstruct`](https://crates.io/crates/superstruct) to handle data
			`structure changes between versions.`

			* Use `superstruct(no_enum)` to avoid generating an unnecessary top-level enum.

			`## Example`

			A field is added to `Foo` in v9, and there are two variants: `FooV1` and `FooV9`. There is a
			migration from `FooV1` to `FooV9`. `Foo` is aliased to `FooV9`.

			Some time later another field is added to `Foo` in v12. A new `FooV12` is created, along with a
			migration from `FooV9` to `FooV12`. The primary `Foo` type gets re-aliased to `FooV12`. The previous
			migration from V1 to V9 shouldn't break because the schema migration refers to `FooV9` explicitly
			rather than `Foo`. Due to the re-aliasing (or re-mapping) the compiler will check every usage
			of `Foo` to make sure that it still makes sense with `FooV12`.