From 4632e9ce52c90e12e94bfc60c8f77987a9904054 Mon Sep 17 00:00:00 2001
From: Michael Sproul
Date: Wed, 15 Jan 2020 15:36:12 +1100
Subject: [PATCH] Document the freezer DB space-time trade-off (#808)

---
 book/src/SUMMARY.md           |  2 ++
 book/src/advanced.md          |  9 ++++++
 book/src/advanced_database.md | 60 +++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)
 create mode 100644 book/src/advanced.md
 create mode 100644 book/src/advanced_database.md

diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 8253cd063..df027e1b0 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -16,5 +16,7 @@
         * [/consensus](./http_consensus.md)
         * [/network](./http_network.md)
     * [WebSocket](./websockets.md)
+* [Advanced Usage](./advanced.md)
+    * [Database Configuration](./advanced_database.md)
 * [Contributing](./contributing.md)
     * [Development Environment](./setup.md)
diff --git a/book/src/advanced.md b/book/src/advanced.md
new file mode 100644
index 000000000..d46cae699
--- /dev/null
+++ b/book/src/advanced.md
@@ -0,0 +1,9 @@
+# Advanced Usage
+
+Want to get into the nitty-gritty of Lighthouse configuration? Looking for something not covered
+elsewhere?
+
+This section provides detailed information about configuring Lighthouse for specific use cases, and
+tips about how things work under the hood.
+
+* [Advanced Database Configuration](./advanced_database.md): understanding space-time trade-offs in the database.
diff --git a/book/src/advanced_database.md b/book/src/advanced_database.md
new file mode 100644
index 000000000..076b66ba3
--- /dev/null
+++ b/book/src/advanced_database.md
@@ -0,0 +1,60 @@
+# Database Configuration
+
+Lighthouse uses an efficient "split" database schema, whereby finalized states are stored separately
+from recent, unfinalized states. We refer to the portion of the database storing finalized states as
+the _freezer_ or _cold DB_, and the portion storing recent states as the _hot DB_.
+
+In both the hot and cold DBs, full `BeaconState` data structures are only stored periodically, and
+intermediate states are reconstructed by quickly replaying blocks on top of the nearest state. For
+example, to fetch a state at slot 7 the database might fetch a full state from slot 0, and replay
+blocks from slots 1-7 while omitting redundant signature checks and Merkle root calculations. The
+full states upon which blocks are replayed are referred to as _restore points_ in the case of the
+freezer DB, and _epoch boundary states_ in the case of the hot DB.
+
+The frequency at which the hot database stores full `BeaconState`s is fixed to one-state-per-epoch
+in order to keep loads of recent states performant. For the freezer DB, the frequency is
+configurable via the `--slots-per-restore-point` CLI flag, which is the topic of the next section.
+
+## Freezer DB Space-time Trade-offs
+
+Frequent restore points use more disk space but accelerate the loading of historical states.
+Conversely, infrequent restore points use much less space, but cause the loading of historical
+states to slow down dramatically. A lower _slots per restore point_ value (SPRP) corresponds to more
+frequent restore points, while a higher SPRP corresponds to less frequent. The table below shows
+some example values.
+
+| Use Case                | SPRP           | Yearly Disk Usage | Load Historical State |
+| ----------------------- | -------------- | ----------------- | --------------------- |
+| Block explorer/analysis | 32             | 411 GB            | 96 ms                 |
+| Default                 | 2048           | 6.4 GB            | 6 s                   |
+| Validator only          | 8192           | 1.6 GB            | 25 s                  |
+
+As you can see, it's a high-stakes trade-off! Disk usage is inversely proportional to SPRP, while
+historical state load time is directly proportional to it: doubling SPRP halves disk usage and
+doubles load time. The minimum SPRP is 32, and the maximum is 8192.
+
+The values shown in the table are approximate, calculated using a simple heuristic: each
+`BeaconState` consumes around 5 MB of disk space, and each block replayed takes around 3 ms. The
+**Yearly Disk Usage** column shows the approximate size of the freezer DB _alone_ (hot DB not
+included), and the **Load Historical State** time is the worst-case load time for a state in the
+last slot before the next restore point.
+
+To configure your Lighthouse node's database with a non-default SPRP, run your Beacon Node with
+the `--slots-per-restore-point` flag:
+
+```bash
+lighthouse beacon_node --slots-per-restore-point 8192
+```
+
+## Glossary
+
+* _Freezer DB_: part of the database storing finalized states. States are stored in a sparser
+  format, and usually less frequently than in the hot DB.
+* _Cold DB_: see _Freezer DB_.
+* _Hot DB_: part of the database storing recent states, all blocks, and other runtime data. Full
+  states are stored every epoch.
+* _Restore Point_: a full `BeaconState` stored periodically in the freezer DB.
+* _Slots Per Restore Point (SPRP)_: the number of slots between restore points in the freezer DB.
+* _Split Slot_: the slot at which states are divided between the hot and the cold DBs. All states
+  from slots less than the split slot are in the freezer, while all states with slots greater than
+  or equal to the split slot are in the hot DB.
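
The heuristic stated in `advanced_database.md` (about 5 MB per stored `BeaconState` and about 3 ms per replayed block) can be sanity-checked with a few lines of Python. This is a minimal sketch, not code from Lighthouse: the function names are hypothetical, and the 12-second slot time is an assumption that the text does not state, chosen here because it reproduces the table's figures.

```python
# Rough check of the freezer DB table, using the heuristic from the text.
SECONDS_PER_SLOT = 12      # ASSUMPTION: not stated in the document
STATE_SIZE_MB = 5          # from the text: ~5 MB per full BeaconState
REPLAY_MS_PER_BLOCK = 3    # from the text: ~3 ms per replayed block

SLOTS_PER_YEAR = 365 * 24 * 60 * 60 // SECONDS_PER_SLOT


def yearly_disk_usage_gb(sprp: int) -> float:
    """Approximate freezer DB growth per year: one full state every `sprp` slots."""
    restore_points_per_year = SLOTS_PER_YEAR / sprp
    return restore_points_per_year * STATE_SIZE_MB / 1000


def worst_case_load_seconds(sprp: int) -> float:
    """Approximate worst-case load time: replay roughly `sprp` blocks."""
    return sprp * REPLAY_MS_PER_BLOCK / 1000


for sprp in (32, 2048, 8192):
    print(
        f"SPRP {sprp:>4}: "
        f"{yearly_disk_usage_gb(sprp):.1f} GB/year, "
        f"{worst_case_load_seconds(sprp):.3f} s worst-case load"
    )
```

Under these assumptions the printed figures agree with the table after rounding. They also show the trade-off directly: each doubling of SPRP halves the disk estimate and doubles the load-time estimate.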