Commit Graph

28 Commits

Author SHA1 Message Date
Michael Sproul
6f442f2bb8
Improve database compaction and prune-states (#5142)
* Fix no-op state prune check

* Compact freezer DB after pruning

* Refine DB compaction

* Add blobs-db options to inspect/compact

* Better key size

* Fix compaction end key
2024-02-08 10:05:08 +00:00
Jimmy Chen
39e9f7dc6b
Fix Rust beta compiler errors (1.77) (#5180)
* Lint fixes

* More fixes for beta compiler.

* Format fixes

* Move `#[allow(dead_code)]` to field level.

* Remove old comment.

* Update beacon_node/execution_layer/src/test_utils/mod.rs

Co-authored-by: João Oliveira <hello@jxs.pt>

* remove duplicate line
2024-02-05 17:54:11 +00:00
Michael Sproul
051c3e842f
Always use a separate database for blobs (#4892)
* Always use a separate blobs DB

* Add + update tests
2023-11-09 16:51:36 +11:00
Jimmy Chen
36d8849813 Add commmand for pruning states (#4835)
## Issue Addressed

Closes #4481. 

(Continuation of #4648)

## Proposed Changes

- [x] Add `lighthouse db prune-states`
- [x] Make it work
- [x] Ensure block roots are handled correctly (to be addressed in 4735)
- [x] Check perf on mainnet/Goerli/Gnosis (takes a few seconds max)
- [x] Run block root healing logic (#4875 ) at the beginning
- [x] Add some tests
- [x] Update docs
- [x] Add `--freezer` flag and other improvements to `lighthouse db inspect`

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-11-03 00:12:19 +00:00
Michael Sproul
c574f8136e Fix block backfill with genesis skip slots (#4820)
## Issue Addressed

Closes #4817.

## Proposed Changes

- Fill in the linear block roots array between 0 and the slot of the first block (e.g. slots 0 and 1 on Holesky).
- Backport the `--freezer`, `--skip` and `--limit` options for `lighthouse db inspect` from tree-states. This allows us to easily view the database corruption of 4817 using `lighthouse db inspect --network holesky --freezer --column bbr --output values --limit 2`.
- Backport the `iter_column_from` change and `MemoryStore` overhaul from tree-states. These are required to enable `lighthouse db inspect`.
- Rework `freezer_upper_limit` to allow state lookups for slots below the `state_lower_limit`. Currently state lookups will fail until state reconstruction completes entirely.

There is a new regression test for the main bug, but no test for the `freezer_upper_limit` fix because we don't currently support running state reconstruction partially (see #3026). This will be fixed once we merge `tree-states`! In lieu of an automated test, I've tested manually on a Holesky node while it was reconstructing.

## Additional Info

Users who backfilled Holesky to slot 0 (e.g. using `--reconstruct-historic-states`) need to either:

- Re-sync from genesis.
- Re-sync using checkpoint sync and the changes from this PR.

Due to the recency of the Holesky genesis, writing a custom pass to fix up broken databases (which would require its own thorough testing) was deemed unnecessary. This is the primary reason for this PR being marked `backwards-incompat`.

This will create few conflicts with Deneb, which I've already resolved on `tree-states-deneb` and will be happy to backport to Deneb once this PR is merged to unstable.
2023-10-27 05:08:49 +00:00
realbigsean
7d468cb487
More deneb cleanup (#4640)
* remove protoc and token from network tests github action

* delete unused beacon chain methods

* downgrade writing blobs to store log

* reduce diff in block import logic

* remove some todo's and deneb built in network

* remove unnecessary error, actually use some added metrics

* remove some metrics, fix missing components on publish funcitonality

* fix status tests

* rename sidecar by root to blobs by root

* clean up some metrics

* remove unnecessary feature gate from attestation subnet tests, clean up blobs by range response code

* pawan's suggestion in `protocol_info`, peer score in matching up batch sync block and blobs

* fix range tests for deneb

* pub block and blob db cache behind the same mutex

* remove unused errs and an empty file

* move sidecar trait to new file

* move types from payload to eth2 crate

* update comment and add flag value name

* make function private again, remove allow unused

* use reth rlp for tx decoding

* fix compile after merge

* rename kzg commitments

* cargo fmt

* remove unused dep

* Update beacon_node/execution_layer/src/lib.rs

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>

* Update beacon_node/beacon_processor/src/lib.rs

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>

* pawan's suggestiong for vec capacity

* cargo fmt

* Revert "use reth rlp for tx decoding"

This reverts commit 5181837d81c66dcca4c960a85989ac30c7f806e2.

* remove reth rlp

---------

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2023-08-20 21:17:17 -04:00
realbigsean
6adb68c17a
fix compile after merge 2023-06-02 12:10:01 -04:00
realbigsean
a227959298
Merge branch 'unstable' of https://github.com/sigp/lighthouse into deneb-free-blobs 2023-06-02 11:57:15 -04:00
Jimmy Chen
88abaaae05 Add db inspect --output values option to support dumping raw db values (#4324)
## Issue Addressed

Add a new `--output values` option to `db inspect` for dumping raw database values to SSZ files.

This could be useful for inspecting the database when we're unable to start the beacon node.

Example usage:

```
# Output the `ForkChoice` column to an SSZ file
lighthouse db inspect --column frk --output values
```

By default, it stores the output files in the current directory, and can be overriden with `--ouput-dir`.

List of columns can be found here:
c547a11b0d/beacon_node/store/src/lib.rs (L169-L216)
2023-05-30 01:38:48 +00:00
Emilia Hane
1300fb7ffa
Fix conflicts from rebasing eip4844 2023-02-09 10:37:11 +01:00
Emilia Hane
f971f3a3a2
Fix rebase conflicts 2023-02-09 07:41:38 +01:00
Emilia Hane
f8c3e7fc91
Lint fix 2023-02-09 07:36:11 +01:00
Emilia Hane
625980e484
Fix rebase conflicts 2023-02-09 07:36:07 +01:00
Emilia Hane
f9737628fc
Store blobs in separate freezer or historical state freezer 2023-02-09 07:34:59 +01:00
Emilia Hane
8f137df02e
fixup! Allow user to set an epoch margin for pruning 2023-02-08 11:44:43 +01:00
Emilia Hane
1812301c9c
Allow user to set an epoch margin for pruning 2023-02-08 11:44:40 +01:00
Emilia Hane
83a9520761
Clarify hybrid blob prune solution and fix error handling 2023-02-08 11:44:38 +01:00
Emilia Hane
7103a257ce
Simplify conceptual design 2023-02-08 11:44:37 +01:00
Emilia Hane
82ffec378a
Fix typo
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2023-02-08 11:44:32 +01:00
Emilia Hane
fe0c911402
Plug in pruning of blobs into app 2023-02-08 11:44:30 +01:00
Akihito Nakano
8a36acdb1a Super small improvement: Remove unnecessary mut (#3736)
## Issue Addressed

<!--Which issue # does this PR address?-->

Removed some unnecessary `mut`. 🙂 

<!--
## Proposed Changes

Please list or describe the changes introduced by this PR.
-->

<!--
## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
-->
2022-11-21 03:15:54 +00:00
Age Manning
230168deff Health Endpoints for UI (#3668)
This PR adds some health endpoints for the beacon node and the validator client.

Specifically it adds the endpoint:
`/lighthouse/ui/health`

These are not entirely stable yet. But provide a base for modification for our UI. 

These also may have issues with various platforms and may need modification.
2022-11-15 05:21:26 +00:00
Divma
8600645f65 Fix rust 1.65 lints (#3682)
## Issue Addressed

New lints for rust 1.65

## Proposed Changes

Notable change is the identification or parameters that are only used in recursion

## Additional Info
na
2022-11-04 07:43:43 +00:00
ethDreamer
e8604757a2 Deposit Cache Finalization & Fast WS Sync (#2915)
## Summary

The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.

This has three benefits:

1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.

## More Details

Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting `DepositTreeSnapshot`
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)

## Other Considerations

I've changed the structure of the `SszDepositCache` so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.

Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
2022-10-30 04:04:24 +00:00
Michael Sproul
ca42ef2e5a Prune finalized execution payloads (#3565)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/3556

## Proposed Changes

Delete finalized execution payloads from the database in two places:

1. When running the finalization migration in `migrate_database`. We delete the finalized payloads between the last split point and the new updated split point. _If_ payloads are already pruned prior to this then this is sufficient to prune _all_ payloads as non-canonical payloads are already deleted by the head pruner, and all canonical payloads prior to the previous split will already have been pruned.
2. To address the fact that users will update to this code _after_ the merge on mainnet (and testnets), we need a one-off scan to delete the finalized payloads from the canonical chain. This is implemented in `try_prune_execution_payloads` which runs on startup and scans the chain back to the Bellatrix fork or the anchor slot (if checkpoint synced after Bellatrix). In the case where payloads are already pruned this check only imposes a single state load for the split state, which shouldn't be _too slow_. Even so, a flag `--prepare-payloads-on-startup=false` is provided to turn this off after it has run the first time, which provides faster start-up times.

There is also a new `lighthouse db prune_payloads` subcommand for users who prefer to run the pruning manually.

## Additional Info

The tests have been updated to not rely on finalized payloads in the database, instead using the `MockExecutionLayer` to reconstruct them. Additionally a check was added to `check_chain_dump` which asserts the non-existence or existence of payloads on disk depending on their slot.
2022-09-17 02:27:01 +00:00
Paul Hauner
be4e261e74 Use async code when interacting with EL (#3244)
## Overview

This rather extensive PR achieves two primary goals:

1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.

Additionally, it achieves:

- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
    - I had to do this to deal with sending blocks into spawned tasks.
    - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
    - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
    - Avoids cloning *all the blocks* in *every chain segment* during sync.
    - It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.

For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273

## Changes to `canonical_head` and `fork_choice`

Previously, the `BeaconChain` had two separate fields:

```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```

Now, we have grouped these values under a single struct:

```
canonical_head: CanonicalHead {
  cached_head: RwLock<Arc<Snapshot>>,
  fork_choice: RwLock<BeaconForkChoice>
} 
```

Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.

## Breaking Changes

### The `state` (root) field in the `finalized_checkpoint` SSE event

Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:

1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.

Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).

I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.

## Notes for Reviewers

I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.

I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".

I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.

I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.

Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.

You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.

I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.

Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00
Michael Sproul
375e2b49b3 Conserve disk space by raising default SPRP (#3137)
## Proposed Changes

Increase the default `--slots-per-restore-point` to 8192 for a 4x reduction in freezer DB disk usage.

Existing nodes that use the previous default of 2048 will be left unchanged. Newly synced nodes (with or without checkpoint sync) will use the new 8192 default. 

Long-term we could do away with the freezer DB entirely for validator-only nodes, but this change is much simpler and grants us some extra space in the short term. We can also roll it out gradually across our nodes by purging databases one by one, while keeping the Ansible config the same.

## Additional Info

We ignore a change from 2048 to 8192 if the user hasn't set the 8192 explicitly. We fire a debug log in the case where we do ignore:

```
DEBG Ignoring slots-per-restore-point config in favour of on-disk value, on_disk: 2048, config: 8192
```
2022-04-01 07:16:25 +00:00
Michael Sproul
41e7a07c51 Add lighthouse db command (#3129)
## Proposed Changes

Add a `lighthouse db` command with three initial subcommands:

- `lighthouse db version`: print the database schema version.
- `lighthouse db migrate --to N`: manually upgrade (or downgrade!) the database to a different version.
- `lighthouse db inspect --column C`: log the key and size in bytes of every value in a given `DBColumn`.

This PR lays the groundwork for other changes, namely:

- Mark's fast-deposit sync (https://github.com/sigp/lighthouse/pull/2915), for which I think we should implement a database downgrade (from v9 to v8).
- My `tree-states` work, which already implements a downgrade (v10 to v8).
- Standalone purge commands like `lighthouse db purge-dht` per https://github.com/sigp/lighthouse/issues/2824.

## Additional Info

I updated the `strum` crate to 0.24.0, which necessitated some changes in the network code to remove calls to deprecated methods.

Thanks to @winksaville for the motivation, and implementation work that I used as a source of inspiration (https://github.com/sigp/lighthouse/pull/2685).
2022-04-01 00:58:59 +00:00