
75 Commits

Author SHA1 Message Date
Michael Sproul
a60ab4eff2 Refine compaction (#1916)
## Proposed Changes

In an attempt to fix OOM issues and database consistency issues observed by some users after the introduction of compaction in v0.3.4, this PR makes the following changes:

* Run compaction less often: roughly every 1024 epochs, including after long periods of non-finality. I think the division check proposed by Paul is pretty solid, and ensures we don't miss any events where we should be compacting. LevelDB lacks an easy way to check the size of the DB, which would be another good trigger.
* Make it possible to disable the compaction on finalization using `--auto-compact-db=false`
* Make it possible to trigger a manual, single-threaded foreground compaction on start-up using `--compact-db`
* Downgrade the pruning log to `DEBUG`, as it's particularly noisy during sync
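The division check in the first bullet can be sketched as follows. This is a minimal sketch under stated assumptions: `should_compact` and `COMPACTION_PERIOD_EPOCHS` are hypothetical names, and the `auto_compact_db` parameter only stands in for the `--auto-compact-db` flag; it is not Lighthouse's actual code.

```rust
/// Hypothetical compaction period, matching "roughly every 1024 epochs".
const COMPACTION_PERIOD_EPOCHS: u64 = 1024;

/// Decide whether to compact after finalization advances from
/// `old_finalized_epoch` to `new_finalized_epoch`.
///
/// Comparing the two quotients, rather than testing `new % period == 0`,
/// means a large jump in finality (e.g. after a long period of
/// non-finality) still triggers compaction even if it skips over the
/// exact multiple of the period.
fn should_compact(auto_compact_db: bool, old_finalized_epoch: u64, new_finalized_epoch: u64) -> bool {
    auto_compact_db
        && old_finalized_epoch / COMPACTION_PERIOD_EPOCHS
            != new_finalized_epoch / COMPACTION_PERIOD_EPOCHS
}
```

For example, finality moving from epoch 1000 to 5000 crosses several multiples of 1024 at once and still compacts exactly once.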

I would like to ship these changes to affected users ASAP, and will document them further in the Advanced Database section of the book if they prove effective.
2020-11-17 09:10:53 +00:00
Michael Sproul
556190ff46 Compact database on finalization (#1871)
## Issue Addressed

Closes #1866

## Proposed Changes

* Compact the database on finalization. This removes the deleted states from disk completely. Because it happens in the background migrator, it doesn't block other database operations while it runs. On my Medalla node it took about 1 minute and shrank the database from 90GB to 9GB.
* Fix an inefficiency in the pruning algorithm where it would always use the genesis checkpoint as the `old_finalized_checkpoint` when running for the first time after start-up. This would result in loading lots of states one-at-a-time back to genesis, and storing a lot of block roots in memory. The new code stores the old finalized checkpoint on disk and only uses genesis if no checkpoint is already stored. This makes it both backwards compatible _and_ forwards compatible -- no schema change required!
* Introduce two new `INFO` logs to indicate when pruning has started and completed. Users seem to want to know this information without enabling debug logs!
2020-11-09 07:02:21 +00:00
Michael Sproul
acd49d988d Implement database temp states to reduce memory usage (#1798)
## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)
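The lifecycle described in these bullets can be modelled in memory as below. This is a sketch, not Lighthouse's implementation: `HotDb`, its methods and the `u64` state roots are hypothetical, and the `temporary` set stands in for the `DBColumn::BeaconStateTemporary` column, where the presence of a key is the marker.

```rust
use std::collections::{HashMap, HashSet};

type StateRoot = u64;

/// Hypothetical in-memory stand-in for the hot database.
#[derive(Default)]
struct HotDb {
    states: HashMap<StateRoot, Vec<u8>>,
    // A root's presence here is its "temporary" marker.
    temporary: HashSet<StateRoot>,
}

impl HotDb {
    /// Store an intermediate state together with its temporary marker.
    /// (In a real DB these two writes would go in one atomic batch.)
    fn put_temporary_state(&mut self, root: StateRoot, state: Vec<u8>) {
        self.states.insert(root, state);
        self.temporary.insert(root);
    }

    /// Called once the block is processed successfully: clear the markers
    /// so the states become permanent.
    fn confirm_states(&mut self, roots: &[StateRoot]) {
        for root in roots {
            self.temporary.remove(root);
        }
    }

    /// Start-up garbage collection: delete every state still marked
    /// temporary, returning how many were deleted.
    fn garbage_collect(&mut self) -> usize {
        let leftovers: Vec<StateRoot> = self.temporary.drain().collect();
        for root in &leftovers {
            self.states.remove(root);
        }
        leftovers.len()
    }

    /// Readers treat temporary states as absent.
    fn get_state(&self, root: StateRoot) -> Option<&Vec<u8>> {
        if self.temporary.contains(&root) {
            None
        } else {
            self.states.get(&root)
        }
    }
}
```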

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section.

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4.
    a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens...
    b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient: state `s` will disappear temporarily, but will come back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know.

This once again raises the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).
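A minimal sketch of the locking fix, assuming the critical section is Thread 1's "check whether `s` exists, then commit `(s, s_temporary_flag)`" step. `LockedHotDb` and its methods are hypothetical, and an in-memory map stands in for LevelDB. Because the existence check and the write happen under one lock, steps 1-3 can no longer interleave: whichever thread locks first stores `s`, and the other sees it and writes nothing, so a later flag deletion is never undone.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

type StateRoot = u64;

/// Hypothetical hot DB where one mutex guards both the states and the
/// temporary markers.
#[derive(Default)]
struct LockedHotDb {
    inner: Mutex<(HashMap<StateRoot, Vec<u8>>, HashSet<StateRoot>)>,
}

impl LockedHotDb {
    /// Check-then-write as a single critical section.
    fn store_intermediate_state(&self, root: StateRoot, state: Vec<u8>) {
        let mut guard = self.inner.lock().unwrap();
        let (states, temporary) = &mut *guard;
        if !states.contains_key(&root) {
            states.insert(root, state);
            temporary.insert(root);
        }
    }

    /// Clear the temporary marker after the block is processed.
    fn confirm_state(&self, root: StateRoot) {
        self.inner.lock().unwrap().1.remove(&root);
    }

    fn is_temporary(&self, root: StateRoot) -> bool {
        self.inner.lock().unwrap().1.contains(&root)
    }
}
```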

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
2020-10-23 01:27:51 +00:00
Michael Sproul
703c33bdc7 Fix head tracker concurrency bugs (#1771)
## Issue Addressed

Closes #1557

## Proposed Changes

Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations).
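The check-then-mutate step can be sketched as below, assuming the head tracker maps a head's block root to its slot (a hypothetical layout) and that the caller holds the tracker's write lock from this check until the database transaction is committed. The function name is illustrative.

```rust
use std::collections::HashMap;

type Root = u64;
type Slot = u64;

/// Remove `heads` from the tracker only if every one of them is still
/// present with the expected slot. Returns false, leaving the tracker
/// untouched, if a concurrent mutation got there first; the caller should
/// then defer pruning instead of committing its database transaction.
fn remove_heads_if_unchanged(tracker: &mut HashMap<Root, Slot>, heads: &[(Root, Slot)]) -> bool {
    let all_present = heads.iter().all(|(root, slot)| tracker.get(root) == Some(slot));
    if !all_present {
        return false;
    }
    for (root, _) in heads {
        tracker.remove(root);
    }
    true
}
```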

In the process of writing and testing this I also had to make a few other changes:

* Use interior mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below).
* Disable logging in harness tests unless the `test_logger` feature is turned on

And chose to make some clean-ups:

* Delete the `NullMigrator`
* Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code)
* Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run.
* Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness`

## Testing

To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here:

https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557

That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with:

```
$ cd beacon_node/beacon_chain
$ cargo test --release --test parallel_tests --features test_logger
```

It should pass, and the log output should show:

```
WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely!
```

## Additional Info

This is a backwards-compatible change with no impact on consensus.
2020-10-19 05:58:39 +00:00
Michael Sproul
22aedda1be Add database schema versioning (#1688)
## Issue Addressed

Closes #673

## Proposed Changes

Store a schema version in the database so that future releases can check they're running against a compatible database version. This would also enable automatic migration on breaking database changes, but that's left as future work.

The database config is also stored in the database so that the `slots_per_restore_point` value can be checked for consistency, which closes #673
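A sketch of the start-up checks described here, with hypothetical names. Treating a missing version as a fresh database, and refusing databases written by newer releases, are assumptions made for illustration.

```rust
/// Outcome of comparing the schema version on disk with the one this
/// release expects.
#[derive(Debug, PartialEq)]
enum SchemaCheck {
    /// No version stored (assumed here to mean a fresh database): write `current`.
    Initialize,
    UpToDate,
    /// Older database: this is where an automatic migration could run.
    NeedsMigration { from: u64, to: u64 },
    /// Database written by a newer release: refuse to open it.
    TooNew { found: u64 },
}

fn check_schema_version(stored: Option<u64>, current: u64) -> SchemaCheck {
    match stored {
        None => SchemaCheck::Initialize,
        Some(v) if v == current => SchemaCheck::UpToDate,
        Some(v) if v < current => SchemaCheck::NeedsMigration { from: v, to: current },
        Some(v) => SchemaCheck::TooNew { found: v },
    }
}

/// The stored `slots_per_restore_point` must match the configured value,
/// since restore points on disk were laid out using the stored one.
fn check_slots_per_restore_point(stored: Option<u64>, configured: u64) -> Result<(), String> {
    match stored {
        Some(s) if s != configured => Err(format!(
            "database uses slots_per_restore_point={}, but {} was configured",
            s, configured
        )),
        _ => Ok(()),
    }
}
```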
2020-10-01 11:12:36 +10:00
Adam Szkoda
d9f4819fe0 Alternative (to BeaconChainHarness) BeaconChain testing API (#1380)
The PR:

* Adds the ability to generate a crucial test scenario that isn't possible with `BeaconChainHarness` (i.e. two blocks occupying the same slot; previously forks necessitated skipping slots):

![image](https://user-images.githubusercontent.com/165678/88195404-4bce3580-cc40-11ea-8c08-b48d2e1d5959.png)

* New testing API: instead of repeatedly calling `add_block()`, you generate a sorted `Vec<Slot>` and leave it up to the framework to generate blocks at those slots.
* Jumping backwards to an earlier epoch is a hard error, so tests necessarily generate blocks in an epoch-by-epoch manner.
* Configures the test logger so that output is printed on the console in case a test fails. The logger also plays well with `--nocapture`, unlike the existing testing framework
* Rewrites existing fork pruning tests to use the new API
* Adds a test that triggers finalization at a non-epoch-boundary slot
* Renamed `BeaconChainYoke` to `BeaconChainTestingRig` because the former has been too confusing
* Fixed multiple tests (e.g. `block_production_different_shuffling_long`, `delete_blocks_and_states`, `shuffling_compatible_simple_fork`) that relied on a weird (and accidental) feature of the old `BeaconChainHarness` that attestations aren't produced for epochs earlier than the current one, thus masking potential bugs in test cases.
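One way the rig could enforce the slot rules above, as a sketch with a hypothetical function name. How the real `BeaconChainTestingRig` tracks its current slot is an assumption here. Repeated slots are allowed, since two blocks in the same slot is exactly the fork scenario the new API adds.

```rust
/// Hypothetical validation of a requested slot list: it must be sorted
/// (non-decreasing) and must not start in an epoch earlier than the
/// chain's current slot.
fn validate_slots(current_slot: u64, slots: &[u64], slots_per_epoch: u64) -> Result<(), String> {
    if slots.windows(2).any(|w| w[1] < w[0]) {
        return Err("slots must be sorted in non-decreasing order".to_string());
    }
    if let Some(&first) = slots.first() {
        if first / slots_per_epoch < current_slot / slots_per_epoch {
            return Err(format!(
                "slot {} is in an earlier epoch than current slot {}",
                first, current_slot
            ));
        }
    }
    Ok(())
}
```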

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-08-26 09:24:55 +00:00
Michael Sproul
4763f03dcc Fix bug in database pruning (#1564)
## Issue Addressed

Closes #1488

## Proposed Changes

* Prevent the pruning algorithm from over-eagerly deleting states at skipped slots when they are shared with the canonical chain.
* Add `debug` logging to the pruning algorithm so we have a better chance of debugging future issues from logs.
* Modify the handling of the "finalized state" in the beacon chain, so that it's always the state at the first slot of the finalized epoch (previously it was the state at the finalized block). This gives database pruning a clearer and cleaner view of things, and will marginally impact the pruning of the op pool, observed proposers, etc (in ways that are safe as far as I can tell).
* Remove duplicated `RevertedFinalizedEpoch` check from `after_finalization`
* Delete useless and unused `max_finality_distance`
* Add tests that exercise pruning with shared states at skip slots
* Delete unnecessary `block_strategy` argument from `add_blocks` and friends in the test harness (will likely conflict with #1380 slightly, sorry @adaszko -- but we can fix that)
* Bonus: add a `BeaconChain::with_head` method. I didn't end up needing it, but it turned out quite nice, so I figured we could keep it?
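A small worked example of the new "finalized state" definition, with hypothetical helper names: with 32 slots per epoch and finality at epoch 2, the finalized state is the state at slot 64. If the finalized block itself sits at slot 62 because slots 63 and 64 were skipped, that state is the block's post-state advanced through two empty slots.

```rust
/// Slot of the finalized state under the new rule: the first slot of the
/// finalized epoch, rather than the slot of the finalized block.
fn finalized_state_slot(finalized_epoch: u64, slots_per_epoch: u64) -> u64 {
    finalized_epoch * slots_per_epoch
}

/// Number of empty (skipped) slots the finalized block's state must be
/// advanced through to reach the finalized state.
fn skipped_slots_to_boundary(finalized_block_slot: u64, finalized_epoch: u64, slots_per_epoch: u64) -> u64 {
    finalized_state_slot(finalized_epoch, slots_per_epoch).saturating_sub(finalized_block_slot)
}
```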

## Additional Info

Any users who have experienced pruning errors on Medalla will need to resync after upgrading to a release including this change. This should end unbounded `chain_db` growth! 🎉
2020-08-26 00:01:06 +00:00
Paul Hauner
61d5b592cb Memory usage reduction (#1522)
## Issue Addressed

NA

## Proposed Changes

- Adds a new function to allow getting a state with a bad state root history for attestation verification. This reduces unnecessary tree hashing during attestation processing, which accounted for 23% of memory allocations (by bytes) in a recent `heaptrack` observation.
- Don't clone caches on intermediate epoch-boundary states during block processing.
- Reject blocks that are known to fork choice earlier during gossip processing, instead of waiting until after state has been loaded (this only happens in an edge case).
- Avoid multiple re-allocations by creating a "forced" exact size iterator.
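The "forced" exact-size iterator can be sketched as a wrapper that reports a caller-supplied length through `size_hint`. Adapters such as `filter` report a lower bound of 0, so `collect` has to grow the `Vec` as it goes; with an exact hint it can usually allocate once. `ForcedExactSize` is a hypothetical name, and the caller must guarantee the length is correct: a wrong hint is a logic error (wasted capacity or extra reallocations), not memory unsafety.

```rust
/// Wraps an iterator whose exact length the caller knows but the type
/// system cannot infer.
struct ForcedExactSize<I> {
    iter: I,
    remaining: usize,
}

impl<I: Iterator> ForcedExactSize<I> {
    fn new(iter: I, len: usize) -> Self {
        ForcedExactSize { iter, remaining: len }
    }
}

impl<I: Iterator> Iterator for ForcedExactSize<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.iter.next()?;
        self.remaining = self.remaining.saturating_sub(1);
        Some(item)
    }

    // Report the caller-supplied length as both bounds.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

impl<I: Iterator> ExactSizeIterator for ForcedExactSize<I> {}
```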

## Additional Info

NA
2020-08-17 08:05:13 +00:00
Adam Szkoda
8a1a4051cf Fix a bug in fork pruning (#1507)
Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-08-12 07:00:00 +00:00
blacktemplar
23a8f31f83 Fix clippy warnings (#1385)
## Issue Addressed

NA

## Proposed Changes

Fixes most clippy warnings and ignores the rest of them, see issue #1388.
2020-07-23 14:18:00 +00:00
Adam Szkoda
c7f47af9fb Harden the freezing procedure against failures (#1323)
* Enable logging in tests

* Migrate states to the freezer atomically
2020-07-03 09:47:31 +10:00
Adam Szkoda
536728b975 Write new blocks and states to the database atomically (#1285)
* Mostly atomic put_state()
* Reduce number of vec allocations
* Make crucial db operations atomic
* Save restore points
* Remove StateBatch
* Merge two HotColdDB impls
* Further reduce allocations
* Review feedback
* Silence clippy warning
2020-07-01 12:45:57 +10:00
Michael Sproul
305724770d Bump all spec tags to v0.12.1 (#1275)
2020-06-19 11:18:27 +10:00
Michael Sproul
81c9fe3817 Apply store refactor to new fork choice
2020-06-17 15:20:44 +10:00
Adam Szkoda
9db0c28051 Make key value storage abstractions more accurate (#1267)
* Layer do_atomically() abstractions properly

* Reduce allocs and DRY get_key_for_col()

* Parameterize HotColdDB with hot and cold item stores

* -impl Store for MemoryStore

* Replace Store uses with HotColdDB

* Ditch Store trait

* cargo fmt

* Style fix

* Re-add missing dep that broke the build
2020-06-16 11:34:04 +10:00
Adam Szkoda
7f036a6e95 Add error handling to iterators (#1243)
* Add error handling to iterators

* Review feedback

* Leverage itertools::process_results() in a few places
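A minimal sketch of the pattern, assuming blocks are `u64` roots with parent pointers in a `HashMap` (hypothetical types, not the store's real iterators). The iterator yields `Result`s, so a missing block surfaces as an error instead of silently ending the walk, and the caller collects into `Result<Vec<_>, _>`, which stops at the first error. `itertools::process_results` offers the same short-circuiting when you want to run further adaptors over the `Ok` values without collecting first.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum IterError {
    /// A parent pointer led to a block the store doesn't have.
    MissingBlock(u64),
}

struct AncestorIter<'a> {
    parents: &'a HashMap<u64, u64>,
    genesis: u64,
    next: Option<u64>,
}

fn ancestors(parents: &HashMap<u64, u64>, start: u64, genesis: u64) -> AncestorIter<'_> {
    AncestorIter { parents, genesis, next: Some(start) }
}

impl Iterator for AncestorIter<'_> {
    type Item = Result<u64, IterError>;

    fn next(&mut self) -> Option<Self::Item> {
        let root = self.next.take()?;
        if root == self.genesis {
            // `next` stays None, so iteration ends after genesis.
            return Some(Ok(root));
        }
        match self.parents.get(&root) {
            Some(&parent) => {
                self.next = Some(parent);
                Some(Ok(root))
            }
            // Yield the error once, then stop.
            None => Some(Err(IterError::MissingBlock(root))),
        }
    }
}
```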
2020-06-10 09:55:44 +10:00
Adam Szkoda
ce10db15da Remove code duplicating stdlib (#1239)
* Get rid of superfluous ReverseBlockRootIterator

* Get rid of superfluous ReverseStateRootIterator and ReverseChainIterator

* cargo fmt
2020-06-02 10:41:42 +10:00
Adam Szkoda
91cb14ac41 Clean up database abstractions (#1200)
* Remove redundant method

* Pull a method out of a struct

* More precise db access abstractions

* Move fake trait method out of it

* cargo fmt

* Fix compilation error after refactoring

* Move another fake method out of the Store trait

* Get rid of superfluous method

* Fix refactoring bug

* Rename: SimpleStoreItem -> StoreItem

* Get rid of the confusing DiskStore type alias

* Get rid of SimpleDiskStore type alias

* Correction: A method took both self and a ref to Self
2020-06-01 08:13:49 +10:00
Adam Szkoda
919c81fe7d Ditch StoreItem trait (#1185)
2020-05-25 10:26:54 +10:00
Adam Szkoda
d79e07902e Relax PartialEq constraint on error enums (#1179)
2020-05-21 10:21:44 +10:00
Adam Szkoda
59ead67f76 Race condition fix + Reliability improvements around forks pruning (#1132)
* Improve error handling in block iteration

* Introduce atomic DB operations

* Fix race condition

An invariant was violated: for every block hash in head_tracker, that
block is accessible from the store.
2020-05-16 13:23:32 +10:00
Thor Kamphefner
01f42a4d17 removed state-cache-size flag from beacon_node/src (#1120)
* removed state-cache-size flag from beacon_node/src
* removed state-cache-size related lines from store/src/config.rs
2020-05-14 22:34:24 +10:00
Adam Szkoda
9c3f76a33b Prune abandoned forks (#916)
* Address compiler warning

* Prune abandoned fork choice forks

* New approach to pruning

* Wrap some block hashes in a newtype pattern

For increased type safety.

* Add Graphviz chain dump emitter for debugging

* Fix broken test case

* Make prunes_abandoned_forks use real DiskStore

* Mark finalized blocks in the GraphViz output

* Refine debug stringification of Slot and Epoch

Before this commit: print!("{:?}", Slot(123)) == "Slot(\n123\n)".
After this commit: print!("{:?}", Slot(123)) == "Slot(123)".

* Simplify build_block()

* Rewrite test case using more composable test primitives

* Working rewritten test case

* Tighten fork pruning test checks

* Add another pruning test case

* Bugfix: Finalized blocks weren't always properly detected

* Pruning: Add pruning_does_not_touch_blocks_prior_to_finalization test case

* Tighten pruning tests: check if heads are tracked properly

* Add a failing test case for a buggy scenario

* Change name of function to a more accurate one

* Fix failing test case

* Test case: Were skipped slots' states pruned?

* Style fix: Simplify dereferencing

* Tighten pruning tests: check if abandoned states are deleted

* Towards atomicity of db ops

* Correct typo

* Prune also skipped slots' states

* New logic for handling skipped states

* Make skipped slots test pass

* Post conflict resolution fixes

* Formatting fixes

* Tests passing

* Block hashes in Graphviz node labels

* Removed unused changes

* Fix bug with states having < SlotsPerHistoricalRoot roots

* Consolidate State/BlockRootsIterator for pruning

* Address review feedback

* Fix a bug in pruning tests

* Detach prune_abandoned_forks() from its object

* Move migrate.rs from store to beacon_chain

* Move forks pruning onto a background thread

* Bugfix: Heads weren't pruned when prune set contained only the head

* Rename: freeze_to_state() -> process_finalization()

* Eliminate redundant function parameter
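The core idea of pruning abandoned forks can be sketched as follows, under simplifying assumptions: blocks are plain `u32` roots with parent pointers in a `HashMap` (hypothetical types), and states, skipped-slot handling and the atomic DB writes mentioned above are left out. A head is kept if walking back from it reaches the finalized block before any other block on the canonical chain; otherwise every block visited on the way belongs to an abandoned fork.

```rust
use std::collections::{HashMap, HashSet};

type Root = u32;

/// Returns (abandoned heads, blocks to delete). Assumes every referenced
/// root is present in `parents`; genesis has parent `None`.
fn prune_abandoned_forks(
    parents: &HashMap<Root, Option<Root>>,
    finalized: Root,
    heads: &[Root],
) -> (Vec<Root>, HashSet<Root>) {
    // The finalized block and all of its ancestors.
    let mut canonical = HashSet::new();
    let mut cur = Some(finalized);
    while let Some(root) = cur {
        canonical.insert(root);
        cur = parents[&root];
    }

    let mut abandoned = Vec::new();
    let mut to_delete = HashSet::new();
    for &head in heads {
        // Walk back until we hit the canonical chain.
        let mut branch = Vec::new();
        let mut cur = head;
        while !canonical.contains(&cur) {
            branch.push(cur);
            match parents[&cur] {
                Some(parent) => cur = parent,
                None => break,
            }
        }
        // Descendants of the finalized block reach it first; anything that
        // joins the canonical chain further back forked before finality.
        if cur != finalized {
            abandoned.push(head);
            to_delete.extend(branch);
        }
    }
    (abandoned, to_delete)
}
```

For example, with the chain `0 <- 1 <- 2 (finalized) <- 3` and a fork `1 <- 4 <- 5`, head 3 is kept and head 5 is abandoned, so blocks 4 and 5 are deleted.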

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2020-04-20 19:59:56 +10:00
Paul Hauner
2fb6b7c793 Add no-copy block processing cache (#863)
* Add state cache, remove store cache

* Only build the head committee cache

* Fix compile error

* Fix compile error from merge

* Rename state_cache -> checkpoint_cache

* Rename Checkpoint -> Snapshot

* Tidy, add comments

* Tidy up find_head function

* Change some checkpoint -> snapshot

* Add tests

* Expose max_len

* Remove dead code

* Tidy

* Fix bug
2020-04-06 10:53:33 +10:00
Michael Sproul
26bdc2927b Update to spec v0.11 (#959)
* Update process_final_updates() hysteresis computation

* Update core to v0.11.1

* Bump tags to v0.11.1

* Update docs and deposit contract

* Add compute_fork_digest

* Address review comments

Co-authored-by: Herman Alonso Junge <alonso.junge@gmail.com>
2020-04-01 22:03:03 +11:00
Michael Sproul
6b2e9ff246 Less noisy logs for unaligned finalized blocks (#901)
2020-03-12 12:11:46 +11:00
Paul Hauner
8c5bcfe53a Optimise beacon chain persistence (#851)
* Unfinished progress

* Update more persistence code

* Start fixing tests

* Combine persist head and fork choice

* Persist head on reorg

* Gracefully handle op pool and eth1 cache missing

* Fix test failure

* Address Michael's comments
2020-03-06 16:09:41 +11:00
Michael Sproul
1f16d8fe4d Add methods to delete blocks and states from disk (#843)
Closes #833
2020-03-04 16:48:35 +11:00
Paul Hauner
fbb630793e Attempt to remove a tree hash from block replaying (#862)
* Attempt to remove a tree hash from block replaying

* Add missed thing
2020-03-02 13:40:58 +11:00
Michael Sproul
371e5adcf8 Update to Spec v0.10 (#817)
* Start updating types

* WIP

* Signature hacking

* Existing EF tests passing with fake_crypto

* Updates

* Delete outdated API spec

* The refactor continues

* It compiles

* WIP test fixes

* All release tests passing bar genesis state parsing

* Update and test YamlConfig

* Update to spec v0.10 compatible BLS

* Updates to BLS EF tests

* Add EF test for AggregateVerify

And delete unused hash2curve tests for uncompressed points

* Update EF tests to v0.10.1

* Use optional block root correctly in block proc

* Use genesis fork in deposit domain. All tests pass

* Cargo fmt

* Fast aggregate verify test

* Update REST API docs

* Cargo fmt

* Fix unused import

* Bump spec tags to v0.10.1

* Add `seconds_per_eth1_block` to chainspec

* Update to timestamp based eth1 voting scheme

* Return None from `get_votes_to_consider` if block cache is empty

* Handle overflows in `is_candidate_block`

* Revert to failing tests

* Fix eth1 data sets test

* Choose default vote according to spec

* Fix collect_valid_votes tests

* Fix `get_votes_to_consider` to choose all eligible blocks

* Uncomment winning_vote tests

* Add comments; remove unused code

* Reduce seconds_per_eth1_block for simulation

* Addressed review comments

* Add test for default vote case

* Fix logs

* Remove unused functions

* Meter default eth1 votes

* Fix comments

* Address review comments; remove unused dependency

* Disable/delete two outdated tests

* Bump eth1 default vote warn to error

* Delete outdated eth1 test
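The timestamp-based candidate check, with overflow handling, can be sketched as below. The inequalities follow the honest-validator spec's `is_candidate_block` of that era as we understand it: a block is a candidate if its timestamp lies between one and two follow distances (measured in seconds) before the voting period start. The Rust signature and the overflow strategy are assumptions, not Lighthouse's code.

```rust
/// block_timestamp + follow <= period_start
/// && block_timestamp + 2 * follow >= period_start,
/// where follow = seconds_per_eth1_block * eth1_follow_distance,
/// evaluated without panicking on overflow.
fn is_candidate_block(
    block_timestamp: u64,
    period_start: u64,
    seconds_per_eth1_block: u64,
    eth1_follow_distance: u64,
) -> bool {
    // If the follow distance in seconds overflows, no block can be old enough.
    let follow = match seconds_per_eth1_block.checked_mul(eth1_follow_distance) {
        Some(f) => f,
        None => return false,
    };
    // First inequality, rearranged as a subtraction to avoid overflow.
    let old_enough = period_start
        .checked_sub(follow)
        .map_or(false, |latest| block_timestamp <= latest);
    // Saturation is exact here: a saturated sum is already >= any u64 period_start.
    let recent_enough = block_timestamp.saturating_add(follow.saturating_mul(2)) >= period_start;
    old_enough && recent_enough
}
```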

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2020-02-11 10:19:36 +11:00
Michael Sproul
e0b9fa599f Add LRU cache to database (#837)
* Add LRU caches to store

* Improvements to LRU caches

* Take state by value in `Store::put_state`

* Store blocks by value, configurable cache sizes

* Use a StateBatch to efficiently store skip states

* Fix store tests

* Add CloneConfig test, remove unused metrics

* Use Mutexes instead of RwLocks for LRU caches
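The last bullet follows from how an LRU cache works: every lookup moves the entry to the most-recently-used position, so even a read needs `&mut` access. An `RwLock` read guard cannot hand out `&mut`, so readers would need the write lock anyway, and a `Mutex` is the simpler fit. A minimal hand-rolled sketch for illustration; `Lru` and `cached_get` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
use std::sync::Mutex;

/// Minimal LRU cache (O(n) recency updates, fine for a sketch).
struct Lru<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<K>,
}

impl<K: Clone + Eq + Hash, V: Clone> Lru<K, V> {
    fn new(capacity: usize) -> Self {
        Lru { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// A lookup updates recency, so it needs `&mut self`.
    fn get(&mut self, key: &K) -> Option<V> {
        let value = self.map.get(key)?.clone();
        self.touch(key);
        Some(value)
    }

    fn put(&mut self, key: K, value: V) {
        if self.map.insert(key.clone(), value).is_some() {
            self.touch(&key);
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
    }

    /// Move `key` to the most-recently-used end.
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }
}

/// Every read goes through `lock()`, because `get` needs `&mut Lru`.
fn cached_get<K: Clone + Eq + Hash, V: Clone>(cache: &Mutex<Lru<K, V>>, key: &K) -> Option<V> {
    cache.lock().unwrap().get(key)
}
```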
2020-02-10 11:30:21 +11:00
Paul Hauner
b771bbb60c Add proto_array fork choice (#804)
* Start implementing proto_array

* Add progress

* Add unfinished progress

* Add further progress

* Add progress

* Add tree filtering

* Add half-finished modifications

* Add refactored version

* Tidy, add incomplete LmdGhost impl

* Move impls in LmdGhost trait def

* Remove old reduced_tree fork choice

* Combine two functions into `compute_deltas`

* Start testing

* Add more compute_deltas tests

* Add fork choice testing

* Add more fork choice testing

* Add more fork choice tests

* Add more testing to proto-array

* Remove old tests

* Modify tests

* Add more tests

* Add more testing

* Add comments and fixes

* Re-organise crate

* Tidy, finish pruning tests

* Add ssz encoding, other pub fns

* Rename lmd_ghost > proto_array_fork_choice

* Integrate proto_array into lighthouse

* Add first pass at fixing filter

* Clean out old comments

* Add more comments

* Attempt to fix prune error

* Adjust TODO

* Fix test compile errors

* Add extra justification change check

* Update cargo.lock

* Fix fork choice test compile errors

* Mostly remove ffg_update_required

* Fix bug with epoch of attestation votes

* Start adding new test format

* Make fork choice tests declarative

* Create test def concept

* Move test defs into crate

* Add binary, re-org crate

* Shuffle files

* Start adding ffg tests

* Add more fork choice tests

* Add fork choice JSON dumping

* Add more detail to best node error

* Ensure fin+just checkpoints are from the same block

* Rename JustificationManager

* Move checkpoint manager into own file

* Tidy

* Add targeted logging for sneaky sync bug

* Fix justified balances bug

* Add cache metrics

* Add metrics for log levels

* Fix bug in checkpoint manager

* Fix compile error in fork choice tests

* Ignore duplicate blocks in fork choice

* Add block to fork choice before db

* Rename on_new_block fn

* Fix spec inconsistency in `CheckpointManager`

* Remove BlockRootTree

* Remove old reduced_tree code fragment

* Add API endpoint for fork choice

* Add more ffg tests

* Remove block_root_tree remnants

* Ensure effective balances are used

* Remove old debugging code, fix API fault

* Add check to ensure parent block is in fork choice

* Update readme dates

* Fix readme

* Tidy checkpoint manager

* Remove fork choice yaml files from repo

* Remove fork choice yaml from repo

* General tidy

* Address majority of Michael's comments

* Tidy bin/lib business

* Remove dangling file

* Undo changes for rpc/handler from master

* Revert "Undo changes for rpc/handler from master"

This reverts commit 876edff0e4a501aafbb47113454852826dcc24e8.
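As we understand it, `compute_deltas` turns vote and balance changes into per-node weight changes that proto-array then applies to its nodes. A simplified sketch follows, assuming `u64` roots with `0` meaning "no vote", node indices `0..n`, and balances that fit in an `i64`. The names mirror the commit messages, but the signature is illustrative, and the checked arithmetic a production version would need is omitted.

```rust
use std::collections::HashMap;

type Root = u64;
/// Root 0 stands in for "no vote".
const NO_VOTE: Root = 0;

/// Per-validator vote: the root it was counted for last time, and the root
/// its latest attestation points to.
struct VoteTracker {
    current_root: Root,
    next_root: Root,
}

/// Subtract each changed validator's old balance from its old root and add
/// its new balance to its new root. Assumes `indices` maps roots to 0..n.
fn compute_deltas(
    indices: &HashMap<Root, usize>,
    votes: &mut [VoteTracker],
    old_balances: &[u64],
    new_balances: &[u64],
) -> Vec<i64> {
    let mut deltas = vec![0i64; indices.len()];
    for (i, vote) in votes.iter_mut().enumerate() {
        if vote.current_root == NO_VOTE && vote.next_root == NO_VOTE {
            continue;
        }
        let old_balance = old_balances.get(i).copied().unwrap_or(0) as i64;
        let new_balance = new_balances.get(i).copied().unwrap_or(0) as i64;
        if vote.current_root != vote.next_root || old_balance != new_balance {
            // Roots unknown to fork choice (e.g. pruned) are simply skipped.
            if let Some(&idx) = indices.get(&vote.current_root) {
                deltas[idx] -= old_balance;
            }
            if let Some(&idx) = indices.get(&vote.next_root) {
                deltas[idx] += new_balance;
            }
            vote.current_root = vote.next_root;
        }
    }
    deltas
}
```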

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-01-29 15:05:00 +11:00
Pawan Dhananjay
23a35c3767 Persist/load DHT on shutdown/startup (#659)
* Store dht enrs on shutdown

* Load enrs on startup and add tests

* Remove enr_entries from behavior

* Move all dht persisting logic to `NetworkService`

* Move `PersistedDht` from eth2-libp2p to network crate

* Add test to confirm dht persistence

* Add logging

* Remove extra call to beacon_chain persist

* Expose only mutable `add_enr` method from behaviour

* Fix tests

* Fix merge errors
2020-01-23 18:16:11 +11:00
pscott
7396cd2cab Fix clippy warnings (#813)
* Clippy account manager

* Clippy account_manager

* Clippy beacon_node/beacon_chain

* Clippy beacon_node/client

* Clippy beacon_node/eth1

* Clippy beacon_node/eth2-libp2p

* Clippy beacon_node/genesis

* Clippy beacon_node/network

* Clippy beacon_node/rest_api

* Clippy beacon_node/src

* Clippy beacon_node/store

* Clippy eth2/lmd_ghost

* Clippy eth2/operation_pool

* Clippy eth2/state_processing

* Clippy eth2/types

* Clippy eth2/utils/bls

* Clippy eth2/utils/cached_tree_hash

* Clippy eth2/utils/deposit_contract

* Clippy eth2/utils/eth2_interop_keypairs

* Clippy eth2/utils/eth2_testnet_config

* Clippy eth2/utils/lighthouse_metrics

* Clippy eth2/utils/ssz

* Clippy eth2/utils/ssz_types

* Clippy eth2/utils/tree_hash_derive

* Clippy lcli

* Clippy tests/beacon_chain_sim

* Clippy validator_client

* Cargo fmt
2020-01-21 18:38:56 +11:00
Michael Sproul
5a8f2dd961 Increase default slots per restore point to 2048 (#790)
This should reduce disk usage by 32x while keeping historical state queries to
less than 10s. If historical states are required quickly, the minimum SPRP of 32
can be set on the CLI.
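The trade-off can be worked through with a little arithmetic. A sketch with hypothetical helper names, assuming full states are stored only at restore points and any other historical state is rebuilt by replaying from the latest restore point at or before it. The 32x figure implies the previous default was 64 (2048 / 64 = 32).

```rust
/// Slot of the latest restore point at or before `slot`.
/// Panics if `slots_per_restore_point` is zero.
fn latest_restore_point_slot(slot: u64, slots_per_restore_point: u64) -> u64 {
    (slot / slots_per_restore_point) * slots_per_restore_point
}

/// Slots that must be replayed on top of that restore point to rebuild
/// `slot`; at most `slots_per_restore_point - 1`.
fn replay_distance(slot: u64, slots_per_restore_point: u64) -> u64 {
    slot - latest_restore_point_slot(slot, slots_per_restore_point)
}

/// Full states stored for `history_slots` slots of history.
fn restore_points_stored(history_slots: u64, slots_per_restore_point: u64) -> u64 {
    history_slots / slots_per_restore_point
}
```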
2020-01-10 14:42:49 +11:00
Michael Sproul
95fc840e2c Fix off-by-one error in get_latest_restore_point (#787)
* Fix off-by-one error in get_latest_restore_point

* Tighten SPRP checks for succinct hot DB change
2020-01-09 21:05:56 +11:00
Michael Sproul
d9e9c17d3b Avoid building caches during block replay (#783)
Also, make the ExitCache safe.
2020-01-09 11:43:11 +11:00
Michael Sproul
f36a5a15d6 Store states efficiently in the hot database (#746)
* Sparse hot DB and block root tree

* Fix store_tests

* Ensure loads of hot states on boundaries are fast

* Milder error for unaligned finalized blocks
2020-01-08 13:58:01 +11:00
Paul Hauner
647034b637 Optimization: avoid recomputing known state roots (#762)
* Start adding optimization

* Add temp fix for protobuf issue

* Fix compile errors

* Fix protobuf import
2020-01-03 15:09:00 +11:00
Michael Sproul
5e7803f00b Clean up database metrics, add freezer DB size (#715)
* Clean up database metrics, add freezer DB size

* Address review comments
2019-12-13 13:30:58 +11:00
Paul Hauner
10a134792b Testnet2 (#685)
* Apply clippy lints to beacon node

* Remove unnecessary logging and correct formatting

* Initial bones of load-balanced range-sync

* Port bump meshsup tests

* Further structure and network handling logic added

* Basic structure, ignoring error handling

* Correct max peers delay bug

* Clean up and re-write message processor and sync manager

* Restructure directory, correct type issues

* Fix compiler issues

* Completed first testing of new sync

* Correct merge issues

* Clean up warnings

* Push attestation processed log down to dbg

* Add state enc/dec benches

* Correct math error, downgraded logs

* Add example for flamegraph

* Use `PublicKeyBytes` for `Validator`

* Ripple PublicKeyBytes change through codebase

* Add RPC error handling and improved syncing code

* Add benches, optimizations to store BeaconState

* Store BeaconState in StorageContainer too

* Optimize StorageContainer with std::mem magic

* Add libp2p stream error handling and dropping of invalid peers

* Lower logs

* Update lcli to parse spec at boot, remove pycli

* Fix issues when starting with mainnet spec

* Set default spec to mainnet

* Fix lcli --spec param

* Add discovery tweak

* Ensure ETH1_FOLLOW_DISTANCE is in YamlConfig

* Set testnet ETH1_FOLLOW_DISTANCE to 16

* Fix rest_api tests

* Set testnet min validator count

* Update with new testnet dir

* Remove some dbg, println

* Add timeout when notifier waits for libp2p lock

* Add validator count CLI flag to lcli contract deploy

* Extend genesis delay time

* Correct libp2p service locking

* Update testnet dir

* Add basic block/state caching on beacon chain

* Add decimals display to notifier sync speed

* Try merge in change to reduce fork choice calls

* Remove fork choice from process block

* Minor log fix

* Check successes > 0

* Adds checkpoint cache

* Stop storing the tree hash cache in the db

* Handles peer disconnects for sync

* Fix failing beacon chain tests

* Change eth2_testnet_config tests to Mainnet

* Add logs downgrade discovery log

* Remove redundant beacon state write

* Fix re-org warnings

* Use caching get methods in fork choice

* Fix mistake in prev commit

* Use caching state getting in state_by_slot

* Add state.cacheless_clone

* Less fork choice (#679)

* Try merge in change to reduce fork choice calls

* Remove fork choice from process block

* Minor log fix

* Check successes > 0

* Fix failing beacon chain tests

* Fix re-org warnings

* Fix mistake in prev commit

* Attempt to improve attestation processing times

* Introduce HeadInfo struct

* Used cache tree hash for block processing

* Use cached tree hash for block production too

* Range sync refactor

- Introduces `ChainCollection`
- Correct Disconnect node handling
- Removes duplicate code

* Add more logging for DB

* Various bug fixes

* Remove unnecessary logs

* Maintain syncing state in the transition from finalized to head

* Improved disconnect handling

* Add `Speedo` struct

* Fix bugs in speedo

* Fix bug in speedo

* Fix rounding bug in speedo

* Move code around, reduce speedo observation count

* Adds forwards block iterator

* Fix inf NaN

* Add first draft of validator onboarding

* Update docs

* Add documentation link to main README

* Continue docs development

* Update book readme

* Update docs

* Allow vc to run without testnet subcommand

* Small change to onboarding docs

* Tidy CLI help messages

* Update docs

* Add check to val client to see if beacon node is synced

* Attempt to fix NaN bug

* Fix compile bug

* Add notifier service to validator client

* Re-order onboarding steps

* Update deposit contract address

* Update testnet dir

* Add note about public eth1 node

* Fix installation link

* Set default eth1 endpoint to sigp

* Fix broken test

* Try fix eth1 cache locking

* Be more specific about eth1 endpoint

* Increase gas limit for deposit

* Fix default deposit amount

* Fix re-org log
2019-12-09 23:14:13 +11:00
Paul Hauner
2bfc512fb6 Add block/state caching on beacon chain (#677)
* Add basic block/state caching on beacon chain

* Adds checkpoint cache

* Stop storing the tree hash cache in the db

* Remove redundant beacon state write

* Use caching get methods in fork choice

* Use caching state getting in state_by_slot

* Add state.cacheless_clone

* Attempt to improve attestation processing times

* Introduce HeadInfo struct

* Used cache tree hash for block processing

* Use cached tree hash for block production too
2019-12-09 14:20:25 +11:00
Paul Hauner
d4b28d48f8 Remove some dbg, println (#675)
2019-12-07 07:29:20 +11:00
Michael Sproul
bd1b61a5b1 Forwards block root iterators (#672)
* Implement forwards block root iterators

* Clean up errors and docs
2019-12-06 18:52:11 +11:00
Paul Hauner
75efed305c Faster BeaconState enc/dec (#671)
* Add state enc/dec benches

* Add example for flamegraph

* Use `PublicKeyBytes` for `Validator`

* Ripple PublicKeyBytes change through codebase

* Add benches, optimizations to store BeaconState

* Store BeaconState in StorageContainer too

* Optimize StorageContainer with std::mem magic

* Fix rest_api tests
2019-12-06 16:44:03 +11:00
Michael Sproul
d0319320ce Improve freezer DB efficiency with periodic restore points (#649)
* Draft of checkpoint freezer DB

* Fix bugs

* Adjust root iterators for checkpoint database

* Fix freezer state lookups with no slot hint

* Fix split comment

* Use "restore point" to refer to frozen states

* Resolve some FIXMEs

* Configurable slots per restore point

* Document new freezer DB functions

* Fix up StoreConfig

* Fix new test for merge

* Document SPRP default CLI flag, clarify tests
2019-12-06 14:29:06 +11:00
Michael Sproul
bf2eeae3f2 Implement freezer database (#508)
* Implement freezer database for state vectors

* Improve BeaconState safe accessors

And fix a bug in the compact committees accessor.

* Banish dodgy type bounds back to gRPC

* Clean up

* Switch to exclusive end points in chunked vec

* Cleaning up and start of tests

* Randao fix, more tests

* Fix unsightly hack

* Resolve test FIXMEs

* Config file support

* More clean-ups, migrator beginnings

* Finish migrator, integrate into BeaconChain

* Fixups

* Fix store tests

* Fix BeaconChain tests

* Fix LMD GHOST tests

* Address review comments, delete 'static bounds

* Cargo format

* Address review comments

* Fix LMD ghost tests

* Update to spec v0.9.0

* Update to v0.9.1

* Bump spec tags for v0.9.1

* Formatting, fix CI failures

* Resolve accidental KeyPair merge conflict

* Document new BeaconState functions

* Fix incorrect cache drops in `advance_caches`

* Update fork choice for v0.9.1

* Clean up some FIXMEs

* Fix a few docs/logs

* Update for new builder paradigm, spec changes

* Freezer DB integration into BeaconNode

* Cleaning up

* This works, clean it up

* Cleanups

* Fix and improve store tests

* Refine store test

* Delete unused beacon_chain_builder.rs

* Fix CLI

* Store state at split slot in hot database

* Make fork choice lookup fast again

* Store freezer DB split slot in the database

* Handle potential div by 0 in chunked_vector

* Exclude committee caches from freezer DB

* Remove FIXME about long-running test
2019-11-27 10:54:46 +11:00
Michael Sproul
c1a2238f1a
Implement tree hash caching (#584)
* Implement basic tree hash caching

* Use spaces to indent top-level Cargo.toml

* Optimize BLS tree hash by hashing bytes directly

* Implement tree hash caching for validator registry

* Persist BeaconState tree hash cache to disk

* Address Paul's review comments
2019-11-05 15:46:52 +11:00
pscott
7eb82125ef Clippy clean (#536)
* Change into_iter to iter

* Fix clippy 'easy' warnings

* Clippy eth2/utils

* Add struct NetworkInfo

* Clippy for types, utils, and beacon_node/store/src/iters.rs

* Cargo fmt

* Change foo to my_foo

* Remove complex signature

* Suppress clippy warning for unit_value in benches

* Use enumerate instead of iterating over range

* Allow trivially_copy_pass_by_ref in serde_utils
2019-09-30 13:58:45 +10:00
Paul Hauner
c4ced3e0d2
Fix block processing blowup, upgrade metrics (#500)
* Renamed fork_choice::process_attestation_from_block

* Processing attestation in fork choice

* Retrieving state from store and checking signature

* Looser check on beacon state validity.

* Cleaned up get_attestation_state

* Expanded fork choice API to provide latest validator message.

* Checking if an attestation contains a latest message

* Correct process_attestation error handling.

* Copy-paste error in comment fixed.

* Tidy ancestor iterators

* Getting attestation slot via helper method

* Refactored attestation creation in test utils

* Revert "Refactored attestation creation in test utils"

This reverts commit 4d277fe4239a7194758b18fb5c00dfe0b8231306.

* Integration tests for free attestation processing

* Implicit conflicts resolved.

* Formatting

* Do first pass on Grant's code

* Add another attestation processing test

* Tidy attestation processing

* Remove old code fragment

* Add non-compiling half-finished changes

* Simplify, fix bugs, add tests for chain iters

* Remove attestation processing from op pool

* Fix bug with fork choice, tidy

* Fix overly restrictive check in fork choice.

* Ensure committee cache is built during attestation processing

* Ignore unknown blocks at fork choice

* Various minor fixes

* Make fork choice write lock into read lock

* Remove unused method

* Tidy comments

* Fix attestation prod. target roots change

* Fix compile error in store iters

* Reject any attestation prior to finalization

* Begin metrics refactor

* Move beacon_chain to new metrics structure.

* Make metrics not panic if already defined

* Use global Prometheus gather at REST API

* Unify common metric fns into a crate

* Add heavy metering to block processing

* Remove hyphen from Prometheus metric name

* Add more beacon chain metrics

* Add beacon chain persistence metric

* Prune op pool on finalization

* Add extra prom beacon chain metrics

* Prefix BeaconChain metrics with "beacon_"

* Add more store metrics

* Add basic metrics to libp2p

* Add metrics to HTTP server

* Remove old `http_server` crate

* Update metric names to follow standard conventions

* Fix broken beacon chain metrics, add slot clock metrics

* Add lighthouse_metrics gather fn

* Remove http args

* Fix wrong state given to op pool prune

* Make prom metric names more consistent

* Add more metrics, tidy existing metrics

* Fix store block read metrics

* Tidy attestation metrics

* Fix minor PR comments

* Allow Travis failures on beta (see description)

There's a backward-incompatible change in `cargo fmt`; stable and beta
do not agree.

* Tidy `lighthouse_metrics` docs

* Fix typo
2019-08-19 21:02:34 +10:00