6035 Commits

Author SHA1 Message Date
Jimmy Chen
d4f26ee123 Add block roots heal logic in v18 schema migration. (#4875)
## Issue Addressed

Fixes #4697. 

This also unblocks the state pruning PR (#4835), because self-healing breaks if state pruning is applied to a database with missing block roots.

## Proposed Changes

- Fill in the missing block roots between the last restore point slot and the split slot when upgrading to the latest database version.
2023-10-25 03:42:24 +00:00
antondlr
a228e61773 don't make lcli on self-hosted runners (#4874)
## Issue Addressed

Our self-hosted runners now have a modern (Deneb-ready) version of `lcli` preinstalled so we no longer need to compile it.
2023-10-25 03:42:23 +00:00
Pawan Dhananjay
6315a81260 Upgrade to v1.4.0-beta.3 (#4862)
## Issue Addressed

Makes lighthouse compliant with new kzg changes in https://github.com/ethereum/consensus-specs/releases/tag/v1.4.0-beta.3

## Proposed Changes

1. Adds new official trusted setup
2. Refactors kzg to match upstream changes in https://github.com/ethereum/c-kzg-4844/pull/377
3. Updates pre-generated `BlobBundle` to work with official trusted setup. ~~Using json here instead of ssz to account for different value of `MaxBlobCommitmentsPerBlock` in minimal and mainnet. By using json, we can just use one pre generated bundle for both minimal and mainnet. Size of 2 separate ssz bundles is approximately equal to one json bundle cc @jimmygchen~~ 
Dunno what I was doing, ssz works without any issues  
4. Stores trusted_setup as just bytes in eth2_network_config so that we don't have kzg dependency in that lib and in lcli. 


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2023-10-21 13:49:27 +00:00
Pawan Dhananjay
074c4951fc Reduce calls to network channel (#4863)
## Issue Addressed

N/A

## Proposed Changes

Sends blocks and blobs from http_api to the network channel for publishing in a single network channel send. This is to avoid overhead of multiple calls.
Also adds a metric for rpc blob retrieval duration.
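
As a rough illustration of the single-send idea, the sketch below bundles a block and its blobs into one message on a `std::sync::mpsc` channel. The `Block`, `Blob`, `NetworkMessage` and `publish` names are hypothetical stand-ins, not Lighthouse's actual types or its network channel.

```rust
use std::sync::mpsc;

/// Hypothetical placeholder for a signed block (the field is just an id).
struct Block(u64);

/// Hypothetical placeholder for a blob sidecar (the field is its index).
struct Blob(u64);

/// One message carries the block together with all of its blobs.
enum NetworkMessage {
    Publish { block: Block, blobs: Vec<Blob> },
}

/// Publishes a block and its blobs with a single channel send,
/// instead of one send for the block plus one per blob.
fn publish(
    tx: &mpsc::Sender<NetworkMessage>,
    block: Block,
    blobs: Vec<Blob>,
) -> Result<(), mpsc::SendError<NetworkMessage>> {
    tx.send(NetworkMessage::Publish { block, blobs })
}
```

The receiver then sees exactly one message per published block, regardless of how many blobs accompany it.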
2023-10-20 19:42:47 +00:00
Jimmy Chen
e8fba8d3a7 Enable BLS portable feature on all CI tests (#4868)
## Issue Addressed

Addresses the recent CI failures caused by caching `blst` for the wrong CPU type. 

## Proposed Changes

- Use `FEATURES: jemalloc,portable` when building Lighthouse & `lcli` in tests
- Add a new `TEST_FEATURES` and set to `portable` for all CI test jobs.
- Updated Makefiles to read the `TEST_FEATURES` environment variable, and default to none.
2023-10-20 07:30:27 +00:00
Jimmy Chen
8880675eda Add make lint to development environment section in Book (#4866)
## Issue Addressed

We run `clippy` as part of our CI, so it would help new devs if we add the `make lint` command to the dev setup section in the Lighthouse book.
2023-10-20 06:23:29 +00:00
Zackary Scott
b11988223f #4512 inactivity calculation for Altair (#4807)
## Issue Addressed
#4512 

## Proposed Changes
Add inactivity calculation for Altair


Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
2023-10-20 06:23:28 +00:00
GoodDaisy
90f78d141f fix typos (#4838) 2023-10-19 22:05:15 +00:00
João Oliveira
c6583bb5fa update libp2p (#4864)
## Issue Addressed

Updates libp2p to the latest version and uses the new `SwarmBuilder`. Supersedes https://github.com/sigp/lighthouse/pull/4695/
CC @mxinden I don't think we can use both `bandwidth_loggers` with the new syntax right?
2023-10-19 21:22:55 +00:00
Michael Sproul
8c28d175b8 Fix: write post state in lcli skip-slots (#4843)
## Issue Addressed

Fix a bug in `lcli skip-slots` that resulted in it always writing the pre-state to the output file.

## Proposed Changes

Correctly keep track of the post-state, and write it.
2023-10-19 05:19:25 +00:00
Dustin Brickwood
1c6356f8f3 chore: replace deprecated hub with gh for releases (#4839)
## Issue Addressed

- The tool hub that is used to create draft releases has been removed as of October 2. See: https://github.com/actions/runner-images/issues/8362
- This change replaces `hub` usage in favour of [`gh`](https://cli.github.com/manual/gh_release_create) 

## Proposed Changes

- This change replaces `hub` usage in favour of [`gh`](https://cli.github.com/manual/gh_release_create) 

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-10-19 05:19:24 +00:00
Joe Clapis
1de02f731b Add CARGO_USE_GIT_CLI to the Dockerfile to work around an OOM bug during cross-compiling (#4828)
## Issue Addressed
#4827 

## Proposed Changes

This PR introduces a new build-arg to the Lighthouse Dockerfile: `CARGO_USE_GIT_CLI`. This arg will be passed into the `CARGO_NET_GIT_FETCH_WITH_CLI` [environment variable](https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli), which instructs `cargo` to use the git CLI during `fetch` operations instead of the git library. Doing so works around [a bug](https://github.com/rust-lang/cargo/issues/10583) with the git library that causes it to go OOM during `fetch` operations on `arm64` platforms.

The default value is `false` so this doesn't affect Lighthouse builds or the CI pipeline. Running a build with `--build-arg CARGO_USE_GIT_CLI=true` will activate it, which is necessary to cross-compile the `arm64` binary when not using `cross` (i.e., when building via the Dockerfile instead of natively if you don't have a rust environment ready to go).

Special thanks to @michaelsproul for helping me repro the initial problem.

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-10-19 05:19:23 +00:00
João Oliveira
f06391717c collect bandwidth metrics per transport (#4805)
## Issue Addressed

Following the conversation on https://github.com/libp2p/rust-libp2p/pull/3666, the changes introduced in this PR will give us more insight into whether bandwidth limitations occur at the transport level (namely whether quic helps compared with yamux and its [window size limitation](https://github.com/libp2p/rust-yamux/issues/162)) or whether the bottleneck is at the gossipsub level.

## Proposed Changes

introduce new quic and tcp bandwidth metric gauges.

cc @mxinden (turned out to be easier, Thomas gave me a hint)
2023-10-19 05:19:22 +00:00
Jimmy Chen
98cac2bc6b Deneb review .github (CI cleanup) (#4696)
## Issue Addressed

Related to https://github.com/sigp/lighthouse/issues/4676.

Deneb-specific CI code to be removed before merging to `unstable`. Do not merge until we're ready to merge into `unstable`, as we may need to release deneb docker images before merging.

Keep in mind that most of the changes in the PR below (to `unstable`) have already been merged to `deneb-free-blobs`, so merging `deneb-free-blobs` into `unstable` would include those changes. That would be ok if the release runners are ready; otherwise we may want to exclude them before merging.
- https://github.com/sigp/lighthouse/pull/4592
2023-10-18 15:23:31 +00:00
Michael Sproul
5bbeedb5b7 Reduce nextest threads to 8 (#4846)
## Issue Addressed

Fix OOMs caused by too many concurrent tests. The runner machine is currently liable to run `32 * 5 = 160` tests in parallel. If each test uses say 300MB max, this is 48GB of RAM!

## Proposed Changes

Reduce the number of threads per runner job to 8. This should cap the memory at 4x lower than the current limit, i.e. around 12GB. If we continue to run out of RAM, we should consider more sophisticated limits.
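
The arithmetic above can be checked with a small sketch. It assumes that `32 * 5` means 32 threads per job across 5 concurrent jobs, keeps the 300MB-per-test figure quoted above, and uses decimal gigabytes; `peak_memory_gb` is a hypothetical helper.

```rust
/// Worst-case memory in GB if every concurrently running test
/// reaches `mb_per_test` at the same time.
fn peak_memory_gb(threads_per_job: u64, jobs: u64, mb_per_test: u64) -> u64 {
    threads_per_job * jobs * mb_per_test / 1000
}
```

With these assumptions, `peak_memory_gb(32, 5, 300)` gives 48 and `peak_memory_gb(8, 5, 300)` gives 12, matching the 4x reduction.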
2023-10-18 14:19:41 +00:00
Michael Sproul
192d442718 Fix Rayon deadlock in test utils (#4837)
## Issue Addressed

Fix a deadlock in the tests that was causing tests on tree-states to run for hours without finishing: https://github.com/sigp/lighthouse/actions/runs/6491194654/job/17628138360.

## Proposed Changes

Avoid using a Mutex under the Rayon `par_iter`. Instead, use an `AtomicUsize`. I've run the new version several times in a loop and it hasn't deadlocked (it was deadlocking consistently on tree-states).
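
A minimal sketch of the counter pattern follows. It uses std scoped threads as a stand-in for Rayon's `par_iter` so that it has no external dependencies, and `count_even` is a hypothetical example rather than the actual test utility.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts the even numbers in `items` using up to `workers` threads.
/// The shared counter is an `AtomicUsize`, so workers never take a lock.
fn count_even(items: &[u64], workers: usize) -> usize {
    let counter = AtomicUsize::new(0);
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks` panics on a zero size.
    let chunk_len = ((items.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        for chunk in items.chunks(chunk_len) {
            let counter = &counter;
            s.spawn(move || {
                for item in chunk {
                    if item % 2 == 0 {
                        counter.fetch_add(1, Ordering::Relaxed);
                    }
                }
            });
        }
    });

    // All workers have been joined when `scope` returns.
    counter.load(Ordering::Relaxed)
}
```

Because `fetch_add` is lock-free, no worker can end up waiting on a lock held by another worker.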

## Additional Info

The same bug exists in unstable and tree-states, but I'm not sure why it was triggering so consistently on the tree-states branch.
2023-10-18 13:36:42 +00:00
Michael Sproul
463e62e833 Generalise compare_fields to work with iterators (#4823)
## Proposed Changes

Add `compare_fields(as_iter)` as a field attribute to `compare_fields_derive`. This allows any iterable type to be compared in the same way as a slice (by index).

This is forwards-compatible with tree-states types like `List` and `Vector`, which cannot be cast to slices.
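
To illustrate what comparing by index means for arbitrary iterables, here is a hedged sketch. `mismatched_indices` is a hypothetical helper and does not reflect the code generated by `compare_fields_derive`.

```rust
/// Walks two iterables in lockstep and returns every index at which they
/// differ. A position present in only one of them counts as a difference.
fn mismatched_indices<I, J, T>(a: I, b: J) -> Vec<usize>
where
    I: IntoIterator<Item = T>,
    J: IntoIterator<Item = T>,
    T: PartialEq,
{
    let (mut a, mut b) = (a.into_iter(), b.into_iter());
    let mut mismatches = Vec::new();
    let mut index = 0;
    loop {
        match (a.next(), b.next()) {
            // Both exhausted: the comparison is complete.
            (None, None) => return mismatches,
            (x, y) => {
                if x != y {
                    mismatches.push(index);
                }
            }
        }
        index += 1;
    }
}
```

The two inputs do not need to be the same container type; they only need to yield the same item type.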
2023-10-18 12:59:53 +00:00
Jimmy Chen
1b4545cd9d Remove blob clones in KZG verification (#4852)
## Issue Addressed

This PR removes two instances of blob clones during blob verification that may not be necessary.
2023-10-18 06:52:54 +00:00
Mac L
a7c46bf7ed Fix Homebrew link (#4822)
## Issue Addressed

N/A

## Proposed Changes

I saw a false positive on the link-check CI run and while investigating I noticed that this link technically 404's but is not "dead" in the strict sense. I have updated it to the correct path.
2023-10-18 06:52:53 +00:00
João Oliveira
f10d3d07c3 remove crit! logging from ListenerClosed event on Ok() (#4821)
## Issue Addressed

Since Quic support was added in https://github.com/sigp/lighthouse/pull/4577, and due to the nature of `quinn`'s API, LH now triggers the [`ListenerClosed`](https://docs.rs/libp2p/0.52.3/libp2p/swarm/struct.ListenerClosed.html) event. @michaelsproul noticed we are logging this event as `crit!` regardless of the reason. This PR matches on the reason, logging with `debug!` or `error!` (instead of `crit!`) according to its `Result`.

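
The matching can be sketched as below. It assumes that an `Ok` reason maps to `debug!` and an `Err` reason to `error!`; the `Level` enum and `listener_closed_level` function are hypothetical and stand in for the real logging macros.

```rust
/// Hypothetical log levels standing in for the `debug!` and `error!` macros.
#[derive(Debug, PartialEq)]
enum Level {
    Debug,
    Error,
}

/// Picks a log level from the reason a listener closed.
/// Assumption: a clean close (`Ok`) is only worth a debug message.
fn listener_closed_level<E>(reason: &Result<(), E>) -> Level {
    match reason {
        Ok(()) => Level::Debug,
        Err(_) => Level::Error,
    }
}
```
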
## Additional Info
LH will still log `crit!` until https://github.com/libp2p/rust-libp2p/pull/4621 has been merged
2023-10-18 06:52:52 +00:00
Jimmy Chen
18f3edff0a Add vendor directory to .gitignore (#4819)
## Issue Addressed

The vendor directory gets populated after running `cargo vendor`. This directory should be ignored by VCS.
2023-10-18 06:52:51 +00:00
Michael Sproul
5cc0f1097b Fix metric for total block production time (#4794)
## Proposed Changes

Fix the misplacement of the total block production time metric, which occurred during a previous refactor.

Total block production times are no longer skewed low (data from Holesky + blockdreamer):

```
# HELP beacon_block_production_seconds Full runtime of block production
# TYPE beacon_block_production_seconds histogram
beacon_block_production_seconds_bucket{le="0.005"} 0
beacon_block_production_seconds_bucket{le="0.01"} 0
beacon_block_production_seconds_bucket{le="0.025"} 0
beacon_block_production_seconds_bucket{le="0.05"} 0
beacon_block_production_seconds_bucket{le="0.1"} 0
beacon_block_production_seconds_bucket{le="0.25"} 0
beacon_block_production_seconds_bucket{le="0.5"} 37
beacon_block_production_seconds_bucket{le="1"} 65
beacon_block_production_seconds_bucket{le="2.5"} 66
beacon_block_production_seconds_bucket{le="5"} 66
beacon_block_production_seconds_bucket{le="10"} 66
beacon_block_production_seconds_bucket{le="+Inf"} 66
beacon_block_production_seconds_sum 34.225780452
beacon_block_production_seconds_count 66
```

## Additional Info

Cheers to @jimmygchen for helping spot this.
2023-10-18 06:52:50 +00:00
Jimmy Chen
64c156c0c1 Pre-generate test blobs bundle to improve test time. (#4829)
## Issue Addressed

Addresses #4778, and potentially fixes the flaky deneb builder test `builder_works_post_deneb`.

The [deneb builder test](c5c84f1213/beacon_node/http_api/tests/tests.rs (L5371)) has been quite flaky on our CI (`release-tests`) since it was introduced. I'm guessing that it might be timing out on the builder `get_header` call (1 second), and therefore the local payload is used, while the test expects builder payload to be used. 

On my machine the [`get_header` ](c5c84f1213/beacon_node/execution_layer/src/test_utils/mock_builder.rs (L367)) call takes about 550ms, which could easily go over 1s on slower environments (our windows CI runner is much slower than the ubuntu one).

I profiled the test and it showed that `blob_to_kzg_commitment` and `compute_kzg_proof` were taking a large chunk of time, so perhaps pre-generating the blobs could help stabilise this test.

## Proposed Changes

Pre-generate blobs bundle for Mainnet and Minimal presets.

Before the change `get_header` took about **550ms**; it is now reduced to **50-55ms**. If a timeout was indeed the cause of the flaky test, this fix should stabilise it. This also brings the flaky `builder_works_post_deneb` test time down from 50s to 10s (8s if we only use a single blob).
2023-10-18 04:40:29 +00:00
Mac L
369b624b19 Fix broken Nethermind integration tests (#4836)
## Issue Addressed

CI is currently blocked by persistently failing integration tests.

## Proposed Changes

Use latest Nethermind release and apply the appropriate fixes as there have been breaking changes.
Also increase the timeout since I had some local timeouts.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: antondlr <anton@delaruelle.net>
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
2023-10-18 04:08:55 +00:00
Michael Sproul
8b0545da12
Merge Deneb (#4054) 2023-10-17 10:58:11 +11:00
realbigsean
283ec8cf24
Deneb pr updates 2 (#4851)
* use workspace deps in kzg crate

* delete unused blobs dp path field

* full match on fork name in engine api get payload v3

* only accept v3 payloads on get payload v3 endpoint in mock el

* remove FIXMEs related to merge transition tests

* move static tx to test utils

* default max_per_epoch_activation_churn_limit to mainnet value

* remove unnecessary async

* remove comment

* use task executor in `blob_sidecars` endpoint
2023-10-17 09:53:46 +11:00
Michael Sproul
ba0567d3ef
Merge remote-tracking branch 'origin/unstable' into deneb-free-blobs 2023-10-16 16:33:37 +11:00
Age Manning
cf544b3996
Very minor own nitpicks (#4845) 2023-10-16 16:30:14 +11:00
Michael Sproul
2d662f78ae
Use Deneb fork in generate_genesis_header
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
2023-10-16 16:24:27 +11:00
Jimmy Chen
38e7172508
Add blob_sidecar event to SSE (#4790)
* Add `blob_sidecar` event to SSE.

* Return 202 if a block is published but failed blob validation when validation level is `Gossip`.

* Move `BlobSidecar` event to `process_gossip_blob` and add test.

* Emit `BlobSidecar` event when blobs are received over rpc.

* Improve test assertions on `SseBlobSidecar`s.

* Add quotes to blob index serialization in `SseBlobSidecar`

Co-authored-by: realbigsean <seananderson33@GMAIL.com>

---------

Co-authored-by: realbigsean <seananderson33@GMAIL.com>
2023-10-12 10:13:08 -04:00
realbigsean
4555e33048
Remove serde derive references (#4830)
* remove remaining uses of serde_derive

* fix lockfile

---------

Co-authored-by: João Oliveira <hello@jxs.pt>
2023-10-11 13:01:30 -04:00
Michael Sproul
d9acee5a72
Delete unused ssz_types file (#4824) 2023-10-11 12:49:08 -04:00
ethDreamer
8660043024
Prevent Overflow LRU Cache from Exploding (#4801)
* Initial Commit of State LRU Cache

* Build State Caches After Reconstruction

* Cleanup Duplicated Code in OverflowLRUCache Tests

* Added Test for State LRU Cache

* Prune Cache of Old States During Maintenance

* Address Michael's Comments

* Few More Comments

* Removed Unused impl

* Last touch up

* Fix Clippy
2023-10-10 23:51:00 -05:00
Jimmy Chen
4ad7e15732
Address Clippy 1.73 lints on Deneb branch (#4810)
* Address Clippy 1.73 lints (#4809)

## Proposed Changes

Fix Clippy lints enabled by default in Rust 1.73.0, released today.

* Address Clippy 1.73 lints.

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-10-06 12:23:57 +05:30
Michael Sproul
c3321dddb7 Reduce attestation subscription spam from VC (#4806)
## Proposed Changes

Instead of sending every attestation subscription every slot to every BN:

- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.
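
The schedule above could be sketched roughly as follows. The offsets come from the list above, while the `Subscription` struct, its fields and the retry rule (retry in the next slot after any failed send) are simplifying assumptions, not the actual `SubscriptionSlots` implementation.

```rust
/// Offsets (in slots before the duty) at which a subscription is sent.
const SUBSCRIPTION_OFFSETS: [u64; 8] = [32, 16, 8, 7, 6, 5, 4, 3];

/// Hypothetical tracker for a single attestation subscription.
struct Subscription {
    /// Slot at which the attestation duty occurs.
    duty_slot: u64,
    /// Set when the most recent send failed and should be retried.
    retry_pending: bool,
}

impl Subscription {
    /// Returns true if the subscription should be sent at `current_slot`.
    fn should_send(&self, current_slot: u64) -> bool {
        if current_slot >= self.duty_slot {
            return false; // the duty slot has arrived or passed
        }
        let slots_until_duty = self.duty_slot - current_slot;
        self.retry_pending || SUBSCRIPTION_OFFSETS.contains(&slots_until_duty)
    }

    /// Records whether the last send succeeded, arming a retry if not.
    fn record_result(&mut self, success: bool) {
        self.retry_pending = !success;
    }
}
```

In this sketch a subscription is silent on most slots and only sends at a scheduled offset or after a failure.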

## Additional Info

- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on #4774 for testing.
2023-10-06 06:26:18 +00:00
chonghe
accb56e4fb Revise doc API section (#4798)
## Issue Addressed

Partially #4788 

## Proposed Changes

Remove documentation on the `/lighthouse/database/reconstruct` API to avoid confusion, as calling the API during historical block download will show an error in the beacon log.

Add Events API about `payload_attributes`

Co-authored-by: chonghe <44791194+chong-he@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-10-06 04:34:47 +00:00
Michael Sproul
9769a247b2 Address Clippy 1.73 lints (#4809)
## Proposed Changes

Fix Clippy lints enabled by default in Rust 1.73.0, released today.
2023-10-06 03:05:47 +00:00
realbigsean
203ac65041
Merge pull request #4808 from jimmygchen/merge-unstable-to-deneb-20231005
Merge `unstable` branch to deneb 20231005
2023-10-05 11:17:48 -04:00
Jimmy Chen
a96963fd5f
Re-commit corrupted key files 2023-10-06 00:24:09 +11:00
Jimmy Chen
3692622339
Merge branch 'unstable' into merge-unstable-to-deneb-20231005 2023-10-06 00:20:54 +11:00
Michael Sproul
b82f7843ff Use peeking_take_while in BlockReplayer (#4803)
## Issue Addressed

While reviewing #4801 I noticed that our use of `take_while` in the block replayer means that if a state root iterator _with gaps_ is provided, some additional state roots will be dropped unnecessarily. In practice the impact is small, because once there's _one_ state root miss, the whole tree hash cache needs to be built anyway, and subsequent misses are less costly. However this was still a little inefficient, so I figured it's better to fix it.

## Proposed Changes

Use [`peeking_take_while`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.peeking_take_while) to avoid consuming the next element when checking whether it satisfies the slot predicate.

## Additional Info

There's a gist here that shows the basic dynamics in isolation: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=40b623cc0febf9ed51705d476ab140c5. Changing the `peeking_take_while` to a `take_while` causes the assert to fail. Similarly I've added a new test `block_replayer_peeking_state_roots` which fails if the same change is applied inside `get_state_root`.
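
The difference can also be shown with the standard library alone. The sketch below uses `Peekable::next_if` as a stand-in for itertools' `peeking_take_while`; both function names are hypothetical and this is not the linked gist.

```rust
/// Collects the leading items `<= limit` with `take_while`, then returns
/// them together with whatever is left in the iterator.
fn split_with_take_while(items: &[u64], limit: u64) -> (Vec<u64>, Vec<u64>) {
    let mut iter = items.iter().copied();
    // `take_while` has to pull the first failing item to test it,
    // and that item is then lost.
    let head: Vec<u64> = iter.by_ref().take_while(|x| *x <= limit).collect();
    (head, iter.collect())
}

/// Same split, but each item is peeked at before it is consumed.
fn split_with_peeking(items: &[u64], limit: u64) -> (Vec<u64>, Vec<u64>) {
    let mut iter = items.iter().copied().peekable();
    let mut head = Vec::new();
    // `next_if` only advances when the predicate holds.
    while let Some(x) = iter.next_if(|x| *x <= limit) {
        head.push(x);
    }
    (head, iter.collect())
}
```

For the input `[1, 2, 5, 6]` with a limit of 3, the first function leaves only `[6]` in the remainder because `5` was consumed by the check, while the second leaves `[5, 6]`.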
2023-10-05 06:03:24 +00:00
Jimmy Chen
72563ffb41
Fix CI tests 2023-10-05 16:38:06 +11:00
Jimmy Chen
c5c84f1213
Merge branch 'unstable' into merge-unstable-to-deneb-20231005
# Conflicts:
#	.github/workflows/test-suite.yml
#	Cargo.lock
#	beacon_node/execution_layer/Cargo.toml
#	beacon_node/execution_layer/src/test_utils/mock_builder.rs
#	beacon_node/execution_layer/src/test_utils/mod.rs
#	beacon_node/network/src/service/tests.rs
#	consensus/types/src/builder_bid.rs
2023-10-05 15:54:44 +11:00
Nico Flaig
4b619c63d7 Exit aggregation step early if no validator is aggregator (#4774)
## Issue Addressed

Closes https://github.com/sigp/lighthouse/issues/4712

## Proposed Changes

Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.

## Additional Info

Related issue https://github.com/ChainSafe/lodestar/issues/5553
2023-10-05 02:14:55 +00:00
duguorong009
7d537214b7 fix(validator_client): return http 404 rather than 405 in http api (#4758)
## Issue Addressed
- Close #4596 

## Proposed Changes
- Add `Filter::recover` to handle rejections specifically as 404 NOT FOUND

## Additional Info

Similar to PR #3836
2023-10-04 00:43:29 +00:00
Akihito Nakano
ba8bcf4bd3 Remove deficit gossipsub scoring during topic transition (#4486)
## Issue Addressed

This PR closes https://github.com/sigp/lighthouse/issues/3237

## Proposed Changes

Remove topic weight of old topics when the fork happens.

## Additional Info

- Divided `NetworkService::start()` into `NetworkService::build()` and `NetworkService::start()` for ease of testing.
2023-10-04 00:43:28 +00:00
Michael Sproul
6ec649a4e2 Optimise head block root API (#4799)
## Issue Addressed

We've had a report of sync committee performance suffering with the beacon processor HTTP API prioritisations.

## Proposed Changes

Increase the priority of `/eth/v1/beacon/blocks/head/root` requests, which are used by the validator client to form sync committee messages, here:

441fc1691b/validator_client/src/sync_committee_service.rs (L181-L188)

Additionally, avoid loading the blinded block in all but the `block_id=block_root` case. I'm not sure why we were doing this previously; I suspect it was just an oversight during the implementation of the `finalized` status on API requests.

## Additional Info

I think this change should have minimal negative impact as:

- The block root endpoint is quick to compute (a few ms max).
- Only the priority of `head` requests is increased. Analytical processes that are making lots of block root requests for past slots are unable to DoS the beacon processor, as their requests will still be processed after attestations.
2023-10-03 23:59:37 +00:00
Pawan Dhananjay
5bab9b866e Don't downscore peers on duplicate blocks (#4791)
## Issue Addressed

N/A

## Proposed Changes

We were downscoring a peer for sending us a block that we already have in fork choice. This is unnecessary, as we get duplicates in lighthouse only when:
1. We published the block, so the block is already in fork choice
2. We imported the same block over rpc

In both scenarios, the peer who sent us the block over gossip is not at fault.

This isn't exploitable, as valid duplicates will get dropped by the gossipsub duplicate filter.
2023-10-03 23:59:35 +00:00
Lucas Saldanha
f7daf82430 Removed old Teku mainnet bootnode ENRs (#4786)
## Issue Addressed

N/A

## Proposed Changes

Removing the two Teku mainnet bootnodes that are being sunset.

## Additional Info

We are leaving only these two bootnodes: https://github.com/eth-clients/eth2-networks/blob/master/shared/mainnet/bootstrap_nodes.txt#L10-L11
2023-10-03 23:59:34 +00:00
Divma
f11884ccdb enforce non zero enr ports (#4776)
## Issue Addressed

Right now lighthouse accepts zero as an enr port. Since enr ports should be reachable, zero ports should be rejected here.

## Proposed Changes

- update the config to use `NonZeroU16` as an ENR port for all enr-related fields.
- the enr builder from config now sets the enr to the listening port only if the enr port is not already set (prev behaviour) and the listening port is not zero (new behaviour)
- reject zero listening ports when used with `enr-match`. 
- boot node now rejects listening port as zero, since those are advertised.
- generate-bootnode-enr now also rejects zero listening ports for the same reason.
- update local network scripts
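
A minimal sketch of the first two points, using `std::num::NonZeroU16`. The function names `parse_enr_port` and `advertised_port` are hypothetical and do not correspond to Lighthouse's config code.

```rust
use std::num::NonZeroU16;

/// Parses a CLI-style port string, rejecting `0` and non-numeric input.
fn parse_enr_port(raw: &str) -> Result<NonZeroU16, String> {
    let port: u16 = raw
        .parse()
        .map_err(|e| format!("invalid port {raw:?}: {e}"))?;
    NonZeroU16::new(port).ok_or_else(|| "ENR port must be non-zero".to_string())
}

/// Chooses the port to advertise in the ENR: an explicitly configured ENR
/// port wins; otherwise the listening port is used, but only if non-zero.
fn advertised_port(enr_port: Option<NonZeroU16>, listening_port: u16) -> Option<NonZeroU16> {
    enr_port.or_else(|| NonZeroU16::new(listening_port))
}
```

Encoding the constraint in the type means a zero port cannot reach the ENR builder once parsing has succeeded.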

## Additional Info

Unrelated, but why do we overwrite `enr-x-port` values with listening ports if `enr-match` is present? We probably should only do this for enr values that are not already set.
2023-10-03 23:59:34 +00:00