1. Add commitments to logs and update the `Display` implementation of `KzgCommitment` to become truncated similarly to block root.
I've been finding it difficult to debug scenarios involving multiple blobs for the same `(index, block_root)`. Logging the commitment will help with this, we can match it to what exists in the block.
Example output:
```
Oct 20 21:13:36.700 DEBG Successfully verified gossip blob commitment: 0xa3c1…1cd8, index: 0, root: 0xf31e…f9de, slot: 154568
Oct 20 21:13:36.785 DEBG Successfully verified gossip block commitments: [0xa3c1…1cd8, 0x8655…02ff, 0x8d6a…955a, 0x84ac…3a1b, 0x9752…629b, 0xb9fc…20fb], root: 0xf31eeb732702e429e89057b15e1c0c631e8452e09e03cb1924353f536ef4f9de, slot: 154568, graffiti: teku/besu, service: beacon
```
Example output in a block with no blobs (this will show up pre-deneb):
```
426734:Oct 20 21:15:24.113 DEBG Successfully verified gossip block, commitments: [], root: 0x619db1360ba0e8d44ae2a0f2450ebca47e167191feecffcfac0e8d7b6c39623c, slot: 154577, graffiti: teku/nethermind, service: beacon, module: beacon_chain::beacon_chain:2765
```
2. Remove `strum::IntoStaticStr` from `AvailabilityCheckError`. This is because `IntoStaticStr` end up dropping information inside the enum. So kzg commitments in this error are dropped, making it more difficult to debug
```
AvailabilityCheckError::KzgCommitmentMismatch {
blob_commitment: KzgCommitment,
block_commitment: KzgCommitment,
},
```
which is output as just `AvailabilityCheckError`
3. Some additional misc sync logs I found useful in debugging https://github.com/sigp/lighthouse/pull/4869
4. This downgrades ”Block returned for single block lookup not present” to debug because I don’t think we can fix the scenario that causes this unless we can cancel inflight rpc requests
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
Fixes#4697.
This also unblocks the state pruning PR (#4835). Because self healing breaks if state pruning is applied to a database with missing block roots.
## Proposed Changes
- Fill in the missing block roots between last restore point slot and split slot when upgrading to latest database version.
## Issue Addressed
Makes lighthouse compliant with new kzg changes in https://github.com/ethereum/consensus-specs/releases/tag/v1.4.0-beta.3
## Proposed Changes
1. Adds new official trusted setup
2. Refactors kzg to match upstream changes in https://github.com/ethereum/c-kzg-4844/pull/377
3. Updates pre-generated `BlobBundle` to work with official trusted setup. ~~Using json here instead of ssz to account for different value of `MaxBlobCommitmentsPerBlock` in minimal and mainnet. By using json, we can just use one pre generated bundle for both minimal and mainnet. Size of 2 separate ssz bundles is approximately equal to one json bundle cc @jimmygchen~~
Dunno what I was doing, ssz works without any issues
4. Stores trusted_setup as just bytes in eth2_network_config so that we don't have kzg dependency in that lib and in lcli.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
## Issue Addressed
N/A
## Proposed Changes
Sends blocks and blobs from http_api to the network channel for publishing in a single network channel send. This is to avoid overhead of multiple calls.
Also adds a metric for rpc blob retrieval duration.
## Issue Addressed
Addresses the recent CI failures caused by caching `blst` for the wrong CPU type.
## Proposed Changes
- Use `FEATURES: jemalloc,portable` when building Lighthouse & `lcli` in tests
- Add a new `TEST_FEATURES` and set to `portable` for all CI test jobs.
- Updated Makefiles to read the `TEST_FEATURES` environment variable, and default to none.
## Issue Addressed
We run `clippy` as part of our CI, so it would help new devs if we add the `make lint` command to the dev setup section in the Lighthouse book.
## Issue Addressed
#4512
Which issue # does this PR address?
## Proposed Changes
Add inactivity calculation for Altair
Please list or describe the changes introduced by this PR.
Add inactivity calculation for Altair
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
## Issue Addressed
updates libp2p to the latest version and uses the new `SwarmBuilder`. Superseeds https://github.com/sigp/lighthouse/pull/4695/
CC @mxinden I don't think we can use both `bandwidth_loggers` with the new syntax right?
## Issue Addressed
Fix a bug in `lcli skip-slots` that resulted in it always writing the pre-state to the output file.
## Proposed Changes
Correctly keep track of the post-state, and write it.
## Issue Addressed
#4827
## Proposed Changes
This PR introduces a new build-arg to the Lighthouse Dockerfile: `CARGO_USE_GIT_CLI`. This arg will be passed into the `CARGO_NET_GIT_FETCH_WITH_CLI` [environment variable](https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli), which instructs `cargo` to use the git CLI during `fetch` operations instead of the git library. Doing so works around [a bug](https://github.com/rust-lang/cargo/issues/10583) with the git library that causes it to go OOM during `fetch` operations on `arm64` platforms.
The default value is `false` so this doesn't affect Lighthouse builds or the CI pipeline. Running a build with `--build-arg CARGO_USE_GIT_CLI=true` will activate it, which is necessary to cross-compile the `arm64` binary when not using `cross` (i.e., when building via the Dockerfile instead of natively if you don't have a rust environment ready to go).
Special thanks to @michaelsproul for helping me repro the initial problem.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Following the conversation on https://github.com/libp2p/rust-libp2p/pull/3666 the changes introduced in this PR will allow us to give more insights if the bandwidth limitations happen at the transport level, namely if quic helps vs yamux and it's [window size limitation](https://github.com/libp2p/rust-yamux/issues/162) or if the bottleneck is at the gossipsub level.
## Proposed Changes
introduce new quic and tcp bandwidth metric gauges.
cc @mxinden (turned out to be easier, Thomas gave me a hint)
## Issue Addressed
Related to https://github.com/sigp/lighthouse/issues/4676.
Deneb-specifc CI code to be removed before merging to `unstable`. Dot not merge until we're ready to merge into `unstable`, as we may need to release deneb docker images before merging.
Keep in mind that most of the changes in the below PR (to `unstable`) have already
been merged to `deneb-free-blobs`, so merging `deneb-free-blobs` into `unstable` would include those changes - it would be ok if the release runners are ready, otherwise we may want to exclude them before merging.
- https://github.com/sigp/lighthouse/pull/4592
## Issue Addressed
Fix OOMs caused by too many concurrent tests. The runner machine is currently liable to run `32 * 5 = 160` tests in parallel. If each test uses say 300MB max, this is 48GB of RAM!
## Proposed Changes
Reduce the number of threads per runner job to 8. This should cap the memory at 4x lower than the current limit, i.e. around 12GB. If we continue to run out of RAM, we should consider more sophisticated limits.
## Issue Addressed
Fix a deadlock in the tests that was causing tests on tree-states to run for hours without finishing: https://github.com/sigp/lighthouse/actions/runs/6491194654/job/17628138360.
## Proposed Changes
Avoid using a Mutex under the Rayon `par_iter`. Instead, use an `AtomicUsize`. I've run the new version several times in a loop and it hasn't deadlocked (it was deadlocking consistently on tree-states).
## Additional Info
The same bug exists in unstable and tree-states, but I'm not sure why it was triggering so consistently on the tree-states branch.
## Proposed Changes
Add `compare_fields(as_iter)` as a field attribute to `compare_fields_derive`. This allows any iterable type to be compared in the same as a slice (by index).
This is forwards-compatible with tree-states types like `List` and `Vector` which can not be cast to slices.
## Issue Addressed
N/A
## Proposed Changes
I saw a false positive on the link-check CI run and while investigating I noticed that this link technically 404's but is not "dead" in the strict sense. I have updated it to the correct path.
## Proposed Changes
Fix the misplacement of the total block production time metric, which occurred during a previous refactor.
Total block production times are no longer skewed low (data from Holesky + blockdreamer):
```
# HELP beacon_block_production_seconds Full runtime of block production
# TYPE beacon_block_production_seconds histogram
beacon_block_production_seconds_bucket{le="0.005"} 0
beacon_block_production_seconds_bucket{le="0.01"} 0
beacon_block_production_seconds_bucket{le="0.025"} 0
beacon_block_production_seconds_bucket{le="0.05"} 0
beacon_block_production_seconds_bucket{le="0.1"} 0
beacon_block_production_seconds_bucket{le="0.25"} 0
beacon_block_production_seconds_bucket{le="0.5"} 37
beacon_block_production_seconds_bucket{le="1"} 65
beacon_block_production_seconds_bucket{le="2.5"} 66
beacon_block_production_seconds_bucket{le="5"} 66
beacon_block_production_seconds_bucket{le="10"} 66
beacon_block_production_seconds_bucket{le="+Inf"} 66
beacon_block_production_seconds_sum 34.225780452
beacon_block_production_seconds_count 66
```
## Additional Info
Cheers to @jimmygchen for helping spot this.
## Issue Addressed
Addresses #4778, and potentially fixes the flaky deneb builder test `builder_works_post_deneb`.
The [deneb builder test](c5c84f1213/beacon_node/http_api/tests/tests.rs (L5371)) has been quite flaky on our CI (`release-tests`) since it was introduced. I'm guessing that it might be timing out on the builder `get_header` call (1 second), and therefore the local payload is used, while the test expects builder payload to be used.
On my machine the [`get_header` ](c5c84f1213/beacon_node/execution_layer/src/test_utils/mock_builder.rs (L367)) call takes about 550ms, which could easily go over 1s on slower environments (our windows CI runner is much slower than the ubuntu one).
I did a profile on the test and it showed that `blob_to_kzg_commiment` and `compute_kzg_proof` was taking a large chunk of time, so perhaps pre-generating the blobs could help stablise this test.
## Proposed Changes
Pre-generate blobs bundle for Mainnet and Minimal presets.
Before the change `get_header` took about **550ms**, and it's now reduced to **50-55ms** after the change. If timeout was indeed the cause of the flaky test, this fix should stablise it. This also brings the flaky `builder_works_post_deneb` test time from 50s to 10s. (8s if we only use a single blob)
## Issue Addressed
CI is currently blocked by persistently failing integration tests.
## Proposed Changes
Use latest Nethermind release and apply the appropriate fixes as there have been breaking changes.
Also increase the timeout since I had some local timeouts.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: antondlr <anton@delaruelle.net>
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* use workspace deps in kzg crate
* delete unused blobs dp path field
* full match on fork name in engine api get payload v3
* only accept v3 payloads on get payload v3 endpoint in mock el
* remove FIXMEs related to merge transition tests
* move static tx to test utils
* default max_per_epoch_activation_churn_limit to mainnet value
* remove unnecessary async
* remove comment
* use task executor in `blob_sidecars` endpoint
* Add `blob_sidecar` event to SSE.
* Return 202 if a block is published but failed blob validation when validation level is `Gossip`.
* Move `BlobSidecar` event to `process_gossip_blob` and add test.
* Emit `BlobSidecar` event when blobs are received over rpc.
* Improve test assertions on `SseBlobSidecar`s.
* Add quotes to blob index serialization in `SseBlobSidecar`
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
---------
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
* Initial Commit of State LRU Cache
* Build State Caches After Reconstruction
* Cleanup Duplicated Code in OverflowLRUCache Tests
* Added Test for State LRU Cache
* Prune Cache of Old States During Maintenance
* Address Michael's Comments
* Few More Comments
* Removed Unused impl
* Last touch up
* Fix Clippy
## Proposed Changes
Instead of sending every attestation subscription every slot to every BN:
- Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur.
- Track whether each subscription is sent successfully and retry it in subsequent slots if necessary.
## Additional Info
- [x] Add unit tests for `SubscriptionSlots`.
- [x] Test on Holesky.
- [x] Based on #4774 for testing.
## Issue Addressed
Partially #4788
## Proposed Changes
Remove documentation on `/lighthouse/database/reconstruct` API to avoid confusion as the calling the API during historical block download will show an error in the beacon log
Add Events API about `payload_attributes`
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
Co-authored-by: chonghe <44791194+chong-he@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
While reviewing #4801 I noticed that our use of `take_while` in the block replayer means that if a state root iterator _with gaps_ is provided, some additonal state roots will be dropped unnecessarily. In practice the impact is small, because once there's _one_ state root miss, the whole tree hash cache needs to be built anyway, and subsequent misses are less costly. However this was still a little inefficient, so I figured it's better to fix it.
## Proposed Changes
Use [`peeking_take_while`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.peeking_take_while) to avoid consuming the next element when checking whether it satisfies the slot predicate.
## Additional Info
There's a gist here that shows the basic dynamics in isolation: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=40b623cc0febf9ed51705d476ab140c5. Changing the `peeking_take_while` to a `take_while` causes the assert to fail. Similarly I've added a new test `block_replayer_peeking_state_roots` which fails if the same change is applied inside `get_state_root`.
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/4712
## Proposed Changes
Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.
## Additional Info
Related issue https://github.com/ChainSafe/lodestar/issues/5553
## Issue Addressed
- Close#4596
## Proposed Changes
- Add `Filter::recover` to handle rejections specifically as 404 NOT FOUND
Please list or describe the changes introduced by this PR.
## Additional Info
Similar to PR #3836
## Issue Addressed
This PR closes https://github.com/sigp/lighthouse/issues/3237
## Proposed Changes
Remove topic weight of old topics when the fork happens.
## Additional Info
- Divided `NetworkService::start()` into `NetworkService::build()` and `NetworkService::start()` for ease of testing.
## Issue Addressed
We've had a report of sync committee performance suffering with the beacon processor HTTP API prioritisations.
## Proposed Changes
Increase the priority of `/eth/v1/beacon/blocks/head/root` requests, which are used by the validator client to form sync committee messages, here:
441fc1691b/validator_client/src/sync_committee_service.rs (L181-L188)
Additionally, avoid loading the blinded block in all but the `block_id=block_root` case. I'm not sure why we were doing this previously, I suspect it was just an oversight during the implementation of the `finalized` status on API requests.
## Additional Info
I think this change should have minimal negative impact as:
- The block root endpoint is quick to compute (a few ms max).
- Only the priority of `head` requests is increased. Analytical processes that are making lots of block root requests for past slots are unable to DoS the beacon processor, as their requests will still be processed after attestations.
## Issue Addressed
N/A
## Proposed Changes
We were currently downscoring a peer for sending us a block that we already have in fork choice. This is unnecessary as we get duplicates in lighthouse only when
1. We published the block, so the block is already in fork choice
2. We imported the same block over rpc
In both scenarios, the peer who sent us the block over gossip is not at fault.
This isn't exploitable as valid duplicates will get dropped by the gossipsub duplicate filter