lighthouse

Author	SHA1	Message	Date
Paul Hauner	d53d43844c	Suggestions for Capella `beacon_chain` (#3999 ) * Remove CapellaReadiness::NotSynced Some EEs have a habit of flipping between synced/not-synced, which caused some spurious "Not ready for the merge" messages back before the merge. For the merge, if the EE wasn't synced the CE simply wouldn't go through the transition (due to optimistic sync stuff). However, we don't have that hard requirement for Capella; the CE will go through the fork and just wait for the EE to catch up. I think that removing `NotSynced` here will avoid false-positives on the "Not ready" logs. We'll be creating other WARN/ERRO logs if the EE isn't synced, anyway. * Change some Capella readiness logging There are two changes here: 1. Shorten the log messages, for readability. 2. Change the hints. Connecting a Capella-ready LH to a non-Capella-ready EE gives this log: ``` WARN Not ready for Capella info: The execution endpoint does not appear to support the required engine api methods for Capella: Required Methods Unsupported: engine_getPayloadV2 engine_forkchoiceUpdatedV2 engine_newPayloadV2, service: slot_notifier ``` This variant of error doesn't get a "try updating" style hint, when it's the one that needs it. This is because we detect the method-not-found response from the EE and return default capabilities, rather than indicating that the request fails. I think it's fair to say that an EE upgrade is required whenever it doesn't provide the required methods. I changed the `ExchangeCapabilitiesFailed` message since that can only happen when the EE fails to respond with anything other than success or not-found.	2023-02-21 11:05:36 +11:00
Paul Hauner	c3c181aa03	Remove "eip4844" network (#4008 )	2023-02-21 11:01:22 +11:00
Michael Sproul	0b6850221e	Fix Capella schema downgrades (#4004 )	2023-02-20 17:50:42 +11:00
Paul Hauner	9a41f65b89	Add capella fork epoch (#3997 )	2023-02-17 16:25:20 +11:00
Michael Sproul	1f419f4653	Merge pull request #3981 from michaelsproul/capella-update Update `capella` to `unstable`	2023-02-17 12:47:34 +11:00
Michael Sproul	066c27750a	Merge remote-tracking branch 'origin/staging' into capella-update	2023-02-17 12:05:36 +11:00
Paul Hauner	4aa8a2ab12	Suggestions for Capella `execution_layer` (#3983 ) * Restrict Engine::request to FnOnce * Use `Into::into` * Impl IntoIterator for VariableList * Use Instant rather than SystemTime	2023-02-17 11:58:33 +11:00
Michael Sproul	ebf2fec5d0	Fix exec integration tests for Geth v1.11.0 (#3982 ) ## Proposed Changes * Bump Go from 1.17 to 1.20. The latest Geth release v1.11.0 requires 1.18 minimum. * Prevent a cache miss during payload building by using the right fee recipient. This prevents Geth v1.11.0 from building a block with 0 transactions. The payload building mechanism is overhauled in the new Geth to improve the payload every 2s, and the tests were failing because we were falling back on a `getPayload` call with no lookahead due to `get_payload_id` cache miss caused by the mismatched fee recipient. Alternatively we could hack the tests to send `proposer_preparation_data`, but I think the static fee recipient is simpler for now. * Add support for optionally enabling Lighthouse logs in the integration tests. Enable using `cargo run --release --features logging/test_logger`. This was very useful for debugging.	2023-02-16 23:34:33 +00:00
Jimmy Chen	245e922c7b	Improve testing slot clock to allow manipulation of time in tests (#3974 ) ## Issue Addressed I discovered this issue while implementing [this test](https://github.com/jimmygchen/lighthouse/blob/test-example/beacon_node/network/src/beacon_processor/tests.rs#L895), where I tried to manipulate the slot clock with: `rig.chain.slot_clock.set_current_time(duration);` however the change doesn't get reflected in the `slot_clock` in `ReprocessQueue`, and I realised `slot_clock` was cloned a few times in the code, and therefore changing the time in `rig.chain.slot_clock` doesn't have any effect in `ReprocessQueue`. I've incorporated the suggestion from @paulhauner and @michaelsproul - wrapping the `ManualSlotClock.current_time` (`RwLock<Duration>`) in an `Arc`, and the above test now passes. Let's see if this breaks any existing tests :)	2023-02-16 23:34:32 +00:00
Divma	ffeb8b6e05	blacklist tests in windows (#3961 ) ## Issue Addressed Windows tests for subscription and unsubscriptions fail in CI sporadically. We usually ignore these failures, so this PR aims to help reduce the failure noise. Associated issue is https://github.com/sigp/lighthouse/issues/3960	2023-02-16 23:34:30 +00:00
Michael Sproul	461bda6e85	Execution engine suggestions from code review Co-authored-by: Paul Hauner <paul@paulhauner.com>	2023-02-16 16:54:05 +11:00
realbigsean	55753f8bc8	bump recursion limit	2023-02-15 16:32:50 -05:00
realbigsean	ca8e341649	fix compilation after merge	2023-02-15 14:30:39 -05:00
realbigsean	8320b918ae	merge self limiter	2023-02-15 14:26:18 -05:00
realbigsean	4d0b0f681d	merge self limiter	2023-02-15 14:25:58 -05:00
realbigsean	b805fa6279	merge with upstream	2023-02-15 14:20:12 -05:00
realbigsean	87d1fbeb21	Merge pull request #3905 from emhane/beacon_chain_tests Debug CI	2023-02-15 11:53:41 -05:00
Emilia Hane	aaf6404d4f	Remove unused generic	2023-02-15 17:45:22 +01:00
Michael Sproul	2fcfdf1a01	Fix docker and deps (#3978 ) ## Proposed Changes - Fix this cargo-audit failure for `sqlite3-sys`: https://github.com/sigp/lighthouse/actions/runs/4179008889/jobs/7238473962 - Prevent the Docker builds from running out of RAM on CI by removing `gnosis` and LMDB support from the `-dev` images (see: https://github.com/sigp/lighthouse/pull/3959#issuecomment-1430531155, successful run on my fork: https://github.com/michaelsproul/lighthouse/actions/runs/4179162480/jobs/7239537947).	2023-02-15 11:51:46 +00:00
Emilia Hane	2672cf40bb	Better fix for debug tests	2023-02-15 11:47:56 +01:00
Emilia Hane	9fea440ae6	Fix lint	2023-02-15 09:54:36 +01:00
Michael Sproul	fd379ae2e2	Upgrade sqlite3	2023-02-15 09:44:37 +01:00
realbigsean	44dbccfeae	add v3 to capabilities	2023-02-15 09:23:59 +01:00
Emilia Hane	13efd47238	fixup! Disable use of system time in tests	2023-02-15 09:20:30 +01:00
Michael Sproul	918b688f72	Simplify payload traits and reduce cloning (#3976 ) * Simplify payload traits and reduce cloning * Fix self limiter	2023-02-15 14:17:56 +11:00
Emilia Hane	9e4abc79fb	Comment out tests that use system time	2023-02-14 14:12:50 +01:00
Emilia Hane	73c7ad73b8	Disable use of system time in tests	2023-02-14 13:33:38 +01:00
Emilia Hane	148385eb70	Remove unused error	2023-02-14 12:43:13 +01:00
Emilia Hane	810d875b02	Fix merge conflicts with eip4844	2023-02-14 12:42:41 +01:00
Emilia Hane	8200d37045	Merge pull request #2 from realbigsean/sean-debug-ci Debug CI Sean's PR feedback	2023-02-14 12:24:15 +01:00
Michael Sproul	10d32ee04c	Quote Capella BeaconState fields (#3967 )	2023-02-14 14:41:28 +11:00
Michael Sproul	5fc798dd1b	Merge pull request #3973 from michaelsproul/capella-merge Update Capella to latest `unstable`	2023-02-14 14:41:14 +11:00
Age Manning	8dd9249177	Enforce a timeout on peer disconnect (#3757 ) On heavily crowded networks, we are seeing many attempted connections to our node every second. Often these connections come from peers that have just been disconnected. This can be for a number of reasons including: - We have deemed them to be not as useful as other peers - They have performed poorly - They have dropped the connection with us - The connection was spontaneously lost - They were randomly removed because we have too many peers In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place. This PR adds a timeout that prevents recently disconnected peers from reconnecting to us. Technically we implement a ban at the swarm layer to prevent immediate reconnections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers. This cache is bounded in time. An extra space bound could be added should people consider this a risk. Co-authored-by: Diva M <divma@protonmail.com>	2023-02-14 03:25:42 +00:00
Michael Sproul	f7bd4bf06e	Update block rewards API for Capella	2023-02-14 12:09:40 +11:00
Michael Sproul	d53ccf8fc7	Placeholder for BlobsByRange outbound rate limit	2023-02-14 12:08:14 +11:00
Michael Sproul	18c8cab4da	Merge remote-tracking branch 'origin/unstable' into capella-merge	2023-02-14 12:07:27 +11:00
realbigsean	d2ecbd942e	fix a couple new lints	2023-02-13 17:13:47 -05:00
realbigsean	cd8757de1c	Revert "make batch size check compile time panic" This reverts commit `68f2484efc`.	2023-02-13 16:51:55 -05:00
realbigsean	68f2484efc	make batch size check compile time panic	2023-02-13 16:51:46 -05:00
realbigsean	4c3561dcaf	make batch size check compile time panic	2023-02-13 16:50:33 -05:00
realbigsean	8f9c5cfca9	remove unused structs	2023-02-13 16:47:36 -05:00
realbigsean	ad9af6d8b1	complete match for `has_context_bytes`	2023-02-13 16:44:54 -05:00
realbigsean	fc2d07b4e3	allow unused	2023-02-13 16:36:38 -05:00
realbigsean	28702c9d5d	merge upstream, add back `get_blobs` logic	2023-02-13 16:29:21 -05:00
realbigsean	e58d7e85bf	Merge pull request #3951 from realbigsean/fix-blob-tx-ssz Encode blob transactions as signed blob transactions	2023-02-13 14:50:44 -05:00
realbigsean	e1cb4b8a11	Merge pull request #3869 from emhane/blobs_freezer Blobs freezer	2023-02-13 14:40:02 -05:00
Nazar Hussain	fa1d4c7054	Invalid cross build feature flag (#3959 ) ## Issue Addressed The documentation on building from source does not match what the GitHub workflow uses. `aa5b7ef783/book/src/installation-source.md (L118-L120)` ## Proposed Changes The GitHub workflow uses `cross` to build from source, and that build reads a different environment variable, `CROSS_FEATURES`, which needs to be passed at compile time. ## Additional Info Verified that the existing `-dev` builds do not have the `minimal` spec enabled. ```bash > docker run --rm --name node-5-cl-lighthouse sigp/lighthouse:latest-amd64-unstable-dev lighthouse --version Lighthouse v3.4.0-aa5b7ef BLS library: blst-portable SHA256 hardware acceleration: true Allocator: jemalloc Specs: mainnet (true), minimal (false), gnosis (true) ```	2023-02-13 03:32:03 +00:00
Michael Sproul	2f456ff9eb	Fix regression in DB write atomicity (#3931 ) ## Issue Addressed Fix a bug introduced by #3696. The bug is not expected to occur frequently, so releasing this PR is non-urgent. ## Proposed Changes * Add a variant to `StoreOp` that allows a raw KV operation to be passed around. * Return to using `self.store.do_atomically` rather than `self.store.hot_db.do_atomically`. This streamlines the write back into a single call and makes our auto-revert work again. * Prevent `import_block_update_shuffling_cache` from failing block import. This is an outstanding bug from before v3.4.0 which may have contributed to some random unexplained database corruption. ## Additional Info In #3696 I split the database write into two calls, one to convert the `StoreOp`s to `KeyValueStoreOp`s and one to write them. This had the unfortunate side-effect of damaging our atomicity guarantees in case of a write error. If the first call failed, we would be left with the block in fork choice but not on-disk (or the snapshot cache), which would prevent us from processing any descendant blocks. On `unstable` the first call is very unlikely to fail unless the disk is full, but on `tree-states` the conversion is more involved and a user reported database corruption after it failed in a way that should have been recoverable. Additionally, as @emhane observed, #3696 also inadvertently removed the import of the new block into the block cache. Although this seems like it could have negatively impacted performance, there are several mitigating factors: - For regular block processing we should almost always load the parent block (and state) from the snapshot cache. - We often load blinded blocks, which bypass the block cache anyway. - Metrics show no noticeable increase in the block cache miss rate with v3.4.0. However, I expect the block cache _will_ be useful again in `tree-states`, so it is restored to use by this PR.	2023-02-13 03:32:01 +00:00
Paul Hauner	84843d67d7	Reduce some EE and builder related ERRO logs to WARN (#3966 ) ## Issue Addressed NA ## Proposed Changes Our `ERRO` stream has been rather noisy since the merge due to some unexpected behaviours of builders and EEs. Now that we've been running post-merge for a while, I think we can drop some of these `ERRO` to `WARN` so we're not "crying wolf". The modified logs are: #### `ERRO Execution engine call failed` I'm seeing this quite frequently on Geth nodes. They seem to timeout when they're busy and it rarely indicates a serious issue. We also have logging across block import, fork choice updating and payload production that raise `ERRO` or `CRIT` when the EE times out, so I think we're not at risk of silencing actual issues. #### `ERRO "Builder failed to reveal payload"` In #3775 we reduced this log from `CRIT` to `ERRO` since it's common for builders to fail to reveal the block to the producer directly whilst still broadcasting it to the network. I think it's worth dropping this to `WARN` since it's rarely interesting. I elected to stay with `WARN` since I really do wish builders would fulfill their API promises by returning the block to us. Perhaps I'm just being pedantic here, I could be convinced otherwise. #### `ERRO "Relay error when registering validator(s)"` It seems like builders and/or mev-boost struggle to handle heavy loads of validator registrations. I haven't observed issues with validators not actually being registered, but I see timeouts on these endpoints many times a day. It doesn't seem like this `ERRO` is worth it. #### `ERRO Error fetching block for peer ExecutionLayerErrorPayloadReconstruction` This means we failed to respond to a peer on the P2P network with a block they requested because of an error in the `execution_layer`. It's very common to see timeouts or incomplete responses on this endpoint whilst the EE is busy and I don't think it's important enough for an `ERRO`. As long as the peer count stays high, I don't think the user needs to be actively concerned about how we're responding to peers. ## Additional Info NA	2023-02-12 23:14:08 +00:00
Michael Sproul	3b4c677727	Use release profile for Windows binaries (#3965 ) ## Proposed Changes Disable `maxperf` profile on Windows due to #3964. This is required for the v3.5.0 release CI to succeed without crashing.	2023-02-12 23:14:07 +00:00

5810 Commits