lighthouse

Author	SHA1	Message	Date
Michael Sproul	c3321dddb7	Reduce attestation subscription spam from VC (#4806 ) ## Proposed Changes Instead of sending every attestation subscription every slot to every BN: - Send subscriptions 32, 16, 8, 7, 6, 5, 4, 3 slots before they occur. - Track whether each subscription is sent successfully and retry it in subsequent slots if necessary. ## Additional Info - [x] Add unit tests for `SubscriptionSlots`. - [x] Test on Holesky. - [x] Based on #4774 for testing.	2023-10-06 06:26:18 +00:00
Michael Sproul	9769a247b2	Address Clippy 1.73 lints (#4809 ) ## Proposed Changes Fix Clippy lints enabled by default in Rust 1.73.0, released today.	2023-10-06 03:05:47 +00:00
Nico Flaig	4b619c63d7	Exit aggregation step early if no validator is aggregator (#4774 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/4712 ## Proposed Changes Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm. ## Additional Info Related issue https://github.com/ChainSafe/lodestar/issues/5553	2023-10-05 02:14:55 +00:00
duguorong009	7d537214b7	fix(validator_client): return http 404 rather than 405 in http api (#4758 ) ## Issue Addressed - Close #4596 ## Proposed Changes - Add `Filter::recover` to handle rejections specifically as 404 NOT FOUND Please list or describe the changes introduced by this PR. ## Additional Info Similar to PR #3836	2023-10-04 00:43:29 +00:00
realbigsean	7605494791	Use only lighthouse types in the mock builder (#4793 ) ## Proposed Changes - only use LH types to avoid build issues - use warp instead of axum for the server to avoid importing the dep ## Additional Info - wondering if we can move the `execution_layer/test_utils` to its own crate and import it as a dev dependency - this would be made easier by separating out our engine API types into their own crate so we can use them in the test crate - or maybe we can look into using reth types for the engine api if they are in their own crate Co-authored-by: realbigsean <seananderson33@gmail.com>	2023-10-03 17:59:28 +00:00
Age Manning	8a1b77bf89	Ultra Fast Super Slick CI (#4755 ) Attempting to improve our CI speeds as its recently been a pain point. Major changes: - Use a github action to pull stable/nightly rust rather than building it each run - Shift test suite to `nexttest` https://github.com/nextest-rs/nextest for CI UPDATE: So I've iterated on some changes, and although I think its still not optimal I think this is a good base to start from. Some extra things in this PR: - Shifted where we pull rust from. We're now using this thing: https://github.com/moonrepo/setup-rust . It's got some interesting cache's built in, but was not seeing the gains that Jimmy managed to get. In either case tho, it can pull rust, cargofmt, clippy, cargo nexttest all in < 5s. So I think it's worthwhile. - I've grouped a few of the check-like tests into a single test called `code-test`. Although we were using github runners in parallel which may be faster, it just seems wasteful. There were like 4-5 tests, where we would pull lighthouse, compile it, then run an action, like clippy, cargo-audit or fmt. I've grouped these into a single action, so we only compile lighthouse once, then in each step we run the checks. This avoids compiling lighthouse like 5 times. - Ive made doppelganger tests run on our local machines to avoid pulling foundry, building and making lcli which are all now baked into the images. - We have sccache and do not incremental compile lighthouse Misc bonus things: - Cargo update - Fix web3 signer openssl keys which is required after a cargo update - Use mock_instant in an LRU cache test to avoid non-deterministic test - Remove race condition in building web3signer tests There's still some things we could improve on. Such as downloading the EF tests every run and the web3-signer binary, but I've left these to be out of scope of this PR. I think the above are meaningful improvements. Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: antondlr <anton@delaruelle.net>	2023-10-03 06:33:15 +00:00
João Oliveira	dcd69dfc62	Move dependencies to workspace (#4650 ) ## Issue Addressed Synchronize dependencies and edition on the workspace `Cargo.toml` ## Proposed Changes with https://github.com/rust-lang/cargo/issues/8415 merged it's now possible to synchronize details on the workspace `Cargo.toml` like the metadata and dependencies. By only having dependencies that are shared between multiple crates aligned on the workspace `Cargo.toml` it's easier to not miss duplicate versions of the same dependency and therefore ease on the compile times. ## Additional Info this PR also removes the no longer required direct dependency of the `serde_derive` crate. should be reviewed after https://github.com/sigp/lighthouse/pull/4639 get's merged. closes https://github.com/sigp/lighthouse/issues/4651 Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: Michael Sproul <micsproul@gmail.com>	2023-09-22 04:30:56 +00:00
Jimmy Chen	c4e907de9f	Update the voluntary exit endpoint to comply with the key manager specification (#4679 ) ## Issue Addressed #4635 ## Proposed Changes Wrap the `SignedVoluntaryExit` object in a `GenericResponse` container, adding an additional `data` layer, to ensure compliance with the key manager API specification. The new response would look like this: ```json {"data":{"message":{"epoch":"196868","validator_index":"505597"},"signature":"0xhexsig"}} ``` This is a backward incompatible change and will affect Siren as well.	2023-09-22 02:33:11 +00:00
João Oliveira	c5588eb66e	require http and metrics for respective flags (#4674 ) ## Issue Addressed following discussion on https://github.com/sigp/lighthouse/pull/4639#discussion_r1305183750 this PR makes the `http` and `metrics` sub-flags to require those main flags enabled	2023-09-22 02:33:10 +00:00
Eitan Seri-Levi	992b476eac	Add SSZ support to validator block production endpoints (#4534 ) ## Issue Addressed #4531 ## Proposed Changes add SSZ support to the following block production endpoints: GET /eth/v2/validator/blocks/{slot} GET /eth/v1/validator/blinded_blocks/{slot} ## Additional Info i updated a few existing tests to use ssz instead of writing completely new tests	2023-09-21 06:38:31 +00:00
João Oliveira	d386a07b0c	validator client: start http api before genesis (#4714 ) ## Issue Addressed On a new network a user might require importing validators before waiting until genesis has occurred. ## Proposed Changes Starts the validator client http api before waiting for genesis ## Additional Info cc @antondlr	2023-09-15 10:08:30 +00:00
Michael Sproul	8e95b69a1a	Send success code for duplicate blocks on HTTP (#4655 ) ## Issue Addressed Closes #4473 (take 3) ## Proposed Changes - Send a 202 status code by default for duplicate blocks, instead of 400. This conveys to the caller that the block was published, but makes no guarantees about its validity. Block relays can count this as a success or a failure as they wish. - For users wanting finer-grained control over which status is returned for duplicates, a flag `--http-duplicate-block-status` can be used to adjust the behaviour. A 400 status can be supplied to restore the old (spec-compliant) behaviour, or a 200 status can be used to silence VCs that warn loudly for non-200 codes (e.g. Lighthouse prior to v4.4.0). - Update the Lighthouse VC to gracefully handle success codes other than 200. The info message isn't the nicest thing to read, but it covers all bases and isn't a nasty `ERRO`/`CRIT` that will wake anyone up. ## Additional Info I'm planning to raise a PR to `beacon-APIs` to specify that clients may return 202 for duplicate blocks. Really it would be nice to use some 2xx code that _isn't_ the same as the code for "published but invalid". I think unfortunately there aren't any suitable codes, and maybe the best fit is `409 CONFLICT`. Given that we need to fix this promptly for our release, I think using the 202 code temporarily with configuration strikes a nice compromise.	2023-08-28 00:55:31 +00:00
zhiqiangxu	b33bfe25b7	add `metrics::VALIDATOR_DUTIES_SYNC_HTTP_POST` for `post_validator_duties_sync` (#4617 ) It seems `post_validator_duties_sync` is the only api which doesn't have its own metric in `duties_service`, this PR adds `metrics::VALIDATOR_DUTIES_SYNC_HTTP_POST` for completeness.	2023-08-17 02:37:31 +00:00
zhiqiangxu	842b42297b	Fix bug of `init_from_beacon_node` (#4613 )	2023-08-14 00:29:47 +00:00
zhiqiangxu	f1ac12f23a	Fix some typos (#4565 )	2023-08-14 00:29:43 +00:00
Paul Hauner	1373dcf076	Add `validator-manager` (#3502 ) ## Issue Addressed Addresses #2557 ## Proposed Changes Adds the `lighthouse validator-manager` command, which provides: - `lighthouse validator-manager create` - Creates a `validators.json` file and a `deposits.json` (same format as https://github.com/ethereum/staking-deposit-cli) - `lighthouse validator-manager import` - Imports validators from a `validators.json` file to the VC via the HTTP API. - `lighthouse validator-manager move` - Moves validators from one VC to the other, utilizing only the VC API. ## Additional Info In 98bcb947c I've reduced some VC `ERRO` and `CRIT` warnings to `WARN` or `DEBG` for the case where a pubkey is missing from the validator store. These were being triggered when we removed a validator but still had it in caches. It seems to me that `UnknownPubkey` will only happen in the case where we've removed a validator, so downgrading the logs is prudent. All the logs are `DEBG` apart from attestations and blocks which are `WARN`. I thought having some logging about this condition might help us down the track. In `856cd7e37d` I've made the VC delete the corresponding password file when it's deleting a keystore. This seemed like nice hygiene. Notably, it'll only delete that password file after it scans the validator definitions and finds that no other validator is also using that password file.	2023-08-08 00:03:22 +00:00
Paul Hauner	5ea75052a8	Increase slashing protection test SQL timeout to 500ms (#4574 ) ## Issue Addressed NA ## Proposed Changes We've been seeing a lot of [CI failures](https://github.com/sigp/lighthouse/actions/runs/5781296217/job/15666209142) with errors like this: ``` ---- extra_interchange_tests::export_same_key_twice stdout ---- thread 'extra_interchange_tests::export_same_key_twice' panicked at 'called `Result::unwrap()` on an `Err` value: SQLError("Unable to open database: Error(None)")', validator_client/slashing_protection/src/extra_interchange_tests.rs:48:67 ``` I'm assuming they're timeouts. I noticed that tests have a 0.1s timeout. Perhaps this just doesn't cut it when our new runners are overloaded. ## Additional Info NA	2023-08-07 22:53:05 +00:00
Jimmy Chen	33c942ff03	Add support for updating validator graffiti (#4417 ) ## Issue Addressed #4386 ## Proposed Changes The original proposal described in the issue adds a new endpoint to support updating validator graffiti, but I realized we already have an endpoint that we use for updating various validator fields in memory and in the validator definitions file, so I think that would be the best place to add this to. ### API endpoint `PATCH lighthouse/validators/{validator_pubkey}` This endpoint updates the graffiti in both the [ validator definition file](https://lighthouse-book.sigmaprime.io/graffiti.html#2-setting-the-graffiti-in-the-validator_definitionsyml) and the in memory `InitializedValidators`. In the next block proposal, the new graffiti will be used. Note that the [`--graffiti-file`](https://lighthouse-book.sigmaprime.io/graffiti.html#1-using-the---graffiti-file-flag-on-the-validator-client) flag has a priority over the validator definitions file, so if the caller attempts to update the graffiti while the `--graffiti-file` flag is present, the endpoint will return an error (Bad request 400). ## Tasks - [x] Add graffiti update support to `PATCH lighthouse/validators/{validator_pubkey}` - [x] Return error if user tries to update graffiti while the `--graffiti-flag` is set - [x] Update Lighthouse book	2023-06-22 02:14:57 +00:00
Paul Hauner	3cac6d9ed5	Configure the `validator/register_validator` batch size via the CLI (#4399 ) ## Issue Addressed NA ## Proposed Changes Adds the `--validator-registration-batch-size` flag to the VC to allow runtime configuration of the number of validators POSTed to the [`validator/register_validator`](https://ethereum.github.io/beacon-APIs/?urls.primaryName=dev#/Validator/registerValidator) endpoint. There are builders (Agnostic and Eden) that are timing out with `regsiterValidator` requests with ~400 validators, even with a 9 second timeout. Exposing the batch size will help tune batch sizes to (hopefully) avoid this. This PR should not change the behavior of Lighthouse when the new flag is not provided (i.e., the same default value is used). ## Additional Info NA	2023-06-22 02:14:56 +00:00
Paul Hauner	5e3fb13cfe	Downgrade a `CRIT` in the VC for builder timeouts (#4366 ) ## Issue Addressed NA ## Proposed Changes Downgrade a `CRIT` to an `ERRO` when there's an `Irrecoverable` error whilst publishing a blinded block. It's quite common for builders successfully broadcast a block to the network whilst failing to respond to the BN when it publishes a signed, blinded block. The VC is currently raising a `CRIT` when this happens and I think that's excessive. These changes have the same intent as #4073. In that PR I only managed to remove the `CRIT`s in the BN but missed this one in the VC. I've also tidied the log messages to: - Give them all the same title ("Error whilst producing block") to help with grepping. - Include the `block_slot` so it's easy to look up the slot in an explorer and see if it was actually skipped. ## Additional Info This PR should not change any logic beyond logging.	2023-06-07 01:50:34 +00:00
Paul Hauner	d07c78bccf	Appease clippy in Rust 1.70 (#4365 ) ## Issue Addressed NA ## Proposed Changes Fixes some new clippy lints raised after updating to Rust 1.70. ## Additional Info NA	2023-06-02 03:17:40 +00:00
Age Manning	aa1ed787e9	Logging via the HTTP API (#4074 ) This PR adds the ability to read the Lighthouse logs from the HTTP API for both the BN and the VC. This is done in such a way to as minimize any kind of performance hit by adding this feature. The current design creates a tokio broadcast channel and mixes is into a form of slog drain that combines with our main global logger drain, only if the http api is enabled. The drain gets the logs, checks the log level and drops them if they are below INFO. If they are INFO or higher, it sends them via a broadcast channel only if there are users subscribed to the HTTP API channel. If not, it drops the logs. If there are more than one subscriber, the channel clones the log records and converts them to json in their independent HTTP API tasks. Co-authored-by: Michael Sproul <micsproul@gmail.com>	2023-05-22 05:57:08 +00:00
Michael Sproul	c27f2bf9c6	Avoid excessive logging of BN online status (#4315 ) ## Issue Addressed https://github.com/sigp/lighthouse/pull/4309#issuecomment-1556052261 ## Proposed Changes Log the `Connected to beacon node` message only if the node was previously offline. This avoids a regression in logging after #4295, whereby the `Connected to beacon node` message would be logged every slot. The new reduced logging is _slightly different_ from what we had prior to my changes in #4295. The main difference is that we used to log the `Connected` message whenever a node was online and subject to a health check (for being unhealthy in some other way). I think the new behaviour is reasonable, as the `Connected` message isn't particularly helpful if the BN is unhealthy, and the specific reason for unhealthiness will be logged by the warnings for `is_compatible`/`is_synced`.	2023-05-22 02:36:43 +00:00
Michael Sproul	3052db29fe	Implement `el_offline` and use it in the VC (#4295 ) ## Issue Addressed Closes https://github.com/sigp/lighthouse/issues/4291, part of #3613. ## Proposed Changes - Implement the `el_offline` field on `/eth/v1/node/syncing`. We set `el_offline=true` if: - The EL's internal status is `Offline` or `AuthFailed`, _or_ - The most recent call to `newPayload` resulted in an error (more on this in a moment). - Use the `el_offline` field in the VC to mark nodes with offline ELs as _unsynced_. These nodes will still be used, but only after synced nodes. - Overhaul the usage of `RequireSynced` so that `::No` is used almost everywhere. The `--allow-unsynced` flag was broken and had the opposite effect to intended, so it has been deprecated. - Add tests for the EL being offline on the upcheck call, and being offline due to the newPayload check. ## Why track `newPayload` errors? Tracking the EL's online/offline status is too coarse-grained to be useful in practice, because: - If the EL is timing out to some calls, it's unlikely to timeout on the `upcheck` call, which is _just_ `eth_syncing`. Every failed call is followed by an upcheck [here](`693886b941/beacon_node/execution_layer/src/engines.rs (L372-L380)`), which would have the effect of masking the failure and keeping the status _online_. - The `newPayload` call is the most likely to time out. It's the call in which ELs tend to do most of their work (often 1-2 seconds), with `forkchoiceUpdated` usually returning much faster (<50ms). - If `newPayload` is failing consistently (e.g. timing out) then this is a good indication that either the node's EL is in trouble, or the network as a whole is. In the first case validator clients _should_ prefer other BNs if they have one available. In the second case, all of their BNs will likely report `el_offline` and they'll just have to proceed with trying to use them. ## Additional Changes - Add utility method `ForkName::latest` which is quite convenient for test writing, but probably other things too. - Delete some stale comments from when we used to support multiple execution nodes.	2023-05-17 05:51:56 +00:00
Jimmy Chen	cf239fed61	Reduce bandwidth over the VC<>BN API using dependant roots (#4170 ) ## Issue Addressed #4157 ## Proposed Changes See description in #4157. In diagram form: ![reduce-attestation-bandwidth](https://user-images.githubusercontent.com/742762/230277084-f97301c1-0c5d-4fb3-92f9-91f99e4dc7d4.png) Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>	2023-05-15 02:10:40 +00:00
Michael Sproul	c11638c36c	Split common crates out into their own repos (#3890 ) ## Proposed Changes Split out several crates which now exist in separate repos under `sigp`. - [`ssz` and `ssz_derive`](https://github.com/sigp/ethereum_ssz) - [`tree_hash` and `tree_hash_derive`](https://github.com/sigp/tree_hash) - [`ethereum_hashing`](https://github.com/sigp/ethereum_hashing) - [`ethereum_serde_utils`](https://github.com/sigp/ethereum_serde_utils) - [`ssz_types`](https://github.com/sigp/ssz_types) For the published crates see: https://crates.io/teams/github:sigp:crates-io?sort=recent-updates. ## Additional Info - [x] Need to work out how to handle versioning. I was hoping to do 1.0 versions of several crates, but if they depend on `ethereum-types 0.x` that is not going to work. EDIT: decided to go with 0.5.x versions. - [x] Need to port several changes from `tree-states`, `capella`, `eip4844` branches to the external repos.	2023-04-28 01:15:40 +00:00
Age Manning	7456e1e8fa	Separate BN for block proposals (#4182 ) It is a well-known fact that IP addresses for beacon nodes used by specific validators can be de-anonymized. There is an assumed risk that a malicious user may attempt to DOS validators when producing blocks to prevent chain growth/liveness. Although there are a number of ideas put forward to address this, there a few simple approaches we can take to mitigate this risk. Currently, a Lighthouse user is able to set a number of beacon-nodes that their validator client can connect to. If one beacon node is taken offline, it can fallback to another. Different beacon nodes can use VPNs or rotate IPs in order to mask their IPs. This PR provides an additional setup option which further mitigates attacks of this kind. This PR introduces a CLI flag --proposer-only to the beacon node. Setting this flag will configure the beacon node to run with minimal peers and crucially will not subscribe to subnets or sync committees. Therefore nodes of this kind should not be identified as nodes connected to validators of any kind. It also introduces a CLI flag --proposer-nodes to the validator client. Users can then provide a number of beacon nodes (which may or may not run the --proposer-only flag) that the Validator client will use for block production and propagation only. If these nodes fail, the validator client will fallback to the default list of beacon nodes. Users are then able to set up a number of beacon nodes dedicated to block proposals (which are unlikely to be identified as validator nodes) and point their validator clients to produce blocks on these nodes and attest on other beacon nodes. An attack attempting to prevent liveness on the eth2 network would then need to preemptively find and attack the proposer nodes which is significantly more difficult than the default setup. This is a follow on from: #3328 Co-authored-by: Michael Sproul <michael@sigmaprime.io> Co-authored-by: Paul Hauner <paul@paulhauner.com>	2023-04-26 01:12:36 +00:00
Jimmy Chen	e2c68c8893	Add new validator API for voluntary exit (#4119 ) ## Issue Addressed Addresses #4117 ## Proposed Changes See https://github.com/ethereum/keymanager-APIs/pull/58 for proposed API specification. ## TODO - [x] ~~Add submission to BN~~ - removed, see discussion in [keymanager API](https://github.com/ethereum/keymanager-APIs/pull/58) - [x] ~~Add flag to allow voluntary exit via the API~~ - no longer needed now the VC doesn't submit exit directly - [x] ~~Additional verification / checks, e.g. if validator on same network as BN~~ - to be done on client side - [x] ~~Potentially wait for the message to propagate and return some exit information in the response~~ - not required - [x] Update http tests - [x] ~~Update lighthouse book~~ - not required if this endpoint makes it to the standard keymanager API Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>	2023-04-03 03:02:56 +00:00
Maksim Shcherbo	788a4b718f	Optimise `update_validators` by decrypting key cache only when necessary (#4126 ) ## Title Optimise `update_validators` by decrypting key cache only when necessary ## Issue Addressed Resolves [#3968: Slow performance of validator client PATCH API with hundreds of keys](https://github.com/sigp/lighthouse/issues/3968) ## Proposed Changes 1. Add a check to determine if there is at least one local definition before decrypting the key cache. 2. Assign an empty `KeyCache` when all definitions are of the `Web3Signer` type. 3. Perform cache-related operations (e.g., saving the modified key cache) only if there are local definitions. ## Additional Info This PR addresses the excessive CPU usage and slow performance experienced when using the `PATCH lighthouse/validators/{pubkey}` request with a large number of keys. The issue was caused by the key cache using cryptography to decipher and cipher the cache entities every time the request was made. This operation called `scrypt`, which was very slow and required a lot of memory when there were many concurrent requests. These changes have no impact on the overall functionality but can lead to significant performance improvements when working with remote signers. Importantly, the key cache is never used when there are only `Web3Signer` definitions, avoiding the expensive operation of decrypting the key cache in such cases. Co-authored-by: Maksim Shcherbo <max.shcherbo@consensys.net>	2023-03-29 02:56:39 +00:00
Paul Hauner	3ad77fb52c	Add VC metric for primary BN latency (#4051 ) ## Issue Addressed NA ## Proposed Changes In #4024 we added metrics to expose the latency measurements from a VC to each BN. Whilst playing with these new metrics on our infra I realised it would be great to have a single metric to make sure that the primary BN for each VC has a reasonable latency. With the current "metrics for all BNs" it's hard to tell which is the primary. ## Additional Info NA	2023-03-06 04:08:49 +00:00
Paul Hauner	f775404c10	Log a `WARN` in the VC for a mismatched Capella fork epoch (#4050 ) ## Issue Addressed NA ## Proposed Changes - Adds a `WARN` statement for Capella, just like the previous forks. - Adds a hint message to all those WARNs to suggest the user update the BN or VC. ## Additional Info NA	2023-03-06 04:08:48 +00:00
Michael Sproul	57dfcfd83a	Optimise attestation selection proof signing (#4033 ) ## Issue Addressed Closes #3963 (hopefully) ## Proposed Changes Compute attestation selection proofs gradually each slot rather than in a single `join_all` at the start of each epoch. On a machine with 5k validators this replaces 5k tasks signing 5k proofs with 1 task that signs 5k/32 ~= 160 proofs each slot. Based on testing with Goerli validators this seems to reduce the average time to produce a signature by preventing Tokio and the OS from falling over each other trying to run hundreds of threads. My testing so far has been with local keystores, which run on a dynamic pool of up to 512 OS threads because they use [`spawn_blocking`](https://docs.rs/tokio/1.11.0/tokio/task/fn.spawn_blocking.html) (and we haven't changed the default). An earlier version of this PR hyper-optimised the time-per-signature metric to the detriment of the entire system's performance (see the reverted commits). The current PR is conservative in that it avoids touching the attestation service at all. I think there's more optimising to do here, but we can come back for that in a future PR rather than expanding the scope of this one. The new algorithm for attestation selection proofs is: - We sign a small batch of selection proofs each slot, for slots up to 8 slots in the future. On average we'll sign one slot's worth of proofs per slot, with an 8 slot lookahead. - The batch is signed halfway through the slot when there is unlikely to be contention for signature production (blocks are <4s, attestations are ~4-6 seconds, aggregates are 8s+). ## Performance Data _See first comment for updated graphs_. Graph of median signing times before this PR: ![signing_times_median](https://user-images.githubusercontent.com/4452260/221495627-3ab3c105-319f-406e-b99d-b5913e0ded9c.png) Graph of update attesters metric (includes selection proof signing) before this PR: ![update_attesters_store](https://user-images.githubusercontent.com/4452260/221497057-01ba40e4-8148-45f6-9e21-36a9567a631a.png) Median signing time after this PR (prototype from 12:00, updated version from 13:30): ![signing_times_median_updated](https://user-images.githubusercontent.com/4452260/221771578-47a040cc-b832-482f-9a1a-d1bd9854e00e.png) 99th percentile on signing times (bounded attestation signing from 16:55, now removed): ![signing_times_99pc](https://user-images.githubusercontent.com/4452260/221772055-e64081a8-2220-45ba-ba6d-9d7e344a5bde.png) Attester map update timing after this PR: ![update_attesters_store_updated](https://user-images.githubusercontent.com/4452260/221771757-c8558a48-7f4e-4bb5-9929-dee177a66c1e.png) Selection proof signings per second change: ![signing_attempts](https://user-images.githubusercontent.com/4452260/221771855-64f5da22-1655-478d-926b-810be8a3650c.png) ## Link to late blocks I believe this is related to the slow block signings because logs from Stakely in #3963 show these two logs almost 5 seconds apart: > Feb 23 18:56:23.978 INFO Received unsigned block, slot: 5862880, service: block, module: validator_client::block_service:393 > Feb 23 18:56:28.552 INFO Publishing signed block, slot: 5862880, service: block, module: validator_client::block_service:416 The only thing that happens between those two logs is the signing of the block: `0fb58a680d/validator_client/src/block_service.rs (L410-L414)` Helpfully, Stakely noticed this issue without any Lighthouse BNs in the mix, which pointed to a clear issue in the VC. ## TODO - [x] Further testing on testnet infrastructure. - [x] Make the attestation signing parallelism configurable.	2023-03-05 23:43:31 +00:00
Paul Hauner	6e15533b54	Add latency measurement service to VC (#4024 ) ## Issue Addressed NA ## Proposed Changes Adds a service which periodically polls (11s into each mainnet slot) the `node/version` endpoint on each BN and roughly measures the round-trip latency. The latency is exposed as a `DEBG` log and a Prometheus metric. The `--latency-measurement-service` has been added to the VC, with the following options: - `--latency-measurement-service true`: enable the service (default). - `--latency-measurement-service`: (without a value) has the same effect. - `--latency-measurement-service false`: disable the service. ## Additional Info Whilst looking at our staking setup, I think the BN+VC latency is contributing to late blocks. Now that we have to wait for the builders to respond it's nice to try and do everything we can to reduce that latency. Having visibility is the first step.	2023-03-05 23:43:29 +00:00
Atanas Minkov	2b6348781a	Log a debug message when a request fails for a beacon node candidate (#4036 ) ## Issue Addressed #3985 ## Proposed Changes Log a debug message when a BN candidate returns an error. `Mar 01 16:40:24.011 DEBG Request to beacon node failed error: ServerMessage(ErrorMessage { code: 503, message: "SERVICE_UNAVAILABLE: beacon node is syncing: head slot is 8416, current slot is 5098402", stacktraces: [] }), node: http://localhost:5052/`	2023-03-02 05:26:14 +00:00
Divma	047c7544e3	Clean capella (#4019 ) ## Issue Addressed Cleans up all the remnants of 4844 in capella. This makes sure when 4844 is reviewed there is nothing we are missing because it got included here ## Proposed Changes drop a bomb on every 4844 thing ## Additional Info Merge process I did (locally) is as follows: - squash merge to produce one commit - in new branch off unstable with the squashed commit create a `git revert HEAD` commit - merge that new branch onto 4844 with `--strategy ours` - compare local 4844 to remote 4844 and make sure the diff is empty - enjoy Co-authored-by: Paul Hauner <paul@paulhauner.com>	2023-03-01 03:19:02 +00:00
Alan Höng	cc4fc422b2	Add content-type header to metrics server response (#3970 ) This fixes issues with certain metrics scrapers, which might error if the content-type is not correctly set. ## Issue Addressed Fixes https://github.com/sigp/lighthouse/issues/3437 ## Proposed Changes Simply set header: `Content-Type: text/plain` on metrics server response. Seems like the errored branch does this correctly already. ## Additional Info This is needed also to enable influx-db metric scraping which work very nicely with Geth.	2023-02-28 02:20:50 +00:00
Michael Sproul	066c27750a	Merge remote-tracking branch 'origin/staging' into capella-update	2023-02-17 12:05:36 +11:00
Michael Sproul	2fcfdf1a01	Fix docker and deps (#3978 ) ## Proposed Changes - Fix this cargo-audit failure for `sqlite3-sys`: https://github.com/sigp/lighthouse/actions/runs/4179008889/jobs/7238473962 - Prevent the Docker builds from running out of RAM on CI by removing `gnosis` and LMDB support from the `-dev` images (see: https://github.com/sigp/lighthouse/pull/3959#issuecomment-1430531155, successful run on my fork: https://github.com/michaelsproul/lighthouse/actions/runs/4179162480/jobs/7239537947).	2023-02-15 11:51:46 +00:00
Michael Sproul	18c8cab4da	Merge remote-tracking branch 'origin/unstable' into capella-merge	2023-02-14 12:07:27 +11:00
Pawan Dhananjay	2b735a9e8b	Add attestation duty slot metric (#2704 ) ## Issue Addressed Resolves #2521 ## Proposed Changes Add a metric that indicates the next attestation duty slot for all managed validators in the validator client.	2023-02-09 23:51:17 +00:00
Michael Sproul	bb0e99c097	Merge remote-tracking branch 'origin/unstable' into capella	2023-01-21 10:37:26 +11:00
GeemoCandama	b4d9fc03ee	add logging for starting request and receiving block (#3858 ) ## Issue Addressed #3853 ## Proposed Changes Added `INFO` level logs for requesting and receiving the unsigned block. ## Additional Info Logging for successfully publishing the signed block is already there. And seemingly there is a log for when "We realize we are going to produce a block" in the `start_update_service`: `info!(log, "Block production service started"); `. Is there anywhere else you'd like to see logging around this event? Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>	2023-01-17 05:13:48 +00:00
David Theodore	9a970ce3a2	add better err reporting UnableToOpenVotingKeystore (#3781 ) ## Issue Addressed #3780 ## Proposed Changes Add error reporting that notifies the node operator that the `voting_keystore_path` in their `validator_definitions.yml` file may be incorrect. ## Additional Info There is more info in issue #3780 Co-authored-by: Paul Hauner <paul@paulhauner.com>	2023-01-17 05:13:47 +00:00
Michael Sproul	2af8110529	Merge remote-tracking branch 'origin/unstable' into capella Fixing the conflicts involved patching up some of the `block_hash` verification, the rest will be done as part of https://github.com/sigp/lighthouse/issues/3870	2023-01-12 16:22:00 +11:00
realbigsean	c85cd87c10	Web3 signer validator definitions reloading on any request (#3801 ) ## Issue Addressed https://github.com/sigp/lighthouse/issues/3795 Co-authored-by: realbigsean <sean@sigmaprime.io>	2023-01-09 08:18:56 +00:00
realbigsean	d8f7277beb	cleanup	2022-12-30 11:00:14 -05:00
Mark Mackey	c188cde034	merge upstream/unstable	2022-12-28 14:43:25 -06:00
Mark Mackey	c922566fbc	Fixed Some Tests	2022-12-27 15:59:34 -06:00
Divma	ffbf70e2d9	Clippy lints for rust 1.66 (#3810 ) ## Issue Addressed Fixes the new clippy lints for rust 1.66 ## Proposed Changes Most of the changes come from: - [unnecessary_cast](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_cast) - [iter_kv_map](https://rust-lang.github.io/rust-clippy/master/index.html#iter_kv_map) - [needless_borrow](https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow) ## Additional Info na	2022-12-16 04:04:00 +00:00
Michael Sproul	991e4094f8	Merge remote-tracking branch 'origin/unstable' into capella-update	2022-12-14 13:00:41 +11:00

1 2 3 4 5 ...

488 Commits