## Issue Addressed
The documentation referring to build from source mismatches with the what gitworkflow uses.
aa5b7ef783/book/src/installation-source.md (L118-L120)
## Proposed Changes
Because the github workflow uses `cross` to build from source and for that build there is different env variable `CROSS_FEATURES` so need pass at the compile time.
## Additional Info
Verified that existing `-dev` builds does not contains the `minimal` spec enabled.
```bash
> docker run --rm --name node-5-cl-lighthouse sigp/lighthouse:latest-amd64-unstable-dev lighthouse --version
Lighthouse v3.4.0-aa5b7ef
BLS library: blst-portable
SHA256 hardware acceleration: true
Allocator: jemalloc
Specs: mainnet (true), minimal (false), gnosis (true)
```
## Issue Addressed
Fix a bug introduced by #3696. The bug is not expected to occur frequently, so releasing this PR is non-urgent.
## Proposed Changes
* Add a variant to `StoreOp` that allows a raw KV operation to be passed around.
* Return to using `self.store.do_atomically` rather than `self.store.hot_db.do_atomically`. This streamlines the write back into a single call and makes our auto-revert work again.
* Prevent `import_block_update_shuffling_cache` from failing block import. This is an outstanding bug from before v3.4.0 which may have contributed to some random unexplained database corruption.
## Additional Info
In #3696 I split the database write into two calls, one to convert the `StoreOp`s to `KeyValueStoreOp`s and one to write them. This had the unfortunate side-effect of damaging our atomicity guarantees in case of a write error. If the first call failed, we would be left with the block in fork choice but not on-disk (or the snapshot cache), which would prevent us from processing any descendant blocks. On `unstable` the first call is very unlikely to fail unless the disk is full, but on `tree-states` the conversion is more involved and a user reported database corruption after it failed in a way that should have been recoverable.
Additionally, as @emhane observed, #3696 also inadvertently removed the import of the new block into the block cache. Although this seems like it could have negatively impacted performance, there are several mitigating factors:
- For regular block processing we should almost always load the parent block (and state) from the snapshot cache.
- We often load blinded blocks, which bypass the block cache anyway.
- Metrics show no noticeable increase in the block cache miss rate with v3.4.0.
However, I expect the block cache _will_ be useful again in `tree-states`, so it is restored to use by this PR.
## Issue Addressed
NA
## Proposed Changes
Our `ERRO` stream has been rather noisy since the merge due to some unexpected behaviours of builders and EEs. Now that we've been running post-merge for a while, I think we can drop some of these `ERRO` to `WARN` so we're not "crying wolf".
The modified logs are:
#### `ERRO Execution engine call failed`
I'm seeing this quite frequently on Geth nodes. They seem to timeout when they're busy and it rarely indicates a serious issue. We also have logging across block import, fork choice updating and payload production that raise `ERRO` or `CRIT` when the EE times out, so I think we're not at risk of silencing actual issues.
#### `ERRO "Builder failed to reveal payload"`
In #3775 we reduced this log from `CRIT` to `ERRO` since it's common for builders to fail to reveal the block to the producer directly whilst still broadcasting it to the networ. I think it's worth dropping this to `WARN` since it's rarely interesting.
I elected to stay with `WARN` since I really do wish builders would fulfill their API promises by returning the block to us. Perhaps I'm just being pedantic here, I could be convinced otherwise.
#### `ERRO "Relay error when registering validator(s)"`
It seems like builders and/or mev-boost struggle to handle heavy loads of validator registrations. I haven't observed issues with validators not actually being registered, but I see timeouts on these endpoints many times a day. It doesn't seem like this `ERRO` is worth it.
#### `ERRO Error fetching block for peer ExecutionLayerErrorPayloadReconstruction`
This means we failed to respond to a peer on the P2P network with a block they requested because of an error in the `execution_layer`. It's very common to see timeouts or incomplete responses on this endpoint whilst the EE is busy and I don't think it's important enough for an `ERRO`. As long as the peer count stays high, I don't think the user needs to be actively concerned about how we're responding to peers.
## Additional Info
NA
Resolves the cargo-audit failure caused by https://rustsec.org/advisories/RUSTSEC-2023-0010.
I also removed the ignore for `RUSTSEC-2020-0159` as we are no longer using a vulnerable version of `chrono`. We still need the other ignore for `time 0.1` because we depend on it via `sloggers -> chrono -> time 0.1`.
## Proposed Changes
Remove the `[patch]` for `fixed-hash`.
We pinned it years ago in #2710 to fix `arbitrary` support. Nowadays the 0.7 version of `fixed-hash` is only used by the `web3` crate and doesn't need `arbitrary`.
~~Blocked on #3916 but could be merged in the same Bors batch.~~
## Issue Addressed
NA
## Description
We were missing an edge case when checking to see if a block is a descendant of the finalized checkpoint. This edge case is described for one of the tests in this PR:
a119edc739/consensus/proto_array/src/proto_array_fork_choice.rs (L1018-L1047)
This bug presented itself in the following mainnet log:
```
Jan 26 15:12:42.841 ERRO Unable to validate attestation error: MissingBeaconState(0x7c30cb80ec3d4ec624133abfa70e4c6cfecfca456bfbbbff3393e14e5b20bf25), peer_id: 16Uiu2HAm8RPRciXJYtYc5c3qtCRdrZwkHn2BXN3XP1nSi1gxHYit, type: "unaggregated", slot: Slot(5660161), beacon_block_root: 0x4a45e59da7cb9487f4836c83bdd1b741b4f31c67010c7ae343fa6771b3330489
```
Here the BN is rejecting an attestation because of a "missing beacon state". Whilst it was correct to reject the attestation, it should have rejected it because it attests to a block that conflicts with finality rather than claiming that the database is inconsistent.
The block that this attestation points to (`0x4a45`) is block `C` in the above diagram. It is a non-canonical block in the first slot of an epoch that conflicts with the finalized checkpoint. Due to our lazy pruning of proto array, `0x4a45` was still present in proto-array. Our missed edge-case in [`ForkChoice::is_descendant_of_finalized`](38514c07f2/consensus/fork_choice/src/fork_choice.rs (L1375-L1379)) would have indicated to us that the block is a descendant of the finalized block. Therefore, we would have accepted the attestation thinking that it attests to a descendant of the finalized *checkpoint*.
Since we didn't have the shuffling for this erroneously processed block, we attempted to read its state from the database. This failed because we prune states from the database by keeping track of the tips of the chain and iterating back until we find a finalized block. This would have deleted `C` from the database, hence the `MissingBeaconState` error.
## Issue Addressed
Resolves#2521
## Proposed Changes
Add a metric that indicates the next attestation duty slot for all managed validators in the validator client.