Reduce some EE and builder related ERRO logs to WARN (#3966)

## Issue Addressed

NA

## Proposed Changes

Our `ERRO` stream has been rather noisy since the merge due to some unexpected behaviours of builders and EEs. Now that we've been running post-merge for a while, I think we can drop some of these `ERRO` to `WARN` so we're not "crying wolf".

The modified logs are:

#### `ERRO Execution engine call failed`

I'm seeing this quite frequently on Geth nodes. They seem to timeout when they're busy and it rarely indicates a serious issue. We also have logging across block import, fork choice updating and payload production that raise `ERRO` or `CRIT` when the EE times out, so I think we're not at risk of silencing actual issues.

#### `ERRO "Builder failed to reveal payload"`

In #3775 we reduced this log from `CRIT` to `ERRO` since it's common for builders to fail to reveal the block to the producer directly whilst still broadcasting it to the networ. I think it's worth dropping this to `WARN` since it's rarely interesting.

I elected to stay with `WARN` since I really do wish builders would fulfill their API promises by returning the block to us. Perhaps I'm just being pedantic here, I could be convinced otherwise.

#### `ERRO "Relay error when registering validator(s)"`

It seems like builders and/or mev-boost struggle to handle heavy loads of validator registrations. I haven't observed issues with validators not actually being registered, but I see timeouts on these endpoints many times a day. It doesn't seem like this `ERRO` is worth it.

#### `ERRO Error fetching block for peer     ExecutionLayerErrorPayloadReconstruction`

This means we failed to respond to a peer on the P2P network with a block they requested because of an error in the `execution_layer`. It's very common to see timeouts or incomplete responses on this endpoint whilst the EE is busy and I don't think it's important enough for an `ERRO`. As long as the peer count stays high, I don't think the user needs to be actively concerned about how we're responding to peers.

## Additional Info

NA
This commit is contained in:
Paul Hauner 2023-02-12 23:14:08 +00:00
parent 3b4c677727
commit 84843d67d7
6 changed files with 28 additions and 12 deletions

1
Cargo.lock generated
View File

@ -4988,6 +4988,7 @@ dependencies = [
"eth2_ssz", "eth2_ssz",
"eth2_ssz_types", "eth2_ssz_types",
"ethereum-types 0.14.1", "ethereum-types 0.14.1",
"execution_layer",
"exit-future", "exit-future",
"fnv", "fnv",
"futures", "futures",

View File

@ -5,7 +5,7 @@ use crate::engine_api::{
}; };
use crate::HttpJsonRpc; use crate::HttpJsonRpc;
use lru::LruCache; use lru::LruCache;
use slog::{debug, error, info, Logger}; use slog::{debug, error, info, warn, Logger};
use std::future::Future; use std::future::Future;
use std::sync::Arc; use std::sync::Arc;
use task_executor::TaskExecutor; use task_executor::TaskExecutor;
@ -325,7 +325,7 @@ impl Engine {
Ok(result) Ok(result)
} }
Err(error) => { Err(error) => {
error!( warn!(
self.log, self.log,
"Execution engine call failed"; "Execution engine call failed";
"error" => ?error, "error" => ?error,

View File

@ -1575,10 +1575,10 @@ impl<T: EthSpec> ExecutionLayer<T> {
&metrics::EXECUTION_LAYER_BUILDER_REVEAL_PAYLOAD_OUTCOME, &metrics::EXECUTION_LAYER_BUILDER_REVEAL_PAYLOAD_OUTCOME,
&[metrics::FAILURE], &[metrics::FAILURE],
); );
error!( warn!(
self.log(), self.log(),
"Builder failed to reveal payload"; "Builder failed to reveal payload";
"info" => "this relay failure may cause a missed proposal", "info" => "this is common behaviour for some builders and may not indicate an issue",
"error" => ?e, "error" => ?e,
"relay_response_ms" => duration.as_millis(), "relay_response_ms" => duration.as_millis(),
"block_root" => ?block_root, "block_root" => ?block_root,

View File

@ -2855,7 +2855,7 @@ pub fn serve<T: BeaconChainTypes>(
.await .await
.map(|resp| warp::reply::json(&resp)) .map(|resp| warp::reply::json(&resp))
.map_err(|e| { .map_err(|e| {
error!( warn!(
log, log,
"Relay error when registering validator(s)"; "Relay error when registering validator(s)";
"num_registrations" => filtered_registration_data.len(), "num_registrations" => filtered_registration_data.len(),

View File

@ -45,6 +45,7 @@ tokio-util = { version = "0.6.3", features = ["time"] }
derivative = "2.2.0" derivative = "2.2.0"
delay_map = "0.1.1" delay_map = "0.1.1"
ethereum-types = { version = "0.14.1", optional = true } ethereum-types = { version = "0.14.1", optional = true }
execution_layer = { path = "../execution_layer" }
[features] [features]
deterministic_long_lived_attnets = [ "ethereum-types" ] deterministic_long_lived_attnets = [ "ethereum-types" ]

View File

@ -7,7 +7,7 @@ use itertools::process_results;
use lighthouse_network::rpc::StatusMessage; use lighthouse_network::rpc::StatusMessage;
use lighthouse_network::rpc::*; use lighthouse_network::rpc::*;
use lighthouse_network::{PeerId, PeerRequestId, ReportSource, Response, SyncInfo}; use lighthouse_network::{PeerId, PeerRequestId, ReportSource, Response, SyncInfo};
use slog::{debug, error}; use slog::{debug, error, warn};
use slot_clock::SlotClock; use slot_clock::SlotClock;
use std::sync::Arc; use std::sync::Arc;
use task_executor::TaskExecutor; use task_executor::TaskExecutor;
@ -392,12 +392,26 @@ impl<T: BeaconChainTypes> Worker<T> {
break; break;
} }
Err(e) => { Err(e) => {
error!( if matches!(
self.log, e,
"Error fetching block for peer"; BeaconChainError::ExecutionLayerErrorPayloadReconstruction(_block_hash, ref boxed_error)
"block_root" => ?root, if matches!(**boxed_error, execution_layer::Error::EngineError(_))
"error" => ?e ) {
); warn!(
self.log,
"Error rebuilding payload for peer";
"info" => "this may occur occasionally when the EE is busy",
"block_root" => ?root,
"error" => ?e,
);
} else {
error!(
self.log,
"Error fetching block for peer";
"block_root" => ?root,
"error" => ?e
);
}
// send the stream terminator // send the stream terminator
self.send_error_response( self.send_error_response(