Improve block delay metrics (#3894)

We recently ran a large-block experiment on the testnet and plan to do a further experiment on mainnet.

Although the metrics recovered from Lighthouse nodes were quite useful, I think we could do with greater resolution in the block delay metrics, and with a specific value for each block (currently these can be lost to large exponential histogram buckets).
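
To make the resolution concern concrete, here is a minimal sketch (not code from this PR) that looks up which histogram bucket a delay lands in. `OLD_BUCKETS` copies the "previous values" comment in the diff and `NEW_BUCKETS` copies the custom list the diff adds. The helper name `bucket_bounds` is hypothetical, and the lower bound of the first bucket is assumed to be 0 because delays are treated as non-negative.

```rust
/// Previous bucket upper bounds, as listed in the diff's NOTE comment.
const OLD_BUCKETS: [f64; 9] = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0];

/// Custom bucket upper bounds added by this commit (seconds).
const NEW_BUCKETS: [f64; 23] = [
    0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0,
    8.0, 9.0, 10.0, 15.0, 20.0,
];

/// Hypothetical helper: returns the `(lower, upper]` bounds of the bucket
/// that `value` falls into. `buckets` must be sorted ascending.
/// An upper bound of `None` means the value exceeds every listed bound
/// and would only be counted in the implicit +Inf bucket.
fn bucket_bounds(buckets: &[f64], value: f64) -> (f64, Option<f64>) {
    // Assumption: delays are non-negative, so the first bucket starts at 0.
    let mut lower = 0.0;
    for &upper in buckets {
        if value <= upper {
            return (lower, Some(upper));
        }
        lower = upper;
    }
    (lower, None)
}
```

Under these assumptions, a block arriving 3.2 s into the slot is only known to lie somewhere in (2, 5] with the previous bounds, whereas `bucket_bounds(&NEW_BUCKETS, 3.2)` narrows it to (3.0, 3.5].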

This PR increases the resolution of the block delay histogram buckets and introduces a new metric that records the delay of the most recent block. Depending on the polling interval of the metrics server, some block delays can be overwritten before they are scraped; however, every value that is scraped is exact rather than being rounded into a coarse histogram bucket.
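
The polling trade-off can be sketched with a small simulation. This is only an illustration: `LastValueGauge` is a hypothetical stand-in for an integer gauge, `scrape_samples` is a made-up helper, and the arrival times and delays are invented example values, not measurements.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical stand-in for an integer gauge: it remembers only the last value set.
struct LastValueGauge(AtomicI64);

impl LastValueGauge {
    fn new() -> Self {
        Self(AtomicI64::new(0))
    }
    fn set(&self, v: i64) {
        self.0.store(v, Ordering::Relaxed);
    }
    fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Replays `(arrival_time_s, delay_ms)` events (sorted by arrival time) and
/// reads the gauge every `scrape_interval_s` seconds, up to and including `end_s`.
/// Returns the values a metrics server polling at that interval would see.
fn scrape_samples(events: &[(u64, i64)], scrape_interval_s: u64, end_s: u64) -> Vec<i64> {
    assert!(scrape_interval_s > 0, "scrape interval must be positive");
    let gauge = LastValueGauge::new();
    let mut samples = Vec::new();
    let mut next = 0; // index of the next event that has not been applied yet
    let mut t = scrape_interval_s;
    while t <= end_s {
        // Apply every block that arrived before this scrape; later ones overwrite earlier ones.
        while next < events.len() && events[next].0 <= t {
            gauge.set(events[next].1);
            next += 1;
        }
        samples.push(gauge.get());
        t += scrape_interval_s;
    }
    samples
}
```

With invented blocks arriving at 1 s, 13 s, 25 s and 37 s carrying delays of 850, 3200, 910 and 1200 ms, a 15 s scrape interval yields 3200, 910 and 1200: the 850 ms delay is overwritten before it is ever read, but each value that is read is exact. A 12 s interval ending at 48 s captures all four.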
Age Manning 2023-01-20 00:46:56 +00:00 committed by realbigsean
parent 8e50d316de
commit 528f7181bc
2 changed files with 13 additions and 1 deletions

@@ -716,6 +716,10 @@ impl<T: BeaconChainTypes> Worker<T> {
             &metrics::BEACON_BLOCK_GOSSIP_SLOT_START_DELAY_TIME,
             block_delay,
         );
+        metrics::set_gauge(
+            &metrics::BEACON_BLOCK_LAST_DELAY,
+            block_delay.as_millis() as i64,
+        );
         let verification_result = self
             .chain

@@ -357,10 +357,18 @@ lazy_static! {
     pub static ref BEACON_BLOCK_GOSSIP_SLOT_START_DELAY_TIME: Result<Histogram> = try_create_histogram_with_buckets(
         "beacon_block_gossip_slot_start_delay_time",
         "Duration between when the block is received and the start of the slot it belongs to.",
+        // Create a custom bucket list for greater granularity in block delay
+        Ok(vec![0.1, 0.2, 0.3,0.4,0.5,0.75,1.0,1.25,1.5,1.75,2.0,2.5,3.0,3.5,4.0,5.0,6.0,7.0,8.0,9.0,10.0,15.0,20.0])
+        // NOTE: Previous values, which we may want to switch back to.
+        // [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50]
-        decimal_buckets(-1,2)
+        //decimal_buckets(-1,2)
     );
+    pub static ref BEACON_BLOCK_LAST_DELAY: Result<IntGauge> = try_create_int_gauge(
+        "beacon_block_last_delay",
+        "Keeps track of the last block's delay from the start of the slot"
+    );
     pub static ref BEACON_BLOCK_GOSSIP_ARRIVED_LATE_TOTAL: Result<IntCounter> = try_create_int_counter(
         "beacon_block_gossip_arrived_late_total",
         "Count of times when a gossip block arrived from the network later than the attestation deadline.",