292: Backfill gaps in the recent past on startup when tracking head. #395

Merged
telackey merged 11 commits from telackey/292 into v1.11.6-statediff-v5 2023-06-24 04:04:57 +00:00
Member

This adds a new Backfill check, run at startup, that attempts to recover from two kinds of gaps that can occur when tracking head:

  1. A contiguous gap from the last statediff position to the current head position. This can happen if statediffs are in-flight when geth is terminated, or if an error prevents writing statediffs but not syncing the chain (e.g., the DB goes down). On startup, then, we might notice that the current chain block is 2500, but the current statediff position is only 2498. The next ChainEvent we can expect to see will be at 2501, so we want to trigger statediffing of 2498, 2499, and 2500.

  2. Discontiguous gaps in the recent past. These can result from temporary errors (e.g., a momentary disruption in DB connectivity) or from the process being terminated with in-flight statediffs that completed out-of-order. E.g., block 2500 got written but 2498 and 2499 were still in-flight, leaving a sequence like: 2496, 2497, 2500, ... In this scenario, the statediff and chain positions are in sync, but we still need to plug the hole behind the current position.
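The two cases can be illustrated with a minimal Go sketch. This is not the code from this PR: `contiguousGap`, `missingBlocks`, and the in-memory `written` set are hypothetical names standing in for whatever the service actually queries, and the numbers are the ones from the examples above.

```go
package main

import "fmt"

// contiguousGap is a hypothetical helper: it lists the blocks from the
// statediff position up to and including the chain head, matching the
// example above (position 2498, head 2500 -> 2498, 2499, 2500).
func contiguousGap(statediffPos, head uint64) []uint64 {
	var blocks []uint64
	for n := statediffPos; n <= head; n++ {
		blocks = append(blocks, n)
	}
	return blocks
}

// missingBlocks is a hypothetical helper: it lists the block numbers in
// [from, to] that are absent from the set of already-written statediffs.
// A real implementation would presumably ask the database instead.
func missingBlocks(written map[uint64]bool, from, to uint64) []uint64 {
	var gaps []uint64
	for n := from; n <= to; n++ {
		if !written[n] {
			gaps = append(gaps, n)
		}
	}
	return gaps
}

func main() {
	// Case 1: statediff position lags the chain head.
	fmt.Println(contiguousGap(2498, 2500))

	// Case 2: 2500 was written while 2498 and 2499 were still in-flight.
	written := map[uint64]bool{2496: true, 2497: true, 2500: true}
	fmt.Println(missingBlocks(written, 2496, 2500))
}
```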

I set default limits of 7200 blocks (~1 day) in both cases. If the statediff position is more than 7200 blocks behind head, the check will not attempt to backfill and will instead log an error. When looking for gaps, it will look no further back than 7200 blocks from the current statediff position. Both limits are configurable.
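As a rough sketch of how the two limits behave, assuming they are plain block counts: the names `defaultMaxBlocks`, `shouldBackfill`, and `gapScanStart` below are illustrative only and are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxBlocks mirrors the default described above (7200 blocks, ~1 day).
// The constant name is an assumption for this sketch.
const defaultMaxBlocks uint64 = 7200

var errTooFarBehind = errors.New("statediff position too far behind head; not backfilling")

// shouldBackfill returns an error when the statediff position is more than
// maxBehind blocks behind head, in which case the caller would log the error
// and skip the backfill.
func shouldBackfill(statediffPos, head, maxBehind uint64) error {
	if head > statediffPos && head-statediffPos > maxBehind {
		return errTooFarBehind
	}
	return nil
}

// gapScanStart returns the earliest block to inspect when looking for
// discontiguous gaps: at most maxLookback blocks behind the statediff
// position, clamped at block 0 to avoid unsigned underflow.
func gapScanStart(statediffPos, maxLookback uint64) uint64 {
	if statediffPos < maxLookback {
		return 0
	}
	return statediffPos - maxLookback
}

func main() {
	fmt.Println(shouldBackfill(2498, 2500, defaultMaxBlocks))  // within the limit
	fmt.Println(shouldBackfill(1000, 10000, defaultMaxBlocks)) // 9000 blocks behind
	fmt.Println(gapScanStart(10000, defaultMaxBlocks))
}
```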

To handle more extreme situations, we also have chain-chunker fillgap which can detect and fill gaps across an arbitrary range.

Tests: https://git.vdb.to/cerc-io/system-tests/commit/aa53b8abcb981688d45a5e453338b9026ad43314

dboreham reviewed 2023-06-22 14:49:18 +00:00
i-norden approved these changes 2023-06-23 17:48:27 +00:00
i-norden left a comment
Member

LGTM! One non-nitpick comment but it doesn't need to block this and we already discussed the testing situation 👍
Author
Member

Tests added here: https://git.vdb.to/cerc-io/system-tests/commit/aa53b8abcb981688d45a5e453338b9026ad43314
telackey reviewed 2023-06-23 22:41:58 +00:00
telackey reviewed 2023-06-23 22:42:05 +00:00
telackey reviewed 2023-06-23 22:44:39 +00:00
Reference: cerc-io/go-ethereum#395