398: Statediff missing parent blocks automatically. #399
Labels
No Label
bug
critical
duplicate
enhancement
epic
help wanted
in progress
invalid
low priority
question
rebase
v1
v5
wontfix
Copied from Github
Kind/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No Milestone
No project
No Assignees
4 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cerc-io/go-ethereum#399
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "telackey/398"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Required DB updates: https://github.com/cerc-io/ipld-eth-db/pull/136
Related tests: cerc-io/system-tests#8
This fixes https://github.com/cerc-io/go-ethereum/issues/398 . After statediffing a block which we received from a ChainEvent, we now check if its parent exists in the index. If not, we statediff it as well, checking for its parent, etc. up to a maximum backfill limit.
This allows us to account for both skipped blocks because the chain was extended or missed blocks because the chain was reorganized.
To keep everything straight in the DB, we use a new 'canonical' boolean flag on the block in eth.header_cids. Whenever a new block is inserted in the DB, it is marked as canonical. If a new block is inserted at the same height later (eg, because of a reorg), any pre-existing blocks are marked as non-canonical at the same time.
This flag will allow for greatly simplified canonical checks in ipld-eth-server (see also https://github.com/cerc-io/ipld-eth-server/issues/251), eg:
We'll also need to handle updating
canonical
in the file indexer, but that's not really doable in the CSV writer. It will need to happen in the post-processing stage.Actually I think it might be better to split the canonicity handling into a new PR and plan it out a bit more. The DB procedure that's currently used should still work with this update to the indexing.
Unless I am misunderstanding, this would only ever be an issue if tracking head and outputting to CSV, but is that something that we would ever want to do? Updating 'canonical' status is just one of many problems in that case, since CSV also has no way for us to detect gaps of any sort, even gaps caused by something as minor as restarting the process, nor can we see find skipped parents when the chain is extended, or any other such inconsistency that is bound to arise, leaving a pretty unreliable record.
CSV makes a lot of sense for bulk and offline processing (eg, eth-statediff-service) but it is much less clear how it would be useful as a real-time stream.
The core issue of #398 is to ensure that the DB contents are valid and consistent when the chain is reorganized, and reflect the current state of the chain. That implies a determination about canonicity as well as completeness, so I don't think the concerns can be separated.
It was being used in production for performance reasons (Postgres proved to become overloaded while trying to track head), and last I heard it still was, but @i-norden or @srwadleigh could confirm.
This is true. I should have raised a concern earlier. We have no fallback mechanism for backfilling, so this needs to work. For detecting in-progress jobs I suppose we will at worst duplicate a few blocks.
The COPYFROM mode is used while tracking head, not the CSV persisting mode. Currently we only use the CSV mode in eth-statediff-service for historical data.
I stand corrected, as discussed the only thing using
file
mode in production iseth-statediff-service
so there's no problem here.Forgot to approve- LGTM 👍