Reorgs lead to invalid block hierarchy in the statediff database. #398
Labels
No Label
bug
critical
duplicate
enhancement
epic
help wanted
in progress
invalid
low priority
question
rebase
v1
v5
wontfix
Copied from Github
Kind/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cerc-io/go-ethereum#398
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The reorganization code is invoked in geth whenever more than one block is added to the chain at a time. One recent PR handled a bug where blocks were missed when the chain was purely extended (eg, the chain was at 1000, extended to 1003, and we skipped statediffing blocks 1001 and 1002).
The other case is a reorganization in the more formal sense, when we have a different sequence of blocks at the same height, and switch from and "old" chain to a "new" chain. For example, we have blocks
[1000:0x123, 1001:0xabc]
and are told by the consensus client to switch to[1000:0x123, 1001:0x456, 1002:0x789]
.This will cause a similar bug to the one mentioned above, but in this case we will have the wrong block stored in the database at 1001, rather than no block at all. The fix for #397 will not catch this, because that is only able to observe the gap in the sequence, and there is no gap in this instance.
I was able to induce reorgs with some regularity in the fixturenet by pausing the statediffing container for two slots, then unpausing it for one, in a loop.
In the DB, we then see that that this allows us to insert blocks without a parent:
Here is a specific example:
You can see that 16701 (0x18dc) points to 16700 (0x3d0b), and 16703 (0x710d) points to 16702 (0xdb3f), but 16702 points to a parent (0x406d) that does not exist. Yet there are no duplicated block numbers at 16701, it is just a missing entry.
This can be fixed by explicitly calling
statediff_writeStateDiffAt(16701)
after which the hierarchy is fixed. We see both the old, orphaned 16701 (0x18dc) and the new, canonical 16701 (0x406d).It is an important detail that we could have been informed about this, as the reorganization issues not only a
ChainEvent
for the new head block, but alsoChainSideEvent
s for every block removed from the chain. We do not currently listen forChainSideEvent
s however.If we double-checked that we had the parent of each block we insert, it should serve as a fix for both the #397 issue and this one, while being more reliable and resilient than relying purely on event notification.
We'd need to handle the empty DB case to prevent chasing the rabbit all the way back to genesis, but otherwise it should make sure we do not allow gaps to open.
If these additional checks can be done in a performant way, by all means, please add them. Relatedly, we should make sure that https://github.com/cerc-io/eth-ipfs-state-validator would have caught and corrected these issues.
I can verify that https://github.com/cerc-io/ipld-eth-db-validator/tree/v5 detects this issue, though it cannot correct it.
The reported error looks like: