Investigate deadlock error #136
Reference: cerc-io/go-ethereum#136
It only appears to be occurring for logTrie and storageTrie nodes, which is illuminating because these are the only nested tries. So it is likely due to an issue with the rctTrie or stateTrie leaf node (respectively) that is being linked to by FK.
A deadlock can only occur when two or more transactions try to modify the same data.
We run multiple workers in geth, each of which creates transactions that can deadlock against one another. When Postgres detects a deadlock (once deadlock_timeout has passed), it aborts one of the transactions and lets the other commit. This article explains the conditions under which a deadlock can occur in detail: https://rcoh.svbtle.com/postgres-unique-constraints-can-cause-deadlock.
To resolve this issue, I think we should either retry the transaction that got aborted, or continue and backfill the missing data later.
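A minimal sketch of the retry option, assuming a standard database/sql handle with the lib/pq driver (the helper name and signature below are hypothetical, not the indexer's actual API); the transaction Postgres aborts as the deadlock victim fails with SQLSTATE 40P01:

```go
package indexer

import (
	"database/sql"
	"errors"

	"github.com/lib/pq"
)

// deadlockDetected is the SQLSTATE Postgres returns to the transaction it
// picks as the deadlock victim.
const deadlockDetected = "40P01"

// withDeadlockRetry is a hypothetical helper: it runs fn inside a
// transaction and re-runs the whole transaction (up to maxRetries times)
// if Postgres aborts it as a deadlock victim.
func withDeadlockRetry(db *sql.DB, maxRetries int, fn func(*sql.Tx) error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		var tx *sql.Tx
		tx, err = db.Begin()
		if err != nil {
			return err
		}
		if err = fn(tx); err == nil {
			if err = tx.Commit(); err == nil {
				return nil
			}
		} else {
			_ = tx.Rollback()
		}
		var pqErr *pq.Error
		if errors.As(err, &pqErr) && string(pqErr.Code) == deadlockDetected {
			continue // we were the deadlock victim; retry the whole transaction
		}
		return err
	}
	return err
}
```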
Please retry the aborted transaction, thanks!
Thanks! Any further insight into why it is occurring from a process perspective? I understand why deadlocks occur in Postgres; what's unclear is when/why we are modifying the same data simultaneously. If we were running multiple processes that handle overlapping blocks, it would be clear why this occurs, but it's not apparent why we would see it within a single process with multiple goroutines. Those goroutines pull unique blocks off their shared work channel, except in the case of reorgs. Even with reorgs, we would only process the same data* if the last common ancestor of the forks was replayed, and afaik it is not (I can't think of why it would be). Two reorgs switching away from and then back to the same blocks could cause this even if the LCA isn't replayed.
It's possible that the changes made to the schema on the postgres_refactor branch (the new natural primary/foreign key scheme), in addition to switching from DO UPDATE to DO NOTHING on most of the tables, will have gotten rid of the underlying cause of these deadlocks.

*Same data including relational context, i.e. we have tons of the same trie nodes, but they exist at different paths in the trie and/or at different block heights.
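To make the fan-out described above concrete, here is a rough sketch (the type and function names are illustrative, not the actual statediff service code): each worker takes a distinct block payload off the shared channel and writes it in its own Postgres transaction, so two transactions should only touch the same rows when two payloads happen to contain identical data.

```go
package indexer

import "sync"

// blockPayload stands in for the per-block state diff handed to a worker;
// the name and shape are illustrative only.
type blockPayload struct {
	Height uint64
}

// runWorkers fans work out to n goroutines. Each pulls distinct blocks off
// the shared channel and writes them via writeBlock, which is assumed to
// open its own Postgres transaction.
func runWorkers(n int, work <-chan blockPayload, writeBlock func(blockPayload) error) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for payload := range work {
				if err := writeBlock(payload); err != nil {
					// A deadlock victim's error would surface here;
					// in practice it should be logged or retried.
					_ = err
				}
			}
		}()
	}
	wg.Wait()
}
```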
The deadlock is arising from the public.blocks table: 1565911d66/statediff/indexer/shared/functions.go (L108)
Two different blocks can produce the same storage node and log trie node in their diffs. Since we key the data by its hash, the same data is stored under the same key, and that is where the deadlock arises.
If we retry, this will get fixed. I'm not sure we can avoid the locking in this case, since we are writing the same data.
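The shape of the contention, as I understand it, is roughly the sketch below (assuming a public.blocks(key, data) layout and an ON CONFLICT ... DO NOTHING insert; the actual statement in functions.go may differ). Two transactions inserting overlapping hash keys in different orders each wait to see whether the other's uncommitted insert of a shared key will commit, and Postgres aborts one of them after deadlock_timeout.

```go
package indexer

import "database/sql"

// insertTrieNodes writes a batch of hash-keyed trie nodes for one block.
// Illustrative sketch only: the real statement in functions.go may differ.
func insertTrieNodes(tx *sql.Tx, nodes map[string][]byte) error {
	// Go map iteration order is randomized, so two workers holding
	// overlapping keys can easily insert them in opposite orders.
	for key, data := range nodes {
		// Even with DO NOTHING, an insert must wait on a concurrent,
		// uncommitted insert of the same key; if each transaction waits
		// on a key the other holds, Postgres detects a deadlock and
		// aborts one of them.
		_, err := tx.Exec(
			`INSERT INTO public.blocks (key, data)
			 VALUES ($1, $2)
			 ON CONFLICT (key) DO NOTHING`,
			key, data,
		)
		if err != nil {
			return err
		}
	}
	return nil
}
```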
Thanks @arijitAD! That explains it; all the duplicate data in public.blocks slipped my mind, and in that case our schema updates won't fix the issue.