Carrion PK conflict #367
Reference: cerc-io/go-ethereum#367
We are running into a PK conflict error on the Vulture/Carrion v4 setup when in the `copyfrom=true` mode. This is due to a combination of factors.

Since it only occurs once in a blue moon under mainnet conditions that can't be reliably reproduced, it will be difficult to explore and test the issue.
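For illustration, here is a minimal sketch of how the conflict surfaces, assuming the pgx driver and a hypothetical table `t` with primary key `id` (not the real statediff schema): Postgres `COPY` has no `ON CONFLICT` clause, so when two writers stream the same key, the second `CopyFrom` fails with a unique-violation error (SQLSTATE 23505).

```go
package statediffdemo

import (
	"context"
	"errors"

	"github.com/jackc/pgconn"
	"github.com/jackc/pgx/v4"
)

// demoConflict streams one row into a hypothetical table "t" (PK "id"). If a
// concurrent writer already inserted the same key, Postgres rejects the COPY
// with SQLSTATE 23505 (unique_violation) -- the PK conflict described above.
func demoConflict(ctx context.Context, conn *pgx.Conn) error {
	_, err := conn.CopyFrom(ctx,
		pgx.Identifier{"t"},
		[]string{"id", "val"},
		pgx.CopyFromRows([][]interface{}{{int64(1), "dup"}}),
	)
	var pgErr *pgconn.PgError
	if errors.As(err, &pgErr) && pgErr.Code == "23505" {
		// COPY offers no conflict handling of its own, so the only options are
		// to catch this error or to avoid the duplicate write in the first place.
		return pgErr
	}
	return err
}
```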
So I’m more inclined to either:

- extend the `copyfrom` mode to be able to handle ON CONFLICT clauses, using a temp table as mentioned above (a sketch follows below), or
- in the meantime, revert to using `copyfrom=false`.

@dboreham just something to be aware of.
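If the `copyfrom` mode is extended along those lines, one common pattern is to `COPY` into a session-local temp table and then merge with `INSERT ... ON CONFLICT DO NOTHING`. A rough sketch, again assuming the pgx driver; the table and column names (`eth.state_cids`, `nodes_tmp`, etc.) are placeholders rather than the actual schema:

```go
package statediffdemo

import (
	"context"

	"github.com/jackc/pgx/v4"
)

// copyWithConflictHandling keeps COPY for the bulk transfer but makes the final
// insert conflict-safe by routing it through a temp table.
func copyWithConflictHandling(ctx context.Context, tx pgx.Tx, rows [][]interface{}) error {
	// Session-local temp table mirroring the target; dropped automatically at COMMIT.
	if _, err := tx.Exec(ctx,
		`CREATE TEMP TABLE nodes_tmp (LIKE eth.state_cids INCLUDING DEFAULTS) ON COMMIT DROP`); err != nil {
		return err
	}
	// Bulk-load via COPY; this cannot conflict because the temp table starts empty.
	if _, err := tx.CopyFrom(ctx, pgx.Identifier{"nodes_tmp"},
		[]string{"block_number", "header_id", "state_leaf_key", "cid"},
		pgx.CopyFromRows(rows)); err != nil {
		return err
	}
	// Single set-based merge; rows whose primary key already exists are skipped.
	_, err := tx.Exec(ctx,
		`INSERT INTO eth.state_cids (block_number, header_id, state_leaf_key, cid)
		 SELECT block_number, header_id, state_leaf_key, cid FROM nodes_tmp
		 ON CONFLICT DO NOTHING`)
	return err
}
```

Because everything runs in one transaction, the temp table, the COPY, and the merge all share a single pooled connection, and the extra round trip is small relative to the bulk load.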
I think we should elevate this to a somewhat high priority; until this is fixed we cannot rely on the `COPY FROM` mode.

Looking at the recent logs, here are a few interesting details: the errors coincide with a statediff written for a `blockchain.ChainEvent` and one or more externally triggered statediffs from `statediff_writeStatediffAt` API calls. Which, after some digging and intentional reproduction in the fixturenet, leads me to a theory of what is happening:
- The externally triggered statediffs come from `ipld-eth-server`, which calls `writeStateDiffAt` automatically under several conditions.
- When the conflict occurs, the failing write's `END` is properly logged however, so it doesn't exit completely sideways, but its DB connection is leaked.
- Later writes log a `BEGIN`, but never an `END`, because they are blocked waiting forever for the leaked DB connections to be returned to the (empty) DB connection pool.

I have confirmed that the DB conn is being leaked by intentionally triggering a similar error and observing that the pool gets exhausted, but I need to track down the exact source.
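Whatever the exact source of the leak turns out to be, the general fix is to guarantee that every connection or transaction checked out of the pool is released on every path, including the conflict/error path. A minimal sketch of that pattern, assuming a pgxpool-backed writer (the function and parameter names here are hypothetical, not the actual statediff code):

```go
package statediffdemo

import (
	"context"

	"github.com/jackc/pgx/v4"
	"github.com/jackc/pgx/v4/pgxpool"
)

// writeStateDiffTx wraps a single statediff write in a transaction and
// guarantees the underlying pool connection is returned even when the write
// fails (e.g. on a PK conflict), so later BEGINs don't block on an empty pool.
func writeStateDiffTx(ctx context.Context, pool *pgxpool.Pool, write func(pgx.Tx) error) error {
	tx, err := pool.Begin(ctx) // checks a connection out of the pool
	if err != nil {
		return err
	}
	// Rollback after a successful Commit is a harmless no-op; on any error or
	// panic it rolls the tx back and releases the connection to the pool.
	defer tx.Rollback(ctx)

	if err := write(tx); err != nil {
		return err
	}
	return tx.Commit(ctx)
}
```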
There are a few levels of fixes that need to occur here:
All 3 points are fixed in v5. I am backporting the fixes to v4 for deployment on Vulture.
Thanks @telackey!