Statediff for full node #6
Labels
No Label
bug
critical
duplicate
enhancement
epic
help wanted
in progress
invalid
low priority
question
rebase
v1
v5
wontfix
Copied from Github
Kind/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cerc-io/go-ethereum#6
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "statediff-for-full-node"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This PR addresses several issues with the initial statediff approach:
When opening a trie based on it's root hash, I was previously looking at an empty in-memory database 🤦🏻♀️ which didn't affect the process with an archive node because it could look the information up from the disk database. Now we're looking at the blockchain's statecache, which is the correct in-mem database instance:
bf0df08b34
After the previous point was fixed, there was still the issue where some necessary trie roots were being pruned before we had a chance to process their state diffs. This was addressed by keeping a collection of state roots that have been state diffed, and only allowing them to be pruned once their state diff and their child's state diff has been calculated:
2ac8afab11
It's also worth noting that for now we're not looking up an address by it's storage path, because this proved to be pretty difficult on a full node (there's a task in Jira to investigate this further). Since for Maker we already have the contract addresses, we're able keccak hash them, which gives us the account leaf key (which we're storing in the csv instead for now).
Here are some example diffs.
Aside from storing the account leaf key instead of the account address in the csv, all of these changes should the archive node statediffing.
LGTM!
One thing I'm wondering is if it makes sense to have a branch of this that only executes on a full, non-archive node? Probably not super high priority, but thinking that it might be an easier sell upstream if the process is only supplementing a way of running the node that doesn't already have the full state.
@ -158,2 +159,4 @@
utils.RegisterEthService(stack, &cfg.Eth)
if ctx.GlobalBool(utils.StateDiffFlag.Name) {
cfg.Eth.StateDiff = true
would it make sense to combine this with the
RegisterStateDiffService
call, if both are based off of the call toctx.GlobalBool(utils.StateDiffFlag.Name
)?@ -997,0 +1009,4 @@
if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {
bc.triegc.Push(root, number)
break
} else {
Maybe worth putting
1
and2
behind variables/constants to clarify what's going on here? My understanding is that a hash needs to be added once as the current hash, and once as the parent hash - and then we're done. Is that right? Do reorgs potentially complicate that expectation?Do we want to retain all these logging statements when we merge?
Looks good to me! It's awesome seeing this come together, thank you Elizabeth!!
Does this error need to be handled in any additional capacity? Right now it still falls through to the
ok
check.@ -158,2 +159,4 @@
utils.RegisterEthService(stack, &cfg.Eth)
if ctx.GlobalBool(utils.StateDiffFlag.Name) {
cfg.Eth.StateDiff = true
yeah, good call!
@ -997,0 +1009,4 @@
if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {
bc.triegc.Push(root, number)
break
} else {
probably a good idea to remove some of them 👍
@ -997,0 +1009,4 @@
if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {
bc.triegc.Push(root, number)
break
} else {
That's a really good catch. Yeah, my intention was that when we processes a state diff for a given hash, it gets added to the
stateDiffsProcessed
collection. It then gets added again when the state diff is processed when it is the parent, so we'll need to make sure it isn't pruned before these diffs have been processed.Considering reorgs is a really good thought, I'm honestly not sure how that would affect this expectation. I think I'd need to spend some time digging into this - do you think it makes sense to hold off merging this in? Or, merging it, and creating a new story to take a look in the future?
Nice catch. This log statement was more for me as informational to see if
AddToStateDiffProcessedCollection
would be called more than twice (once when the hash was current, and once when it was the parent). I don't think there is an if its called more than twice, it would mean that a state diff was processed using this hash more than twice. So, I may remove this if statement unless there's some an edge case that I'm missing.I also wanted to note that I added a sleep in here, to get the service test suite passing, which isn't ideal. I have a suspicion that this is an indication that something else may be going on in the service loop, so @rmulhol or @i-norden if you have a moment to take another look at it, that would be awesome. Or, perhaps we can walk through it together - it's a lot to keep in one's head! 🤯
Would definitely take you up on walking through it together - you've definitely got a better grasp of the larger project architecture than me!
That said I have no strong objection to including the sleep in a mock. My only question would be whether the need for a sleep here implies that actual asynchronous execution might run into a similar problem as whatever was causing the test to fail.
@ -997,0 +1009,4 @@
if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {
bc.triegc.Push(root, number)
break
} else {
Definitely sounds like another story to me, and a bit hard to simulate without getting to the head of the chain and watching it execute in the middle of one.
@ -997,0 +1009,4 @@
if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {
bc.triegc.Push(root, number)
break
} else {
Yeah, it definitely will be tricky. I've added another story so we can make sure to circle back: https://makerdao.atlassian.net/browse/VDB-393
Awesome job @elizabethengelman !! 🚀
Missing a bit of context of the code base obviously, but I noticed a few things that I've commented.
What exactly does this flag entail? Hint hint, there are comments on the rest of them ;)
This is equivalent:
If
!ok
,count
will have the zero value of int:0
anyway. I assume this is run very frequently, would save some computation.https://play.golang.org/p/EAba1v8NRcw
@ -1050,1 +1070,4 @@
// since we need the state tries of the current block and its parent in-memory
// in order to process statediffs, we should avoid dereferencing roots until
// its statediff and its child have been processed
Why? 🙃 Is it to prevent branches from GC/pruning? Worth the explicit information, that was hard to figure out and harder to guess :)
Does this mean we clean up the map when we're finished with a certain diff? Was just about to ask about memory leaks here, but probably okay if that's the case!
Feels like this constant could be ejected and set somewhere else so we don't have to assign it to a local closure all the time.
@ -1486,0 +1503,4 @@
blocks := makeBlockChain(genesis, numberOfBlocks+1, engine, db, canonicalSeed)
_, err := blockchain.InsertChain(blocks)
if err != nil {
t.Fatalf("failed to create pristine chain: %v", err)
✨ pristine ✨
@ -1486,0 +1563,4 @@
}
}
return false
}
This diff shows how we should write our tests instead of that damn Ginkgo! Looks so clean, and probably runs way faster. 🐌 Why are we even using
gomega
assertions when we have good 'olbool
Could probably be split up in smaller tests, but I figure the setup is a bit annoying.
😂 🍴
Minor but this is gonna log weird no?
Error creating tried for newStateRooterrorwhatever error got returned
Dang nvm after checking the docs, that key-value logger is sweet.
@ -179,2 +175,3 @@
deletedAcc := deletions[common.HexToAddress(val)]
createdAcc := creations[common.HexToHash(val)]
deletedAcc := deletions[common.HexToHash(val)]
oldSR := deletedAcc.Root
Why spend efforts duplicating the data when it's just (at a glance?) used to do a map indexing at 134?
@ -1050,1 +1070,4 @@
// since we need the state tries of the current block and its parent in-memory
// in order to process statediffs, we should avoid dereferencing roots until
// its statediff and its child have been processed
Totally makes sense! Is this more clear?
Nice catch!
Yep, if a root is okay to be dereferenced, we removed it from the stateDiffsProcessed collection just to make sure that that collection doesn't get huge. And then it will fall through to
triedb.Dereference(root.(common.Hash))
which removes it from the in-memory trie database.@ -179,2 +175,3 @@
deletedAcc := deletions[common.HexToAddress(val)]
createdAcc := creations[common.HexToHash(val)]
deletedAcc := deletions[common.HexToHash(val)]
oldSR := deletedAcc.Root
The
LeafKey
comment says that, "callers must not retain references to the value after calling Next" - so making a copy of it may be overkill, since I don't think we need to reference the leaf key after we convert it to a hash (on line 123).@ -1050,1 +1070,4 @@
// since we need the state tries of the current block and its parent in-memory
// in order to process statediffs, we should avoid dereferencing roots until
// its statediff and its child have been processed
@elizabethengelman Stellarly so! 🌟
I would also greatly appreciate a walk through but I was so slow to get back to this that you may have already done one! I don't think this would be what is causing the problem in the service loop, but while looking I noticed the break here and I've got myself confused about named breaks again. I'm under the impression this breaks out of the
for
loop entirely, not just the current select-case, so it not only skips the current block in the channel but stops listening altogether?Just realizing that I forgot to mention that named breaks during the brief walk through. My impression is the same as yours @i-norden, where it would break out of the for loop entirely. My thought for including it in the break that you mentioned was that if a block didn't have a parent, then something wrong, and we should quit the process.
That makes sense and I agree we should quit the process. This is really nitpicky but the error kind of makes it sound like we are only skipping that block but otherwise continuing.
That's a great point. I missed this before I merged it into the
statediffing
branch 🤦🏻♀️, but I'll make a note on that PR so that this comment is addressed before we open a PR upstream!