Statediff for full node #6
	
		Labels
		
	
	
	
	
		No Label
		
			
	
	
	bug
		
			critical
		
			duplicate
		
			enhancement
		
			epic
		
			help wanted
		
			in progress
		
			invalid
		
			low priority
		
			question
		
			rebase
		
			v1
		
			v5
		
			wontfix
		
		
			Copied from Github
		
			Kind/Breaking
		
			Kind/Bug
		
			Kind/Documentation
		
			Kind/Enhancement
		
			Kind/Feature
		
			Kind/Security
		
			Kind/Testing
		
			Priority
Critical
		
			Priority
High
		
			Priority
Low
		
			Priority
Medium
		
			Reviewed
Confirmed
		
			Reviewed
Duplicate
		
			Reviewed
Invalid
		
			Reviewed
Won't Fix
		
			Status
Abandoned
		
			Status
Blocked
		
			Status
Need More Info
		
	
		No Milestone
		
			
		
	
	
		
		
		
			No project
			
				
			
		
	
	
	
	
	
		No Assignees
		
			
		
	
	
	
		2 Participants
		
	
	
		
		
			Notifications
			
				
			
		
	
	
		
		
	
	
	Due Date
	No due date set.
			
				Dependencies
				
				
		
	
	
	No dependencies set.
			Reference: cerc-io/go-ethereum#6
			
		
	
		Loading…
	
		Reference in New Issue
	
	Block a user
	
	No description provided.
		
		Delete Branch "statediff-for-full-node"
	
	Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This PR addresses several issues with the initial statediff approach:
When opening a trie based on it's root hash, I was previously looking at an empty in-memory database 🤦🏻♀️ which didn't affect the process with an archive node because it could look the information up from the disk database. Now we're looking at the blockchain's statecache, which is the correct in-mem database instance:
bf0df08b34After the previous point was fixed, there was still the issue where some necessary trie roots were being pruned before we had a chance to process their state diffs. This was addressed by keeping a collection of state roots that have been state diffed, and only allowing them to be pruned once their state diff and their child's state diff has been calculated:
2ac8afab11It's also worth noting that for now we're not looking up an address by it's storage path, because this proved to be pretty difficult on a full node (there's a task in Jira to investigate this further). Since for Maker we already have the contract addresses, we're able keccak hash them, which gives us the account leaf key (which we're storing in the csv instead for now).
Here are some example diffs.
Aside from storing the account leaf key instead of the account address in the csv, all of these changes should the archive node statediffing.
LGTM!
One thing I'm wondering is if it makes sense to have a branch of this that only executes on a full, non-archive node? Probably not super high priority, but thinking that it might be an easier sell upstream if the process is only supplementing a way of running the node that doesn't already have the full state.
@ -158,2 +159,4 @@utils.RegisterEthService(stack, &cfg.Eth)if ctx.GlobalBool(utils.StateDiffFlag.Name) {cfg.Eth.StateDiff = truewould it make sense to combine this with the
RegisterStateDiffServicecall, if both are based off of the call toctx.GlobalBool(utils.StateDiffFlag.Name)?@ -997,0 +1009,4 @@if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {bc.triegc.Push(root, number)break} else {Maybe worth putting
1and2behind variables/constants to clarify what's going on here? My understanding is that a hash needs to be added once as the current hash, and once as the parent hash - and then we're done. Is that right? Do reorgs potentially complicate that expectation?Do we want to retain all these logging statements when we merge?
Looks good to me! It's awesome seeing this come together, thank you Elizabeth!!
Does this error need to be handled in any additional capacity? Right now it still falls through to the
okcheck.@ -158,2 +159,4 @@utils.RegisterEthService(stack, &cfg.Eth)if ctx.GlobalBool(utils.StateDiffFlag.Name) {cfg.Eth.StateDiff = trueyeah, good call!
@ -997,0 +1009,4 @@if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {bc.triegc.Push(root, number)break} else {probably a good idea to remove some of them 👍
@ -997,0 +1009,4 @@if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {bc.triegc.Push(root, number)break} else {That's a really good catch. Yeah, my intention was that when we processes a state diff for a given hash, it gets added to the
stateDiffsProcessedcollection. It then gets added again when the state diff is processed when it is the parent, so we'll need to make sure it isn't pruned before these diffs have been processed.Considering reorgs is a really good thought, I'm honestly not sure how that would affect this expectation. I think I'd need to spend some time digging into this - do you think it makes sense to hold off merging this in? Or, merging it, and creating a new story to take a look in the future?
Nice catch. This log statement was more for me as informational to see if
AddToStateDiffProcessedCollectionwould be called more than twice (once when the hash was current, and once when it was the parent). I don't think there is an if its called more than twice, it would mean that a state diff was processed using this hash more than twice. So, I may remove this if statement unless there's some an edge case that I'm missing.I also wanted to note that I added a sleep in here, to get the service test suite passing, which isn't ideal. I have a suspicion that this is an indication that something else may be going on in the service loop, so @rmulhol or @i-norden if you have a moment to take another look at it, that would be awesome. Or, perhaps we can walk through it together - it's a lot to keep in one's head! 🤯
Would definitely take you up on walking through it together - you've definitely got a better grasp of the larger project architecture than me!
That said I have no strong objection to including the sleep in a mock. My only question would be whether the need for a sleep here implies that actual asynchronous execution might run into a similar problem as whatever was causing the test to fail.
@ -997,0 +1009,4 @@if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {bc.triegc.Push(root, number)break} else {Definitely sounds like another story to me, and a bit hard to simulate without getting to the head of the chain and watching it execute in the middle of one.
@ -997,0 +1009,4 @@if !bc.allowedRootToBeDereferenced(root.(common.Hash)) {bc.triegc.Push(root, number)break} else {Yeah, it definitely will be tricky. I've added another story so we can make sure to circle back: https://makerdao.atlassian.net/browse/VDB-393
Awesome job @elizabethengelman !! 🚀
Missing a bit of context of the code base obviously, but I noticed a few things that I've commented.
What exactly does this flag entail? Hint hint, there are comments on the rest of them ;)
This is equivalent:
If
!ok,countwill have the zero value of int:0anyway. I assume this is run very frequently, would save some computation.https://play.golang.org/p/EAba1v8NRcw
@ -1050,1 +1070,4 @@// since we need the state tries of the current block and its parent in-memory// in order to process statediffs, we should avoid dereferencing roots until// its statediff and its child have been processedWhy? 🙃 Is it to prevent branches from GC/pruning? Worth the explicit information, that was hard to figure out and harder to guess :)
Does this mean we clean up the map when we're finished with a certain diff? Was just about to ask about memory leaks here, but probably okay if that's the case!
Feels like this constant could be ejected and set somewhere else so we don't have to assign it to a local closure all the time.
@ -1486,0 +1503,4 @@blocks := makeBlockChain(genesis, numberOfBlocks+1, engine, db, canonicalSeed)_, err := blockchain.InsertChain(blocks)if err != nil {t.Fatalf("failed to create pristine chain: %v", err)✨ pristine ✨
@ -1486,0 +1563,4 @@}}return false}This diff shows how we should write our tests instead of that damn Ginkgo! Looks so clean, and probably runs way faster. 🐌 Why are we even using
gomegaassertions when we have good 'olboolCould probably be split up in smaller tests, but I figure the setup is a bit annoying.
😂 🍴
Minor but this is gonna log weird no?
Error creating tried for newStateRooterrorwhatever error got returnedDang nvm after checking the docs, that key-value logger is sweet.
@ -179,2 +175,3 @@deletedAcc := deletions[common.HexToAddress(val)]createdAcc := creations[common.HexToHash(val)]deletedAcc := deletions[common.HexToHash(val)]oldSR := deletedAcc.RootWhy spend efforts duplicating the data when it's just (at a glance?) used to do a map indexing at 134?
@ -1050,1 +1070,4 @@// since we need the state tries of the current block and its parent in-memory// in order to process statediffs, we should avoid dereferencing roots until// its statediff and its child have been processedTotally makes sense! Is this more clear?
Nice catch!
Yep, if a root is okay to be dereferenced, we removed it from the stateDiffsProcessed collection just to make sure that that collection doesn't get huge. And then it will fall through to
triedb.Dereference(root.(common.Hash))which removes it from the in-memory trie database.@ -179,2 +175,3 @@deletedAcc := deletions[common.HexToAddress(val)]createdAcc := creations[common.HexToHash(val)]deletedAcc := deletions[common.HexToHash(val)]oldSR := deletedAcc.RootThe
LeafKeycomment says that, "callers must not retain references to the value after calling Next" - so making a copy of it may be overkill, since I don't think we need to reference the leaf key after we convert it to a hash (on line 123).@ -1050,1 +1070,4 @@// since we need the state tries of the current block and its parent in-memory// in order to process statediffs, we should avoid dereferencing roots until// its statediff and its child have been processed@elizabethengelman Stellarly so! 🌟
I would also greatly appreciate a walk through but I was so slow to get back to this that you may have already done one! I don't think this would be what is causing the problem in the service loop, but while looking I noticed the break here and I've got myself confused about named breaks again. I'm under the impression this breaks out of the
forloop entirely, not just the current select-case, so it not only skips the current block in the channel but stops listening altogether?Just realizing that I forgot to mention that named breaks during the brief walk through. My impression is the same as yours @i-norden, where it would break out of the for loop entirely. My thought for including it in the break that you mentioned was that if a block didn't have a parent, then something wrong, and we should quit the process.
That makes sense and I agree we should quit the process. This is really nitpicky but the error kind of makes it sound like we are only skipping that block but otherwise continuing.
That's a great point. I missed this before I merged it into the
statediffingbranch 🤦🏻♀️, but I'll make a note on that PR so that this comment is addressed before we open a PR upstream!