Write state diff to CSV #2

Merged
elizabethengelman merged 47 commits from ee-state-diff into statediff-for-archive-node 2019-01-28 21:31:02 +00:00
elizabethengelman commented 2018-12-13 17:04:42 +00:00 (Migrated from github.com)
  • convert tests to the golang test package (to be consistent with the rest of the geth codebase)
  • only persist the new diff values (rather than old and new)
  • stop the service gracefully when the geth command is exited
  • ensure that the chain events we're subscribing to are sufficient to get all of the blocks, and not just the newest ones
  • ensure that an archive node is not required for this process

For reference, some sample diff data is here: https://github.com/8thlight/maker-vulcanizedb/pull/133

- [x] convert tests to the golang test package (to be consistent with the rest of the geth codebase) - [x] only persist the new diff values (rather than old and new) - [ ] stop the service gracefully when the geth command is exited - [x] ensure that the chain events we're subscribing to are sufficient to get all of the blocks, and not just the newest ones - [ ] ensure that an archive node is not required for this process For reference, some sample diff data is here: https://github.com/8thlight/maker-vulcanizedb/pull/133
rmulhol (Migrated from github.com) reviewed 2019-01-03 23:07:52 +00:00
rmulhol (Migrated from github.com) left a comment

These changes look awesome. My main focus going forward would be making sure we have both keys and values for data we want to consume (e.g. addresses for accounts and paths for storage values) - and please forgive if I'm overlooking how that's being handled 🙏

These changes look awesome. My main focus going forward would be making sure we have both keys and values for data we want to consume (e.g. addresses for accounts and paths for storage values) - and please forgive if I'm overlooking how that's being handled 🙏
rmulhol (Migrated from github.com) commented 2019-01-03 21:21:29 +00:00

Is this used? Not understanding why it would be always set to false here

Is this used? Not understanding why it would be always set to false here
rmulhol (Migrated from github.com) commented 2019-01-03 22:02:54 +00:00

I'm wondering if there's a way to use variable naming to help reveal what's happening here. I might just be being dense but I'm having trouble understanding why a NodeIterator initialized with make([]byte, 0) enables us to track down deletions while initializing it with make([]byte{}) yields creations.

I'm wondering if there's a way to use variable naming to help reveal what's happening here. I might just be being dense but I'm having trouble understanding why a `NodeIterator` initialized with `make([]byte, 0)` enables us to track down deletions while initializing it with `make([]byte{})` yields creations.
rmulhol (Migrated from github.com) commented 2019-01-03 22:54:49 +00:00

I may be missing something but it looks like the creation arg is unused (and I think that's true all the way up to the calls of buildDiffEventual), so we may be able to remove it

I may be missing something but it looks like the `creation` arg is unused (and I think that's true all the way up to the calls of `buildDiffEventual`), so we may be able to remove it
rmulhol (Migrated from github.com) commented 2019-01-03 22:12:58 +00:00

Super minor but I think this should be hexToKeyBytes

Super minor but I think this should be `hexToKeyBytes`
rmulhol (Migrated from github.com) commented 2019-01-03 23:00:34 +00:00

And then I think the _ here could be captured to publish the address

And then I think the `_` here could be captured to publish the address
rmulhol (Migrated from github.com) commented 2019-01-03 23:05:31 +00:00

I'm a little confused about how this works - isn't accountDiff.Storage a map? Does k capture the key and value? Thinking we probably want both of those so that we can parse the new value and associate it with the corresponding key when consuming the data

I'm a little confused about how this works - isn't `accountDiff.Storage` a map? Does `k` capture the key and value? Thinking we probably want both of those so that we can parse the new value and associate it with the corresponding key when consuming the data
@ -0,0 +11,4 @@
)
var (
Headers = []string{
rmulhol (Migrated from github.com) commented 2019-01-03 22:59:58 +00:00

I think we probably want to include the address (and/or trie key) as a field here

I think we probably want to include the address (and/or trie key) as a field here
elizabethengelman (Migrated from github.com) reviewed 2019-01-09 15:04:19 +00:00
elizabethengelman (Migrated from github.com) commented 2019-01-09 15:04:19 +00:00

yep, nice call, i don't think that is being used anymore.

yep, nice call, i don't think that is being used anymore.
elizabethengelman (Migrated from github.com) reviewed 2019-01-09 15:16:51 +00:00
elizabethengelman (Migrated from github.com) commented 2019-01-09 15:16:50 +00:00

Yep, nice catch. Previously we were storing the old value and the new value of each diff, and whether the account was created/deleted affected which value was used as the old. But since removing the old values, we shouldn't need that bool anymore. 🎉

Yep, nice catch. Previously we were storing the old value and the new value of each diff, and whether the account was created/deleted affected which value was used as the old. But since removing the old values, we shouldn't need that bool anymore. 🎉
elizabethengelman (Migrated from github.com) reviewed 2019-01-09 15:55:05 +00:00
elizabethengelman (Migrated from github.com) commented 2019-01-09 15:55:05 +00:00

That's a great point, this is confusing. The gist of it is that we're creating iterators from the old and new tries, and when we're creating the iterators we're passing in a starting key, which I think can just be 0. The difference between make(byte[], 0) and []byte{} was inadvertent, so I'll switch them both to be the same for consistency.

The difference between whether we'e tracking down deletions vs creations is in the order that we're passing the iterators to collectDiffNodes - where we're either finding nodes in the old trie that aren't in the new trie (deletions) or nodes in the new trie that we're in the old (creations).

I feel like there's a way to make this a bit more readable, but it hasn't come to me. Do you have any ideas?

That's a great point, this is confusing. The gist of it is that we're creating iterators from the old and new tries, and when we're creating the iterators we're passing in a starting key, which I think can just be 0. The difference between `make(byte[], 0)` and `[]byte{}` was inadvertent, so I'll switch them both to be the same for consistency. The difference between whether we'e tracking down deletions vs creations is in the order that we're passing the iterators to `collectDiffNodes` - where we're either finding nodes in the old trie that aren't in the new trie (deletions) or nodes in the new trie that we're in the old (creations). I feel like there's a way to make this a bit more readable, but it hasn't come to me. Do you have any ideas?
elizabethengelman (Migrated from github.com) reviewed 2019-01-10 19:33:36 +00:00
elizabethengelman (Migrated from github.com) commented 2019-01-10 19:33:36 +00:00

Yep, good call - k is just capturing the path to the storage value. I think I'll want both the key and the value when I add the storage value to the csv 👌

Yep, good call - `k` is just capturing the path to the storage value. I think I'll want both the key and the value when I add the storage value to the csv 👌
i-norden reviewed 2019-01-11 16:36:17 +00:00
Member

Doesn't look like this needs to return an error at the moment.

Doesn't look like this needs to return an error at the moment.
i-norden reviewed 2019-01-11 18:25:52 +00:00
Member

This is so so minor I feel silly mentioning it but these extra spaces on line 177 and 1191 are artifacts left over from when I went about inserting statediffing directly into the blockchain.

This is so so minor I feel silly mentioning it but these extra spaces on line 177 and 1191 are artifacts left over from when I went about inserting statediffing directly into the blockchain.
elizabethengelman (Migrated from github.com) reviewed 2019-01-15 21:54:17 +00:00
elizabethengelman (Migrated from github.com) commented 2019-01-15 21:54:17 +00:00

👍 nice catch

👍 nice catch
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cerc-io/go-ethereum#2
No description provided.