Add WriteStateSnapshot #15

roysc · 2023-08-30T17:21:15Z

roysc commented

2023-08-30 17:21:15 +00:00

Adds a method to perform full-state snapshots by diffing against an empty state trie.
This replicates the functionality of ipld-eth-state-snapshot, so that code can use this as a library; see: cerc-io/ipld-eth-state-snapshot#1

Note that due to how incremental diffs are processed (updates are processed after the
trie has been traversed) the iterator state doesn't fully capture the progress of the diff, so it's not currently feasible to track them this way. Full snapshots don't have to worry about updated accounts, so we can support them.

Adds a method to perform full-state snapshots by diffing against an empty state trie. This replicates the functionality of `ipld-eth-state-snapshot`, so that code can use this as a library; see: https://git.vdb.to/cerc-io/ipld-eth-state-snapshot/pulls/1 Note that due to how incremental diffs are processed (updates are processed after the trie has been traversed) the iterator state doesn't fully capture the progress of the diff, so it's not currently feasible to track them this way. Full snapshots don't have to worry about updated accounts, so we can support them.

roysc added 33 commits 2023-08-30 17:21:17 +00:00

tests: update ipld-eth-db image

Test / Run unit tests (pull_request) Failing after 6s

Details

Test / Run integration tests (pull_request) Failing after 34m57s

Details

69fa11d6b1

Update API version c53f17515f

update deps 398ee00308

update plugeth 8b59545c59

Builder clean up dcfb9007bb

- Remove unused bits
- simplify and normalize test data
- Renames for clarity

Implement symmetric difference iterator fdcce5be35

symmetric diff iterator: detect common paths ae30af8544

use symmetric difference iterator bf5cb26c55

consolidate, simplify a6562c81c4

iterator fixes b29d60542c

refactor builder tests ed05fa2979

builder cleanup 9b50554d1b

refactor builder metrics

Test / Run unit tests (pull_request) Failing after 2m6s

Details

Test / Run integration tests (pull_request) Failing after 2m12s

Details

23204cb810

use concurrent iterators 097d636f52

sync indexer publishing 19c96215e0

go mod update & tidy c04cd3eac6

note geth bug 338acc60c9

unused 008b0c48b5

clean up 342b7066cc

rm redundant functions 0e01a67a33

rm redundant functions dffb403664

cleanup 34ea4c393d

unused a9ebc41b3d

uint worker count 95b6edd1a4

factor out PushHeader method 12272aa001

refactor batch 2f540104d1

expose symm diff iterator state in companion object 66f7637086

WriteStateDiffTracked 366d6d4b32

comments & formatting 09723ac40a

rm redundant node info f8d68bd1d8

tweaks c0c5d97c41

re-export indexer types for consuming libs 75a3bef39e

[wip] go mod udpate

Test / Run unit tests (pull_request) Failing after 2m3s

Details

Test / Run integration tests (pull_request) Failing after 2m11s

Details

c2160fac83

roysc force-pushed with-iterator-tracker from c2160fac83 to 5ed22c03d2

2023-09-01 07:09:53 +00:00

Compare

roysc changed title from ~~WIP: Add WriteStateDiffTracked to support tracking & recovery of a statediff~~ to Add WriteStateDiffTracked to support tracking & recovery of a statediff

2023-09-01 20:50:57 +00:00

roysc requested review from i-norden 2023-09-01 20:53:13 +00:00

roysc referenced this pull request from cerc-io/eth-statediff-service

2023-09-01 20:59:34 +00:00

Refactor to use plugeth-statediff #1

roysc referenced this pull request from cerc-io/ipld-eth-state-snapshot

2023-09-01 21:08:59 +00:00

Refactor to use statediff plugin #1

telackey requested review from telackey 2023-09-06 21:25:42 +00:00

roysc force-pushed with-iterator-tracker from 5ed22c03d2 to 77e00278e4

2023-09-25 19:14:12 +00:00

Compare

telackey requested changes 2023-09-26 01:49:53 +00:00

telackey left a comment

For such a massive PR, I really don't have a ton of comments or questions. I was befuddled on a couple of points though which I highlighted.

indexer/database/dump/indexer.go Outdated

						
				@ -93,4 +93,2 @@

					}

					t = time.Now()

					blockTx := &BatchTx{

telackey commented

2023-09-26 01:30:53 +00:00

Cleaning this part of the code up is a big win.

👍 1

roysc marked this conversation as resolved

indexer/test/test.go

						
				@ -48,7 +47,7 @@ func SetupTestData(t *testing.T, ind interfaces.StateDiffIndexer) {

						t.Fatal(err)

					}

					defer func() {

telackey commented

2023-09-26 01:38:00 +00:00

Do you recall why we passed err in to Submit() before? I have a vague memory buzzing in the back of my mind and I want to double check that there is no longer any reason.

roysc commented

2023-09-26 09:44:59 +00:00

Basically it was due to handling rollbacks and commit in the submit function, so it needed the error to check whether to rollback. By factoring the rollback out, we avoid that, but now the caller is responsible for it.

Which reminds me that I need to defer a rollback call from the scope of service.writeStateDiff - since the tx is returned from PushBlock.

Basically it was due to handling rollbacks and commit in the submit function, so it needed the error to check whether to rollback. By factoring the rollback out, we avoid that, but now the caller is responsible for it. Which reminds me that I need to defer a rollback call from the scope of `service.writeStateDiff` - since the tx is returned from `PushBlock`.

👍 1

telackey commented

2023-09-26 15:59:16 +00:00

If you've added the rollback, feel free to resolve this conversation.

roysc marked this conversation as resolved

utils/iterator.go Outdated

						
				@ -36,0 +21,4 @@

				// SymmDiffAux exposes state specific to symmetric difference iteration, which is not accessible

				// from the NodeIterator interface. This includes the number of nodes seen, whether the current key

				// is common to both A and B, and whether the current node is sourced from A or B.

				type SymmDiffAux struct {

telackey commented

2023-09-26 01:46:05 +00:00

What is the advantage/disadvantage of doing this by a sort of "sidecar" type vs extending the interface or type.

Eg, something more traditional like:

type SymDiffNodeIterator interface {
    NodeIterator
    NextComesFromA() bool 
    Position() int
    LastEqPathIndex() int
}

Just as a rule, having one type representing the internal state of another type, and passing them both around, strikes me as odd, but there may be a compelling reason to do it that way.

What is the advantage/disadvantage of doing this by a sort of "sidecar" type vs extending the interface or type. Eg, something more traditional like: ``` type SymDiffNodeIterator interface { NodeIterator NextComesFromA() bool Position() int LastEqPathIndex() int } ``` Just as a rule, having one type representing the internal state of another type, and passing them both around, strikes me as odd, but there may be a compelling reason to do it that way.

roysc commented

2023-09-26 08:16:22 +00:00

Definitely, it's ugly. I didn't want to do this, but the problem comes when we wrap the iterators for tracking. We need to use the wrapped iterator in order for tracking to work, but we need the state exposed by the underlying one. We could extract the base with casts, but that's not much better and leaks the abstraction.

Other workarounds would be to

return the base iterators from tracker.Restore along with the wrapped ones, or
create a new tracker.Iterator interface which exposes the base somehow.

2 means new types and extra state that's mostly not used. 1 is okay, though from this end the result looks about the same (need to pass an extra object).

Definitely, it's ugly. I didn't want to do this, but the problem comes when we wrap the iterators for tracking. We need to use the wrapped iterator in order for tracking to work, but we need the state exposed by the underlying one. We could extract the base with casts, but that's not much better and leaks the abstraction. Other workarounds would be to 1. return the base iterators from `tracker.Restore` along with the wrapped ones, or 2. create a new `tracker.Iterator` interface which exposes the base somehow. 2 means new types and extra state that's mostly not used. 1 is okay, though from this end the result looks about the same (need to pass an extra object).

roysc commented

2023-09-26 10:17:09 +00:00

I've done option 1, but kept the separate state type, so it's clear that only a subset of the symm-diff iterator state is actually used in that context - rather than just passing the same object as a different type.

telackey marked this conversation as resolved

roysc added 2 commits 2023-09-26 10:12:20 +00:00

add missing rollback, doc 5b51346a83

rework iterator state stuff

Test / Run unit tests (pull_request) Failing after 15m46s

Details

Test / Run integration tests (pull_request) Successful in 30m5s

Details

100c106ee5

roysc requested review from telackey 2023-09-26 10:17:22 +00:00

roysc referenced this pull request from cerc-io/eth-statediff-service

2023-09-26 10:21:03 +00:00

Refactor to use plugeth-statediff #1

roysc force-pushed with-iterator-tracker from 100c106ee5 to 367349b85a

2023-09-26 12:12:47 +00:00

Compare

roysc added 1 commit 2023-09-27 13:29:33 +00:00

refactor: only support tracked snapshots

Test / Run unit tests (pull_request) Successful in 16m19s

Details

Test / Run integration tests (pull_request) Successful in 32m16s

Details

c2ceca9230

tracking incremental state diffs is not feasible like this. because updates are processed after the
trie has been traversed, the iterators' state doesn't fully capture the progress of the diff. since
currently snapshots are the only use case for tracking, let's just support that.

roysc added 6 commits 2023-09-27 13:58:48 +00:00

Run compliance tests.

Test / Run integration tests (pull_request) Has been cancelled

Details

Test / Run unit tests (pull_request) Has been cancelled

Details

Test / Run compliance tests (pull_request) Failing after 14s

Details

c37aab8b69

Tweak path

Test / Run integration tests (pull_request) Has been cancelled

Details

Test / Run unit tests (pull_request) Has been cancelled

Details

Test / Run compliance tests (pull_request) Has been cancelled

Details

dfa0660317

jq

Test / Run unit tests (pull_request) Successful in 16m46s

Details

Test / Run compliance tests (pull_request) Successful in 4m16s

Details

Test / Run integration tests (pull_request) Successful in 30m51s

Details

c8e7acaacd

Rm branch spec

Test / Run integration tests (pull_request) Has been cancelled

Details

Test / Run unit tests (pull_request) Has been cancelled

Details

Test / Run compliance tests (pull_request) Failing after 42s

Details

101654f30e

Use v0.1.0

Test / Run unit tests (pull_request) Successful in 16m35s

Details

Test / Run integration tests (pull_request) Successful in 29m30s

Details

Test / Run compliance tests (pull_request) Successful in 4m7s

Details

935c2ce4b5

Merge pull request 'Run compliance tests.' (#16 ) from telackey/cict into with-iterator-tracker

Test / Run unit tests (pull_request) Successful in 17m50s

Details

Test / Run compliance tests (pull_request) Successful in 3m19s

Details

Test / Run integration tests (pull_request) Successful in 31m34s

Details

e668a73bbe

Reviewed-on: #16

roysc changed title from ~~Add WriteStateDiffTracked to support tracking & recovery of a statediff~~ to Add WriteStateSnapshot

2023-09-27 14:24:13 +00:00

telackey reviewed 2023-09-27 22:56:04 +00:00

telackey approved these changes 2023-09-27 23:08:52 +00:00

telackey left a comment

Looks great! So much cleaner.

builder.go

						
				@ -163,0 +174,4 @@

						return fmt.Errorf("error opening new state trie: %w", err)

					}

					subiters, _, err := tracker.Restore(tree.NodeIterator)

telackey commented

2023-09-27 23:00:26 +00:00

I didn't see a test case where the iterators are actually restored. Is there one? Perhaps it exists in ipld-eth-state-snapshot?

roysc commented

2023-09-28 00:45:15 +00:00

It happens in the TestBuilderSnapshot tests, via test_helpers.RunStateSnapshot. Although there's no explicit check for the file in these tests, I do see them getting restored.

It happens in the `TestBuilderSnapshot` tests, via `test_helpers.RunStateSnapshot`. Although there's no explicit check for the file in these tests, I do see them getting restored.

roysc commented

2023-09-28 00:59:34 +00:00

Added a check for the file

telackey commented

2023-09-28 01:00:01 +00:00

Great! I see it now. I read that test, but I didn't put 2 and 2 together regarding the interrupt.

telackey commented

2023-09-28 01:01:14 +00:00

Yeah, a check for the file is even better, but you are right, practically speaking even the previous version of the test couldn't have succeeded if the file were not created and used.

telackey marked this conversation as resolved

indexer/database/sql/indexer.go

						
				@ -236,0 +169,4 @@

				func (sdi *StateDiffIndexer) PushHeader(batch interfaces.Batch, header *types.Header, reward, td *big.Int) (string, error) {

					tx, ok := batch.(*BatchTx)

					if !ok {

						return "", fmt.Errorf("sql: batch is expected to be of type %T, got %T", &BatchTx{}, batch)

telackey commented

2023-09-27 23:06:50 +00:00

Whoa, does this happen? Why?

roysc commented

2023-09-28 00:51:56 +00:00

Haha, it shouldn't and hasn't, but it's technically possible for someone to create an indexer of a different type and pass the wrong batch in. That would just panic otherwise, so the error is more to document the fact that this batch type is covariant with the indexer.

telackey commented

2023-09-28 00:53:58 +00:00

Oh good. Absolutely best always to check, I was just shocked for a moment to think it might be something that really happens.

telackey marked this conversation as resolved

roysc added 1 commit 2023-09-28 01:01:07 +00:00

check for recovery file

Test / Run compliance tests (pull_request) Successful in 3m47s

Details

Test / Run unit tests (pull_request) Successful in 15m27s

Details

Test / Run integration tests (pull_request) Successful in 28m35s

Details