Trie tracer is an auxiliary tool to capture all deleted nodes
which can't be captured by trie.Committer. The deleted nodes
can be removed from the disk later.
* trie: fix memory leak in trie iterator
In the trie iterator, live nodes are tracked in a stack while iterating.
Popped node states should be explictly set to nil in order to get
garbage-collected.
* trie: fix empty trie iterator
This PR adds an addtional API called `NewBatchWithSize` for db
batcher. It turns out that leveldb batch memory allocation is
super inefficient. The main reason is the allocation step of
leveldb Batch is too small when the batch size is large. It can
take a few second to build a leveldb batch with 100MB size.
Luckily, leveldb also offers another API called MakeBatch which can
pre-allocate the memory area. So if the approximate size of batch is
known in advance, this API can be used in this case.
It's needed in new state scheme PR which needs to commit a batch of
trie nodes in a single batch. Implement the feature in a seperate PR.
This functionality is needed in new path-based storage scheme, but
can be implemented in a seperate PR though.
When an account is deleted, then all the storage slots should be
nuked out from the disk as well. In hash-based storage scheme they
are still left in the disk but in new scheme, they will be iterated
and marked as deleted.
But why the NodeBlob API is needed in this scenario? Because when
the node is marked deleted, the previous value is also required to
be recorded to construct the reverse diff.
This change improves the efficiency of the nodeIterator seek
operation. Previously, seek essentially ran the iterator forward
until it found the matching node. With this change, it skips
over fullnode children and avoids resolving them from the database.
This change introduces garbage collection for the light client. Historical
chain data is deleted periodically. If you want to disable the GC, use
the --light.nopruning flag.
* ethdb: remove Set
Set deadlocks immediately and isn't part of the Database interface.
* trie: add Err to Iterator
This is useful for testing because the underlying NodeIterator doesn't
need to be kept in a separate variable just to get the error.
* trie: add LeafKey to iterator, panic when not at leaf
LeafKey is useful for callers that can't interpret Path.
* trie: retry failed seek/peek in iterator Next
Instead of failing iteration irrecoverably, make it so Next retries the
pending seek or peek every time.
Smaller changes in this commit make this easier to test:
* The iterator previously returned from Next on encountering a hash
node. This caused it to visit the same path twice.
* Path returned nibbles with terminator symbol for valueNode attached
to fullNode, but removed it for valueNode attached to shortNode. Now
the terminator is always present. This makes Path unique to each node
and simplifies Leaf.
* trie: add Path to MissingNodeError
The light client trie iterator needs to know the path of the node that's
missing so it can retrieve a proof for it. NodeIterator.Path is not
sufficient because it is updated when the node is resolved and actually
visited by the iterator.
Also remove unused fields. They were added a long time ago before we
knew which fields would be needed for the light client.
The 'step' method is split into two parts, 'peek' and 'push'. peek
returns the next state but doesn't make it current.
The end of iteration was previously tracked by setting 'trie' to nil.
End of iteration is now tracked using the 'iteratorEnd' error, which is
slightly cleaner and requires less code.
Make it so each iterator has exactly one public constructor:
- NodeIterators can be created through a method.
- Iterators can be created through NewIterator on any NodeIterator.
This PR implements a differenceIterator, which allows iterating over trie nodes
that exist in one trie but not in another. This is a prerequisite for most GC
strategies, in order to find obsolete nodes.