* all: implement path-based state scheme
* all: edits from review
* core/rawdb, trie/triedb/pathdb: review changes
* core, light, trie, eth, tests: reimplement pbss history
* core, trie/triedb/pathdb: track block number in state history
* trie/triedb/pathdb: add history documentation
* core, trie/triedb/pathdb: address comments from Peter's review
Important changes to list:
- Cache trie nodes by path in clean cache
- Remove root->id mappings when history is truncated
* trie/triedb/pathdb: fallback to disk if unexpect node in clean cache
* core/rawdb: fix tests
* trie/triedb/pathdb: rename metrics, change clean cache key
* trie/triedb: manage the clean cache inside of disk layer
* trie/triedb/pathdb: move journal function
* trie/triedb/path: fix tests
* trie/triedb/pathdb: fix journal
* trie/triedb/pathdb: fix history
* trie/triedb/pathdb: try to fix tests on windows
* core, trie: address comments
* trie/triedb/pathdb: fix test issues
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
The EmptyRootHash and EmptyCodeHash are defined everywhere in the codebase, this PR replaces all of them with unified one defined in core/types package, and also defines constants for TxRoot, WithdrawalsRoot and UncleRoot
Logs stored on disk have minimal information. Contextual information such as block
number, index of log in block, index of transaction in block are filled in upon request.
We can fill in all these fields only having the block header and list of receipts.
But determining the transaction hash of a log requires the block body.
The goal of this PR is postponing this retrieval until we are sure we the transaction hash.
It happens often that the header bloom filter signals there might be matches in a block,
but after actually checking them reveals the logs do not match. We want to avoid fetching
the body in this case.
Note that this changes the semantics of Backend.GetLogs. Downstream callers of
GetLogs now assume log context fields have not been derived, and need to call
DeriveFields on the logs if necessary.
This commit replaces ioutil.TempDir with t.TempDir in tests. The
directory created by t.TempDir is automatically removed when the test
and all its subtests complete.
Prior to this commit, temporary directory created using ioutil.TempDir
had to be removed manually by calling os.RemoveAll, which is omitted in
some tests. The error handling boilerplate e.g.
defer func() {
if err := os.RemoveAll(dir); err != nil {
t.Fatal(err)
}
}
is also tedious, but t.TempDir handles this for us nicely.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
This PR reduces the amount of work we do when answering header queries, e.g. when a peer
is syncing from us.
For some items, e.g block bodies, when we read the rlp-data from database, we plug it
directly into the response package. We didn't do that for headers, but instead read
headers-rlp, decode to types.Header, and re-encode to rlp. This PR changes that to keep it
in RLP-form as much as possible. When a node is syncing from us, it typically requests 192
contiguous headers. On master it has the following effect:
- For headers not in ancient: 2 db lookups. One for translating hash->number (even though
the request is by number), and another for reading by hash (this latter one is sometimes
cached).
- For headers in ancient: 1 file lookup/syscall for translating hash->number (even though
the request is by number), and another for reading the header itself. After this, it
also performes a hashing of the header, to ensure that the hash is what it expected. In
this PR, I instead move the logic for "give me a sequence of blocks" into the lower
layers, where the database can determine how and what to read from leveldb and/or
ancients.
There are basically four types of requests; three of them are improved this way. The
fourth, by hash going backwards, is more tricky to optimize. However, since we know that
the gap is 0, we can look up by the parentHash, and stlil shave off all the number->hash
lookups.
The gapped collection can be optimized similarly, as a follow-up, at least in three out of
four cases.
Co-authored-by: Felix Lange <fjl@twurst.com>
* core/types: rm extranous check in test
* core/rawdb: add lightweight types for block logs
* core/rawdb,eth: use lightweight accessor for log filtering
* core/rawdb: add bench for decoding into rlpLogs
This change is a rewrite of the freezer code.
When writing ancient chain data to the freezer, the previous version first encoded each
individual item to a temporary buffer, then wrote the buffer. For small item sizes (for
example, in the block hash freezer table), this strategy causes a lot of system calls for
writing tiny chunks of data. It also allocated a lot of temporary []byte buffers.
In the new version, we instead encode multiple items into a re-useable batch buffer, which
is then written to the file all at once. This avoids performing a system call for every
inserted item.
To make the internal batching work, the ancient database API had to be changed. While
integrating this new API in BlockChain.InsertReceiptChain, additional optimizations were
also added there.
Co-authored-by: Felix Lange <fjl@twurst.com>
This change introduces garbage collection for the light client. Historical
chain data is deleted periodically. If you want to disable the GC, use
the --light.nopruning flag.