The improvements in the range-export code lead to avoid reading most blocks
twice, as well as to allowing some blocks to be written to disk multiple times.
The cache hit-rate went down from being close to 50% to a maximum of 12% at
the very end of the export. The reason is that most CIDs are never read twice
since they are correctly tracked in the CID set.
These numbers do not support the maintenance of the CachingBlockstore
code. Additional testing shows that removing it has similar memory-usage
behaviour and about 5 minute-faster execution (around 10%).
Less code to maintain and less options to mess up with.
This commit moderately refactors the ranged export code. It addresses several
problems:
* Code does not finish cleanly and things hang on ctrl-c
* Same block is read multiple times in a row (artificially increasing cached
blockstore metrics to 50%)
* It is unclear whether there are additional races (a single worker quits
when reaching height 0)
* CARs produced have duplicated blocks (~400k for an 80M-blocks CAR or
so). Some blocks appear up to 5 times.
* Using pointers for tasks where it is not necessary.
The changes:
* Use a FIFO instead of stack: simpler implementation as its own type. This
has not proven to be much more memory-friendly, but it has not made things
worse either.
* We avoid a probably not small amount of allocations by not using
unnecessary pointers.
* Fix duplicated blocks by atomically checking+adding to CID set.
* Context-termination now works correctly. Worker lifetime is correctly tracked and all channels
are closed, avoiding any memory leaks and deadlocks.
* We ensure all work is finished before finishing, something that might have
been broken in some edge cases previously. In practice, we would not have
seen this except perhaps in very early snapshots close to genesis.
Initial testing shows the code is currently about 5% faster. Resulting
snapshots do not have duplicates so they are a bit smaller. We have manually
verified that no CID is lost versus previous results, with both old and recent
snapshots.
This first commit contains the first and second implementation stabs (after
primary review by @hsanjuan), using a stack for task buffering.
Known issues: ctrl-c (context cancellation) results in the export code getting
deadlocked. Duplicate blocks in exports. Duplicate block reads from store.
Original commit messages:
works
works against mainnet and calibnet
feat: add internal export api method
- will hopfully make things faster by not streaming the export over the json rpc api
polish: better file nameing
fix: potential race in marking cids as seen
chore: improve logging
feat: front export with cache
fix: give hector a good channel buffer on this shit
docsgen
Otherwise we may, e.g., try to estimate gas on a message to an f4
address before the nv18 migration.
I'm _not_ checking the "prior messages" here as this is just a sanity
check.
1. We do allow deploying with empty initcode.
2. Make sure that the encoded "code" is non-empty, if specified.
Basically, this makes everything consistent (and it's how I specified it
in the FIP).
* fix: stmgr: make the tipset and height agree when estimating gas
Specifically re-execute all messages in the current tipset, tacking the new
message onto the end. That way, the epoch is the epoch of the current tipset.
We could try to "make" a fake block and use that, but that's unlikely to
work well.
* fix: stmgr: only apply tipset messages for CallWithGas
* fix: itest: window post dispute
We now enforce the following rules:
1. No duplicate topics or data.
2. Topics must have 32 byte keys.
3. Topics may not be skipped. (e.g., no t1 & t3 without a t2).
4. Raw codecs.
We _don't_ require that topics/data be emitted in any specific order.
We _skip_ events with unknown keys. We _drop_ events that violate the
above rules.
This:
1. Updates the builtin actors bundle (for actors v10).
2. Updates the event entry type to include the codec.
3. Removes the cbor encoding and zero trimming from event data.
I've chose to:
1. _Not_ add codec handling to the event filtering system for now.
2. _Skip_ events with unexpected codecs.
We don't actually _allow_ these events in the FVM right now, and it
simplifies the implementation.
However, I _am_ recording the codecs in the database so we don't have to
migrate it later.
- Event keys are now t1, t2, t3, t4 for topics; and d for data.
- ref-fvm no longer stores events in the blockstore for us. It just
returns events to the client, who is now responsible for handling
them as it wishes / according to its configuration.
- Add a flag to VMOpts to have the events AMT be written in the blockstore.
- Add a flag to the ChainStore to advertise to the rest of the system
if the ChainStore is storing events.
- Enable that flag if the EthRPC is enabled (can also add an explicit
configuration flag if wanted).
* Refactor: Unify EthTx to FilecoinMessage methods
* Filecoin messages can again be converted to Eth Txs
* All BLS messages should calculated tx hash with unsigned message
* Refactor newEthTxReceipt
* fill in from and to for non-eth transactions
* Hoist nil check out of newEthTxFromMessageLookup
---------
Co-authored-by: Aayush <arajasek94@gmail.com>
Co-authored-by: Raúl Kripalani <raul@protocol.ai>
- Add builtin.EthereumAddressManagerActorAddr to builtin.go.template and make jen
- Rename to EthereumAddressManagerActorAddr to match pattern of other actors (CronActorAddr/etc)
We were putting the unsigned/VM message through the pipeline.
The events index was storing the _unsigned_ message CID.
However, the Eth tx hash index maps signed Delegated-signature message
CIDs to transaction hashes, i.e. it uses the _signed_ message CID.
As a result, eth_getLogs and other log-related methods were
unable to resolve the transaction hash from the index properly, and
would end up returning 0x00..00 in the transactionHash field.