This PR introduces as sharded mutex within the ChainIndex#GetTipsetByHeight.
It also replaces a go map with xsync.Map which doesn't require locking.
The lock is taken when it appears that ChainIndex filling work should be
started. After claiming the lock, the status of the cache is rechecked,
if the entry is still missing, the fillCache is started.
Thanks to @snissn and @arajasek for debugging and taking initial stabs at this.
Supersedes #10866 and 10885
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
fix: types: error out on decoding BlockMsg with extraneous data
Fixes OSS-fuzz issue 48208: lotus:fuzz_block_msg
Signed-off-by: Yolan Romailler <anomalroil@users.noreply.github.com>
* release the read lock earlier as it is not needed for chaincomputebasefee
* chain/messagepool/selection.go change to read lock in SelectMessages
* tighten up locks in chain/messagepool/repub.go and two questions on whether curTsLks are needed as comments
* include suggestion from @Jorropo to preallocate our msgs array so that we only need to make a single allocation
* mp.pending should not be accessed directly but through the getter
* from @arajasek: just check whether the sender is a robust address (anything except an ID address is robust) here, and return if so. That will:
be faster
reduce the size of this cache by half, because we can drop mp.keyCache.Add(ka, ka) on line 491.
* do not need curTslk and clean up code comments
This reverts commit 8b2208fd9a, reversing
changes made to 2db6b12b78.
Unfortunately, this is rather tricky code. We've found several issues so
far and, while we've fixed a few, there are outstanding issues that
would require complex fixes we don't have time to tackle right now.
Luckily, this code isn't actually needed by the main Filecoin chain
which relies on consensus fault reporting to handle equivocation. So we
can just try again later.
* fix: sync: fail sync instead of logging if we sync the wrong chain
* fix: sync: write headers in the correct order
Just in case. This shouldn't be necessary, but we might as well.
* fix: minus minus
* fix: do put the tipset
Put != Persist
And fix the message to account for the fact that we now reject _old_
blocks along with new ones.
We frequently receive "out of date" blocks in hello messages from
syncing and/or out of sync nodes. This isn't an error.
This will reject blocks in pubsub validation if they're either:
1. Too far into the future (5 blocks beyond the expected head).
2. Too far into the past (before finality with respect to our current
head).
Specifically:
1. We were previously rejecting future blocks in the sync logic, but not
in pubsub itself.
2. We never used to check if a block was too _old_.
Motivation: Blocks that are too new/too old can cause us to perform
quite a bit of unnecessary work.
We have to save raw blocks to the snapshot, but we should not be scanning them
for additional links as if they were CBOR blocks.
This cleans the logic a bit (we were checking that the parent was a CBOR block
before queueing up the children, but then scanning the children... it was weird).
Additionally, more verbose logging is added for the next time ScanForLinks
fails (currently very little info was given).
Our ScanForLinks callback should only enqueue CBOR for further processing.
We have observed that EthGetTransactionCount is one of the hotspots
on Glif production notes, and we are seeing regular 10-20 second
latencies when calling this rpc method.
I tracked the high latency spikes and they were correlated when
we were running ExecuteTipSet while following the chain.
To address this, we should not rely on tipset computation to get
nounce and instead look at the parent tipset and then count the
messages sent from the 'addr'.
The function/parameter were poorly named and should never have been
exposed. "GC" confidence should always be the same, this parameter
doesn't let us actually set the _confidence_, just the point before
which we no longer support reverts.
fixes#10706
Technically, the block validator caught this panic. But it's pointless
because we have a _real_ mechanism to return the validation reason,
which we should have been using.
In general, panicing like this is a very bad idea because it's
non-obvious and, in this case, completely undocumented.
* have gas estimate call callInternal with applyTsMessages = false and other calls with applyTsMessages=true for gas caclulation optimization
* set applyTsMessages = true in CallWithGas call in shed
* update test with new callwithgas api optimization for eth call
* Update chain/stmgr/call.go
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
* env flag LOTUS_SKIP_APPLY_TS_MESSAGE_CALL_WITH_GAS must be 1 in order to have applyTsMessages change
* env flag LOTUS_SKIP_APPLY_TS_MESSAGE_CALL_WITH_GAS must be 1 in order to have applyTsMessages change
* make sure that even if we arent apply ts messages we grab ts messages from the particular user who is requesting gas estimation
---------
Co-authored-by: Jiaying Wang <42981373+jennijuju@users.noreply.github.com>
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-0-4-29.us-east-2.compute.internal>
We want to make the execution trace cache size configurable as SPs
may want to disable it while exchanges may want to crank it up.
We were also are going with intuition for this value, so having
ability to change it without a new build would help.
Fixes: https://github.com/filecoin-project/lotus/issues/10584
This PR adds a small cache to calls to ExecutionTrace which helps
improve performance for node operators like exchanges and block
explorers.
If items is in cache calls to this function will be 2-3x faster.
Fixes: https://github.com/filecoin-project/lotus/issues/10504
* have gas estimate call callInternal with applyTsMessages = false and other calls with applyTsMessages=true for gas caclulation optimization
* set applyTsMessages = true in CallWithGas call in shed
* update test with new callwithgas api optimization for eth call
* Update chain/stmgr/call.go
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
* env flag LOTUS_SKIP_APPLY_TS_MESSAGE_CALL_WITH_GAS must be 1 in order to have applyTsMessages change
* env flag LOTUS_SKIP_APPLY_TS_MESSAGE_CALL_WITH_GAS must be 1 in order to have applyTsMessages change
* make sure that even if we arent apply ts messages we grab ts messages from the particular user who is requesting gas estimation
---------
Co-authored-by: Jiaying Wang <42981373+jennijuju@users.noreply.github.com>
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-0-4-29.us-east-2.compute.internal>
- drop the execution index; we don't need it
- it is inclusion tipset
- use MessagesForTipset
- hoist db sql stuffs on top for clarity
- add index for tipset on messages
We'll never get an actor/account deployed to one of these
addresses (although we might get a placeholder). However, converting
such an address to an f4 address is definitely wrong.
However, we're leaving the default at 1.25x for backwards compatibility, for now.
Also:
1. Actually use the configured replace fee ratio.
2. Store said ratios as percentages instead of floats. 1.25, or 1+1/(2^2),
can be represented as a float. 1.1, or 1 + 1/(2 * 5), cannot.
fixes#10415
This is now "FVM" native. Changes include:
1. Don't treat "trace" messages like off-chain messages. E.g., don't
include CIDs, versions, etc.
2. Include IPLD codecs where applicable.
3. Remove fields that aren't filled by the FVM (timing, some errors,
code locations, etc.).
This will make `lotus send` mostly just "do what the user wants" in this
case:
1. The user may not explicitly specify a method number.
2. Parameters are automatically cbor-encoded where applicable.
3. The method number is automatically selected based on the
recipient (CreateExternal if sent to the EAM, InvokeEVM otherwise).
The improvements in the range-export code lead to avoid reading most blocks
twice, as well as to allowing some blocks to be written to disk multiple times.
The cache hit-rate went down from being close to 50% to a maximum of 12% at
the very end of the export. The reason is that most CIDs are never read twice
since they are correctly tracked in the CID set.
These numbers do not support the maintenance of the CachingBlockstore
code. Additional testing shows that removing it has similar memory-usage
behaviour and about 5 minute-faster execution (around 10%).
Less code to maintain and less options to mess up with.
This commit moderately refactors the ranged export code. It addresses several
problems:
* Code does not finish cleanly and things hang on ctrl-c
* Same block is read multiple times in a row (artificially increasing cached
blockstore metrics to 50%)
* It is unclear whether there are additional races (a single worker quits
when reaching height 0)
* CARs produced have duplicated blocks (~400k for an 80M-blocks CAR or
so). Some blocks appear up to 5 times.
* Using pointers for tasks where it is not necessary.
The changes:
* Use a FIFO instead of stack: simpler implementation as its own type. This
has not proven to be much more memory-friendly, but it has not made things
worse either.
* We avoid a probably not small amount of allocations by not using
unnecessary pointers.
* Fix duplicated blocks by atomically checking+adding to CID set.
* Context-termination now works correctly. Worker lifetime is correctly tracked and all channels
are closed, avoiding any memory leaks and deadlocks.
* We ensure all work is finished before finishing, something that might have
been broken in some edge cases previously. In practice, we would not have
seen this except perhaps in very early snapshots close to genesis.
Initial testing shows the code is currently about 5% faster. Resulting
snapshots do not have duplicates so they are a bit smaller. We have manually
verified that no CID is lost versus previous results, with both old and recent
snapshots.