This PR introduces as sharded mutex within the ChainIndex#GetTipsetByHeight.
It also replaces a go map with xsync.Map which doesn't require locking.
The lock is taken when it appears that ChainIndex filling work should be
started. After claiming the lock, the status of the cache is rechecked,
if the entry is still missing, the fillCache is started.
Thanks to @snissn and @arajasek for debugging and taking initial stabs at this.
Supersedes #10866 and 10885
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
fix: types: error out on decoding BlockMsg with extraneous data
Fixes OSS-fuzz issue 48208: lotus:fuzz_block_msg
Signed-off-by: Yolan Romailler <anomalroil@users.noreply.github.com>
* release the read lock earlier as it is not needed for chaincomputebasefee
* chain/messagepool/selection.go change to read lock in SelectMessages
* tighten up locks in chain/messagepool/repub.go and two questions on whether curTsLks are needed as comments
* include suggestion from @Jorropo to preallocate our msgs array so that we only need to make a single allocation
* mp.pending should not be accessed directly but through the getter
* from @arajasek: just check whether the sender is a robust address (anything except an ID address is robust) here, and return if so. That will:
be faster
reduce the size of this cache by half, because we can drop mp.keyCache.Add(ka, ka) on line 491.
* do not need curTslk and clean up code comments
This reverts commit 8b2208fd9a, reversing
changes made to 2db6b12b78.
Unfortunately, this is rather tricky code. We've found several issues so
far and, while we've fixed a few, there are outstanding issues that
would require complex fixes we don't have time to tackle right now.
Luckily, this code isn't actually needed by the main Filecoin chain
which relies on consensus fault reporting to handle equivocation. So we
can just try again later.
* fix: sync: fail sync instead of logging if we sync the wrong chain
* fix: sync: write headers in the correct order
Just in case. This shouldn't be necessary, but we might as well.
* fix: minus minus
* fix: do put the tipset
Put != Persist
And fix the message to account for the fact that we now reject _old_
blocks along with new ones.
We frequently receive "out of date" blocks in hello messages from
syncing and/or out of sync nodes. This isn't an error.
This will reject blocks in pubsub validation if they're either:
1. Too far into the future (5 blocks beyond the expected head).
2. Too far into the past (before finality with respect to our current
head).
Specifically:
1. We were previously rejecting future blocks in the sync logic, but not
in pubsub itself.
2. We never used to check if a block was too _old_.
Motivation: Blocks that are too new/too old can cause us to perform
quite a bit of unnecessary work.
We have to save raw blocks to the snapshot, but we should not be scanning them
for additional links as if they were CBOR blocks.
This cleans the logic a bit (we were checking that the parent was a CBOR block
before queueing up the children, but then scanning the children... it was weird).
Additionally, more verbose logging is added for the next time ScanForLinks
fails (currently very little info was given).
Our ScanForLinks callback should only enqueue CBOR for further processing.
We have observed that EthGetTransactionCount is one of the hotspots
on Glif production notes, and we are seeing regular 10-20 second
latencies when calling this rpc method.
I tracked the high latency spikes and they were correlated when
we were running ExecuteTipSet while following the chain.
To address this, we should not rely on tipset computation to get
nounce and instead look at the parent tipset and then count the
messages sent from the 'addr'.
The function/parameter were poorly named and should never have been
exposed. "GC" confidence should always be the same, this parameter
doesn't let us actually set the _confidence_, just the point before
which we no longer support reverts.
fixes#10706