Avoid duplicate committee cache loads (#3574)
## Issue Addressed
NA
## Proposed Changes
I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from the database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.
This PR modifies the shuffling cache so that each entry can be either:
- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)
Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:
1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.
In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.
## Additional Info
NA
2022-09-16 08:54:03 +00:00
use crate::{metrics, BeaconChainError};
use crossbeam_channel::{bounded, Receiver, Sender, TryRecvError};
use lru::LruCache;
use std::sync::Arc;
use types::{beacon_state::CommitteeCache, AttestationShufflingId, Epoch, Hash256};

/// The size of the LRU cache that stores committee caches for quicker verification.
///
/// Each entry should be `8 + 800,000 = 800,008` bytes in size with 100k validators. (8-byte hash +
/// 100k indices). Therefore, this cache should be approx `16 * 800,008 = 12.8 MB`. (Note: this
/// ignores a few extra bytes in the caches that should be insignificant compared to the indices).
const CACHE_SIZE: usize = 16;
/// The maximum number of concurrent committee cache "promises" that can be issued. In effect, this
/// limits the number of concurrent states that can be loaded into memory for the committee cache.
/// This prevents excessive memory usage at the cost of rejecting some attestations.
///
/// We set this value to 2 since states can be quite large and have a significant impact on memory
/// usage. A healthy network cannot have more than a few committee caches and those caches should
/// always be inserted during block import. Unstable networks with a high degree of forking might
/// see some attestations dropped due to this concurrency limit, however I propose that this is
/// better than low-resource nodes going OOM.
const MAX_CONCURRENT_PROMISES: usize = 2;

#[derive(Clone)]
pub enum CacheItem {
    /// A committee.
    Committee(Arc<CommitteeCache>),
    /// A promise for a future committee.
    Promise(Receiver<Arc<CommitteeCache>>),
}

impl CacheItem {
    pub fn is_promise(&self) -> bool {
        matches!(self, CacheItem::Promise(_))
    }

    pub fn wait(self) -> Result<Arc<CommitteeCache>, BeaconChainError> {
        match self {
            CacheItem::Committee(cache) => Ok(cache),
            CacheItem::Promise(receiver) => receiver
                .recv()
                .map_err(BeaconChainError::CommitteeCacheWait),
        }
    }
}

/// Provides an LRU cache for `CommitteeCache`.
///
/// It has been named `ShufflingCache` because `CommitteeCacheCache` is a bit weird and looks like
/// a find/replace error.
pub struct ShufflingCache {
    cache: LruCache<AttestationShufflingId, CacheItem>,
}

impl ShufflingCache {
    pub fn new() -> Self {
        Self {
            cache: LruCache::new(CACHE_SIZE),
        }
    }

    pub fn get(&mut self, key: &AttestationShufflingId) -> Option<CacheItem> {
        match self.cache.get(key) {
            // The cache contained the committee cache, return it.
            item @ Some(CacheItem::Committee(_)) => {
                metrics::inc_counter(&metrics::SHUFFLING_CACHE_HITS);
                item.cloned()
            }
            // The cache contains a promise for the committee cache. Check to see if the promise
            // has already been resolved, without waiting for it.
            item @ Some(CacheItem::Promise(receiver)) => match receiver.try_recv() {
                // The promise has already been resolved. Replace the entry in the cache with a
                // `Committee` entry and then return the committee.
                Ok(committee) => {
                    metrics::inc_counter(&metrics::SHUFFLING_CACHE_PROMISE_HITS);
                    metrics::inc_counter(&metrics::SHUFFLING_CACHE_HITS);
                    let ready = CacheItem::Committee(committee);
                    self.cache.put(key.clone(), ready.clone());
                    Some(ready)
                }
                // The promise has not yet been resolved. Return the promise so the caller can
                // await it.
                Err(TryRecvError::Empty) => {
                    metrics::inc_counter(&metrics::SHUFFLING_CACHE_PROMISE_HITS);
                    metrics::inc_counter(&metrics::SHUFFLING_CACHE_HITS);
                    item.cloned()
                }
                // The sender has been dropped without sending a committee. There was most likely
                // an error computing the committee cache. Drop the key from the cache and return
                // `None` so the caller can recompute the committee.
                //
                // It's worth noting that this is the only place where we remove unresolved
                // promises from the cache. This means unresolved promises will only be removed if
                // we try to access them again. This is OK, since the promises don't consume much
                // memory and the nature of the LRU cache means that future, relevant entries will
                // still be added to the cache. We expect that *all* promises should be resolved,
                // unless there is a programming or database error.
                Err(TryRecvError::Disconnected) => {
                    metrics::inc_counter(&metrics::SHUFFLING_CACHE_PROMISE_FAILS);
                    metrics::inc_counter(&metrics::SHUFFLING_CACHE_MISSES);
                    self.cache.pop(key);
                    None
                }
            },
            // The cache does not have this committee and it's not already promised to be computed.
            None => {
                metrics::inc_counter(&metrics::SHUFFLING_CACHE_MISSES);
                None
            }
        }
    }

    pub fn contains(&self, key: &AttestationShufflingId) -> bool {
        self.cache.contains(key)
    }

    pub fn insert_committee_cache<T: ToArcCommitteeCache>(
        &mut self,
        key: AttestationShufflingId,
        committee_cache: &T,
    ) {
        if self
            .cache
            .get(&key)
            // Replace the committee if it's not present or if it's a promise. A bird in the hand
            // is worth two in the promise-bush!
            .map_or(true, CacheItem::is_promise)
        {
            self.cache.put(
                key,
                CacheItem::Committee(committee_cache.to_arc_committee_cache()),
            );
        }
    }

    pub fn create_promise(
        &mut self,
        key: AttestationShufflingId,
    ) -> Result<Sender<Arc<CommitteeCache>>, BeaconChainError> {
        let num_active_promises = self
            .cache
            .iter()
            .filter(|(_, item)| item.is_promise())
            .count();
        if num_active_promises >= MAX_CONCURRENT_PROMISES {
            return Err(BeaconChainError::MaxCommitteePromises(num_active_promises));
        }

        let (sender, receiver) = bounded(1);
        self.cache.put(key, CacheItem::Promise(receiver));
        Ok(sender)
    }
}

/// A helper trait to allow lazy-cloning of the committee cache when inserting into the cache.
pub trait ToArcCommitteeCache {
    fn to_arc_committee_cache(&self) -> Arc<CommitteeCache>;
}

impl ToArcCommitteeCache for CommitteeCache {
    fn to_arc_committee_cache(&self) -> Arc<CommitteeCache> {
        Arc::new(self.clone())
    }
}

impl ToArcCommitteeCache for Arc<CommitteeCache> {
    fn to_arc_committee_cache(&self) -> Arc<CommitteeCache> {
        self.clone()
    }
}
Use async code when interacting with EL (#3244)
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough :sweat_smile:)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might include in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
2. The state root at slot `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java#L171-L182) it uses [`getStateRootFromBlockRoot`](https://github.com/ConsenSys/teku/blob/de2b2801c89ef5abf983d6bf37867c37fc47121f/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java#L336-L341) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-03 05:36:50 +00:00

impl Default for ShufflingCache {
    fn default() -> Self {
        Self::new()
    }
}

/// Contains the shuffling IDs for a beacon block.
pub struct BlockShufflingIds {
    pub current: AttestationShufflingId,
    pub next: AttestationShufflingId,
    pub block_root: Hash256,
}

impl BlockShufflingIds {
    /// Returns the shuffling ID for the given epoch.
    ///
    /// Returns `None` if `epoch` is prior to `self.current.shuffling_epoch`.
    pub fn id_for_epoch(&self, epoch: Epoch) -> Option<AttestationShufflingId> {
        if epoch == self.current.shuffling_epoch {
            Some(self.current.clone())
        } else if epoch == self.next.shuffling_epoch {
            Some(self.next.clone())
        } else if epoch > self.next.shuffling_epoch {
            Some(AttestationShufflingId::from_components(
                epoch,
                self.block_root,
            ))
        } else {
            None
        }
    }
}
// Disable tests in debug since the beacon chain harness is slow unless in release.
#[cfg(not(debug_assertions))]
#[cfg(test)]
mod test {
    use super::*;
    use crate::test_utils::EphemeralHarnessType;
    use types::*;

    type BeaconChainHarness =
        crate::test_utils::BeaconChainHarness<EphemeralHarnessType<MinimalEthSpec>>;

    /// Returns two different committee caches for testing.
    fn committee_caches() -> (Arc<CommitteeCache>, Arc<CommitteeCache>) {
        let harness = BeaconChainHarness::builder(MinimalEthSpec)
            .default_spec()
            .deterministic_keypairs(8)
            .fresh_ephemeral_store()
            .build();
        let (mut state, _) = harness.get_current_state_and_root();
        state
            .build_committee_cache(RelativeEpoch::Current, &harness.chain.spec)
            .unwrap();
        state
            .build_committee_cache(RelativeEpoch::Next, &harness.chain.spec)
            .unwrap();
        let committee_a = state
            .committee_cache(RelativeEpoch::Current)
            .unwrap()
            .clone();
        let committee_b = state.committee_cache(RelativeEpoch::Next).unwrap().clone();
        assert!(committee_a != committee_b);
        (Arc::new(committee_a), Arc::new(committee_b))
    }

    /// Builds a deterministic but incoherent shuffling ID from a `u64`.
    fn shuffling_id(id: u64) -> AttestationShufflingId {
        AttestationShufflingId {
            shuffling_epoch: id.into(),
            shuffling_decision_block: Hash256::from_low_u64_be(id),
        }
    }

    #[test]
    fn resolved_promise() {
        let (committee_a, _) = committee_caches();
        let id_a = shuffling_id(1);
        let mut cache = ShufflingCache::new();

        // Create a promise.
        let sender = cache.create_promise(id_a.clone()).unwrap();

        // Retrieve the newly created promise.
        let item = cache.get(&id_a).unwrap();
        assert!(
            matches!(item, CacheItem::Promise(_)),
            "the item should be a promise"
        );

        // Resolve the promise.
        sender.send(committee_a.clone()).unwrap();

        // Ensure the promise has been resolved.
        let item = cache.get(&id_a).unwrap();
        assert!(
            matches!(item, CacheItem::Committee(committee) if committee == committee_a),
            "the promise should be resolved"
        );
        assert_eq!(cache.cache.len(), 1, "the cache should have one entry");
    }

    #[test]
    fn unresolved_promise() {
        let id_a = shuffling_id(1);
        let mut cache = ShufflingCache::new();

        // Create a promise.
        let sender = cache.create_promise(id_a.clone()).unwrap();

        // Retrieve the newly created promise.
        let item = cache.get(&id_a).unwrap();
        assert!(
            matches!(item, CacheItem::Promise(_)),
            "the item should be a promise"
        );

        // Drop the sender without resolving the promise, simulating an error computing the
        // committee.
        drop(sender);

        // Ensure the key now indicates an empty slot.
        assert!(cache.get(&id_a).is_none(), "the slot should be empty");
        assert!(cache.cache.is_empty(), "the cache should be empty");
    }

    #[test]
    fn two_promises() {
        let (committee_a, committee_b) = committee_caches();
        let (id_a, id_b) = (shuffling_id(1), shuffling_id(2));
        let mut cache = ShufflingCache::new();

        // Create promise A.
        let sender_a = cache.create_promise(id_a.clone()).unwrap();

        // Retrieve promise A.
        let item = cache.get(&id_a).unwrap();
        assert!(
            matches!(item, CacheItem::Promise(_)),
            "item a should be a promise"
        );

        // Create promise B.
        let sender_b = cache.create_promise(id_b.clone()).unwrap();

        // Retrieve promise B.
        let item = cache.get(&id_b).unwrap();
        assert!(
            matches!(item, CacheItem::Promise(_)),
            "item b should be a promise"
        );

        // Resolve promise A.
        sender_a.send(committee_a.clone()).unwrap();
        // Ensure promise A has been resolved.
        let item = cache.get(&id_a).unwrap();
        assert!(
            matches!(item, CacheItem::Committee(committee) if committee == committee_a),
            "promise A should be resolved"
        );

        // Resolve promise B.
        sender_b.send(committee_b.clone()).unwrap();
        // Ensure promise B has been resolved.
        let item = cache.get(&id_b).unwrap();
        assert!(
            matches!(item, CacheItem::Committee(committee) if committee == committee_b),
            "promise B should be resolved"
        );

        // Check both entries again.
        assert!(
            matches!(cache.get(&id_a).unwrap(), CacheItem::Committee(committee) if committee == committee_a),
            "promise A should remain resolved"
        );
        assert!(
            matches!(cache.get(&id_b).unwrap(), CacheItem::Committee(committee) if committee == committee_b),
            "promise B should remain resolved"
        );
        assert_eq!(cache.cache.len(), 2, "the cache should have two entries");
    }

    #[test]
    fn too_many_promises() {
        let mut cache = ShufflingCache::new();

        for i in 0..MAX_CONCURRENT_PROMISES {
            cache.create_promise(shuffling_id(i as u64)).unwrap();
        }

        // Ensure that the next promise returns an error. It is important for the application to
        // dump his ass when he can't keep his promises, you're a queen and you deserve better.
        assert!(matches!(
            cache.create_promise(shuffling_id(MAX_CONCURRENT_PROMISES as u64)),
            Err(BeaconChainError::MaxCommitteePromises(
                MAX_CONCURRENT_PROMISES
            ))
        ));
        assert_eq!(
            cache.cache.len(),
            MAX_CONCURRENT_PROMISES,
            "the cache should hold exactly `MAX_CONCURRENT_PROMISES` entries"
        );
    }
}