lighthouse/consensus/cached_tree_hash/src/cache_arena.rs
Paul Hauner fe52322088 Implement SSZ union type (#2579)
## Issue Addressed

NA

## Proposed Changes

Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so it may be derived for `enum`s:

https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union

The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.

### Crate Updates

This PR will (hopefully) cause CI to publish new versions for the following crates:

- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`

Since these crates depend on each other, I've had to add a workspace-level `[patch]` for them. A follow-up PR will need to remove this patch once the new versions are published.

### Union Behaviors

We already had SSZ `Encode` and `TreeHash` derives for enums, but they just did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec conflicts with the transparent method, I've required that every `enum` has exactly one of the following enum-level attributes:

#### SSZ

-  `#[ssz(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[ssz(enum_behaviour = "transparent")]`
    - maintains existing functionality
    - not supported for `Decode` (never was)
    
#### TreeHash

-  `#[tree_hash(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[tree_hash(enum_behaviour = "transparent")]`
    - maintains existing functionality

This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.
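The difference between the two behaviours can be sketched with the standard library alone. This is a minimal illustration, assuming (per the linked spec) that a union is serialized as a one-byte selector followed by the serialized value, and that "transparent" emits only the inner value. The `Example` enum and the `encode_*` functions are hypothetical names for this sketch; they are not the code generated by `ssz_derive`.

```rust
/// Hypothetical two-variant enum, used only for illustration.
enum Example {
    A(u8),
    B(u16),
}

/// Serialize the inner value; SSZ encodes fixed-size integers little-endian.
fn encode_inner(value: &Example) -> Vec<u8> {
    match value {
        Example::A(x) => x.to_le_bytes().to_vec(),
        Example::B(x) => x.to_le_bytes().to_vec(),
    }
}

/// "transparent": pass the inner value through with no prefix.
fn encode_transparent(value: &Example) -> Vec<u8> {
    encode_inner(value)
}

/// "union": prefix the inner value with a one-byte selector.
/// Assumption: the selector is the index of the variant.
fn encode_union(value: &Example) -> Vec<u8> {
    let selector: u8 = match value {
        Example::A(_) => 0,
        Example::B(_) => 1,
    };
    let mut bytes = vec![selector];
    bytes.extend(encode_inner(value));
    bytes
}
```

In this sketch the transparent output carries no selector, so a decoder cannot tell which variant produced the bytes. That is one way to see why the transparent behaviour is encode-only.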

### Legacy Option Encoding

Before this PR, we already had a union-esque encoding for `Option<T>`. However, this followed the *old* SSZ spec, where the union selector was 4 bytes. While the merge was being specified, the spec was changed to use 1 byte for the selector.

Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migration script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migration script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.

The `ssz::legacy::four_byte_impl!` macro provides a one-liner that defines the module required by `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.
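As a rough picture of the legacy byte layout, here is a std-only sketch for `Option<u64>`. It assumes the 4-byte selector is a little-endian integer with `0` for `None` and `1` for `Some`, followed by the little-endian payload. The function names are hypothetical and this is not the module produced by `four_byte_impl!`.

```rust
/// Encode `value` with the legacy layout assumed here: a 4-byte little-endian
/// selector (0 = `None`, 1 = `Some`) followed by the little-endian `u64` payload.
fn encode_option_four_byte(value: Option<u64>) -> Vec<u8> {
    match value {
        None => 0u32.to_le_bytes().to_vec(),
        Some(x) => {
            let mut bytes = 1u32.to_le_bytes().to_vec();
            bytes.extend_from_slice(&x.to_le_bytes());
            bytes
        }
    }
}

/// Decode bytes produced by `encode_option_four_byte`.
fn decode_option_four_byte(bytes: &[u8]) -> Result<Option<u64>, String> {
    if bytes.len() < 4 {
        return Err(format!("expected at least 4 selector bytes, got {}", bytes.len()));
    }
    let (selector, body) = bytes.split_at(4);
    match u32::from_le_bytes([selector[0], selector[1], selector[2], selector[3]]) {
        // `None` carries no payload.
        0 if body.is_empty() => Ok(None),
        // `Some` carries exactly one `u64`.
        1 if body.len() == 8 => {
            let mut payload = [0u8; 8];
            payload.copy_from_slice(body);
            Ok(Some(u64::from_le_bytes(payload)))
        }
        other => Err(format!("invalid selector {} or body length {}", other, body.len())),
    }
}
```

Under the 1-byte selector from the updated spec, the same values would be three bytes shorter, which is why the two encodings cannot be read interchangeably from the database.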

Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.

### Crate Publish Ordering

I've modified the order in which CI publishes crates so that a crate is never published before the crates it depends upon.
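The ordering constraint amounts to a topological sort of the dependency graph. The sketch below shows one simple way to compute such an order; it is an illustration only, not the CI script from this PR, and the crate names used in the usage note are placeholders rather than the real dependency edges.

```rust
use std::collections::BTreeMap;

/// Return an order in which every crate appears after all of its dependencies.
/// Returns `None` if there is a cycle or a dependency is missing from the map.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut order: Vec<&'a str> = Vec::new();
    while order.len() < deps.len() {
        // Pick the first unpublished crate whose dependencies are all published.
        let (name, _) = deps.iter().find(|(name, requires)| {
            !order.contains(*name) && requires.iter().all(|dep| order.contains(dep))
        })?;
        order.push(*name);
    }
    Some(order)
}
```

For a placeholder graph where `core` depends on `derive` and `types` depends on `core`, this yields `derive`, `core`, `types`. The repeated scan is quadratic, which is fine for a handful of crates.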

## TODO

- [ ] Queue a follow-up `[patch]`-removing PR.
2021-09-25 05:58:36 +00:00


use crate::SmallVec8;
use ssz::{Decode, Encode};
use ssz_derive::{Decode, Encode};
use std::cmp::Ordering;
use std::marker::PhantomData;
use std::ops::Range;
#[derive(Debug, PartialEq, Clone)]
pub enum Error {
    UnknownAllocId(usize),
    OffsetOverflow,
    OffsetUnderflow,
    RangeOverFlow,
}

/// Inspired by the `TypedArena` crate, the `CacheArena` provides a single contiguous memory
/// allocation from which smaller allocations can be produced. In effect this allows for having
/// many `Vec<T>`-like objects all stored contiguously on the heap with the aim of reducing memory
/// fragmentation.
///
/// Because all of the allocations are stored in one big `Vec`, resizing any of the allocations
/// will mean all items to the right of that allocation will be moved.
#[derive(Debug, PartialEq, Clone, Default, Encode, Decode)]
pub struct CacheArena<T: Encode + Decode> {
    /// The backing array, storing cached values.
    backing: Vec<T>,
    /// A list of offsets indicating the start of each allocation.
    offsets: Vec<usize>,
}

impl<T: Encode + Decode> CacheArena<T> {
    /// Instantiate self with a backing array of the given `capacity`.
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            backing: Vec::with_capacity(capacity),
            offsets: vec![],
        }
    }

    /// Produce an allocation of zero length at the end of the backing array.
    pub fn alloc(&mut self) -> CacheArenaAllocation<T> {
        let alloc_id = self.offsets.len();
        self.offsets.push(self.backing.len());
        CacheArenaAllocation {
            alloc_id,
            _phantom: PhantomData,
        }
    }

    /// Update `self.offsets` to reflect an allocation increasing in size.
    fn grow(&mut self, alloc_id: usize, grow_by: usize) -> Result<(), Error> {
        if alloc_id < self.offsets.len() {
            self.offsets
                .iter_mut()
                .skip(alloc_id + 1)
                .try_for_each(|offset| {
                    *offset = offset.checked_add(grow_by).ok_or(Error::OffsetOverflow)?;
                    Ok(())
                })
        } else {
            Err(Error::UnknownAllocId(alloc_id))
        }
    }

    /// Update `self.offsets` to reflect an allocation decreasing in size.
    fn shrink(&mut self, alloc_id: usize, shrink_by: usize) -> Result<(), Error> {
        if alloc_id < self.offsets.len() {
            self.offsets
                .iter_mut()
                .skip(alloc_id + 1)
                .try_for_each(|offset| {
                    *offset = offset
                        .checked_sub(shrink_by)
                        .ok_or(Error::OffsetUnderflow)?;
                    Ok(())
                })
        } else {
            Err(Error::UnknownAllocId(alloc_id))
        }
    }

    /// Similar to `Vec::splice`, however the range is relative to some allocation (`alloc_id`) and
    /// the replaced items are not returned (i.e., it is forgetful).
    ///
    /// To reiterate, the given `range` should be relative to the given `alloc_id`, not
    /// `self.backing`. E.g., if the allocation has an offset of `20` and the range is `0..1`, then
    /// the splice will translate to `self.backing[20..21]`.
    fn splice_forgetful<I: IntoIterator<Item = T>>(
        &mut self,
        alloc_id: usize,
        range: Range<usize>,
        replace_with: I,
    ) -> Result<(), Error> {
        let offset = *self
            .offsets
            .get(alloc_id)
            .ok_or(Error::UnknownAllocId(alloc_id))?;
        let start = range
            .start
            .checked_add(offset)
            .ok_or(Error::RangeOverFlow)?;
        let end = range.end.checked_add(offset).ok_or(Error::RangeOverFlow)?;
        let prev_len = self.backing.len();
        self.backing.splice(start..end, replace_with);
        match prev_len.cmp(&self.backing.len()) {
            Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
            Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
            Ordering::Equal => {}
        }
        Ok(())
    }

    /// Returns the length of the specified allocation.
    fn len(&self, alloc_id: usize) -> Result<usize, Error> {
        let start = self
            .offsets
            .get(alloc_id)
            .ok_or(Error::UnknownAllocId(alloc_id))?;
        let end = self
            .offsets
            .get(alloc_id + 1)
            .copied()
            .unwrap_or_else(|| self.backing.len());
        Ok(end - start)
    }

    /// Get the value at position `i`, relative to the offset at `alloc_id`.
    fn get(&self, alloc_id: usize, i: usize) -> Result<Option<&T>, Error> {
        if i < self.len(alloc_id)? {
            let offset = self
                .offsets
                .get(alloc_id)
                .ok_or(Error::UnknownAllocId(alloc_id))?;
            Ok(self.backing.get(i + offset))
        } else {
            Ok(None)
        }
    }

    /// Mutably get the value at position `i`, relative to the offset at `alloc_id`.
    fn get_mut(&mut self, alloc_id: usize, i: usize) -> Result<Option<&mut T>, Error> {
        if i < self.len(alloc_id)? {
            let offset = self
                .offsets
                .get(alloc_id)
                .ok_or(Error::UnknownAllocId(alloc_id))?;
            Ok(self.backing.get_mut(i + offset))
        } else {
            Ok(None)
        }
    }

    /// Returns the range in `self.backing` that is occupied by some allocation.
    fn range(&self, alloc_id: usize) -> Result<Range<usize>, Error> {
        let start = *self
            .offsets
            .get(alloc_id)
            .ok_or(Error::UnknownAllocId(alloc_id))?;
        let end = self
            .offsets
            .get(alloc_id + 1)
            .copied()
            .unwrap_or_else(|| self.backing.len());
        Ok(start..end)
    }

    /// Iterate through all values in some allocation.
    fn iter(&self, alloc_id: usize) -> Result<impl Iterator<Item = &T>, Error> {
        Ok(self.backing[self.range(alloc_id)?].iter())
    }

    /// Mutably iterate through all values in some allocation.
    fn iter_mut(&mut self, alloc_id: usize) -> Result<impl Iterator<Item = &mut T>, Error> {
        let range = self.range(alloc_id)?;
        Ok(self.backing[range].iter_mut())
    }

    /// Returns the total number of items stored in the arena, i.e. the sum of the lengths of all
    /// allocations.
    pub fn backing_len(&self) -> usize {
        self.backing.len()
    }
}
/// An allocation from a `CacheArena` that behaves like a `Vec<T>`.
///
/// All functions will modify the given `arena` instead of `self`. As such, it is safe to have
/// multiple instances of this allocation at once.
///
/// For all functions that accept a `CacheArena<T>` parameter, that arena should always be the one
/// that created `Self`. I.e., do not mix-and-match allocations and arenas unless you _really_ know
/// what you're doing (or want to have a bad time).
#[derive(Debug, PartialEq, Clone, Default, Encode, Decode)]
pub struct CacheArenaAllocation<T> {
    alloc_id: usize,
    #[ssz(skip_serializing, skip_deserializing)]
    _phantom: PhantomData<T>,
}
impl<T: Encode + Decode> CacheArenaAllocation<T> {
    /// Grow the allocation in `arena`, appending `vec` to the current values.
    pub fn extend_with_vec(
        &self,
        arena: &mut CacheArena<T>,
        vec: SmallVec8<T>,
    ) -> Result<(), Error> {
        let len = arena.len(self.alloc_id)?;
        arena.splice_forgetful(self.alloc_id, len..len, vec)?;
        Ok(())
    }

    /// Push `item` to the end of the current allocation in `arena`.
    ///
    /// An error is returned if this allocation is not known to the given `arena`.
    pub fn push(&self, arena: &mut CacheArena<T>, item: T) -> Result<(), Error> {
        let len = arena.len(self.alloc_id)?;
        arena.splice_forgetful(self.alloc_id, len..len, vec![item])?;
        Ok(())
    }

    /// Get the i'th item in the `arena` (relative to this allocation).
    ///
    /// An error is returned if this allocation is not known to the given `arena`.
    pub fn get<'a>(&self, arena: &'a CacheArena<T>, i: usize) -> Result<Option<&'a T>, Error> {
        arena.get(self.alloc_id, i)
    }

    /// Mutably get the i'th item in the `arena` (relative to this allocation).
    ///
    /// An error is returned if this allocation is not known to the given `arena`.
    pub fn get_mut<'a>(
        &self,
        arena: &'a mut CacheArena<T>,
        i: usize,
    ) -> Result<Option<&'a mut T>, Error> {
        arena.get_mut(self.alloc_id, i)
    }

    /// Iterate through all items in the `arena` (relative to this allocation).
    pub fn iter<'a>(&self, arena: &'a CacheArena<T>) -> Result<impl Iterator<Item = &'a T>, Error> {
        arena.iter(self.alloc_id)
    }

    /// Mutably iterate through all items in the `arena` (relative to this allocation).
    pub fn iter_mut<'a>(
        &self,
        arena: &'a mut CacheArena<T>,
    ) -> Result<impl Iterator<Item = &'a mut T>, Error> {
        arena.iter_mut(self.alloc_id)
    }

    /// Return the number of items stored in this allocation.
    pub fn len(&self, arena: &CacheArena<T>) -> Result<usize, Error> {
        arena.len(self.alloc_id)
    }

    /// Returns true if this allocation is empty.
    pub fn is_empty(&self, arena: &CacheArena<T>) -> Result<bool, Error> {
        self.len(arena).map(|len| len == 0)
    }
}
#[cfg(test)]
mod tests {
    use crate::Hash256;
    use smallvec::smallvec;

    type CacheArena = super::CacheArena<Hash256>;
    type CacheArenaAllocation = super::CacheArenaAllocation<Hash256>;

    fn hash(i: usize) -> Hash256 {
        Hash256::from_low_u64_be(i as u64)
    }

    fn test_routine(arena: &mut CacheArena, sub: &mut CacheArenaAllocation) {
        let mut len = sub.len(arena).expect("should exist");
        sub.push(arena, hash(len)).expect("should push");
        len += 1;
        assert_eq!(
            sub.len(arena).expect("should exist"),
            len,
            "after first push sub should have len {}",
            len
        );
        assert!(
            !sub.is_empty(arena).expect("should exist"),
            "new sub should not be empty"
        );
        sub.push(arena, hash(len)).expect("should push again");
        len += 1;
        assert_eq!(
            sub.len(arena).expect("should exist"),
            len,
            "after second push sub should have len {}",
            len
        );
        sub.extend_with_vec(arena, smallvec![hash(len), hash(len + 1)])
            .expect("should extend with vec");
        len += 2;
        assert_eq!(
            sub.len(arena).expect("should exist"),
            len,
            "after extend sub should have len {}",
            len
        );
        let collected = sub
            .iter(arena)
            .expect("should get iter")
            .cloned()
            .collect::<Vec<_>>();
        let collected_mut = sub
            .iter_mut(arena)
            .expect("should get mut iter")
            .map(|v| *v)
            .collect::<Vec<_>>();
        for i in 0..len {
            assert_eq!(
                *sub.get(arena, i)
                    .expect("should exist")
                    .expect("should get sub index"),
                hash(i),
                "get({}) should be hash({})",
                i,
                i
            );
            assert_eq!(
                collected[i],
                hash(i),
                "collected[{}] should be hash({})",
                i,
                i
            );
            assert_eq!(
                collected_mut[i],
                hash(i),
                "collected_mut[{}] should be hash({})",
                i,
                i
            );
        }
    }

    #[test]
    fn single() {
        let arena = &mut CacheArena::default();
        assert_eq!(arena.backing.len(), 0, "should start with an empty backing");
        assert_eq!(arena.offsets.len(), 0, "should start without any offsets");
        let mut sub = arena.alloc();
        assert_eq!(
            sub.len(arena).expect("should exist"),
            0,
            "new sub should have len 0"
        );
        assert!(
            sub.is_empty(arena).expect("should exist"),
            "new sub should be empty"
        );
        test_routine(arena, &mut sub);
    }

    #[test]
    fn double() {
        let arena = &mut CacheArena::default();
        assert_eq!(arena.backing.len(), 0, "should start with an empty backing");
        assert_eq!(arena.offsets.len(), 0, "should start without any offsets");
        let mut sub_01 = arena.alloc();
        assert_eq!(
            sub_01.len(arena).expect("should exist"),
            0,
            "new sub should have len 0"
        );
        assert!(
            sub_01.is_empty(arena).expect("should exist"),
            "new sub should be empty"
        );
        let mut sub_02 = arena.alloc();
        assert_eq!(
            sub_02.len(arena).expect("should exist"),
            0,
            "new sub should have len 0"
        );
        assert!(
            sub_02.is_empty(arena).expect("should exist"),
            "new sub should be empty"
        );
        test_routine(arena, &mut sub_01);
        test_routine(arena, &mut sub_02);
    }

    #[test]
    fn one_then_other() {
        let arena = &mut CacheArena::default();
        assert_eq!(arena.backing.len(), 0, "should start with an empty backing");
        assert_eq!(arena.offsets.len(), 0, "should start without any offsets");
        let mut sub_01 = arena.alloc();
        assert_eq!(
            sub_01.len(arena).expect("should exist"),
            0,
            "new sub should have len 0"
        );
        assert!(
            sub_01.is_empty(arena).expect("should exist"),
            "new sub should be empty"
        );
        test_routine(arena, &mut sub_01);
        let mut sub_02 = arena.alloc();
        assert_eq!(
            sub_02.len(arena).expect("should exist"),
            0,
            "new sub should have len 0"
        );
        assert!(
            sub_02.is_empty(arena).expect("should exist"),
            "new sub should be empty"
        );
        test_routine(arena, &mut sub_02);
        test_routine(arena, &mut sub_01);
        test_routine(arena, &mut sub_02);
    }

    #[test]
    fn many() {
        let arena = &mut CacheArena::default();
        assert_eq!(arena.backing.len(), 0, "should start with an empty backing");
        assert_eq!(arena.offsets.len(), 0, "should start without any offsets");
        let mut subs = vec![];
        for i in 0..50 {
            if i == 0 {
                let sub = arena.alloc();
                assert_eq!(
                    sub.len(arena).expect("should exist"),
                    0,
                    "new sub should have len 0"
                );
                assert!(
                    sub.is_empty(arena).expect("should exist"),
                    "new sub should be empty"
                );
                subs.push(sub);
                continue;
            } else if i % 2 == 0 {
                test_routine(arena, &mut subs[i - 1]);
            }
            let sub = arena.alloc();
            assert_eq!(
                sub.len(arena).expect("should exist"),
                0,
                "new sub should have len 0"
            );
            assert!(
                sub.is_empty(arena).expect("should exist"),
                "new sub should be empty"
            );
            subs.push(sub);
        }
        // `iter_mut` already yields `&mut CacheArenaAllocation`, so pass it straight through.
        for sub in subs.iter_mut() {
            test_routine(arena, sub);
        }
    }
}
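
The offset bookkeeping used by `CacheArena` can be seen in isolation with a much smaller sketch. `MiniArena` below is a hypothetical, std-only stand-in written for illustration: it stores `u64` values, identifies allocations by a bare `usize`, and panics on an unknown id instead of returning `Error::UnknownAllocId`. It is not part of this file.

```rust
use std::ops::Range;

/// Hypothetical simplified arena: one backing `Vec` plus the start offset of each allocation.
struct MiniArena {
    backing: Vec<u64>,
    offsets: Vec<usize>,
}

impl MiniArena {
    fn new() -> Self {
        Self { backing: vec![], offsets: vec![] }
    }

    /// Create an empty allocation at the end of the backing array and return its id.
    fn alloc(&mut self) -> usize {
        self.offsets.push(self.backing.len());
        self.offsets.len() - 1
    }

    /// The span of `backing` owned by allocation `id`: from its own offset up to the
    /// next allocation's offset, or to the end of `backing` for the last allocation.
    fn range(&self, id: usize) -> Range<usize> {
        let start = self.offsets[id];
        let end = self.offsets.get(id + 1).copied().unwrap_or(self.backing.len());
        start..end
    }

    /// Append `item` to allocation `id`, shifting every later allocation right by one.
    fn push(&mut self, id: usize, item: u64) {
        let end = self.range(id).end;
        self.backing.insert(end, item);
        for offset in self.offsets.iter_mut().skip(id + 1) {
            *offset += 1;
        }
    }

    /// Borrow the items of allocation `id`.
    fn items(&self, id: usize) -> &[u64] {
        &self.backing[self.range(id)]
    }
}
```

Pushing into an earlier allocation moves the items of every later allocation, which is the cost the doc comment on `CacheArena` describes in exchange for keeping everything in one contiguous `Vec`.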