lighthouse/consensus/ssz/src/encode.rs
Paul Hauner fe52322088 Implement SSZ union type (#2579)
## Issue Addressed

NA

## Proposed Changes

Implements the "union" type from the SSZ spec for `ssz`, `ssz_derive`, `tree_hash` and `tree_hash_derive` so that it may be derived for `enum`s:

https://github.com/ethereum/consensus-specs/blob/v1.1.0-beta.3/ssz/simple-serialize.md#union

The union type is required for the merge, since the `Transaction` type is defined as a single-variant union `Union[OpaqueTransaction]`.

### Crate Updates

This PR will (hopefully) cause CI to publish new versions for the following crates:

- `eth2_ssz_derive`: `0.2.1` -> `0.3.0`
- `eth2_ssz`: `0.3.0` -> `0.4.0`
- `eth2_ssz_types`: `0.2.0` -> `0.2.1`
- `tree_hash`: `0.3.0` -> `0.4.0`
- `tree_hash_derive`: `0.3.0` -> `0.4.0`

Since these crates depend on each other, I've had to add a workspace-level `[patch]` for them. A follow-up PR will need to remove this patch once the new versions are published.

### Union Behaviors

We already had SSZ `Encode` and `TreeHash` derive for enums; however, it only did a "transparent" pass-through of the inner value. Since the "union" decoding from the spec conflicts with the transparent method, I've required that every `enum` has exactly one of the following enum-level attributes:

#### SSZ

-  `#[ssz(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[ssz(enum_behaviour = "transparent")]`
    - maintains existing functionality
    - not supported for `Decode` (never was)
    
#### TreeHash

-  `#[tree_hash(enum_behaviour = "union")]`
    - matches the spec used for the merge
-  `#[tree_hash(enum_behaviour = "transparent")]`
    - maintains existing functionality

This means that we can maintain the existing transparent behaviour, but all existing users will get a compile-time error until they explicitly opt-in to being transparent.
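
To make the difference between the two behaviours concrete, here is a minimal hand-written sketch that does not use `ssz_derive`. It assumes, per the linked spec, that a union serializes as a one-byte selector (the zero-based variant index) followed by the serialized value, whereas the transparent behaviour emits only the inner value. The `Shape` enum and both helper functions are hypothetical and exist only for illustration; they are not what the derive macros generate.

```rust
/// Hypothetical two-variant enum used only for this illustration.
enum Shape {
    Side(u16),
    Radius(u64),
}

/// "transparent": emit only the inner value's little-endian bytes.
fn encode_transparent(shape: &Shape) -> Vec<u8> {
    match shape {
        Shape::Side(v) => v.to_le_bytes().to_vec(),
        Shape::Radius(v) => v.to_le_bytes().to_vec(),
    }
}

/// "union": one selector byte (the variant index), then the inner value.
/// The one-byte selector is an assumption taken from the linked spec.
fn encode_union(shape: &Shape) -> Vec<u8> {
    let (selector, value) = match shape {
        Shape::Side(v) => (0u8, v.to_le_bytes().to_vec()),
        Shape::Radius(v) => (1u8, v.to_le_bytes().to_vec()),
    };
    let mut bytes = vec![selector];
    bytes.extend_from_slice(&value);
    bytes
}
```

Under these assumptions, `encode_transparent(&Shape::Side(7))` yields `[7, 0]` while `encode_union(&Shape::Side(7))` yields `[0, 7, 0]`. The transparent output carries no variant tag, which is consistent with transparent enums not being supported for `Decode`.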

### Legacy Option Encoding

Before this PR, we already had a union-esque encoding for `Option<T>`. However, this was with the *old* SSZ spec where the union selector was 4 bytes. During merge specification, the spec was changed to use 1 byte for the selector.
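
The selector-width change can be illustrated with a small standalone sketch. It assumes the legacy layout was a four-byte little-endian selector (`0` for `None`, `1` for `Some`) followed by the value, and that the current spec's `Union[None, T]` uses the same selector values in a single byte. Both function names are hypothetical and neither is part of the crate's API.

```rust
/// Legacy-style encoding (assumed layout): 4-byte little-endian selector, then the value.
fn encode_option_four_byte(value: Option<u64>) -> Vec<u8> {
    let mut bytes = Vec::new();
    match value {
        None => bytes.extend_from_slice(&0u32.to_le_bytes()),
        Some(v) => {
            bytes.extend_from_slice(&1u32.to_le_bytes());
            bytes.extend_from_slice(&v.to_le_bytes());
        }
    }
    bytes
}

/// Current-spec-style encoding (assumed layout): 1-byte selector, then the value.
fn encode_option_one_byte(value: Option<u64>) -> Vec<u8> {
    let mut bytes = Vec::new();
    match value {
        None => bytes.push(0),
        Some(v) => {
            bytes.push(1);
            bytes.extend_from_slice(&v.to_le_bytes());
        }
    }
    bytes
}
```

With these layouts, `Some(5u64)` occupies 12 bytes in the legacy form and 9 bytes in the one-byte form, so bytes written with one selector width cannot be read back with the other.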

Whilst the 4-byte `Option` encoding was never used in the spec, we used it in our database. Writing a migration script for all occurrences of `Option` in the database would be painful, especially since it's used in the `CommitteeCache`. To avoid the migration script, I added a serde-esque `#[ssz(with = "module")]` field-level attribute to `ssz_derive` so that we can opt into the 4-byte encoding on a field-by-field basis.

The `ssz::legacy::four_byte_impl!` macro provides a one-liner that defines the module required by `#[ssz(with = "module")]` for some `Option<T> where T: Encode + Decode`.
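
As a rough picture of what such a module might contain, the sketch below hand-writes an encode/decode pair for `Option<u64>` with a four-byte selector. The module name, the function signatures and the selector values (`0` for `None`, `1` for `Some`) are assumptions for illustration; this description does not show the exact items `ssz_derive` expects from a `with` module, and this is not the output of `four_byte_impl!`.

```rust
/// Hypothetical stand-in for a module referenced by `#[ssz(with = "...")]`.
/// Names and signatures are assumptions, not the crate's API.
mod four_byte_option_u64 {
    /// Width of the legacy union selector, in bytes.
    const SELECTOR_LEN: usize = 4;

    /// Append the 4-byte selector and, for `Some`, the little-endian value.
    pub fn ssz_append(value: &Option<u64>, buf: &mut Vec<u8>) {
        match value {
            None => buf.extend_from_slice(&0u32.to_le_bytes()),
            Some(v) => {
                buf.extend_from_slice(&1u32.to_le_bytes());
                buf.extend_from_slice(&v.to_le_bytes());
            }
        }
    }

    /// Decode bytes produced by `ssz_append`, rejecting malformed input.
    pub fn from_ssz_bytes(bytes: &[u8]) -> Result<Option<u64>, String> {
        if bytes.len() < SELECTOR_LEN {
            return Err(format!(
                "need at least {} bytes, got {}",
                SELECTOR_LEN,
                bytes.len()
            ));
        }
        let (selector_bytes, body) = bytes.split_at(SELECTOR_LEN);
        let mut selector = [0u8; SELECTOR_LEN];
        selector.copy_from_slice(selector_bytes);
        match (u32::from_le_bytes(selector), body.len()) {
            // Selector 0 means `None` and must have an empty body.
            (0, 0) => Ok(None),
            // Selector 1 means `Some` and must carry exactly one `u64`.
            (1, 8) => {
                let mut value = [0u8; 8];
                value.copy_from_slice(body);
                Ok(Some(u64::from_le_bytes(value)))
            }
            (selector, len) => Err(format!(
                "invalid selector {} with {} body bytes",
                selector, len
            )),
        }
    }
}
```

Keeping the encoding in a per-field module like this is what lets a database type retain the old on-disk format while the rest of the crate moves to the one-byte selector.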

Notably, **I have removed `Encode` and `Decode` impls for `Option`**. I've done this to force a break on downstream users. Like I mentioned, `Option` isn't used in the spec so I don't think it'll be *that* annoying. I think it's nicer than quietly having two different union implementations or quietly breaking the existing `Option` impl.

### Crate Publish Ordering

I've modified the order in which CI publishes crates to ensure that we don't publish a crate without ensuring we already published a crate that it depends upon.

## TODO

- [ ] Queue a follow-up `[patch]`-removing PR.
2021-09-25 05:58:36 +00:00


use super::*;

mod impls;

/// Provides SSZ encoding (serialization) via the `as_ssz_bytes(&self)` method.
///
/// See `examples/` for manual implementations or the crate root for implementations using
/// `#[derive(Encode)]`.
pub trait Encode {
    /// Returns `true` if this object has a fixed length.
    ///
    /// I.e., there are no variable length items in this object or any of its contained objects.
    fn is_ssz_fixed_len() -> bool;

    /// Append the encoding of `self` to `buf`.
    ///
    /// Note, variable length objects need only to append their "variable length" portion, they do
    /// not need to provide their offset.
    fn ssz_append(&self, buf: &mut Vec<u8>);

    /// The number of bytes this object occupies in the fixed-length portion of the SSZ bytes.
    ///
    /// By default, this is set to `BYTES_PER_LENGTH_OFFSET` which is suitable for variable length
    /// objects, but not fixed-length objects. Fixed-length objects _must_ return a value which
    /// represents their length.
    fn ssz_fixed_len() -> usize {
        BYTES_PER_LENGTH_OFFSET
    }

    /// Returns the size (in bytes) when `self` is serialized.
    ///
    /// Returns the same value as `self.as_ssz_bytes().len()` but this method is significantly more
    /// efficient.
    fn ssz_bytes_len(&self) -> usize;

    /// Returns the full-form encoding of this object.
    ///
    /// The default implementation of this method should suffice for most cases.
    fn as_ssz_bytes(&self) -> Vec<u8> {
        let mut buf = vec![];

        self.ssz_append(&mut buf);

        buf
    }
}

/// Allow for encoding an ordered series of distinct or indistinct objects as SSZ bytes.
///
/// **You must call `finalize(..)` after the final `append(..)` call** to ensure the bytes are
/// written to `buf`.
///
/// ## Example
///
/// Use `SszEncoder` to produce identical output to `foo.as_ssz_bytes()`:
///
/// ```rust
/// use ssz_derive::{Encode, Decode};
/// use ssz::{Decode, Encode, SszEncoder};
///
/// #[derive(PartialEq, Debug, Encode, Decode)]
/// struct Foo {
///     a: u64,
///     b: Vec<u16>,
/// }
///
/// fn ssz_encode_example() {
///     let foo = Foo {
///         a: 42,
///         b: vec![1, 3, 3, 7]
///     };
///
///     let mut buf: Vec<u8> = vec![];
///     let offset = <u64 as Encode>::ssz_fixed_len() + <Vec<u16> as Encode>::ssz_fixed_len();
///
///     let mut encoder = SszEncoder::container(&mut buf, offset);
///
///     encoder.append(&foo.a);
///     encoder.append(&foo.b);
///
///     encoder.finalize();
///
///     assert_eq!(foo.as_ssz_bytes(), buf);
/// }
///
/// ```
pub struct SszEncoder<'a> {
    offset: usize,
    buf: &'a mut Vec<u8>,
    variable_bytes: Vec<u8>,
}

impl<'a> SszEncoder<'a> {
    /// Instantiate a new encoder for encoding a SSZ container.
    pub fn container(buf: &'a mut Vec<u8>, num_fixed_bytes: usize) -> Self {
        buf.reserve(num_fixed_bytes);

        Self {
            offset: num_fixed_bytes,
            buf,
            variable_bytes: vec![],
        }
    }

    /// Append some `item` to the SSZ bytes.
    pub fn append<T: Encode>(&mut self, item: &T) {
        self.append_parameterized(T::is_ssz_fixed_len(), |buf| item.ssz_append(buf))
    }

    /// Uses `ssz_append` to append the encoding of some item to the SSZ bytes.
    ///
    /// Fixed-length items are written directly to `buf`. For variable-length items, an offset is
    /// written to `buf` and the item itself is staged in `variable_bytes` until `finalize(..)`.
    pub fn append_parameterized<F>(&mut self, is_ssz_fixed_len: bool, ssz_append: F)
    where
        F: Fn(&mut Vec<u8>),
    {
        if is_ssz_fixed_len {
            ssz_append(&mut self.buf);
        } else {
            self.buf
                .extend_from_slice(&encode_length(self.offset + self.variable_bytes.len()));

            ssz_append(&mut self.variable_bytes);
        }
    }

    /// Write the variable bytes to `buf`.
    ///
    /// This method must be called after the final `append(..)` call when serializing
    /// variable-length items.
    pub fn finalize(&mut self) -> &mut Vec<u8> {
        self.buf.append(&mut self.variable_bytes);

        &mut self.buf
    }
}

/// Encode `len` as a little-endian byte array of `BYTES_PER_LENGTH_OFFSET` length.
///
/// If `len` is larger than `MAX_LENGTH_VALUE`, a `debug_assert` fires. In builds without debug
/// assertions, the value is silently truncated to its low `BYTES_PER_LENGTH_OFFSET` bytes.
pub fn encode_length(len: usize) -> [u8; BYTES_PER_LENGTH_OFFSET] {
    // Note: it is possible for `len` to be larger than what can be encoded in
    // `BYTES_PER_LENGTH_OFFSET` bytes, triggering this debug assertion.
    //
    // These are the alternatives to using a `debug_assert` here:
    //
    // 1. Use `assert`.
    // 2. Push an error to the caller (e.g., `Option` or `Result`).
    // 3. Ignore it completely.
    //
    // I have avoided (1) because it's basically a choice between "produce invalid SSZ" or "kill
    // the entire program". I figure it may be possible for an attacker to trigger this assert and
    // take the program down -- I think producing invalid SSZ is a better option than this.
    //
    // I have avoided (2) because this error will need to be propagated upstream, making encoding a
    // function which may fail. I don't think this is ergonomic and the upsides don't outweigh the
    // downsides.
    //
    // I figure a `debug_assert` is better than (3) as it will give us a chance to detect the
    // error during testing.
    //
    // If you have a different opinion, feel free to start an issue and tag @paulhauner.
    debug_assert!(len <= MAX_LENGTH_VALUE);

    let mut bytes = [0; BYTES_PER_LENGTH_OFFSET];
    bytes.copy_from_slice(&len.to_le_bytes()[0..BYTES_PER_LENGTH_OFFSET]);
    bytes
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode_length() {
        assert_eq!(encode_length(0), [0; 4]);

        assert_eq!(encode_length(1), [1, 0, 0, 0]);

        assert_eq!(
            encode_length(MAX_LENGTH_VALUE),
            [255; BYTES_PER_LENGTH_OFFSET]
        );
    }

    #[test]
    #[should_panic]
    #[cfg(debug_assertions)]
    fn test_encode_length_above_max_debug_panics() {
        encode_length(MAX_LENGTH_VALUE + 1);
    }

    #[test]
    #[cfg(not(debug_assertions))]
    fn test_encode_length_above_max_not_debug_does_not_panic() {
        assert_eq!(&encode_length(MAX_LENGTH_VALUE + 1)[..], &[0; 4]);
    }
}