diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index 3e0879b21f..b60a1b38a5 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -52,4 +52,5 @@ Please add a entry below in your Pull Request for an ADR.
- [ADR 023: Protocol Buffer Naming and Versioning](./adr-023-protobuf-naming.md)
- [ADR 024: Coin Metadata](./adr-024-coin-metadata.md)
- [ADR 025: IBC Passive Channels](./adr-025-ibc-passive-channels.md)
-- [ADR 026: IBC Client Recovery Mechanisms](./adr-026-ibc-client-recovery-mechanisms.md)
\ No newline at end of file
+- [ADR 026: IBC Client Recovery Mechanisms](./adr-026-ibc-client-recovery-mechanisms.md)
+- [ADR 027: Deterministic Protobuf Serialization](./adr-027-deterministic-protobuf-serialization.md)
diff --git a/docs/architecture/adr-020-protobuf-transaction-encoding.md b/docs/architecture/adr-020-protobuf-transaction-encoding.md
index 7146a1aa3e..885e713420 100644
--- a/docs/architecture/adr-020-protobuf-transaction-encoding.md
+++ b/docs/architecture/adr-020-protobuf-transaction-encoding.md
@@ -8,6 +8,7 @@
- 2020 April 30: Switch to `Any`
- 2020 May 14: Describe public key encoding
- 2020 June 08: Store `TxBody` and `AuthInfo` as bytes in `SignDoc`; Document `TxRaw` as broadcast and storage type.
+- 2020 August 07: Use ADR 027 for serializing `SignDoc`.
## Status
@@ -188,8 +189,6 @@ message SignDoc {
bytes auth_info = 2;
string chain_id = 3;
uint64 account_number = 4;
- // account_sequence starts at 1 rather than 0 to avoid the case where
- // the default 0 value must be omitted in protobuf serialization
uint64 account_sequence = 5;
}
```
@@ -197,8 +196,7 @@ message SignDoc {
In order to sign in the default mode, clients take the following steps:
1. Serialize `TxBody` and `AuthInfo` using any valid protobuf implementation.
-2. Create a `SignDoc` and encode it. (The only requirement of the underlying
- protobuf implementation is that fields are serialized in order).
+2. Create a `SignDoc` and serialize it using [ADR 027](./adr-027-deterministic-protobuf-serialization.md).
3. Sign the encoded `SignDoc` bytes.
4. Build a `TxRaw` and serialize it for broadcasting.
@@ -214,8 +212,7 @@ Signature verifiers do:
3. For each required signer:
- Pull account number and sequence from the state.
- Obtain the public key either from state or `AuthInfo`'s `signer_infos`.
- - Create a `SignDoc` and serialize it. Due to the simplicity of the type it
- is expected that this matches the serialization used by the signer.
+ - Create a `SignDoc` and serialize it using [ADR 027](./adr-027-deterministic-protobuf-serialization.md).
- Verify the signature at the the same list position against the serialized `SignDoc`.
#### `SIGN_MODE_LEGACY_AMINO`
@@ -423,8 +420,6 @@ message SignDocAux {
PublicKey public_key = 2;
string chain_id = 3;
uint64 account_number = 4;
- // account_sequence starts at 1 rather than 0 to avoid the case where
- // the default 0 value must be omitted in protobuf serialization
uint64 account_sequence = 5;
}
```
diff --git a/docs/architecture/adr-027-deterministic-protobuf-serialization.md b/docs/architecture/adr-027-deterministic-protobuf-serialization.md
new file mode 100644
index 0000000000..fbc96b7dc1
--- /dev/null
+++ b/docs/architecture/adr-027-deterministic-protobuf-serialization.md
@@ -0,0 +1,251 @@
+# ADR 027: Deterministic Protobuf Serialization
+
+## Changelog
+
+- 2020-08-07: Initial Draft
+
+## Status
+
+Proposed
+
+## Context
+
+[Protobuf](https://developers.google.com/protocol-buffers/docs/proto3)
+seralization is not unique (i.e. there exist a practically unlimited number of
+valid binary representations for a protobuf document)1. For signature
+verification in Cosmos SDK, signer and verifier need to agree on the same
+serialization of a SignDoc as defined in
+[ADR-020](./adr-020-protobuf-transaction-encoding.md) without transmitting the
+serialization. This document describes a deterministic serialization scheme for
+a subset of protobuf documents, that covers this use case but can be reused in
+other cases as well.
+
+## Decision
+
+The following encoding scheme is proposed to be used by other ADRs.
+
+### Scope
+
+This ADR defines a protobuf3 serializer. The output is a valid protobuf
+serialization, such that every protobuf parser can parse it.
+
+No maps are supported in version 1 due to the complexity of defining a
+derterministic serialization. This might change in future. Implementations must
+reject documents containing maps as invalid input.
+
+### Serialization rules
+
+The serialization is based on the
+[protobuf 3 encoding](https://developers.google.com/protocol-buffers/docs/encoding)
+with the following additions:
+
+1. Fields must be serialized only once in ascending order
+2. Extra fields or any extra data must not be added
+3. [Default values](https://developers.google.com/protocol-buffers/docs/proto3#default)
+ must be omitted
+4. `repeated` fields of scalar numeric types must use
+ [packed encoding](https://developers.google.com/protocol-buffers/docs/encoding#packed)
+ by default.
+5. Variant encoding of integers must not be longer than needed.
+
+While rule number 1. and 2. should be pretty straight forward and describe the
+default behaviour of all protobuf encoders the author is aware of, the 3rd rule
+is more interesting. After a protobuf 3 deserialization you cannot differentiate
+between unset fields and fields set to the default value2. At
+serialization level however, it is possible to set the fields with an empty
+value or omitting them entirely. This is a significant difference to e.g. JSON
+where a property can be empty (`""`, `0`), `null` or undefined, leading to 3
+different documents.
+
+Omitting fields set to default values is valid because the parser must assign
+the default value to fields missing in the serialization3. For scalar
+types, omitting defaults is required by the spec4. For `repeated`
+fields, not serializing them is the only way to express empty lists. Enums must
+have a first element of numeric value 0, which is the default5. And
+message fields default to unset6.
+
+Omitting defaults allows for some amount of forward compatibility: users of
+newer versions of a protobuf schema produce the same serialization as users of
+older versions as long as newly added fields are not used (i.e. set to their
+default value).
+
+### Implementation
+
+There are three main implementation strategies, ordered from the least to the
+most custom development:
+
+- **Use a protobuf serializer that follows the above rules by default.** E.g.
+ [gogoproto](https://pkg.go.dev/github.com/gogo/protobuf/gogoproto) is known to
+ be compliant by in most cases, but not when certain annotations such as
+ `nullable = false` are used. It might also be an option to configure an
+ existing serializer accordingly.
+- **Normalize default values before encoding them.** If your serializer follows
+ rule 1. and 2. and allows you to explicitly unset fields for serialization,
+ you can normalize default values to unset. This can be done when working with
+ [protobuf.js](https://www.npmjs.com/package/protobufjs):
+
+ ```js
+ const bytes = SignDoc.encode({
+ bodyBytes: body.length > 0 ? body : null, // normalize empty bytes to unset
+ authInfoBytes: authInfo.length > 0 ? authInfo : null, // normalize empty bytes to unset
+ chainId: chainId || null, // normalize "" to unset
+ accountNumber: accountNumber || null, // normalize 0 to unset
+ accountSequence: accountSequence || null, // normalize 0 to unset
+ }).finish();
+ ```
+
+- **Use a hand-written serializer for the types you need.** If none of the above
+ ways works for you, you can write a serializer yourself. For SignDoc this
+ would look something like this in Go, building on existing protobuf utilities:
+
+ ```go
+ if !signDoc.body_bytes.empty() {
+ buf.WriteUVarInt64(0xA) // wire type and field number for body_bytes
+ buf.WriteUVarInt64(signDoc.body_bytes.length())
+ buf.WriteBytes(signDoc.body_bytes)
+ }
+
+ if !signDoc.auth_info.empty() {
+ buf.WriteUVarInt64(0x12) // wire type and field number for auth_info
+ buf.WriteUVarInt64(signDoc.auth_info.length())
+ buf.WriteBytes(signDoc.auth_info)
+ }
+
+ if !signDoc.chain_id.empty() {
+ buf.WriteUVarInt64(0x1a) // wire type and field number for chain_id
+ buf.WriteUVarInt64(signDoc.chain_id.length())
+ buf.WriteBytes(signDoc.chain_id)
+ }
+
+ if signDoc.account_number != 0 {
+ buf.WriteUVarInt64(0x20) // wire type and field number for account_number
+ buf.WriteUVarInt(signDoc.account_number)
+ }
+
+ if signDoc.account_sequence != 0 {
+ buf.WriteUVarInt64(0x28) // wire type and field number for account_sequence
+ buf.WriteUVarInt(signDoc.account_sequence)
+ }
+ ```
+
+### Test vectors
+
+Given the protobuf definition `Article.proto`
+
+```protobuf
+package blog;
+syntax = "proto3";
+
+enum Type {
+ UNSPECIFIED = 0;
+ IMAGES = 1;
+ NEWS = 2;
+};
+
+enum Review {
+ UNSPECIFIED = 0;
+ ACCEPTED = 1;
+ REJECTED = 2;
+};
+
+message Article {
+ string title = 1;
+ string description = 2;
+ uint64 created = 3;
+ uint64 updated = 4;
+ bool public = 5;
+ bool promoted = 6;
+ Type type = 7;
+ Review review = 8;
+ repeated string comments = 9;
+ repeated string backlinks = 10;
+};
+```
+
+serializing the values
+
+```yaml
+title: "The world needs change 🌳"
+description: ""
+created: 1596806111080
+updated: 0
+public: true
+promoted: false
+type: Type.NEWS
+review: Review.UNSPECIFIED
+comments: ["Nice one", "Thank you"]
+backlinks: []
+```
+
+must result in the serialization
+
+```
+0a1b54686520776f726c64206e65656473206368616e676520f09f8cb318e8bebec8bc2e280138024a084e696365206f6e654a095468616e6b20796f75
+```
+
+When inspecting the serialized document, you see that every second field is
+omitted:
+
+```
+$ echo 0a1b54686520776f726c64206e65656473206368616e676520f09f8cb318e8bebec8bc2e280138024a084e696365206f6e654a095468616e6b20796f75 | xxd -r -p | protoc --decode_raw
+1: "The world needs change \360\237\214\263"
+3: 1596806111080
+5: 1
+7: 2
+9: "Nice one"
+9: "Thank you"
+```
+
+## Consequences
+
+Having such an encoding available allows us to get deterministic serialization
+for all protobuf documents we need in the context of Cosmos SDK signing.
+
+### Positive
+
+- Well defined rules that can be verified independent of a reference
+ implementation
+- Simple enough to keep the barrier to implement transaction signing low
+- It allows us to continue to use 0 and other empty values in SignDoc, avoiding
+ the need to work around 0 sequences. This does not imply the change from
+ https://github.com/cosmos/cosmos-sdk/pull/6949 should not be merged, but not
+ too important anymore.
+
+### Negative
+
+- When implementing transaction signing, the encoding rules above must be
+ understood and implemented.
+- The need for rule number 3. adds some complexity to implementations.
+
+### Neutral
+
+## References
+
+- 1 _When a message is serialized, there is no guaranteed order for
+ how its known or unknown fields should be written. Serialization order is an
+ implementation detail and the details of any particular implementation may
+ change in the future. Therefore, protocol buffer parsers must be able to parse
+ fields in any order._ from
+ https://developers.google.com/protocol-buffers/docs/encoding#order
+- 2 _Note that for scalar message fields, once a message is parsed
+ there's no way of telling whether a field was explicitly set to the default
+ value (for example whether a boolean was set to false) or just not set at all:
+ you should bear this in mind when defining your message types. For example,
+ don't have a boolean that switches on some behaviour when set to false if you
+ don't want that behaviour to also happen by default._ from
+ https://developers.google.com/protocol-buffers/docs/proto3#default
+- 3 _When a message is parsed, if the encoded message does not
+ contain a particular singular element, the corresponding field in the parsed
+ object is set to the default value for that field._ from
+ https://developers.google.com/protocol-buffers/docs/proto3#default
+- 4 _Also note that if a scalar message field is set to its default,
+ the value will not be serialized on the wire._ from
+ https://developers.google.com/protocol-buffers/docs/proto3#default
+- 5 _For enums, the default value is the first defined enum value,
+ which must be 0._ from
+ https://developers.google.com/protocol-buffers/docs/proto3#default
+- 6 _For message fields, the field is not set. Its exact value is
+ language-dependent._ from
+ https://developers.google.com/protocol-buffers/docs/proto3#default
+- Encoding rules and parts of the reasoning taken from
+ [canonical-proto3 Aaron Craelius](https://github.com/regen-network/canonical-proto3)