diff --git a/docs/architecture/adr-019-protobuf-state-encoding.md b/docs/architecture/adr-019-protobuf-state-encoding.md index 3f6a85b2af..98f9db6b82 100644 --- a/docs/architecture/adr-019-protobuf-state-encoding.md +++ b/docs/architecture/adr-019-protobuf-state-encoding.md @@ -4,6 +4,7 @@ - 2020 Feb 15: Initial Draft - 2020 Feb 24: Updates to handle messages with interface fields +- 2020 Apr 27: Convert usages of `oneof` for interfaces to `Any` ## Status @@ -68,7 +69,7 @@ as the concrete codec it accepts and/or extends. This means that all client JSON genesis state, will still use Amino. The ultimate goal will be to replace Amino JSON encoding with Protbuf encoding and thus have modules accept and/or extend `ProtoCodec`. -### Module Design +### Module Codecs Modules that do not require the ability to work with and serialize interfaces, the path to Protobuf migration is pretty straightforward. These modules are to simply migrate any existing types that @@ -110,162 +111,194 @@ type Codec interface { } ``` -Note, concrete types implementing these interfaces can be defined outside the scope of the module -that defines the interface (e.g. `ModuleAccount` in `x/supply`). To handle these cases, a Protobuf -message must be defined at the application-level along with a single codec that will be passed to _all_ -modules using a `oneof` approach. +### Usage of `Any` to encode interfaces -Example: +In general, module-level .proto files should define messages which encode interfaces +using [`google.protobuf.Any`](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto). +After [extension discussion](https://github.com/cosmos/cosmos-sdk/issues/6030), +this was chosen as the preferred alternative to application-level `oneof`s +as in our original protobuf design. The arguments in favor of `Any` can be +summarized as follows: -```protobuf -// app/codec/codec.proto +* `Any` provides a simpler, more consistent client UX for dealing with +interfaces than app-level `oneof`s that will need to be coordinated more +carefully across applications. Creating a generic transaction +signing library using `oneof`s may be cumbersome and critical logic may need +to be reimplemented for each chain +* `Any` provides more resistance against human error than `oneof` +* `Any` is generally simpler to implement for both modules and apps -import "third_party/proto/cosmos-proto/cosmos.proto"; -import "x/auth/types/types.proto"; -import "x/auth/vesting/types/types.proto"; -import "x/supply/types/types.proto"; +The main counter-argument to using `Any` centers around its additional space +and possibly performance overhead. The space overhead could be dealt with using +compression at the persistence layer in the future and the performance impact +is likely to be small. Thus, not using `Any` is seem as a pre-mature optimization, +with user experience as the higher order concern. -message Account { - option (cosmos_proto.interface_type) = "*github.com/cosmos/cosmos-sdk/x/auth/exported.Account"; +Note, that given the SDK's decision to adopt the `Codec` interfaces described +above, apps can still choose to use `oneof` to encode state and transactions +but it is not the recommended approach. If apps do choose to use `oneof`s +instead of `Any` they will likely lose compatibility with client apps that +support multiple chains. Thus developers should think carefully about whether +they care more about what is possibly a pre-mature optimization or end-user +and client developer UX. - // sum defines a list of all acceptable concrete Account implementations. - oneof sum { - cosmos_sdk.x.auth.v1.BaseAccount base_account = 1; - cosmos_sdk.x.auth.vesting.v1.ContinuousVestingAccount continuous_vesting_account = 2; - cosmos_sdk.x.auth.vesting.v1.DelayedVestingAccount delayed_vesting_account = 3; - cosmos_sdk.x.auth.vesting.v1.PeriodicVestingAccount periodic_vesting_account = 4; - cosmos_sdk.x.supply.v1.ModuleAccount module_account = 5; - } +### Safe usage of `Any` - // ... -} -``` +By default, the [gogo protobuf implementation of `Any`](https://godoc.org/github.com/gogo/protobuf/types) +uses [global type registration]( https://github.com/gogo/protobuf/blob/master/proto/properties.go#L540) +to decode values packed in `Any` into concrete +go types. This introduces a vulnerability where any malicious module +in the dependency tree could registry a type with the global protobuf registry +and cause it to be loaded and unmarshaled by a transaction that referenced +it in the `type_url` field. + +To prevent this, we introduce a type registration mechanism for decoding `Any` +values into concrete types through the `InterfaceRegistry` interface which +bears some similarity to type registration with Amino: ```go -// app/codec/codec.go +type InterfaceRegistry interface { + // RegisterInterface associates protoName as the public name for the + // interface passed in as iface + // Ex: + // registry.RegisterInterface("cosmos_sdk.Msg", (*sdk.Msg)(nil)) + RegisterInterface(protoName string, iface interface{}) -type Codec struct { - codec.Marshaler + // RegisterImplementations registers impls as a concrete implementations of + // the interface iface + // Ex: + // registry.RegisterImplementations((*sdk.Msg)(nil), &MsgSend{}, &MsgMultiSend{}) + RegisterImplementations(iface interface{}, impls ...proto.Message) - - amino *codec.Codec -} - -func NewAppCodec(amino *codec.Codec) *Codec { - return &Codec{Marshaler: codec.NewHybridCodec(amino), amino: amino} -} - -func (c *Codec) MarshalAccount(accI authexported.Account) ([]byte, error) { - acc := &Account{} - if err := acc.SetAccount(accI); err != nil { - return nil, err - } - - return c.Marshaler.MarshalBinaryBare(acc) -} - -func (c *Codec) UnmarshalAccount(bz []byte) (authexported.Account, error) { - acc := &Account{} - if err := c.Marshaler.UnmarshalBinaryBare(bz, acc); err != nil { - return nil, err - } - - return acc.GetAccount(), nil } ``` -Since the `Codec` implements `auth.Codec` (and all other required interfaces), it is passed to _all_ -the modules and satisfies all the interfaces. Now each module needing to work with interfaces will know -about all the required types. Note, the use of `interface_type` allows us to avoid a significant -amount of code boilerplate when implementing the `Codec`. +In addition to serving as a whitelist, `InterfaceRegistry` can also serve +to communicate the list of concrete types that satisfy an interface to clients. -A similar concept is to be applied for messages that contain interfaces fields. The module will -define a "base" concrete message type that the application-level codec will extend via `oneof` that -fulfills the required message interface. -Example: - -The `MsgSubmitEvidence` defined by the `x/evidence` module contains a field `Evidence` which is an -interface. +The same struct that implements `InterfaceRegistry` will also implement an +interface `InterfaceUnpacker` to be used for unpacking `Any`s: ```go -type MsgSubmitEvidence struct { - Evidence exported.Evidence - Submitter sdk.AccAddress +type InterfaceUnpacker interface { + // UnpackAny unpacks the value in any to the interface pointer passed in as + // iface. Note that the type in any must have been registered with + // RegisterImplementations as a concrete type for that interface + // Ex: + // var msg sdk.Msg + // err := ctx.UnpackAny(any, &msg) + // ... + UnpackAny(any *Any, iface interface{}) error } ``` -Instead, we will implement a "base" message type and an interface which the concrete message type -must implement. +Note that `InterfaceRegistry` usage does not deviate from standard protobuf +usage of `Any`, it just introduces a security and introspection layer for +golang usage. + +`InterfaceRegistry` will be a member of `ProtoCodec` and `HybridCodec` as +described above. In order for modules to register interface types, app modules +can optionally implement the following interface: + +```go +type InterfaceModule interface { + RegisterInterfaceTypes(InterfaceRegistry) +} +``` + +The module manager will include a method to call `RegisterInterfaceTypes` on +every module that implements it in order to populate the `InterfaceRegistry`. + +### Using `Any` to encode state + +The SDK will provide support methods `MarshalAny` and `UnmarshalAny` to allow +easy encoding of state to `Any` in `Codec` implementations. Ex: + +```go +import "github.com/cosmos/cosmos-sdk/codec" + +func (c *Codec) MarshalEvidence(evidenceI eviexported.Evidence) ([]byte, error) { + return codec.MarshalAny(evidenceI) +} + +func (c *Codec) UnmarshalEvidence(bz []byte) (eviexported.Evidence, error) { + var evi eviexported.Evidence + err := codec.UnmarshalAny(c.interfaceContext, &evi, bz) + if err != nil { + return nil, err + } + return evi, nil +} +``` + +### Using `Any` in `sdk.Msg`s + +A similar concept is to be applied for messages that contain interfaces fields. +For example, we can define `MsgSubmitEvidence` as follows where `Evidence` is +an interface: ```protobuf // x/evidence/types/types.proto -message MsgSubmitEvidenceBase { +message MsgSubmitEvidence { bytes submitter = 1 [ (gogoproto.casttype) = "github.com/cosmos/cosmos-sdk/types.AccAddress" ]; + google.protobuf.Any evidence = 2; } ``` +Note that in order to unpack the evidence from `Any` we do need a reference to +`InterfaceRegistry`. In order to reference evidence in methods like +`ValidateBasic` which shouldn't have to know about the `InterfaceRegistry`, we +introduce an `UnpackInterfaces` phase to deserialization which unpacks +interfaces before they're needed. + +### Unpacking Interfaces + +To implement the `UnpackInterfaces` phase of deserialization which unpacks +interfaces wrapped in `Any` before they're needed, we create an interface +that `sdk.Msg`s and other types can implement: +```go +type UnpackInterfacesMsg interface { + UnpackInterfaces(InterfaceUnpacker) error +} +``` + +We also introduce a private `cachedValue interface{}` field onto the `Any` +struct itself with a public getter `GetUnpackedValue() interface{}`. + +The `UnpackInterfaces` method is to be invoked during message deserialization right +after `Unmarshal` and any interface values packed in `Any`s will be decoded +and stored in `cachedValue` for reference later. + +Then unpacked interface values can safely be used in any code afterwards +without knowledge of the `InterfaceRegistry` +and messages can introduce a simple getter to cast the cached value to the +correct interface type. + +This has the added benefit that unmarshaling of `Any` values only happens once +during initial deserialization rather than every time the value is read. Also, +when `Any` values are first packed (for instance in a call to +`NewMsgSubmitEvidence`), the original interface value is cached so that +unmarshaling isn't needed to read it again. + +`MsgSubmitEvidence` could implement `UnpackInterfaces`, plus a convenience getter +`GetEvidence` as follows: + ```go -// x/evidence/exported/evidence.go - -type MsgSubmitEvidence interface { - sdk.Msg - - GetEvidence() Evidence - GetSubmitter() sdk.AccAddress +func (msg MsgSubmitEvidence) UnpackInterfaces(ctx sdk.InterfaceRegistry) error { + var evi eviexported.Evidence + return ctx.UnpackAny(msg.Evidence, *evi) } -``` - -Notice the `MsgSubmitEvidence` interface extends `sdk.Msg` and allows for the `Evidence` interface -to be retrieved from the concrete message type. - -Now, the application-level codec will define the concrete `MsgSubmitEvidence` type and will have it -fulfill the `MsgSubmitEvidence` interface defined by `x/evidence`. - -```protobuf -// app/codec/codec.proto - -message Evidence { - option (gogoproto.equal) = true; - option (cosmos_proto.interface_type) = "github.com/cosmos/cosmos-sdk/x/evidence/exported.Evidence"; - - oneof sum { - cosmos_sdk.x.evidence.v1.Equivocation equivocation = 1; - } -} - -message MsgSubmitEvidence { - option (gogoproto.equal) = true; - option (gogoproto.goproto_getters) = false; - - Evidence evidence = 1; - cosmos_sdk.x.evidence.v1.MsgSubmitEvidenceBase base = 2 - [ - (gogoproto.nullable) = false, - (gogoproto.embed) = true - ]; -} -``` - -```go -// app/codec/msgs.go func (msg MsgSubmitEvidence) GetEvidence() eviexported.Evidence { - return msg.Evidence.GetEvidence() -} - -func (msg MsgSubmitEvidence) GetSubmitter() sdk.AccAddress { - return msg.Submitter + return msg.Evidence.GetUnpackedValue().(eviexported.Evidence) } ``` -Note, however, the module's message handler must now handle the interface `MsgSubmitEvidence` in -addition to any concrete types. - ### Why Wasn't X Chosen Instead For a more complete comparison to alternative protocols, see [here](https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f). @@ -288,9 +321,14 @@ untrusted inputs. ## Future Improvements & Roadmap -The landscape and roadmap to restructuring queriers and tx generation to fully support -Protobuf isn't fully understood yet. Once all modules are migrated, we will have a better -understanding on how to proceed with client improvements (e.g. gRPC) 2. +In the future we may consider a compression layer right above the persistence +layer which doesn't change tx or merkle tree hashes, but reduces the storage +overhead of `Any`. In addition, we may adopt protobuf naming conventions which +make type URLs a bit more concise while remaining descriptive. + +Additional code generation support around the usage of `Any` is something that +could also be explored in the future to make the UX for go developers more +seamless. ## Consequences @@ -303,9 +341,8 @@ understanding on how to proceed with client improvements (e.g. gRPC) 2