Dynamic record content indexing #27

Closed
opened 2022-08-31 16:47:32 +00:00 by i-norden · 5 comments
Member

From Ashwin: GQL queries currently do a full scan on registry records when filtering. We need to implement indexing to make this work at scale.

Separate set of tables, inside the same database, to hold the indexes (secondary mappings, using key prefixing to define buckets). Indexes should not contribute to the state commitment, but they need to be internal indexes (in Badger) so that the dApp can use them when exposing GraphQL.
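A minimal sketch of what such a key-prefixed index bucket could look like in Badger. The `idx/` prefix, key layout, and function names are illustrative assumptions, not the actual laconicd implementation:

```go
package index

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v3"
)

// Hypothetical key layout: index entries live under their own "idx/" prefix,
// keeping them out of the state-commitment key space.
//
//	idx/<attribute>/<value>/<record-id> -> (empty)
func indexKey(attr, value, recordID string) []byte {
	return []byte(fmt.Sprintf("idx/%s/%s/%s", attr, value, recordID))
}

// WriteIndexEntry records a secondary mapping for one record attribute.
func WriteIndexEntry(db *badger.DB, attr, value, recordID string) error {
	return db.Update(func(txn *badger.Txn) error {
		return txn.Set(indexKey(attr, value, recordID), nil)
	})
}

// LookupByAttribute scans a single bucket (attribute+value prefix) instead of
// iterating over every record, returning the matching record IDs.
func LookupByAttribute(db *badger.DB, attr, value string) ([]string, error) {
	prefix := []byte(fmt.Sprintf("idx/%s/%s/", attr, value))
	var ids []string
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // record IDs are encoded in the keys
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			ids = append(ids, string(it.Item().Key()[len(prefix):]))
		}
		return nil
	})
	return ids, err
}
```

The entries carry no payload beyond the key itself, so updating or deleting a record would also need to remove its stale index entries in the same transaction.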

The GraphQL layer is the one that will be using these indexes. We can currently see where this code performs expensive iteration over the data.

The fields we need to index on are the record attributes (for this endpoint: https://github.com/vulcanize/chiba-clonk/blob/17480f271671e7f650a6195f96969878c40bd35a/x/nameservice/types/nameservice.pb.go#L165). This will require a dynamic approach for indexing the attributes (which can vary from record to record).

For each new type, we need to define a schema or some set of descriptors from which indexes can be autogenerated (perhaps using the ORM).
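As a sketch, the per-type descriptor could be as simple as a list of indexable attribute fields keyed by type name. The struct and the example values below are hypothetical placeholders for whatever the ORM or schema format ends up providing:

```go
// IndexDescriptor is a hypothetical per-type description of which attribute
// fields get secondary indexes; a registered proto schema with annotations
// (or the Cosmos SDK ORM's table/index options) would play this role.
type IndexDescriptor struct {
	TypeName      string   // fully qualified name of the attributes type
	IndexedFields []string // attribute fields to build index buckets for
}

// Example (type and field names illustrative): every record whose attributes
// unpack to example.MyContentAttributes gets index entries for these fields.
var exampleIndexes = IndexDescriptor{
	TypeName:      "example.MyContentAttributes",
	IndexedFields: []string{"name", "version"},
}
```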

Author
Member
Record type: https://github.com/cerc-io/laconicd/blob/680d5850847a8a882655c277e862ca291b4744fe/x/nameservice/types/nameservice.pb.go#L158
Author
Member
  1. Convert attributes type from `string` to protobuf `any.Any`
  2. Register a _schema record_ that contains in its attributes the file descriptor proto (or other type description/schema/identifier) for a _content record's_ attributes field
  3. Record has a new "TypeID" field; for _schema records_ this field will be empty, but for _content records_ it will contain a CID that references the _schema record_ which contains the typing information for this record's attributes

Schema/type information is registered in a separate record because many _content records_ will share the same type of attributes; this way they can share references to a single _schema record_ rather than duplicating that data.
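A rough Go sketch of the resulting record shape. The field names are illustrative, not the actual generated nameservice.pb.go struct:

```go
import "google.golang.org/protobuf/types/known/anypb"

// Record sketches the proposed shape. A schema record leaves TypeId empty;
// a content record sets TypeId to the CID of the schema record that types
// its Attributes payload.
type Record struct {
	Id         string
	TypeId     string     // CID of the schema record; empty for schema records
	Attributes *anypb.Any // previously a string; now a typed payload
}
```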

Author
Member

After mapping to the protobuf type that we want to unpack the "Attributes" with, we still require some level of introspection of the protobuf type to figure out which of its fields we want to index the record by. For this reason, the _schema record_ we register in step 2 needs to contain more than just the protobuf type. We need to annotate this type/descriptor in some way, decorating the fields so as to identify which fields to index the record by.
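For illustration, once the attributes are unpacked into a message, field selection could look like the following. The kind-based predicate here (index every scalar field) is a stand-in; the real decision would come from the annotations on the registered descriptor:

```go
import "google.golang.org/protobuf/reflect/protoreflect"

// indexedFields walks an unpacked attributes message and returns the
// name/value pairs to build index entries from. The switch below is a
// placeholder predicate; a custom field option on the registered schema
// would mark the indexable fields instead.
func indexedFields(msg protoreflect.Message) map[string]string {
	out := map[string]string{}
	msg.Range(func(fd protoreflect.FieldDescriptor, v protoreflect.Value) bool {
		switch fd.Kind() {
		case protoreflect.StringKind, protoreflect.Int64Kind, protoreflect.BoolKind:
			out[string(fd.Name())] = v.String()
		}
		return true // continue iterating over set fields
	})
	return out
}
```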

Author
Member

There is still the issue of figuring out which fields in a given registered protobuf type to index by, so that will require some additional notation in the protobuf types/descriptors that we register. We would also need to use SelfDescribingMessage and FileDescriptorSet to handle the dynamic type registration/introspection, which may negate a lot of the benefits of using protobuf.
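To make that trade-off concrete, dynamic unpacking against a registered FileDescriptorSet would look roughly like this (Go protobuf runtime; the function and its inputs are assumptions about how a schema record's contents would be consumed):

```go
import (
	"fmt"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/reflect/protodesc"
	"google.golang.org/protobuf/reflect/protoreflect"
	"google.golang.org/protobuf/types/descriptorpb"
	"google.golang.org/protobuf/types/dynamicpb"
)

// unpackWithRegisteredSchema rebuilds a message type at runtime from the
// serialized FileDescriptorSet stored in a schema record, then unmarshals a
// content record's raw attribute bytes into a dynamic message.
func unpackWithRegisteredSchema(fdsBytes, attrBytes []byte, msgName protoreflect.FullName) (proto.Message, error) {
	var fds descriptorpb.FileDescriptorSet
	if err := proto.Unmarshal(fdsBytes, &fds); err != nil {
		return nil, err
	}
	files, err := protodesc.NewFiles(&fds)
	if err != nil {
		return nil, err
	}
	desc, err := files.FindDescriptorByName(msgName)
	if err != nil {
		return nil, err
	}
	md, ok := desc.(protoreflect.MessageDescriptor)
	if !ok {
		return nil, fmt.Errorf("%s is not a message", msgName)
	}
	msg := dynamicpb.NewMessage(md)
	if err := proto.Unmarshal(attrBytes, msg); err != nil {
		return nil, err
	}
	return msg, nil
}
```

Every unpack goes through reflection rather than generated code, which is the overhead the comment above alludes to.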

Author
Member

Closing this and will create a new issue for how we want to extend and/or refine the functionality in #40

Reference: cerc-io/laconicd-deprecated#27