
# Custom Transformers
When the capabilities of the generic `contractWatcher` are not sufficient, custom transformers tailored to a specific
purpose can be leveraged.

Individual custom transformers can be composed together from any number of external repositories and executed as a
single process using the `compose` and `execute` commands or the `composeAndExecute` command. This is accomplished by
generating a Go plugin which allows the `vulcanizedb` binary to link to the external transformers, so long as they
abide by one of the standard [interfaces](../staging/libraries/shared/transformer).

## Writing custom transformers
For help writing the different types of custom transformers, see the guides and examples below:

Storage Transformers: transform data derived from contract storage tries
   * [Guide](../../staging/libraries/shared/factories/storage/README.md)
   * [Example](../../staging/libraries/shared/factories/storage/EXAMPLE.md)

Event Transformers: transform data derived from Ethereum log events
   * [Guide](../../staging/libraries/shared/factories/event/README.md)
   * [Example 1](https://github.com/vulcanize/ens_transformers/tree/master/transformers/registar)
   * [Example 2](https://github.com/vulcanize/ens_transformers/tree/master/transformers/registry)
   * [Example 3](https://github.com/vulcanize/ens_transformers/tree/master/transformers/resolver)

Contract Transformers: transform data derived from Ethereum log events and use it to poll public contract methods
   * [Example 1](https://github.com/vulcanize/account_transformers)
   * [Example 2](https://github.com/vulcanize/ens_transformers/tree/master/transformers/domain_records)

## Preparing custom transformers to work as part of a plugin
To plug in an external transformer, we need to:

1. Create a package that exports a variable `TransformerInitializer`, `StorageTransformerInitializer`, or `ContractTransformerInitializer` of type [EventTransformerInitializer](../staging/libraries/shared/transformer/event_transformer.go#L33),
[StorageTransformerInitializer](../../staging/libraries/shared/transformer/storage_transformer.go#L31),
or [ContractTransformerInitializer](../../staging/libraries/shared/transformer/contract_transformer.go#L31), respectively
2. Design the transformers to work in the context of their [event](../staging/libraries/shared/watcher/event_watcher.go#L83),
[storage](../../staging/libraries/shared/watcher/storage_watcher.go#L53),
or [contract](../../staging/libraries/shared/watcher/contract_watcher.go#L68) watcher execution modes
3. Create db migrations to run against vulcanizeDB so that we can store the transformer output
    * Do not `goose fix` the transformer migrations. Unfixed migrations keep their timestamp-based version numbers, which are larger than the sequential version numbers of the core vulcanizedb migrations (kept in their fixed form), so the transformer migrations always run after the core ones
    * Specify the migration location for each transformer in the config with its `exporter.<transformerName>.migrations` field
    * If the base vDB migrations occupy this path as well, they need to be in their `goose fix`ed form,
    as they are [here](../../staging/db/migrations)
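The ordering rule in step 3 can be illustrated with a short Go sketch. It assumes goose-style filenames whose numeric prefix (before the first `_`) is the version, with fixed core migrations numbered sequentially and an unfixed plugin migration carrying a timestamp. The filenames and the helpers `migrationVersion` and `sortByVersion` are hypothetical and are not part of goose or vulcanizedb.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// migrationVersion parses the numeric prefix of a goose-style filename,
// e.g. "00002_core.sql" -> 2, "20190520123000_plugin.sql" -> 20190520123000.
func migrationVersion(filename string) (int64, error) {
	i := strings.Index(filename, "_")
	if i <= 0 {
		return 0, fmt.Errorf("%q has no version prefix", filename)
	}
	return strconv.ParseInt(filename[:i], 10, 64)
}

// sortByVersion returns the filenames ordered by ascending version,
// i.e. the order in which they would be applied.
func sortByVersion(files []string) ([]string, error) {
	versions := make(map[string]int64, len(files))
	for _, f := range files {
		v, err := migrationVersion(f)
		if err != nil {
			return nil, err
		}
		versions[f] = v
	}
	sorted := append([]string(nil), files...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return versions[sorted[i]] < versions[sorted[j]]
	})
	return sorted, nil
}

func main() {
	// Hypothetical filenames: two fixed core migrations and one unfixed plugin migration.
	order, err := sortByVersion([]string{
		"20190520123000_create_plugin_table.sql",
		"00002_create_core_table_b.sql",
		"00001_create_core_table_a.sql",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range order {
		fmt.Println(f)
	}
}
```

Because a 14-digit timestamp is numerically larger than any realistic sequential version, the unfixed plugin migration sorts after both core migrations. If it were renumbered sequentially by `goose fix` in its own repository, it could start at a low number and interleave or collide with the core versions.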

To update a plugin repository with changes to the core vulcanizedb repository, use your dependency manager to install the desired version of vDB.

## Building and Running Custom Transformers
### Commands
* The `compose`, `execute`, and `composeAndExecute` commands require Go 1.11+ and use [Go plugins](https://golang.org/pkg/plugin/), which only work on Unix-based systems.

* There is an ongoing [conflict](https://github.com/golang/go/issues/20481) between Go plugins and the use of vendored
dependencies which imposes certain limitations on how the plugins are built.

* Separate `compose` and `execute` commands allow pre-building the plugin and linking to the pre-built .so file. If
these are run independently instead of using `composeAndExecute`, a couple of things need to be considered:
    * The .so file must be built with the exact same dependencies that are present in the execution
    environment, i.e. the plugin .so file must be `compose`d and `execute`d with the exact same version of vulcanizeDB.
    * The plugin migrations are run during the plugin's composition. As such, if `execute` is used to run a prebuilt .so
    in a different environment than the one it was composed in, the database structure needs to be loaded
    into that environment's Postgres database, either by manually loading the plugin's schema into
    Postgres or by manually running the plugin's migrations.
     
* The `compose` and `composeAndExecute` commands assume you are in the vulcanizedb directory located under your system's
`$GOPATH`, and that the plugin dependencies are present at their `$GOPATH` directories.

* The `execute` command does not require the plugin transformer dependencies to be located in their `$GOPATH` directories;
instead, it expects a .so file (with the name specified in the config file) to be in
`$GOPATH/src/github.com/vulcanize/vulcanizedb/plugins/` and, as noted above, also expects the plugin db migrations to
have already been run against the database.

 * Usage:
     * compose: `./vulcanizedb compose --config=environments/config_name.toml`

     * execute: `./vulcanizedb execute --config=environments/config_name.toml`

     * composeAndExecute: `./vulcanizedb composeAndExecute --config=environments/config_name.toml`

### Flags
The `execute` and `composeAndExecute` commands can be passed optional flags to specify the operation of the watchers:

- `--recheck-headers`/`-r` - specifies whether to re-check headers for events after the header has already been queried for watched logs.
This can be useful for redundancy if you suspect that your node does not always return all desired logs on every query.
The argument is expected to be a boolean, e.g. `-r=true`.
Defaults to `false`.

- `--query-recheck-interval`/`-q` - specifies the interval for re-checking storage diffs that have been queued for later processing
(by default, the storage watcher queues storage diffs if transformer execution fails, on the assumption that data subsequently derived from the event transformers may enable us to decode storage keys that we don't recognize right now).
The argument is expected to be a duration, e.g. `-q=10m30s` for a 10-minute, 30-second interval.
Defaults to `5m` (5 minutes).

### Configuration
A .toml config file is specified when executing the commands.
The config provides information for composing a set of transformers from external repositories:

```toml
[database]
    name     = "vulcanize_public"
    hostname = "localhost"
    user     = "vulcanize"
    password = "vulcanize"
    port     = 5432

[client]
    ipcPath  = "/Users/user/Library/Ethereum/geth.ipc"

[exporter]
    home     = "github.com/vulcanize/vulcanizedb"
    name     = "exampleTransformerExporter"
    save     = false
    transformerNames = [
        "transformer1",
        "transformer2",
        "transformer3",
        "transformer4",
    ]
    [exporter.transformer1]
        path = "path/to/transformer1"
        type = "eth_event"
        repository = "github.com/account/repo"
        migrations = "db/migrations"
        rank = "0"
    [exporter.transformer2]
        path = "path/to/transformer2"
        type = "eth_contract"
        repository = "github.com/account/repo"
        migrations = "db/migrations"
        rank = "0"
    [exporter.transformer3]
        path = "path/to/transformer3"
        type = "eth_event"
        repository = "github.com/account/repo"
        migrations = "db/migrations"
        rank = "0"
    [exporter.transformer4]
        path = "path/to/transformer4"
        type = "eth_storage"
        repository = "github.com/account2/repo2"
        migrations = "to/db/migrations"
        rank = "1"
```
- `home` is the name of the package you are building the plugin for; in most cases this is `github.com/vulcanize/vulcanizedb`
- `name` is the name used for the plugin files (.so and .go)
- `save` indicates whether the user wants to keep the .go file instead of removing it after .so compilation, which is sometimes useful for debugging and troubleshooting
- `transformerNames` is the list of names of the transformers being composed together, so we know how to access their submaps in the exporter map
- `exporter.<transformerName>`s are the sub-mappings containing config info for the transformers
    - `repository` is the path for the repository which contains the transformer and its `TransformerInitializer`
    - `path` is the relative path from `repository` to the transformer's `TransformerInitializer` directory (initializer package)
        - Transformer repositories need to be cloned into the user's `$GOPATH` (`go get`)
    - `type` is the type of the transformer, indicating which type of watcher it works with (for now, there are three options: `eth_event`, `eth_storage`, and `eth_contract`)
        - `eth_storage` indicates the transformer works with the [storage watcher](../../staging/libraries/shared/watcher/storage_watcher.go)
         that fetches state and storage diffs from an ETH node (instead of, for example, from IPFS)
        - `eth_event` indicates the transformer works with the [event watcher](../../staging/libraries/shared/watcher/event_watcher.go)
         that fetches event logs from an ETH node
        - `eth_contract` indicates the transformer works with the [contract watcher](../staging/libraries/shared/watcher/contract_watcher.go),
        which runs transformers built on the [contract_watcher pkg](../../staging/pkg/contract_watcher); these work with either a header or full sync vDB to watch events and poll public methods ([example1](https://github.com/vulcanize/account_transformers/tree/master/transformers/account/light), [example2](https://github.com/vulcanize/ens_transformers/tree/working/transformers/domain_records))
    - `migrations` is the relative path from `repository` to the db migrations directory for the transformer
    - `rank` determines the order in which migrations are run, with lower ranked migrations running first
        - this is to help isolate any potential conflicts between transformer migrations
        - start at "0"
        - use strings
        - don't leave gaps
        - transformers with identical migrations/migration paths should share the same rank
- Note: If any of the imported transformers need additional config variables, those need to be included as well
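The `rank` rules can be made concrete with a small validator. This is a hypothetical sketch, not vulcanizedb's implementation: `transformerConfig` and `migrationOrder` are illustrative names, and treating `repository` joined with `migrations` as a transformer's migration path is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strconv"
)

// transformerConfig is a hypothetical stand-in for one [exporter.<transformerName>] sub-map.
type transformerConfig struct {
	Name, Repository, Migrations, Rank string
}

// migrationOrder enforces the rank rules (integer strings starting at "0", no gaps,
// one rank per migration path) and returns the distinct migration paths, lowest rank first.
func migrationOrder(ts []transformerConfig) ([]string, error) {
	rankOf := map[string]int{} // migration path -> rank
	used := map[int]bool{}
	maxRank := -1
	for _, t := range ts {
		r, err := strconv.Atoi(t.Rank)
		if err != nil || r < 0 {
			return nil, fmt.Errorf("%s: rank %q is not a non-negative integer string", t.Name, t.Rank)
		}
		p := path.Join(t.Repository, t.Migrations)
		if prev, ok := rankOf[p]; ok && prev != r {
			return nil, fmt.Errorf("%s: migrations %q already have rank %d", t.Name, p, prev)
		}
		rankOf[p] = r
		used[r] = true
		if r > maxRank {
			maxRank = r
		}
	}
	for r := 0; r <= maxRank; r++ { // ranks must start at 0 and leave no gaps
		if !used[r] {
			return nil, fmt.Errorf("rank %d is missing", r)
		}
	}
	paths := make([]string, 0, len(rankOf))
	for p := range rankOf {
		paths = append(paths, p)
	}
	sort.Slice(paths, func(i, j int) bool {
		if rankOf[paths[i]] != rankOf[paths[j]] {
			return rankOf[paths[i]] < rankOf[paths[j]]
		}
		return paths[i] < paths[j] // deterministic tie-break within a rank
	})
	return paths, nil
}

func main() {
	// The four transformers from the example config.
	order, err := migrationOrder([]transformerConfig{
		{"transformer1", "github.com/account/repo", "db/migrations", "0"},
		{"transformer2", "github.com/account/repo", "db/migrations", "0"},
		{"transformer3", "github.com/account/repo", "db/migrations", "0"},
		{"transformer4", "github.com/account2/repo2", "to/db/migrations", "1"},
	})
	fmt.Println(order, err)
	// [github.com/account/repo/db/migrations github.com/account2/repo2/to/db/migrations] <nil>
}
```

Deduplicating by path is what lets the three transformers sharing `github.com/account/repo/db/migrations` run that directory once, at their shared rank.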

This information is used to write and build a Go plugin which exports the configured transformers.
These transformers are loaded onto their specified watchers and executed.
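How the `type`, `repository`, and `path` fields could map onto the generated plugin can be sketched as follows. This is a hypothetical illustration rather than vulcanizedb's plugin writer: `transformerEntry` and `bucketByWatcher` are illustrative names, and it assumes each initializer package's import path is `repository` joined with `path`, matching the imports of the example plugin file in this section.

```go
package main

import (
	"fmt"
	"path"
)

// transformerEntry is a hypothetical stand-in for one [exporter.<transformerName>] sub-map.
type transformerEntry struct {
	Name, Path, Type, Repository string
}

// watcherBuckets groups initializer import paths by the watcher that will run them.
type watcherBuckets struct {
	Event, Storage, Contract []string
}

// bucketByWatcher joins repository and path into an import path and routes it
// to the event, storage, or contract bucket according to the transformer's type.
func bucketByWatcher(entries []transformerEntry) (watcherBuckets, error) {
	var b watcherBuckets
	for _, e := range entries {
		importPath := path.Join(e.Repository, e.Path)
		switch e.Type {
		case "eth_event":
			b.Event = append(b.Event, importPath)
		case "eth_storage":
			b.Storage = append(b.Storage, importPath)
		case "eth_contract":
			b.Contract = append(b.Contract, importPath)
		default:
			return watcherBuckets{}, fmt.Errorf("%s: unknown transformer type %q", e.Name, e.Type)
		}
	}
	return b, nil
}

func main() {
	b, err := bucketByWatcher([]transformerEntry{
		{"transformer1", "path/to/transformer1", "eth_event", "github.com/account/repo"},
		{"transformer2", "path/to/transformer2", "eth_contract", "github.com/account/repo"},
		{"transformer3", "path/to/transformer3", "eth_event", "github.com/account/repo"},
		{"transformer4", "path/to/transformer4", "eth_storage", "github.com/account2/repo2"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("event:", b.Event)
	fmt.Println("storage:", b.Storage)
	fmt.Println("contract:", b.Contract)
}
```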

Transformers of different types can be run together in the same command using a single config file, or in separate instances using different config files.

The general structure of a plugin .go file, as it would be generated from the config above, is shown below:

```go
package main

import (
	interface1 "github.com/vulcanize/vulcanizedb/libraries/shared/transformer"
	transformer1 "github.com/account/repo/path/to/transformer1"
	transformer2 "github.com/account/repo/path/to/transformer2"
	transformer3 "github.com/account/repo/path/to/transformer3"
	transformer4 "github.com/account2/repo2/path/to/transformer4"
)

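// exporter is the type of the symbol the generated plugin exports.
type exporter string

// Exporter is the exported plugin symbol that gives the vulcanizedb binary access to the configured transformers.
var Exporter exporter

// Export returns the configured transformer initializers grouped by watcher type:
// event (eth_event), storage (eth_storage), and contract (eth_contract).
func (e exporter) Export() ([]interface1.EventTransformerInitializer, []interface1.StorageTransformerInitializer, []interface1.ContractTransformerInitializer) {
	return []interface1.EventTransformerInitializer{
			transformer1.TransformerInitializer,
			transformer3.TransformerInitializer,
		}, []interface1.StorageTransformerInitializer{
			transformer4.StorageTransformerInitializer,
		}, []interface1.ContractTransformerInitializer{
			transformer2.ContractTransformerInitializer,
		}
}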
```
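Once a file like this is compiled as a Go plugin (`go build -buildmode=plugin`), a host binary can open the .so file and look up the exported `Exporter` symbol with the standard library `plugin` package. The sketch below is a minimal, hypothetical loader (`loadSymbol` and the .so path are illustrative, not vulcanizedb's actual code); `plugin.Open` returns an error when the file is missing or when the platform or build lacks plugin support.

```go
package main

import (
	"fmt"
	"plugin"
)

// loadSymbol is a hypothetical helper: it opens a compiled Go plugin (.so)
// and looks up one exported symbol by name.
func loadSymbol(soPath, name string) (plugin.Symbol, error) {
	p, err := plugin.Open(soPath)
	if err != nil {
		return nil, fmt.Errorf("opening plugin %s: %v", soPath, err)
	}
	sym, err := p.Lookup(name)
	if err != nil {
		return nil, fmt.Errorf("looking up %s in %s: %v", name, soPath, err)
	}
	return sym, nil
}

func main() {
	// Illustrative path; without a compiled plugin at this location, an error is printed.
	sym, err := loadSymbol("plugins/exampleTransformerExporter.so", "Exporter")
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Printf("loaded symbol of type %T\n", sym)
}
```

For a package-level variable such as `Exporter`, `Lookup` returns a pointer to the variable, so a caller would type-assert the symbol to an interface declaring `Export` (which `*exporter` satisfies through its value-receiver method) before calling it.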

### Storage backfilling
Storage transformers stream data from a geth subscription or a Parity CSV file, where the storage diffs are produced and emitted as the
full sync progresses. If the transformers have missed consuming a range of diffs, due to lag in the startup of the processes or misalignment of the sync,
we can configure our storage transformers to backfill the missing diffs from a [modified archival geth client](https://github.com/vulcanize/go-ethereum/tree/statediff_at).

To do so, add the following fields to the config file.
```toml
[storageBackFill]
    on = false
    rpcPath = ""
```
- `on` is set to `true` to turn the backfill process on
- `rpcPath` is the WebSocket or IPC path to the modified archival geth node that exposes the `StateDiffAt` RPC endpoint used to backfill storage diffs
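A sketch of the checks these backfill settings imply. The Go type `storageBackFillConfig`, its `Validate` method, and `missingBlocks` are hypothetical illustrations. In particular, modeling "missed diffs" as block heights with no consumed diffs is an assumption, not a description of how the storage watcher actually decides what to backfill.

```go
package main

import (
	"errors"
	"fmt"
)

// storageBackFillConfig is a hypothetical Go mirror of the [storageBackFill] table.
type storageBackFillConfig struct {
	On      bool
	RPCPath string
}

// Validate rejects a config that turns backfilling on without a node to query.
func (c storageBackFillConfig) Validate() error {
	if c.On && c.RPCPath == "" {
		return errors.New("storageBackFill: on is true but rpcPath is empty")
	}
	return nil
}

// missingBlocks returns the heights in [from, to] with no consumed storage diffs,
// i.e. the blocks a backfill would have to request.
func missingBlocks(consumed []int64, from, to int64) []int64 {
	if from > to {
		return nil
	}
	have := make(map[int64]bool, len(consumed))
	for _, h := range consumed {
		have[h] = true
	}
	var missing []int64
	for h := from; ; h++ {
		if !have[h] {
			missing = append(missing, h)
		}
		if h == to { // stop at the upper bound without overflowing
			break
		}
	}
	return missing
}

func main() {
	cfg := storageBackFillConfig{On: true, RPCPath: ""}
	fmt.Println(cfg.Validate())                                   // storageBackFill: on is true but rpcPath is empty
	fmt.Println(missingBlocks([]int64{100, 101, 104}, 100, 105)) // [102 103 105]
}
```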