Merge pull request #2 from vulcanize/watch

finish updating documentation after refactor
This commit is contained in:
Ian Norden 2020-06-20 11:00:19 -05:00 committed by GitHub
commit 2d4f4ad43c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
16 changed files with 383 additions and 1143 deletions

304
README.md
View File

@ -1,73 +1,40 @@
# Vulcanize DB # Vulcanize DB
[![Build Status](https://travis-ci.org/vulcanize/vulcanizedb.svg?branch=master)](https://travis-ci.org/vulcanize/vulcanizedb) [![Go Report Card](https://goreportcard.com/badge/github.com/vulcanize/ipfs-blockchain-watcher)](https://goreportcard.com/report/github.com/vulcanize/ipfs-blockchain-watcher)
[![Go Report Card](https://goreportcard.com/badge/github.com/vulcanize/ipfs-chain-watcher)](https://goreportcard.com/report/github.com/vulcanize/ipfs-chain-watcher)
> Vulcanize DB is a set of tools that make it easier for developers to write application-specific indexes and caches for dapps built on Ethereum.
> Tool for extracting and indexing blockchain data on PG-IPFS
## Table of Contents ## Table of Contents
1. [Background](#background) 1. [Background](#background)
1. [Architecture](#architecture)
1. [Install](#install) 1. [Install](#install)
1. [Usage](#usage) 1. [Usage](#usage)
1. [Contributing](#contributing) 1. [Contributing](#contributing)
1. [License](#license) 1. [License](#license)
## Background ## Background
The same data structures and encodings that make Ethereum an effective and trust-less distributed virtual machine ipfs-blockchain-watcher is a collection of interfaces that are used to extract, process, and store in Postgres-IPFS
complicate data accessibility and usability for dApp developers. VulcanizeDB improves Ethereum data accessibility by all chain data. The raw data indexed by ipfs-blockchain-watcher serves as the basis for more specific watchers and applications.
providing a suite of tools to ease the extraction and transformation of data into a more useful state, including
allowing for exposing aggregate data from a suite of smart contracts.
VulanizeDB includes processes that sync, transform and expose data. Syncing involves Currently the service supports complete processing of all Bitcoin and Ethereum data.
querying an Ethereum node and then persisting core data into a Postgres database. Transforming focuses on using previously synced data to
query for and transform log event and storage data for specifically configured smart contract addresses. Exposing data is a matter of getting
data from VulcanizeDB's underlying Postgres database and making it accessible.
![VulcanizeDB Overview Diagram](documentation/diagrams/vdb-overview.png) ## Architecture
More details on the design of ipfs-blockchain-watcher can be found in [here](./documentation/architecture.md)
## Install ## Install
1. [Postgres](#postgres)
1. [Goose](#goose)
1. [IPFS](#ipfs)
1. [Blockchain](#blockchain)
1. [Watcher](#watcher)
1. [Dependencies](#dependencies) ### Postgres
1. [Building the project](#building-the-project) 1. [Install Postgres](https://wiki.postgresql.org/wiki/Detailed_installation_guides)
1. [Setting up the database](#setting-up-the-database)
1. [Configuring a synced Ethereum node](#configuring-a-synced-ethereum-node)
### Dependencies
- Go 1.12+
- Postgres 11.2
- Ethereum Node
- [Go Ethereum](https://ethereum.github.io/go-ethereum/downloads/) (1.8.23+)
- [Parity 1.8.11+](https://github.com/paritytech/parity/releases)
### Building the project
Download the codebase to your local `GOPATH` via:
`go get github.com/vulcanize/ipfs-chain-watcher`
Move to the project directory:
`cd $GOPATH/src/github.com/vulcanize/ipfs-chain-watcher`
Be sure you have enabled Go Modules (`export GO111MODULE=on`), and build the executable with:
`make build`
If you need to use a different dependency than what is currently defined in `go.mod`, it may helpful to look into [the replace directive](https://github.com/golang/go/wiki/Modules#when-should-i-use-the-replace-directive).
This instruction enables you to point at a fork or the local filesystem for dependency resolution.
If you are running into issues at this stage, ensure that `GOPATH` is defined in your shell.
If necessary, `GOPATH` can be set in `~/.bashrc` or `~/.bash_profile`, depending upon your system.
It can be additionally helpful to add `$GOPATH/bin` to your shell's `$PATH`.
### Setting up the database
1. Install Postgres
1. Create a superuser for yourself and make sure `psql --list` works without prompting for a password. 1. Create a superuser for yourself and make sure `psql --list` works without prompting for a password.
1. `createdb vulcanize_public` 1. `createdb vulcanize_public`
1. `cd $GOPATH/src/github.com/vulcanize/ipfs-chain-watcher` 1. `cd $GOPATH/src/github.com/vulcanize/ipfs-blockchain-watcher`
1. Run the migrations: `make migrate HOST_NAME=localhost NAME=vulcanize_public PORT=5432` 1. Run the migrations: `make migrate HOST_NAME=localhost NAME=vulcanize_public PORT=5432`
- There is an optional var `USER=username` if the database user is not the default user `postgres` - There are optional vars `USER=username` and `PASS=password` if the database user is not the default user `postgres` and/or a password is present
- To rollback a single step: `make rollback NAME=vulcanize_public` - To rollback a single step: `make rollback NAME=vulcanize_public`
- To rollback to a certain migration: `make rollback_to MIGRATION=n NAME=vulcanize_public` - To rollback to a certain migration: `make rollback_to MIGRATION=n NAME=vulcanize_public`
- To see status of migrations: `make migration_status NAME=vulcanize_public` - To see status of migrations: `make migration_status NAME=vulcanize_public`
@ -79,76 +46,207 @@ localhost. To allow access on Ubuntu, set localhost connections via hostname, ip
(It should be noted that trusted auth should only be enabled on systems without sensitive data in them: development and local test databases) (It should be noted that trusted auth should only be enabled on systems without sensitive data in them: development and local test databases)
### Configuring a synced Ethereum node ### Goose
- To use a local Ethereum node, copy `environments/public.toml.example` to We use [goose](https://github.com/pressly/goose) as our migration management tool. While it is not necessary to use `goose` for manual setup, it
`environments/public.toml` and update the `ipcPath` and `levelDbPath`. is required for running the automated tests.
- `ipcPath` should match the local node's IPC filepath:
- For Geth:
- The IPC file is called `geth.ipc`.
- The geth IPC file path is printed to the console when you start geth.
- The default location is:
- Mac: `<full home path>/Library/Ethereum/geth.ipc`
- Linux: `<full home path>/ethereum/geth.ipc`
- Note: the geth.ipc file may not exist until you've started the geth process
- For Parity: ### IPFS
- The IPC file is called `jsonrpc.ipc`. We use IPFS to store IPLD objects for each type of data we extract from on chain.
- The default location is:
- Mac: `<full home path>/Library/Application\ Support/io.parity.ethereum/`
- Linux: `<full home path>/local/share/io.parity.ethereum/`
- `levelDbPath` should match Geth's chaindata directory path. To start, download and install [IPFS](https://github.com/vulcanize/go-ipfs):
- The geth LevelDB chaindata path is printed to the console when you start geth.
- The default location is:
- Mac: `<full home path>/Library/Ethereum/geth/chaindata`
- Linux: `<full home path>/ethereum/geth/chaindata`
- `levelDbPath` is irrelevant (and `coldImport` is currently unavailable) if only running parity.
`go get github.com/ipfs/go-ipfs`
`cd $GOPATH/src/github.com/ipfs/go-ipfs`
`make install`
If we want to use Postgres as our backing datastore, we need to use the vulcanize fork of go-ipfs.
Start by adding the fork and switching over to it:
`git remote add vulcanize https://github.com/vulcanize/go-ipfs.git`
`git fetch vulcanize`
`git checkout -b postgres_update vulcanize/postgres_update`
Now install this fork of ipfs, first be sure to remove any previous installation:
`make install`
Check that is installed properly by running:
`ipfs`
You should see the CLI info/help output.
And now we initialize with the `postgresds` profile.
If ipfs was previously initialized we will need to remove the old profile first.
We also need to provide env variables for the postgres connection:
We can either set these manually, e.g.
```bash
export IPFS_PGHOST=
export IPFS_PGUSER=
export IPFS_PGDATABASE=
export IPFS_PGPORT=
export IPFS_PGPASSWORD=
```
And then run the ipfs command:
`ipfs init --profile=postgresds`
Or we can use the pre-made script at `GOPATH/src/github.com/ipfs/go-ipfs/misc/utility/ipfs_postgres.sh`
which has usage:
`./ipfs_postgres.sh <IPFS_PGHOST> <IPFS_PGPORT> <IPFS_PGUSER> <IPFS_PGDATABASE>"`
and will ask us to enter the password, avoiding storing it to an ENV variable.
Once we have initialized ipfs, that is all we need to do with it- we do not need to run a daemon during the subsequent processes.
### Blockchain
This section describes how to setup an Ethereum or Bitcoin node to serve as a data source for ipfs-blockchain-watcher
#### Ethereum
For Ethereum, we currently *require* [a special fork of go-ethereum](https://github.com/vulcanize/go-ethereum/tree/statediff_at_anyblock-1.9.11). This can be setup as follows.
Skip this steps if you already have access to a node that displays the statediffing endpoints.
Begin by downloading geth and switching to the vulcanize/rpc_statediffing branch:
`go get github.com/ethereum/go-ethereum`
`cd $GOPATH/src/github.com/ethereum/go-ethereum`
`git remote add vulcanize https://github.com/vulcanize/go-ethereum.git`
`git fetch vulcanize`
`git checkout -b statediffing vulcanize/statediff_at_anyblock-1.9.11`
Now, install this fork of geth (make sure any old versions have been uninstalled/binaries removed first):
`make geth`
And run the output binary with statediffing turned on:
`cd $GOPATH/src/github.com/ethereum/go-ethereum/build/bin`
`./geth --statediff --statediff.streamblock --ws --syncmode=full`
Note: if you wish to access historical data (perform `backFill`) then the node will need to operate as an archival node (`--gcmode=archive`)
Note: other CLI options- statediff specific ones included- can be explored with `./geth help`
The output from geth should mention that it is `Starting statediff service` and block synchronization should begin shortly thereafter.
Note that until it receives a subscriber, the statediffing process does nothing but wait for one. Once a subscription is received, this
will be indicated in the output and node will begin processing and sending statediffs.
Also in the output will be the endpoints that we will use to interface with the node.
The default ws url is "127.0.0.1:8546" and the default http url is "127.0.0.1:8545".
These values will be used as the `ethereum.wsPath` and `ethereum.httpPath` in the config, respectively.
#### Bitcoin
For Bitcoin, ipfs-blockchain-watcher is able to operate entirely through the universally exposed JSON-RPC interfaces.
This means we can use any of the standard full nodes (e.g. bitcoind, btcd) as our data source.
Point at a remote node or set one up locally using the instructions for [bitcoind](https://github.com/bitcoin/bitcoin) and [btcd](https://github.com/btcsuite/btcd).
The default http url is "127.0.0.1:8332". We will use the http endpoint as both the `bitcoin.wsPath` and `bitcoin.httpPath`
(bitcoind does not support websocket endpoints, we are currently using a "subscription" wrapper around the http endpoints)
### Watcher
Finally, we can setup the watcher process itself.
Start by downloading vulcanizedb and moving into the repo:
`go get github.com/vulcanize/ipfs-chain-watcher`
`cd $GOPATH/src/github.com/vulcanize/ipfs-chain-watcher`
Then, build the binary:
`make build`
## Usage ## Usage
As mentioned above, VulcanizeDB's processes can be split into three categories: syncing, transforming and exposing data. After building the binary, run as
### Data syncing `./ipfs-blockchain-watcher watch --config=<config_file.toml`
To provide data for transformations, raw Ethereum data must first be synced into VulcanizeDB.
This is accomplished through the use of the `headerSync` command.
These commands are described in detail [here](documentation/data-syncing.md).
### Data transformation ### Configuration
Data transformation uses the raw data that has been synced into Postgres to filter out and apply transformations to
specific data of interest. Since there are different types of data that may be useful for observing smart contracts, it
follows that there are different ways to transform this data. We've started by categorizing this into Generic and
Custom transformers:
- Generic Contract Transformer: Generic contract transformation can be done using a built-in command, Below is the set of universal config parameters for the ipfs-blockchain-watcher command, in .toml form, with the respective environmental variables commented to the side.
`contractWatcher`, which transforms contract events provided the contract's ABI is available. It also This set of parameters needs to be set no matter the chain type.
provides some state variable coverage by automating polling of public methods, with some restrictions.
`contractWatcher` is described further [here](documentation/generic-transformer.md).
- Custom Transformers: In many cases custom transformers will need to be written to provide ```toml
more comprehensive coverage of contract data. In this case we have provided the `compose`, `execute`, and [database]
`composeAndExecute` commands for running custom transformers from external repositories. Documentation on how to write, name = "vulcanize_public" # $DATABASE_NAME
build and run custom transformers as Go plugins can be found hostname = "localhost" # $DATABASE_HOSTNAME
[here](documentation/custom-transformers.md). port = 5432 # $DATABASE_PORT
user = "vdbm" # $DATABASE_USER
password = "" # $DATABASE_PASSWORD
[ipfs]
path = "~/.ipfs" # $IPFS_PATH
mode = "postgres" # $IPFS_MODE
[superNode]
chain = "bitcoin" # $SUPERNODE_CHAIN
server = true # $SUPERNODE_SERVER
ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH
wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH
httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH
sync = true # $SUPERNODE_SYNC
workers = 1 # $SUPERNODE_WORKERS
backFill = true # $SUPERNODE_BACKFILL
frequency = 45 # $SUPERNODE_FREQUENCY
batchSize = 1 # $SUPERNODE_BATCH_SIZE
batchNumber = 50 # $SUPERNODE_BATCH_NUMBER
timeout = 300 # $HTTP_TIMEOUT
validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL
```
Additional parameters need to be set depending on the specific chain.
For Bitcoin:
```toml
[bitcoin]
wsPath = "127.0.0.1:8332" # $BTC_WS_PATH
httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH
pass = "password" # $BTC_NODE_PASSWORD
user = "username" # $BTC_NODE_USER
nodeID = "ocd0" # $BTC_NODE_ID
clientName = "Omnicore" # $BTC_CLIENT_NAME
genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK
networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID
```
For Ethereum:
```toml
[ethereum]
wsPath = "127.0.0.1:8546" # $ETH_WS_PATH
httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH
nodeID = "arch1" # $ETH_NODE_ID
clientName = "Geth" # $ETH_CLIENT_NAME
genesisBlock = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3" # $ETH_GENESIS_BLOCK
networkID = "1" # $ETH_NETWORK_ID
```
### Exposing the data ### Exposing the data
[Postgraphile](https://www.graphile.org/postgraphile/) is used to expose GraphQL endpoints for our database schemas, this is described in detail [here](documentation/postgraphile.md). We can expose a number of different APIs for remote access to ipfs-blockchain-watcher data, these are dicussed in more detail [here](./documentation/apis.md)
### Tests
- Replace the empty `ipcPath` in the `environments/testing.toml` with a path to a full node's eth_jsonrpc endpoint (e.g. local geth node ipc path or infura url)
- Note: must be mainnet
- Note: integration tests require configuration with an archival node
- `make test` will run the unit tests and skip the integration tests
- `make integrationtest` will run just the integration tests
- `make test` and `make integrationtest` setup a clean `vulcanize_testing` db
### Testing
`make test` will run the unit tests
`make test` setups a clean `vulcanize_testing` db
## Contributing ## Contributing
Contributions are welcome! Contributions are welcome!
VulcanizeDB follows the [Contributor Covenant Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct). VulcanizeDB follows the [Contributor Covenant Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct).
For more information on contributing, please see [here](documentation/contributing.md).
## License ## License
[AGPL-3.0](LICENSE) © Vulcanize Inc [AGPL-3.0](LICENSE) © Vulcanize Inc

View File

@ -1,5 +1,5 @@
## VulcanizeDB Super Node APIs ## ipfs-blockchain-watcher APIs
The super node exposes a number of different APIs for remote access to the underlying DB. We can expose a number of different APIs for remote access to ipfs-blockchain-watcher data
### Table of Contents ### Table of Contents
@ -9,7 +9,7 @@ The super node exposes a number of different APIs for remote access to the under
### Postgraphile ### Postgraphile
The super node stores all processed data in Postgres using PG-IPFS, this includes all of the IPLD objects. ipfs-blockchain-watcher stores all processed data in Postgres using PG-IPFS, this includes all of the IPLD objects.
[Postgraphile](https://www.graphile.org/postgraphile/) can be used to expose GraphQL endpoints for the Postgres tables. [Postgraphile](https://www.graphile.org/postgraphile/) can be used to expose GraphQL endpoints for the Postgres tables.
e.g. e.g.
@ -22,15 +22,15 @@ All of their data can then be queried with standard [GraphQL](https://graphql.or
### RPC Subscription Interface ### RPC Subscription Interface
A direct, real-time subscription to the data being processed by the super node can be established over WS or IPC through the [Stream](../../pkg/super_node/api.go#L53) RPC method. A direct, real-time subscription to the data being processed by ipfs-blockchain-watcher can be established over WS or IPC through the [Stream](../pkg/watcher/api.go#L53) RPC method.
This method is not chain-specific and each chain-type supports it, it is accessed under the "vdb" namespace rather than a chain-specific namespace. An interface for This method is not chain-specific and each chain-type supports it, it is accessed under the "vdb" namespace rather than a chain-specific namespace. An interface for
subscribing to this endpoint is provided [here](../../libraries/shared/streamer/super_node_streamer.go). subscribing to this endpoint is provided [here](../pkg/streamer/super_node_streamer.go).
When subscribing to this endpoint, the subscriber provides a set of RLP-encoded subscription parameters. These parameters will be chain-specific, and are used When subscribing to this endpoint, the subscriber provides a set of RLP-encoded subscription parameters. These parameters will be chain-specific, and are used
by the super node to filter and return a requested subset of chain data to the subscriber. (e.g. [BTC](../../pkg/super_node/btc/subscription_config.go), [ETH](../../pkg/super_node/eth/subscription_config.go)). by ipfs-blockchain-watcher to filter and return a requested subset of chain data to the subscriber. (e.g. [BTC](../pkg/btc/subscription_config.go), [ETH](../../pkg/eth/subscription_config.go)).
#### Ethereum RPC Subscription #### Ethereum RPC Subscription
An example of how to subscribe to a real-time Ethereum data feed from the super node using the `Stream` RPC method is provided below An example of how to subscribe to a real-time Ethereum data feed from ipfs-blockchain-watcher using the `Stream` RPC method is provided below
```go ```go
package main package main
@ -40,10 +40,10 @@ An example of how to subscribe to a real-time Ethereum data feed from the super
"github.com/ethereum/go-ethereum/rpc" "github.com/ethereum/go-ethereum/rpc"
"github.com/spf13/viper" "github.com/spf13/viper"
"github.com/vulcanize/ipfs-chain-watcher/libraries/shared/streamer" "github.com/vulcanize/ipfs-chain-watcher/pkg/client"
"github.com/vulcanize/ipfs-chain-watcher/pkg/eth/client" "github.com/vulcanize/ipfs-chain-watcher/pkg/eth"
"github.com/vulcanize/ipfs-chain-watcher/pkg/super_node" "github.com/vulcanize/ipfs-chain-watcher/pkg/streamer"
"github.com/vulcanize/ipfs-chain-watcher/pkg/super_node/eth" "github.com/vulcanize/ipfs-chain-watcher/pkg/watcher"
) )
config, _ := eth.NewEthSubscriptionConfig() config, _ := eth.NewEthSubscriptionConfig()
@ -52,7 +52,7 @@ An example of how to subscribe to a real-time Ethereum data feed from the super
rawRPCClient, _ := rpc.Dial(vulcPath) rawRPCClient, _ := rpc.Dial(vulcPath)
rpcClient := client.NewRPCClient(rawRPCClient, vulcPath) rpcClient := client.NewRPCClient(rawRPCClient, vulcPath)
stream := streamer.NewSuperNodeStreamer(rpcClient) stream := streamer.NewSuperNodeStreamer(rpcClient)
payloadChan := make(chan super_node.SubscriptionPayload, 20000) payloadChan := make(chan watcher.SubscriptionPayload, 20000)
subscription, _ := stream.Stream(payloadChan, rlpConfig) subscription, _ := stream.Stream(payloadChan, rlpConfig)
for { for {
select { select {
@ -103,10 +103,10 @@ These configuration parameters are broken down as follows:
`ethSubscription.wsPath` is used to define the SuperNode ws url OR ipc endpoint we subscribe to `ethSubscription.wsPath` is used to define the SuperNode ws url OR ipc endpoint we subscribe to
`ethSubscription.historicalData` specifies whether or not the super node should look up historical data in its cache and `ethSubscription.historicalData` specifies whether or not ipfs-blockchain-watcher should look up historical data in its cache and
send that to the subscriber, if this is set to `false` then the super node only streams newly synced/incoming data send that to the subscriber, if this is set to `false` then we only streams newly synced/incoming data
`ethSubscription.historicalDataOnly` will tell the super node to only send historical data with the specified range and `ethSubscription.historicalDataOnly` will tell ipfs-blockchain-watcher to only send historical data with the specified range and
not stream forward syncing data not stream forward syncing data
`ethSubscription.startingBlock` is the starting block number for the range we want to receive data in `ethSubscription.startingBlock` is the starting block number for the range we want to receive data in
@ -116,43 +116,43 @@ setting to 0 means there is no end/we will continue streaming indefinitely.
`ethSubscription.headerFilter` has two sub-options: `off` and `uncles`. `ethSubscription.headerFilter` has two sub-options: `off` and `uncles`.
- Setting `off` to true tells the super node to not send any headers to the subscriber - Setting `off` to true tells ipfs-blockchain-watcher to not send any headers to the subscriber
- setting `uncles` to true tells the super node to send uncles in addition to normal headers. - setting `uncles` to true tells ipfs-blockchain-watcher to send uncles in addition to normal headers.
`ethSubscription.txFilter` has three sub-options: `off`, `src`, and `dst`. `ethSubscription.txFilter` has three sub-options: `off`, `src`, and `dst`.
- Setting `off` to true tells the super node to not send any transactions to the subscriber - Setting `off` to true tells ipfs-blockchain-watcher to not send any transactions to the subscriber
- `src` and `dst` are string arrays which can be filled with ETH addresses we want to filter transactions for, - `src` and `dst` are string arrays which can be filled with ETH addresses we want to filter transactions for,
if they have any addresses then the super node will only send transactions that were sent or received by the addresses contained if they have any addresses then ipfs-blockchain-watcher will only send transactions that were sent or received by the addresses contained
in `src` and `dst`, respectively. in `src` and `dst`, respectively.
`ethSubscription.receiptFilter` has four sub-options: `off`, `topics`, `contracts` and `matchTxs`. `ethSubscription.receiptFilter` has four sub-options: `off`, `topics`, `contracts` and `matchTxs`.
- Setting `off` to true tells the super node to not send any receipts to the subscriber - Setting `off` to true tells ipfs-blockchain-watcher to not send any receipts to the subscriber
- `topic0s` is a string array which can be filled with event topics we want to filter for, - `topic0s` is a string array which can be filled with event topics we want to filter for,
if it has any topics then the super node will only send receipts that contain logs which have that topic0. if it has any topics then ipfs-blockchain-watcher will only send receipts that contain logs which have that topic0.
- `contracts` is a string array which can be filled with contract addresses we want to filter for, if it contains any contract addresses the super node will - `contracts` is a string array which can be filled with contract addresses we want to filter for, if it contains any contract addresses the super node will
only send receipts that correspond to one of those contracts. only send receipts that correspond to one of those contracts.
- `matchTrxs` is a bool which when set to true any receipts that correspond to filtered for transactions will be sent by the super node, regardless of whether or not the receipt satisfies the `topics` or `contracts` filters. - `matchTrxs` is a bool which when set to true any receipts that correspond to filtered for transactions will be sent by the super node, regardless of whether or not the receipt satisfies the `topics` or `contracts` filters.
`ethSubscription.stateFilter` has three sub-options: `off`, `addresses`, and `intermediateNodes`. `ethSubscription.stateFilter` has three sub-options: `off`, `addresses`, and `intermediateNodes`.
- Setting `off` to true tells the super node to not send any state data to the subscriber - Setting `off` to true tells ipfs-blockchain-watcher to not send any state data to the subscriber
- `addresses` is a string array which can be filled with ETH addresses we want to filter state for, - `addresses` is a string array which can be filled with ETH addresses we want to filter state for,
if it has any addresses then the super node will only send state leafs (accounts) corresponding to those account addresses. if it has any addresses then ipfs-blockchain-watcher will only send state leafs (accounts) corresponding to those account addresses.
- By default the super node only sends along state leafs, if we want to receive branch and extension nodes as well `intermediateNodes` can be set to `true`. - By default ipfs-blockchain-watcher only sends along state leafs, if we want to receive branch and extension nodes as well `intermediateNodes` can be set to `true`.
`ethSubscription.storageFilter` has four sub-options: `off`, `addresses`, `storageKeys`, and `intermediateNodes`. `ethSubscription.storageFilter` has four sub-options: `off`, `addresses`, `storageKeys`, and `intermediateNodes`.
- Setting `off` to true tells the super node to not send any storage data to the subscriber - Setting `off` to true tells ipfs-blockchain-watcher to not send any storage data to the subscriber
- `addresses` is a string array which can be filled with ETH addresses we want to filter storage for, - `addresses` is a string array which can be filled with ETH addresses we want to filter storage for,
if it has any addresses then the super node will only send storage nodes from the storage tries at those state addresses. if it has any addresses then ipfs-blockchain-watcher will only send storage nodes from the storage tries at those state addresses.
- `storageKeys` is another string array that can be filled with storage keys we want to filter storage data for. It is important to note that the storage keys need to be the actual keccak256 hashes, whereas - `storageKeys` is another string array that can be filled with storage keys we want to filter storage data for. It is important to note that the storage keys need to be the actual keccak256 hashes, whereas
the addresses in the `addresses` fields are pre-hashed ETH addresses. the addresses in the `addresses` fields are pre-hashed ETH addresses.
- By default the super node only sends along storage leafs, if we want to receive branch and extension nodes as well `intermediateNodes` can be set to `true`. - By default ipfs-blockchain-watcher only sends along storage leafs, if we want to receive branch and extension nodes as well `intermediateNodes` can be set to `true`.
### Bitcoin RPC Subscription: ### Bitcoin RPC Subscription:
An example of how to subscribe to a real-time Bitcoin data feed from the super node using the `Stream` RPC method is provided below An example of how to subscribe to a real-time Bitcoin data feed from ipfs-blockchain-watcher using the `Stream` RPC method is provided below
```go ```go
package main package main
@ -212,10 +212,10 @@ These configuration parameters are broken down as follows:
`btcSubscription.wsPath` is used to define the SuperNode ws url OR ipc endpoint we subscribe to `btcSubscription.wsPath` is used to define the SuperNode ws url OR ipc endpoint we subscribe to
`btcSubscription.historicalData` specifies whether or not the super node should look up historical data in its cache and `btcSubscription.historicalData` specifies whether or not ipfs-blockchain-watcher should look up historical data in its cache and
send that to the subscriber, if this is set to `false` then the super node only streams newly synced/incoming data send that to the subscriber, if this is set to `false` then ipfs-blockchain-watcher only streams newly synced/incoming data
`btcSubscription.historicalDataOnly` will tell the super node to only send historical data with the specified range and `btcSubscription.historicalDataOnly` will tell ipfs-blockchain-watcher to only send historical data with the specified range and
not stream forward syncing data not stream forward syncing data
`btcSubscription.startingBlock` is the starting block number for the range we want to receive data in `btcSubscription.startingBlock` is the starting block number for the range we want to receive data in
@ -225,20 +225,20 @@ setting to 0 means there is no end/we will continue streaming indefinitely.
`btcSubscription.headerFilter` has one sub-option: `off`. `btcSubscription.headerFilter` has one sub-option: `off`.
- Setting `off` to true tells the super node to - Setting `off` to true tells ipfs-blockchain-watcher to
not send any headers to the subscriber. not send any headers to the subscriber.
- Additional header-filtering options will be added in the future. - Additional header-filtering options will be added in the future.
`btcSubscription.txFilter` has seven sub-options: `off`, `segwit`, `witnessHashes`, `indexes`, `pkScriptClass`, `multiSig`, and `addresses`. `btcSubscription.txFilter` has seven sub-options: `off`, `segwit`, `witnessHashes`, `indexes`, `pkScriptClass`, `multiSig`, and `addresses`.
- Setting `off` to true tells the super node to not send any transactions to the subscriber. - Setting `off` to true tells ipfs-blockchain-watcher to not send any transactions to the subscriber.
- Setting `segwit` to true tells the super node to only send segwit transactions. - Setting `segwit` to true tells ipfs-blockchain-watcher to only send segwit transactions.
- `witnessHashes` is a string array that can be filled with witness hash string; if it contains any hashes the super node will only send transactions that contain one of those hashes. - `witnessHashes` is a string array that can be filled with witness hash string; if it contains any hashes ipfs-blockchain-watcher will only send transactions that contain one of those hashes.
- `indexes` is an int64 array that can be filled with tx index numbers; if it contains any integers the super node will only send transactions at those indexes (e.g. `[0]` will send only coinbase transactions) - `indexes` is an int64 array that can be filled with tx index numbers; if it contains any integers ipfs-blockchain-watcher will only send transactions at those indexes (e.g. `[0]` will send only coinbase transactions)
- `pkScriptClass` is an uint8 array that can be filled with pk script class numbers; if it contains any integers the super node will only send transactions that have at least one tx output with one of the specified pkscript classes; - `pkScriptClass` is an uint8 array that can be filled with pk script class numbers; if it contains any integers ipfs-blockchain-watcher will only send transactions that have at least one tx output with one of the specified pkscript classes;
possible class types are 0 through 8 as defined [here](https://github.com/btcsuite/btcd/blob/master/txscript/standard.go#L52). possible class types are 0 through 8 as defined [here](https://github.com/btcsuite/btcd/blob/master/txscript/standard.go#L52).
- Setting `multisig` to true tells the super node to send only multi-sig transactions- to send only transaction that have at least one tx output that requires more than one signature to spend. - Setting `multisig` to true tells ipfs-blockchain-watcher to send only multi-sig transactions- to send only transaction that have at least one tx output that requires more than one signature to spend.
- `addresses` is a string array that can be filled with btc address strings; if it contains any addresses the super node will only send transactions that have at least one tx output with at least one of the provided addresses. - `addresses` is a string array that can be filled with btc address strings; if it contains any addresses ipfs-blockchain-watcher will only send transactions that have at least one tx output with at least one of the provided addresses.
### Native API Recapitulation: ### Native API Recapitulation:
@ -246,7 +246,7 @@ In addition to providing novel Postgraphile and RPC-Subscription endpoints, we a
standard chain APIs. This will allow direct compatibility with software that already makes use of the standard interfaces. standard chain APIs. This will allow direct compatibility with software that already makes use of the standard interfaces.
#### Ethereum JSON-RPC API #### Ethereum JSON-RPC API
The super node currently faithfully recapitulates portions of the Ethereum JSON-RPC api standard. ipfs-blockchain-watcher currently faithfully recapitulates portions of the Ethereum JSON-RPC api standard.
The currently supported endpoints include: The currently supported endpoints include:
`eth_blockNumber` `eth_blockNumber`

View File

@ -0,0 +1,136 @@
# ipfs-blockchain-watcher architecture
1. [Processes](#processes)
1. [Command](#command)
1. [Configuration](#config)
1. [Database](#database)
1. [APIs](#apis)
1. [Resync](#resync)
1. [IPFS Considerations](#ipfs-considerations)
## Processes
ipfs-blockchain-watcher is a [service](../pkg/super_node/service.go#L61) comprised of the following interfaces:
* [Payload Fetcher](../pkg/super_node/shared/interfaces.go#L29): Fetches raw chain data from a half-duplex endpoint (HTTP/IPC), used for historical data fetching. ([BTC](../../pkg/super_node/btc/payload_fetcher.go), [ETH](../../pkg/super_node/eth/payload_fetcher.go)).
* [Payload Streamer](../pkg/super_node/shared/interfaces.go#L24): Streams raw chain data from a full-duplex endpoint (WebSocket/IPC), used for syncing data at the head of the chain in real-time. ([BTC](../../pkg/super_node/btc/http_streamer.go), [ETH](../../pkg/super_node/eth/streamer.go)).
* [Payload Converter](../pkg/super_node/shared/interfaces.go#L34): Converters raw chain data to an intermediary form prepared for IPFS publishing. ([BTC](../../pkg/super_node/btc/converter.go), [ETH](../../pkg/super_node/eth/converter.go)).
* [IPLD Publisher](../pkg/super_node/shared/interfaces.go#L39): Publishes the converted data to IPFS, returning their CIDs and associated metadata for indexing. ([BTC](../../pkg/super_node/btc/publisher.go), [ETH](../../pkg/super_node/eth/publisher.go)).
* [CID Indexer](../pkg/super_node/shared/interfaces.go#L44): Indexes CIDs in Postgres with their associated metadata. This metadata is chain specific and selected based on utility. ([BTC](../../pkg/super_node/btc/indexer.go), [ETH](../../pkg/super_node/eth/indexer.go)).
* [CID Retriever](../pkg/super_node/shared/interfaces.go#L54): Retrieves CIDs from Postgres by searching against their associated metadata, is used to lookup data to serve API requests/subscriptions. ([BTC](../../pkg/super_node/btc/retriever.go), [ETH](../../pkg/super_node/eth/retriever.go)).
* [IPLD Fetcher](../pkg/super_node/shared/interfaces.go#L62): Fetches the IPLDs needed to service API requests/subscriptions from IPFS using retrieved CIDS; can route through a IPFS block-exchange to search for objects that are not directly available. ([BTC](../../pkg/super_node/btc/ipld_fetcher.go), [ETH](../../pkg/super_node/eth/ipld_fetcher.go))
* [Response Filterer](../pkg/super_node/shared/interfaces.go#L49): Filters converted data payloads served to API subscriptions; filters according to the subscriber provided parameters. ([BTC](../../pkg/super_node/btc/filterer.go), [ETH](../../pkg/super_node/eth/filterer.go)).
* [API](https://github.com/ethereum/go-ethereum/blob/master/rpc/types.go#L31): Expose RPC methods for clients to interface with the data. Chain-specific APIs should aim to recapitulate as much of the native API as possible. ([VDB](../../pkg/super_node/api.go), [ETH](../../pkg/super_node/eth/api.go)).
Appropriating the service for a new chain is done by creating underlying types to satisfy these interfaces for
the specifics of that chain.
The service uses these interfaces to operate in any combination of three modes: `sync`, `serve`, and `backfill`.
* Sync: Streams raw chain data at the head, converts and publishes it to IPFS, and indexes the resulting set of CIDs in Postgres with useful metadata.
* BackFill: Automatically searches for and detects gaps in the DB; fetches, converts, publishes, and indexes the data to fill these gaps.
* Serve: Opens up IPC, HTTP, and WebSocket servers on top of the ipfs-blockchain-watcher DB and any concurrent sync and/or backfill processes.
These three modes are all operated through a single vulcanizeDB command: `watch`
## Command
Usage: `./ipfs-blockchain-watcher watch --config={config.toml}`
Configuration can also be done through CLI options and/or environmental variables.
CLI options can be found using `./ipfs-blockchain-watcher watch --help`.
## Config
Below is the set of universal config parameters for the ipfs-blockchain-watcher command, in .toml form, with the respective environmental variables commented to the side.
This set of parameters needs to be set no matter the chain type.
```toml
[database]
name = "vulcanize_public" # $DATABASE_NAME
hostname = "localhost" # $DATABASE_HOSTNAME
port = 5432 # $DATABASE_PORT
user = "vdbm" # $DATABASE_USER
password = "" # $DATABASE_PASSWORD
[ipfs]
path = "~/.ipfs" # $IPFS_PATH
mode = "direct" # $IPFS_MODE
[superNode]
chain = "bitcoin" # $SUPERNODE_CHAIN
server = true # $SUPERNODE_SERVER
ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH
wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH
httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH
sync = true # $SUPERNODE_SYNC
workers = 1 # $SUPERNODE_WORKERS
backFill = true # $SUPERNODE_BACKFILL
frequency = 45 # $SUPERNODE_FREQUENCY
batchSize = 1 # $SUPERNODE_BATCH_SIZE
batchNumber = 50 # $SUPERNODE_BATCH_NUMBER
timeout = 300 # $HTTP_TIMEOUT
validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL
```
Additional parameters need to be set depending on the specific chain.
For Bitcoin:
```toml
[bitcoin]
wsPath = "127.0.0.1:8332" # $BTC_WS_PATH
httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH
pass = "password" # $BTC_NODE_PASSWORD
user = "username" # $BTC_NODE_USER
nodeID = "ocd0" # $BTC_NODE_ID
clientName = "Omnicore" # $BTC_CLIENT_NAME
genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK
networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID
```
For Ethereum:
```toml
[ethereum]
wsPath = "127.0.0.1:8546" # $ETH_WS_PATH
httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH
nodeID = "arch1" # $ETH_NODE_ID
clientName = "Geth" # $ETH_CLIENT_NAME
genesisBlock = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3" # $ETH_GENESIS_BLOCK
networkID = "1" # $ETH_NETWORK_ID
```
## Database
Currently, ipfs-blockchain-watcher persists all data to a single Postgres database. The migrations for this DB can be found [here](../../db/migrations).
Chain-specific data is populated under a chain-specific schema (e.g. `eth` and `btc`) while shared data- such as the IPFS blocks table- is populated under the `public` schema.
Subsequent watchers which act on the raw chain data should build and populate their own schemas or separate databases entirely.
In the future, we will be moving to a foreign table based architecture wherein a single db is used for shared data while each watcher uses
its own database and accesses and acts on the shared data through foreign tables. Isolating watchers to their own databases will prevent complications and
conflicts between watcher db migrations.
## APIs
ipfs-blockchain-watcher provides mutliple types of APIs by which to interface with its data.
More detailed information on the APIs can be found [here](apis.md).
## Resync
A separate command `resync` is available for directing the resyncing of data within specified ranges.
This is useful if we want to re-validate a range of data using a new source or clean out bad/deprecated data.
More detailed information on this command can be found [here](resync.md).
## IPFS Considerations
Currently the IPLD Publisher and Fetcher can either use internalized IPFS processes which interface with a local IPFS repository, or can interface
directly with the backing Postgres database.
Both these options circumvent the need to run a full IPFS daemon with a [go-ipld-eth](https://github.com/ipfs/go-ipld-eth) or [go-ipld-btc](https://github.com/ipld/go-ipld-btc) plugin.
The former approach can lead to issues with lock-contention on the IPFS repo if another IPFS process is configured and running at the same $IPFS_PATH, it also necessitates the need for
a locally configured IPFS repository. The later bypasses the need for a configured IPFS repository/$IPFS_PATH and allows all Postgres write operations at a given block height
to occur in a single transaction, the only disadvantage is that by avoiding moving through an IPFS node intermediary we lose the direct ability to reach out to the block
exchange for data we do not have locally.
Once go-ipld-eth and go-ipld-btc have been updated to work with a modern version of PG-IPFS, an additional option will be provided to direct
all publishing and fetching of IPLD objects through a remote IPFS daemon.

View File

@ -1,52 +0,0 @@
# Contribution guidelines
Contributions are welcome! Please open an Issues or Pull Request for any changes.
In addition to core contributions, developers are encouraged to build their own custom transformers which
can be run together with other custom transformers using the [composeAndExecute](../../staging/documentation/custom-transformers.md) command.
## Pull Requests
- `go fmt` is run as part of `make test` and `make integrationtest`, please make sure to check in the format changes.
- Ensure that new code is well tested, including integration testing if applicable.
- Make sure the build is passing.
- Update the README or any [documentation files](./) as necessary. If editing the Readme, please
conform to the
[standard-readme specification](https://github.com/RichardLitt/standard-readme).
- Once a Pull Request has received two approvals it can be merged in by a core developer.
Pull requests should be opened against the `staging` branch. Periodically, updates on `staging` will be ported over to `master` for tagged release.
## Creating a new migration file
1. `make new_migration NAME=add_columnA_to_table1`
- This will create a new timestamped migration file in `db/migrations`
1. Write the migration code in the created file, under the respective `goose` pragma
- Goose automatically runs each migration in a transaction; don't add `BEGIN` and `COMMIT` statements.
1. Core migrations should be committed in their `goose fix`ed form. To do this, run `make version_migrations` which
converts timestamped migrations to migrations versioned by an incremented integer.
## Diagrams
- Diagrams were created with [draw.io](https://www.draw.io).
- To update a diagram:
1. Go to [draw.io](https://www.draw.io).
1. Click on *File > Open from* and choose the location of the diagram you want to update.
1. Once open in draw.io, you may update it.
1. Export the diagram to this repository's directory and commit it.
## Generating the Changelog
We use [github-changelog-generator](https://github.com/github-changelog-generator/github-changelog-generator) to generate release Changelogs. To be consistent with previous Changelogs, the following flags should be passed to the command:
```
--user vulcanize
--project vulcanizedb
--token {YOUR_GITHUB_TOKEN}
--no-issues
--usernames-as-github-logins
--since-tag {PREVIOUS_RELEASE_TAG}
```
For more information on why your github token is needed, and how to generate it see [https://github
.com/github-changelog-generator/github-changelog-generator#github-token](https://github.com/github-changelog-generator/github-changelog-generator#github-token).
## Code of Conduct
VulcanizeDB follows the [Contributor Covenant Code of Conduct](https://www.contributor-covenant.org/version/1/4/code-of-conduct).

View File

@ -1,212 +0,0 @@
# Custom Transformers
When the capabilities of the generic `contractWatcher` are not sufficient, custom transformers tailored to a specific
purpose can be leveraged.
Individual custom transformers can be composed together from any number of external repositories and executed as a
single process using the `compose` and `execute` commands or the `composeAndExecute` command. This is accomplished by
generating a Go plugin which allows the `vulcanizedb` binary to link to the external transformers, so long as they
abide by one of the standard [interfaces](../staging/libraries/shared/transformer).
## Writing custom transformers
For help with writing different types of custom transformers please see below:
Storage Transformers: transform data derived from contract storage tries
* [Guide](../../staging/libraries/shared/factories/storage/README.md)
* [Example](../../staging/libraries/shared/factories/storage/EXAMPLE.md)
Event Transformers: transform data derived from Ethereum log events
* [Guide](../../staging/libraries/shared/factories/event/README.md)
* [Example 1](https://github.com/vulcanize/ens_transformers/tree/master/transformers/registar)
* [Example 2](https://github.com/vulcanize/ens_transformers/tree/master/transformers/registry)
* [Example 3](https://github.com/vulcanize/ens_transformers/tree/master/transformers/resolver)
Contract Transformers: transform data derived from Ethereum log events and use it to poll public contract methods
* [Example 1](https://github.com/vulcanize/account_transformers)
* [Example 2](https://github.com/vulcanize/ens_transformers/tree/master/transformers/domain_records)
## Preparing custom transformers to work as part of a plugin
To plug in an external transformer we need to:
1. Create a package that exports a variable `TransformerInitializer`, `StorageTransformerInitializer`, or `ContractTransformerInitializer` that are of type [TransformerInitializer](../staging/libraries/shared/transformer/event_transformer.go#L33)
or [StorageTransformerInitializer](../../staging/libraries/shared/transformer/storage_transformer.go#L31),
or [ContractTransformerInitializer](../../staging/libraries/shared/transformer/contract_transformer.go#L31), respectively
2. Design the transformers to work in the context of their [event](../staging/libraries/shared/watcher/event_watcher.go#L83),
[storage](../../staging/libraries/shared/watcher/storage_watcher.go#L53),
or [contract](../../staging/libraries/shared/watcher/contract_watcher.go#L68) watcher execution modes
3. Create db migrations to run against vulcanizeDB so that we can store the transformer output
* Do not `goose fix` the transformer migrations, this is to ensure they are always ran after the core vulcanizedb migrations which are kept in their fixed form
* Specify migration locations for each transformer in the config with the `exporter.transformer.migrations` fields
* If the base vDB migrations occupy this path as well, they need to be in their `goose fix`ed form
as they are [here](../../staging/db/migrations)
To update a plugin repository with changes to the core vulcanizedb repository, use your dependency manager to install the desired version of vDB.
## Building and Running Custom Transformers
### Commands
* The `compose`, `execute`, `composeAndExecute` commands require Go 1.11+ and use [Go plugins](https://golang
.org/pkg/plugin/) which only work on Unix-based systems.
* There is an ongoing [conflict](https://github.com/golang/go/issues/20481) between Go plugins and the use of vendored
dependencies which imposes certain limitations on how the plugins are built.
* Separate `compose` and `execute` commands allow pre-building and linking to the pre-built .so file. So, if
these are run independently, instead of using `composeAndExecute`, a couple of things need to be considered:
* It is necessary that the .so file was built with the same exact dependencies that are present in the execution
environment, i.e. we need to `compose` and `execute` the plugin .so file with the same exact version of vulcanizeDB.
* The plugin migrations are run during the plugin's composition. As such, if `execute` is used to run a prebuilt .so
in a different environment than the one it was composed in, then the database structure will need to be loaded
into the environment's Postgres database. This can either be done by manually loading the plugin's schema into
Postgres, or by manually running the plugin's migrations.
* The `compose` and `composeAndExecute` commands assume you are in the vulcanizdb directory located at your system's
`$GOPATH`, and that the plugin dependencies are present at their `$GOPATH` directories.
* The `execute` command does not require the plugin transformer dependencies be located in their `$GOPATH` directories,
instead it expects a .so file (of the name specified in the config file) to be in
`$GOPATH/src/github.com/vulcanize/ipfs-chain-watcher/plugins/` and, as noted above, also expects the plugin db migrations to
have already been ran against the database.
* Usage:
* compose: `./vulcanizedb compose --config=environments/config_name.toml`
* execute: `./vulcanizedb execute --config=environments/config_name.toml`
* composeAndExecute: `./vulcanizedb composeAndExecute --config=environments/config_name.toml`
### Flags
The `execute` and `composeAndExecute` commands can be passed optional flags to specify the operation of the watchers:
- `--recheck-headers`/`-r` - specifies whether to re-check headers for events after the header has already been queried for watched logs.
Can be useful for redundancy if you suspect that your node is not always returning all desired logs on every query.
Argument is expected to be a boolean: e.g. `-r=true`.
Defaults to `false`.
- `query-recheck-interval`/`-q` - specifies interval for re-checking storage diffs that haven been queued for later processing
(by default, the storage watched queues storage diffs if transformer execution fails, on the assumption that subsequent data derived from the event transformers may enable us to decode storage keys that we don't recognize right now).
Argument is expected to be a duration (integer measured in nanoseconds): e.g. `-q=10m30s` (for 10 minute, 30 second intervals).
Defaults to `5m` (5 minutes).
### Configuration
A .toml config file is specified when executing the commands.
The config provides information for composing a set of transformers from external repositories:
```toml
[database]
name = "vulcanize_public"
hostname = "localhost"
user = "vulcanize"
password = "vulcanize"
port = 5432
[client]
ipcPath = "/Users/user/Library/Ethereum/geth.ipc"
wsPath = "ws://127.0.0.1:8546"
[exporter]
home = "github.com/vulcanize/ipfs-chain-watcher"
name = "exampleTransformerExporter"
save = false
transformerNames = [
"transformer1",
"transformer2",
"transformer3",
"transformer4",
]
[exporter.transformer1]
path = "path/to/transformer1"
type = "eth_event"
repository = "github.com/account/repo"
migrations = "db/migrations"
rank = "0"
[exporter.transformer2]
path = "path/to/transformer2"
type = "eth_contract"
repository = "github.com/account/repo"
migrations = "db/migrations"
rank = "0"
[exporter.transformer3]
path = "path/to/transformer3"
type = "eth_event"
repository = "github.com/account/repo"
migrations = "db/migrations"
rank = "0"
[exporter.transformer4]
path = "path/to/transformer4"
type = "eth_storage"
repository = "github.com/account2/repo2"
migrations = "to/db/migrations"
rank = "1"
```
- `home` is the name of the package you are building the plugin for, in most cases this is github.com/vulcanize/ipfs-chain-watcher
- `name` is the name used for the plugin files (.so and .go)
- `save` indicates whether or not the user wants to save the .go file instead of removing it after .so compilation. Sometimes useful for debugging/trouble-shooting purposes.
- `transformerNames` is the list of the names of the transformers we are composing together, so we know how to access their submaps in the exporter map
- `exporter.<transformerName>`s are the sub-mappings containing config info for the transformers
- `repository` is the path for the repository which contains the transformer and its `TransformerInitializer`
- `path` is the relative path from `repository` to the transformer's `TransformerInitializer` directory (initializer package).
- Transformer repositories need to be cloned into the user's $GOPATH (`go get`)
- `type` is the type of the transformer; indicating which type of watcher it works with (for now, there are only two options: `eth_event` and `eth_storage`)
- `eth_storage` indicates the transformer works with the [storage watcher](../../staging/libraries/shared/watcher/storage_watcher.go)
that fetches state and storage diffs from an ETH node (instead of, for example, from IPFS)
- `eth_event` indicates the transformer works with the [event watcher](../../staging/libraries/shared/watcher/event_watcher.go)
that fetches event logs from an ETH node
- `eth_contract` indicates the transformer works with the [contract watcher](../staging/libraries/shared/watcher/contract_watcher.go)
that is made to work with [contract_watcher pkg](../../staging/pkg/contract_watcher)
based transformers which work with either a header or full sync vDB to watch events and poll public methods ([example1](https://github.com/vulcanize/account_transformers/tree/master/transformers/account/light), [example2](https://github.com/vulcanize/ens_transformers/tree/working/transformers/domain_records))
- `migrations` is the relative path from `repository` to the db migrations directory for the transformer
- `rank` determines the order that migrations are ran, with lower ranked migrations running first
- this is to help isolate any potential conflicts between transformer migrations
- start at "0"
- use strings
- don't leave gaps
- transformers with identical migrations/migration paths should share the same rank
- Note: If any of the imported transformers need additional config variables those need to be included as well
- Note: If the storage transformers are processing storage diffs from geth, we need to configure the websocket endpoint `client.wsPath` for them
This information is used to write and build a Go plugin which exports the configured transformers.
These transformers are loaded onto their specified watchers and executed.
Transformers of different types can be run together in the same command using a single config file or in separate instances using different config files
The general structure of a plugin .go file, and what we would see built with the above config is shown below
```go
package main
import (
interface1 "github.com/vulcanize/ipfs-chain-watcher/libraries/shared/transformer"
transformer1 "github.com/account/repo/path/to/transformer1"
transformer2 "github.com/account/repo/path/to/transformer2"
transformer3 "github.com/account/repo/path/to/transformer3"
transformer4 "github.com/account2/repo2/path/to/transformer4"
)
type exporter string
var Exporter exporter
func (e exporter) Export() []interface1.EventTransformerInitializer, []interface1.StorageTransformerInitializer, []interface1.ContractTransformerInitializer {
return []interface1.TransformerInitializer{
transformer1.TransformerInitializer,
transformer3.TransformerInitializer,
}, []interface1.StorageTransformerInitializer{
transformer4.StorageTransformerInitializer,
}, []interface1.ContractTransformerInitializer{
transformer2.TransformerInitializer,
}
}
```
### Storage backfilling
Storage transformers stream data from a geth subscription or parity csv file where the storage diffs are produced and emitted as the
full sync progresses. If the transformers have missed consuming a range of diffs due to lag in the startup of the processes or due to misalignment of the sync,
we can configure our storage transformers to backfill missing diffs from a [modified archival geth client](https://github.com/vulcanize/go-ethereum/tree/statediff_at).
To do so, add the following field to the config file.
```toml
[storageBackFill]
on = false
```
- `on` is set to `true` to turn the backfill process on
This process uses the regular `client.ipcPath` rpc path, it assumes that it is either an http or ipc path that supports the `StateDiffAt` endpoint.

View File

@ -1,27 +0,0 @@
# Syncing commands
These commands are used to sync raw Ethereum data into Postgres, with varying levels of data granularity.
## headerSync
Syncs block headers from a running Ethereum node into the VulcanizeDB table `headers`.
- Queries the Ethereum node using RPC calls.
- Validates headers from the last 15 blocks to ensure that data is up to date.
- Useful when you want a minimal baseline from which to track targeted data on the blockchain (e.g. individual smart
contract storage values or event logs).
- Handles chain reorgs by [validating the most recent blocks' hashes](../pkg/history/header_validator.go). If the hash is
different from what we have already stored in the database, the header record will be updated.
#### Usage
- Run: `./vulcanizedb headerSync --config <config.toml> --starting-block-number <block-number>`
- The config file must be formatted as follows, and should contain an ipc path to a running Ethereum node:
```toml
[database]
name = "vulcanize_public"
hostname = "localhost"
user = "vulcanize"
password = "vulcanize"
port = 5432
[client]
ipcPath = <path to a running Ethereum node>
```
- Alternatively, the ipc path can be passed as a flag instead `--client-ipcPath`.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 52 KiB

View File

@ -1,160 +0,0 @@
# Generic Transformer
The `contractWatcher` command is a built-in generic contract watcher. It can watch any and all events for a given contract provided the contract's ABI is available.
It also provides some state variable coverage by automating polling of public methods, with some restrictions:
1. The method must have 2 or less arguments
1. The method's arguments must all be of type address or bytes32 (hash)
1. The method must return a single value
This command operates in two modes- `header` and `full`- which require a header or full-synced vulcanizeDB, respectively.
This command requires the contract ABI be available on Etherscan if it is not provided in the config file by the user.
If method polling is turned on we require an archival node at the ETH ipc endpoint in our config, whether or not we are operating in `header` or `full` mode.
Otherwise we only need to connect to a full node.
## Configuration
This command takes a config of the form:
```toml
[database]
name = "vulcanize_public"
hostname = "localhost"
port = 5432
[client]
ipcPath = "/Users/user/Library/Ethereum/geth.ipc"
[contract]
network = ""
addresses = [
"contractAddress1",
"contractAddress2"
]
[contract.contractAddress1]
abi = 'ABI for contract 1'
startingBlock = 982463
[contract.contractAddress2]
abi = 'ABI for contract 2'
events = [
"event1",
"event2"
]
eventArgs = [
"arg1",
"arg2"
]
methods = [
"method1",
"method2"
]
methodArgs = [
"arg1",
"arg2"
]
startingBlock = 4448566
piping = true
````
- The `contract` section defines which contracts we want to watch and with which conditions.
- `network` is only necessary if the ABIs are not provided and wish to be fetched from Etherscan.
- Empty or nil string indicates mainnet
- "ropsten", "kovan", and "rinkeby" indicate their respective networks
- `addresses` lists the contract addresses we are watching and is used to load their individual configuration parameters
- `contract.<contractAddress>` are the sub-mappings which contain the parameters specific to each contract address
- `abi` is the ABI for the contract; if none is provided the application will attempt to fetch one from Etherscan using the provided address and network
- `events` is the list of events to watch
- If this field is omitted or no events are provided then by defualt ALL events extracted from the ABI will be watched
- If event names are provided then only those events will be watched
- `eventArgs` is the list of arguments to filter events with
- If this field is omitted or no eventArgs are provided then by default watched events are not filtered by their argument values
- If eventArgs are provided then only those events which emit at least one of these values as an argument are watched
- `methods` is the list of methods to poll
- If this is omitted or no methods are provided then by default NO methods are polled
- If method names are provided then those methods will be polled, provided
1) Method has two or less arguments
1) Arguments are all of address or hash types
1) Method returns a single value
- `methodArgs` is the list of arguments to limit polling methods to
- If this field is omitted or no methodArgs are provided then by default methods will be polled with every combination of the appropriately typed values that have been collected from watched events
- If methodArgs are provided then only those values will be used to poll methods
- `startingBlock` is the block we want to begin watching the contract, usually the deployment block of that contract
- `piping` is a boolean flag which indicates whether or not we want to pipe return method values forward as arguments to subsequent method calls
At the very minimum, for each contract address an ABI and a starting block number need to be provided (or just the starting block if the ABI can be reliably fetched from Etherscan).
With just this information we will be able to watch all events at the contract, but with no additional filters and no method polling.
## Output
Transformed events and polled method results are committed to Postgres in schemas and tables generated according to the contract abi.
Schemas are created for each contract using the naming convention `<sync-type>_<lowercase contract-address>`
Under this schema, tables are generated for watched events as `<lowercase event name>_event` and for polled methods as `<lowercase method name>_method`
The 'method' and 'event' identifiers are tacked onto the end of the table names to prevent collisions between methods and events of the same lowercase name
## Example:
Modify `./environments/example.toml` to replace the empty `ipcPath` with a path that points to an ethjson_rpc endpoint (e.g. a local geth node ipc path or an Infura url).
This endpoint should be for an archival eth node if we want to perform method polling as this configuration is currently set up to do. To work with a non-archival full node,
remove the `balanceOf` method from the `0x8dd5fbce2f6a956c3022ba3663759011dd51e73e` (TrueUSD) contract.
If you are operating a header sync vDB, run:
`./vulcanizedb contractWatcher --config=./environments/example.toml --mode=header`
If instead you are operating a full sync vDB and provided an archival node IPC path, run in full mode:
`./vulcanizedb contractWatcher --config=./environments/example.toml --mode=full`
This will run the contractWatcher and configures it to watch the contracts specified in the config file. Note that
by default we operate in `header` mode but the flag is included here to demonstrate its use.
The example config we link to in this example watches two contracts, the ENS Registry (0x314159265dD8dbb310642f98f50C066173C1259b) and TrueUSD (0x8dd5fbCe2F6a956C3022bA3663759011Dd51e73E).
Because the ENS Registry is configured with only an ABI and a starting block, we will watch all events for this contract and poll none of its methods. Note that the ENS Registry is an example
of a contract which does not have its ABI available over Etherscan and must have it included in the config file.
The TrueUSD contract is configured with two events (`Transfer` and `Mint`) and a single method (`balanceOf`), as such it will watch these two events and use any addresses it collects emitted from them
to poll the `balanceOf` method with those addresses at every block. Note that we do not provide an ABI for TrueUSD as its ABI can be fetched from Etherscan.
For the ENS contract, it produces and populates a schema with four tables"
`header_0x314159265dd8dbb310642f98f50c066173c1259b.newowner_event`
`header_0x314159265dd8dbb310642f98f50c066173c1259b.newresolver_event`
`header_0x314159265dd8dbb310642f98f50c066173c1259b.newttl_event`
`header_0x314159265dd8dbb310642f98f50c066173c1259b.transfer_event`
For the TrusUSD contract, it produces and populates a schema with three tables:
`header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.transfer_event`
`header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.mint_event`
`header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.balanceof_method`
Column ids and types for these tables are generated based on the event and method argument names and types and method return types, resulting in tables such as:
Table "header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.transfer_event"
| Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
|:----------:|:---------------------:|:---------:|:--------:|:--------------------------------------------------------------------------------------------:|:--------:|:------------:|:-----------:|
| id | integer | | not null | nextval('header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.transfer_event_id_seq'::regclass) | plain | | |
| header_id | integer | | not null | | plain | | |
| token_name | character varying(66) | | not null | | extended | | |
| raw_log | jsonb | | | | extended | | |
| log_idx | integer | | not null | | plain | | |
| tx_idx | integer | | not null | | plain | | |
| from_ | character varying(66) | | not null | | extended | | |
| to_ | character varying(66) | | not null | | extended | | |
| value_ | numeric | | not null | | main | | |
Table "header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.balanceof_method"
| Column | Type | Collation | Nullable | Default | Storage | Stats target | Description |
|:----------:|:---------------------:|:---------:|:--------:|:----------------------------------------------------------------------------------------------:|:--------:|:------------:|:-----------:|
| id | integer | | not null | nextval('header_0x8dd5fbce2f6a956c3022ba3663759011dd51e73e.balanceof_method_id_seq'::regclass) | plain | | |
| token_name | character varying(66) | | not null | | extended | | |
| block | integer | | not null | | plain | | |
| who_ | character varying(66) | | not null | | extended | | |
| returned | numeric | | not null | | main | | |
The addition of '_' after table names is to prevent collisions with reserved Postgres words.
Also notice that the contract address used for the schema name has been down-cased.

View File

@ -1,34 +0,0 @@
# Postgraphile
You can expose VulcanizeDB data via [Postgraphile](https://github.com/graphile/postgraphile).
Check out [their documentation](https://www.graphile.org/postgraphile/) for the most up-to-date instructions on installing, running, and customizing Postgraphile.
## Simple Setup
As of April 30, 2019, you can run Postgraphile pointed at the default `vulcanize_public` database with the following commands:
```
npm install -g postgraphile
postgraphile --connection postgres://localhost/vulcanize_public --schema=public,custom --disable-default-mutations --no-ignore-rbac
```
Arguments:
- `--connection` specifies the database. The above command connects to the default `vulcanize_public` database
defined in [the example config](../environments/public.toml.example).
- `--schema` defines what schema(s) to expose. The above exposes the `public` schema (for core VulcanizeDB data) as well as a `custom` schema (where `custom` is the name of a schema defined in executed transformers).
- `--disable-default-mutations` prevents Postgraphile from exposing create, update, and delete operations on your data, which are otherwise enabled by default.
- `--no-ignore-rbac` ensures that Postgraphile will only expose the tables, columns, fields, and query functions that the user has explicit access to.
## Customizing Postgraphile
By default, Postgraphile will expose queries for all tables defined in your chosen database/schema(s), including [filtering](https://www.graphile.org/postgraphile/filtering/) and [auto-discovered relations](https://www.graphile.org/postgraphile/relations/).
If you'd like to expose more customized windows into your data, there are some techniques you can apply when writing migrations:
- [Computed columns](https://www.graphile.org/postgraphile/computed-columns/) enable you to derive additional fields from types defined in your database.
For example, you could write a function to expose a block header's state root over Postgraphile with a computed column - without modifying the `public.headers` table.
- [Custom queries](https://www.graphile.org/postgraphile/custom-queries/) enable you to provide on-demand access to more complex data (e.g. the product of joining and filtering several tables' data based on a passed argument).
For example, you could write a custom query to get the block timestamp for every transaction originating from a given address.
- [Subscriptions](https://www.graphile.org/postgraphile/subscriptions/) enable you to publish data as it is coming into your database.
The above list is not exhaustive - please see the Postgraphile documentation for a more comprehensive and up-to-date description of available features.

View File

@ -1,5 +1,5 @@
## VulcanizeDB Super Node Resync ## ipfs-blockchain-watcher resync
The `resync` command is made available for directing the resyncing of super node data within specified ranges. The `resync` command is made available for directing the resyncing of ipfs-blockchain-watcherdata within specified ranges.
It also contains a utility for cleaning out old data, and resetting the validation level of data. It also contains a utility for cleaning out old data, and resetting the validation level of data.
### Rational ### Rational
@ -8,15 +8,15 @@ Manual resyncing of data is useful when we want to re-validate data within speci
Cleaning out data is useful when we need to remove bad/deprecated data or prepare for breaking changes to the db schemas. Cleaning out data is useful when we need to remove bad/deprecated data or prepare for breaking changes to the db schemas.
Resetting the validation level of data is useful for designating ranges of data for resyncing by an ongoing super node Resetting the validation level of data is useful for designating ranges of data for resyncing by an ongoing ipfs-blockchain-watcher
backfill process. backfill process.
### Command ### Command
Usage: `./vulcanizedb resync --config={config.toml}` Usage: `./ipfs-blockchain-watcher resync --config={config.toml}`
Configuration can also be done through CLI options and/or environmental variables. Configuration can also be done through CLI options and/or environmental variables.
CLI options can be found using `./vulcanizedb resync --help`. CLI options can be found using `./ipfs-blockchain-watcher resync --help`.
### Config ### Config

View File

@ -1,138 +0,0 @@
# VulcanizeDB Super Node Architecture
The VulcanizeDB super node is a collection of interfaces that are used to extract, process, and store in Postgres-IPFS
all chain data. The raw data indexed by the super node serves as the basis for more specific watchers and applications.
Currently the service supports complete processing of all Bitcoin and Ethereum data.
## Table of Contents
1. [Processes](#processes)
1. [Command](#command)
1. [Configuration](#config)
1. [Database](#database)
1. [APIs](#apis)
1. [Resync](#resync)
1. [IPFS Considerations](#ipfs-considerations)
## Processes
The [super node service](../../pkg/super_node/service.go#L61) is a watcher comprised of the following interfaces:
* [Payload Fetcher](../../pkg/super_node/shared/interfaces.go#L29): Fetches raw chain data from a half-duplex endpoint (HTTP/IPC), used for historical data fetching. ([BTC](../../pkg/super_node/btc/payload_fetcher.go), [ETH](../../pkg/super_node/eth/payload_fetcher.go)).
* [Payload Streamer](../../pkg/super_node/shared/interfaces.go#L24): Streams raw chain data from a full-duplex endpoint (WebSocket/IPC), used for syncing data at the head of the chain in real-time. ([BTC](../../pkg/super_node/btc/http_streamer.go), [ETH](../../pkg/super_node/eth/streamer.go)).
* [Payload Converter](../../pkg/super_node/shared/interfaces.go#L34): Converters raw chain data to an intermediary form prepared for IPFS publishing. ([BTC](../../pkg/super_node/btc/converter.go), [ETH](../../pkg/super_node/eth/converter.go)).
* [IPLD Publisher](../../pkg/super_node/shared/interfaces.go#L39): Publishes the converted data to IPFS, returning their CIDs and associated metadata for indexing. ([BTC](../../pkg/super_node/btc/publisher.go), [ETH](../../pkg/super_node/eth/publisher.go)).
* [CID Indexer](../../pkg/super_node/shared/interfaces.go#L44): Indexes CIDs in Postgres with their associated metadata. This metadata is chain specific and selected based on utility. ([BTC](../../pkg/super_node/btc/indexer.go), [ETH](../../pkg/super_node/eth/indexer.go)).
* [CID Retriever](../../pkg/super_node/shared/interfaces.go#L54): Retrieves CIDs from Postgres by searching against their associated metadata, is used to lookup data to serve API requests/subscriptions. ([BTC](../../pkg/super_node/btc/retriever.go), [ETH](../../pkg/super_node/eth/retriever.go)).
* [IPLD Fetcher](../../pkg/super_node/shared/interfaces.go#L62): Fetches the IPLDs needed to service API requests/subscriptions from IPFS using retrieved CIDS; can route through a IPFS block-exchange to search for objects that are not directly available. ([BTC](../../pkg/super_node/btc/ipld_fetcher.go), [ETH](../../pkg/super_node/eth/ipld_fetcher.go))
* [Response Filterer](../../pkg/super_node/shared/interfaces.go#L49): Filters converted data payloads served to API subscriptions; filters according to the subscriber provided parameters. ([BTC](../../pkg/super_node/btc/filterer.go), [ETH](../../pkg/super_node/eth/filterer.go)).
* [API](https://github.com/ethereum/go-ethereum/blob/master/rpc/types.go#L31): Expose RPC methods for clients to interface with the data. Chain-specific APIs should aim to recapitulate as much of the native API as possible. ([VDB](../../pkg/super_node/api.go), [ETH](../../pkg/super_node/eth/api.go)).
Appropriating the service for a new chain is done by creating underlying types to satisfy these interfaces for
the specifics of that chain.
The service uses these interfaces to operate in any combination of three modes: sync, serve, and backfill.
* Sync: Streams raw chain data at the head, converts and publishes it to IPFS, and indexes the resulting set of CIDs in Postgres with useful metadata.
* BackFill: Automatically searches for and detects gaps in the DB; fetches, converts, publishes, and indexes the data to fill these gaps.
* Serve: Opens up IPC, HTTP, and WebSocket servers on top of the superNode DB and any concurrent sync and/or backfill processes.
These three modes are all operated through a single vulcanizeDB command: `superNode`
## Command
Usage: `./vulcanizedb superNode --config={config.toml}`
Configuration can also be done through CLI options and/or environmental variables.
CLI options can be found using `./vulcanizedb superNode --help`.
## Config
Below is the set of universal config parameters for the superNode command, in .toml form, with the respective environmental variables commented to the side.
This set of parameters needs to be set no matter the chain type.
```toml
[database]
name = "vulcanize_public" # $DATABASE_NAME
hostname = "localhost" # $DATABASE_HOSTNAME
port = 5432 # $DATABASE_PORT
user = "vdbm" # $DATABASE_USER
password = "" # $DATABASE_PASSWORD
[ipfs]
path = "~/.ipfs" # $IPFS_PATH
[superNode]
chain = "bitcoin" # $SUPERNODE_CHAIN
server = true # $SUPERNODE_SERVER
ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH
wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH
httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH
sync = true # $SUPERNODE_SYNC
workers = 1 # $SUPERNODE_WORKERS
backFill = true # $SUPERNODE_BACKFILL
frequency = 45 # $SUPERNODE_FREQUENCY
batchSize = 1 # $SUPERNODE_BATCH_SIZE
batchNumber = 50 # $SUPERNODE_BATCH_NUMBER
timeout = 300 # $HTTP_TIMEOUT
validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL
```
Additional parameters need to be set depending on the specific chain.
For Bitcoin:
```toml
[bitcoin]
wsPath = "127.0.0.1:8332" # $BTC_WS_PATH
httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH
pass = "password" # $BTC_NODE_PASSWORD
user = "username" # $BTC_NODE_USER
nodeID = "ocd0" # $BTC_NODE_ID
clientName = "Omnicore" # $BTC_CLIENT_NAME
genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK
networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID
```
For Ethereum:
```toml
[ethereum]
wsPath = "127.0.0.1:8546" # $ETH_WS_PATH
httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH
nodeID = "arch1" # $ETH_NODE_ID
clientName = "Geth" # $ETH_CLIENT_NAME
genesisBlock = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3" # $ETH_GENESIS_BLOCK
networkID = "1" # $ETH_NETWORK_ID
```
## Database
Currently, the super node persists all data to a single Postgres database. The migrations for this DB can be found [here](../../db/migrations).
Chain-specific data is populated under a chain-specific schema (e.g. `eth` and `btc`) while shared data- such as the IPFS blocks table- is populated under the `public` schema.
Subsequent watchers which act on the raw chain data should build and populate their own schemas or separate databases entirely.
In the future, we will be moving to a foreign table based architecture wherein a single db is used for shared data while each watcher uses
its own database and accesses and acts on the shared data through foreign tables. Isolating watchers to their own databases will prevent complications and
conflicts between watcher db migrations.
## APIs
The super node provides mutliple types of APIs by which to interface with its data.
More detailed information on the APIs can be found [here](apis.md).
## Resync
A separate command `resync` is available for directing the resyncing of data within specified ranges.
This is useful if we want to re-validate a range of data using a new source or clean out bad/deprecated data.
More detailed information on this command can be found [here](resync.md).
## IPFS Considerations
Currently, the IPLD Publisher and Fetcher use internalized IPFS processes which interface directly with a local IPFS repository.
This circumvents the need to run a full IPFS daemon with a [go-ipld-eth](https://github.com/ipfs/go-ipld-eth) plugin, but can lead to issues
with lock-contention on the IPFS repo if another IPFS process is configured and running at the same $IPFS_PATH. This also necessitates the need for
a locally configured IPFS repository.
Once go-ipld-eth has been updated to work with a modern version of PG-IPFS, an additional option will be provided to direct
all publishing and fetching of IPLD objects through a remote IPFS daemon.

View File

@ -1,160 +0,0 @@
# VulcanizeDB Super Node Setup
Step-by-step instructions for manually setting up and running a VulcanizeDB super node.
Steps:
1. [Postgres](#postgres)
1. [Goose](#goose)
1. [IPFS](#ipfs)
1. [Blockchain](#blockchain)
1. [VulcanizeDB](#vulcanizedb)
### Postgres
A postgresDB is needed to storing all of the data in the vulcanizedb system.
Postgres is used as the backing datastore for IPFS, and is used to index the CIDs for all of the chain data stored on IPFS.
Follow the guides [here](https://wiki.postgresql.org/wiki/Detailed_installation_guides) for setting up Postgres.
Once the Postgres server is running, we will need to make a database for vulcanizedb, e.g. `vulcanize_public`.
`createdb vulcanize_public`
For running the automated tests, also create a database named `vulcanize_testing`.
`createdb vulcanize_testing`
### Goose
We use [goose](https://github.com/pressly/goose) as our migration management tool. While it is not necessary to use `goose` for manual setup, it
is required for running the automated tests.
### IPFS
We use IPFS to store IPLD objects for each type of data we extract from on chain.
To start, download and install [IPFS](https://github.com/vulcanize/go-ipfs):
`go get github.com/ipfs/go-ipfs`
`cd $GOPATH/src/github.com/ipfs/go-ipfs`
`make install`
If we want to use Postgres as our backing datastore, we need to use the vulcanize fork of go-ipfs.
Start by adding the fork and switching over to it:
`git remote add vulcanize https://github.com/vulcanize/go-ipfs.git`
`git fetch vulcanize`
`git checkout -b postgres_update vulcanize/postgres_update`
Now install this fork of ipfs, first be sure to remove any previous installation:
`make install`
Check that is installed properly by running:
`ipfs`
You should see the CLI info/help output.
And now we initialize with the `postgresds` profile.
If ipfs was previously initialized we will need to remove the old profile first.
We also need to provide env variables for the postgres connection:
We can either set these manually, e.g.
```bash
export IPFS_PGHOST=
export IPFS_PGUSER=
export IPFS_PGDATABASE=
export IPFS_PGPORT=
export IPFS_PGPASSWORD=
```
And then run the ipfs command:
`ipfs init --profile=postgresds`
Or we can use the pre-made script at `GOPATH/src/github.com/ipfs/go-ipfs/misc/utility/ipfs_postgres.sh`
which has usage:
`./ipfs_postgres.sh <IPFS_PGHOST> <IPFS_PGPORT> <IPFS_PGUSER> <IPFS_PGDATABASE>"`
and will ask us to enter the password, avoiding storing it to an ENV variable.
Once we have initialized ipfs, that is all we need to do with it- we do not need to run a daemon during the subsequent processes (in fact, we can't).
### Blockchain
This section describes how to setup an Ethereum or Bitcoin node to serve as a data source for the super node
#### Ethereum
For Ethereum, we currently *require* [a special fork of go-ethereum](https://github.com/vulcanize/go-ethereum/tree/statediff_at_anyblock-1.9.11). This can be setup as follows.
Skip this steps if you already have access to a node that displays the statediffing endpoints.
Begin by downloading geth and switching to the vulcanize/rpc_statediffing branch:
`go get github.com/ethereum/go-ethereum`
`cd $GOPATH/src/github.com/ethereum/go-ethereum`
`git remote add vulcanize https://github.com/vulcanize/go-ethereum.git`
`git fetch vulcanize`
`git checkout -b statediffing vulcanize/statediff_at_anyblock-1.9.11`
Now, install this fork of geth (make sure any old versions have been uninstalled/binaries removed first):
`make geth`
And run the output binary with statediffing turned on:
`cd $GOPATH/src/github.com/ethereum/go-ethereum/build/bin`
`./geth --statediff --statediff.streamblock --ws --syncmode=full`
Note: if you wish to access historical data (perform `backFill`) then the node will need to operate as an archival node (`--gcmode=archive`)
Note: other CLI options- statediff specific ones included- can be explored with `./geth help`
The output from geth should mention that it is `Starting statediff service` and block synchronization should begin shortly thereafter.
Note that until it receives a subscriber, the statediffing process does nothing but wait for one. Once a subscription is received, this
will be indicated in the output and node will begin processing and sending statediffs.
Also in the output will be the endpoints that we will use to interface with the node.
The default ws url is "127.0.0.1:8546" and the default http url is "127.0.0.1:8545".
These values will be used as the `ethereum.wsPath` and `ethereum.httpPath` in the super node config, respectively.
#### Bitcoin
For Bitcoin, the super node is able to operate entirely through the universally exposed JSON-RPC interfaces.
This means we can use any of the standard full nodes (e.g. bitcoind, btcd) as our data source.
Point at a remote node or set one up locally using the instructions for [bitcoind](https://github.com/bitcoin/bitcoin) and [btcd](https://github.com/btcsuite/btcd).
The default http url is "127.0.0.1:8332". We will use the http endpoint as both the `bitcoin.wsPath` and `bitcoin.httpPath`
(bitcoind does not support websocket endpoints, we are currently using a "subscription" wrapper around the http endpoints)
### Vulcanizedb
Finally, we can begin the vulcanizeDB process itself.
Start by downloading vulcanizedb and moving into the repo:
`go get github.com/vulcanize/ipfs-chain-watcher`
`cd $GOPATH/src/github.com/vulcanize/ipfs-chain-watcher`
Run the db migrations against the Postgres database we created for vulcanizeDB:
`goose -dir=./db/migrations postgres postgres://localhost:5432/vulcanize_public?sslmode=disable up`
At this point, if we want to run the automated tests:
`make test`
`make integration_test`
Then, build the vulcanizedb binary:
`go build`
And run the super node command with a provided [config](architecture.md/#):
`./vulcanizedb superNode --config=<config_file.toml`

View File

@ -1,36 +0,0 @@
// VulcanizeDB
// Copyright © 2019 Vulcanize
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
package integration_test
import (
"io/ioutil"
"testing"
"github.com/sirupsen/logrus"
. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"
)
func TestIntegrationTest(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "IntegrationTest Suite")
}
var _ = BeforeSuite(func() {
logrus.SetOutput(ioutil.Discard)
})

View File

@ -1,63 +0,0 @@
// VulcanizeDB
// Copyright © 2019 Vulcanize
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
package client
import (
"context"
"math/big"
"github.com/ethereum/go-ethereum"
"github.com/ethereum/go-ethereum/common"
"github.com/ethereum/go-ethereum/core/types"
"github.com/ethereum/go-ethereum/ethclient"
)
type EthClient struct {
client *ethclient.Client
}
func NewEthClient(client *ethclient.Client) EthClient {
return EthClient{client: client}
}
func (client EthClient) BlockByNumber(ctx context.Context, number *big.Int) (*types.Block, error) {
return client.client.BlockByNumber(ctx, number)
}
func (client EthClient) CallContract(ctx context.Context, msg ethereum.CallMsg, blockNumber *big.Int) ([]byte, error) {
return client.client.CallContract(ctx, msg, blockNumber)
}
func (client EthClient) FilterLogs(ctx context.Context, q ethereum.FilterQuery) ([]types.Log, error) {
return client.client.FilterLogs(ctx, q)
}
func (client EthClient) HeaderByNumber(ctx context.Context, number *big.Int) (*types.Header, error) {
return client.client.HeaderByNumber(ctx, number)
}
func (client EthClient) TransactionSender(ctx context.Context, tx *types.Transaction, block common.Hash, index uint) (common.Address, error) {
return client.client.TransactionSender(ctx, tx, block, index)
}
func (client EthClient) TransactionReceipt(ctx context.Context, txHash common.Hash) (*types.Receipt, error) {
return client.client.TransactionReceipt(ctx, txHash)
}
func (client EthClient) BalanceAt(ctx context.Context, account common.Address, blockNumber *big.Int) (*big.Int, error) {
return client.client.BalanceAt(ctx, account, blockNumber)
}

View File

@ -1,112 +0,0 @@
// VulcanizeDB
// Copyright © 2019 Vulcanize
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.
package test_config
import (
"errors"
"fmt"
"os"
"github.com/sirupsen/logrus"
"github.com/spf13/viper"
"github.com/vulcanize/ipfs-chain-watcher/pkg/config"
"github.com/vulcanize/ipfs-chain-watcher/pkg/core"
"github.com/vulcanize/ipfs-chain-watcher/pkg/postgres"
)
var TestConfig *viper.Viper
var DBConfig config.Database
var TestClient config.Client
var ABIFilePath string
func init() {
setTestConfig()
setABIPath()
}
func setTestConfig() {
TestConfig = viper.New()
TestConfig.SetConfigName("testing")
TestConfig.AddConfigPath("$GOPATH/src/github.com/vulcanize/ipfs-chain-watcher/environments/")
err := TestConfig.ReadInConfig()
if err != nil {
logrus.Fatal(err)
}
ipc := TestConfig.GetString("client.ipcPath")
// If we don't have an ipc path in the config file, check the env variable
if ipc == "" {
TestConfig.BindEnv("url", "INFURA_URL")
ipc = TestConfig.GetString("url")
}
if ipc == "" {
logrus.Fatal(errors.New("testing.toml IPC path or $INFURA_URL env variable need to be set"))
}
hn := TestConfig.GetString("database.hostname")
port := TestConfig.GetInt("database.port")
name := TestConfig.GetString("database.name")
DBConfig = config.Database{
Hostname: hn,
Name: name,
Port: port,
}
TestClient = config.Client{
IPCPath: ipc,
}
}
func setABIPath() {
gp := os.Getenv("GOPATH")
ABIFilePath = gp + "/src/github.com/vulcanize/ipfs-chain-watcher/pkg/eth/testing/"
}
func NewTestDB(node core.Node) *postgres.DB {
db, err := postgres.NewDB(DBConfig, node)
if err != nil {
panic(fmt.Sprintf("Could not create new test db: %v", err))
}
return db
}
func CleanTestDB(db *postgres.DB) {
// can't delete from nodes since this function is called after the required node is persisted
db.MustExec("DELETE FROM goose_db_version")
db.MustExec("DELETE FROM header_sync_logs")
db.MustExec("DELETE FROM header_sync_receipts")
db.MustExec("DELETE FROM header_sync_transactions")
db.MustExec("DELETE FROM headers")
db.MustExec("DELETE FROM queued_storage")
db.MustExec("DELETE FROM storage_diff")
}
func CleanCheckedHeadersTable(db *postgres.DB, columnNames []string) {
for _, name := range columnNames {
db.MustExec("ALTER TABLE checked_headers DROP COLUMN IF EXISTS " + name)
}
}
// Returns a new test node, with the same ID
func NewTestNode() core.Node {
return core.Node{
GenesisBlock: "GENESIS",
NetworkID: "1",
ID: "b6f90c0fdd8ec9607aed8ee45c69322e47b7063f0bfb7a29c8ecafab24d0a22d24dd2329b5ee6ed4125a03cb14e57fd584e67f9e53e6c631055cbbd82f080845",
ClientName: "Geth/v1.7.2-stable-1db4ecdc/darwin-amd64/go1.9",
}
}