13 KiB
Seed Node
Vulcanizedb can act as an index for Ethereum data stored on IPFS through the use of the syncAndPublish
and
syncPublishScreenAndServe
commands.
Manual Setup
These commands work in conjunction with a state-diffing full Geth node and IPFS.
IPFS
To start, download and install IPFS
go get github.com/ipfs/go-ipfs
cd $GOPATH/src/github.com/ipfs/go-ipfs
make install
If we want to use Postgres as our backing datastore, we need to use the vulcanize fork of go-ipfs.
Start by adding the fork and switching over to it:
git remote add vulcanize https://github.com/vulcanize/go-ipfs.git
git fetch vulcanize
git checkout -b postgres_update vulcanize/postgres_update
Now install this fork of ipfs, first be sure to remove any previous installation.
make install
Check that is installed properly by running
ipfs
You should see the CLI info/help output.
And now we initialize with the postgresds
profile.
If ipfs was previously initialized we will need to remove the old profile first.
We also need to provide env variables for the postgres connection:
We can either set these manually, e.g.
export IPFS_PGHOST=
export IPFS_PGUSER=
export IPFS_PGDATABASE=
export IPFS_PGPORT=
export IPFS_PGPASSWORD=
And then run the ipfs command
ipfs init --profile=postgresds
Or we can use the pre-made script at GOPATH/src/github.com/ipfs/go-ipfs/misc/utility/ipfs_postgres.sh
which has usage:
./ipfs_postgres.sh <IPFS_PGHOST> <IPFS_PGPORT> <IPFS_PGUSER> <IPFS_PGDATABASE>"
and will ask us to enter the password, avoiding storing it to an ENV variable.
Once we have initialized ipfs, that is all we need to do with it- we do not need to run a daemon during the subsequent processes (in fact, we can't).
Geth
For Geth, we currently require a special fork, and we can set this up as follows:
Begin by downloading geth and switching to the vulcanize/rpc_statediffing branch
go get github.com/ethereum/go-ethereum
cd $GOPATH/src/github.com/ethereum/go-ethereum
git remote add vulcanize https://github.com/vulcanize/go-ethereum.git
git fetch vulcanize
git checkout -b statediffing vulcanize/statediffing
Now, install this fork of geth (make sure any old versions have been uninstalled/binaries removed first)
make geth
And run the output binary with statediffing turned on:
cd $GOPATH/src/github.com/ethereum/go-ethereum/build/bin
./geth --statediff --statediff.streamblock --ws --syncmode=full
Note: other CLI options- statediff specific ones included- can be explored with ./geth help
The output from geth should mention that it is Starting statediff service
and block synchronization should begin shortly thereafter.
Note that until it receives a subscriber, the statediffing process does essentially nothing. Once a subscription is received, this
will be indicated in the output.
Also in the output will be the websocket url and ipc paths that we will use to subscribe to the statediffing process. The default ws url is "ws://127.0.0.1:8546" and the default ipcPath- on Darwin systems only- is "Users/user/Library/Ethereum/geth.ipc"
Vulcanizedb
There are two commands to choose from:
syncAndPublish
syncAndPublih
performs the functions of the super node- syncing data from Geth, converting them to IPLDs,
publishing those IPLDs to IPFS, and creating a local Postgres index to relate their CIDS to useful metadata.
Usage:
./vulcanizedb syncAndPublish --config=<config_file.toml>
The config file for the syncAndPublish
command looks very similar to the basic config file
[database]
name = "vulcanize_demo"
hostname = "localhost"
port = 5432
[client]
ipcPath = "ws://127.0.0.1:8546"
ipfsPath = "/Users/user/.ipfs"
With an additional field, client.ipcPath
, that is either the ws url or the ipc path that Geth has exposed (the url and path output
when the geth sync was started), and client.ipfsPath
which is the path the ipfs datastore directory.
syncPublishScreenAndServe
syncPublishScreenAndServe
does everything that syncAndPublish
does, plus it opens up an RPC server which exposes
an endpoint to allow transformers to subscribe to subsets of the sync-and-published data that are relevant to their transformations
Usage:
./vulcanizedb syncPublishScreenAndServe --config=<config_file.toml>
The config file for the syncPublishScreenAndServe
command has two additional fields and looks like:
[database]
name = "vulcanize_demo"
hostname = "localhost"
port = 5432
[client]
ipcPath = "ws://127.0.0.1:8546"
ipfsPath = "/Users/user/.ipfs"
[server]
ipcPath = "/Users/user/.vulcanize/vulcanize.ipc"
wsEndpoint = "127.0.0.1:80"
[superNodeBackFill]
on = false
ipcPath = ""
frequency = 5
The additional server.ipcPath
and server.wsEndpoint
fields are used to set what ipc endpoint and ws url
the syncPublishScreenAndServe
rpc server will expose itself to subscribing transformers over, respectively.
Any valid and available path and endpoint is acceptable, but keep in mind that this path and endpoint need to
be known by transformers for them to subscribe to the super node.
Because the super node syncs data from a geth full node as it progresses through its block synchronization, there is potential
for the super node to miss data both at the beginning of the sync due to lag between initialization of the two processes and
anywhere throughout the sync if the processes are interrupted. The superNodeBackFill
config mapping is used to optionally configure
the super node with an archival geth client that exposes a statediff.StateDiffAt
rpc endpoint, to enable it to fill in these data gaps.
superNodeBackFill.on
turns the backfill process on, the superNodeBackFill.ipcPath
is the rpc path for the archival geth node, and superNodeBackFill.frequency
sets at what frequency (in minutes) the backfill process checks for and fills in gaps.
Dockerfile Setup
The below provides step-by-step directions for how to setup the super node using the provided Dockerfile on an AWS Linux AMI instance. Note that the instance will need sufficient memory and storage for this to work.
- Install basic dependencies
sudo yum update
sudo yum install -y curl gpg gcc gcc-c++ make git
- Install Go 1.12
wget https://dl.google.com/go/go1.12.6.linux-amd64.tar.gz
tar -xzf go1.12.6.linux-amd64.tar.gz
sudo mv go /usr/local
- Edit .bash_profile to export GOPATH
export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
- Install and setup Postgres
sudo yum install postgresql postgresql96-server
sudo service postgresql96 initdb
sudo service postgresql96 start
sudo -u postgres createuser -s ec2-user
sudo -u postgres createdb ec2-user
sudo su postgres
psql
ALTER USER "ec2-user" WITH SUPERUSER;
/q
exit
4b. Edit hba_file to trust connections
psql
SHOW hba_file;
/q
sudo vim {PATH_TO_FILE}
4c. Stop and restart Postgres server to affect changes
sudo service postgresql96 stop
sudo service postgresql96 start
- Install and start Docker (exit and re-enter ec2 instance afterwards to affect changes)
sudo yum install -y docker
sudo service docker start
sudo usermod -aG docker ec2-user
- Fetch the repository and switch to this working branch
go get github.com/vulcanize/vulcanizedb
cd $GOPATH/src/github.com/vulcanize/vulcanizedb
git checkout ipfs_concurrency
- Create the db
createdb vulcanize_public
- Build and run the Docker image
cd $GOPATH/src/github.com/vulcanize/vulcanizedb/dockerfiles/super_node
docker build .
docker run --network host -e VDB_PG_CONNECT=postgres://localhost:5432/vulcanize_public?sslmode=disable {IMAGE_ID}
Subscribing
A transformer can subscribe to the syncPublishScreenAndServe
service over its ipc or ws endpoints, when subscribing the transformer
specifies which subsets of the synced data it is interested in and the server will forward only these data.
The streamSubscribe
command serves as a simple demonstration/example of subscribing to the super-node feed, it subscribes with a set of parameters
defined in the loaded config file, and prints the streamed data to stdout. To build transformers that subscribe to and use super-node data,
the shared/libraries/streamer can be used.
Usage:
./vulcanizedb streamSubscribe --config=<config_file.toml>
The config for streamSubscribe
has the subscribe
set of parameters, for example:
[subscription]
path = "ws://127.0.0.1:8080"
backfill = true
backfillOnly = false
startingBlock = 0
endingBlock = 0
[subscription.headerFilter]
off = false
uncles = false
[subscription.trxFilter]
off = false
src = [
"0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe",
]
dst = [
"0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe",
]
[subscription.receiptFilter]
off = false
topic0s = [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x930a61a57a70a73c2a503615b87e2e54fe5b9cdeacda518270b852296ab1a377"
]
[subscription.stateFilter]
off = false
addresses = [
"0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe"
]
intermediateNodes = false
[subscription.storageFilter]
off = true
addresses = [
"",
""
]
storageKeys = [
"",
""
]
intermediateNodes = false
subscription.path
is used to define the ws url OR ipc endpoint we will subscribe to the super-node over
(the server.ipcPath
or server.wsEndpoint
that the super-node has defined in their config file).
subscription.backfill
specifies whether or not the super-node should look up historical data in its cache and
send that to the subscriber, if this is set to false
then the super-node only forwards newly synced/incoming data.
subscription.backfillOnly
will tell the super-node to only send historical data and not stream incoming data going forward.
subscription.startingBlock
is the starting block number for the range we want to receive data in.
subscription.endingBlock
is the ending block number for the range we want to receive data in;
setting to 0 means there is no end/we will continue indefinitely.
subscription.headerFilter
has two sub-options: off
and uncles
. Setting off
to true tells the super-node to
not send any headers to the subscriber; setting uncles
to true tells the super-node to send uncles in addition to normal headers.
subscription.trxFilter
has three sub-options: off
, src
, and dst
. Setting off
to true tells the super-node to
not send any transactions to the subscriber; src
and dst
are string arrays which can be filled with ETH addresses we want to filter transactions for,
if they have any addresses then the super-node will only send transactions that were sent or received by the addresses contained
in src
and dst
, respectively.
subscription.receiptFilter
has two sub-options: off
and topics
. Setting off
to true tells the super-node to
not send any receipts to the subscriber; topic0s
is a string array which can be filled with event topics we want to filter for,
if it has any topics then the super-node will only send receipts that contain logs which have that topic0.
subscription.stateFilter
has three sub-options: off
, addresses
, and intermediateNodes
. Setting off
to true tells the super-node to
not send any state data to the subscriber; addresses
is a string array which can be filled with ETH addresses we want to filter state for,
if it has any addresses then the super-node will only send state leafs (accounts) corresponding to those account addresses. By default the super-node
only sends along state leafs, if we want to receive branch and extension nodes as well intermediateNodes
can be set to true
.
subscription.storageFilter
has four sub-options: off
, addresses
, storageKeys
, and intermediateNodes
. Setting off
to true tells the super-node to
not send any storage data to the subscriber; addresses
is a string array which can be filled with ETH addresses we want to filter storage for,
if it has any addresses then the super-node will only send storage nodes from the storage tries at those state addresses. storageKeys
is another string
array that can be filled with storage keys we want to filter storage data for. It is important to note that the storageKeys are the actual keccak256 hashes, whereas
the addresses in the addresses
fields are the ETH addresses and not their keccak256 hashes that serve as the actual state keys. By default the super-node
only sends along storage leafs, if we want to receive branch and extension nodes as well intermediateNodes
can be set to true
.