ipld-eth-server/documentation/super_node/architecture.md
2020-04-17 17:31:48 -05:00


VulcanizeDB Super Node Architecture

The VulcanizeDB super node is a collection of interfaces used to extract, process, and store all chain data in Postgres-IPFS. The raw data indexed by the super node serves as the basis for more specific watchers and applications.

Currently the service supports complete processing of all Bitcoin and Ethereum data.

Table of Contents

  1. Processes
  2. Command
  3. Configuration
  4. Database
  5. APIs
  6. Resync
  7. IPFS Considerations

Processes

The super node service comprises the following interfaces:

  • Payload Fetcher: Fetches raw chain data from a half-duplex endpoint (HTTP/IPC), used for historical data fetching. (BTC, ETH).
  • Payload Streamer: Streams raw chain data from a full-duplex endpoint (WebSocket/IPC), used for syncing data at the head of the chain in real-time. (BTC, ETH).
  • Payload Converter: Converts raw chain data to an intermediary form prepared for IPFS publishing. (BTC, ETH).
  • IPLD Publisher: Publishes the converted data to IPFS, returning their CIDs and associated metadata for indexing. (BTC, ETH).
  • CID Indexer: Indexes CIDs in Postgres with their associated metadata. This metadata is chain specific and selected based on utility. (BTC, ETH).
  • CID Retriever: Retrieves CIDs from Postgres by searching against their associated metadata; used to look up data to serve API requests/subscriptions. (BTC, ETH).
  • IPLD Fetcher: Fetches the IPLDs needed to service API requests/subscriptions from IPFS using the retrieved CIDs; can route through an IPFS block exchange to search for objects that are not directly available. (BTC, ETH).
  • Response Filterer: Filters converted data payloads served to API subscriptions according to subscriber-provided parameters. (BTC, ETH).
  • DB Cleaner: Used to clean out cached IPFS objects, CIDs, and associated metadata. Useful for removing bad data or for introducing incompatible changes to the DB schema/tables. (BTC, ETH).
  • API: Exposes RPC methods for clients to interface with the data. Chain-specific APIs should aim to recapitulate as much of the native API as possible. (VDB, ETH).

Adapting the service to a new chain is done by creating underlying types that satisfy these interfaces for the specifics of that chain.

The service uses these interfaces to operate in any combination of three modes: sync, serve, and backfill.

  • Sync: Streams raw chain data at the head, converts and publishes it to IPFS, and indexes the resulting set of CIDs in Postgres with useful metadata.
  • BackFill: Automatically detects gaps in the DB; fetches, converts, publishes, and indexes the data needed to fill them.
  • Serve: Opens up IPC, HTTP, and WebSocket servers on top of the superNode DB and any concurrent sync and/or backfill processes.

These three modes are all operated through a single vulcanizeDB command: superNode

Command

Usage: ./vulcanizedb superNode --config={config.toml}

Configuration can also be done through CLI options and/or environment variables. CLI options can be found using ./vulcanizedb superNode --help.
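For example, assuming a config.toml in the working directory, the same settings can be supplied through the environment variables noted in the Config section below:

    # Environment variables override the corresponding config file entries;
    # the variable names come from the comments in the config listing.
    export DATABASE_NAME=vulcanize_public
    export DATABASE_HOSTNAME=localhost
    export SUPERNODE_CHAIN=ethereum
    ./vulcanizedb superNode --config=config.toml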

Config

Below is the set of universal config parameters for the superNode command, in .toml form, with the respective environment variables commented to the side. This set of parameters must be set regardless of the chain type.

[database]
    name     = "vulcanize_public" # $DATABASE_NAME
    hostname = "localhost" # $DATABASE_HOSTNAME
    port     = 5432 # $DATABASE_PORT
    user     = "vdbm" # $DATABASE_USER
    password = "" # $DATABASE_PASSWORD

[ipfs]
    path = "~/.ipfs" # $IPFS_PATH

[superNode]
    chain = "bitcoin" # $SUPERNODE_CHAIN
    server = true # $SUPERNODE_SERVER
    ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH
    wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH
    httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH
    sync = true # $SUPERNODE_SYNC
    workers = 1 # $SUPERNODE_WORKERS
    backFill = true # $SUPERNODE_BACKFILL
    frequency = 45 # $SUPERNODE_FREQUENCY
    batchSize = 1 # $SUPERNODE_BATCH_SIZE
    batchNumber = 50 # $SUPERNODE_BATCH_NUMBER
    validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL

Additional parameters need to be set depending on the specific chain.

For Bitcoin:

[bitcoin]
    wsPath  = "127.0.0.1:8332" # $BTC_WS_PATH
    httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH
    pass = "password" # $BTC_NODE_PASSWORD
    user = "username" # $BTC_NODE_USER
    nodeID = "ocd0" # $BTC_NODE_ID
    clientName = "Omnicore" # $BTC_CLIENT_NAME
    genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK
    networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID

For Ethereum:

[ethereum]
    wsPath  = "127.0.0.1:8546" # $ETH_WS_PATH
    httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH

Database

Currently, the super node persists all data to a single Postgres database. The migrations for this DB can be found here. Chain-specific data is populated under a chain-specific schema (e.g. eth and btc), while shared data, such as the IPFS blocks table, is populated under the public schema. Subsequent watchers which act on the raw chain data should build and populate their own schemas or separate databases entirely.

In the future, we will be moving to a foreign-table-based architecture, wherein a single DB is used for shared data while each watcher uses its own database, accessing and acting on the shared data through foreign tables. Isolating watchers in their own databases will prevent complications and conflicts between watcher DB migrations.

APIs

The super node provides multiple types of APIs by which to interface with its data. More detailed information on the APIs can be found here.

Resync

A separate command, resync, is available for directing the resyncing of data within specified ranges. This is useful for re-validating a range of data against a new source or for cleaning out bad/deprecated data. More detailed information on this command can be found here.

IPFS Considerations

Currently, the IPLD Publisher and Fetcher use internalized IPFS processes which interface directly with a local IPFS repository. This circumvents the need to run a full IPFS daemon with a go-ipld-eth plugin, but it can lead to lock contention on the IPFS repo if another IPFS process is configured and running at the same $IPFS_PATH. It also requires a locally configured IPFS repository.

Once go-ipld-eth has been updated to work with a modern version of PG-IPFS, an additional option will be provided to direct all publishing and fetching of IPLD objects through a remote IPFS daemon.