ipld-eth-server/documentation/architecture.md
2020-08-05 00:23:06 -05:00

ipfs-blockchain-watcher architecture

  1. Processes
  2. Command
  3. Configuration
  4. Database
  5. APIs
  6. Resync
  7. IPFS Considerations

Processes

ipfs-blockchain-watcher is a service composed of the following interfaces:

  • Payload Fetcher: Fetches raw chain data from a half-duplex endpoint (HTTP/IPC), used for historical data fetching. (BTC, ETH).
  • Payload Streamer: Streams raw chain data from a full-duplex endpoint (WebSocket/IPC), used for syncing data at the head of the chain in real-time. (BTC, ETH).
  • Payload Converter: Converts raw chain data to an intermediary form prepared for IPFS publishing. (BTC, ETH).
  • IPLD Publisher: Publishes the converted data to IPFS, returning their CIDs and associated metadata for indexing. (BTC, ETH).
  • CID Indexer: Indexes CIDs in Postgres with their associated metadata. This metadata is chain-specific and selected based on utility. (BTC, ETH).
  • CID Retriever: Retrieves CIDs from Postgres by searching against their associated metadata; used to look up the data needed to serve API requests/subscriptions. (BTC, ETH).
  • IPLD Fetcher: Fetches the IPLDs needed to service API requests/subscriptions from IPFS using retrieved CIDs; can route through an IPFS block-exchange to search for objects that are not directly available. (BTC, ETH).
  • Response Filterer: Filters converted data payloads served to API subscriptions; filters according to the subscriber-provided parameters. (BTC, ETH).
  • API: Exposes RPC methods for clients to interface with the data. Chain-specific APIs should aim to recapitulate as much of the native API as possible. (VDB, ETH).

Adapting the service to a new chain is done by creating underlying types that satisfy these interfaces for the specifics of that chain.
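As a rough illustration of how the sync-side interfaces fit together, the sketch below models four of them as Python protocols and wires them into one loop. Every name, method signature, and the `ConvertedPayload` shape is hypothetical and chosen only for illustration. The service's real interfaces are not reproduced here, and a SHA-256 hex digest stands in for a real CID.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass
class ConvertedPayload:
    """Intermediary form of one block, ready for publishing (hypothetical shape)."""
    height: int
    objects: List[bytes]  # serialized chain objects, e.g. a header and transactions


class PayloadStreamer(Protocol):
    def stream(self) -> Iterable[bytes]: ...


class PayloadConverter(Protocol):
    def convert(self, raw: bytes) -> ConvertedPayload: ...


class IPLDPublisher(Protocol):
    def publish(self, payload: ConvertedPayload) -> List[str]: ...


class CIDIndexer(Protocol):
    def index(self, height: int, cids: List[str]) -> None: ...


def sync(streamer: PayloadStreamer, converter: PayloadConverter,
         publisher: IPLDPublisher, indexer: CIDIndexer) -> None:
    """Stream raw payloads, convert and publish them, then index the CIDs."""
    for raw in streamer.stream():
        payload = converter.convert(raw)
        cids = publisher.publish(payload)
        indexer.index(payload.height, cids)


# Toy "chain" implementations: raw payloads look like b"<height>:<obj>,<obj>".
class ToyStreamer:
    def __init__(self, raws: List[bytes]) -> None:
        self.raws = raws

    def stream(self) -> Iterable[bytes]:
        return iter(self.raws)


class ToyConverter:
    def convert(self, raw: bytes) -> ConvertedPayload:
        height, _, body = raw.partition(b":")
        return ConvertedPayload(int(height.decode()), body.split(b","))


class ToyPublisher:
    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def publish(self, payload: ConvertedPayload) -> List[str]:
        cids = []
        for obj in payload.objects:
            cid = hashlib.sha256(obj).hexdigest()  # placeholder, not a real CID
            self.store[cid] = obj
            cids.append(cid)
        return cids


class ToyIndexer:
    def __init__(self) -> None:
        self.by_height: Dict[int, List[str]] = {}

    def index(self, height: int, cids: List[str]) -> None:
        self.by_height[height] = cids
```

Supporting another chain in this sketch means supplying a different set of classes with the same methods; `sync` itself does not change.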

The service uses these interfaces to operate in any combination of three modes: sync, serve, and backfill.

  • Sync: Streams raw chain data at the head, converts and publishes it to IPFS, and indexes the resulting set of CIDs in Postgres with useful metadata.
  • Backfill: Automatically searches for and detects gaps in the DB; fetches, converts, publishes, and indexes the data to fill these gaps.
  • Serve: Opens up IPC, HTTP, and WebSocket servers on top of the ipfs-blockchain-watcher DB and any concurrent sync and/or backfill processes.

These three modes are all operated through a single vulcanizeDB command: watch.
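The gap detection that backfill relies on can be pictured with a small sketch. It assumes the indexed block heights are available as a plain collection of integers; the service itself works against Postgres, and its actual query is not shown here. The function names are hypothetical, and `batch_size` is only loosely modelled on the `batchSize` config key, whose exact semantics in the service are an assumption.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (start, stop) block heights


def find_gaps(indexed_heights: Iterable[int]) -> List[Range]:
    """Return the ranges of heights missing between the indexed ones."""
    heights = sorted(set(indexed_heights))
    gaps = []
    for prev, cur in zip(heights, heights[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def split_into_batches(gap: Range, batch_size: int) -> List[Range]:
    """Split one gap into consecutive ranges of at most batch_size heights."""
    start, stop = gap
    return [(lo, min(lo + batch_size - 1, stop))
            for lo in range(start, stop + 1, batch_size)]
```

For example, `find_gaps([0, 1, 2, 5, 6, 10])` yields `[(3, 4), (7, 9)]`. This sketch only finds gaps between indexed heights; anything missing before the lowest indexed height would need separate handling.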

Command

Usage: ./ipfs-blockchain-watcher watch --config={config.toml}

Configuration can also be done through CLI options and/or environment variables. CLI options can be found using ./ipfs-blockchain-watcher watch --help.

Configuration

Below is the set of universal config parameters for the ipfs-blockchain-watcher command, in .toml form, with the corresponding environment variable noted in the comment beside each one. These parameters must be set regardless of the chain type.

[database]
    name     = "vulcanize_public" # $DATABASE_NAME
    hostname = "localhost" # $DATABASE_HOSTNAME
    port     = 5432 # $DATABASE_PORT
    user     = "vdbm" # $DATABASE_USER
    password = "" # $DATABASE_PASSWORD

[ipfs]
    path = "~/.ipfs" # $IPFS_PATH
    mode = "direct" # $IPFS_MODE

[watcher]
    chain = "bitcoin" # $SUPERNODE_CHAIN
    server = true # $SUPERNODE_SERVER
    ipcPath = "~/.vulcanize/vulcanize.ipc" # $SUPERNODE_IPC_PATH
    wsPath = "127.0.0.1:8082" # $SUPERNODE_WS_PATH
    httpPath = "127.0.0.1:8083" # $SUPERNODE_HTTP_PATH
    sync = true # $SUPERNODE_SYNC
    workers = 1 # $SUPERNODE_WORKERS
    backFill = true # $SUPERNODE_BACKFILL
    frequency = 45 # $SUPERNODE_FREQUENCY
    batchSize = 1 # $SUPERNODE_BATCH_SIZE
    batchNumber = 50 # $SUPERNODE_BATCH_NUMBER
    timeout = 300 # $HTTP_TIMEOUT
    validationLevel = 1 # $SUPERNODE_VALIDATION_LEVEL

Additional parameters need to be set depending on the specific chain.

For Bitcoin:

[bitcoin]
    wsPath  = "127.0.0.1:8332" # $BTC_WS_PATH
    httpPath = "127.0.0.1:8332" # $BTC_HTTP_PATH
    pass = "password" # $BTC_NODE_PASSWORD
    user = "username" # $BTC_NODE_USER
    nodeID = "ocd0" # $BTC_NODE_ID
    clientName = "Omnicore" # $BTC_CLIENT_NAME
    genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" # $BTC_GENESIS_BLOCK
    networkID = "0xD9B4BEF9" # $BTC_NETWORK_ID

For Ethereum:

[ethereum]
    wsPath  = "127.0.0.1:8546" # $ETH_WS_PATH
    httpPath = "127.0.0.1:8545" # $ETH_HTTP_PATH
    nodeID = "arch1" # $ETH_NODE_ID
    clientName = "Geth" # $ETH_CLIENT_NAME
    genesisBlock = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3" # $ETH_GENESIS_BLOCK
    networkID = "1" # $ETH_NETWORK_ID
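To make the relationship between config keys, environment variables, and operating modes concrete, here is a minimal sketch. It starts from a dictionary holding a subset of the [watcher] values above, since TOML parsing is left out. It assumes that an environment variable, when set, overrides the file value; that precedence is an assumption rather than documented behavior. The function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping

# Subset of the [watcher] keys above, mapped to their environment variables.
WATCHER_ENV = {
    "chain": "SUPERNODE_CHAIN",
    "server": "SUPERNODE_SERVER",
    "sync": "SUPERNODE_SYNC",
    "backFill": "SUPERNODE_BACKFILL",
    "workers": "SUPERNODE_WORKERS",
}


def coerce(raw: str, like: Any) -> Any:
    """Convert an environment string to the type of the file value."""
    if isinstance(like, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in ("1", "true", "yes")
    if isinstance(like, int):
        return int(raw)
    return raw


def apply_env(watcher: Mapping[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of the file values with environment overrides applied."""
    merged = dict(watcher)
    for key, var in WATCHER_ENV.items():
        if var in env and key in merged:
            merged[key] = coerce(env[var], merged[key])
    return merged


def enabled_modes(watcher: Mapping[str, Any]) -> List[str]:
    """List the operating modes switched on by the sync/backFill/server flags."""
    flags = (("sync", "sync"), ("backfill", "backFill"), ("serve", "server"))
    return [mode for mode, key in flags if watcher.get(key)]
```

In real use, `os.environ` would be passed as `env`; taking it as a parameter keeps the sketch easy to exercise without touching the process environment.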

Database

Currently, ipfs-blockchain-watcher persists all data to a single Postgres database. The migrations for this DB can be found here. Chain-specific data is populated under a chain-specific schema (e.g. eth and btc), while shared data, such as the IPFS blocks table, is populated under the public schema. Subsequent watchers that act on the raw chain data should build and populate their own schemas or use separate databases entirely.

In the future, the database will move to a foreign-table-based architecture in which a single database holds the shared data, while each watcher uses its own database and accesses and acts on the shared data through foreign tables. Isolating watchers in their own databases will prevent complications and conflicts between watcher database migrations.

APIs

ipfs-blockchain-watcher provides multiple types of APIs for interfacing with its data. More detailed information on the APIs can be found here.

Resync

A separate command, resync, is available for directing the resyncing of data within specified ranges. This is useful if there is a need to re-validate a range of data using a new source or to clean out bad/deprecated data. More detailed information on this command can be found here.

IPFS Considerations

Currently, the IPLD Publisher and Fetcher can either use internalized IPFS processes that interface with a local IPFS repository, or interface directly with the backing Postgres database. Both options avoid the need to run a full IPFS daemon with a go-ipld-eth or go-ipld-btc plugin. The former approach requires a locally configured IPFS repository and can lead to lock contention on that repository if another IPFS process is configured and running at the same $IPFS_PATH. The latter bypasses the need for a configured IPFS repository/$IPFS_PATH and allows all Postgres write operations at a given block height to occur in a single transaction. Its only disadvantage is that, because it does not go through an IPFS node intermediary, it loses the direct ability to reach out to the block exchange for data not found locally.
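The single-transaction property of the direct approach can be sketched as follows. This is an illustration only: an in-memory SQLite database stands in for Postgres, the `blocks` and `cids` tables and their columns are simplified assumptions rather than the service's schema, and a SHA-256 hex digest stands in for a real CID.

```python
import hashlib
import sqlite3
from typing import List


def make_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the backing database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blocks (key TEXT PRIMARY KEY, data BLOB NOT NULL)")
    conn.execute("CREATE TABLE cids (height INTEGER NOT NULL, key TEXT NOT NULL)")
    conn.commit()
    return conn


def publish_block_direct(conn: sqlite3.Connection, height: int,
                         objects: List[bytes]) -> List[str]:
    """Write the data and index rows for one block height in one transaction."""
    keys = []
    with conn:  # commits on success, rolls back if anything inside raises
        for obj in objects:
            key = hashlib.sha256(obj).hexdigest()  # placeholder, not a real CID
            conn.execute(
                "INSERT OR IGNORE INTO blocks (key, data) VALUES (?, ?)", (key, obj)
            )
            conn.execute(
                "INSERT INTO cids (height, key) VALUES (?, ?)", (height, key)
            )
            keys.append(key)
    return keys
```

If any object at a height fails to write, the whole height is rolled back, so the index never points at data that was only partially stored.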

Once go-ipld-eth and go-ipld-btc have been updated to work with a modern version of PG-IPFS, an additional option will be provided to direct all publishing and fetching of IPLD objects through a remote IPFS daemon.