Parallelizable statediffing process that extracts from an offline levelDB

eth-statediff-service

A standalone statediffing service that runs directly on top of a go-ethereum LevelDB instance. It serves historical state data over the same RPC interface as statediffing geth, without needing to run a full node.

Setup

Configure access to the private Git server at git.vdb.to, then build the executable:

go build .

Configuration

See ./environments/example.toml for an annotated example config file.

Note: previous versions of this service used different variable names. To update, change the following:

  • LVLDB_MODE, LVLDB_PATH, LVLDB_ANCIENT, LVLDB_URL => LEVELDB_*
  • LOG_FILE_PATH => LOG_FILE
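
The renames above can be applied to an existing env file mechanically. The following is a minimal sketch, assuming the file holds one `NAME=value` assignment per line (optionally prefixed with `export`); the function name `migrate_env_names` is hypothetical and not part of this repository.

```shell
# Print an env file with the legacy variable names rewritten to the new ones.
# $1 = path to the existing env file. The file itself is not modified.
# Assumption: one NAME=value assignment per line, optionally after "export".
migrate_env_names() {
  sed -E \
    -e 's/^(export[[:space:]]+)?LVLDB_(MODE|PATH|ANCIENT|URL)=/\1LEVELDB_\2=/' \
    -e 's/^(export[[:space:]]+)?LOG_FILE_PATH=/\1LOG_FILE=/' \
    "$1"
}
```

For example, `migrate_env_names old.env > new.env` writes the updated file, where both file names are placeholders. Review the result before using it, since only the names listed above are rewritten.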

Local Setup

  • Create a chain config file, chain.json, that matches the chain config in the genesis JSON file used by the local geth instance.

    Example:

    {
      "chainId": 41337,
      "homesteadBlock": 0,
      "eip150Block": 0,
      "eip150Hash": "0x0000000000000000000000000000000000000000000000000000000000000000",
      "eip155Block": 0,
      "eip158Block": 0,
      "byzantiumBlock": 0,
      "constantinopleBlock": 0,
      "petersburgBlock": 0,
      "istanbulBlock": 0,
      "clique": {
        "period": 5,
        "epoch": 30000
      }
    }
    

    Provide the path to the above file in the config.
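
If the genesis file is at hand, the chain config can be copied out of it instead of being written by hand. This is a sketch, assuming the genesis JSON keeps the chain config under its top-level `config` key (as geth genesis files do) and that `jq` is installed; the function name and file paths are illustrative.

```shell
# Write the "config" object of a geth genesis file to a chain config file.
# $1 = path to the genesis JSON, $2 = output path (for example chain.json).
# Requires jq.
extract_chain_config() {
  jq '.config' "$1" > "$2"
}
```

Usage: `extract_chain_config genesis.json chain.json`.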

Usage

  • Create / update the config file (refer to example config above).

serve

  • To serve the statediff RPC API:

    ./eth-statediff-service serve --config=<config path>
    

    Example:

    ./eth-statediff-service serve --config environments/config.toml
    
  • Available RPC methods:

    • statediff_stateTrieAt()
    • statediff_streamCodeAndCodeHash()
    • statediff_stateDiffAt()
    • statediff_writeStateDiffAt()
    • statediff_writeStateDiffsInRange()

    Example:

    curl -X POST -H 'Content-Type: application/json' --data '{
      "jsonrpc": "2.0",
      "method": "statediff_writeStateDiffsInRange",
      "params": [0, 1, {
          "includeBlock": true,
          "includeReceipts": true,
          "includeTD": true,
          "includeCode": true
        }
      ],
      "id": 1
    }' "$HOST":"$PORT"
    
  • Prerun:

    • The service can be configured with sets of block ranges to process as a "prerun", before it handles processing directed by the server endpoints.
    • To enable it, turn "prerun" on in the config (statediff.prerun = true) and define the ranges and params in the prerun section of the config.
    • Set a single range using prerun.start and prerun.stop. Use prerun.ranges to prerun more than one range.
  • NOTE: Currently, params.includeTD must be true, whether it is set in the config or passed in an RPC request.
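
The statediff_writeStateDiffsInRange request shown above can be wrapped for arbitrary block ranges. The sketch below reuses the same params, with includeTD fixed to true as the note requires; the function names are hypothetical, and HOST and PORT are assumed to point at a running `serve` instance.

```shell
# Print a JSON-RPC request body for statediff_writeStateDiffsInRange.
# $1 = first block, $2 = last block (decimal integers).
build_range_request() {
  printf '{"jsonrpc":"2.0","method":"statediff_writeStateDiffsInRange","params":[%d,%d,{"includeBlock":true,"includeReceipts":true,"includeTD":true,"includeCode":true}],"id":1}\n' "$1" "$2"
}

# POST the request body to the service. HOST and PORT must be set by the caller.
write_state_diffs_in_range() {
  build_range_request "$1" "$2" |
    curl -sS -X POST -H 'Content-Type: application/json' --data @- "$HOST:$PORT"
}
```

Usage: `HOST=localhost PORT=8545 write_state_diffs_in_range 0 1`, where the host and port values are placeholders for the address the service listens on.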

Monitoring

  • Enable metrics using the config parameters prom.metrics and prom.http.
  • eth-statediff-service exposes the following Prometheus metrics at the /metrics endpoint:
    • ranges_queued: Number of range requests currently queued.
    • loaded_height: The last block that was loaded for processing.
    • processed_height: The last block that was processed.
    • stats.t_block_load: Block loading time.
    • stats.t_block_processing: Block (header, uncles, txs, rcts, tx trie, rct trie) processing time.
    • stats.t_state_processing: State (state trie, storage tries, and code) processing time.
    • stats.t_postgres_tx_commit: Postgres tx commit time.
    • http.count: HTTP request count.
    • http.duration: HTTP request duration.
    • ipc.count: Unix socket connection count.
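
To spot-check one of the metrics above from a terminal, the /metrics output can be filtered by name. This is a sketch only: the README does not show the exact exported metric names or the listen address, so the filter matches on a substring of the name and `METRICS_URL` is a placeholder for the address configured for prom.http.

```shell
# Read Prometheus text-format output on stdin and keep the sample lines
# (not "# HELP" / "# TYPE" comments) whose metric name contains $1.
filter_metrics() {
  awk -v pat="$1" '!/^#/ && index($1, pat) > 0'
}

# Usage (METRICS_URL is a placeholder, not a value defined by this service):
#   curl -sS "$METRICS_URL/metrics" | filter_metrics processed_height
```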

Tests

  • Run unit tests:

    make test
    

Import output data in file mode into a database

  • When eth-statediff-service is run in file mode (database.type: file), the output is either a single SQL file or multiple CSV files.

SQL

  • The steps below assume the output files are located in the host's ./output_dir directory.

  • Create a directory to store post-processed output:

    mkdir -p output_dir/processed_output
    
  • (Optional) Get row counts in the output:

    wc -l output_dir/statediff.sql > output_stats.txt
    
  • De-duplicate data:

    sort -u output_dir/statediff.sql -o output_dir/processed_output/deduped-statediff.sql
    
  • Copy over the post-processed output files to the DB server (say in /output_dir).

  • Run the following to import data:

    psql -U <DATABASE_USER> -h <DATABASE_HOSTNAME> -p <DATABASE_PORT> <DATABASE_NAME> --set ON_ERROR_STOP=on -f /output_dir/processed_output/deduped-statediff.sql
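
The optional count and the de-duplication step above can be combined to see how many duplicate lines were dropped before importing. A minimal sketch, with a hypothetical function name; like the plain `sort -u` step, it treats each line of the SQL file as one statement.

```shell
# De-duplicate a statediff SQL file and report how many lines were dropped.
# $1 = input file, $2 = output file.
dedup_with_report() {
  sort -u "$1" -o "$2"
  before=$(wc -l < "$1")
  after=$(wc -l < "$2")
  echo "removed $((before - after)) duplicate lines"
}
```

Usage: `dedup_with_report output_dir/statediff.sql output_dir/processed_output/deduped-statediff.sql`.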
    

CSV

  • Create an env file with the required variables. Refer to .sample.env.

  • (Optional) Get row counts in the output:

    ./scripts/count-lines.sh <ENV_FILE_PATH>
    
  • De-duplicate data:

    ./scripts/dedup.sh <ENV_FILE_PATH>
    
  • Perform column checks:

    ./scripts/check-columns.sh <ENV_FILE_PATH>
    

    Check the output logs for any rows detected with an unexpected number of columns.

    Example:

    # log
    eth.header_cids
    Start: Wednesday 21 September 2022 06:00:38 PM IST
    Time taken: 00:00:05
    End: Wednesday 21 September 2022 06:00:43 PM IST
    Total bad rows: 1 ./check-columns/eth.header_cids.txt
    
    # bad row output
    # line number, num. of columns, data
    23 17 22,xxxxxx,0x07f5ea5c94aa8dea60b28f6b6315d92f2b6d78ca4b74ea409adeb191b5a114f2,0x5918487321aa57dd0c50977856c6231e7c4ee79e95b694c7c8830227d77a1ecc,bagiacgzaa726uxeuvkg6uyfsr5vwgfozf4vw26gkjn2ouqe232yzdnnbctza,45,geth,0,0xad8fa8df61b98dbda7acd6ca76d5ce4cbba663d5f608cc940957adcdb94cee8d,0xc621412320a20b4aaff5363bdf063b9d13e394ef82e55689ab703aae5db08e26,0x71ec1c7d81269ce115be81c81f13e1cc2601c292a7f20440a77257ecfdc69940,0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347,\x2000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000000,1658408419,/blocks/DMQAP5PKLSKKVDPKMCZI623DCXMS6K3NPDFEW5HKICNN5MMRWWQRJ4Q,1,0x0000000000000000000000000000000000000000
    
  • Import data using timescaledb-parallel-copy:
    (requires timescaledb-parallel-copy to be installed; it ships with the TimescaleDB Docker image)

    ./scripts/timescaledb-import.sh <ENV_FILE_PATH>
    
  • NOTE: The COPY command imports empty strings in CSV files as NULL. Passing FORCE_NOT_NULL <COLUMN_NAME> makes it insert empty strings instead. This is required to keep the imported statediff data compatible with the data generated in postgres mode. Reference: https://www.postgresql.org/docs/14/sql-copy.html
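
The idea behind the column check step above can be illustrated with a short awk filter. This is not the implementation of ./scripts/check-columns.sh, only a sketch that prints rows in the same "line number, num. of columns, data" shape as the bad row output; it assumes fields contain no quoted commas, and the function name is hypothetical.

```shell
# Print every row of a CSV file whose column count differs from the expected one.
# $1 = CSV file, $2 = expected number of columns.
# Output per bad row: line number, number of columns, row data.
check_column_count() {
  awk -F',' -v want="$2" 'NF != want { print NR, NF, $0 }' "$1"
}
```

An empty result means every row has the expected number of columns.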

Stats

The binary includes a stats command which reports stats for the offline or remote levelDB.

At this time, the only supported stat is the latest/highest block height and hash found in the levelDB. This is useful for determining the upper limit for a standalone statediffing process on a given levelDB.

./eth-statediff-service stats --config={path to toml config file}