# eth-statediff-service
[![Go Report Card](https://goreportcard.com/badge/github.com/cerc-io/eth-statediff-service)](https://goreportcard.com/report/github.com/cerc-io/eth-statediff-service)

A standalone statediffing service that runs directly on top of a `go-ethereum` LevelDB instance.
This service can serve historical state data over the same RPC interface as
[statediffing geth](https://github.com/cerc-io/go-ethereum) without needing to run a full node.
## Setup

Configure access to the private Git server at `git.vdb.to`, then build the executable:
```bash
go build .
```
## Configuration
See [./environments/example.toml](./environments/example.toml) for an annotated example config file.
### Local Setup
* Create a chain config file `chain.json` that matches the chain config in the genesis JSON file used by the local geth instance.
Example:
```json
{
  "chainId": 41337,
  "homesteadBlock": 0,
  "eip150Block": 0,
  "eip150Hash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "eip155Block": 0,
  "eip158Block": 0,
  "byzantiumBlock": 0,
  "constantinopleBlock": 0,
  "petersburgBlock": 0,
  "istanbulBlock": 0,
  "clique": {
    "period": 5,
    "epoch": 30000
  }
}
```
Provide the path to the above file in the config.
## Usage
* Create / update the config file (refer to example config above).
### `serve`
* To serve the statediff RPC API:
```bash
./eth-statediff-service serve --config=<config path>
```
Example:
```bash
./eth-statediff-service serve --config environments/config.toml
```
* Available RPC methods:
  * `statediff_stateTrieAt()`
  * `statediff_streamCodeAndCodeHash()`
  * `statediff_stateDiffAt()`
  * `statediff_writeStateDiffAt()`
  * `statediff_writeStateDiffsInRange()`
Example:
```bash
curl -X POST -H 'Content-Type: application/json' --data '{
    "jsonrpc": "2.0",
    "method": "statediff_writeStateDiffsInRange",
    "params": [0, 1, {
        "includeBlock": true,
        "includeReceipts": true,
        "includeTD": true,
        "includeCode": true
      }
    ],
    "id": 1
}' "$HOST":"$PORT"
```
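When the same request has to be issued for many block ranges, the request body can be generated rather than typed by hand. The following is a minimal sketch; the function name `build_range_request` and its argument order are illustrative assumptions, not part of the service, and the four `include*` flags are simply copied from the example above.

```bash
# Sketch: print the JSON-RPC request body for statediff_writeStateDiffsInRange.
# build_range_request is a hypothetical helper; it is not shipped with the service.
# Arguments: start block, stop block, optional request id (defaults to 1).
build_range_request() {
  local start="$1" stop="$2" id="${3:-1}"
  printf '{"jsonrpc":"2.0","method":"statediff_writeStateDiffsInRange","params":[%d,%d,{"includeBlock":true,"includeReceipts":true,"includeTD":true,"includeCode":true}],"id":%d}\n' \
    "$start" "$stop" "$id"
}

# Usage (assumes HOST and PORT point at a running service):
# curl -X POST -H 'Content-Type: application/json' \
#   --data "$(build_range_request 0 1)" "$HOST":"$PORT"
```

Using `%d` for the block numbers makes `printf` reject non-numeric input, which catches simple mistakes before the request is sent.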
* Prerun:
  * The process can be configured locally with sets of ranges to process as a "prerun", ahead of
    any processing directed by the server endpoints.
  * Enable this by setting `statediff.prerun = true` in the config and defining
    ranges and params in the `prerun` section of the config.
  * Set the range using `prerun.start` and `prerun.stop`. Use `prerun.ranges` if a prerun over more
    than one range is required.
* NOTE: Currently, `params.includeTD` must be set to / passed as `true`.
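As a rough illustration of the prerun settings named above, a config fragment might look like the following. This is a sketch under assumptions: the table layout is inferred from the dotted key names, the placement of `params` under `prerun` and the list-of-pairs shape of `ranges` are guesses, and the block numbers are arbitrary. The annotated `./environments/example.toml` remains the authoritative reference.

```toml
# Sketch only: layout and value shapes are assumptions, check environments/example.toml.
[statediff]
prerun = true            # turn the prerun on

[prerun]
start = 0                # single range: first block (arbitrary value)
stop = 100               # single range: last block (arbitrary value)
# ranges = [[0, 1000], [2000, 3000]]   # assumed shape when more than one range is needed

[prerun.params]
includeTD = true         # currently must be true
```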
## Monitoring
* Enable metrics using the config parameters `prom.metrics` and `prom.http`.
* `eth-statediff-service` exposes the following Prometheus metrics at the `/metrics` endpoint:
  * `ranges_queued`: Number of range requests currently queued.
  * `loaded_height`: The last block that was loaded for processing.
  * `processed_height`: The last block that was processed.
  * `stats.t_block_load`: Block loading time.
  * `stats.t_block_processing`: Block (header, uncles, txs, rcts, tx trie, rct trie) processing time.
  * `stats.t_state_processing`: State (state trie, storage tries, and code) processing time.
  * `stats.t_postgres_tx_commit`: Postgres tx commit time.
  * `http.count`: HTTP request count.
  * `http.duration`: HTTP request duration.
  * `ipc.count`: Unix socket connection count.
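To watch a single metric from the list above, the `/metrics` output can be filtered on the command line. The sketch below assumes the endpoint returns the usual Prometheus text format (sample lines plus `# HELP` / `# TYPE` comment lines). The helper name `filter_metrics` is hypothetical, and because the exported metric names may carry a prefix or a different separator than shown in the list, it matches on a name fragment rather than an exact name.

```bash
# Sketch: keep only sample lines whose text contains the given fragment.
# Reads Prometheus text-format output on stdin and drops comment lines.
# filter_metrics is a hypothetical helper, not part of the service.
filter_metrics() {
  grep -v '^#' | grep -- "$1"
}

# Usage (PROM_ADDR is a placeholder for the address configured for metrics):
# curl -s "$PROM_ADDR/metrics" | filter_metrics processed_height
```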
## Tests
* Run unit tests:
```bash
make test
```
## Import output data in file mode into a database
* When `eth-statediff-service` is run in file mode (`database.type`: `file`), the output takes the form of a SQL
  file or multiple CSV files.
### SQL
* The steps below assume the output files are located in the host's `./output_dir` directory.
* Create a directory to store post-processed output:
```bash
mkdir -p output_dir/processed_output
```
* (Optional) Get row counts in the output:
```bash
wc -l output_dir/statediff.sql > output_stats.txt
```
* De-duplicate data:
```bash
sort -u output_dir/statediff.sql -o output_dir/processed_output/deduped-statediff.sql
```
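To see how much the de-duplication step actually removed, the line counts before and after can be compared. The helper below is a sketch: the name `dedup_with_report` is hypothetical, and it wraps the same `sort -u` call shown above. Note that `sort -u` also reorders lines, so this assumes each line of the file is an independent statement.

```bash
# Sketch: de-duplicate a file with sort -u and report how many lines were dropped.
# dedup_with_report is a hypothetical helper; arguments are input and output paths.
dedup_with_report() {
  local src="$1" dst="$2" before after
  before=$(wc -l < "$src")
  sort -u "$src" -o "$dst"
  after=$(wc -l < "$dst")
  # Arithmetic expansion also strips the padding some wc implementations add.
  echo "before=$((before)) after=$((after)) removed=$((before - after))"
}

# Usage:
# dedup_with_report output_dir/statediff.sql output_dir/processed_output/deduped-statediff.sql
```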
* Copy the post-processed output files to the DB server (say, to `/output_dir`).
* Run the following to import data:
```bash
psql -U <DATABASE_USER> -h <DATABASE_HOSTNAME> -p <DATABASE_PORT> <DATABASE_NAME> --set ON_ERROR_STOP=on -f /output_dir/processed_output/deduped-statediff.sql
```
### CSV
* Create an env file with the required variables. Refer to [.sample.env](./scripts/.sample.env).
* (Optional) Get row counts in the output:
```bash
./scripts/count-lines.sh <ENV_FILE_PATH>
```
* De-duplicate data:
```bash
./scripts/dedup.sh <ENV_FILE_PATH>
```
* Perform column checks:
```bash
./scripts/check-columns.sh <ENV_FILE_PATH>
```
Check the output logs for any rows detected with an unexpected number of columns.
Example:
```bash
# log
eth.header_cids
Start: Wednesday 21 September 2022 06:00:38 PM IST
Time taken: 00:00:05
End: Wednesday 21 September 2022 06:00:43 PM IST
Total bad rows: 1 ./check-columns/eth.header_cids.txt
# bad row output
# line number, num. of columns, data
23 17 22,xxxxxx,0x07f5ea5c94aa8dea60b28f6b6315d92f2b6d78ca4b74ea409adeb191b5a114f2,0x5918487321aa57dd0c50977856c6231e7c4ee79e95b694c7c8830227d77a1ecc,bagiacgzaa726uxeuvkg6uyfsr5vwgfozf4vw26gkjn2ouqe232yzdnnbctza,45,geth,0,0xad8fa8df61b98dbda7acd6ca76d5ce4cbba663d5f608cc940957adcdb94cee8d,0xc621412320a20b4aaff5363bdf063b9d13e394ef82e55689ab703aae5db08e26,0x71ec1c7d81269ce115be81c81f13e1cc2601c292a7f20440a77257ecfdc69940,0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347,\x2000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000000,1658408419,/blocks/DMQAP5PKLSKKVDPKMCZI623DCXMS6K3NPDFEW5HKICNN5MMRWWQRJ4Q,1,0x0000000000000000000000000000000000000000
```
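The idea behind the column check can be sketched with `awk`. This is not the logic of `check-columns.sh` itself, only an illustration that produces the same "line number, num. of columns, data" shape as the bad row output above; the helper name `find_bad_rows` is hypothetical. It splits on every comma, so a quoted field that contains commas would be miscounted.

```bash
# Sketch: list CSV rows whose field count differs from the expected count.
# Output per bad row: line number, number of columns, original line.
# find_bad_rows is a hypothetical helper; it does not handle quoted commas.
find_bad_rows() {
  local file="$1" expected="$2"
  awk -F',' -v n="$expected" 'NF != n { print NR, NF, $0 }' "$file"
}

# Usage (file name and column count are placeholders):
# find_bad_rows <CSV_FILE_PATH> <EXPECTED_COLUMN_COUNT>
```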
* Import data using `timescaledb-parallel-copy`:
  (requires a [`timescaledb-parallel-copy`](https://github.com/timescale/timescaledb-parallel-copy) installation; it is included in the TimescaleDB Docker image)
```bash
./scripts/timescaledb-import.sh <ENV_FILE_PATH>
```
* NOTE: The `COPY` command on CSVs inserts empty strings as `NULL` in the DB. Passing `FORCE_NOT_NULL <COLUMN_NAME>` forces it to insert empty strings instead. This is required to keep the imported statediff data compatible with the data generated in `postgres` mode. Reference: https://www.postgresql.org/docs/14/sql-copy.html
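For a manual import of a single CSV file, the `FORCE_NOT_NULL` option goes in the option list of the PostgreSQL `COPY` statement. The sketch below only prints such a statement so it can be reviewed before being run through `psql`. The helper name `copy_statement` is hypothetical, as are the file path and column names in the usage line, and it is not how `timescaledb-import.sh` is implemented.

```bash
# Sketch: print a COPY statement that keeps empty CSV strings as empty strings
# for the listed columns instead of converting them to NULL.
# copy_statement is a hypothetical helper.
# Arguments: table name, CSV path on the DB server, comma-separated column list.
copy_statement() {
  local table="$1" file="$2" columns="$3"
  printf "COPY %s FROM '%s' WITH (FORMAT csv, FORCE_NOT_NULL (%s));\n" \
    "$table" "$file" "$columns"
}

# Usage (path and column names are placeholders):
# copy_statement eth.header_cids /output_dir/eth.header_cids.csv 'col_a, col_b'
```

`COPY ... FROM '<file>'` reads the file on the database server, which matches the earlier step of copying the output files to the DB server.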
### Stats
The binary includes a `stats` command that reports stats for the offline or remote LevelDB.

At this time, the only supported stat is the latest/highest block height and hash found in the LevelDB. This is
useful for determining the upper limit for a standalone statediffing process on a given LevelDB.

`./eth-statediff-service stats --config={path to toml config file}`