## Data Validation

- For a given table in the `ipld-eth-db` schema, we know the number of columns to be expected in each row of the data dump:

  | Table               | Expected columns |
  |---------------------|------------------|
  | `public.nodes`      | 5                |
  | `ipld.blocks`       | 3                |
  | `eth.header_cids`   | 16               |
  | `eth.state_cids`    | 8                |
  | `eth.storage_cids`  | 9                |
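As a quick sanity check before running the scripts below, the distribution of column counts in a dump file can be inspected directly. This is only an illustrative one-liner (not part of this repository's scripts); it assumes the dumps are plain comma-separated text with no embedded commas in field values, and the file name is just an example:

```bash
# Tally how many rows have each distinct number of comma-separated columns.
# For a clean eth.state_cids dump this should print a single line: "8 <row count>".
awk -F',' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' eth.state_cids.csv
```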
### Find Bad Data

- Run the following command to find any rows having an unexpected number of columns:

  ```bash
  ./scripts/find-bad-rows.sh -i <input-file> -c <expected-columns> -o [output-file] -d [include-data]
  ```

  - `input-file` (`-i`): Input data file path
  - `expected-columns` (`-c`): Expected number of columns in each row of the input file
  - `output-file` (`-o`): Output destination file path (default: `STDOUT`)
  - `include-data` (`-d`): Whether to include the data row in the output (`true | false`) (default: `false`)
- The output is of the format: row number, number of columns, the data row

  Eg:

  ```bash
  ./scripts/find-bad-rows.sh -i eth.state_cids.csv -c 8 -o res.txt -d true
  ```

  Output:

  ```
  1 9 1500000,xxxxxxxx,0x83952d392f9b0059eea94b10d1a095eefb1943ea91595a16c6698757127d4e1c,,baglacgzasvqcntdahkxhufdnkm7a22s2eetj6mx6nzkarwxtkvy4x3bubdgq,\x0f,0,f,/blocks/,DMQJKYBGZRQDVLT2CRWVGPQNNJNCCJU7GL7G4VAI3LZVK4OL5Q2ARTI
  ```

- Eg:

  ```bash
  ./scripts/find-bad-rows.sh -i public.nodes.csv -c 5 -o res.txt -d true
  ./scripts/find-bad-rows.sh -i ipld.blocks.csv -c 3 -o res.txt -d true
  ./scripts/find-bad-rows.sh -i eth.header_cids.csv -c 16 -o res.txt -d true
  ./scripts/find-bad-rows.sh -i eth.state_cids.csv -c 8 -o res.txt -d true
  ./scripts/find-bad-rows.sh -i eth.storage_cids.csv -c 9 -o res.txt -d true
  ```
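For illustration, a rough equivalent of the above check can be expressed as a plain `awk` one-liner. This is not necessarily how `find-bad-rows.sh` is implemented; it assumes comma-separated rows without embedded commas, and the expected column count and file name are example values:

```bash
# Print "<row number> <column count> <data row>" for each row whose column
# count differs from the expected value (8 columns for eth.state_cids).
awk -F',' -v expected=8 'NF != expected { print NR, NF, $0 }' eth.state_cids.csv > res.txt
```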
## Data Cleanup

- In case of column count mismatch, data from `file` mode dumps can't be readily imported into `ipld-eth-db`.
### Filter Bad Data

- Run the following command to filter out rows having an unexpected number of columns:

  ```bash
  ./scripts/filter-bad-rows.sh -i <input-file> -c <expected-columns> -o <output-file>
  ```

  - `input-file` (`-i`): Input data file path
  - `expected-columns` (`-c`): Expected number of columns in each row of the input file
  - `output-file` (`-o`): Output destination file path

  Eg:

  ```bash
  ./scripts/filter-bad-rows.sh -i public.nodes.csv -c 5 -o cleaned-public.nodes.csv
  ./scripts/filter-bad-rows.sh -i ipld.blocks.csv -c 3 -o cleaned-ipld.blocks.csv
  ./scripts/filter-bad-rows.sh -i eth.header_cids.csv -c 16 -o cleaned-eth.header_cids.csv
  ./scripts/filter-bad-rows.sh -i eth.state_cids.csv -c 8 -o cleaned-eth.state_cids.csv
  ./scripts/filter-bad-rows.sh -i eth.storage_cids.csv -c 9 -o cleaned-eth.storage_cids.csv
  ```
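Similarly, the filtering step can be sketched as an `awk` one-liner that keeps only rows with the expected column count. Again, this is illustrative rather than the script's actual implementation, and it assumes no embedded commas in the data:

```bash
# Keep only rows with exactly the expected number of comma-separated columns
# (5 columns for public.nodes) and write them to a cleaned copy.
awk -F',' -v expected=5 'NF == expected' public.nodes.csv > cleaned-public.nodes.csv
```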