Write state diffs directly to Postgres #30
No reviewers
Labels
No Label
bug
critical
duplicate
enhancement
epic
help wanted
in progress
invalid
low priority
question
rebase
v1
v5
wontfix
Copied from Github
Kind/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cerc-io/go-ethereum#30
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "statediff-into-postgres"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This provides a new RPC method
statediff_WriteStateDiffAt
to write diffs directly into PostgreSQL upon client request (per https://github.com/vulcanize/eth-statediff-service/issues/11). To do this, I copied over and refactored some of the tooling from https://github.com/vulcanize/ipld-eth-indexer.@i-norden How should we merge the code hash & storage output from https://github.com/vulcanize/go-ethereum/pull/27? I suppose it depends on how that data is consumed.
More minor questions:
I still need a chance to take a closer look at the code but so far this looks great! In regards to those questions:
index*CID
tomerge*CID
orupsert*CID
, to be more directSooner rather than later, please.
Will try to finish up my review today @roysc
In regards to the CLI vs .toml config- I don't have a strong preference but for the sake of usability inside of a docker container in staging environments it is often more convenient to be able to pass config params in by CLI flag or env variable than by .toml file.
I might challenge the idea we should merge eth-statediff-service into here, as the standalone service binary is much smaller and quicker to build and it allows us to provide focused top-level README documentation specific to that service. But I think I could be convinced of the advantages of merging in here.
Alright, I incorporated the SQL output tests from
ipld-eth-indexer
and otherwise this is pretty much covered by the existing test suite since it's behind the original Builder interface.What remains -
ipld-eth-indexer
to test it@AFDudley I can start on concurrency as soon as we merge this.
Looks great! The main thing is that we need to add a new service loop that automatically writes diffs at the head of chain. Only a couple small comments aside from that.
Eventually we'll want to figure out a better way of sharing packages between this and ipld-eth-indexer (use a 3rd repo for some stuff).
And eventually we should convert these ported tests away from ginkgo/gomega since go-ethereum uses the standard lib test package.
I think it'd be better to define a new StateDiffParams struct or pass the three arguments explicitly rather than this dbParams array pointer, especially since it shows up as an argument in more than one function.
@ -0,0 +22,4 @@
// Env variables
const (
DATABASE_NAME = "DATABASE_NAME"
Ideally we would pass individual db connection string components in by CLI flag in a manner that is overridable by these env variables
We need another method that we turn on by CLI param, which automatically writes statediffs to Postgres as they stream off the chainEvents channel of the syncing chain. This will look almost exactly like
Loop()
method on line 154, except instead of sending the diff to subscribers it writes directly to Postgres.@ -0,0 +22,4 @@
// Env variables
const (
DATABASE_NAME = "DATABASE_NAME"
Note: this doesn't need to be resolved now
Definitely - this was a bit of a punt, meant to write a TODO
Makes sense - would that just be triggered via CLI or also toggleable by an RPC call?
Cool cool yeah this doesn't need to be done now
I think just turn it on by CLI is fine for now
@roysc Btw I don't think we should try to port the concurrency stuff into here at this time, there will be data races with the regular ongoing sync processes plus the statediff service needs to operate through the same StateDB object as the BlockChain to utilize it's in-memory cache. This is another reason to maintain the separate repos IMO.
Rather, I think we should port these changes into eth-statediff-service and in doing so utilize the concurrent node iterator used there.
Hey @roysc, tested this locally and it appears to be working!
Using
./build/bin/geth --syncmode=full --gcmode=archive --statediff --statediff.writing --statediff.db=postgres://localhost:5432/vulcanize_testing?sslmode=disable --statediff.dbnodeid=yeee --statediff.dbclientname=yooo
Seeing results in every cid table (including uncles and receipts)
Also working without
--gcmode=archive
Although there is this error when shutting it down in this mode:
ERROR[11-04|11:06:19.592] Dangling trie nodes after full cleanup
, which I think may actually be appropriate in our case since we purposely prevent dereferencing of nodes we have not diffed yet.Noting some changes that don't necessarily need to be made in this PR.
We should change this to the Debug level
We should update all the ported packages to use Geth's log package
Similarly, we should update any tests we ported over to use the default test pkg that the rest of geth uses instead of ginko/gomega
@i-norden Refactored the logging and tests - (btw. noticed we have the gomega module vendored in, guessing we can remove it).
@roysc yes please get rid of the vendor/ dir
LGTM. I think we should go ahead and merge this so we can cut a release and apply the rebase on top. Running it locally things look good and I have confirmed the codehash=>code is being stored correctly. There is a pretty big lag between the head in the Postgres db and the head of the current block import but I'm not surprised considering how quickly it imports chunks of blocks while catching up to head and the limiting resources of my computer.