WIP: add rpc pubsub interface for account diffs #11
Reference: cerc-io/go-ethereum#11
Looking for early feedback on this very early stage WIP PR.
The idea is to have a minimal rpc pubsub interface for emitting account diffs, so that we could try to incrementally upstream things like storage diffs, proofs, etc.
I structured the approach based on my understanding of how the logs pubsub rpc interface works: basically, the `BlockChain` collects all logs created during block processing and posts them to the `EventSystem`, which is responsible for filtering down to relevant logs and notifying subscribers.

Still need to figure out how the `eth/filters/api.go` functions are called, which should enable testing this out locally. After that I'd like to add in some metadata (block number/hash) and enable clients to filter down to specific watched accounts.

Haven't had a chance to review in detail, but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node, which needs headers, transactions, receipts, state and storage diffs, and has to maintain the relations between all of that data?
Hey @i-norden - not at all! If anything, I'd say we're more likely to scrap this 🙂
My main motivation with this PR is just to see if we might be able to reduce the diff and stage the PRs incrementally, with the ultimate goal of merging in all of your work. The approach here is a little bit different than the statediffing service only because this seemed like the absolute minimum needed if all we care about are account diffs.
That said I haven't had a chance to run this code or write tests, so I was hoping to pick your and Elizabeth's brain about whether it looked like this approach could plausibly bear fruit.
Definitely no worries if you don't have the time to give this a thorough review, but I'd appreciate any thoughts you can spare!
Overall this seems like it could be a cool way to implement this. A couple of questions I have:
```diff
@@ -517,6 +517,10 @@ func (fb *filterBackend) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscr
 	return fb.bc.SubscribeLogsEvent(ch)
 }
+func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {
```
Is the reason we're adding `SubscribeStateDiffs` in a bunch of places that they all implement the `Backend` interface? It's confusing to know which backend we'll actually want to use for VDB. 🤔
Not sure if I'm reading this method correctly, but it seems like everything above this point would end up returning an empty collection of stateDiffs, is that correct?
I think that this way of determining state diffs is really cool, and it seems like it could be more efficient than using the trie iterator. That said, at the moment I'm a bit wary of using it over the trie iterator, because I'm worried we may be missing something. This worry is almost entirely because I don't fully understand it yet, but I wonder if there's a way we could compare the dirty state diffs this returns against what the trie iterator returns, to make sure the results are the same.
Also, is `GetDirtyAccounts` called before or after the block is finalized? I wonder if that could have any implications on what is considered a dirty account.

```diff
@@ -29,6 +29,7 @@ import (
 	"github.com/ethereum/go-ethereum/common"
```
I'm not sure I understand what the `filters` package is for - is it just another way to subscribe to new events in the chain? I think I've previously only interacted with new event subscriptions directly from the `blockchain` package.

```diff
@@ -368,6 +399,10 @@ func (es *EventSystem) broadcast(filters filterIndex, ev interface{}) {
 }
```
How come we are adding `stateDiffs` to `subscribeLogs`? Is it because it's in the `subscription` struct, but it can't be nil, so we're adding an empty map?
Yeah, that's accurate - I'm trying to follow the pattern that's used for logs, where the code returns the empty `coalescedLogs` until it reaches the line directly above this one, where `logs` are appended to the result set.

A bit of context: basically this entire approach is designed to mimic my understanding of how the logs subscription works. Walking through it, my understanding is that the `BlockChain` collects everything emitted during block processing and posts it to the `EventSystem`, which filters down to what's relevant for each subscription and notifies subscribers.
So, big picture, my goal was to do the same thing - hook into the state processor to return all modified accounts, and enable the rpc subscription to apply arbitrary filters.
Definitely down for comparing results to make sure we're not missing anything 👍
I think that this approach should be at least as reliable as a pubsub rpc subscription to filtered logs, but am not sure whether that meets our requirements (have heard anecdotes about that subscription failing to return desired data).
In terms of efficiency, I do think there's some additional cost to the sync process in iterating through all the modified accounts again (something that also happens when state changes are committed after block processing), so maybe it's a good idea to hook into that call and derive modified accounts from there? 🤔
Right now, `GetDirtyAccounts` is called after the block is finalized but before the state is committed (since committing clears the cache). I think all accounts involved in a transaction should be marked as dirty as transactions are applied, but that's a little bit hard to trace, so I'll see if I can dig in and document that thoroughly.

```diff
@@ -517,6 +517,10 @@ func (fb *filterBackend) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscr
 	return fb.bc.SubscribeLogsEvent(ch)
 }
+func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {
```
Yeah, you got it. I think we want to (and do) use the `EthApiBackend`, with a subscription to a full (non-fast-syncing) node where all transactions are applied/verified.

```diff
@@ -368,6 +399,10 @@ func (es *EventSystem) broadcast(filters filterIndex, ev interface{}) {
 }
```

Yeah, just following the pattern here - it seems like every subscription creates empty channels for all the things it's not subscribing to.
```diff
@@ -29,6 +29,7 @@ import (
 	"github.com/ethereum/go-ethereum/common"
```
I think it's designed to provide a public API for the blockchain's internal subscriptions, but one big question I've had is how functions in this namespace (like `SubscribeNewHeads`) are actually called - we'll definitely need to sort that out before we can upstream.

Thanks for taking a look @elizabethengelman - and sorry for my novella-level responses! 😂
The basic idea is that we could indeed support a subscription in the API without needing to configure it via special CLI flags, and I'd like to have a storage diffs PR ready to roll in short order after this is submitted. Hoping that will just be a matter of adding a function similar to `GetDirtyAccounts` (or returning more data from `statedb.Commit` if we end up deriving the data from there - an idea I'm inclined to pursue).

```diff
@@ -29,6 +29,7 @@ import (
 	"github.com/ethereum/go-ethereum/common"
```
It looks like the filters api is being loaded under the "eth" namespace here. So I think you would call it with an `eth_subscribe` request, or use the `EthSubscribe` method since it is under that namespace.
Hey @rmulhol, thanks for the quick response! And I really like this approach - it is much cleaner than the separate service, and I think much more likely to be accepted into the main branch. My only concern is that, in order to recapitulate the statediff service in full, we end up adding more complexity with this approach than with the service. That concern is based on the assumption that we create a separate subscription for storage diffs and would then also need separate subscriptions for the rest of the block data (I believe there are already subscriptions for headers and transactions, but not one for receipts), which would mean upwards of five subscriptions to recapitulate the single subscription. Splitting the data up across subscriptions also means additional checks on the seed node's end to match the data back together again.
@i-norden that makes sense. I think for the interim we should probably set up the seed node with your fork, and then see what we can get upstreamed to determine whether we eventually want to make the switch. If we were going to get there incrementally, I would expect it to entail just one or a small number of subscriptions that include all the things. So next up I was thinking storage diffs, and I'm hoping it'll be possible to make one subscription that covers state and storage diffs filtered by address.
Pull request closed