WIP: add rpc pubsub interface for account diffs #11

Closed
rmulhol wants to merge 2 commits from pubsub-rpc-account-diffing into master
rmulhol commented 2019-09-11 21:10:17 +00:00 (Migrated from github.com)

Looking for early feedback on this very early stage WIP PR.

The idea is to have a minimal rpc pubsub interface for emitting account diffs, so that we could try to incrementally upstream things like storage diffs, proofs, etc.

I structured the approach based on how I understand the logs pubsub rpc interface to be working - basically the BlockChain collects all logs created during block processing and posts them to the EventSystem, which is responsible for filtering down to relevant logs and notifying subscribers.

Still need to figure out how the eth/filters/api.go functions are called, which should enable testing this out locally. After that I'd like to add in some metadata (block number/hash) and enable clients to filter down to specific watched accounts.

Looking for early feedback on this very early stage WIP PR. The idea is to have a minimal rpc pubsub interface for emitting account diffs, so that we could try to incrementally upstream things like storage diffs, proofs, etc. I structured the approach based on how I understand the logs pubsub rpc interface to be working - basically the `BlockChain` collects all logs created during block processing and posts them to the `EventSystem`, which is responsible for filtering down to relevant logs and notifying subscribers. Still need to figure out how the `eth/filters/api.go` functions are called, which should enable testing this out locally. After that I'd like to add in some metadata (block number/hash) and enable clients to filter down to specific watched accounts.
i-norden reviewed 2019-09-11 21:10:17 +00:00
Member

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?
rmulhol commented 2019-09-12 01:11:40 +00:00 (Migrated from github.com)

Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data?

Hey @i-norden - not at all! If anything, I'd say we're more likely to scrap this 🙂

My main motivation with this PR is just to see if we might be able to reduce the diff and stage the PRs incrementally, with the ultimate goal of merging in all of your work. The approach here is a little bit different than the statediffing service only because this seemed like the absolute minimum needed if all we care about are account diffs.

That said I haven't had a chance to run this code or write tests, so I was hoping to pick your and Elizabeth's brain about whether it looked like this approach could plausibly bear fruit.

Definitely no worries if you don't have the time to give this a thorough review, but I'd appreciate any thoughts you can spare!

> Haven't had a chance to review in detail but some initial questions: I assume this means we are scrapping the statediff service? How does this fit with the seed node which needs headers, transactions, receipts, state and storage diffs and to maintain relation between all the data? Hey @i-norden - not at all! If anything, I'd say we're more likely to scrap this 🙂 My main motivation with this PR is just to see if we might be able to reduce the diff and stage the PRs incrementally, with the ultimate goal of merging in all of your work. The approach here is a little bit different than the statediffing service only because this seemed like the absolute minimum needed if all we care about are account diffs. That said I haven't had a chance to run this code or write tests, so I was hoping to pick your and Elizabeth's brain about whether it looked like this approach could plausibly bear fruit. Definitely no worries if you don't have the time to give this a thorough review, but I'd appreciate any thoughts you can spare!
elizabethengelman (Migrated from github.com) reviewed 2019-09-12 15:25:30 +00:00
elizabethengelman (Migrated from github.com) left a comment

Overall this seems like it could be a cool way to implement this. A couple of questions I have:

  • Instead of subscribing to chain events in the state diff loop, is the idea that we could instead subscribe to new state diffs, and use that as an indication that we should compute storage diffs?
  • I like the idea of breaking the state diff service up into smaller PRs to open upstream - hopefully that'll help the geth maintainers review them a bit easer. But, the one thing that I'm thinking is I wonder if this functionality (subscribing to state diffs) can/should stand on it's own, or if we should also include the storage diff computation in this PR as well. And now that I've typed that out, I think I've convinced myself that the storage diff piece could be a separate PR - being able to subscribe to changed accounts seems really valuable in and of itself. I guess I'd love to open a state diff PR and a storage diff PR in quick succession to add some weight to their utility. Also we need them both for mcd work. :)
Overall this seems like it could be a cool way to implement this. A couple of questions I have: - Instead of [subscribing to chain events in the state diff loop](https://github.com/vulcanize/go-ethereum/pull/7/files#diff-1e744335f1fa6400446dda18a4222dd1R109), is the idea that we could instead subscribe to new state diffs, and use that as an indication that we should compute storage diffs? - I like the idea of breaking the state diff service up into smaller PRs to open upstream - hopefully that'll help the geth maintainers review them a bit easer. But, the one thing that I'm thinking is I wonder if this functionality (subscribing to state diffs) can/should stand on it's own, or if we should also include the storage diff computation in this PR as well. And now that I've typed that out, I think I've convinced myself that the storage diff piece could be a separate PR - being able to subscribe to changed accounts seems really valuable in and of itself. I guess I'd love to open a state diff PR and a storage diff PR in quick succession to add some weight to their utility. Also we need them both for mcd work. :)
@ -517,6 +517,10 @@ func (fb *filterBackend) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscr
return fb.bc.SubscribeLogsEvent(ch)
}
func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {
elizabethengelman (Migrated from github.com) commented 2019-09-12 14:26:02 +00:00

Is the reason that we're adding SubscribeStateDiffs to a bunch of places is because they all implement the Backend interface?

It's confusing to know which backend we'll actually want to use for VDB. 🤔

Is the reason that we're adding `SubscribeStateDiffs` to a bunch of places is because they all implement the Backend interface? It's confusing to know which backend we'll actually want to use for VDB. 🤔
elizabethengelman (Migrated from github.com) commented 2019-09-12 13:44:02 +00:00

Not sure if I'm reading this method correctly, but it seems like everything above this point would end up returning an empty collection of stateDiffs, is that correct?

Not sure if I'm reading this method correctly, but it seems like everything above this point would end up returning an empty collection of stateDiffs, is that correct?
elizabethengelman (Migrated from github.com) commented 2019-09-12 14:08:43 +00:00

I think that this way of determining state diffs is really cool, and seems like it could be more efficient than using the trie iterator. Though at the moment I'm a bit weary about using this over the trie iterator, because I'm worried we may missing something. This worry is almost entirely because I don't fully understand it yet, but I wonder if there's a way we could compare the dirty state diffs this returns, vs what the trie iterator returns to make sure the results are the same.

Also, is GetDirtyAccounts called before or after the block is finalized? I wonder if that could have any implications on what is considered a dirty account.

I think that this way of determining state diffs is really cool, and seems like it could be more efficient than using the trie iterator. Though at the moment I'm a bit weary about using this over the trie iterator, because I'm worried we may missing something. This worry is almost entirely because I don't fully understand it yet, but I wonder if there's a way we could compare the dirty state diffs this returns, vs what the trie iterator returns to make sure the results are the same. Also, is `GetDirtyAccounts` called before or after the block is finalized? I wonder if that could have any implications on what is considered a dirty account.
@ -29,6 +29,7 @@ import (
"github.com/ethereum/go-ethereum/common"
elizabethengelman (Migrated from github.com) commented 2019-09-12 15:17:15 +00:00

I'm not sure I understand what the filters package is for - is it just another way to subscribe to new events in the chain? I think I've previously only interacted with new event subscriptions directly from the blockchain package.

I'm not sure I understand what the `filters` package is for - is it just another way to subscribe to new events in the chain? I think I've previously only interacted with new event subscriptions directly from the `blockchain` package.
@ -368,6 +399,10 @@ func (es *EventSystem) broadcast(filters filterIndex, ev interface{}) {
}
elizabethengelman (Migrated from github.com) commented 2019-09-12 14:43:08 +00:00

How come we are adding stateDiffs to the subscribeLogs? Is it because it's in the subscription struct, but it can't be nil so we're adding an empty map?

How come we are adding stateDiffs to the subscribeLogs? Is it because it's in the subscription struct, but it can't be nil so we're adding an empty map?
rmulhol (Migrated from github.com) reviewed 2019-09-12 15:55:08 +00:00
rmulhol (Migrated from github.com) commented 2019-09-12 15:55:07 +00:00

yeah that's accurate - I'm trying to follow the pattern that's used for logs, where the code returns the empty coalescedLogs until it reaches the line directly above this one where logs are appended to the result set.

A bit of context, basically this entire approach is designed to mimic my understanding of how the logs subscription works. Walking through it, my understanding is that:

So, big picture, my goal was to do the same thing - hook into the state processor to return all modified accounts, and enable the rpc subscription to apply arbitrary filters.

yeah that's accurate - I'm trying to follow the pattern that's used for logs, where the code returns the empty `coalescedLogs` until it reaches the line directly above this one where `logs` are appended to the result set. A bit of context, basically this entire approach is designed to mimic my understanding of how the logs subscription works. Walking through it, my understanding is that: - the event system in eth/filters/filter_system.go [subscribes to log events](https://github.com/vulcanize/go-ethereum/blob/master/eth/filters/filter_system.go#L134) - the eth api backend in that receives that call [creates a subscription on the blockchain](https://github.com/vulcanize/go-ethereum/blob/master/eth/api_backend.go#L159) - the blockchain [adds the subscription to its internal logs feed](https://github.com/vulcanize/go-ethereum/blob/master/core/blockchain.go#L2210) - that logs feed is [sent data](https://github.com/vulcanize/go-ethereum/blob/master/core/blockchain.go#L2009) when chain events are posted - [chain events are posted](https://github.com/vulcanize/go-ethereum/blob/master/core/blockchain.go#L1443) after data is inserted to the chain - inserting data into the chain and returning logs involves executing [the blockchain's state processor](https://github.com/vulcanize/go-ethereum/blob/master/core/blockchain.go#L1624) - the state processor that yields the logs is responsible for applying all transactions and [finalizing the block](https://github.com/vulcanize/go-ethereum/blob/master/core/state_processor.go#L79) - after the filter system receives _all_ logs, it is responsible for [filtering down to logs matching the subscription](https://github.com/vulcanize/go-ethereum/blob/master/eth/filters/filter_system.go#L329) So, big picture, my goal was to do the same thing - hook into the state processor to return all modified accounts, and enable the rpc subscription to apply arbitrary filters.
rmulhol (Migrated from github.com) reviewed 2019-09-12 16:19:27 +00:00
rmulhol (Migrated from github.com) commented 2019-09-12 16:19:26 +00:00

Definitely down for comparing results to make sure we're not missing anything 👍

I think that this approach should be at least as reliable as a pubsub rpc subscription to filtered logs, but am not sure whether that meets our requirements (have heard anecdotes about that subscription failing to return desired data).

In terms of efficiency, I do think there's some additional cost to the sync process in iterating through all the modified accounts again (something that also happens when state changes are committed after block processing), so maybe it's a good idea to hook into that call and derive modified accounts from there? 🤔

Right now, GetDirtyAccounts is called after the block is finalized but before the state is committed (since committing clears the cache). I think all accounts involved in a transaction should be marked as dirty as transactions are applied, but that's a little bit hard to trace so I'll see if I can dig in and document that thoroughly.

Definitely down for comparing results to make sure we're not missing anything 👍 I think that this approach should be at least as reliable as a pubsub rpc subscription to filtered logs, but am not sure whether that meets our requirements (have heard anecdotes about that subscription failing to return desired data). In terms of efficiency, I do think there's some additional cost to the sync process in iterating through all the modified accounts again (something that also happens when state changes are [committed](https://github.com/vulcanize/go-ethereum/blob/master/core/state/statedb.go#L696) after block processing), so maybe it's a good idea to hook into that call and derive modified accounts from there? 🤔 Right now, `GetDirtyAccounts` is called after [the block is finalized](https://github.com/vulcanize/go-ethereum/blob/9b0042f9253dc6ee60c4bd5b444d44a9506bba5d/core/state_processor.go#L79) but before the state is committed (since committing clears the cache). I _think_ all accounts involved in a transaction should be marked as dirty as transactions are applied, but that's a little bit hard to trace so I'll see if I can dig in and document that thoroughly.
rmulhol (Migrated from github.com) reviewed 2019-09-12 16:21:28 +00:00
@ -517,6 +517,10 @@ func (fb *filterBackend) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscr
return fb.bc.SubscribeLogsEvent(ch)
}
func (fb *filterBackend) SubscribeStateDiffs(ch chan<- map[common.Address]state.Account) event.Subscription {
rmulhol (Migrated from github.com) commented 2019-09-12 16:21:28 +00:00

Yeah, you got it. I think we want to (and do) use the EthApiBackend with a subscription to a full (non-fast-syncing) node, where all transactions are applied/verified.

Yeah, you got it. I think we want to (and do) use the `EthApiBackend` with a subscription to a full (non-fast-syncing) node, where all transactions are applied/verified.
rmulhol (Migrated from github.com) reviewed 2019-09-12 16:22:06 +00:00
@ -368,6 +399,10 @@ func (es *EventSystem) broadcast(filters filterIndex, ev interface{}) {
}
rmulhol (Migrated from github.com) commented 2019-09-12 16:22:06 +00:00

Yeah just following the pattern here, seems like every subscription creates empty channels for all the things it's not subscribing to.

Yeah just following the pattern here, seems like every subscription creates empty channels for all the things it's not subscribing to.
rmulhol (Migrated from github.com) reviewed 2019-09-12 16:23:30 +00:00
@ -29,6 +29,7 @@ import (
"github.com/ethereum/go-ethereum/common"
rmulhol (Migrated from github.com) commented 2019-09-12 16:23:29 +00:00

I think it's designed to provide a public API for the blockchain's internal subscriptions, but one big question I've had is how functions in this namespace (like SubscribeNewHeads) are actually called - will definitely need to sort that out before we can upstream

I think it's designed to provide a public API for the blockchain's internal subscriptions, but one big question I've had is how functions in this namespace (like `SubscribeNewHeads`) are actually called - will definitely need to sort that out before we can upstream
rmulhol commented 2019-09-12 16:26:30 +00:00 (Migrated from github.com)

Thanks for taking a look @elizabethengelman - and sorry for my novella-level responses! 😂

The basic idea is that we could indeed support a subscription in the API without needing to configure via special CLI flags, and I'd like to have a storage diffs PR ready to roll in short order after this submitted. Hoping that will just be adding a function similar to GetDirtyAccounts (or returning more data from statedb.Commit if we end up deriving the data from there - an idea I'm inclined to pursue).

Thanks for taking a look @elizabethengelman - and sorry for my novella-level responses! 😂 The basic idea is that we could indeed support a subscription in the API without needing to configure via special CLI flags, and I'd like to have a storage diffs PR ready to roll in short order after this submitted. Hoping that will just be adding a function similar to `GetDirtyAccounts` (or returning more data from `statedb.Commit` if we end up deriving the data from there - an idea I'm inclined to pursue).
i-norden reviewed 2019-09-12 17:57:55 +00:00
@ -29,6 +29,7 @@ import (
"github.com/ethereum/go-ethereum/common"
Member

It looks like the filters api is being loaded under the "eth" namespace here. So I think you would call it like:

cli, _ := rpc.Dial("ipcPathOrWsURL")
stateDiffChan := make(chan map[common.Address]state.Account, 20000)
rpcSub, err := cli.Subscribe(context.Background(), "eth", stateDiffChan, "NewStateDiffs"})

or use the EthSubscribe method since it is under that namespace.

It looks like the filters api is being loaded under the "eth" namespace [here](https://github.com/vulcanize/go-ethereum/blob/pubsub-rpc-account-diffing/eth/backend.go#L316). So I think you would call it like: ```go cli, _ := rpc.Dial("ipcPathOrWsURL") stateDiffChan := make(chan map[common.Address]state.Account, 20000) rpcSub, err := cli.Subscribe(context.Background(), "eth", stateDiffChan, "NewStateDiffs"}) ``` or use the [EthSubscribe](https://github.com/vulcanize/go-ethereum/blob/pubsub-rpc-account-diffing/rpc/client.go#L397) method since it is under that namespace.
Member

Hey @rmulhol thanks for the quick response! And I really like this approach, it is much cleaner than the separate service and I think much more likely to be accepted into the main branch. My only concern is that in order to recapitulate the statediff service in full we end up adding in more complexity with this approach than with the service, that concern is based on the assumption that we create a separate subscription for storage diffs and then would also need to create a separate subscription for the rest of the block data (I believe there is already a subscription for headers and transactions but there is not one for receipts), which would mean upwards of 5 subscriptions to recapitulate the single subscription. And then splitting the data up across subscriptions also means additional checks on the seed node's end to match the data back together again.

Hey @rmulhol thanks for the quick response! And I really like this approach, it is much cleaner than the separate service and I think much more likely to be accepted into the main branch. My only concern is that in order to recapitulate the statediff service in full we end up adding in more complexity with this approach than with the service, that concern is based on the assumption that we create a separate subscription for storage diffs and then would also need to create a separate subscription for the rest of the block data (I believe there is already a subscription for headers and transactions but there is not one for receipts), which would mean upwards of 5 subscriptions to recapitulate the single subscription. And then splitting the data up across subscriptions also means additional checks on the seed node's end to match the data back together again.
rmulhol commented 2019-09-12 18:19:10 +00:00 (Migrated from github.com)

@i-norden that makes sense. I think for the interim we should probably setup the seed node with your fork, and then see what we can get upstreamed to determine whether we eventually want to make the switch. If we were going to get there incrementally, I would think it would entail having just one or a small number of subscriptions that included all the things. So like next up I was thinking would be storage diffs, and I'm hoping it'll be possible to make one subscription that's state and storage diffs filtered by address

@i-norden that makes sense. I think for the interim we should probably setup the seed node with your fork, and then see what we can get upstreamed to determine whether we eventually want to make the switch. If we were going to get there incrementally, I would think it would entail having just one or a small number of subscriptions that included all the things. So like next up I was thinking would be storage diffs, and I'm hoping it'll be possible to make one subscription that's state and storage diffs filtered by address

Pull request closed

Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cerc-io/go-ethereum#11
No description provided.