service documentation
This commit is contained in:
parent
d9d05695e6
commit
1328dad81d
@ -32,7 +32,6 @@ import (
|
||||
sdtypes "github.com/ethereum/go-ethereum/statediff/types"
|
||||
)
|
||||
|
||||
// TODO: add test that filters on address
|
||||
var (
|
||||
contractLeafKey []byte
|
||||
emptyDiffs = make([]sdtypes.StateNode, 0)
|
||||
|
@ -1,57 +0,0 @@
|
||||
// Copyright 2019 The go-ethereum Authors
|
||||
// This file is part of the go-ethereum library.
|
||||
//
|
||||
// The go-ethereum library is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Lesser General Public License as published by
|
||||
// the Free Software Foundation, either version 3 of the License, or
|
||||
// (at your option) any later version.
|
||||
//
|
||||
// The go-ethereum library is distributed in the hope that it will be useful,
|
||||
// but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
// GNU Lesser General Public License for more details.
|
||||
//
|
||||
// You should have received a copy of the GNU Lesser General Public License
|
||||
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
/*
|
||||
Package statediff provides an auxiliary service that processes state diff objects from incoming chain events,
|
||||
relaying the objects to any rpc subscriptions.
|
||||
|
||||
This work is adapted from work by Charles Crain at https://github.com/jpmorganchase/quorum/blob/9b7fd9af8082795eeeb6863d9746f12b82dd5078/statediff/statediff.go
|
||||
|
||||
The service is spun up using the below CLI flags
|
||||
--statediff: boolean flag, turns on the service
|
||||
--statediff.streamblock: boolean flag, configures the service to associate and stream out the rest of the block data with the state diffs.
|
||||
--statediff.intermediatenodes: boolean flag, tells service to include intermediate (branch and extension) nodes; default (false) processes leaf nodes only.
|
||||
--statediff.watchedaddresses: string slice flag, used to limit the state diffing process to the given addresses. Usage: --statediff.watchedaddresses=addr1 --statediff.watchedaddresses=addr2 --statediff.watchedaddresses=addr3
|
||||
|
||||
If you wish to use the websocket endpoint to subscribe to the statediff service, be sure to open up the Websocket RPC server with the `--ws` flag. The IPC-RPC server is turned on by default.
|
||||
|
||||
The statediffing services works only with `--syncmode="full", but -importantly- does not require garbage collection to be turned off (does not require an archival node).
|
||||
|
||||
e.g.
|
||||
|
||||
$ ./geth --statediff --statediff.streamblock --ws --syncmode "full"
|
||||
|
||||
This starts up the geth node in full sync mode, starts up the statediffing service, and opens up the websocket endpoint to subscribe to the service.
|
||||
Because the "streamblock" flag has been turned on, the service will strean out block data (headers, transactions, and receipts) along with the diffed state and storage leafs.
|
||||
|
||||
Rpc subscriptions to the service can be created using the rpc.Client.Subscribe() method,
|
||||
with the "statediff" namespace, a statediff.Payload channel, and the name of the statediff api's rpc method- "stream".
|
||||
|
||||
e.g.
|
||||
|
||||
cli, _ := rpc.Dial("ipcPathOrWsURL")
|
||||
stateDiffPayloadChan := make(chan statediff.Payload, 20000)
|
||||
rpcSub, err := cli.Subscribe(context.Background(), "statediff", stateDiffPayloadChan, "stream"})
|
||||
for {
|
||||
select {
|
||||
case stateDiffPayload := <- stateDiffPayloadChan:
|
||||
processPayload(stateDiffPayload)
|
||||
case err := <- rpcSub.Err():
|
||||
log.Error(err)
|
||||
}
|
||||
}
|
||||
*/
|
||||
package statediff
|
207
statediff/doc.md
Normal file
207
statediff/doc.md
Normal file
@ -0,0 +1,207 @@
|
||||
This package provides an auxiliary service that asynchronously processes state diff objects from chain events,
|
||||
either relaying the state objects to rpc subscribers or writing them directly to Postgres.
|
||||
|
||||
It also exposes RPC endpoints for fetching or writing to Postgres the state diff `StateObject` at a specific block height
|
||||
or for a specific block hash, this operates on historic block and state data and so is dependent on having a complete state archive.
|
||||
|
||||
# Statediff Object
|
||||
A state diff `StateObject` is the collection of all the state and storage trie nodes that have been updated in a given block.
|
||||
For convenience, we also associate these nodes with the block number and hash, and optionally the set of code hashes and code for any
|
||||
contracts deployed in this block.
|
||||
|
||||
A complete state diff `StateObject` will include all state and storage intermediate nodes, which is necessary for generating proofs and for
|
||||
traversing the tries.
|
||||
|
||||
```go
|
||||
// StateObject is a collection of state (and linked storage nodes) as well as the associated block number, block hash,
|
||||
// and a set of code hashes and their code
|
||||
type StateObject struct {
|
||||
BlockNumber *big.Int `json:"blockNumber" gencodec:"required"`
|
||||
BlockHash common.Hash `json:"blockHash" gencodec:"required"`
|
||||
Nodes []StateNode `json:"nodes" gencodec:"required"`
|
||||
CodeAndCodeHashes []CodeAndCodeHash `json:"codeMapping"`
|
||||
}
|
||||
|
||||
// StateNode holds the data for a single state diff node
|
||||
type StateNode struct {
|
||||
NodeType NodeType `json:"nodeType" gencodec:"required"`
|
||||
Path []byte `json:"path" gencodec:"required"`
|
||||
NodeValue []byte `json:"value" gencodec:"required"`
|
||||
StorageNodes []StorageNode `json:"storage"`
|
||||
LeafKey []byte `json:"leafKey"`
|
||||
}
|
||||
|
||||
// StorageNode holds the data for a single storage diff node
|
||||
type StorageNode struct {
|
||||
NodeType NodeType `json:"nodeType" gencodec:"required"`
|
||||
Path []byte `json:"path" gencodec:"required"`
|
||||
NodeValue []byte `json:"value" gencodec:"required"`
|
||||
LeafKey []byte `json:"leafKey"`
|
||||
}
|
||||
|
||||
// CodeAndCodeHash struct for holding codehash => code mappings
|
||||
// we can't use an actual map because they are not rlp serializable
|
||||
type CodeAndCodeHash struct {
|
||||
Hash common.Hash `json:"codeHash"`
|
||||
Code []byte `json:"code"`
|
||||
}
|
||||
```
|
||||
These objects are packed into a `Payload` structure which additionally associates the StateObject
|
||||
with the block (header, uncles, and transactions), receipts, and total difficulty.
|
||||
This `Payload` encapsulates all the block and state data at a given block, and allows us to index the entire Ethereum data structure
|
||||
as hash-linked IPLD objects.
|
||||
|
||||
```go
|
||||
// Payload packages the data to send to state diff subscriptions
|
||||
type Payload struct {
|
||||
BlockRlp []byte `json:"blockRlp"`
|
||||
TotalDifficulty *big.Int `json:"totalDifficulty"`
|
||||
ReceiptsRlp []byte `json:"receiptsRlp"`
|
||||
StateObjectRlp []byte `json:"stateObjectRlp" gencodec:"required"`
|
||||
|
||||
encoded []byte
|
||||
err error
|
||||
}
|
||||
```
|
||||
|
||||
# Usage
|
||||
This state diffing service runs as an auxiliary service concurrent to the regular syncing process of the geth node.
|
||||
|
||||
|
||||
## CLI configuration
|
||||
This service introduces a CLI flag namespace `statediff`
|
||||
|
||||
`--statediff` flag is used to turn on the service
|
||||
`--statediff.writing` is used to tell the service to write state diff objects it produces from synced ChainEvents directly to a configured Postgres database
|
||||
`--statediff.db` is the connection string for the Postgres database to write to
|
||||
`--statediff.dbnodeid` is the node id to use in the Postgres database
|
||||
`--statediff.dbclientname` is the client name to use in the Postgres database
|
||||
|
||||
The service can only operate in full sync mode (`--syncmode=full`), but only the historical RPC endpoints require an archive node (`--gcmode=archive`)
|
||||
|
||||
e.g.
|
||||
`
|
||||
./build/bin/geth --syncmode=full --gcmode=archive --statediff --statediff.writing --statediff.db=postgres://localhost:5432/vulcanize_testing?sslmode=disable --statediff.dbnodeid={nodeId} --statediff.dbclientname={dbClientName}
|
||||
`
|
||||
|
||||
## RPC endpoints
|
||||
The state diffing service exposes both a WS subscription endpoint, and a number of HTTP unary endpoints.
|
||||
|
||||
Each of these endpoints requires a set of parameters provided by the caller
|
||||
|
||||
```go
|
||||
// Params is used to carry in parameters from subscribing/requesting clients configuration
|
||||
type Params struct {
|
||||
IntermediateStateNodes bool
|
||||
IntermediateStorageNodes bool
|
||||
IncludeBlock bool
|
||||
IncludeReceipts bool
|
||||
IncludeTD bool
|
||||
IncludeCode bool
|
||||
WatchedAddresses []common.Address
|
||||
WatchedStorageSlots []common.Hash
|
||||
}
|
||||
```
|
||||
|
||||
Using these params we can tell the service whether to include state and/or storage intermediate nodes; whether
|
||||
to include the associated block (header, uncles, and transactions); whether to include the associated receipts;
|
||||
whether to include the total difficult for this block; whether to include the set of code hashes and code for
|
||||
contracts deployed in this block; whether to limit the diffing process to a list of specific addresses; and/or
|
||||
whether to limit the diffing process to a list of specific storage slot keys.
|
||||
|
||||
### Subscription endpoint
|
||||
A websocket supporting RPC endpoint is exposed for subscribing to state diff `StateObjects` that come off the head of the chain while the geth node syncs.
|
||||
|
||||
```go
|
||||
// Stream is a subscription endpoint that fires off state diff payloads as they are created
|
||||
Stream(ctx context.Context, params Params) (*rpc.Subscription, error)
|
||||
```
|
||||
|
||||
To expose this endpoint the node needs to have the websocket server turned on (`--ws`),
|
||||
and the `statediff` namespace exposed (`--ws.api=statediff`).
|
||||
|
||||
Go code subscriptions to this endpoint can be created using the `rpc.Client.Subscribe()` method,
|
||||
with the "statediff" namespace, a `statediff.Payload` channel, and the name of the statediff api's rpc method: "stream".
|
||||
|
||||
e.g.
|
||||
|
||||
```go
|
||||
|
||||
cli, err := rpc.Dial("ipcPathOrWsURL")
|
||||
if err != nil {
|
||||
// handle error
|
||||
}
|
||||
stateDiffPayloadChan := make(chan statediff.Payload, 20000)
|
||||
methodName := "stream"
|
||||
params := statediff.Params{
|
||||
IncludeBlock: true,
|
||||
IncludeTD: true,
|
||||
IncludeReceipts: true,
|
||||
IntermediateStorageNodes: true,
|
||||
IntermediateStateNodes: true,
|
||||
}
|
||||
rpcSub, err := cli.Subscribe(context.Background(), statediff.APIName, stateDiffPayloadChan, methodName, params)
|
||||
if err != nil {
|
||||
// handle error
|
||||
}
|
||||
for {
|
||||
select {
|
||||
case stateDiffPayload := <- stateDiffPayloadChan:
|
||||
// process the payload
|
||||
case err := <- rpcSub.Err():
|
||||
// handle rpc subscription error
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Unary endpoints
|
||||
The service also exposes unary RPC endpoints for retrieving the state diff `StateObject` for a specific block height/hash.
|
||||
```go
|
||||
// StateDiffAt returns a state diff payload at the specific blockheight
|
||||
StateDiffAt(ctx context.Context, blockNumber uint64, params Params) (*Payload, error)
|
||||
|
||||
// StateDiffFor returns a state diff payload for the specific blockhash
|
||||
StateDiffFor(ctx context.Context, blockHash common.Hash, params Params) (*Payload, error)
|
||||
```
|
||||
|
||||
To expose this endpoint the node needs to have the HTTP server turned on (`--http`),
|
||||
and the `statediff` namespace exposed (`--http.api=statediff`).
|
||||
|
||||
## Direct indexing into Postgres
|
||||
If `--statediff.writing` is set, the service will convert the state diff `StateObject` data into IPLD objects, persist them directly to Postgres,
|
||||
and generate secondary indexes around the IPLD data.
|
||||
|
||||
The schema and migrations for this Postgres database are provided in `statediff/db/`.
|
||||
|
||||
### Postgres setup
|
||||
We use [pressly/goose](https://github.com/pressly/goose) as our Postgres migration manager.
|
||||
You can also load the Postgres schema directly into a database using
|
||||
|
||||
`psql database_name < schema.sql`
|
||||
|
||||
This will only work on a version 12.4 Postgres database.
|
||||
|
||||
### Schema overview
|
||||
Our Postgres schemas are built around a single IPFS backing Postgres IPLD blockstore table (`public.blocks`) that conforms with [go-ds-sql](https://github.com/ipfs/go-ds-sql/blob/master/postgres/postgres.go).
|
||||
All IPLD objects are stored in this table, where `key` is blockstore-prefixed multihash key for the IPLD object and `data` contains
|
||||
the bytes for the IPLD block (in the case of all Ethereum IPLDs, this is the RLP byte encoding of the Ethereum object).
|
||||
|
||||
The IPLD objects in this table can be traversed using an IPLD DAG interface, but since this table only maps multihash to raw IPLD object
|
||||
it is not particularly useful for searching through the data or and does not allow us to look up Ethereum objects by their constituent fields
|
||||
(e.g. by block number, tx source/recipient, state/storage trie node path). To improve the accessibility of these Ethereum IPLD objects
|
||||
we generate secondary indexes on top of the raw IPLDs in other Postgres tables. This collection of tables encapsulates an Ethereum [advanced data layout](https://github.com/ipld/specs#schemas-and-advanced-data-layouts) (ADL).
|
||||
|
||||
These secondary index tables fall under the `eth` schema and follow an `{objectType}_cids` naming convention.
|
||||
Each of these tables provides a view into the individual fields of the underlying Ethereum IPLD object and references the raw IPLD object stored in `public.blocks` by multihash foreign key.
|
||||
Additionally, these tables link up to their parent object tables. E.g. the `storage_cids` table contains a `state_id` foreign key which references the `id`
|
||||
for the `state_cids` entry that contains the state leaf node for the contract the storage node belongs to, and in turn that `state_cids` entry contains a `header_id`
|
||||
foreign key which references the `id` of the `header_cids` entry that contains the header for the block these state and storage nodes were updated (diffed).
|
||||
|
||||
## Optimization
|
||||
On mainnet this process is extremely IO intensive and requires significant resources to allow it to keep up with the head of the chain.
|
||||
The state diff processing time for a specific block is dependent on the number and complexity of the state changes that occur in a block and
|
||||
the number of updated state nodes that are available in the in-memory cache vs must be retrieved from disc.
|
||||
|
||||
If memory permits, one means of improving the efficiency of this process is to increase the trie cache allocation.
|
||||
This can be done by increasing the overall `--cache` allocation and/or by increasing the % of the cache allocated to trie
|
||||
usage with `--cache.trie`.
|
Loading…
Reference in New Issue
Block a user