398: Statediff missing parent blocks automatically. #399

Merged
telackey merged 8 commits from telackey/398 into v1.11.6-statediff-v5 2023-07-18 19:41:20 +00:00
Member

Required DB updates: https://github.com/cerc-io/ipld-eth-db/pull/136
Related tests: https://git.vdb.to/cerc-io/system-tests/pulls/8

This fixes https://github.com/cerc-io/go-ethereum/issues/398. After statediffing a block that we received from a ChainEvent, we now check whether its parent exists in the index. If not, we statediff it as well, then check for its parent, and so on, up to a maximum backfill limit.

This allows us to account both for blocks skipped because the chain was extended and for blocks missed because the chain was reorganized.
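As a rough illustration (not the code in this PR), the backfill walk might look like the following Go sketch, where `hasIndexedBlock` and `writeStateDiff` are hypothetical stand-ins for the service's index lookup and statediffing calls:

```go
import (
	"fmt"

	"github.com/ethereum/go-ethereum/core"
	"github.com/ethereum/go-ethereum/core/types"
)

// backfillParents walks back from block's parent, statediffing any ancestors
// missing from the index, and stops at the first indexed ancestor or after
// maxBackfill blocks. hasIndexedBlock and writeStateDiff are hypothetical
// stand-ins for the service's real index lookup and statediff calls.
func backfillParents(chain *core.BlockChain, block *types.Block, maxBackfill uint64) error {
	parentHash := block.ParentHash()
	for i := uint64(0); i < maxBackfill; i++ {
		indexed, err := hasIndexedBlock(parentHash)
		if err != nil {
			return err
		}
		if indexed {
			return nil // ancestry is already complete in the index
		}
		parent := chain.GetBlockByHash(parentHash)
		if parent == nil {
			return fmt.Errorf("parent %s not found in chain", parentHash)
		}
		if err := writeStateDiff(parent); err != nil {
			return err
		}
		parentHash = parent.ParentHash()
	}
	return nil
}
```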

To keep everything straight in the DB, we use a new 'canonical' boolean flag on the block in eth.header_cids. Whenever a new block is inserted into the DB, it is marked as canonical. If another block is later inserted at the same height (eg, because of a reorg), any pre-existing blocks at that height are marked as non-canonical at the same time.
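A minimal sketch of how the flag might be maintained, assuming a Postgres `database/sql` handle and the eth.header_cids schema from the linked ipld-eth-db PR; the indexer's actual statements may differ:

```go
import "database/sql"

// markCanonical promotes the newly indexed header and demotes any pre-existing
// headers at the same height, in a single transaction. The SQL is a sketch
// against the eth.header_cids schema; the indexer's real statements may differ.
func markCanonical(db *sql.DB, blockNumber uint64, blockHash string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	// Demote any other headers already recorded at this height (eg, after a reorg).
	if _, err := tx.Exec(
		`UPDATE eth.header_cids SET canonical = false
		  WHERE block_number = $1 AND block_hash <> $2`,
		blockNumber, blockHash); err != nil {
		return err
	}
	// Mark the header we just inserted as canonical.
	if _, err := tx.Exec(
		`UPDATE eth.header_cids SET canonical = true
		  WHERE block_number = $1 AND block_hash = $2`,
		blockNumber, blockHash); err != nil {
		return err
	}
	return tx.Commit()
}
```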

> Note: At present, we only register with geth to be notified of ChainEvents, which are emitted for canonical blocks. We do not listen for ChainSideEvents, which are emitted for non-canonical blocks. So, as it stands, any block geth notifies us about is known to be canonical at the time of the notification.
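For context, the subscription looks roughly like this; `SubscribeChainEvent` and `core.ChainEvent` are geth's real APIs, while the channel size and `handleBlock` are illustrative:

```go
import "github.com/ethereum/go-ethereum/core"

// listenForChainEvents receives canonical-chain events only; we deliberately
// do not call SubscribeChainSideEvent, so non-canonical blocks never arrive
// here. handleBlock is a hypothetical stand-in for the statediffing entry point.
func listenForChainEvents(bc *core.BlockChain) error {
	chainEventCh := make(chan core.ChainEvent, 20) // buffer size is illustrative
	sub := bc.SubscribeChainEvent(chainEventCh)
	defer sub.Unsubscribe()
	for {
		select {
		case ev := <-chainEventCh:
			handleBlock(ev.Block) // canonical at notification time
		case err := <-sub.Err():
			return err
		}
	}
}
```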

This flag will allow for greatly simplified canonical checks in ipld-eth-server (see also https://github.com/cerc-io/ipld-eth-server/issues/251), eg:

```
SELECT * FROM eth.state_cids s, eth.header_cids h
 WHERE s.state_leaf_key = $1
   AND s.block_number <= $2
   AND s.block_number = h.block_number
   AND s.header_id = h.block_hash
   AND h.canonical = true
 ORDER BY s.block_number DESC
 LIMIT 1;
```
dboreham reviewed 2023-07-13 21:45:14 +00:00
roysc reviewed 2023-07-17 11:50:40 +00:00
roysc left a comment
Member

We'll also need to handle updating `canonical` in the file indexer, but that's not really doable in the CSV writer. It will need to happen in the post-processing stage.

Actually I think it might be better to split the canonicity handling into a new PR and plan it out a bit more. The DB procedure that's currently used should still work with this update to the indexing.

telackey reviewed 2023-07-17 18:19:37 +00:00
Author
Member

> We'll also need to handle updating `canonical` in the file indexer, but that's not really doable in the CSV writer. It will need to happen in the post-processing stage.

Unless I am misunderstanding, this would only ever be an issue if tracking head and outputting to CSV, but is that something we would ever want to do? Updating 'canonical' status is just one of many problems in that case: CSV also gives us no way to detect gaps of any sort, even gaps caused by something as minor as restarting the process, nor can we find skipped parents when the chain is extended, or catch any other such inconsistency that is bound to arise, leaving a pretty unreliable record.

CSV makes a lot of sense for bulk and offline processing (eg, eth-statediff-service) but it is much less clear how it would be useful as a real-time stream.

Author
Member

> Actually I think it might be better to split the canonicity handling into a new PR and plan it out a bit more.

The core issue of #398 is to ensure that the DB contents are valid and consistent when the chain is reorganized, and reflect the current state of the chain. That implies a determination about canonicity as well as completeness, so I don't think the concerns can be separated.

Member

> CSV makes a lot of sense for bulk and offline processing (eg, eth-statediff-service) but it is much less clear how it would be useful as a real-time stream.

It was being used in production for performance reasons (Postgres proved to become overloaded while trying to track head), and last I heard it still was, but @i-norden or @srwadleigh could confirm.

> Updating 'canonical' status is just one of many problems in that case: CSV also gives us no way to detect gaps of any sort, even gaps caused by something as minor as restarting the process, nor can we find skipped parents when the chain is extended, or catch any other such inconsistency that is bound to arise, leaving a pretty unreliable record.

This is true; I should have raised a concern earlier. We have no fallback mechanism for backfilling, so this needs to work. As for detecting in-progress jobs, I suppose we will at worst duplicate a few blocks.

Member

> It was being used in production for performance reasons (Postgres proved to become overloaded while trying to track head), and last I heard it still was, but @i-norden or @srwadleigh could confirm.

The COPYFROM mode is used while tracking head, not the CSV persisting mode. Currently we only use the CSV mode in eth-statediff-service for historical data.

Member

I stand corrected; as discussed, the only thing using `file` mode in production is `eth-statediff-service`, so there's no problem here.

i-norden approved these changes 2023-07-18 18:47:15 +00:00
i-norden left a comment
Member

Forgot to approve- LGTM 👍

Reference: cerc-io/go-ethereum#399