Fix getLogs query (avoid repeated execution, separate tx, and improve performance) #250
Labels
No Label
bug
critical
documentation
duplicate
enhancement
Epic
good first issue
help wanted
Integration tests
invalid
question
v5
wontfix
Copied from Github
Kind/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cerc-io/ipld-eth-server#250
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "telackey/249"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fix for #248 and #249.
This does a few things:
Support for "earliest", "latest", etc. block specifiers. This did not work before.
The default "fromBlock" is now "latest" to match geth.
The old code did a loop across fromBlock:toBlock, executing a query on each one inside a big tx (!). This does a single query, with no separate tx.
Add a max range limit (default 500) with an error message if the range is too broad (I used the same error message as cloudflare)
Simplify the query a good bit. This improved performance significantly in my manual testing adapted to v4. The old version, adapted to use a small range of 50 blocks took over 20s. The new version took about 700ms. A query across 500 blocks with 150K results takes less than 4s.
Some additional test cases added here: https://git.vdb.to/cerc-io/system-tests/src/branch/main/test/test_logs.py
LGTM once the config is added. 100 might be low (I imagine geth doesn't timeout at that range size). And while we are at it we should add the missing
canonical_header_hash
condition on thegetLogs
query.@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
We should also add an
AND transaction_cids.header_id = (SELECT canonical_header_hash(transaction_cids.block_number))
statement to ensure only canonical results (based on internal weighing) are returned.@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Hmm, I need to examine that quite carefully, as it may have the side-effect of defeating the indexing.
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Yes, in a quick test against carrion using a (fairly) narrow 500 block range it moved it from an already too slow 4s to whopping 67s.
It's not apples to apples because I had to tweak the query for v4, but the idea is similar.
This could be quite tricky. Also, is there any particular reason we are opening an explicit TX to do this query? Unless we are dead certain we need one, we should remove it.
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
We were originally opening a TX because the query was a hack that iterated over the range and repeated the same query used for a single blockheight for each blockheight. Now that that is fixed, getting rid of the tx sounds good to me. If
canonical_header_hash
is having that impact we need to return the higher level story of removing/preventing reorg/non-canonical data in the database.@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Curious what the performance imapact is on smaller queries like https://github.com/cerc-io/ipld-eth-server/blob/telackey/249/pkg/eth/sql.go#L98 and https://github.com/cerc-io/ipld-eth-server/blob/telackey/249/pkg/eth/sql.go#L162.
I rewrote the query to select the canonical hash of the blocks in specified range, then check for inclusion in that set. I also simplified the query structure a good bit. I had to use a v4 database for comparison, but using a small block range of 50 was running about 23s on the old query, 700ms on the new.
I also upped the limit to 500, as that is still decently quick for me (3 to 4s) on blocks 17,000,000 to 17,000,500 with about 150,727 results (not that I'm sure we would want to send that many...)