Fix getLogs query (avoid repeated execution, separate tx, and improve performance) #250

Merged
telackey merged 5 commits from telackey/249 into v5 2023-06-16 16:49:57 +00:00
Member

Fix for #248 and #249.

This does a few things:

  1. Support for "earliest", "latest", etc. block specifiers. This did not work before.

  2. The default "fromBlock" is now "latest" to match geth.

  3. The old code looped across fromBlock:toBlock, executing a query for each block inside a single big tx (!). The new code issues one query, with no separate tx.

  4. Added a max range limit (default 500), with an error if the range is too broad (I used the same error message as Cloudflare).

  5. Simplified the query a good bit. This improved performance significantly in my manual testing (adapted to v4). The old version, adapted to a small range of 50 blocks, took over 20s; the new version took about 700ms. A query across 500 blocks with 150K results takes less than 4s.

Some additional test cases added here: https://git.vdb.to/cerc-io/system-tests/src/branch/main/test/test_logs.py
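The specifier handling and range limit described in points 1, 2, and 4 can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the function names (`resolveBlockNumber`, `validateRange`), the constant `maxRange`, and the exact error wording are all assumptions for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRange mirrors the PR's default limit of 500 blocks (hypothetical name).
const maxRange = 500

// errRangeTooBroad stands in for the Cloudflare-style error the PR adopts;
// the exact message here is illustrative.
var errRangeTooBroad = errors.New("block range is too broad")

// resolveBlockNumber sketches mapping the special specifiers ("earliest",
// "latest") onto concrete heights; latestHeight stands in for a chain-head
// lookup. Only hex-encoded numeric specifiers are parsed in this sketch.
func resolveBlockNumber(spec string, latestHeight int64) (int64, error) {
	switch spec {
	case "earliest":
		return 0, nil
	case "latest", "": // default fromBlock is "latest", matching geth
		return latestHeight, nil
	default:
		var n int64
		if _, err := fmt.Sscanf(spec, "0x%x", &n); err != nil {
			return 0, fmt.Errorf("invalid block specifier: %q", spec)
		}
		return n, nil
	}
}

// validateRange enforces the maximum range before any query runs.
func validateRange(from, to int64) error {
	if to < from {
		return errors.New("toBlock must be >= fromBlock")
	}
	if to-from+1 > maxRange {
		return errRangeTooBroad
	}
	return nil
}

func main() {
	// 501 blocks exceeds the limit of 500.
	if err := validateRange(17000000, 17000500); err != nil {
		fmt.Println(err)
	}
}
```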

dboreham reviewed 2023-06-15 00:31:33 +00:00
i-norden approved these changes 2023-06-15 00:46:26 +00:00
i-norden left a comment
Member

LGTM once the config is added. 100 might be low (I imagine geth doesn't timeout at that range size). And while we are at it we should add the missing `canonical_header_hash` condition on the `getLogs` query.
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Member

We should also add an `AND transaction_cids.header_id = (SELECT canonical_header_hash(transaction_cids.block_number))` statement to ensure only canonical results (based on internal weighting) are returned.
telackey reviewed 2023-06-15 03:14:11 +00:00
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Author
Member

Hmm, I need to examine that quite carefully, as it may have the side-effect of defeating the indexing.
telackey reviewed 2023-06-15 03:40:56 +00:00
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Author
Member

Yes, in a quick test against carrion using a (fairly) narrow 500 block range, it moved it from an already too slow 4s to a whopping 67s.

It's not apples to apples because I had to tweak the query for v4, but the idea is similar.

This could be quite tricky. Also, is there any particular reason we are opening an explicit TX to do this query? Unless we are dead certain we need one, we should remove it.
i-norden reviewed 2023-06-15 13:06:15 +00:00
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Member

We were originally opening a TX because the query was a hack that iterated over the range, repeating the same query used for a single block height at each height. Now that that is fixed, getting rid of the tx sounds good to me. If `canonical_header_hash` is having that impact, we need to return to the higher-level story of removing/preventing reorg/non-canonical data in the database.
i-norden reviewed 2023-06-15 16:26:43 +00:00
@ -137,0 +146,4 @@
AND eth.receipt_cids.block_number = eth.log_cids.block_number
AND eth.receipt_cids.header_id = eth.log_cids.header_id
AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash
Member
Curious what the performance impact is on smaller queries like https://github.com/cerc-io/ipld-eth-server/blob/telackey/249/pkg/eth/sql.go#L98 and https://github.com/cerc-io/ipld-eth-server/blob/telackey/249/pkg/eth/sql.go#L162.
Author
Member

I rewrote the query to select the canonical hashes of the blocks in the specified range, then check for inclusion in that set. I also simplified the query structure a good bit. I had to use a v4 database for comparison, but a small block range of 50 ran about 23s on the old query, 700ms on the new.
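The rewrite described above (resolve the canonical hash for each height in the range once, then filter by membership in that set) might look roughly like this. This is an illustrative sketch, not the PR's actual query: the table and column names follow the diff hunks quoted in this thread, `canonical_header_hash` follows the review comments, and the CTE structure and selected columns are assumptions.

```sql
-- Sketch: compute the canonical header hash per height in the range once,
-- then restrict logs to that set instead of re-checking per row.
WITH canonical AS (
    SELECT canonical_header_hash(n) AS header_id
    FROM generate_series($1::bigint, $2::bigint) AS n
)
SELECT eth.log_cids.*
FROM eth.log_cids
JOIN eth.receipt_cids
  ON eth.receipt_cids.block_number = eth.log_cids.block_number
 AND eth.receipt_cids.header_id = eth.log_cids.header_id
 AND eth.receipt_cids.tx_id = eth.log_cids.rct_id
WHERE eth.log_cids.header_id IN (SELECT header_id FROM canonical)
  AND eth.log_cids.block_number BETWEEN $1 AND $2;
```

Computing the canonical set once per request, rather than invoking `canonical_header_hash` per candidate row, is what avoids the 4s-to-67s regression discussed earlier in the thread.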
Author
Member

I also upped the limit to 500, as that is still decently quick for me (3 to 4s) on blocks 17,000,000 to 17,000,500 with about 150,727 results (not that I'm sure we would want to send that many...)
Reference: cerc-io/ipld-eth-server#250