Fix getLogs query (avoid repeated execution, separate tx, and improve performance) #250

telackey · 2023-06-15T00:31:33Z

telackey commented

2023-06-15 00:31:33 +00:00

Fix for #248 and #249.

This does a few things:

Support for "earliest", "latest", etc. block specifiers. This did not work before.
The default "fromBlock" is now "latest" to match geth.
The old code did a loop across fromBlock:toBlock, executing a query on each one inside a big tx (!). This does a single query, with no separate tx.
Add a max range limit (default 500) with an error message if the range is too broad (I used the same error message as cloudflare)
Simplify the query a good bit. This improved performance significantly in my manual testing adapted to v4. The old version, adapted to use a small range of 50 blocks took over 20s. The new version took about 700ms. A query across 500 blocks with 150K results takes less than 4s.

Some additional test cases added here: https://git.vdb.to/cerc-io/system-tests/src/branch/main/test/test_logs.py

Fix for #248 and #249. This does a few things: 1) Support for "earliest", "latest", etc. block specifiers. This did not work before. 2) The default "fromBlock" is now "latest" to match geth. 3) The old code did a loop across fromBlock:toBlock, executing a query on each one inside a big tx (!). This does a single query, with no separate tx. 4) Add a max range limit (default 500) with an error message if the range is too broad (I used the same error message as cloudflare) 5) Simplify the query a good bit. This improved performance significantly in my manual testing adapted to v4. The old version, adapted to use a small range of 50 blocks took over 20s. The new version took about 700ms. A query across 500 blocks with 150K results takes less than 4s. Some additional test cases added here: https://git.vdb.to/cerc-io/system-tests/src/branch/main/test/test_logs.py

dboreham reviewed 2023-06-15 00:31:33 +00:00

i-norden approved these changes 2023-06-15 00:46:26 +00:00

i-norden left a comment

LGTM once the config is added. 100 might be low (I imagine geth doesn't timeout at that range size). And while we are at it we should add the missing canonical_header_hash condition on the getLogs query.

LGTM once the config is added. 100 might be low (I imagine geth doesn't timeout at that range size). And while we are at it we should add the missing `canonical_header_hash` condition on the `getLogs` query.

pkg/eth/sql.go

						
				@ -137,0 +146,4 @@

											AND eth.receipt_cids.block_number = eth.log_cids.block_number

											AND eth.receipt_cids.header_id = eth.log_cids.header_id

											AND eth.receipt_cids.tx_id = eth.log_cids.rct_id

											AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash

i-norden commented

2023-06-15 00:40:04 +00:00

We should also add an AND transaction_cids.header_id = (SELECT canonical_header_hash(transaction_cids.block_number)) statement to ensure only canonical results (based on internal weighing) are returned.

We should also add an `AND transaction_cids.header_id = (SELECT canonical_header_hash(transaction_cids.block_number))` statement to ensure only canonical results (based on internal weighing) are returned.

telackey reviewed 2023-06-15 03:14:11 +00:00

pkg/eth/sql.go

						
				@ -137,0 +146,4 @@

											AND eth.receipt_cids.block_number = eth.log_cids.block_number

											AND eth.receipt_cids.header_id = eth.log_cids.header_id

											AND eth.receipt_cids.tx_id = eth.log_cids.rct_id

											AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash

telackey commented

2023-06-15 03:14:11 +00:00

Hmm, I need to examine that quite carefully, as it may have the side-effect of defeating the indexing.

telackey reviewed 2023-06-15 03:40:56 +00:00

pkg/eth/sql.go

						
				@ -137,0 +146,4 @@

											AND eth.receipt_cids.block_number = eth.log_cids.block_number

											AND eth.receipt_cids.header_id = eth.log_cids.header_id

											AND eth.receipt_cids.tx_id = eth.log_cids.rct_id

											AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash

telackey commented

2023-06-15 03:40:55 +00:00

Yes, in a quick test against carrion using a (fairly) narrow 500 block range it moved it from an already too slow 4s to whopping 67s.

It's not apples to apples because I had to tweak the query for v4, but the idea is similar.

This could be quite tricky. Also, is there any particular reason we are opening an explicit TX to do this query? Unless we are dead certain we need one, we should remove it.

Yes, in a quick test against carrion using a (fairly) narrow 500 block range it moved it from an already too slow 4s to whopping 67s. It's not apples to apples because I had to tweak the query for v4, but the idea is similar. This could be quite tricky. Also, is there any particular reason we are opening an explicit TX to do this query? Unless we are dead certain we need one, we should remove it.

i-norden reviewed 2023-06-15 13:06:15 +00:00

pkg/eth/sql.go

						
				@ -137,0 +146,4 @@

											AND eth.receipt_cids.block_number = eth.log_cids.block_number

											AND eth.receipt_cids.header_id = eth.log_cids.header_id

											AND eth.receipt_cids.tx_id = eth.log_cids.rct_id

											AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash

i-norden commented

2023-06-15 13:06:14 +00:00

We were originally opening a TX because the query was a hack that iterated over the range and repeated the same query used for a single blockheight for each blockheight. Now that that is fixed, getting rid of the tx sounds good to me. If canonical_header_hash is having that impact we need to return the higher level story of removing/preventing reorg/non-canonical data in the database.

We were originally opening a TX because the query was a hack that iterated over the range and repeated the same query used for a single blockheight for each blockheight. Now that that is fixed, getting rid of the tx sounds good to me. If `canonical_header_hash` is having that impact we need to return the higher level story of removing/preventing reorg/non-canonical data in the database.

i-norden reviewed 2023-06-15 16:26:43 +00:00

pkg/eth/sql.go

						
				@ -137,0 +146,4 @@

											AND eth.receipt_cids.block_number = eth.log_cids.block_number

											AND eth.receipt_cids.header_id = eth.log_cids.header_id

											AND eth.receipt_cids.tx_id = eth.log_cids.rct_id

											AND eth.receipt_cids.tx_id = eth.transaction_cids.tx_hash

i-norden commented

2023-06-15 16:26:43 +00:00

Curious what the performance imapact is on smaller queries like https://github.com/cerc-io/ipld-eth-server/blob/telackey/249/pkg/eth/sql.go#L98 and https://github.com/cerc-io/ipld-eth-server/blob/telackey/249/pkg/eth/sql.go#L162.

telackey commented

2023-06-16 02:01:39 +00:00

I rewrote the query to select the canonical hash of the blocks in specified range, then check for inclusion in that set. I also simplified the query structure a good bit. I had to use a v4 database for comparison, but using a small block range of 50 was running about 23s on the old query, 700ms on the new.

🚀 2

telackey commented

2023-06-16 02:11:22 +00:00

I also upped the limit to 500, as that is still decently quick for me (3 to 4s) on blocks 17,000,000 to 17,000,500 with about 150,727 results (not that I'm sure we would want to send that many...)

Sign in to join this conversation.

No reviewers

No Milestone

No project

No Assignees

3 Participants

Notifications

Due Date

The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cerc-io/ipld-eth-server#250