faild with err pq: sorry, too many clients already #209

Closed
opened 2022-12-22 19:58:35 +00:00 by telackey · 7 comments
Member

While running a tool that executes a single eth_call, eth_blockNumber, and eth_getStorageAt at a time in a loop, I observed repeated errors in the logs after a few minutes, of the form:

time="2022-12-22T19:54:20Z" level=error msg="writeStateDiffFor 0xd9ee39c319ebb4a2afde56d292397f0dc524aca89d85e100d534b0565121fdb6 faild with err pq: sorry, too many clients already"
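For reference, the access pattern is roughly the loop below. This is a simplified sketch using go-ethereum's ethclient, not the actual tool; the endpoint URL and contract address are placeholders.

```go
// Sketch of the kind of polling loop described above (illustrative only).
package main

import (
	"context"
	"log"
	"time"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Placeholder endpoint for the ipld-eth-server RPC.
	client, err := ethclient.Dial("http://localhost:8081")
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder contract address.
	contract := common.HexToAddress("0x0000000000000000000000000000000000000000")

	for {
		ctx := context.Background()

		// eth_blockNumber
		if _, err := client.BlockNumber(ctx); err != nil {
			log.Println("eth_blockNumber:", err)
		}

		// eth_call against the latest block
		msg := ethereum.CallMsg{To: &contract}
		if _, err := client.CallContract(ctx, msg, nil); err != nil {
			log.Println("eth_call:", err)
		}

		// eth_getStorageAt for slot 0 of the same contract
		if _, err := client.StorageAt(ctx, contract, common.Hash{}, nil); err != nil {
			log.Println("eth_getStorageAt:", err)
		}

		time.Sleep(time.Second)
	}
}
```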
Owner

Doesn't this mean: "postgresql won't accept any more connections"?

Author
Member

Yes, but I think it is actually on the geth side.

This code has the comment:

// writeStateDiffFor calls out to the proxy statediffing geth client to fill in a gap in the index

Which then makes a remote call:

	if err := pea.rpc.CallContext(ctx, &data, "statediff_writeStateDiffFor", blockHash, params); err != nil {
		logrus.Errorf("writeStateDiffFor %s faild with err %s", blockHash.Hex(), err.Error())
	}

That means the pq error is really coming back in response to that RPC call.

In the DB container logs I see this:

2022-12-22 20:20:02.024 UTC [288027] FATAL:  sorry, too many clients already
2022-12-22 20:20:04.904 UTC [288029] FATAL:  sorry, too many clients already
2022-12-22 20:20:05.092 UTC [288030] FATAL:  sorry, too many clients already
2022-12-22 20:20:07.944 UTC [288032] FATAL:  sorry, too many clients already
2022-12-22 20:20:08.315 UTC [288033] FATAL:  sorry, too many clients already
2022-12-22 20:20:08.642 UTC [288035] FATAL:  sorry, too many clients already
2022-12-22 20:22:23.085 UTC [22] LOG:  checkpoints are occurring too frequently (24 seconds apart)
2022-12-22 20:22:23.085 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-12-22 20:22:49.958 UTC [22] LOG:  checkpoints are occurring too frequently (26 seconds apart)
2022-12-22 20:22:49.958 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-12-22 20:23:10.464 UTC [22] LOG:  checkpoints are occurring too frequently (21 seconds apart)
2022-12-22 20:23:10.464 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-12-22 20:23:38.876 UTC [22] LOG:  checkpoints are occurring too frequently (28 seconds apart)
2022-12-22 20:23:38.876 UTC [22] HINT:  Consider increasing the configuration parameter "max_wal_size".
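A quick way to confirm the exhaustion on the Postgres side is to compare pg_stat_activity against max_connections. A minimal sketch, assuming direct DB access via the lib/pq driver (the DSN below is a placeholder); note that once the limit is hit this check may itself need a superuser login, which has reserved slots (superuser_reserved_connections):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder DSN; substitute the real database credentials.
	db, err := sql.Open("postgres", "postgres://postgres:password@localhost:5432/postgres?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Count current backends.
	var used int
	if err := db.QueryRow("SELECT count(*) FROM pg_stat_activity").Scan(&used); err != nil {
		log.Fatal(err)
	}

	// Server-wide connection limit.
	var max string
	if err := db.QueryRow("SHOW max_connections").Scan(&max); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("connections in use: %d, max_connections: %s\n", used, max)
}
```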
Author
Member

From geth:

INFO [12-22|20:27:59.734] Writing state diff                       block height=16,242,632 worker=1
INFO [12-22|20:28:00.553] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:00.904] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:01.410] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:04.424] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:04.653] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:05.083] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:07.692] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:07.883] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:08.078] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:10.716] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:10.897] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:11.111] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:12.538] Imported new potential chain segment     blocks=1 txs=278  mgas=29.866 elapsed=288.939ms    mgasps=103.365  number=16,242,633 hash=f4b516..4c1adc dirty=0.00B
INFO [12-22|20:28:12.636] Chain head was updated                   number=16,242,633 hash=f4b516..4c1adc root=a10bf7..4bdb7a elapsed=6.966165ms
INFO [12-22|20:28:12.636] Writing state diff                       block height=16,242,633 worker=7
ERROR[12-22|20:28:12.659] statediff.Service.WriteLoop: processing error block height=16,242,633 error="pq: sorry, too many clients already" worker=7
WARN [12-22|20:28:12.659] Updating the following block to knownErrorBlocks to be inserted into knownGaps table blockNumber=16,242,633
INFO [12-22|20:28:13.681] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:13.697] Served statediff_writeStateDiffFor       conn=172.29.0.5:60674 reqid=9949 duration=15.90455ms   err="pq: sorry, too many clients already"
INFO [12-22|20:28:14.032] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:14.045] Served statediff_writeStateDiffFor       conn=172.29.0.5:60674 reqid=9951 duration=12.418582ms  err="pq: sorry, too many clients already"
INFO [12-22|20:28:14.259] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:14.271] Served statediff_writeStateDiffFor       conn=172.29.0.5:60674 reqid=9953 duration=12.255946ms  err="pq: sorry, too many clients already"
INFO [12-22|20:28:17.144] writing state diff for                   block hash=d9ee39..21fdb6
INFO [12-22|20:28:17.394] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:17.408] Served statediff_writeStateDiffFor       conn=172.29.0.5:60680 reqid=9958 duration=13.434204ms  err="pq: sorry, too many clients already"
INFO [12-22|20:28:17.816] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:17.832] Served statediff_writeStateDiffFor       conn=172.29.0.5:60680 reqid=9960 duration=15.88771ms   err="pq: sorry, too many clients already"
INFO [12-22|20:28:20.686] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:20.698] Served statediff_writeStateDiffFor       conn=172.29.0.5:60680 reqid=9963 duration=11.752901ms  err="pq: sorry, too many clients already"
INFO [12-22|20:28:20.878] writing state diff for                   block hash=d9ee39..21fdb6
WARN [12-22|20:28:20.901] Served statediff_writeStateDiffFor       conn=172.29.0.5:60680 reqid=9966 duration=22.725864ms  err="pq: sorry, too many clients already"
INFO [12-22|20:28:21.085] writing state diff for                   block hash=d9ee39..21fdb6
Owner
Could this be related? https://github.com/cerc-io/go-ethereum/issues/293
Author
Member

> Could this be related? cerc-io/go-ethereum#293

It looks highly likely.

Member

A couple clarifying questions:

How many operations were looped through in those couple minutes? (approx, order of magnitude)
What Postgres database is this ipld-eth-server operating on top of and what geth node is it talking to?

What I suspect could be happening is:

  1. ipld-eth-server is configured such that it calls statediff_writeStateDiffFor on the proxy geth node when it is unable to satisfy these requests against the local database without error
  2. This happens on (nearly) every request, since the Postgres database doesn't have (complete) data (is the ipld-eth-server operating on top of a Postgres database with data, or is it just acting as a pass-through to a geth node in these tests?)
  3. Calling statediff_writeStateDiffFor is overloading geth's Postgres connection pool, because creating and writing these statediffs is an expensive operation (especially over historical ranges, where the geth node does not have an in-memory cache of most of the trie nodes it needs to access) and the connection pool and transaction lifecycle still have performance issues/missing optimizations, e.g. https://github.com/cerc-io/go-ethereum/issues/294 (see the pool-capping sketch after this list)
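On point 3: the usual way to keep a Go writer from exceeding the server's max_connections is to cap its pool with the standard database/sql knobs, so concurrent statediff writes queue on the pool instead of opening new connections. This is only an illustration of that idea, not the actual geth statediff configuration; the DSN and limits are placeholders:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder DSN.
	db, err := sql.Open("postgres", "postgres://postgres:password@localhost:5432/postgres?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Keep the pool comfortably below Postgres's max_connections (default 100),
	// leaving headroom for other clients such as ipld-eth-server.
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(25)
	db.SetConnMaxLifetime(30 * time.Minute)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```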
Author
Member

#221 and #223 should prevent this from the ipld-eth-server side now.
