Tests in CI are flakey #142
Reference: cerc-io/laconicd-deprecated#142
Here is an example failure for `test-rpc`. There is nothing new broken here, and re-running the test passed without issue.
The test seems to be dependent on performance or some other environmental factor, meaning it sometimes passes and sometimes fails depending on which task runner it gets assigned to, what other actions are running on the machine, etc.
Another example:
Can we get the test time for the passed case?
SDK tests:
Some parallelism thing?
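If parallelism is the suspect, one cheap diagnostic is to run the suite one package at a time. This is a sketch, assuming the SDK tests are ordinary Go tests invoked from a workflow step (the step name is made up); `-p 1` is the standard `go test` flag that limits package-level parallelism:

```yaml
# Hypothetical workflow step: run the tests with package-level
# parallelism disabled, to rule out cross-package interference.
- name: Run SDK tests serially
  run: go test -p 1 ./...
```

If the failures disappear under `-p 1`, that points at contention between test packages rather than the runner hardware.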
With an overall runtime of 4m23s
Definitely shorter times than the failure case. Is there some global timeout we can set?
Also: what is it doing for 20 seconds??
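On the global-timeout question: GitHub-Actions-style workflows (which Gitea actions also follow) support a per-job `timeout-minutes`. A sketch, assuming a job named `test-rpc` (matching the failing check above) and an illustrative limit:

```yaml
jobs:
  test-rpc:
    runs-on: ubuntu-latest
    # Fail the job if it runs longer than 10 minutes. The passing runs
    # above finish in about 4m23s, so 10 leaves headroom; the exact
    # value here is illustrative.
    timeout-minutes: 10
    steps:
      # ...existing test steps...
```

This would at least turn a hung run into a fast, visible failure instead of a stalled check.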
Another idea: could we just try a faster runner and see if that fixes the problem?
I'm all for trying that, we just need to know where to put it.
What does it entail? Could we run it temporarily on one of the big servers?
It is fairly simple to spin one up, and we'd need to give it some sort of unique tag and alter the workflows accordingly.
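Targeting a tagged runner is a one-line change per job. A sketch, assuming we register the big server with a made-up label `big-server`:

```yaml
jobs:
  test-rpc:
    # `big-server` is a hypothetical label we'd assign when registering
    # the dedicated runner; only runners carrying this label will pick
    # up the job.
    runs-on: big-server
```

Reverting the experiment later is just changing `runs-on` back.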
That would also have the side effect of queueing all the tasks on the same runner to run sequentially, which might improve the performance of individual tests compared to having multiple jobs running on the same host machine at the same time.
I'm not sure how that works long term, because it is pretty inefficient, but it might be useful for diagnostic purposes.
My thinking was to just try it initially and see if it makes the tests reliable. Then we can think about next steps.