Investigate odd lighthouse error during catch up sync #466

Open
opened 2023-07-27 18:50:09 +00:00 by dboreham · 3 comments
Owner

We keep seeing this error in the lighthouse logs after a warm restart:

```
WARN Error whilst processing payload status  error: Api { error: Reqwest(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("mainnet-eth-geth-1")), port: Some(8551), path: "/", query: None, fragment: None }, source: TimedOut }) }, service: exec
```

Evidence suggests it is not as simple as a timeout on a request to the sibling geth (geth seems perfectly happy).

Author
Owner

Possibly relevant: this message seems to appear only when lighthouse is not synced to head:

```
INFO Syncing                                 est_time: --, distance: 58 slots (11 mins), peers: 17, service: slot_notifier
```
Author
Owner

Actually, since the slot distance keeps increasing, sync may be stalled entirely. Perhaps the stalled sync and the "Error whilst processing payload status" warning are related.
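One rough way to confirm the "distance keeps increasing" observation from the logs is to pull the `distance: N slots` field out of successive `slot_notifier` "Syncing" lines and check whether it only ever grows. This is a minimal sketch, assuming the log format matches the line quoted above; the function names and the 3-sample threshold are hypothetical, and only the first sample log line comes from this issue (the other two are made up for illustration).

```python
import re
from typing import Iterable, List

# Matches the "distance: N slots" field of a Lighthouse slot_notifier
# "Syncing" line, e.g. "distance: 58 slots (11 mins)". Assumes the log
# format shown in this issue.
DISTANCE_RE = re.compile(r"Syncing\b.*?distance:\s*(\d+)\s+slots")


def sync_distances(lines: Iterable[str]) -> List[int]:
    """Extract the slot distance from each 'Syncing' log line, in order."""
    distances = []
    for line in lines:
        m = DISTANCE_RE.search(line)
        if m:
            distances.append(int(m.group(1)))
    return distances


def looks_stalled(distances: List[int], min_samples: int = 3) -> bool:
    """Heuristic: sync looks stalled if the distance strictly increased
    across the last `min_samples` observations (hypothetical threshold)."""
    if len(distances) < min_samples:
        return False
    window = distances[-min_samples:]
    return all(b > a for a, b in zip(window, window[1:]))


if __name__ == "__main__":
    # First line is from this issue; the next two are hypothetical.
    log = [
        "INFO Syncing   est_time: --, distance: 58 slots (11 mins), peers: 17, service: slot_notifier",
        "INFO Syncing   est_time: --, distance: 59 slots (11 mins), peers: 17, service: slot_notifier",
        "INFO Syncing   est_time: --, distance: 60 slots (12 mins), peers: 17, service: slot_notifier",
    ]
    print(looks_stalled(sync_distances(log)))  # True
```

A healthy syncing node should show the distance shrinking over time, so a window where it only grows is a reasonable (if crude) stall signal.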

Author
Owner

These suspicions are supported by the fact that the errors do not reappear after a stack restart.
The theory is that lighthouse got itself into a bad, persistently stuck state.

Reference: cerc-io/stack-orchestrator#466