Commit Graph

5 Commits

Author SHA1 Message Date
ognots
445b7f3388 refactor lotus-health agent for robustness
add retry logic for failed API calls.
if reconnecting to the API fails, restart lotus-daemon, since this means lotus-daemon is likely unhealthy.

wait for the lotus node's chain to sync during each check cycle, to avoid restarting lotus-daemon while it still needs to sync.

handle SIGTERM properly.

general cleanup and refactor of code, getting rid of unnecessary channels
2020-01-24 11:46:47 -05:00
ognots
effacec817 range over right index to prevent bounds errors
the test scenario 'healthyHeadCheckWindow5' was causing index-out-of-bounds errors.
the second range loop in checkWindow was iterating over the wrong slice of cids.
it should compare the latest items in the slices first.
2020-01-21 16:07:53 -05:00
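To illustrate the "latest items first" comparison this fix describes, here is a hedged sketch of walking a window of chain heads from the newest entry backwards without going out of bounds. Plain strings stand in for CIDs, and `tipsetKey`, `sameKey` and `unchangedPolls` are hypothetical names; this is not the agent's `checkWindow`.

```go
package main

import "fmt"

// tipsetKey stands in for the list of block CIDs identifying a chain head;
// plain strings replace real CIDs here (an assumption for self-containment).
type tipsetKey []string

// sameKey reports whether two keys hold the same CIDs in the same order.
// The length check comes first, so indexing b with a's indices stays in bounds.
func sameKey(a, b tipsetKey) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// unchangedPolls counts how many polls before the newest one saw the same
// head, walking from the newest entry towards the oldest and stopping at the
// first change. Hypothetical helper, shown for illustration only.
func unchangedPolls(window []tipsetKey) int {
	if len(window) == 0 {
		return 0
	}
	latest := window[len(window)-1]
	count := 0
	for i := len(window) - 2; i >= 0; i-- {
		if !sameKey(window[i], latest) {
			break
		}
		count++
	}
	return count
}

func main() {
	window := []tipsetKey{
		{"cid-a"},
		{"cid-b", "cid-c"},
		{"cid-b", "cid-c"},
		{"cid-b", "cid-c"},
	}
	fmt.Println(unchangedPolls(window)) // 2
}
```

Each index used comes from the same slice it indexes (or is guarded by an equal-length check), which is the kind of mismatch the commit message says caused the out-of-bounds errors.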
ognots
0c6e4c6c40 fixes health agent bug
was passing the wrong variable as an argument to updateWindow.
also updates a duplicate function comment.
2020-01-20 20:34:13 -05:00
ognots
3953227702 use lotus cli and GetFullNodeAPI
also some other minor bug fixes
2020-01-14 12:18:45 -05:00
ognots
d8d8ce7526 health agent to monitor lotus
watches whether the chain head changes within a given window of API polls.
allows setting a threshold for how many times the chain head can remain
unchanged before the health check fails.
the interval for polling the chain head can also be set.
on failure, restarts the systemd unit.
2020-01-14 12:18:45 -05:00