On Windows, UDPConn.ReadFrom returns an error for packets larger
than the receive buffer. The error is not marked temporary, causing
our loop to exit when the first oversized packet arrived. The fix
is to treat this particular error as temporary.
Fixes: #1579, #2087
Updates: #2082
This change makes it possible to add peers without providing their IP
address. The endpoint of the target node is resolved using the discovery
protocol.
nodeDB.querySeeds was not safe for concurrent use but could be called
concurrenty on multiple goroutines in the following case:
- the table was empty
- a timed refresh started
- a lookup was started and initiated refresh
These conditions are unlikely to coincide during normal use, but are
much more likely to occur all at once when the user's machine just woke
from sleep. The root cause of the issue is that querySeeds reused the
same leveldb iterator until it was exhausted.
This commit moves the refresh scheduling logic into its own goroutine
(so only one refresh is ever active) and changes querySeeds to not use
a persistent iterator. The seed node selection is now more random and
ignores nodes that have not been contacted in the last 5 days.
PR #1621 changed Table locking so the mutex is not held while a
contested node is being pinged. If multiple nodes ping the local node
during this time window, multiple ping packets will be sent to the
contested node. The changes in this commit prevent multiple packets by
tracking whether the node is being replaced.
If the timeout fired (even just nanoseconds) before the deadline of the
next pending reply, the timer was not rescheduled. The timer would've
been rescheduled anyway once the next packet was sent, but there were
cases where no next packet could ever be sent due to the locking issue
fixed in the previous commit.
As timing-related bugs go, this issue had been present for a long time
and I could never reproduce it. The test added in this commit did
reproduce the issue on about one out of 15 runs.
Table.mutex was being held while waiting for a reply packet, which
effectively made many parts of the whole stack block on that packet,
including the net_peerCount RPC call.
Lookup calls would spin out of control when network connectivity was
lost. The throttling that was in place only took effect when the table
returned zero results, which doesn't happen very often.
The new throttling should not have a negative impact when the host is
online. Lookups against the network take some time and dials for all
results must complete or hit the cache before a new one is started. This
usually takes longer than four seconds, leaving online lookups
unaffected.
Fixes#1296