This change extends the peer metrics collection:
- traces the life-cycle of the peers
- meters the peer traffic separately for every peer
- creates an event feed for the peer events
- emits the peer events
This PR adds enode.LocalNode and integrates it into the p2p
subsystem. This new object is the keeper of the local node
record. For now, a new version of the record is produced every
time the client restarts. We'll make it smarter to avoid that in
the future.
There are a couple of other changes in this commit: discovery now
waits for all of its goroutines at shutdown and the p2p server
now closes the node database after discovery has shut down. This
fixes a leveldb crash in tests. p2p server startup is faster
because it doesn't need to wait for the external IP query
anymore.
This fixes a rare deadlock with the inproc adapter:
- A node is stopped, which acquires Network.lock.
- The protocol code being simulated (swarm/network in my case)
waits for its goroutines to shut down.
- One of those goroutines calls into the simulation to add a peer,
which waits for Network.lock.
The fix for the deadlock is really simple: just release the lock
before stopping the simulation node.
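A minimal sketch of the lock ordering described above. The types
simNode and simNetwork are hypothetical stand-ins, not the real
simulation types. The point is only that stop() drops the lock
before waiting for the node's goroutines:

```go
package main

import (
	"fmt"
	"sync"
)

// simNode stands in for a simulated node. Stop blocks until all of the
// node's protocol goroutines have exited.
type simNode struct {
	wg sync.WaitGroup
}

func (n *simNode) Stop() { n.wg.Wait() }

// simNetwork is a hypothetical, reduced version of the simulation network.
type simNetwork struct {
	lock  sync.Mutex
	node  *simNode
	peers int
}

// addPeer is what a protocol goroutine calls; it needs the network lock.
func (s *simNetwork) addPeer() {
	s.lock.Lock()
	defer s.lock.Unlock()
	s.peers++
}

// stop looks the node up under the lock but releases the lock before
// calling Stop, so a goroutine waiting in addPeer can finish.
func (s *simNetwork) stop() {
	s.lock.Lock()
	node := s.node
	s.lock.Unlock()
	node.Stop()
}

// runScenario runs one protocol goroutine that adds a peer while the node
// is being stopped, and returns the resulting peer count.
func runScenario() int {
	s := &simNetwork{node: &simNode{}}
	s.node.wg.Add(1)
	go func() {
		defer s.node.wg.Done()
		s.addPeer()
	}()
	s.stop()
	s.lock.Lock()
	defer s.lock.Unlock()
	return s.peers
}

func main() {
	fmt.Println("peers after stop:", runScenario())
}
```

If stop() called node.Stop() while still holding the lock, the
goroutine in addPeer could block forever and Stop would never return.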
Other changes in this PR clean up the exec adapter so it reports
node startup errors better and remove the docker adapter because
it just adds overhead.
In the exec adapter, node information is now posted to a one-shot
server. This avoids log parsing and allows reporting startup
errors to the simulation host.
A small change in package node was needed because simulation
nodes use port zero. Node.{HTTP,WS}Endpoint now return the live
endpoints after startup by checking the TCP listener.
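As an illustration of why the listener has to be consulted, here is a
small sketch using only the standard library. The helper names
liveEndpoint and listenEphemeral are made up for this example and are
not part of package node:

```go
package main

import (
	"fmt"
	"net"
)

// liveEndpoint returns the address the listener is actually bound to,
// which differs from the configured address when port zero was requested.
func liveEndpoint(l net.Listener) string {
	return l.Addr().String()
}

// listenEphemeral listens on port zero and reports the endpoint and port
// the operating system picked.
func listenEphemeral() (endpoint string, port int, err error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", 0, err
	}
	defer l.Close()
	return liveEndpoint(l), l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	endpoint, port, err := listenEphemeral()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("configured 127.0.0.1:0, live endpoint:", endpoint, "port:", port)
}
```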
Package p2p/enode provides a generalized representation of p2p nodes
which can contain arbitrary information in key/value pairs. It is also
the new home for the node database. The "v4" identity scheme is also
moved here from p2p/enr to remove the dependency on Ethereum crypto from
that package.
Record signature handling is changed significantly. The identity scheme
registry is removed and acceptable schemes must be passed to any method
that needs identity. This means records must now be validated explicitly
after decoding.
The enode API is designed to make signature handling easy and safe: most
APIs around the codebase work with enode.Node, which is a wrapper around
a valid record. Going from enr.Record to enode.Node requires a valid
signature.
* p2p/discover: port to p2p/enode
This ports the discovery code to the new node representation in
p2p/enode. The wire protocol is unchanged, so this can be considered a
refactoring change. The Kademlia table can now deal with nodes using an
arbitrary identity scheme. This requires a few incompatible API changes:
- Table.Lookup is not available anymore. It used to take a public key
as argument because v4 protocol requires one. Its replacement is
LookupRandom.
- Table.Resolve takes *enode.Node instead of NodeID. This is also for
v4 protocol compatibility because nodes cannot be looked up by ID
alone.
- Types Node and NodeID are gone. Further commits in the series will be
fixes all over the codebase to deal with those removals.
* p2p: port to p2p/enode and discovery changes
This adapts package p2p to the changes in p2p/discover. All uses of
discover.Node and discover.NodeID are replaced by their equivalents from
p2p/enode.
New API is added to retrieve the enode.Node instance of a peer. The
behavior of Server.Self with discovery disabled is improved. It now
tries much harder to report a working IP address, falling back to
127.0.0.1 if no suitable address can be determined through other means.
These changes were needed for tests of other packages later in the
series.
* p2p/simulations, p2p/testing: port to p2p/enode
No surprises here, mostly replacements of discover.Node, discover.NodeID
with their new equivalents. The 'interesting' API changes are:
- testing.ProtocolSession tracks complete nodes, not just their IDs.
- adapters.NodeConfig has a new method to create a complete node.
These changes were needed to make swarm tests work.
Note that the NodeID change makes the code incompatible with old
simulation snapshots.
* whisper/whisperv5, whisper/whisperv6: port to p2p/enode
This port was easy because whisper uses []byte for node IDs and
URL strings in the API.
* eth: port to p2p/enode
Again, easy to port because eth uses strings for node IDs and doesn't
care about node information in any way.
* les: port to p2p/enode
Apart from replacing discover.NodeID with enode.ID, most changes are in
the server pool code. It now deals with complete nodes instead
of (Pubkey, IP, Port) triples. The database format is unchanged for now,
but we should probably change it to use the node database later.
* node: port to p2p/enode
This change simply replaces discover.Node and discover.NodeID with their
new equivalents.
* swarm/network: port to p2p/enode
Swarm has its own node address representation, BzzAddr, containing both
an overlay address (the hash of a secp256k1 public key) and an underlay
address (enode:// URL).
There are no changes to the BzzAddr format in this commit, but certain
operations such as creating a BzzAddr from a node ID are now impossible
because node IDs aren't public keys anymore.
Most swarm-related changes in the series remove uses of
NewAddrFromNodeID, replacing it with NewAddr which takes a complete node
as argument. ToOverlayAddr is removed because we can just use the node
ID directly.
* cmd/swarm: minor cli flag text adjustments
* cmd/swarm, swarm/storage, swarm: fix mingw on windows test issues
* cmd/swarm: support for smoke tests on the production swarm cluster
* cmd/swarm/swarm-smoke: simplify cluster logic as per suggestion
* changed colour of landing page
* landing page reacts to enter keypress
* swarm/api/http: sticky footer for swarm landing page using flex
* swarm/api/http: sticky footer for error pages and fix for multiple choices
* swarm: propagate ctx to internal apis (#754)
* swarm/simnet: add basic node/service functions
* swarm/netsim: add buckets for global state and kademlia health check
* swarm/netsim: Use sync.Map as bucket and provide cleanup function for...
* swarm, swarm/netsim: adjust SwarmNetworkTest
* swarm/netsim: fix tests
* swarm: added visualization option to sim net redesign
* swarm/netsim: support multiple services per node
* swarm/netsim: remove redundant return statement
* swarm/netsim: add comments
* swarm: shutdown HTTP in Simulation.Close
* swarm: sim HTTP server timeout
* swarm/netsim: add more simulation methods and peer events examples
* swarm/netsim: add WaitKademlia example
* swarm/netsim: fix comments
* swarm/netsim: terminate peer events goroutines on simulation done
* swarm, swarm/netsim: naming updates
* swarm/netsim: return not healthy kademlias on WaitTillHealthy
* swarm: fix WaitTillHealthy call in testSwarmNetwork
* swarm/netsim: allow bucket to have any type for a key
* swarm: Added snapshots to new netsim
* swarm/netsim: add more tests for bucket
* swarm/netsim: move http related things into separate files
* swarm/netsim: add AddNodeWithService option
* swarm/netsim: add more tests and Start* methods
* swarm/netsim: add peer events and kademlia tests
* swarm/netsim: fix some tests flakiness
* swarm/netsim: improve random nodes selection, fix TestStartStop* tests
* swarm/netsim: remove time measurement from TestClose to avoid flakiness
* swarm/netsim: builder pattern for netsim HTTP server (#773)
* swarm/netsim: add connect related tests
* swarm/netsim: add comment for TestPeerEvents
* swarm: rename netsim package to network/simulation
* p2p/discover: move bond logic from table to transport
This commit moves node endpoint verification (bonding) from the table to
the UDP transport implementation. Previously, adding a node to the table
entailed pinging the node if needed. With this change, the ping-back
logic is embedded in the packet handler at a lower level.
It is easy to verify that the basic protocol is unchanged: we still
require a valid pong reply from the node before findnode is accepted.
The node database tracked the time of last ping sent to the node and
time of last valid pong received from the node. Node endpoints are
considered verified when a valid pong is received and the time of last
pong was called 'bond time'. The time of last ping sent was unused. In
this commit, the last ping database entry is repurposed to mean last
ping _received_. This entry is now used to track whether the node needs
to be pinged back.
The other big change is how nodes are added to the table. We used to add
nodes in Table.bond, which ran when a remote node pinged us or when we
encountered the node in a neighbors reply. The transport now adds to the
table directly after the endpoint is verified through ping. To ensure
that the Table can't be filled just by pinging the node repeatedly, we
retain the isInitDone check. During init, only nodes from neighbors
replies are added.
* p2p/discover: reduce findnode failure counter on success
* p2p/discover: remove unused parameter of loadSeedNodes
* p2p/discover: improve ping-back check and comments
* p2p/discover: add neighbors reply nodes always, not just during init
These RPC calls are analogous to Parity's parity_addReservedPeer and
parity_removeReservedPeer.
They are useful for adjusting the trusted peer set during runtime,
without requiring restarting the server.
This commit adds all changes needed for the merge of swarm-network-rewrite.
The changes:
- build: increase linter timeout
- contracts/ens: export ensNode
- log: add Output method and enable fractional seconds in format
- metrics: relax test timeout
- p2p: reduced some log levels, updates to simulation packages
- rpc: increased maxClientSubscriptionBuffer to 20000
ToECDSAPub was unsafe because it returned a non-nil key with nil X, Y in
case of invalid input. This change replaces ToECDSAPub with
UnmarshalPubkey across the codebase.
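The difference between the two styles can be sketched with the
standard library alone. This example assumes P-256 as a stand-in
curve (the real code uses secp256k1) and uses elliptic.Unmarshal,
which is deprecated in recent Go releases but shows the same nil X, Y
result on invalid input. The function names are illustrative only:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"errors"
	"fmt"
)

var errInvalidPubkey = errors.New("invalid public key")

// unsafeToPub reproduces the problem: the returned key is never nil, even
// when the input could not be decoded and X, Y are nil.
func unsafeToPub(data []byte) *ecdsa.PublicKey {
	x, y := elliptic.Unmarshal(elliptic.P256(), data)
	return &ecdsa.PublicKey{Curve: elliptic.P256(), X: x, Y: y}
}

// unmarshalPubkey reports invalid input as an error instead.
func unmarshalPubkey(data []byte) (*ecdsa.PublicKey, error) {
	x, y := elliptic.Unmarshal(elliptic.P256(), data)
	if x == nil {
		return nil, errInvalidPubkey
	}
	return &ecdsa.PublicKey{Curve: elliptic.P256(), X: x, Y: y}, nil
}

// roundTrip encodes a freshly generated key and decodes it again.
func roundTrip() error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	enc := elliptic.Marshal(elliptic.P256(), key.X, key.Y)
	pub, err := unmarshalPubkey(enc)
	if err != nil {
		return err
	}
	if pub.X.Cmp(key.X) != 0 || pub.Y.Cmp(key.Y) != 0 {
		return errors.New("decoded key does not match")
	}
	return nil
}

func main() {
	bad := []byte{1, 2, 3}
	fmt.Println("unsafe key has nil X:", unsafeToPub(bad).X == nil)
	_, err := unmarshalPubkey(bad)
	fmt.Println("safe decode error:", err)
	fmt.Println("round trip error:", roundTrip())
}
```

A caller of the unsafe variant has no error to check and will only
notice the problem when the nil coordinates are dereferenced later.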
This applies spec changes from ethereum/EIPs#1049 and adds support for
pluggable identity schemes.
Some care has been taken to make the "v4" scheme standalone. It uses
public APIs only and could be moved out of package enr at any time.
A couple of minor changes were needed to make identity schemes work:
- The sequence number is now updated in Set instead of when signing.
- Record is now copy-safe, i.e. calling Set on a shallow copy doesn't
modify the record it was copied from.
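The two points above can be sketched with a reduced record type. This
is not the enr.Record implementation: the type, its fields and the
choice to bump the sequence number on every Set are assumptions made
for illustration.

```go
package main

import "fmt"

type pair struct{ k, v string }

// record is a simplified, hypothetical stand-in for a node record.
type record struct {
	seq       uint64
	signature []byte
	pairs     []pair
}

// Set stores a key/value pair. It works on a fresh copy of the pairs
// slice, so shallow copies that still share the old slice are unaffected.
// The signature is dropped and the sequence number is bumped here rather
// than at signing time (bumping on every call is a simplification).
func (r *record) Set(k, v string) {
	pairs := make([]pair, len(r.pairs), len(r.pairs)+1)
	copy(pairs, r.pairs)
	replaced := false
	for i := range pairs {
		if pairs[i].k == k {
			pairs[i].v = v
			replaced = true
			break
		}
	}
	if !replaced {
		pairs = append(pairs, pair{k: k, v: v})
	}
	r.pairs = pairs
	r.signature = nil
	r.seq++
}

// Get returns the value stored under k, if any.
func (r *record) Get(k string) (string, bool) {
	for _, p := range r.pairs {
		if p.k == k {
			return p.v, true
		}
	}
	return "", false
}

func main() {
	var orig record
	orig.Set("ip", "10.0.0.1")
	cp := orig // shallow copy, shares the pairs slice with orig
	cp.Set("ip", "10.0.0.2")
	a, _ := orig.Get("ip")
	b, _ := cp.Get("ip")
	fmt.Println("original:", a, "seq", orig.seq)
	fmt.Println("copy:", b, "seq", cp.seq)
}
```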
This change removes a peer's information from the dial history
when the peer is removed from the static list. This makes it
possible to force the server to re-dial a specific peer when needed.
In our case we are running a geth node on mobile devices, and
it is common for the network connection to flap on mobile.
Almost every time it flaps, or the network connection changes
from cellular to wifi, peers are disconnected with a read
timeout. It then usually takes 30 seconds (the default expiration
timeout) to recover the connection with static peers after
connectivity is restored.
This change allows us to reconnect with peers almost
immediately and it seems harmless enough.
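A rough sketch of the behaviour, assuming a much simplified dial
scheduler. The dialer type and its methods are hypothetical; only
the 30 second expiration comes from the description above:

```go
package main

import (
	"fmt"
	"time"
)

// dialHistoryExpiration is how long a recently dialed peer is skipped.
const dialHistoryExpiration = 30 * time.Second

// dialer is a hypothetical, simplified dial scheduler.
type dialer struct {
	static  map[string]bool
	history map[string]time.Time // peer ID -> time until re-dial is blocked
}

func newDialer() *dialer {
	return &dialer{static: map[string]bool{}, history: map[string]time.Time{}}
}

func (d *dialer) addStatic(id string) { d.static[id] = true }

// removeStatic drops the peer from the static list and also clears its
// dial history, so adding it back allows an immediate re-dial.
func (d *dialer) removeStatic(id string) {
	delete(d.static, id)
	delete(d.history, id)
}

// recordDial notes that the peer was dialed at time now.
func (d *dialer) recordDial(id string, now time.Time) {
	d.history[id] = now.Add(dialHistoryExpiration)
}

// canDial reports whether a static peer may be dialed at time now.
func (d *dialer) canDial(id string, now time.Time) bool {
	if !d.static[id] {
		return false
	}
	until, ok := d.history[id]
	return !ok || !now.Before(until)
}

// demoRedial checks dial eligibility one second after a dial, first
// without and then with a remove/add cycle of the static peer.
func demoRedial() (before, after bool) {
	now := time.Unix(0, 0)
	d := newDialer()
	d.addStatic("peer1")
	d.recordDial("peer1", now)
	soon := now.Add(time.Second)
	before = d.canDial("peer1", soon)
	d.removeStatic("peer1")
	d.addStatic("peer1")
	after = d.canDial("peer1", soon)
	return before, after
}

func main() {
	before, after := demoRedial()
	fmt.Println("can re-dial before remove/add:", before)
	fmt.Println("can re-dial after remove/add:", after)
}
```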
I forgot to change the check in udp.go when I changed Table.bond to be
based on lastPong instead of node presence in db. Rename lastPong to
bondTime and add hasBond so it's clearer what this DB key is used for
now.
* cmd,node,rpc: add allowedHosts to prevent dns rebinding attacks
* p2p,node: Fix bug with dumpconfig introduced in r54aeb8e4c0bb9f0e7a6c67258af67df3b266af3d
* rpc: add wildcard support for rpcallowedhosts + go fmt
* cmd/geth, cmd/utils, node, rpc: ignore direct ip(v4/6) addresses in rpc virtual hostnames check
* http, rpc, utils: make vhosts into map, address review concerns
* node: change log messages to use geth standard (not sprintf)
* rpc: fix spelling
* p2p: add DialRatio for configuration of inbound vs. dialed connections
* p2p: add connection flags to PeerInfo
* p2p/netutil: add SameNet, DistinctNetSet
* p2p/discover: improve revalidation and seeding
This changes node revalidation to be periodic instead of on-demand. This
should prevent issues where dead nodes get stuck in closer buckets
because no other node will ever come along to replace them.
Every 5 seconds (on average), the last node in a random bucket is
checked and moved to the front of the bucket if it is still responding.
If revalidation fails, the last node is replaced by an entry of the
'replacement list' containing recently-seen nodes.
Most close buckets are removed because it's very unlikely we'll ever
encounter a node that would fall into any of those buckets.
Table seeding is also improved: we now require a few minutes of table
membership before considering a node as a potential seed node. This
should make it less likely to store short-lived nodes as potential
seeds.
* p2p/discover: fix nits in UDP transport
We would skip sending neighbors replies if there were fewer than
maxNeighbors results and CheckRelayIP returned an error for the last
one. While here, also resolve a TODO about pong reply tokens.
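The revalidation step from "improve revalidation and seeding" above
can be sketched as follows. The bucket type is a hypothetical
simplification: the timer and the random bucket choice are left out,
the first replacement is always picked, and dropping the node when no
replacement exists is an assumption of this sketch.

```go
package main

import "fmt"

// bucket is a simplified Kademlia bucket. entries[0] is the most
// recently seen node, the last element is the least recently seen.
type bucket struct {
	entries      []string
	replacements []string
}

// revalidate pings the last entry. A responding node moves to the front.
// A dead node is replaced by an entry from the replacement list, or
// dropped if the list is empty.
func (b *bucket) revalidate(ping func(id string) bool) {
	n := len(b.entries)
	if n == 0 {
		return
	}
	last := b.entries[n-1]
	if ping(last) {
		// Shift everything back by one and put the live node first.
		copy(b.entries[1:], b.entries[:n-1])
		b.entries[0] = last
		return
	}
	if len(b.replacements) > 0 {
		b.entries[n-1] = b.replacements[0]
		b.replacements = b.replacements[1:]
		return
	}
	b.entries = b.entries[:n-1]
}

func main() {
	b := bucket{
		entries:      []string{"a", "b", "c"},
		replacements: []string{"r1"},
	}
	b.revalidate(func(string) bool { return true })
	fmt.Println("after live check:", b.entries)
	b.revalidate(func(string) bool { return false })
	fmt.Println("after failed check:", b.entries)
}
```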
This commit affects p2p/discv5 "topic discovery" by running it on
the same UDP port where the old discovery works. This is realized
by giving an "unhandled" packet channel to the old v4 discovery
packet handler where all invalid packets are sent. These packets
are then processed by v5. v5 packets are always invalid when
interpreted by v4 and vice versa. This is ensured by adding one
to the first byte of the packet hash in v5 packets.
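The hash trick can be illustrated with a toy packet format. This
sketch assumes SHA-256 over the payload as a stand-in for the real
packet hash, and a plain dispatch function in place of the
"unhandled" channel. None of the names below are the actual
discovery API.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const hashSize = sha256.Size

// encode prepends the payload hash. For v5 packets the first byte of the
// hash is incremented by one.
func encode(payload []byte, v5 bool) []byte {
	h := sha256.Sum256(payload)
	if v5 {
		h[0]++
	}
	return append(h[:], payload...)
}

// validFor reports whether the packet hash checks out under the rules of
// the given protocol version.
func validFor(packet []byte, v5 bool) bool {
	if len(packet) < hashSize {
		return false
	}
	want := sha256.Sum256(packet[hashSize:])
	if v5 {
		want[0]++
	}
	return bytes.Equal(packet[:hashSize], want[:])
}

// dispatch tries the v4 handler first and hands packets it rejects to v5.
func dispatch(packet []byte) string {
	switch {
	case validFor(packet, false):
		return "v4"
	case validFor(packet, true):
		return "v5"
	default:
		return "drop"
	}
}

func main() {
	fmt.Println(dispatch(encode([]byte("ping"), false)))
	fmt.Println(dispatch(encode([]byte("ping"), true)))
	fmt.Println(dispatch([]byte("garbage")))
}
```

Because the two hashes always differ in the first byte, a packet can
never be valid for both versions at once.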
DiscoveryV5Bootnodes is also changed to point to new bootnodes
that are implementing the changed packet format with modified
hash. Existing and new v5 bootnodes are both running on different
ports ATM.
* core/types, core/vm, eth, tests: regenerate gencodec files
* Makefile: update devtools target
Install protoc-gen-go and print reminders about npm, solc and protoc.
Also switch to github.com/kevinburke/go-bindata because it's better
maintained.
* contracts/ens: update contracts and regenerate with solidity v0.4.19
The newer upstream version of the FIFSRegistrar contract doesn't set the
resolver anymore. The resolver is now deployed separately.
* contracts/release: regenerate with solidity v0.4.19
* contracts/chequebook: fix fallback and regenerate with solidity v0.4.19
The contract didn't have a fallback function, so payments would be rejected
when compiled with newer solidity. References to 'mortal' and 'owned'
use the local file system so we can compile without network access.
* p2p/discv5: regenerate with recent stringer
* cmd/faucet: regenerate
* dashboard: regenerate
* eth/tracers: regenerate
* internal/jsre/deps: regenerate
* dashboard: avoid sed -i because it's not portable
* accounts/usbwallet/internal/trezor: fix go generate warnings
p2p/simulations: introduce dialBan
- Refactor simulations/network connection getters to support
avoiding simultaneous dials between two peers. If two peers dial
simultaneously, the connection will be dropped. To help avoid
that, we essentially lock the connection object with a
timestamp which serves as a ban on dialing for a period of time
(dialBanTimeout).
- The connection getter InitConn can be wrapped and passed to the
nodes via adapters.NodeConfig#Reachable field and then used by
the respective services when they initiate connections. This
massively stabilises the emerging connectivity when running with
hundreds of nodes bootstrapping a network.
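The dial ban can be sketched like this. The types, the initConn
signature and the 200ms timeout value are assumptions for the
example, and the current time is passed in explicitly to keep the
sketch deterministic:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// dialBanTimeout is an assumed value for illustration.
const dialBanTimeout = 200 * time.Millisecond

var errDialBanned = errors.New("connection between peers was initiated recently")

// conn is the connection object shared by both directions of a peer pair.
type conn struct {
	initiated time.Time
}

// simNetwork is a hypothetical, reduced simulation network.
type simNetwork struct {
	lock  sync.Mutex
	conns map[[2]string]*conn
}

// connKey orders the pair so that a->b and b->a share one conn.
func connKey(a, b string) [2]string {
	if a > b {
		a, b = b, a
	}
	return [2]string{a, b}
}

// initConn returns the connection object for the pair and stamps it with
// now. It fails if either side initiated a dial within dialBanTimeout.
func (n *simNetwork) initConn(a, b string, now time.Time) (*conn, error) {
	n.lock.Lock()
	defer n.lock.Unlock()
	key := connKey(a, b)
	c, ok := n.conns[key]
	if !ok {
		c = &conn{}
		n.conns[key] = c
	} else if now.Sub(c.initiated) < dialBanTimeout {
		return nil, errDialBanned
	}
	c.initiated = now
	return c, nil
}

// demoDialBan dials a->b, then b->a 50ms later, then a->b 250ms after
// the first dial, and reports which attempts were allowed.
func demoDialBan() (first, second, third bool) {
	n := &simNetwork{conns: make(map[[2]string]*conn)}
	t0 := time.Unix(0, 0)
	_, err1 := n.initConn("a", "b", t0)
	_, err2 := n.initConn("b", "a", t0.Add(50*time.Millisecond))
	_, err3 := n.initConn("a", "b", t0.Add(250*time.Millisecond))
	return err1 == nil, err2 == nil, err3 == nil
}

func main() {
	first, second, third := demoDialBan()
	fmt.Println("first dial allowed:", first)
	fmt.Println("simultaneous reverse dial allowed:", second)
	fmt.Println("dial after ban expired allowed:", third)
}
```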
p2p: add Inbound public method to p2p.Peer
p2p/simulations: Add server id to logs to support debugging
in-memory network simulations when multiple peers are logging.
p2p: SetupConn now returns error. The dialer checks the error and
only calls resolve if the actual TCP dial fails.
This commit introduces a network simulation framework which
can be used to run simulated networks of devp2p nodes. The
intention is to use this for testing protocols, performing
benchmarks and visualising emergent network behaviour.
Using a Timer instead of a Ticker seems to be a lot better, though I cannot
fully account for why it behaves this way (a Ticker should be more bursty, but
not necessarily more active over time, though that may depend on how long a
window it uses to decide when to tick next).
* p2p/discover, p2p/discv5: add marshaling methods to Node
* p2p/netutil: make Netlist decodable from TOML
* common/math: encode nil HexOrDecimal256 as 0x0
* cmd/geth: add --config file flag
* cmd/geth: add missing license header
* eth: prettify Config again, fix tests
* eth: use gasprice.Config instead of duplicating its fields
* eth/gasprice: hide nil default from dumpconfig output
* cmd/geth: hide genesis block in dumpconfig output
* node: make tests compile
* console: fix tests
* cmd/geth: make TOML keys look exactly like Go struct fields
* p2p: use discovery by default
This makes the zero Config slightly more useful. It also fixes package
node tests because Node detects reuse of the datadir through the
NodeDatabase.
* cmd/geth: make ethstats URL settable through config file
* cmd/faucet: fix configuration
* cmd/geth: dedup attach tests
* eth: add comment for DefaultConfig
* eth: pass downloader.SyncMode in Config
This removes the FastSync, LightSync flags in favour of a more
general SyncMode flag.
* cmd/utils: remove jitvm flags
* cmd/utils: make mutually exclusive flag error prettier
It now reads:
Fatal: flags --dev, --testnet can't be used at the same time
* p2p: fix typo
* node: add DefaultConfig, use it for geth
* mobile: add missing NoDiscovery option
* cmd/utils: drop MakeNode
This exposed a couple of places that needed to be updated to use
node.DefaultConfig.
* node: fix typo
* eth: make fast sync the default mode
* cmd/utils: remove IPCApiFlag (unused)
* node: remove default IPC path
Set it in the frontends instead.
* cmd/geth: add --syncmode
* cmd/utils: make --ipcdisable and --ipcpath mutually exclusive
* cmd/utils: don't enable WS, HTTP when setting addr
* cmd/utils: fix --identity
The p2p packages can now be configured to restrict all communication to
a certain subset of IP networks. This feature is meant to be used for
private networks.
The discovery DHT contains a number of hosts with LAN and loopback IPs.
These get relayed because some implementations do not perform any checks
on the IP.
go-ethereum already prevented relay in most cases because it verifies
that the host actually exists before adding it to the local table. But
this verification causes other issues. We have received several reports
where people's VPSs got shut down by hosting providers because sending
packets to random LAN hosts is indistinguishable from a slow port scan.
The new check prevents sending random packets to LAN by discarding LAN
IPs sent by Internet hosts (and loopback IPs from LAN and Internet
hosts). The new check also blacklists almost all currently registered
special-purpose networks assigned by IANA to avoid inciting random
responses from services in the LAN.
As another precaution against abuse of the DHT, ports below 1024 are now
considered invalid.
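A reduced sketch of the relay check, using only the standard library.
It does not cover the IANA special-purpose networks, and the function
names and the exact definition of "LAN" used here are assumptions of
this example, not the real implementation:

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

var (
	errInvalidIP   = errors.New("invalid IP")
	errLowPort     = errors.New("low port")
	errUnspecified = errors.New("zero address")
	errLoopback    = errors.New("loopback address from non-loopback host")
	errLAN         = errors.New("LAN address from WAN host")
)

// isLAN treats loopback, private and link-local addresses as local.
func isLAN(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast()
}

// checkRelay decides whether an endpoint relayed by sender is acceptable.
func checkRelay(sender, addr net.IP, port int) error {
	switch {
	case sender == nil || addr == nil:
		return errInvalidIP
	case port < 1024:
		return errLowPort
	case addr.IsUnspecified():
		return errUnspecified
	case addr.IsLoopback() && !sender.IsLoopback():
		return errLoopback
	case isLAN(addr) && !isLAN(sender):
		return errLAN
	}
	return nil
}

// checkRelayStr is a convenience wrapper taking textual IPs.
func checkRelayStr(sender, addr string, port int) error {
	return checkRelay(net.ParseIP(sender), net.ParseIP(addr), port)
}

func main() {
	fmt.Println(checkRelayStr("8.8.8.8", "192.168.1.5", 30303))     // LAN IP from Internet host
	fmt.Println(checkRelayStr("192.168.1.2", "192.168.1.5", 30303)) // LAN IP from LAN host
	fmt.Println(checkRelayStr("192.168.1.2", "127.0.0.1", 30303))   // loopback from LAN host
	fmt.Println(checkRelayStr("8.8.8.8", "1.2.3.4", 80))            // low port
}
```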
The new package contains two things for now:
- IP network list parsing and matching
- The WSAEMSGSIZE workaround, which is duplicated in p2p/discover and
p2p/discv5.
Port mapper auto discovery used to run immediately after parsing the
--nat flag, giving it a slight performance boost. But this is becoming
inconvenient because we create node.Node for all geth operations
including account management and bare chain interaction. Delay
autodiscovery until the first use instead, which avoids any network
interaction until the node is actually started.
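The delayed start can be sketched with sync.Once. The lazyMapper type
and its find callback are hypothetical, and the address is a
documentation-range placeholder:

```go
package main

import (
	"fmt"
	"sync"
)

// lazyMapper is a hypothetical wrapper that delays port mapper discovery
// until the result is first needed.
type lazyMapper struct {
	once  sync.Once
	find  func() string // performs the actual network discovery
	found string
}

// externalIP runs discovery on first use and caches the result.
func (m *lazyMapper) externalIP() string {
	m.once.Do(func() { m.found = m.find() })
	return m.found
}

func main() {
	calls := 0
	m := &lazyMapper{find: func() string {
		calls++
		return "203.0.113.7" // placeholder result
	}}
	fmt.Println("discovery runs after construction:", calls)
	m.externalIP()
	m.externalIP()
	fmt.Println("discovery runs after two uses:", calls)
}
```

Creating the wrapper costs nothing, so commands that never start the
node never touch the network.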
On Windows, UDPConn.ReadFrom returns an error for packets larger
than the receive buffer. The error is not marked temporary, causing
our loop to exit when the first oversized packet arrived. The fix
is to treat this particular error as temporary.
Fixes: #1579, #2087
Updates: #2082
This change makes it possible to add peers without providing their IP
address. The endpoint of the target node is resolved using the discovery
protocol.
nodeDB.querySeeds was not safe for concurrent use but could be called
concurrently on multiple goroutines in the following case:
- the table was empty
- a timed refresh started
- a lookup was started and initiated refresh
These conditions are unlikely to coincide during normal use, but are
much more likely to occur all at once when the user's machine just woke
from sleep. The root cause of the issue is that querySeeds reused the
same leveldb iterator until it was exhausted.
This commit moves the refresh scheduling logic into its own goroutine
(so only one refresh is ever active) and changes querySeeds to not use
a persistent iterator. The seed node selection is now more random and
ignores nodes that have not been contacted in the last 5 days.
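The scheduling idea can be sketched as a loop goroutine that owns the
refresh state. This is a simplified illustration with hypothetical
names, not the actual Table code. Requests that arrive while a
refresh is running wait for that refresh instead of starting another:

```go
package main

import "fmt"

// table is a reduced stand-in that only models refresh scheduling.
type table struct {
	refreshReq chan chan struct{}
	closeReq   chan struct{}
	doRefresh  func()
}

func newTable(doRefresh func()) *table {
	t := &table{
		refreshReq: make(chan chan struct{}),
		closeReq:   make(chan struct{}),
		doRefresh:  doRefresh,
	}
	go t.loop()
	return t
}

// refresh requests a refresh and returns a channel that is closed once
// the refresh serving this request has finished.
func (t *table) refresh() <-chan struct{} {
	done := make(chan struct{})
	select {
	case t.refreshReq <- done:
	case <-t.closeReq:
		close(done)
	}
	return done
}

// loop serializes refreshes: at most one runs at any time.
func (t *table) loop() {
	var (
		waiting []chan struct{} // callers waiting for the active refresh
		running chan struct{}   // non-nil while a refresh is active
	)
	for {
		select {
		case req := <-t.refreshReq:
			waiting = append(waiting, req)
			if running == nil {
				running = make(chan struct{})
				go func(done chan struct{}) {
					t.doRefresh()
					close(done)
				}(running)
			}
		case <-running: // never ready while running is nil
			for _, ch := range waiting {
				close(ch)
			}
			waiting, running = nil, nil
		case <-t.closeReq:
			return
		}
	}
}

// demoRefresh issues two overlapping refresh requests and returns how
// many times the refresh function actually ran.
func demoRefresh() int {
	gate := make(chan struct{})
	count := 0
	t := newTable(func() {
		count++
		<-gate // keep the refresh running until both requests are queued
	})
	first := t.refresh()
	second := t.refresh()
	close(gate)
	<-first
	<-second
	close(t.closeReq)
	return count
}

func main() {
	fmt.Println("refreshes run for two overlapping requests:", demoRefresh())
}
```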
PR #1621 changed Table locking so the mutex is not held while a
contested node is being pinged. If multiple nodes ping the local node
during this time window, multiple ping packets will be sent to the
contested node. The changes in this commit prevent multiple packets by
tracking whether the node is being replaced.
If the timeout fired (even just nanoseconds) before the deadline of the
next pending reply, the timer was not rescheduled. The timer would've
been rescheduled anyway once the next packet was sent, but there were
cases where no next packet could ever be sent due to the locking issue
fixed in the previous commit.
As timing-related bugs go, this issue had been present for a long time
and I could never reproduce it. The test added in this commit did
reproduce the issue on about one out of 15 runs.