This PR adds service value measurement statistics to the light client. It
also adds a private API that makes these statistics accessible. A follow-up
PR will add the new server pool which uses these statistics to select
servers with good performance.
This document describes the function of the new components:
https://gist.github.com/zsfelfoldi/3c7ace895234b7b345ab4f71dab102d4
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.
This adds additional logic to re-resolve the root name of a tree when a
couple of leaf requests have failed. We need this change to avoid
getting into a failure state where leaf requests keep failing for half
an hour when the tree has been updated.
* p2p: new dial scheduler
This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:
- The time between discovery of a node and dialing that node is
significantly lower in the new version. The old dialState kept
a buffer of nodes and launched a task to refill it whenever the buffer
became empty. This worked well with the discovery interface we used to
have, but doesn't really work with the new iterator-based discovery
API.
- Selection of static dial candidates (created by Server.AddPeer or
through static-nodes.json) performs much better for large amounts of
static peers. Connections to static nodes are now limited like dynanic
dials and can no longer overstep MaxPeers or the dial ratio.
* p2p/simulations/adapters: adapt to new NodeDialer interface
* p2p: re-add check for self in checkDial
* p2p: remove peersetCh
* p2p: allow static dials when discovery is disabled
* p2p: add test for dialScheduler.removeStatic
* p2p: remove blank line
* p2p: fix documentation of maxDialPeers
* p2p: change "ok" to "added" in static node log
* p2p: improve dialTask docs
Also increase log level for "Can't resolve node"
* p2p: ensure dial resolver is truly nil without discovery
* p2p: add "looking for peers" log message
* p2p: clean up Server.run comments
* p2p: fix maxDialedConns for maxpeers < dialRatio
Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.
* p2p: fix RemovePeer to disconnect the peer again
Also make RemovePeer synchronous and add a test.
* p2p: remove "Connection set up" log message
* p2p: clean up connection logging
We previously logged outgoing connection failures up to three times.
- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."
This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).
Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.
* p2p: document that RemovePeer blocks
This is a temporary fix for a problem which started happening when the
dialer was changed to read nodes from an enode.Iterator. Before the
iterator change, discovery queries would always return within a couple
seconds even if there was no Internet access. Since the iterator won't
return unless a node is actually found, discoverTask can take much
longer. This means that the 'emergency connect' logic might not execute
in time, leading to a stuck node.
* p2p/dnsdisc: add support for enode.Iterator
This changes the dnsdisc.Client API to support the enode.Iterator
interface.
* p2p/dnsdisc: rate-limit DNS requests
* p2p/dnsdisc: preserve linked trees across root updates
This improves the way links are handled when the link root changes.
Previously, sync would simply remove all links from the current tree and
garbage-collect all unreachable trees before syncing the new list of
links.
This behavior isn't great in certain cases: Consider a structure where
trees A, B, and C reference each other and D links to A. If D's link
root changed, the sync code would first remove trees A, B and C, only to
re-sync them later when the link to A was found again.
The fix for this problem is to track the current set of links in each
clientTree and removing old links only AFTER all links are synced.
* p2p/dnsdisc: deflake iterator test
* cmd/devp2p: adapt dnsClient to new p2p/dnsdisc API
* p2p/dnsdisc: tiny comment fix
* build: use golangci-lint
This changes build/ci.go to download and run golangci-lint instead
of gometalinter.
* core/state: fix unnecessary conversion
* p2p/simulations: fix lock copying (found by go vet)
* signer/core: fix unnecessary conversions
* crypto/ecies: remove unused function cmpPublic
* core/rawdb: remove unused function print
* core/state: remove unused function xTestFuzzCutter
* core/vm: disable TestWriteExpectedValues in a different way
* core/forkid: remove unused function checksum
* les: remove unused type proofsData
* cmd/utils: remove unused functions prefixedNames, prefixFor
* crypto/bn256: run goimports
* p2p/nat: fix goimports lint issue
* cmd/clef: avoid using unkeyed struct fields
* les: cancel context in testRequest
* rlp: delete unreachable code
* core: gofmt
* internal/build: simplify DownloadFile for Go 1.11 compatibility
* build: remove go test --short flag
* .travis.yml: disable build cache
* whisper/whisperv6: fix ineffectual assignment in TestWhisperIdentityManagement
* .golangci.yml: enable goconst and ineffassign linters
* build: print message when there are no lint issues
* internal/build: refactor download a bit
* rpc: improve codec abstraction
rpc.ServerCodec is an opaque interface. There was only one way to get a
codec using existing APIs: rpc.NewJSONCodec. This change exports
newCodec (as NewFuncCodec) and NewJSONCodec (as NewCodec). It also makes
all codec methods non-public to avoid showing internals in godoc.
While here, remove codec options in tests because they are not
supported anymore.
* p2p/simulations: use github.com/gorilla/websocket
This package was the last remaining user of golang.org/x/net/websocket.
Migrating to the new library wasn't straightforward because it is no
longer possible to treat WebSocket connections as a net.Conn.
* vendor: delete golang.org/x/net/websocket
* rpc: fix godoc comments and run gofmt
This adds all dashboard changes from the last couple months.
We're about to remove the dashboard, but decided that we should
get all the recent work in first in case anyone wants to pick up this
project later on.
* cmd, dashboard, eth, p2p: send peer info to the dashboard
* dashboard: update npm packages, improve UI, rebase
* dashboard, p2p: remove println, change doc
* cmd, dashboard, eth, p2p: cleanup after review
* dashboard: send current block to the dashboard client
This adds an implementation of node discovery via DNS TXT records to the
go-ethereum library. The implementation doesn't match EIP-1459 exactly,
the main difference being that this implementation uses separate merkle
trees for tree links and ENRs. The EIP will be updated to match p2p/dnsdisc.
To maintain DNS trees, cmd/devp2p provides a frontend for the p2p/dnsdisc
library. The new 'dns' subcommands can be used to create, sign and deploy DNS
discovery trees.
Most of these changes are related to the Go 1.13 changes to test binary
flag handling.
* cmd/geth: make attach tests more reliable
This makes the test wait for the endpoint to come up by polling
it instead of waiting for two seconds.
* tests: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* crypto/ecies: remove useless -dump flag in tests
* p2p/simulations: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* build: remove workaround for ./... vendor matching
This workaround was necessary for Go 1.8. The Go 1.9 release changed
the expansion rules to exclude vendored packages.
* Makefile: use relative path for GOBIN
This makes the "Run ./build/bin/..." line look nicer.
* les: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* eth: chain config (genesis + fork) ENR entry
* core/forkid, eth: protocol independent fork ID, update to CRC32 spec
* core/forkid, eth: make forkid a struct, next uint64, enr struct, RLP
* core/forkid: change forkhash rlp encoding from int to [4]byte
* eth: fixup eth entry a bit and update it every block
* eth: fix lint
* eth: fix crash in ethclient tests
The dialer limits itself to one attempt every 30s. Apply the same limit
in Server and reject peers which try to connect too eagerly. The check
against the limit happens right after accepting the connection.
Further changes in this commit ensure we pass the Server logger
down to Peer instances, discovery and dialState. Unit test logging now
works in all Server tests.
* p2p/enr: add entries for for IPv4/IPv6 separation
This adds entry types for "ip6", "udp6", "tcp6" keys. The IP type stays
around because removing it would break a lot of code and force everyone
to care about the distinction.
* p2p/enode: track IPv4 and IPv6 address separately
LocalNode predicts the local node's UDP endpoint and updates the record.
This change makes it predict IPv4 and IPv6 endpoints separately since
they can now be in the record at the same time.
* p2p/enode: implement base64 text format
* all: switch to enode.Parse(...)
This allows passing base64-encoded node records to all the places that
previously accepted enode:// URLs. The URL format is still supported.
* cmd/bootnode, p2p: log node URL instead of ENR
...and return the base64 record in NodeInfo.
* p2p/discover: export Ping and RequestENR
These two are useful for checking the status of a node.
* cmd/devp2p: add devp2p debug tool
This is a new tool for debugging p2p issues. It supports a few
basic tasks for now, but many more things can and will be added
in the near future.
devp2p enrdump -- prints ENRs readably
devp2p discv4 ping -- checks if a node is up
devp2p discv4 requestenr -- gets a node's record
devp2p discv4 resolve -- finds a node through the DHT
This change implements EIP-868. The UDPv4 transport announces support
for the extension in ping/pong and handles enrRequest messages.
There are two uses of the extension: If a remote node announces support
for EIP-868 in their pong, node revalidation pulls the node's record.
The Resolve method requests the record unconditionally.
cmd/swarm/swarm-smoke: improve smoke tests (#1337)
swarm/network: remove dead code (#1339)
swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)
This change restructures the internals of p2p/discover to make room for
the discv5 code which will soon be added to this package.
- packet type names now have a "V4" suffix.
- ListenUDP returns *UDPv4 instead of *Table. This technically breaks
the API but the only caller in go-ethereum is package p2p, which uses
a compatible interface and doesn't need changes.
- The internal transport interface is changed to make Table reusable for v5.
- The 'lookup' code moves from table to transport. This required
updating the lookup unit test to use udpTest instead of a custom transport.
* swarm/network: fix data races in TestInitialPeersMsg test
* swarm/network: add Kademlia.Saturation method with lock
* swarm/network: add Hive.Peer method to safely retrieve a bzz peer
* swarm/network: remove duplicate comment
* p2p/testing: prevent goroutine leak in ProtocolTester
* swarm/network: fix data race in newBzzBaseTesterWithAddrs
* swarm/network: fix goroutone leaks in testInitialPeersMsg
* swarm/network: raise number of peer check attempts in testInitialPeersMsg
* swarm/network: use Hive.Peer in Hive.PeerInfo function
* swarm/network: reduce the scope of mutex lock in newBzzBaseTesterWithAddrs
* swarm/storage: disable TestCleanIndex with race detector
This resolves a minor issue where neighbors responses containing less
than 16 nodes would bump the failure counter, removing the node. One
situation where this can happen is a private deployment where the total
number of extant nodes is less than 16.
Issue found by @jsying.
dummyHook's fields were concurrently written by nodes and read by
the test. The simplest solution is to protect all fields with a mutex.
Enable: TestMultiplePeersDropSelf, TestMultiplePeersDropOther as they
seemingly accidentally stayed disabled during a refactor/rewrite
since 1836366ac1.
resolvesethersphere/go-ethereum#1286
* p2p/discover: remove unused function
* p2p/enode: use localItemKey for local sequence number
I added localItemKey for this purpose in #18963, but then
forgot to actually use it. This changes the database layout
yet again and requires bumping the version number.
This change
- implements concurrent LES request serving even for a single peer.
- replaces the request cost estimation method with a cost table based on
benchmarks which gives much more consistent results. Until now the
allowed number of light peers was just a guess which probably contributed
a lot to the fluctuating quality of available service. Everything related
to request cost is implemented in a single object, the 'cost tracker'. It
uses a fixed cost table with a global 'correction factor'. Benchmark code
is included and can be run at any time to adapt costs to low-level
implementation changes.
- reimplements flowcontrol.ClientManager in a cleaner and more efficient
way, with added capabilities: There is now control over bandwidth, which
allows using the flow control parameters for client prioritization.
Target utilization over 100 percent is now supported to model concurrent
request processing. Total serving bandwidth is reduced during block
processing to prevent database contention.
- implements an RPC API for the LES servers allowing server operators to
assign priority bandwidth to certain clients and change prioritized
status even while the client is connected. The new API is meant for
cases where server operators charge for LES using an off-protocol mechanism.
- adds a unit test for the new client manager.
- adds an end-to-end test using the network simulator that tests bandwidth
control functions through the new API.
I added localItemKey for this purpose in #18963, but then
forgot to actually use it. This changes the database layout
yet again and requires bumping the version number.
* swarm/network: DRY out repeated giga comment
I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.
* p2p/simulations: encapsulate Node.Up field so we avoid data races
The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the field. Other times the locking was missed and we had
a data race.
For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
resolves: ethersphere/go-ethereum#1146
* p2p/simulations: fix unmarshal of simulations.Node
Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.
Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON
* p2p/simulations: revert back to defer Unlock() pattern for Network
It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.
The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON
* p2p/simulations: remove JSON annotation from private fields of Node
As unexported fields are not serialized.
* p2p/simulations: fix deadlock in Network.GetRandomDownNode()
Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock
On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
* p2p/simulations: ensure method conformity for Network
Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.
* p2p/simulations: fix deadlock during network shutdown
`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.
`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.
Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.
Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
* swarm/state: simplify if statement in DBStore.Put()
* p2p/simulations: remove faulty godoc from private function
The comment started with the wrong method name.
The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
* node: close AccountsManager in new Close method
* p2p/simulations, p2p/simulations/adapters: handle node close on shutdown
* node: move node ephemeralKeystore cleanup to stop method
* node: call Stop in Node.Close method
* cmd/geth: close node.Node created with makeFullNode in cli commands
* node: close Node instances in tests
* cmd/geth, node: minor code style fixes
* cmd, console, miner, mobile: proper node Close() termination
This change clears up confusion around the two ways in which nodes
can be added to the table.
When a neighbors packet is received as a reply to findnode, the nodes
contained in the reply are added as 'seen' entries if sufficient space
is available.
When a ping is received and the endpoint verification has taken place,
the remote node is added as a 'verified' entry or moved to the front of
the bucket if present. This also updates the node's IP address and port
if they have changed.
This change resolves multiple issues around handling of endpoint proofs.
The proof is now done separately for each IP and completing the proof
requires a matching ping hash.
Also remove waitping because it's equivalent to sleep. waitping was
slightly more efficient, but that may cause issues with findnode if
packets are reordered and the remote end sees findnode before pong.
Logging of received packets was hitherto done after handling the packet,
which meant that sent replies were logged before the packet that
generated them. This change splits up packet handling into 'preverify'
and 'handle'. The error from 'preverify' is logged, but 'handle' happens
after the message is logged. This fixes the order. Packet logs now
contain the node ID.
* p2p/simulation: WIP minimal snapshot test
* p2p/simulation: Add snapshot create, load and verify to snapshot test
* build: add test tag for tests
* p2p/simulations, build: Revert travis change, build test sym always
* p2p/simulations: Add comments, timeout check on additional events
* p2p/simulation: Add benchmark template for minimal peer protocol init
* p2p/simulations: Remove unused code
* p2p/simulation: Correct timer reset
* p2p/simulations: Put snapshot check events in buffer and call blocking
* p2p/simulations: TestSnapshot fail if Load function returns early
* p2p/simulations: TestSnapshot wait for all connections before returning
* p2p/simulation: Revert to before wait for snap load (5e75594)
* p2p/simulations: add "conns after load" subtest to TestSnapshot
and nudge
* swarm/network: Hive - do not notify peer if discovery is disabled
* p2p/simulations: validate all connections on loading a snapshot
* p2p/simulations: track all connections in on snapshot loading
* p2p/simulations: add snapshotLoadTimeout variable
* p2p/simulations: ignore control events in snapshot load
* p2p/simulations: simplify event loop synchronization
* p2p/simulations: return already connected error from Load function
* p2p/simulations: log warning on snapshot loading disconnection