Here we add special handling for sending an error response when the write timeout of the
HTTP server is just about to expire. This is surprisingly difficult to get right, since is
must be ensured that all output is fully flushed in time, which needs support from
multiple levels of the RPC handler stack:
The timeout response can't use chunked transfer-encoding because there is no way to write
the final terminating chunk. net/http writes it when the topmost handler returns, but the
timeout will already be over by the time that happens. We decided to disable chunked
encoding by setting content-length explicitly.
Gzip compression must also be disabled for timeout responses because we don't know the
true content-length before compressing all output, i.e. compression would reintroduce
chunked transfer-encoding.
This changes the CI / release builds to use the latest Go version. It also
upgrades golangci-lint to a newer version compatible with Go 1.19.
In Go 1.19, godoc has gained official support for links and lists. The
syntax for code blocks in doc comments has changed and now requires a
leading tab character. gofmt adapts comments to the new syntax
automatically, so there are a lot of comment re-formatting changes in this
PR. We need to apply the new format in order to pass the CI lint stage with
Go 1.19.
With the linter upgrade, I have decided to disable 'gosec' - it produces
too many false-positive warnings. The 'deadcode' and 'varcheck' linters
have also been removed because golangci-lint warns about them being
unmaintained. 'unused' provides similar coverage and we already have it
enabled, so we don't lose much with this change.
This enables the following linters
- typecheck
- unused
- staticcheck
- bidichk
- durationcheck
- exportloopref
- gosec
WIth a few exceptions.
- We use a deprecated protobuf in trezor. I didn't want to mess with that, since I cannot meaningfully test any changes there.
- The deprecated TypeMux is used in a few places still, so the warning for it is silenced for now.
- Using string type in context.WithValue is apparently wrong, one should use a custom type, to prevent collisions between different places in the hierarchy of callers. That should be fixed at some point, but may require some attention.
- The warnings for using weak random generator are squashed, since we use a lot of random without need for cryptographic guarantees.
This change makes use of the new code generator rlp/rlpgen to improve the
performance of RLP encoding for Header and StateAccount. It also speeds up
encoding of ReceiptForStorage using the new rlp.EncoderBuffer API.
The change is much less transparent than I wanted it to be, because Header and
StateAccount now have an EncodeRLP method defined with pointer receiver. It
used to be possible to encode non-pointer values of these types, but the new
method prevents that and attempting to encode unadressable values (even if
part of another value) will return an error. The error can be surprising and may
pop up in places that previously didn't expect any errors.
To make things work, I also needed to update all code paths (mostly in unit tests)
that lead to encoding of non-pointer values, and pass a pointer instead.
Benchmark results:
name old time/op new time/op delta
EncodeRLP/legacy-header-8 328ns ± 0% 237ns ± 1% -27.63% (p=0.000 n=8+8)
EncodeRLP/london-header-8 353ns ± 0% 247ns ± 1% -30.06% (p=0.000 n=8+8)
EncodeRLP/receipt-for-storage-8 237ns ± 0% 123ns ± 0% -47.86% (p=0.000 n=8+7)
EncodeRLP/receipt-full-8 297ns ± 0% 301ns ± 1% +1.39% (p=0.000 n=8+8)
name old speed new speed delta
EncodeRLP/legacy-header-8 1.66GB/s ± 0% 2.29GB/s ± 1% +38.19% (p=0.000 n=8+8)
EncodeRLP/london-header-8 1.55GB/s ± 0% 2.22GB/s ± 1% +42.99% (p=0.000 n=8+8)
EncodeRLP/receipt-for-storage-8 38.0MB/s ± 0% 64.8MB/s ± 0% +70.48% (p=0.000 n=8+7)
EncodeRLP/receipt-full-8 910MB/s ± 0% 897MB/s ± 1% -1.37% (p=0.000 n=8+8)
name old alloc/op new alloc/op delta
EncodeRLP/legacy-header-8 0.00B 0.00B ~ (all equal)
EncodeRLP/london-header-8 0.00B 0.00B ~ (all equal)
EncodeRLP/receipt-for-storage-8 64.0B ± 0% 0.0B -100.00% (p=0.000 n=8+8)
EncodeRLP/receipt-full-8 320B ± 0% 320B ± 0% ~ (all equal)
Some benchmarks in eth/filters were not good: they weren't reproducible, relying on geth chaindata to be present.
Another one was rejected because the receipt was lacking a backing transcation.
The p2p simulation benchmark had a lot of the warnings below, due to the framework calling both
Stop() and Close(). Apparently, the simulated adapter is the only implementation which has a Close(),
and there is no need to call both Stop and Close on it.
USB enumeration still occured. Make sure it will only occur if --usb is set.
This also deprecates the 'NoUSB' config file option in favor of a new option 'USB'.
- Remove the ws:// prefix from the status endpoint since
the ws:// is already included in the stack.WSEndpoint().
- Don't register the services again in the node start.
Registration is already done in the initialization stage.
- Expose admin namespace via websocket.
This namespace is necessary for connecting the peers via websocket.
- Offer logging relevant options for exec adapter.
It's really painful to mix all log output in the single console. So
this PR offers two additional options for exec adapter in this case
testers can config the log output(e.g. file output) and log level
for each p2p node.
This adds a few tiny fixes for les and the p2p simulation framework:
LES Parts
- Keep the LES-SERVER connection even it's non-synced
We had this idea to reject the connections in LES protocol if the les-server itself is
not synced. However, in LES protocol we will also receive the connection from another
les-server. In this case even the local node is not synced yet, we should keep the tcp
connection for other protocols(e.g. eth protocol).
- Don't count "invalid message" for non-existing GetBlockHeadersMsg request
In the eth syncing mechanism (full sync, fast sync, light sync), it will try to fetch
some non-existent blocks or headers(to ensure we indeed download all the missing chain).
In this case, it's possible that the les-server will receive the request for
non-existent headers. So don't count it as the "invalid message" for scheduling
dropping.
- Copy the announce object in the closure
Before the les-server pushes the latest headers to all connected clients, it will create
a closure and queue it in the underlying request scheduler. In some scenarios it's
problematic. E.g, in private networks, the block can be mined very fast. So before the
first closure is executed, we may already update the latest_announce object. So actually
the "announce" object we want to send is replaced.
The downsize is the client will receive two announces with the same td and then drop the
server.
P2P Simulation Framework
- Don't double register the protocol services in p2p-simulation "Start".
The protocols upon the devp2p are registered in the "New node stage". So don't reigster
them again when starting a node in the p2p simulation framework
- Add one more new config field "ExternalSigner", in order to use clef service in the
framework.
This PR significantly changes the APIs for instantiating Ethereum nodes in
a Go program. The new APIs are not backwards-compatible, but we feel that
this is made up for by the much simpler way of registering services on
node.Node. You can find more information and rationale in the design
document: https://gist.github.com/renaynay/5bec2de19fde66f4d04c535fd24f0775.
There is also a new feature in Node's Go API: it is now possible to
register arbitrary handlers on the user-facing HTTP server. In geth, this
facility is used to enable GraphQL.
There is a single minor change relevant for geth users in this PR: The
GraphQL API is no longer available separately from the JSON-RPC HTTP
server. If you want GraphQL, you need to enable it using the
./geth --http --graphql flag combination.
The --graphql.port and --graphql.addr flags are no longer available.
* p2p: add low port check in dialer
We already have a check like this for UDP ports, add a similar one in
the dialer. This prevents dials to port zero and it's also an extra
layer of protection against spamming HTTP servers.
* p2p/discover: use errLowPort in v4 code
* p2p: change port check
* p2p: add comment
* p2p/simulations/adapters: ensure assigned port is in all node records
* p2p: new dial scheduler
This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:
- The time between discovery of a node and dialing that node is
significantly lower in the new version. The old dialState kept
a buffer of nodes and launched a task to refill it whenever the buffer
became empty. This worked well with the discovery interface we used to
have, but doesn't really work with the new iterator-based discovery
API.
- Selection of static dial candidates (created by Server.AddPeer or
through static-nodes.json) performs much better for large amounts of
static peers. Connections to static nodes are now limited like dynanic
dials and can no longer overstep MaxPeers or the dial ratio.
* p2p/simulations/adapters: adapt to new NodeDialer interface
* p2p: re-add check for self in checkDial
* p2p: remove peersetCh
* p2p: allow static dials when discovery is disabled
* p2p: add test for dialScheduler.removeStatic
* p2p: remove blank line
* p2p: fix documentation of maxDialPeers
* p2p: change "ok" to "added" in static node log
* p2p: improve dialTask docs
Also increase log level for "Can't resolve node"
* p2p: ensure dial resolver is truly nil without discovery
* p2p: add "looking for peers" log message
* p2p: clean up Server.run comments
* p2p: fix maxDialedConns for maxpeers < dialRatio
Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.
* p2p: fix RemovePeer to disconnect the peer again
Also make RemovePeer synchronous and add a test.
* p2p: remove "Connection set up" log message
* p2p: clean up connection logging
We previously logged outgoing connection failures up to three times.
- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."
This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).
Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.
* p2p: document that RemovePeer blocks
* build: use golangci-lint
This changes build/ci.go to download and run golangci-lint instead
of gometalinter.
* core/state: fix unnecessary conversion
* p2p/simulations: fix lock copying (found by go vet)
* signer/core: fix unnecessary conversions
* crypto/ecies: remove unused function cmpPublic
* core/rawdb: remove unused function print
* core/state: remove unused function xTestFuzzCutter
* core/vm: disable TestWriteExpectedValues in a different way
* core/forkid: remove unused function checksum
* les: remove unused type proofsData
* cmd/utils: remove unused functions prefixedNames, prefixFor
* crypto/bn256: run goimports
* p2p/nat: fix goimports lint issue
* cmd/clef: avoid using unkeyed struct fields
* les: cancel context in testRequest
* rlp: delete unreachable code
* core: gofmt
* internal/build: simplify DownloadFile for Go 1.11 compatibility
* build: remove go test --short flag
* .travis.yml: disable build cache
* whisper/whisperv6: fix ineffectual assignment in TestWhisperIdentityManagement
* .golangci.yml: enable goconst and ineffassign linters
* build: print message when there are no lint issues
* internal/build: refactor download a bit
* rpc: improve codec abstraction
rpc.ServerCodec is an opaque interface. There was only one way to get a
codec using existing APIs: rpc.NewJSONCodec. This change exports
newCodec (as NewFuncCodec) and NewJSONCodec (as NewCodec). It also makes
all codec methods non-public to avoid showing internals in godoc.
While here, remove codec options in tests because they are not
supported anymore.
* p2p/simulations: use github.com/gorilla/websocket
This package was the last remaining user of golang.org/x/net/websocket.
Migrating to the new library wasn't straightforward because it is no
longer possible to treat WebSocket connections as a net.Conn.
* vendor: delete golang.org/x/net/websocket
* rpc: fix godoc comments and run gofmt
Most of these changes are related to the Go 1.13 changes to test binary
flag handling.
* cmd/geth: make attach tests more reliable
This makes the test wait for the endpoint to come up by polling
it instead of waiting for two seconds.
* tests: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* crypto/ecies: remove useless -dump flag in tests
* p2p/simulations: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
* build: remove workaround for ./... vendor matching
This workaround was necessary for Go 1.8. The Go 1.9 release changed
the expansion rules to exclude vendored packages.
* Makefile: use relative path for GOBIN
This makes the "Run ./build/bin/..." line look nicer.
* les: fix test binary flags for Go 1.13
Calling flag.Parse during package initialization is prohibited
as of Go 1.13 and causes test failures. Call it in TestMain instead.
This change
- implements concurrent LES request serving even for a single peer.
- replaces the request cost estimation method with a cost table based on
benchmarks which gives much more consistent results. Until now the
allowed number of light peers was just a guess which probably contributed
a lot to the fluctuating quality of available service. Everything related
to request cost is implemented in a single object, the 'cost tracker'. It
uses a fixed cost table with a global 'correction factor'. Benchmark code
is included and can be run at any time to adapt costs to low-level
implementation changes.
- reimplements flowcontrol.ClientManager in a cleaner and more efficient
way, with added capabilities: There is now control over bandwidth, which
allows using the flow control parameters for client prioritization.
Target utilization over 100 percent is now supported to model concurrent
request processing. Total serving bandwidth is reduced during block
processing to prevent database contention.
- implements an RPC API for the LES servers allowing server operators to
assign priority bandwidth to certain clients and change prioritized
status even while the client is connected. The new API is meant for
cases where server operators charge for LES using an off-protocol mechanism.
- adds a unit test for the new client manager.
- adds an end-to-end test using the network simulator that tests bandwidth
control functions through the new API.
* swarm/network: DRY out repeated giga comment
I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.
* p2p/simulations: encapsulate Node.Up field so we avoid data races
The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the field. Other times the locking was missed and we had
a data race.
For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
resolves: ethersphere/go-ethereum#1146
* p2p/simulations: fix unmarshal of simulations.Node
Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.
Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177
* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON
* p2p/simulations: revert back to defer Unlock() pattern for Network
It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.
The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.
* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON
* p2p/simulations: remove JSON annotation from private fields of Node
As unexported fields are not serialized.
* p2p/simulations: fix deadlock in Network.GetRandomDownNode()
Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock
On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
* p2p/simulations: ensure method conformity for Network
Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.
* p2p/simulations: fix deadlock during network shutdown
`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.
`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.
Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.
Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
* swarm/state: simplify if statement in DBStore.Put()
* p2p/simulations: remove faulty godoc from private function
The comment started with the wrong method name.
The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
* node: close AccountsManager in new Close method
* p2p/simulations, p2p/simulations/adapters: handle node close on shutdown
* node: move node ephemeralKeystore cleanup to stop method
* node: call Stop in Node.Close method
* cmd/geth: close node.Node created with makeFullNode in cli commands
* node: close Node instances in tests
* cmd/geth, node: minor code style fixes
* cmd, console, miner, mobile: proper node Close() termination
* p2p/simulation: WIP minimal snapshot test
* p2p/simulation: Add snapshot create, load and verify to snapshot test
* build: add test tag for tests
* p2p/simulations, build: Revert travis change, build test sym always
* p2p/simulations: Add comments, timeout check on additional events
* p2p/simulation: Add benchmark template for minimal peer protocol init
* p2p/simulations: Remove unused code
* p2p/simulation: Correct timer reset
* p2p/simulations: Put snapshot check events in buffer and call blocking
* p2p/simulations: TestSnapshot fail if Load function returns early
* p2p/simulations: TestSnapshot wait for all connections before returning
* p2p/simulation: Revert to before wait for snap load (5e75594)
* p2p/simulations: add "conns after load" subtest to TestSnapshot
and nudge
* swarm/network: Hive - do not notify peer if discovery is disabled
* p2p/simulations: validate all connections on loading a snapshot
* p2p/simulations: track all connections in on snapshot loading
* p2p/simulations: add snapshotLoadTimeout variable
* p2p/simulations: ignore control events in snapshot load
* p2p/simulations: simplify event loop synchronization
* p2p/simulations: return already connected error from Load function
* p2p/simulations: log warning on snapshot loading disconnection