134 Commits

Author SHA1 Message Date
gluk256
8d04154691 p2p/simulations: wait until all connections are recreated when uploading snapshot (#19312)
* swarm/network/simulation: test cases refactored

* swarm/pss: minor refactoring

* swarm/simulation: UploadSnapshot updated

* swarm/network: style fix

* swarm/pss: bugfix
2019-03-22 11:20:17 +01:00
lash
09924cbcaa cmd/swarm, p2p, swarm: Enable ENR in binary/execadapter (#19309)
* cmd/swarm, p2p, swarm: Enable ENR in binary/execadapter

* cmd/p2p/swarm: Remove comments + config.Enode nomarshal

* p2p/simulations: Remove superfluous error check

* p2p/simulation: Move init enode comment

* swarm/api: Check error in config test

* swarm, p2p/simulations, cmd/swarm: Use nodekey in binary record sign

* cmd/swarm: Make nodekey available for swarm api config
2019-03-22 05:55:47 +01:00
Anton Evangelatov
baded64d88 swarm/network: measure time of messages in priority queue (#19250) 2019-03-20 21:30:34 +01:00
gluk256
6e401792ce swarm/pss: neighbourhood addressing simulation tests (#19278)
* swarm/pss: fixed bug in pss.process, test added

* swarm/pss: test case updated

* swarm/pss: WaitTillSnapshotRecreated() func added

* swarm/pss: snapshot test updated

* swarm/pss: WaitTillSnapshotLoaded() fixed

* swarm/pss: gofmt applied

* swarm/pss: refactoring, file renamed

* swarm/pss: input data fixed

* swarm/pss: race condition fixed

* swarm/pss: test timeout increased

* swarm/pss: eliminated the global variables

* swarm/pss: tests added

* swarm/pss: comments added

* swarm/pss: comment fixed

* swarm/pss: refactored according to review

* swarm/pss: style fix

* swarm/pss: increased timeout
2019-03-16 08:39:38 +01:00
lash
4b4f03ca37 swarm, p2p: Prerequisites for ENR replacing handshake (#19275)
* swarm/api, swarm/network, p2p/simulations: Prerequisites for handshake remove

* swarm, p2p: Add full sim node configs for protocoltester

* swarm/network: Make stream package pass tests

* swarm/network: Extract peer and addr types out of protocol file

* p2p, swarm: Make p2p/protocols tests pass + rename types.go

* swarm/network: Deactivate ExecAdapter test until binary ENR prep

* swarm/api: Remove comments

* swarm/network: Uncomment bootnode record load
2019-03-15 11:27:17 +01:00
Anton Evangelatov
1a3e25e4c1 swarm: tracing improvements (#19249) 2019-03-11 11:45:34 +01:00
Anton Evangelatov
2cfe0bed9f swarm: fix relationship between spans in open tracing (#19236)
* swarm/network: propagate span with ctx

* swarm/network: try to stop stream.send.request spans on time

* swarm/storage: add chunk ref as a log to netstore.fetcher span
2019-03-08 08:52:25 +01:00
Ferenc Szabo
d45f8d1880 swarm/network: remove *WithServer tests from stream package (#19223)
These tests never ran, as the build tag excluded them from CI
execution. As a result, the (dead) code got out of sync with other
parts of Swarm and would no longer even compile. => Removed.

resolves ethersphere/go-ethereum#1238
2019-03-07 09:27:56 +01:00
holisticode
a87776a5fe swarm/network/stream: Fix flaky tests in GetSubscriptionsRPC test (#19227)
* swarm/network/stream: fixed timing issues

* swarm/network/stream: only count first iteration of subscriptions

* swarm/network/stream/: fix linter errors
2019-03-07 09:24:28 +01:00
holisticode
81ed700157 Enable longrunning tests to run (#19208)
* p2p/simulations: increased snapshot load timeout for debugging

* swarm/network/stream: less nodes for snapshot longrunning tests

* swarm/network: fixed longrunning tests

* swarm/network/stream: store kademlia in bucket

* swarm/network/stream: disabled healthy check in delivery tests

* swarm/network/stream: longer SyncUpdateDelay for longrunning tests

* swarm/network/stream: more debug output

* swarm/network/stream: reduced longrunning snapshot tests to 64 nodes

* swarm/network/stream: don't WaitTillHealthy in SyncerSimulation

* swarm/network/stream: cleanup for PR
2019-03-05 12:54:46 +01:00
Anton Evangelatov
f9aa1cd21f Revert "swarm/network: Use actual remote peer ip in underlay (#19137)" (#19193)
This reverts commit 460d206f30.
2019-03-02 08:45:07 +01:00
Anton Evangelatov
4e9230ea7a swarm: enable p2p/discovery and disable dynamic dialling (#19189) 2019-03-01 12:20:37 +01:00
holisticode
994326ba00 swarm: new snapshot files (#19185) 2019-02-28 22:30:36 +01:00
lash
62d9d63858 swarm/network: WIP consider all nodes for healthy iteration (#19155)
* swarm/network: WIP consider all nodes for healthy iteration

* swarm/network/simulation: extend TestWaitTillHealthy to really check kads are healthy

* cmd/swarm/swarm-snapshot: fixed bugs in snapshot creation binary

* swarm/network/simulation: addressed PR comments

* swarm/network/simulation: defer sim.Close()

* swarm/network/simulation: fixed wrong sim.Close()

* swarm/network/simulation: addressed PR comments

* cmd/swarm/swarm-snapshot: reducing default to 8 nodes, more to 4

* cmd/swarm/swarm-snapshot: extended timeout to 3 mins, otherwise the 256-node snapshot times out

* swarm/network/simulation: More PR comments
2019-02-28 08:12:50 +01:00
Janoš Guljaš
872370e3bc swarm/network/simulation: do not copy node mutex in UploadSnapshot (#19160) 2019-02-25 10:03:31 +01:00
Matthew Halpern
81babe1509 swarm/*: remove redundant type specifiers (#19089) 2019-02-25 08:58:18 +01:00
Matthew Halpern
90b6cdaadf cmd,swarm: enforce camel case variable names (#19060) 2019-02-24 12:39:23 +01:00
Janoš Guljaš
836c846812 swarm/network/master: protect SetNextBatch iterator after close (#19147) 2019-02-21 18:33:49 +01:00
Ferenc Szabo
e38b227ce6 CI race detector: handle failing tests (#19143)
* swarm/storage: increase mget timeout in common_test.go

 TestDbStoreCorrect_1k sometimes timed out with -race on Travis.

--- FAIL: TestDbStoreCorrect_1k (24.63s)
    common_test.go:194: testStore failed: timed out after 10s

* swarm: remove unused vars from TestSnapshotSyncWithServer

nodeCount and chunkCount are returned from setupSim, and those are
the values we use.

* swarm: move race/norace helpers from stream to testutil

We will need to use the flag in other packages, too.

* swarm: refactor TestSwarmNetwork case

Extract long running test cases for better visibility.

* swarm/network: skip TestSyncingViaGlobalSync with -race

It panics on Travis:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b]

* swarm: run TestSwarmNetwork with fewer nodes with -race

Otherwise we always get a test failure with `network_test.go:374:
context deadline exceeded`, even with a raised `Timeout`.

* swarm/network: run TestDeliveryFromNodes with fewer nodes with -race

Test on Travis times out with 8 or more nodes if -race flag is present.

* swarm/network: smaller node count for discovery tests with -race

TestDiscoveryPersistenceSimulationSimAdapter failed on Travis with the
`-race` flag present. The failure was due to excessive memory usage
coming from the CGO runtime. Using a smaller node count resolves the
issue.

=== RUN   TestDiscoveryPersistenceSimulationSimAdapter
==7227==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
FAIL    github.com/ethereum/go-ethereum/swarm/network/simulations/discovery     804.826s

* swarm/network: run TestFileRetrieval with fewer nodes with -race

Otherwise we get a failure due to excessive memory usage, as the CGO
runtime cannot allocate more bytes.

=== RUN   TestFileRetrieval
==7366==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
FAIL	github.com/ethereum/go-ethereum/swarm/network/stream	155.165s

* swarm/network: run TestRetrieval with fewer nodes with -race

Otherwise we get a failure due to excessive memory usage, as the CGO
runtime cannot allocate more bytes ("ThreadSanitizer failed to
allocate").

* swarm/network: skip flaky TestGetSubscriptionsRPC on Travis w/ -race

The test often fails with something like:
 streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20

* swarm/storage: skip TestDB_SubscribePull* tests on Travis w/ -race

Travis just hangs...

ok  	github.com/ethereum/go-ethereum/swarm/storage/feed/lookup	1.307s
keepalive
keepalive
keepalive

or panics after a while.

Without these tests the race detector job is now stable. Let's
investigate these tests in a separate issue:
https://github.com/ethersphere/go-ethereum/issues/1245
2019-02-20 22:57:42 +01:00
lash
d36e974ba3 swarm/network: Keep span across roundtrip (#19140)
* swarm/network: WIP Span request span until delivery and put

* swarm/storage: Introduce new trace across single fetcher lifespan

* swarm/network: Put span ids for sendpriority in context value

* swarm: Add global span store in tracing

* swarm/tracing: Add context key constants

* swarm/tracing: Add comments

* swarm/storage: Remove redundant fix for filestore

* swarm/tracing: Elaborate constants comments

* swarm/network, swarm/storage, swarm/tracing: Minor cleanup
2019-02-20 14:50:37 +01:00
lash
460d206f30 swarm/network: Use actual remote peer ip in underlay (#19137)
* swarm/network: Logline to see handshake addr

* swarm/network: Replace remote ip in handshake uaddr

* swarm/network: Add test for enode uaddr rewrite method

* swarm/network: Remove redundant pointer return from sanitize

* swarm/network: Obeying the linting machine

* swarm/network: Add panic comment

(travis trigger take 1)
2019-02-20 14:46:00 +01:00
Janoš Guljaš
ba2dfa5ce4 swarm/network/stream: fix a goroutine leak in Registry (#19139)
* swarm/network/stream: fix a goroutine leak in Registry

* swarm/network, swarm/network/stream: Kademlia close addr count and depth change chans

* swarm/network/stream: rename close channel to quit

* swarm/network/stream: fix sync between NewRegistry goroutine and Close method
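The quit-channel shutdown pattern named in the bullets above, sketched in Go under the assumption of a single background goroutine. `registry`, `loop` and the `done` channel are illustrative names; `done` is one possible way to synchronise the goroutine with `Close`, not necessarily what the stream package does.

```go
package main

import (
	"fmt"
	"sync"
)

// registry owns one background goroutine that must not outlive Close.
type registry struct {
	events chan int
	quit   chan struct{} // closed by Close to stop the goroutine
	done   chan struct{} // closed by the goroutine once it has exited
	once   sync.Once
}

func newRegistry() *registry {
	r := &registry{
		events: make(chan int),
		quit:   make(chan struct{}),
		done:   make(chan struct{}),
	}
	go r.loop()
	return r
}

// loop handles events until quit is closed.
func (r *registry) loop() {
	defer close(r.done)
	for {
		select {
		case e := <-r.events:
			_ = e // handle the event here
		case <-r.quit:
			return
		}
	}
}

// Close stops the goroutine and waits for it to exit. It is safe to call
// more than once.
func (r *registry) Close() {
	r.once.Do(func() { close(r.quit) })
	<-r.done
}

func main() {
	r := newRegistry()
	r.events <- 1
	r.Close()
	fmt.Println("registry closed, goroutine exited")
}
```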
2019-02-20 14:45:25 +01:00
Ferenc Szabo
50b872bf05 p2p, swarm: fix node up races by granular locking (#18976)
* swarm/network: DRY out repeated giga comment

I do not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.

* p2p/simulations: encapsulate Node.Up field so we avoid data races

The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network that was sometimes used to access
the field. Other times the locking was missed and we had
a data race.

For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.
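A minimal sketch of hiding a field behind its own lock, assuming a `sync.RWMutex` per node. The names `upMu`, `Up` and `SetUp` are illustrative and may not match p2p/simulations exactly.

```go
package main

import (
	"fmt"
	"sync"
)

// Node keeps its up state private and guarded by its own lock, so callers
// no longer depend on holding the Network lock to read or write it.
type Node struct {
	ID   string
	upMu sync.RWMutex
	up   bool
}

// Up reports whether the node is running.
func (n *Node) Up() bool {
	n.upMu.RLock()
	defer n.upMu.RUnlock()
	return n.up
}

// SetUp records the node's running state.
func (n *Node) SetUp(up bool) {
	n.upMu.Lock()
	defer n.upMu.Unlock()
	n.up = up
}

func main() {
	n := &Node{ID: "node-1"}
	var wg sync.WaitGroup
	// Concurrent readers and writers are safe without any Network lock.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(v bool) {
			defer wg.Done()
			n.SetUp(v)
			_ = n.Up()
		}(i%2 == 0)
	}
	wg.Wait()
	n.SetUp(true)
	fmt.Println(n.ID, "up:", n.Up())
}
```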

resolves: ethersphere/go-ethereum#1146

* p2p/simulations: fix unmarshal of simulations.Node

Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot, because the default
JSON unmarshalling does not handle unexported fields.

Important: the fix is partial and not to my taste. But I cut
scope, as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177

* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON

* p2p/simulations: revert back to defer Unlock() pattern for Network

It's a good pattern to call `defer Unlock()` right after `Lock()` so that
(new) error paths cannot forget to unlock. Let's get back to that pattern.

The pattern was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.

* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON

* p2p/simulations: remove JSON annotation from private fields of Node

As unexported fields are not serialized.

* p2p/simulations: fix deadlock in Network.GetRandomDownNode()

Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock

On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.
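The locking convention above, sketched in Go with simplified hypothetical types: exported methods take `lock`, unexported helpers assume it is held and never call exported methods. Re-acquiring even a read lock is unsafe, since the `sync.RWMutex` documentation prohibits recursive read locking once a writer may be waiting.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

type Network struct {
	lock  sync.RWMutex
	nodes map[string]bool // node ID -> up
}

// GetNodes is exported, so it acquires the lock itself.
func (net *Network) GetNodes() []string {
	net.lock.RLock()
	defer net.lock.RUnlock()
	return net.getNodes()
}

// getNodes is unexported and assumes net.lock is already held.
func (net *Network) getNodes() []string {
	ids := make([]string, 0, len(net.nodes))
	for id := range net.nodes {
		ids = append(ids, id)
	}
	return ids
}

// getDownNodeIDs assumes the lock is held and calls only unexported helpers;
// calling GetNodes here would try to take the lock a second time.
func (net *Network) getDownNodeIDs() []string {
	var down []string
	for _, id := range net.getNodes() {
		if !net.nodes[id] {
			down = append(down, id)
		}
	}
	return down
}

// GetRandomDownNode locks once, then stays on the unexported path.
func (net *Network) GetRandomDownNode() (string, bool) {
	net.lock.RLock()
	defer net.lock.RUnlock()
	down := net.getDownNodeIDs()
	if len(down) == 0 {
		return "", false
	}
	return down[rand.Intn(len(down))], true
}

func main() {
	nw := &Network{nodes: map[string]bool{"n1": true, "n2": false, "n3": false}}
	if id, ok := nw.GetRandomDownNode(); ok {
		fmt.Println("random down node:", id)
	}
	fmt.Println("all nodes:", len(nw.GetNodes()))
}
```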

* p2p/simulations: ensure method conformity for Network

Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However, these new methods did not follow
the pattern of the other Network methods, i.e., every exported method
locks the whole Network for either read or write.

* p2p/simulations: fix deadlock during network shutdown

`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e., `Kademlia.lock` and
`p2p/simulations.Network.lock`. With high confidence, the test got stuck
about once in every 20 executions.

`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.

Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.

Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223
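A minimal sketch of the fix: snapshot the node list under the lock, release it, then stop the nodes. The types are hypothetical; `Stop` calling `InitConn` directly only models the callback that, per the message, reached `net.lock` via `Kademlia.SuggestPeer()`, and `Shutdown` stands in for `Network.Stop()`.

```go
package main

import (
	"fmt"
	"sync"
)

type node struct {
	id  string
	net *Network
}

// Stop models a node shutdown that calls back into the Network and
// therefore needs net.lock.
func (n *node) Stop() error {
	n.net.InitConn(n.id)
	return nil
}

type Network struct {
	lock  sync.Mutex
	nodes []*node
	conns int
}

// InitConn takes the network lock, like the callback in the message.
func (net *Network) InitConn(id string) {
	net.lock.Lock()
	defer net.lock.Unlock()
	net.conns++
}

// Shutdown copies the node list under the lock, releases it, and only then
// stops each node, so callbacks into InitConn cannot deadlock.
func (net *Network) Shutdown() {
	net.lock.Lock()
	nodes := append([]*node(nil), net.nodes...)
	net.lock.Unlock()

	for _, n := range nodes {
		if err := n.Stop(); err != nil {
			fmt.Println("stop failed:", n.id, err)
		}
	}
}

func main() {
	nw := &Network{}
	for _, id := range []string{"n1", "n2"} {
		nw.nodes = append(nw.nodes, &node{id: id, net: nw})
	}
	nw.Shutdown()
	fmt.Println("connections initiated during shutdown:", nw.conns)
}
```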

* swarm/state: simplify if statement in DBStore.Put()

* p2p/simulations: remove faulty godoc from private function

The comment started with the wrong method name.

The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
2019-02-18 07:38:14 +01:00
holisticode
2af24724dd swarm/network: Saturation check for healthy networks (#19071)
* swarm/network: new saturation for  implementation

* swarm/network: re-added saturation func in Kademlia as it is used elsewhere

* swarm/network: saturation with higher MinBinSize

* swarm/network: PeersPerBin with depth check

* swarm/network: edited tests to pass new saturated check

* swarm/network: minor fix saturated check

* swarm/network/simulations/discovery: fixed renamed RPC call

* swarm/network: renamed to isSaturated and returns bool

* swarm/network: early depth check
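The messages above do not spell out the saturation rule, so the following is only a hypothetical reading of it: every bin shallower than the depth must hold at least a minimum number of peers. `isSaturated` here is an illustrative free function with an assumed signature, not the Kademlia method.

```go
package main

import "fmt"

// isSaturated reports whether every bin shallower than depth holds at least
// minBinSize peers. peersPerBin[po] is the number of connected peers at
// proximity order po. Hypothetical simplification for illustration only.
func isSaturated(peersPerBin []int, depth, minBinSize int) bool {
	// Early exit: a depth beyond the known bins cannot be saturated.
	if depth > len(peersPerBin) {
		return false
	}
	for po := 0; po < depth; po++ {
		if peersPerBin[po] < minBinSize {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isSaturated([]int{2, 3, 2, 0}, 3, 2))
	fmt.Println(isSaturated([]int{2, 1, 2, 0}, 3, 2))
}
```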
2019-02-14 19:01:50 +01:00
Elad
3ee09ba035 swarm/storage/netstore: add fetcher cancellation on shutdown (#19049)
swarm/network/stream: remove netstore internal wg
swarm/network/stream: run individual tests with t.Run
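A minimal sketch of cancelling fetchers on shutdown by deriving every fetch from one root context. `store`, `fetch` and the one-hour timer are hypothetical stand-ins, not the netstore API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// store hands every fetcher a context derived from one root context, so
// cancelling the root on Close aborts all in-flight fetches at once.
type store struct {
	ctx    context.Context
	cancel context.CancelFunc
}

func newStore() *store {
	ctx, cancel := context.WithCancel(context.Background())
	return &store{ctx: ctx, cancel: cancel}
}

// fetch starts a retrieval that reports its outcome on results. The long
// timer stands in for a delivery that may never arrive.
func (s *store) fetch(ref string, results chan<- error) {
	go func() {
		t := time.NewTimer(time.Hour)
		defer t.Stop()
		select {
		case <-t.C:
			results <- nil
		case <-s.ctx.Done():
			results <- fmt.Errorf("fetch %s: %w", ref, s.ctx.Err())
		}
	}()
}

// Close cancels all outstanding fetchers.
func (s *store) Close() { s.cancel() }

func main() {
	s := newStore()
	results := make(chan error, 1)
	s.fetch("chunk-1", results)
	s.Close()
	fmt.Println(<-results)
}
```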
2019-02-14 07:51:57 +01:00
Janoš Guljaš
3fd6db2bf6 swarm: fix network/stream data races (#19051)
* swarm/network/stream: newStreamerTester cleanup only if err is nil

* swarm/network/stream: raise newStreamerTester waitForPeers timeout

* swarm/network/stream: fix data races in GetPeerSubscriptions

* swarm/storage: prevent data race on LDBStore.batchesC

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461775049

* swarm/network/stream: fix TestGetSubscriptionsRPC data race

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461768477

* swarm/network/stream: correctly use Simulation.Run callback

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461783804

* swarm/network: protect addrCountC in Kademlia.AddrCountC function

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462273444

* p2p/simulations: fix a deadlock calling getRandomNode with lock

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462317407

* swarm/network/stream: terminate disconnect goroutines in tests

* swarm/network/stream: reduce memory consumption when testing data races

* swarm/network/stream: add watchDisconnections helper function

* swarm/network/stream: add concurrent counter for tests

* swarm/network/stream: rename race/norace test files and use const

* swarm/network/stream: remove watchSim and its panic

* swarm/network/stream: pass context in watchDisconnections

* swarm/network/stream: add concurrent safe bool for watchDisconnections

* swarm/storage: fix LDBStore.batchesC data race by not closing it
2019-02-13 13:03:23 +01:00
Ferenc Szabo
27e3f96819 swarm: CI race detector test adjustments (#19017) 2019-02-08 17:07:11 +01:00
lash
0c10d37606 swarm/network, swarm/storage: Preserve opentracing contexts (#19022) 2019-02-08 16:57:48 +01:00
holisticode
41597c2856 swarm: Debug API and HasChunks() API endpoint (#18980) 2019-02-07 15:49:19 +01:00
Ferenc Szabo
1c3aa8d9b1 swarm/storage: fix test timeout with -race by increasing mget timeout 2019-02-05 14:34:34 +01:00
Anton Evangelatov
597597e8b2 swarm/network: refactor simulation tests bootstrap (#18975) 2019-02-01 09:58:46 +01:00
holisticode
43e1b7b124 swarm: GetPeerSubscriptions RPC (#18972) 2019-01-30 21:03:08 +01:00
Janoš Guljaš
592bf6a59c swarm: fix flaky delivery tests (#18971) 2019-01-30 14:03:11 +01:00
lash
f9401ae011 swarm/network: Remove extra random peer, connect test sanity, comments (#18964) 2019-01-30 09:49:58 +01:00
Elad
2abeb35d54 p2p/testing, swarm: remove unused testing.T in protocol tester (#18500) 2019-01-24 17:23:34 +01:00
gluk256
ad13d2d407 swarm/version: commit version added (#18510) 2019-01-24 12:35:10 +01:00
Anton Evangelatov
bbd120354a swarm: bootnode-mode, new bootnodes and no p2p package discovery (#18498) 2019-01-24 12:02:18 +01:00
Viktor Trón
15b9b39e6c swarm/network: unskip tests previously skipped due to suggestPeer issues (#18477) 2019-01-19 08:12:57 +01:00
Ferenc Szabo
19bfcbf911 swarm/network: fix data race in fetcher_test.go (#18469) 2019-01-17 16:45:36 +01:00
Ferenc Szabo
4f8ec44565 swarm/network: fix data race in stream.(*Peer).handleOfferedHashesMsg() (#18468)
* swarm/network: fix data race in stream.(*Peer).handleOfferedHashesMsg()

handleOfferedHashesMsg() contained a data race:
- read => in a goroutine, call to c.batchDone()
- write => in the main thread, write to c.sessionAt

c.batchDone() contained a call to c.AddInterval(). AddInterval had a
value receiver on client, so each c.AddInterval() call copied (read)
the whole client struct while one of its fields was being modified in
handleOfferedHashesMsg() (write).

fixes ethersphere/go-ethereum#1086
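A minimal sketch of why the value receiver raced: calling a value-receiver method copies, and therefore reads, the whole struct, including `sessionAt`. With a pointer receiver the goroutine touches only the field it uses. The `client` type here is a simplified hypothetical, not the stream package type.

```go
package main

import (
	"fmt"
	"sync"
)

type client struct {
	sessionAt uint64
	intervals []uint64
}

// Before the fix, a value receiver would look like:
//
//	func (c client) AddInterval(start uint64) { ... }
//
// Calling it copies the entire client, i.e. also reads sessionAt, which
// races with a concurrent write to c.sessionAt. A pointer receiver copies
// only the pointer and touches only the fields it uses.
func (c *client) AddInterval(start uint64) {
	c.intervals = append(c.intervals, start)
}

func (c *client) batchDone(start uint64) { c.AddInterval(start) }

func main() {
	c := &client{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		c.batchDone(10) // touches only c.intervals
	}()
	c.sessionAt = 42 // concurrent write to a different field: no race
	wg.Wait()
	fmt.Println(c.sessionAt, c.intervals)
}
```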

* swarm/network: simplify some trivial statements
2019-01-17 14:44:29 +01:00
Elad
81e26d5a48 swarm/network: fix data race warning on TestBzzHandshakeLightNode (#18459) 2019-01-17 11:38:23 +01:00
Viktor Trón
bcb2594151 swarm/network: rewrite of peer suggestion engine, fix skipped tests (#18404)
* swarm/network: fix skipped tests related to suggestPeer

* swarm/network: rename depth to radius

* swarm/network: uncomment assertHealth and improve comments

* swarm/network: remove commented code

* swarm/network: kademlia suggestPeer algo correction

* swarm/network: kademlia suggest peer

 * simplify suggest Peer code
 * improve peer suggestion algo
 * add comments
 * kademlia testing improvements
   * assertHealth -> checkHealth (test helper)
   * testSuggestPeer -> checkSuggestPeer (test helper)
   * remove testSuggestPeerBug and TestKademliaCase

* swarm/network: kademlia suggestPeer cleanup, improved comments

* swarm/network: minor comment, discovery test default arg
2019-01-17 07:29:34 +01:00
Elad
34f11e752f cmd/swarm/swarm-snapshot: swarm snapshot generator (#18453)
* cmd/swarm/swarm-snapshot: add binary to create network snapshots

* cmd/swarm/swarm-snapshot: refactor and extend tests

* p2p/simulations: remove unused triggerChecks func and fix linter

* internal/cmdtest: raise the timeout for killing TestCmd

* cmd/swarm/swarm-snapshot: add more comments and other minor adjustments

* cmd/swarm/swarm-snapshot: remove redundant check in createSnapshot

* cmd/swarm/swarm-snapshot: change comment wording

* p2p/simulations: revert Simulation.Run from master

https://github.com/ethersphere/go-ethereum/pull/1077/files#r247078904

* cmd/swarm/swarm-snapshot: address pr comments

* swarm/network/simulations/discovery: removed snapshot write to file

* cmd/swarm/swarm-snapshot, swarm/network/simulations: removed redundant connection event check, fixed lint error
2019-01-16 14:33:02 +01:00
Janoš Guljaš
96c7c18b18 swarm/network: fix data race in TestNetworkID test (#18460) 2019-01-16 12:56:34 +01:00
gluk256
4aeeecfded swarm/pot: each() functions refactored (#18452) 2019-01-15 11:51:33 +01:00
holisticode
88168ff5c5 Stream subscriptions (#18355)
* swarm/network: eachBin now starts at kaddepth for nn

* swarm/network: fix Kademlia.EachBin

* swarm/network: fix kademlia.EachBin

* swarm/network: correct EachBin implementation according to requirements

* swarm/network: fewer addresses, simplified tests

* swarm: calc kad depth outside loop in EachBin test

* swarm/network: removed printResults

* swarm/network: cleanup imports

* swarm/network: remove kademlia.EachBin; fix RequestSubscriptions and add unit test

* swarm/network/stream: address PR comments

* swarm/network/stream: package-wide subscriptionFunc

* swarm/network/stream: refactor to kad.EachConn
2019-01-11 15:08:09 +01:00
Ferenc Szabo
2eb838ed97 p2p/simulations: eliminate concept of pivot (#18426) 2019-01-11 10:23:45 +01:00
lash
7240f4d800 swarm/network: Rename minproxbinsize, add as member of simulation (#18408)
* swarm/network: Rename minproxbinsize, add as member of simulation

* swarm/network: Deactivate WaitTillHealthy, unreliable pending suggestpeer
2019-01-10 12:33:51 +01:00
Viktor Trón
6df3e4eeb0
swarm/network: remove isproxbin bool from kad.Each* iterfunc (#18239)
* swarm/network, swarm/pss: remove isproxbin bool from kad.Each* iterfunc

* swarm/network: restore comment and unskip snapshot sync tests
2019-01-10 03:36:19 +01:00
Janoš Guljaš
d70c4faf20 swarm: Fix T.Fatal inside a goroutine in tests (#18409)
* swarm/storage: fix T.Fatal inside a goroutine

* swarm/network/simulation: fix T.Fatal inside a goroutine

* swarm/network/stream: fix T.Fatal inside a goroutine

* swarm/network/simulation: consistent failures in TestPeerEventsTimeout

* swarm/network/simulation: rename sendRunSignal to triggerSimulationRun
2019-01-09 07:05:55 +01:00