Commit Graph

148 Commits

Author SHA1 Message Date
Anton Evangelatov
297fa1855c swarm/network: measure addPeer and deletePeer to know if Kad rearranged
swarm/storage: remove traces for put/get/set (#1389)

* swarm/storage: remove traces for put/get/set

* swarm/storage: remove Has traces
2019-05-10 12:29:27 +02:00
Elad
84dfaea246 swarm: instrument setNextBatch
swarm/storage/localstore: add gc metrics, disable flaky test
2019-05-10 12:29:22 +02:00
Anton Evangelatov
3e9ba57669 swarm/storage: improve instrumentation
swarm/storage/localstore: fix broken metric (#1373)

p2p/protocols: count different messages (#1374)

cmd/swarm: disable snapshot create test due to constant flakes (#1376)

swarm/network: remove redundant goroutine (#1377)
2019-05-10 12:27:04 +02:00
Anton Evangelatov
8802b9ce7f swarm-smoke: add syncDelay flag
swarm/network: add want delay timer to syncing (#1367)

swarm/network: synchronise peer.close() (#1369)
2019-05-10 12:26:55 +02:00
Elad
ad6c39012f swarm: push tags integration - request flow
swarm/api: integrate tags to count chunks being split and stored
swarm/api/http: integrate tags in middleware for HTTP `POST` calls and assert chunks being calculated and counted correctly
swarm: remove deprecated and unused code, add swarm hash to DoneSplit signature, remove calls to the api client from the http package
2019-05-10 12:26:52 +02:00
Janoš Guljaš
3030893a21 swarm/network: update syncing 2019-05-10 12:26:49 +02:00
Anton Evangelatov
f8eb8fe64c cmd/swarm-smoke: check if chunks are at most prox host
swarm/network: measure how many chunks a node delivers (#1358)
2019-05-10 12:26:41 +02:00
Elad
a1cd7e6e92 p2p/protocols, swarm/network: fix resource leak with p2p teardown 2019-05-10 12:26:37 +02:00
Anton Evangelatov
993b145f25 swarm/storage/localstore: fix export db.Put signature
cmd/swarm/swarm-smoke: improve smoke tests (#1337)

swarm/network: remove dead code (#1339)

swarm/network: remove FetchStore and SyncChunkStore in favor of NetStore (#1342)
2019-05-10 12:26:30 +02:00
Janoš Guljaš
996755c4a8 cmd/swarm, swarm: LocalStore storage integration 2019-05-10 12:26:26 +02:00
Janoš Guljaš
3873a7314d swarm/network: fix data races in TestInitialPeersMsg test (#19490)
* swarm/network: fix data races in TestInitialPeersMsg test

* swarm/network: add Kademlia.Saturation method with lock

* swarm/network: add Hive.Peer method to safely retrieve a bzz peer

* swarm/network: remove duplicate comment

* p2p/testing: prevent goroutine leak in ProtocolTester

* swarm/network: fix data race in newBzzBaseTesterWithAddrs

* swarm/network: fix goroutone leaks in testInitialPeersMsg

* swarm/network: raise number of peer check attempts in testInitialPeersMsg

* swarm/network: use Hive.Peer in Hive.PeerInfo function

* swarm/network: reduce the scope of mutex lock in newBzzBaseTesterWithAddrs

* swarm/storage: disable TestCleanIndex with race detector
2019-04-25 21:33:18 +02:00
gluk256
d9403690ec swarm/pss: Fix flaky TestProxNetwork (#19471) 2019-04-19 11:15:17 +02:00
Viktor Trón
0529015091
swarm/network: hive bug: needed shallow peers are not sent to nodes beyond connection's proximity order (#19326)
* swarm/network: fix hive bug not sending shallow peers

-  hive bug: needed shallow peers were not sent to nodes beyond connection's proximity order
- add extensive protocol exchange tests for initial subPeersMsg-peersMsg exchange
- modify bzzProtocolTester to allow pregenerated overlay addresses

* swarm/network: attempt to fix hive persistance test

* swarm/network: fix TestHiveStatePersistance (#1320)

* swarm/network: remove trace lines from the hive persistance test

* address PR review comments

* swarm/network: address PR comments on TestInitialPeersMsg

 * eliminate *testing.T argument from bzz/hive protocoltesters
 * add sorting (only runs in test code) on peersMsg payload
 * add random (0 to MaxPeersPerPO) peers for each po
 * add extra peers closer to pivot than control
2019-04-02 09:15:16 +02:00
lash
2f5b6cb442 swarm/network: Use different privatekey for bzz overlay in sim (#19313)
* cmd/swarm, p2p, swarm: Enable ENR in binary/execadapter

* cmd/p2p/swarm: Remove comments + config.Enode nomarshal

* p2p/simulations: Remove superfluous error check

* p2p/simulation: Move init enode comment

* swarm, p2p/simulations, cmd/swarm: Use nodekey in binary record sign

* swarm/network, swarm/pss: Dervice bzzkey

* swarm/pss: Remove unused function

* swarm/network: Store swarm private key in simulation bucket

* swarm/pss: Shorten TextProxNetwork shortrunning test timeout

* swarm/pss: Increase prox test timeout

* swarm/pss: Increase timeout slightly on shortrunning proxtest

* swarm/network: Simplify bucket instantiation in servicectx func

* p2p/simulations: Tcpport -> udpport

* swarm/network, swarm/pss: Simplify + correct lock in servicefunc sim

* swarm/network: Cleanup after rebase on extract swarm enode new

* p2p/simulations, swarm/network: Make exec disc test pass

* swarm/network: Prune ye olde comment

* swarm/pss: Correct revised bzzkey method call

* swarm/network: Clarify comment about privatekey generation data

* swarm/pss: Fix syntax errors after rebase

* swarm/network: Rename misleadingly named method

(amend commit to trigger ci - attempt 5)
2019-03-22 21:37:25 +01:00
gluk256
8d04154691 p2p/simulations: wait until all connections are recreated when uploading snapshot (#19312)
* swarm/network/simulation: test cases refactored

* swarm/pss: minor refactoring

* swarm/simulation: UploadSnapshot updated

* swarm/network: style fix

* swarm/pss: bugfix
2019-03-22 11:20:17 +01:00
lash
09924cbcaa cmd/swarm, p2p, swarm: Enable ENR in binary/execadapter (#19309)
* cmd/swarm, p2p, swarm: Enable ENR in binary/execadapter

* cmd/p2p/swarm: Remove comments + config.Enode nomarshal

* p2p/simulations: Remove superfluous error check

* p2p/simulation: Move init enode comment

* swarm/api: Check error in config test

* swarm, p2p/simulations, cmd/swarm: Use nodekey in binary record sign

* cmd/swarm: Make nodekey available for swarm api config
2019-03-22 05:55:47 +01:00
Anton Evangelatov
baded64d88
swarm/network: measure time of messages in priority queue (#19250) 2019-03-20 21:30:34 +01:00
gluk256
6e401792ce swarm/pss: negihbourhood addressing simulation tests (#19278)
* swarm/pss: fixed bug in pss.process, test added

* swarm/pss: test case updated

* swarm/pss: WaitTillSnapshotRecreated() func added

* swarm/pss: snapshot test updated

* swarm/pss: WaitTillSnapshotLoaded() fixed

* swarm/pss: gofmt applied

* swarm/pss: refactoring, file renamed

* swarm/pss: input data fixed

* swarm/pss: race condition fixed

* swarm/pss: test timeout increased

* swarm/pss: eliminated the global variables

* swarm/pss: tests added

* swarm/pss: comments added

* swarm/pss: comment fixed

* swarm/pss: refactored according to review

* swarm/pss: style fix

* swarm/pss: increased timeout
2019-03-16 08:39:38 +01:00
lash
4b4f03ca37 swarm, p2p: Prerequities for ENR replacing handshake (#19275)
* swarm/api, swarm/network, p2p/simulations: Prerequisites for handshake remove

* swarm, p2p: Add full sim node configs for protocoltester

* swarm/network: Make stream package pass tests

* swarm/network: Extract peer and addr types out of protocol file

* p2p, swarm: Make p2p/protocols tests pass + rename types.go

* swarm/network: Deactivate ExecAdapter test until binary ENR prep

* swarm/api: Remove comments

* swarm/network: Uncomment bootnode record load
2019-03-15 11:27:17 +01:00
Anton Evangelatov
1a3e25e4c1
swarm: tracing improvements (#19249) 2019-03-11 11:45:34 +01:00
Anton Evangelatov
2cfe0bed9f swarm: fix relationship between spans in open tracing (#19236)
* swarm/network: propagate span with ctx

* swarm/network: try to stop stream.send.request spans on time

* swarm/storage: add chunk ref as a log to netstore.fetcher span
2019-03-08 08:52:25 +01:00
Ferenc Szabo
d45f8d1880 swarm/network: remove *WithServer tests from stream package (#19223)
These tests never run as the build tag excluded them from the CI
execution. As a results the (dead) code got out of sync with other
parts of Swarm and now they would not even compile. => Removed.

resolves ethersphere/go-ethereum#1238
2019-03-07 09:27:56 +01:00
holisticode
a87776a5fe swarm/network/stream: Fix flaky tests in GetSubscriptionsRPC test (#19227)
* swarm/network/stream: fixed timing issues

* swarm/network/stream: only count first iteration of subscriptions

* swarm/network/stream/: fix linter errors
2019-03-07 09:24:28 +01:00
holisticode
81ed700157 Enable longrunning tests to run (#19208)
* p2p/simulations: increased snapshot load timeout for debugging

* swarm/network/stream: less nodes for snapshot longrunning tests

* swarm/network: fixed longrunning tests

* swarm/network/stream: store kademlia in bucket

* swarm/network/stream: disabled healthy check in delivery tests

* swarm/network/stream: longer SyncUpdateDelay for longrunning tests

* swarm/network/stream: more debug output

* swarm/network/stream: reduced longrunning snapshot tests to 64 nodes

* swarm/network/stream: don't WaitTillHealthy in SyncerSimulation

* swarm/network/stream: cleanup for PR
2019-03-05 12:54:46 +01:00
Anton Evangelatov
f9aa1cd21f Revert "swarm/network: Use actual remote peer ip in underlay (#19137)" (#19193)
This reverts commit 460d206f30.
2019-03-02 08:45:07 +01:00
Anton Evangelatov
4e9230ea7a
swarm: enable p2p/discovery and disable dynamic dialling (#19189) 2019-03-01 12:20:37 +01:00
holisticode
994326ba00 swarm: new snapshot files (#19185) 2019-02-28 22:30:36 +01:00
lash
62d9d63858 swarm/network: WIP consider all nodes for healthy iteration (#19155)
* swarm/network: WIP consider all nodes for healthy iteration

* swarm/network/simulation: extend TestWaitTillHealthy to really check kads are healthy

* cmd/swarm/swarm-snapshot: fixed bugs in snapshot creation binary

* swarm/network/simulation: addressed PR comments

* swarm/network/simulation: defer sim.Clsoe()

* swarm/network/simulation: fixed wrong sim.Close()

* swarm/network/simulation: addressed PR comments

* cmd/swarm/swarm-snapshot: reducing default to 8 nodes, more to 4

* cmd/swarm/swarm-snapshot: extended timeout to 3 mins, or 256 nodes snapshot times out

* swarm/network/simulation: More PR comments
2019-02-28 08:12:50 +01:00
Janoš Guljaš
872370e3bc swarm/network/simulation: do not copy node mutex in UploadSnapshot (#19160) 2019-02-25 10:03:31 +01:00
Matthew Halpern
81babe1509 swarm/*: remove redundant type specifiers (#19089) 2019-02-25 08:58:18 +01:00
Matthew Halpern
90b6cdaadf cmd,swarm: enforce camel case variable names (#19060) 2019-02-24 12:39:23 +01:00
Janoš Guljaš
836c846812 swarm/network/master: protect SetNextBatch iterator after close (#19147) 2019-02-21 18:33:49 +01:00
Ferenc Szabo
e38b227ce6 Ci race detector handle failing tests (#19143)
* swarm/storage: increase mget timeout in common_test.go

 TestDbStoreCorrect_1k sometimes timed out with -race on Travis.

--- FAIL: TestDbStoreCorrect_1k (24.63s)
    common_test.go:194: testStore failed: timed out after 10s

* swarm: remove unused vars from TestSnapshotSyncWithServer

nodeCount and chunkCount is returned from setupSim and those values
we use.

* swarm: move race/norace helpers from stream to testutil

As we will need to use the flag in other packages, too.

* swarm: refactor TestSwarmNetwork case

Extract long running test cases for better visibility.

* swarm/network: skip TestSyncingViaGlobalSync with -race

As panics on Travis.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b]

* swarm: run TestSwarmNetwork with fewer nodes with -race

As otherwise we always get test failure with `network_test.go:374:
context deadline exceeded` even with raised `Timeout`.

* swarm/network: run TestDeliveryFromNodes with fewer nodes with -race

Test on Travis times out with 8 or more nodes if -race flag is present.

* swarm/network: smaller node count for discovery tests with -race

TestDiscoveryPersistenceSimulationSimAdapters failed on Travis with
`-race` flag present. The failure was due to extensive memory usage,
coming from the CGO runtime. Using a smaller node count resolves the
issue.

=== RUN   TestDiscoveryPersistenceSimulationSimAdapter
==7227==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
FAIL    github.com/ethereum/go-ethereum/swarm/network/simulations/discovery     804.826s

* swarm/network: run TestFileRetrieval with fewer nodes with -race

Otherwise we get a failure due to extensive memory usage, as the CGO
runtime cannot allocate more bytes.

=== RUN   TestFileRetrieval
==7366==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
FAIL	github.com/ethereum/go-ethereum/swarm/network/stream	155.165s

* swarm/network: run TestRetrieval with fewer nodes with -race

Otherwise we get a failure due to extensive memory usage, as the CGO
runtime cannot allocate more bytes ("ThreadSanitizer failed to
allocate").

* swarm/network: skip flaky TestGetSubscriptionsRPC on Travis w/ -race

Test fails a lot with something like:
 streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20

* swarm/storage: skip TestDB_SubscribePull* tests on Travis w/ -race

Travis just hangs...

ok  	github.com/ethereum/go-ethereum/swarm/storage/feed/lookup	1.307s
keepalive
keepalive
keepalive

or panics after a while.

Without these tests the race detector job is now stable. Let's
invetigate these tests in a separate issue:
https://github.com/ethersphere/go-ethereum/issues/1245
2019-02-20 22:57:42 +01:00
lash
d36e974ba3 swarm/network: Keep span across roundtrip (#19140)
* swarm/newtork: WIP Span request span until delivery and put

* swarm/storage: Introduce new trace across single fetcher lifespan

* swarm/network: Put span ids for sendpriority in context value

* swarm: Add global span store in tracing

* swarm/tracing: Add context key constants

* swarm/tracing: Add comments

* swarm/storage: Remove redundant fix for filestore

* swarm/tracing: Elaborate constants comments

* swarm/network, swarm/storage, swarm:tracing: Minor cleanup
2019-02-20 14:50:37 +01:00
lash
460d206f30 swarm/network: Use actual remote peer ip in underlay (#19137)
* swarm/network: Logline to see handshake addr

* swarm/network: Replace remote ip in handshake uaddr

* swarm/network: Add test for enode uaddr rewrite method

* swarm/network: Remove redundance pointer return from sanitize

* swarm/network: Obeying the linting machine

* swarm/network: Add panic comment

(travis trigger take 1)
2019-02-20 14:46:00 +01:00
Janoš Guljaš
ba2dfa5ce4 swarm/network/stream: fix a goroutine leak in Registry (#19139)
* swarm/network/stream: fix a goroutine leak in Registry

* swarm/network, swamr/network/stream: Kademlia close addr count and depth change chans

* swarm/network/stream: rename close channel to quit

* swarm/network/stream: fix sync between NewRegistry goroutine and Close method
2019-02-20 14:45:25 +01:00
Ferenc Szabo
50b872bf05 p2p, swarm: fix node up races by granular locking (#18976)
* swarm/network: DRY out repeated giga comment

I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.

* p2p/simulations: encapsulate Node.Up field so we avoid data races

The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the  field. Other times the locking was missed and we had
a data race.

For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.

resolves: ethersphere/go-ethereum#1146

* p2p/simulations: fix unmarshal of simulations.Node

Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.

Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177

* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON

* p2p/simulations: revert back to defer Unlock() pattern for Network

It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.

The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.

* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON

* p2p/simulations: remove JSON annotation from private fields of Node

As unexported fields are not serialized.

* p2p/simulations: fix deadlock in Network.GetRandomDownNode()

Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock

On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.

* p2p/simulations: ensure method conformity for Network

Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.

* p2p/simulations: fix deadlock during network shutdown

`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.

`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.

Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.

Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223

* swarm/state: simplify if statement in DBStore.Put()

* p2p/simulations: remove faulty godoc from private function

The comment started with the wrong method name.

The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
2019-02-18 07:38:14 +01:00
holisticode
2af24724dd swarm/network: Saturation check for healthy networks (#19071)
* swarm/network: new saturation for  implementation

* swarm/network: re-added saturation func in Kademlia as it is used elsewhere

* swarm/network: saturation with higher MinBinSize

* swarm/network: PeersPerBin with depth check

* swarm/network: edited tests to pass new saturated check

* swarm/network: minor fix saturated check

* swarm/network/simulations/discovery: fixed renamed RPC call

* swarm/network: renamed to isSaturated and returns bool

* swarm/network: early depth check
2019-02-14 19:01:50 +01:00
Elad
3ee09ba035 swarm/storage/netstore: add fetcher cancellation on shutdown (#19049)
swarm/network/stream: remove netstore internal wg
swarm/network/stream: run individual tests with t.Run
2019-02-14 07:51:57 +01:00
Janoš Guljaš
3fd6db2bf6 swarm: fix network/stream data races (#19051)
* swarm/network/stream: newStreamerTester cleanup only if err is nil

* swarm/network/stream: raise newStreamerTester waitForPeers timeout

* swarm/network/stream: fix data races in GetPeerSubscriptions

* swarm/storage: prevent data race on LDBStore.batchesC

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461775049

* swarm/network/stream: fix TestGetSubscriptionsRPC data race

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461768477

* swarm/network/stream: correctly use Simulation.Run callback

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461783804

* swarm/network: protect addrCountC in Kademlia.AddrCountC function

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462273444

* p2p/simulations: fix a deadlock calling getRandomNode with lock

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462317407

* swarm/network/stream: terminate disconnect goruotines in tests

* swarm/network/stream: reduce memory consumption when testing data races

* swarm/network/stream: add watchDisconnections helper function

* swarm/network/stream: add concurrent counter for tests

* swarm/network/stream: rename race/norace test files and use const

* swarm/network/stream: remove watchSim and its panic

* swarm/network/stream: pass context in watchDisconnections

* swarm/network/stream: add concurrent safe bool for watchDisconnections

* swarm/storage: fix LDBStore.batchesC data race by not closing it
2019-02-13 13:03:23 +01:00
Ferenc Szabo
27e3f96819 swarm: CI race detector test adjustments (#19017) 2019-02-08 17:07:11 +01:00
lash
0c10d37606 swarm/network, swarm/storage: Preserve opentracing contexts (#19022) 2019-02-08 16:57:48 +01:00
holisticode
41597c2856 swarm: Debug API and HasChunks() API endpoint (#18980) 2019-02-07 15:49:19 +01:00
Ferenc Szabo
1c3aa8d9b1 swarm/storage: fix test timeout with -race by increasing mget timeout 2019-02-05 14:34:34 +01:00
Anton Evangelatov
597597e8b2 swarm/network: refactor simulation tests bootstrap (#18975) 2019-02-01 09:58:46 +01:00
holisticode
43e1b7b124 swarm: GetPeerSubscriptions RPC (#18972) 2019-01-30 21:03:08 +01:00
Janoš Guljaš
592bf6a59c swarm: fix flaky delivery tests (#18971) 2019-01-30 14:03:11 +01:00
lash
f9401ae011 swarm/network: Remove extra random peer, connect test sanity, comments (#18964) 2019-01-30 09:49:58 +01:00
Elad
2abeb35d54 p2p/testing, swarm: remove unused testing.T in protocol tester (#18500) 2019-01-24 17:23:34 +01:00
gluk256
ad13d2d407 swarm/version: commit version added (#18510) 2019-01-24 12:35:10 +01:00