Commit Graph

329 Commits

Author SHA1 Message Date
Janoš Guljaš
02c28046a0 swarm: Fix localstore test deadlock with race detector (#19153)
* swarm/storage/localstore: close localstore in two tests

* swarm/storage/localstore: fix a possible deadlock in tests

* swarm/storage/localstore: re-enable pull subs tests for travis race

* swarm/storage/localstore: stop sending to errChan on context done in tests

* swarm/storage/localstore: better want check in readPullSubscriptionBin

* swarm/storage/localstore: protect chunk put with addr lock in tests

* swamr/storage/localstore: wait for gc and writeGCSize workers on Close

* swarm/storage/localstore: more correct testDB_collectGarbageWorker

* swarm/storage/localstore: set DB Close timeout to 5s
2019-02-22 23:19:09 +01:00
Janoš Guljaš
c8da76e63d swarm/shed: fix a deadlock in meter function (#19149) 2019-02-21 20:42:53 +01:00
Janoš Guljaš
836c846812 swarm/network/master: protect SetNextBatch iterator after close (#19147) 2019-02-21 18:33:49 +01:00
Péter Szilágyi
b9808e392f
swarm/version: bump to v0.3.12 unstable 2019-02-21 15:25:42 +02:00
Matthew Halpern
fbedf62f3d swarm/storage: fix loop bound for database cleanup (#19085)
The current loop continuation condition is always true as a uint8
is always being checked whether it is less than 255 (its maximum
value). Since the loop starts with the value 1, the loop termination
can be guarranteed to exit once the value overflows to 0.
2019-02-21 06:37:32 +01:00
Ferenc Szabo
e38b227ce6 Ci race detector handle failing tests (#19143)
* swarm/storage: increase mget timeout in common_test.go

 TestDbStoreCorrect_1k sometimes timed out with -race on Travis.

--- FAIL: TestDbStoreCorrect_1k (24.63s)
    common_test.go:194: testStore failed: timed out after 10s

* swarm: remove unused vars from TestSnapshotSyncWithServer

nodeCount and chunkCount is returned from setupSim and those values
we use.

* swarm: move race/norace helpers from stream to testutil

As we will need to use the flag in other packages, too.

* swarm: refactor TestSwarmNetwork case

Extract long running test cases for better visibility.

* swarm/network: skip TestSyncingViaGlobalSync with -race

As panics on Travis.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e351b]

* swarm: run TestSwarmNetwork with fewer nodes with -race

As otherwise we always get test failure with `network_test.go:374:
context deadline exceeded` even with raised `Timeout`.

* swarm/network: run TestDeliveryFromNodes with fewer nodes with -race

Test on Travis times out with 8 or more nodes if -race flag is present.

* swarm/network: smaller node count for discovery tests with -race

TestDiscoveryPersistenceSimulationSimAdapters failed on Travis with
`-race` flag present. The failure was due to extensive memory usage,
coming from the CGO runtime. Using a smaller node count resolves the
issue.

=== RUN   TestDiscoveryPersistenceSimulationSimAdapter
==7227==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
FAIL    github.com/ethereum/go-ethereum/swarm/network/simulations/discovery     804.826s

* swarm/network: run TestFileRetrieval with fewer nodes with -race

Otherwise we get a failure due to extensive memory usage, as the CGO
runtime cannot allocate more bytes.

=== RUN   TestFileRetrieval
==7366==ERROR: ThreadSanitizer failed to allocate 0x80000 (524288) bytes of clock allocator (error code: 12)
FATAL: ThreadSanitizer CHECK failed: ./gotsan.cc:6976 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
FAIL	github.com/ethereum/go-ethereum/swarm/network/stream	155.165s

* swarm/network: run TestRetrieval with fewer nodes with -race

Otherwise we get a failure due to extensive memory usage, as the CGO
runtime cannot allocate more bytes ("ThreadSanitizer failed to
allocate").

* swarm/network: skip flaky TestGetSubscriptionsRPC on Travis w/ -race

Test fails a lot with something like:
 streamer_test.go:1332: Real subscriptions and expected amount don't match; real: 0, expected: 20

* swarm/storage: skip TestDB_SubscribePull* tests on Travis w/ -race

Travis just hangs...

ok  	github.com/ethereum/go-ethereum/swarm/storage/feed/lookup	1.307s
keepalive
keepalive
keepalive

or panics after a while.

Without these tests the race detector job is now stable. Let's
invetigate these tests in a separate issue:
https://github.com/ethersphere/go-ethereum/issues/1245
2019-02-20 22:57:42 +01:00
lash
d36e974ba3 swarm/network: Keep span across roundtrip (#19140)
* swarm/newtork: WIP Span request span until delivery and put

* swarm/storage: Introduce new trace across single fetcher lifespan

* swarm/network: Put span ids for sendpriority in context value

* swarm: Add global span store in tracing

* swarm/tracing: Add context key constants

* swarm/tracing: Add comments

* swarm/storage: Remove redundant fix for filestore

* swarm/tracing: Elaborate constants comments

* swarm/network, swarm/storage, swarm:tracing: Minor cleanup
2019-02-20 14:50:37 +01:00
lash
460d206f30 swarm/network: Use actual remote peer ip in underlay (#19137)
* swarm/network: Logline to see handshake addr

* swarm/network: Replace remote ip in handshake uaddr

* swarm/network: Add test for enode uaddr rewrite method

* swarm/network: Remove redundance pointer return from sanitize

* swarm/network: Obeying the linting machine

* swarm/network: Add panic comment

(travis trigger take 1)
2019-02-20 14:46:00 +01:00
Janoš Guljaš
ba2dfa5ce4 swarm/network/stream: fix a goroutine leak in Registry (#19139)
* swarm/network/stream: fix a goroutine leak in Registry

* swarm/network, swamr/network/stream: Kademlia close addr count and depth change chans

* swarm/network/stream: rename close channel to quit

* swarm/network/stream: fix sync between NewRegistry goroutine and Close method
2019-02-20 14:45:25 +01:00
lash
d88c6ce6b0 swarm: Reinstate Pss Protocol add call through swarm service (#19117)
* swarm: Reinstate Pss Protocol add call through swarm service

* swarm: Even less self
2019-02-18 16:44:50 +01:00
Ferenc Szabo
50b872bf05 p2p, swarm: fix node up races by granular locking (#18976)
* swarm/network: DRY out repeated giga comment

I not necessarily agree with the way we wait for event propagation.
But I truly disagree with having duplicated giga comments.

* p2p/simulations: encapsulate Node.Up field so we avoid data races

The Node.Up field was accessed concurrently without "proper" locking.
There was a lock on Network and that was used sometimes to access
the  field. Other times the locking was missed and we had
a data race.

For example: https://github.com/ethereum/go-ethereum/pull/18464
The case above was solved, but there were still intermittent/hard to
reproduce races. So let's solve the issue permanently.

resolves: ethersphere/go-ethereum#1146

* p2p/simulations: fix unmarshal of simulations.Node

Making Node.Up field private in 13292ee897e345045fbfab3bda23a77589a271c1
broke TestHTTPNetwork and TestHTTPSnapshot. Because the default
UnmarshalJSON does not handle unexported fields.

Important: The fix is partial and not proper to my taste. But I cut
scope as I think the fix may require a change to the current
serialization format. New ticket:
https://github.com/ethersphere/go-ethereum/issues/1177

* p2p/simulations: Add a sanity test case for Node.Config UnmarshalJSON

* p2p/simulations: revert back to defer Unlock() pattern for Network

It's a good patten to call `defer Unlock()` right after `Lock()` so
(new) error cases won't miss to unlock. Let's get back to that pattern.

The patten was abandoned in 85a79b3ad3,
while fixing a data race. That data race does not exist anymore,
since the Node.Up field got hidden behind its own lock.

* p2p/simulations: consistent naming for test providers Node.UnmarshalJSON

* p2p/simulations: remove JSON annotation from private fields of Node

As unexported fields are not serialized.

* p2p/simulations: fix deadlock in Network.GetRandomDownNode()

Problem: GetRandomDownNode() locks -> getDownNodeIDs() ->
GetNodes() tries to lock -> deadlock

On Network type, unexported functions must assume that `net.lock`
is already acquired and should not call exported functions which
might try to lock again.

* p2p/simulations: ensure method conformity for Network

Connect* methods were moved to p2p/simulations.Network from
swarm/network/simulation. However these new methods did not follow
the pattern of Network methods, i.e., all exported method locks
the whole Network either for read or write.

* p2p/simulations: fix deadlock during network shutdown

`TestDiscoveryPersistenceSimulationSimAdapter` often got into deadlock.
The execution was stuck on two locks, i.e, `Kademlia.lock` and
`p2p/simulations.Network.lock`. Usually the test got stuck once in each
20 executions with high confidence.

`Kademlia` was stuck in `Kademlia.EachAddr()` and `Network` in
`Network.Stop()`.

Solution: in `Network.Stop()` `net.lock` must be released before
calling `node.Stop()` as stopping a node (somehow - I did not find
the exact code path) causes `Network.InitConn()` to be called from
`Kademlia.SuggestPeer()` and that blocks on `net.lock`.

Related ticket: https://github.com/ethersphere/go-ethereum/issues/1223

* swarm/state: simplify if statement in DBStore.Put()

* p2p/simulations: remove faulty godoc from private function

The comment started with the wrong method name.

The method is simple and self explanatory. Also, it's private.
=> Let's just remove the comment.
2019-02-18 07:38:14 +01:00
gluk256
12ca3b172a swarm/pss: refactoring (#19110)
* swarm/pss: split pss and keystore

* swarm/pss: moved whisper to keystore

* swarm/pss: goimports fixed
2019-02-17 06:29:41 +01:00
Elad
5b8ae7885e swarm/storage: fix influxdb gc metrics report (#19102) 2019-02-15 07:41:42 +01:00
holisticode
2af24724dd swarm/network: Saturation check for healthy networks (#19071)
* swarm/network: new saturation for  implementation

* swarm/network: re-added saturation func in Kademlia as it is used elsewhere

* swarm/network: saturation with higher MinBinSize

* swarm/network: PeersPerBin with depth check

* swarm/network: edited tests to pass new saturated check

* swarm/network: minor fix saturated check

* swarm/network/simulations/discovery: fixed renamed RPC call

* swarm/network: renamed to isSaturated and returns bool

* swarm/network: early depth check
2019-02-14 19:01:50 +01:00
Elad
3ee09ba035 swarm/storage/netstore: add fetcher cancellation on shutdown (#19049)
swarm/network/stream: remove netstore internal wg
swarm/network/stream: run individual tests with t.Run
2019-02-14 07:51:57 +01:00
Janoš Guljaš
3fd6db2bf6 swarm: fix network/stream data races (#19051)
* swarm/network/stream: newStreamerTester cleanup only if err is nil

* swarm/network/stream: raise newStreamerTester waitForPeers timeout

* swarm/network/stream: fix data races in GetPeerSubscriptions

* swarm/storage: prevent data race on LDBStore.batchesC

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461775049

* swarm/network/stream: fix TestGetSubscriptionsRPC data race

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461768477

* swarm/network/stream: correctly use Simulation.Run callback

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-461783804

* swarm/network: protect addrCountC in Kademlia.AddrCountC function

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462273444

* p2p/simulations: fix a deadlock calling getRandomNode with lock

https://github.com/ethersphere/go-ethereum/issues/1198#issuecomment-462317407

* swarm/network/stream: terminate disconnect goruotines in tests

* swarm/network/stream: reduce memory consumption when testing data races

* swarm/network/stream: add watchDisconnections helper function

* swarm/network/stream: add concurrent counter for tests

* swarm/network/stream: rename race/norace test files and use const

* swarm/network/stream: remove watchSim and its panic

* swarm/network/stream: pass context in watchDisconnections

* swarm/network/stream: add concurrent safe bool for watchDisconnections

* swarm/storage: fix LDBStore.batchesC data race by not closing it
2019-02-13 13:03:23 +01:00
Elad
d596bea2d5 swarm: fix uptime gauge update goroutine leak by introducing cleanup functions (#19040) 2019-02-13 08:15:03 +01:00
holisticode
3d22a46c94 swarm/storage: fix HashExplore concurrency bug ethersphere#1211 (#19028)
* swarm/storage: fix HashExplore concurrency bug ethersphere#1211

*  swarm/storage: lock as value not pointer

* swarm/storage: wait for  to complete

* swarm/storage: fix linter problems

* swarm/storage: append to nil slice
2019-02-13 00:17:44 +01:00
gluk256
b30109df3c swarm/pss: mutex lifecycle fixed (#19045) 2019-02-13 00:12:41 +01:00
Rafael Matias
6cb7d52a29 swarm/docker: add global-store and split docker images (#19038) 2019-02-12 08:34:08 +01:00
Ferenc Szabo
27e3f96819 swarm: CI race detector test adjustments (#19017) 2019-02-08 17:07:11 +01:00
gluk256
cde02e017e swarm/pss: transition to whisper v6 (#19023) 2019-02-08 17:05:10 +01:00
lash
0c10d37606 swarm/network, swarm/storage: Preserve opentracing contexts (#19022) 2019-02-08 16:57:48 +01:00
Janoš Guljaš
4f3d22f06c swarm/storage/localstore: new localstore package (#19015) 2019-02-07 18:40:26 +01:00
holisticode
41597c2856 swarm: Debug API and HasChunks() API endpoint (#18980) 2019-02-07 15:49:19 +01:00
Janoš Guljaš
33d0a0efa6 cmd/swarm/global-store: global store cmd (#19014) 2019-02-07 15:46:58 +01:00
holisticode
7f55b0cbd8 cmd/swarm: hashes command (#19008) 2019-02-07 13:51:24 +01:00
Kiel barry
53b823afc8
contracts/*: golint updates for this or self warning 2019-02-07 13:15:14 +02:00
holisticode
3eff652a7b swarm/storage: Get all chunk references for a given file (#19002) 2019-02-06 12:16:43 +01:00
lash
7c60d0a6a2 swarm/pss: Remove pss service leak in test (#18992) 2019-02-05 14:35:20 +01:00
Ferenc Szabo
1c3aa8d9b1 swarm/storage: fix test timeout with -race by increasing mget timeout 2019-02-05 14:34:34 +01:00
Anton Evangelatov
597597e8b2 swarm/network: refactor simulation tests bootstrap (#18975) 2019-02-01 09:58:46 +01:00
holisticode
43e1b7b124 swarm: GetPeerSubscriptions RPC (#18972) 2019-01-30 21:03:08 +01:00
Janoš Guljaš
592bf6a59c swarm: fix flaky delivery tests (#18971) 2019-01-30 14:03:11 +01:00
lash
f9401ae011 swarm/network: Remove extra random peer, connect test sanity, comments (#18964) 2019-01-30 09:49:58 +01:00
Felix Lange
f4094d09cd params, swarm/version: Geth v1.9.0 unstable, Swarm v0.3.11-unstable 2019-01-29 17:43:13 +01:00
Anton Evangelatov
21acf0bc8d cmd/utils: allow for multiple influxdb tags (#18520)
This PR is replacing the metrics.influxdb.host.tag cmd-line flag with metrics.influxdb.tags - a comma-separated key/value tags, that are passed to the InfluxDB reporter, so that we can index measurements with multiple tags, and not just one host tag.

This will be useful for Swarm, where we want to index measurements not just with the host tag, but also with bzzkey and git commit version (for long-running deployments).
2019-01-29 09:14:24 +01:00
Janoš Guljaš
104e6b2050 swarm/pss/notify: shutdown net in TestStart to fix OOM issue (#18953) 2019-01-28 16:08:33 +01:00
Ferenc Szabo
2209fede4e swarm/pss: fix data race on topicHandlerCaps map (#18523) 2019-01-25 20:18:28 +01:00
Jerzy Lasyk
f28da4f602 swarm/metrics: Send the accounting registry to InfluxDB (#18470) 2019-01-24 18:57:20 +01:00
Elad
2abeb35d54 p2p/testing, swarm: remove unused testing.T in protocol tester (#18500) 2019-01-24 17:23:34 +01:00
Ferenc Szabo
6167dd65b5 swarm/pss: fix data race in notify_test.go (TestStart) (#18518) 2019-01-24 17:07:43 +01:00
gluk256
ad13d2d407 swarm/version: commit version added (#18510) 2019-01-24 12:35:10 +01:00
Ferenc Szabo
3591fc603f swarm/storage: Fix race in TestLDBStoreCollectGarbage. Disable testLDBStoreRemoveThenCollectGarbage (#18512) 2019-01-24 12:34:12 +01:00
Janoš Guljaš
fa34429a26 swarm: fix a data race on startTime (#18511) 2019-01-24 12:02:47 +01:00
Anton Evangelatov
bbd120354a
swarm: bootnode-mode, new bootnodes and no p2p package discovery (#18498) 2019-01-24 12:02:18 +01:00
gluk256
105008b6a1 swarm/pss: fixing race condition (#18487) 2019-01-21 15:22:51 +01:00
Viktor Trón
15b9b39e6c
swarm/network: unskip tests previously skipped due to suggestPeer issues (#18477) 2019-01-19 08:12:57 +01:00
Ferenc Szabo
19bfcbf911 swarm/network: fix data race in fetcher_test.go (#18469) 2019-01-17 16:45:36 +01:00
Ferenc Szabo
4f8ec44565 swarm/network: fix data race in stream.(*Peer).handleOfferedHashesMsg() (#18468)
* swarm/network: fix data race in stream.(*Peer).handleOfferedHashesMsg()

handleOfferedHashesMsg() contained a data race:
- read => in a goroutine, call to c.batchDone()
- write => in the main thread, write to c.sessionAt

c.batchDone() contained a call to c.AddInterval(). Client was a value
receiver for AddInterval. So on c.AddInterval() call the whole client
struct got copied (read) while one of its field was modified in
handleOfferedHashesMsg() (write).

fixes ethersphere/go-ethereum#1086

* swarm/network: simplify some trivial statements
2019-01-17 14:44:29 +01:00