Commit Graph

475 Commits

Author SHA1 Message Date
Łukasz Magiera
f90a387f96 sched: Print worker UUIDs in sched-diag correctly 2020-10-30 18:32:16 +01:00
Łukasz Magiera
774e2ecebf sched: Fix worker reenabling 2020-10-30 18:01:37 +01:00
Łukasz Magiera
7fbb868513 Debug flag to force running sealing scheduler 2020-10-30 11:07:35 +01:00
Łukasz Magiera
c6b03ce62b sectorstorage: Missing unlock in waitWork 2020-10-29 15:18:51 +01:00
Łukasz Magiera
ea5bb5cdab sectorstorage: Fix manager restart edge-case 2020-10-29 12:14:21 +01:00
Łukasz Magiera
f0f75e2d2c
Merge pull request #4627 from karalabe/fix-gpu-usage-tracking
extern/sector-storage: fix GPU usage overwrite bug
2020-10-29 10:13:54 +01:00
Péter Szilágyi
5f657b4333
extern/sector-storage: fix GPU usage overwrite bug 2020-10-28 20:52:33 +02:00
Łukasz Magiera
da7ecc1527 Fix flaky sealing manager tests 2020-10-28 16:15:17 +01:00
Łukasz Magiera
4100f6eead fix TestWDPostDoPost 2020-10-28 15:23:21 +01:00
Łukasz Magiera
ed2f81da2f sched: Fix tests 2020-10-28 14:34:28 +01:00
Łukasz Magiera
4cf00b8b42 worker_local: address review 2020-10-28 14:29:17 +01:00
Łukasz Magiera
96c5ff7e7f sched: use more letters for variables 2020-10-28 14:23:38 +01:00
Łukasz Magiera
8731fe9112 sched: split worker handling into more funcs 2020-10-28 14:14:50 +01:00
Łukasz Magiera
84b567c790 sched: move worker funcs to a separate file 2020-10-28 13:39:28 +01:00
Łukasz Magiera
660236b224 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-10-23 23:25:35 +02:00
Łukasz Magiera
29e334de54
Merge pull request #4511 from filecoin-project/steb/generalize-window-post
Manage sectors by size instead of proof type.
2020-10-22 21:27:48 +02:00
Steven Allen
4e730b5ec8 port to v2 imports 2020-10-21 12:16:23 -07:00
Steven Allen
00dcb1bce9 Manage sectors by size instead of proof type.
* We may have multiple sectors with the same size and different proof types, but all these management functions stay the same.
* This simplifies PoSt logic.
2020-10-20 18:30:56 -07:00
Łukasz Magiera
8c86ea6b75 localworker: Try very hard to get results to manager 2020-10-18 19:45:11 +02:00
Łukasz Magiera
dbb421c4f7 localworker: Use better context for calling returnFunc 2020-10-18 19:32:43 +02:00
Łukasz Magiera
8d06cca073 sched: Handle workers using sessions instead of connections 2020-10-18 12:36:06 +02:00
Łukasz Magiera
71b3b9075d Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-10-13 21:33:21 +02:00
Steven Allen
dc4e73c737 Test the tape upgrade 2020-10-12 00:01:25 -07:00
Steven Allen
83dfc460d4 fix race in unseal
1. Remove an invalid error check.
2. Make sure to shadow the outer error type from within the goroutine instead of
reading the outer type.

This may have been causing test issues (caught in TestMinerAllInfo with the race
detector).
2020-10-09 15:39:41 -07:00
Łukasz Magiera
0de3051821 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-10-08 13:10:41 +02:00
Łukasz Magiera
cfd126ee9f
Merge pull request #3961 from filecoin-project/docs/miner-storage
lotus-miner: add more help text to storage / attach
2020-10-07 14:55:42 +02:00
Łukasz Magiera
1fc23fb466 lotus-miner: Cleanup storage attach helptext a bit 2020-10-03 11:30:22 +02:00
Łukasz Magiera
5932f28519 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-10-01 02:39:48 +02:00
Łukasz Magiera
6981f776f4 Lower PC2 memory requirements 2020-10-01 00:54:53 +02:00
Łukasz Magiera
1b7cdb9341 Fix storage manager tests 2020-10-01 00:54:34 +02:00
Łukasz Magiera
5e08d56630 sched: Allow some single-thread tasks to run in parallel with PC2/C2 2020-10-01 00:28:44 +02:00
Łukasz Magiera
79d2ddf24f Review 2020-09-30 21:18:12 +02:00
Łukasz Magiera
2cfe22d4e5 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-09-30 20:48:16 +02:00
Łukasz Magiera
e3ee4e4718 Fix lint errors 2020-09-30 20:24:03 +02:00
Łukasz Magiera
2d16af6ee6 sectorstorage: Fix TestRedoPC1 2020-09-30 19:18:38 +02:00
Łukasz Magiera
c228598098 sectorstorage: Variable scopes are really hard 2020-09-30 18:16:07 +02:00
Łukasz Magiera
54fdd6ba5a sectorstorage: Variable scopes are hard 2020-09-30 17:48:48 +02:00
Łukasz Magiera
6855284d88 sectorstorage: Cancel non-running work in case of abort in sched 2020-09-30 17:26:09 +02:00
Łukasz Magiera
bc85e3ce50
Merge pull request #4107 from shaodan/worker-no-swap
Add --no-swap flag for worker
2020-09-30 09:25:57 +02:00
Dan Shao
1affd498c1 Add --no-swap flag for worker 2020-09-30 14:23:35 +08:00
Łukasz Magiera
baef3c8dd2 sectorstorage: Fix potential panic in FinalizeSector 2020-09-29 15:22:46 +02:00
Łukasz Magiera
0f2dcf28b1 fsm: Reuse tickets in PC1 on retry 2020-09-29 10:07:49 +02:00
Łukasz Magiera
1e6a69f8aa localworker: Don't mark calls as returned when returning fails 2020-09-28 22:10:02 +02:00
Łukasz Magiera
9bd2537971 stores: Fix error printing in http handler 2020-09-28 22:06:03 +02:00
Łukasz Magiera
810c767200 worker: Redeclare storage on reconnect 2020-09-28 21:06:49 +02:00
Łukasz Magiera
4ba7af6061 worker: Mark return methods as retry-safe 2020-09-28 20:46:44 +02:00
Łukasz Magiera
9e7d6823b1 sectorstorage: Cleanup callToWork mapping after work is done 2020-09-28 13:34:45 +02:00
zgfzgf
1a7aea1906 modify error 2020-09-25 22:59:21 +08:00
zgfzgf
3207bc4704 optimize trySched 2020-09-25 22:41:29 +08:00
zgfzgf
60e950015c modify for unsafe 2020-09-25 22:13:27 +08:00
Łukasz Magiera
04ee53e061 sectorstorage: Show task type of ret-wait jobs 2020-09-24 11:55:11 +02:00
Łukasz Magiera
d817dceb05 Show lost calls in sealing jobs cli 2020-09-23 19:26:35 +02:00
Łukasz Magiera
c17f0d7e61 sectorstorage: Fix panic in returnResult 2020-09-23 17:37:05 +02:00
Łukasz Magiera
86c222ab58 sectorstorage: fix work tracking 2020-09-23 14:56:50 +02:00
Łukasz Magiera
ce6b92484f Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-09-23 11:31:21 +02:00
Łukasz Magiera
6185e157e9 sectorstorage: calltracker: work around cbor-gen bytearray len limit 2020-09-23 00:29:10 +02:00
Łukasz Magiera
04ad1791b0 localworker: Fix contexts 2020-09-23 00:10:36 +02:00
Łukasz Magiera
bb5cc06677 Fix workid param hash 2020-09-22 23:33:13 +02:00
Travis Person
c66f087f4c lotus-miner: add more help text to storage / attach 2020-09-22 18:15:42 +00:00
Łukasz Magiera
706f4f2ef5 worker: Don't die with the connection 2020-09-22 18:36:44 +02:00
Łukasz Magiera
b8865fb182 workers: Mark on-restart-failed returned tasks as returned 2020-09-22 01:00:28 +02:00
Łukasz Magiera
03c3d8bdb3 workers: Return unfinished tasks on restart 2020-09-22 00:52:33 +02:00
Łukasz Magiera
70faa36b7f Merge remote-tracking branch 'origin/master' into refactor/net-upgrade 2020-09-18 19:29:06 +02:00
Łukasz Magiera
e632643801 api: Test return types 2020-09-17 12:24:50 +02:00
Łukasz Magiera
17680fff55 gofmt 2020-09-17 00:35:57 +02:00
Łukasz Magiera
d9d644b27f sectorstorage: handle restarting manager, test that 2020-09-17 00:35:09 +02:00
Łukasz Magiera
5e09581256 sectorstorage: get new work tracker to run 2020-09-16 22:33:58 +02:00
Łukasz Magiera
b1361aaf8b sectorstorage: wip manager work tracker 2020-09-16 17:08:05 +02:00
Łukasz Magiera
03cf6cca40 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-09-15 17:47:03 +02:00
hannahhoward
7dc091052a feat(manager): less restrictive storage lock
Use initial less restrictive storage lock when trying to read unsealed data before acquiring more
restrictive lock needed for unsealing
2020-09-14 18:48:14 -07:00
Łukasz Magiera
e9d25e5919 More fixes 2020-09-14 20:28:47 +02:00
Łukasz Magiera
381a6cdfac Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-09-14 19:11:50 +02:00
Łukasz Magiera
1ebca8f732 more working code 2020-09-14 19:09:01 +02:00
Łukasz Magiera
bbac86f745 gofmt, mod tidy 2020-09-10 22:07:20 +02:00
Łukasz Magiera
c7b0241a48 ffiwrapper: Test skipping corrupted sectors in PoSt 2020-09-10 21:19:26 +02:00
Łukasz Magiera
5f08fe7ead Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-09-10 17:30:54 +02:00
Łukasz Magiera
5e7737f55d wdpost: Handle skipped sectors correctly 2020-09-10 02:59:37 +02:00
whyrusleeping
7a6ceebb34 windowed post generation now returns faulty sectors 2020-09-09 14:00:15 -07:00
Dirk McCormick
17c15a74a2 fix: return true from Sealer.ReadPiece() on success 2020-09-08 13:50:56 +02:00
Dirk McCormick
8bbdf2e7cb fix: storage manager - bail out on undefined unsealed cid 2020-09-08 12:54:01 +02:00
Aayush Rajasekaran
d678fe4bfa Fix tests 2020-09-07 15:48:42 -04:00
Aayush Rajasekaran
39755a294a Update to specs v0.9.6 2020-09-07 15:48:41 -04:00
Łukasz Magiera
47c59afea0
Revert "storage manager: bail out with an error if unsealed cid is undefined" 2020-09-07 20:12:29 +02:00
Łukasz Magiera
231a9e4051 Fix sealing sched tests 2020-09-07 17:55:31 +02:00
Dirk McCormick
a97f978cad fix: storage manager - don't fail on successful read piece 2020-09-07 16:14:19 +02:00
Łukasz Magiera
9e6f974f3c storage: Fix build 2020-09-07 16:12:55 +02:00
Dirk McCormick
07a4553e6e fix: storage manager - bail out with an error if unsealed cid is undefined 2020-09-07 16:04:12 +02:00
Łukasz Magiera
06e3852cef storage: Integrate async workers in sealing manager 2020-09-07 12:20:50 +02:00
Łukasz Magiera
5d73943929 storage: Fix import cycle 2020-09-06 18:54:00 +02:00
Łukasz Magiera
159ce13f5e Async worker API 2020-09-06 18:47:16 +02:00
Łukasz Magiera
b774563ec3
Merge pull request #3492 from filecoin-project/fix/readpiece-panic
ffiwrapper: Fix ReadPiece panic
2020-09-02 18:59:22 +02:00
Łukasz Magiera
ca7aa69597 ffiwrapper: More correct error check on openPartialFile 2020-09-02 18:45:07 +02:00
Łukasz Magiera
5a2b439773 sched: Fix tests 2020-09-02 17:37:19 +02:00
Łukasz Magiera
7fe8580da5 sealing sched: Fix deadlock between worker.wndLk / workersLk 2020-09-02 17:06:48 +02:00
Star.LI
82f5984de1 fix crash - segmentation fault when partialFile.Allocated() is invoked.
When openPartialFile is invoked, more errors than "existed error" are
returned. If only existing error is checked, the allocated field of
partialFile may be nil.

Signed-off-by: Star.LI <star@trapdoortech.com>
2020-09-01 11:41:26 +02:00
Łukasz Magiera
e14c80360d sealing sched: Factor worker queues into utilization calc 2020-08-31 13:41:34 +02:00
Łukasz Magiera
98d51d3d80 storage: Correctly move unsealed sectors in FinalizeSector 2020-08-31 12:45:57 +02:00
Łukasz Magiera
28ac2fce61 sched: Fix panic in workerCompactWindows 2020-08-29 06:41:19 +02:00
Łukasz Magiera
9d0c8ae3dd sectorstorage: update sched tests for new logic 2020-08-28 21:38:21 +02:00
Łukasz Magiera
4a75e1e4b4 sectorstorage: Don't require tasks within a window to run in order 2020-08-28 19:38:55 +02:00
Łukasz Magiera
11b11e416b sectorstorage: Compact assigned windows 2020-08-28 18:26:38 +02:00
Łukasz Magiera
5ee85dc263 sectorstorage: Fix tests 2020-08-28 16:33:41 +02:00
Łukasz Magiera
6d1682a27e storagefsm: wire up RecoverDealIDs fully 2020-08-28 11:44:15 +02:00
Łukasz Magiera
1097d29213 sealing sched: Call trySched less when there are many tasks 2020-08-28 00:03:42 +02:00
Łukasz Magiera
59d2034cbb sealing sched: Wait a bit for tasks to come in on restart 2020-08-27 23:58:37 +02:00
Łukasz Magiera
7fdffc0340 sealing sched: Give more priority to tasks (re)moving data 2020-08-27 23:29:39 +02:00
Łukasz Magiera
f2bd680cc5 gofmt 2020-08-27 23:14:46 +02:00
Łukasz Magiera
59f554b658 sealing sched: Show waiting tasks assigned to workers in sealing jobs cli 2020-08-27 23:14:33 +02:00
Steven Allen
0155a31d1f Fix PoSt with bad sectors
"skipped" sectors must be replaced with a substitute "good" sector, or the
entire partition must be skipped. They should not just be omitted.

This patch also fixes the test to verify the _entire_ proof instead of just
verifying that the proof includes the correct sectors.
2020-08-26 09:56:51 -07:00
Łukasz Magiera
d9796cd25c sectorstorage: Make trySched less very slow 2020-08-24 19:16:16 +02:00
Łukasz Magiera
4311c96a44
Merge pull request #3225 from filecoin-project/fix/sched-missing-worker
check that worker referenced by task is actually still there.
2020-08-22 22:16:32 +02:00
whyrusleeping
54862be3ff check that worker referenced by task is actually still there. 2020-08-21 10:33:36 -07:00
Steven Allen
5733c71c50 Lint everything
We were ignoring quite a few error cases, and had one case where we weren't
actually updating state where we wanted to. Unfortunately, if the linter doesn't
pass, nobody has any reason to actually check lint failures in CI.

There are three remaining XXXs marked in the code for lint.
2020-08-20 20:46:36 -07:00
Łukasz Magiera
6ef7a30b19
Merge pull request #3089 from filecoin-project/integrate/storage-fsm
integrate extern/{storage-fsm,sector-storage} into lotus source tree
2020-08-17 18:37:54 +02:00
Raúl Kripalani
1f016262cd readd int64 cast + nolint directive. 2020-08-17 17:26:07 +01:00
Raúl Kripalani
efdc428d5d keep storage-fsm (renamed to storage-sealing) and sector-storage in extern. 2020-08-17 14:26:18 +01:00
yaohcn
39bcfab37c make addpiece configurable 2020-08-17 17:39:50 +08:00
Raúl Kripalani
3c17cd655e integrate extern/sector-storage into lotus proper. 2020-08-16 11:09:58 +01:00
Łukasz Magiera
632fd36205 sealing sched: Fix deadlock in worker watcher 2020-08-13 12:17:24 +02:00
yaohcn
1555984785 change to RLock 2020-08-13 17:31:18 +08:00
Steven Allen
3ef3f570fb Fix lint errors and broken tests 2020-08-12 23:16:44 +02:00
Steven Allen
9135a5d048 Pass bitfields by-value
This ensures we can't end up decoding nil bitfields from clients when not
expecting them.

Part of https://github.com/filecoin-project/specs-actors/issues/895. Please see
this issue for details and leave any comments there.
2020-08-12 10:32:39 -07:00
yaohcn
6b0f607f4b add space check in StorageFindSector 2020-08-11 15:27:03 +08:00
Łukasz Magiera
85d8133f4a Fix gomods 2020-08-10 17:31:14 +02:00
Łukasz Magiera
0eaf44eb31 Merge sector-storage subtree 2020-08-10 17:25:46 +02:00