Commit Graph

53 Commits

Author SHA1 Message Date
Łukasz Magiera
6ddbe41376 Merge remote-tracking branch 'origin/master' into feat/post-worker 2022-03-18 10:54:44 +01:00
Łukasz Magiera
c4259cb594 worker: RemoveCopies expects one type at a time 2022-03-16 12:28:56 +01:00
Łukasz Magiera
a88edeb79d worker: Call RemoveCopies in MoveStorage 2022-03-16 12:28:56 +01:00
Łukasz Magiera
45b07674e5 stores: http: Support multiple storage IDs in ?keep 2022-03-16 12:28:56 +01:00
Łukasz Magiera
046a9f8af0 Merge remote-tracking branch 'origin/master' into feat/post-worker 2022-03-09 16:27:03 +01:00
Łukasz Magiera
a6892f956e
Merge pull request #7844 from llifezou/add_workerName_in_sealing_err
feat: #6147: Include worker name in sealing errors
2022-03-02 13:13:34 +00:00
llifezou
dac5518005
Update extern/sector-storage/worker_local.go
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
2022-02-25 11:12:18 +08:00
Łukasz Magiera
09cfad9d71 Add FinalizeReplicaUpdate into some more places 2022-02-08 17:22:41 +01:00
Łukasz Magiera
142ba6660a wip FinalizeReplicaUpdate 2022-02-08 17:22:41 +01:00
Łukasz Magiera
efdb854a7c fix some races 2022-01-31 20:53:25 +00:00
Łukasz Magiera
f148397e1b post workers: Fix race in setting vproofs 2022-01-21 12:31:24 +01:00
Łukasz Magiera
82c9e72aab post workers: Fix skipped handling 2022-01-21 10:39:14 +01:00
Łukasz Magiera
b38141601c Untangle ffi from api 2022-01-18 11:57:04 +01:00
Łukasz Magiera
4a874eff70 post workers: Cleanup, tests 2022-01-14 14:17:52 +01:00
mz-sirius
793b5c7cc3 fix ci err 2022-01-05 21:41:21 +08:00
mz-sirius
3fd55fa56b decoupling winningpost and windowpost from lotus-miner 2022-01-05 01:50:49 +08:00
llifezou
4b685c5e26 Include worker name in sealing errors 2021-12-23 17:44:43 +08:00
Aayush Rajasekaran
80d5e52923 Merge branch 'master' into next 2021-12-13 13:24:28 -05:00
zenground0
a5be80828a RemoveData and Decode
- Unsealing replica update with sector key works and tested
- Sector key generation added and tested
2021-12-03 15:21:06 -05:00
Łukasz Magiera
f25efecb74 worker: Test resource table overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
6d52d8552b Fix docsgen 2021-11-30 02:06:58 +01:00
Łukasz Magiera
c9a2ff4007 cleanup worker resource overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
b961e1aab5 sched resources: Separate Parallelism defaults depending on GPU presence 2021-11-30 02:06:58 +01:00
Clint Armstrong
4ef8543128 Permit workers to override resource table
In an environment with heterogeneous worker nodes, a universal resource
table for all workers does not allow effective scheduling of tasks. Some
workers may have different proof cache settings, changing the required
memory for different tasks. Some workers may have a different count of
CPUs per core-complex, changing the max parallelism of PC1.

This change allows workers to customize these parameters with
environment variables. A worker could set the environment variable
PC1_MIN_MEMORY for example to customize the minimum memory requirement
for PC1 tasks. If no environment variables are specified, the resource
table on the miner is used, except for PC1 parallelism.

If PC1_MAX_PARALLELISM is not specified, and
FIL_PROOFS_USE_MULTICORE_SDR is set, PC1_MAX_PARALLELISM will
automatically be set to FIL_PROOFS_MULTICORE_SDR_PRODUCERS + 1.
2021-11-30 02:06:58 +01:00
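The fallback rule described in the commit message above can be sketched roughly as follows. This is an illustrative sketch, not the code from the commit: the helper name `pc1MaxParallelism` and the `lookupFn` type are hypothetical, "set" is assumed to mean "present in the environment", and the behavior when FIL_PROOFS_MULTICORE_SDR_PRODUCERS is missing or unparseable (no override is derived) is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFn has the same shape as os.LookupEnv, so the logic can be exercised
// against a plain map instead of the real process environment.
type lookupFn func(key string) (string, bool)

// pc1MaxParallelism (hypothetical helper) resolves the PC1 parallelism
// override. The second return value is false when no override applies.
func pc1MaxParallelism(lookup lookupFn) (int, bool) {
	// An explicit, valid PC1_MAX_PARALLELISM always wins.
	if v, ok := lookup("PC1_MAX_PARALLELISM"); ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n, true
		}
	}
	// Assumption: "set" means the variable is present in the environment.
	if _, ok := lookup("FIL_PROOFS_USE_MULTICORE_SDR"); !ok {
		return 0, false
	}
	// Assumption: without a parseable producer count, no override is derived.
	v, ok := lookup("FIL_PROOFS_MULTICORE_SDR_PRODUCERS")
	if !ok {
		return 0, false
	}
	producers, err := strconv.Atoi(v)
	if err != nil || producers < 0 {
		return 0, false
	}
	return producers + 1, true
}

func main() {
	if n, ok := pc1MaxParallelism(os.LookupEnv); ok {
		fmt.Println("PC1 max parallelism override:", n)
	} else {
		fmt.Println("no PC1 parallelism override derived from the environment")
	}
}
```

Passing the lookup function as a parameter keeps the rule testable without mutating the worker's real environment.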
Clint Armstrong
c4f46171ae Report memory used and swap used in worker res
Attempting to report "memory used by other processes" in the MemReserved
field fails to take into account the fact that the system's memory used
includes memory used by ongoing tasks.

To properly account for this, worker should report the memory and swap
used, then the scheduler that is aware of the memory requirements for a
task can determine if there is sufficient memory available for a task.
2021-11-30 02:06:58 +01:00
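A minimal sketch of the scheduler-side check this commit message describes, under stated assumptions: the `workerMem` struct and `fits` function are hypothetical names, physical memory and swap are pooled into one budget, and the scheduler takes the larger of the worker's reported usage and its own tally of memory assigned to running tasks. None of these choices is confirmed by the commit message itself.

```go
package main

import "fmt"

// workerMem is a hypothetical worker report: totals plus what is in use now.
type workerMem struct {
	PhysTotal, PhysUsed uint64
	SwapTotal, SwapUsed uint64
}

// fits reports whether a task needing `need` bytes fits on the worker.
// `assigned` is the scheduler's own tally of memory required by tasks it has
// already placed on this worker.
func fits(m workerMem, assigned, need uint64) bool {
	// Reported usage already includes memory held by ongoing tasks.
	used := m.PhysUsed + m.SwapUsed
	// Assumption: tasks may not have reached their full footprint yet, so
	// never assume less usage than the scheduler has already committed.
	if assigned > used {
		used = assigned
	}
	total := m.PhysTotal + m.SwapTotal
	if used > total {
		return false // guard against unsigned underflow below
	}
	return need <= total-used
}

func main() {
	m := workerMem{PhysTotal: 64 << 30, PhysUsed: 40 << 30}
	fmt.Println("20 GiB task fits:", fits(m, 48<<30, 20<<30))
	fmt.Println("8 GiB task fits:", fits(m, 48<<30, 8<<30))
}
```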
Clint Armstrong
e2a1ca7caa Use cgroup limits in worker memory calculations
Worker processes may have memory limitations imposed by Systemd. But
/proc/meminfo shows the entire system memory regardless of these limits.
This results in the scheduler believing the worker has the entire system
memory available and the worker being allocated too many tasks.

This change attempts to read cgroup memory limits for the worker
process. It supports cgroups v1 and v2, and compares cgroup limits
against the system memory and returns the most conservative values to
prevent the worker from being allocated too many tasks and potentially
triggering an OOM event.
2021-11-30 02:06:58 +01:00
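The "most conservative value" rule from the commit message above can be sketched as taking the minimum of the system total and any cgroup limit. This is a sketch under assumptions, not the commit's implementation: `parseCgroupLimit` and `effectiveMemory` are hypothetical names, and the caller is assumed to have already read the raw contents of the relevant limit files (for example cgroup v2's `memory.max`), whose location depends on how the cgroup hierarchy is mounted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCgroupLimit parses the raw contents of a cgroup memory limit file.
// cgroup v2 writes the literal "max" when no limit is set; cgroup v1 reports
// a very large number instead, which the min() in effectiveMemory absorbs.
func parseCgroupLimit(raw string) (limit uint64, limited bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// effectiveMemory returns the smallest of the system total and any cgroup
// limits, so the scheduler never sees more memory than the worker may use.
func effectiveMemory(systemTotal uint64, rawLimits ...string) uint64 {
	eff := systemTotal
	for _, raw := range rawLimits {
		if n, ok := parseCgroupLimit(raw); ok && n < eff {
			eff = n
		}
	}
	return eff
}

func main() {
	const systemTotal = 64 << 30 // value that /proc/meminfo would report
	fmt.Println("unlimited cgroup:", effectiveMemory(systemTotal, "max\n"))
	fmt.Println("32 GiB cgroup:   ", effectiveMemory(systemTotal, "34359738368\n"))
}
```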
zenground0
7d2b3f05db WIP sector storage and integration test 2021-11-29 10:24:00 -05:00
Łukasz Magiera
f352c18290 Don't remove sector data when moving data into a shared path 2021-10-11 21:11:38 +02:00
Raúl Kripalani
f3b6f8de1a add ability to ignore worker resources when scheduling. 2021-06-21 20:08:18 +01:00
aarshkshah1992
2a134887c3 logs to debug read & unseal 2021-06-07 15:03:09 +05:30
aarshkshah1992
3b792a32c3 better logging 2021-06-07 15:03:09 +05:30
aarshkshah1992
ad4b182bfe remove read task type and run gen and docsgen 2021-06-07 15:03:06 +05:30
Łukasz Magiera
a4f3758f4c worker api: better grouping 2020-11-30 23:16:30 +01:00
Łukasz Magiera
3672053ae9 worker: Support setting task types at runtime 2020-11-26 17:33:34 +01:00
Łukasz Magiera
b242d69805 Make storiface.CallError json-friendly 2020-11-17 16:28:41 +01:00
Łukasz Magiera
b8853aa4d5 Add error codes to worker return 2020-11-17 16:17:55 +01:00
Łukasz Magiera
6bea9dd178 Making sealing logic work with multiple seal proof types 2020-11-16 19:03:30 +01:00
zgfzgf
5bcc6339b4 optimize code replace strings with constants 2020-11-09 16:21:16 +08:00
Łukasz Magiera
774e2ecebf sched: Fix worker reenabling 2020-10-30 18:01:37 +01:00
Łukasz Magiera
4cf00b8b42 worker_local: address review 2020-10-28 14:29:17 +01:00
Łukasz Magiera
8731fe9112 sched: split worker handling into more funcs 2020-10-28 14:14:50 +01:00
Łukasz Magiera
660236b224 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-10-23 23:25:35 +02:00
Łukasz Magiera
8c86ea6b75 localworker: Try very hard to get results to manager 2020-10-18 19:45:11 +02:00
Łukasz Magiera
dbb421c4f7 localworker: Use better context for calling returnFunc 2020-10-18 19:32:43 +02:00
Łukasz Magiera
8d06cca073 sched: Handle workers using sessions instead of connections 2020-10-18 12:36:06 +02:00
Łukasz Magiera
2cfe22d4e5 Merge remote-tracking branch 'origin/master' into feat/async-restartable-workers 2020-09-30 20:48:16 +02:00
Łukasz Magiera
1e6a69f8aa localworker: Don't mark calls as returned when returning fails 2020-09-28 22:10:02 +02:00
Łukasz Magiera
04ad1791b0 localworker: Fix contexts 2020-09-23 00:10:36 +02:00
Łukasz Magiera
706f4f2ef5 worker: Don't die with the connection 2020-09-22 18:36:44 +02:00
Łukasz Magiera
b8865fb182 workers: Mark on-restart-failed returned tasks as returned 2020-09-22 01:00:28 +02:00