Commit Graph

1007 Commits

Author SHA1 Message Date
Łukasz Magiera
c8e0341248 Fix missing FinalizeReplicaUpdate in tests 2022-02-08 17:22:41 +01:00
Łukasz Magiera
09cfad9d71 Add FinalizeReplicaUpdate into some more places 2022-02-08 17:22:41 +01:00
Łukasz Magiera
e271bae5ec try ClearCache for update cache 2022-02-08 17:22:41 +01:00
Łukasz Magiera
142ba6660a wip FinalizeReplicaUpdate 2022-02-08 17:22:41 +01:00
Aayush Rajasekaran
f476aa937e update to latest FFI 2022-02-08 10:45:58 -05:00
zenground0
1ab2744c84 Fix log 2022-02-07 09:15:23 -05:00
zenground0
47ffceef0d Check sector is active before PRU 2022-02-07 07:41:48 -05:00
Jennifer Wang
33b53c4a0d update to ffi v11.0.1 2022-02-03 11:15:47 -05:00
zenground0
13ccb8cbfe Stop recovery attempts after fault 2022-02-01 11:39:42 +05:30
Aayush
37a345b39d Update FFI 2022-01-27 15:30:01 -05:00
Darko Brdareski
e51ce5c508 Merge remote-tracking branch 'upstream/master' into bloxico/system-test-matrix 2022-01-27 10:57:56 +01:00
Aayush
817c155287 chore: deps: update to latest proofs 2022-01-25 13:23:00 -05:00
Aayush Rajasekaran
a6460be37b remove a log 2022-01-25 13:02:45 -05:00
Aayush
e7123d1a8e fix: sealer: correctly pipe through errors for SectorAbortUpgrade 2022-01-25 13:02:00 -05:00
Aayush
50aba9a8e6 fix: sealer: don't replica update sectors unless they have deals in them 2022-01-25 13:01:56 -05:00
Aayush
1b18236f91 feat: sealer: allow users to abort in-flight snap upgrades 2022-01-25 13:01:51 -05:00
Aayush
e17ae2eaf4 fix: sealer: manager should lock Unsealed for ReplicaUpdate 2022-01-25 13:01:37 -05:00
Aayush
6d567b36e3 Fix: sealer: ReplicaUpdate should fetch the correct files 2022-01-25 13:01:27 -05:00
Aayush Rajasekaran
7b7ab016db create replica update paths in acquireSectors 2022-01-25 13:01:18 -05:00
Aayush Rajasekaran
0c9c94bad1 fix: checkReplica incorrectly returns ErrBadPR 2022-01-25 13:01:12 -05:00
Aayush Rajasekaran
a3c5fadcc0 feat: sealing: Add ReplicaUpdate work to Resource table 2022-01-25 13:01:05 -05:00
Aayush Rajasekaran
2d0929e305 remove a log 2022-01-25 12:55:56 -05:00
Aayush Rajasekaran
92e6f29cc8 chore: sealer: quieten a log 2022-01-24 18:28:52 -05:00
Aayush
5cafdc2f29 fix: sealer: manager should lock Unsealed for ReplicaUpdate 2022-01-21 11:12:12 -05:00
Aayush Rajasekaran
ff845aa793 Merge pull request #7977 from filecoin-project/chore/sealtasks-comment
chore: remove inaccurate comment in sealtasks
2022-01-21 10:53:04 -05:00
Aayush
752f4a3d67 Fix: sealer: ReplicaUpdate should fetch the correct files 2022-01-20 15:06:53 -05:00
Aayush Rajasekaran
3ff23ecbfa fix: checkReplica incorrectly returns ErrBadPR 2022-01-19 12:00:27 -05:00
Aayush Rajasekaran
ab8bf393c2 create replica update paths in acquireSectors 2022-01-19 11:41:38 -05:00
Aayush Rajasekaran
d0390181ec feat: sealing: Add ReplicaUpdate work to Resource table 2022-01-19 11:41:32 -05:00
Łukasz Magiera
c41ccb6c37 chore: remove inaccurate comment in sealtasks 2022-01-19 10:46:37 +01:00
Aayush Rajasekaran
aad8aa0893 Appease the linter 2022-01-14 17:15:44 -05:00
Jennifer Wang
30013c1f06 fix lint 2022-01-14 17:15:44 -05:00
Jennifer Wang
6901e998e6 Check piece before PRU2 instead of PRU1 as PRU2 is the heavy computation part 2022-01-14 17:15:44 -05:00
Jennifer Wang
8939d5982f just use checkPiece 2022-01-14 17:15:44 -05:00
Jennifer Wang
a20916f9af Add more deal expiration handling for snap deals 2022-01-14 17:15:44 -05:00
Jennifer Wang
ac3bea489b Integrate proof v11.0.0 2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
ca57546ef5 Remove unnecessary params from VerifyWinningPost 2022-01-14 17:14:32 -05:00
zenground0
5b0a0baa9a Fix handle deal recover return value bug 2022-01-14 17:14:32 -05:00
zenground0
a9a523d8c0 Fix TooManyMarkedForUpgrade 2022-01-14 17:14:32 -05:00
zenground0
d6aa17e21f Snap Deals Integration
- FSM handles the actual CC upgrade process, including error states
- PoSting (winning and window) works over upgraded and upgrading sectors
- Integration test and changes to itest framework to reduce flakes
- Update CLI to handle new upgrade
- Update dependencies
2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
3a8ac6dffa Update FFI 2022-01-14 17:14:32 -05:00
zenground0
d1480c36c0 RemoveData and Decode
- Unsealing a replica update with the sector key works and is tested
- Sector key generation added and tested
2022-01-14 17:14:32 -05:00
zenground0
4936b4ea44 Review Response 2022-01-14 17:14:32 -05:00
zenground0
c4069824f7 WIP 2022-01-14 17:14:32 -05:00
zenground0
93656e65f8 WIP sector storage and integration test 2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
073b7b4ff5 Update FFI 2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
393d8541e2 Update deps 2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
1ef780d96f Plug in the FFI call 2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
a8cb027c08 Integrate v7 actors 2022-01-14 17:14:32 -05:00
Aayush Rajasekaran
bda4e5be95 Appease the linter 2022-01-12 18:10:07 -05:00
Jennifer Wang
fd50cd128a fix lint 2022-01-11 18:34:26 -05:00
Jennifer Wang
e99b98873c Check piece before PRU2 instead of PRU1 as PRU2 is the heavy computation part 2022-01-11 18:34:20 -05:00
Jennifer Wang
6b953a03d0 just use checkPiece 2022-01-11 18:34:15 -05:00
Jennifer Wang
37a3e610b7 Add more deal expiration handling for snap deals 2022-01-11 18:34:10 -05:00
Jiaying Wang
0130b28879 Merge pull request #7923 from filecoin-project/jen/proofv11
chore: deps: Integrate proof v11.0.0
2022-01-11 17:49:43 -05:00
Aayush Rajasekaran
d645c5fbab Remove unnecessary params from VerifyWinningPost 2022-01-11 12:06:39 -05:00
Jennifer Wang
5b7da270c9 Integrate proof v11.0.0 2022-01-10 23:45:04 -05:00
zenground0
d16c5d0e93 Fix handle deal recover return value bug 2022-01-10 15:47:20 +05:30
zenground0
c309686679 Fix TooManyMarkedForUpgrade 2022-01-10 15:39:38 +05:30
zenground0
33f2d24f54 Snap Deals Integration
- FSM handles the actual CC upgrade process, including error states
- PoSting (winning and window) works over upgraded and upgrading sectors
- Integration test and changes to itest framework to reduce flakes
- Update CLI to handle new upgrade
- Update dependencies
2022-01-10 15:39:38 +05:30
llifezou
4b685c5e26 Include worker name in sealing errors 2021-12-23 17:44:43 +08:00
Darko Brdareski
dda1a42a2a Merge branch 'bloxico/system-test-matrix' of https://github.com/filecoin-project/lotus into merge_lotus 2021-12-20 15:48:16 +01:00
shotcollin
d10d0a20b1 fix typo in log warning
Very minor, but this warning comes up a lot, so it would be nicer if it weren't grammatically wrong too.
2021-12-19 17:07:11 -07:00
Darko Brdareski
2f1f35cc71 Annotate storage miner features 2021-12-15 15:30:42 +01:00
Aayush Rajasekaran
3e288f1066 Update FFI 2021-12-13 15:47:17 -05:00
Aayush Rajasekaran
80d5e52923 Merge branch 'master' into next 2021-12-13 13:24:28 -05:00
Łukasz Magiera
ba3c96f8c6 stores: Reduce log spam during retrievals 2021-12-10 16:47:32 -05:00
Łukasz Magiera
dafdb7689c Fix mock ReadPiece 2021-12-10 16:47:12 -05:00
Łukasz Magiera
46ba2b6b4f fr32: Reduce MTTresh from 32M to 512k per core
This results in 64x fewer bytes allocated when spawning new readers
for larger pieces.

This results in about a 30% speedup in the 1G unpad benchmark on an AMD TR 2950x.
2021-12-10 16:47:07 -05:00
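The 64x figure follows from the ratio of the old and new thresholds, assuming M and k in the message denote binary units (MiB and KiB); with decimal units the ratio would be 62.5 rather than exactly 64:

$$\frac{32\,\text{MiB}}{512\,\text{KiB}} = \frac{2^{25}\,\text{B}}{2^{19}\,\text{B}} = 2^{6} = 64$$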
Łukasz Magiera
b4c1e340ea piecereader: Avoid allocating 1024MB slices per read 2021-12-10 16:47:01 -05:00
Łukasz Magiera
a438e6fa73 piecereader: Avoid redundant roundtrips when seeking 2021-12-10 16:46:57 -05:00
Łukasz Magiera
b21d3ded2f piecereader: Move closer to storage 2021-12-10 16:46:52 -05:00
Darko Brdareski
0169d0dafd Annotate state feature tests 2021-12-10 16:08:25 +01:00
Łukasz Magiera
e8ef39e734 stores: Reduce log spam during retrievals 2021-12-10 11:28:04 +01:00
Łukasz Magiera
c31f4de7d5 Fix mock ReadPiece 2021-12-09 16:26:59 +01:00
Łukasz Magiera
6fd1609410 fr32: Reduce MTTresh from 32M to 512k per core
This results in 64x fewer bytes allocated when spawning new readers
for larger pieces.

This results in about a 30% speedup in the 1G unpad benchmark on an AMD TR 2950x.
2021-12-09 16:14:47 +01:00
Łukasz Magiera
9c75a3aaa8 piecereader: Avoid allocating 1024MB slices per read 2021-12-09 15:49:43 +01:00
Łukasz Magiera
a3d8494a04 piecereader: Avoid redundant roundtrips when seeking 2021-12-09 14:52:33 +01:00
Łukasz Magiera
13b260e7f7 piecereader: Move closer to storage 2021-12-08 23:20:20 +01:00
zenground0
a5be80828a RemoveData and Decode
- Unsealing a replica update with the sector key works and is tested
- Sector key generation added and tested
2021-12-03 15:21:06 -05:00
Łukasz Magiera
727765b248 Command to list active sector locks 2021-12-03 12:33:23 +01:00
Łukasz Magiera
71329f6c41 Address Scheduler enhancements (#7703) review 2021-11-30 20:50:40 +01:00
zenground0
40d16a8f88 Review Response 2021-11-30 13:53:37 -05:00
zenground0
f88fcdbcfc WIP 2021-11-30 12:40:14 -05:00
Łukasz Magiera
001ecbb561 fix lint 2021-11-30 02:06:58 +01:00
Łukasz Magiera
a597b072b8 fix sched tests 2021-11-30 02:06:58 +01:00
Łukasz Magiera
f25efecb74 worker: Test resource table overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
6d52d8552b Fix docsgen 2021-11-30 02:06:58 +01:00
Łukasz Magiera
c9a2ff4007 cleanup worker resource overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
b961e1aab5 sched resources: Separate Parallelism defaults depending on GPU presence 2021-11-30 02:06:58 +01:00
Łukasz Magiera
36868a8749 sched: C2 is not all-core load 2021-11-30 02:06:58 +01:00
Clint Armstrong
4ef8543128 Permit workers to override resource table
In an environment with heterogeneous worker nodes, a universal resource
table for all workers does not allow effective scheduling of tasks. Some
workers may have different proof cache settings, changing the required
memory for different tasks. Some workers may have a different count of
CPUs per core complex, changing the max parallelism of PC1.

This change allows workers to customize these parameters with
environment variables. For example, a worker could set the environment
variable PC1_MIN_MEMORY to customize the minimum memory requirement
for PC1 tasks. If no environment variables are specified, the resource
table on the miner is used, except for PC1 parallelism.

If PC1_MAX_PARALLELISM is not specified, and
FIL_PROOFS_USE_MULTICORE_SDR is set, PC1_MAX_PARALLELISM will
automatically be set to FIL_PROOFS_MULTICORE_SDR_PRODUCERS + 1.
2021-11-30 02:06:58 +01:00
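The PC1 parallelism rule described above can be sketched in Go as follows. This is a minimal illustration, not the lotus implementation: the helper `pc1MaxParallelism` and its `tableDefault` parameter are hypothetical, "is set" is assumed to mean the variable is present, and falling back to the table value when the producer count is missing or unparsable is an assumption the commit message does not spell out.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// pc1MaxParallelism is a hypothetical helper sketching the override rule.
// lookup has the same signature as os.LookupEnv so the logic can be exercised
// against a fake environment; tableDefault stands in for the miner's
// resource-table value (an assumption for illustration).
func pc1MaxParallelism(lookup func(string) (string, bool), tableDefault int) int {
	// An explicit, parsable PC1_MAX_PARALLELISM always wins.
	if v, ok := lookup("PC1_MAX_PARALLELISM"); ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	// Otherwise, with multicore SDR enabled, derive it from the producer count.
	if _, ok := lookup("FIL_PROOFS_USE_MULTICORE_SDR"); ok {
		if v, ok := lookup("FIL_PROOFS_MULTICORE_SDR_PRODUCERS"); ok {
			if n, err := strconv.Atoi(v); err == nil {
				return n + 1
			}
		}
	}
	// Assumption: fall back to the resource-table value in every other case.
	return tableDefault
}

func main() {
	// Simulated environment for a worker with multicore SDR enabled.
	env := map[string]string{
		"FIL_PROOFS_USE_MULTICORE_SDR":       "1",
		"FIL_PROOFS_MULTICORE_SDR_PRODUCERS": "3",
	}
	lookup := func(k string) (string, bool) {
		v, ok := env[k]
		return v, ok
	}
	fmt.Println(pc1MaxParallelism(lookup, 1)) // 4 (3 producers + 1)

	// The same helper works against the real process environment.
	fmt.Println(pc1MaxParallelism(os.LookupEnv, 1))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the rule testable without mutating the process environment.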
Clint Armstrong
93e4656a27 Use a float to represent GPU utilization
Before this change, workers could only be allocated one GPU task,
regardless of how much of the GPU's resources that task used, or how many
GPUs were in the system.

This makes GPUUtilization a float, which can represent that a task needs
a fraction of a GPU or multiple GPUs. GPUs are accounted for like RAM and
CPUs, so that workers with more GPUs can be allocated more tasks.

A known issue is that PC2 cannot use multiple GPUs. Even if a worker has
multiple GPUs and is allocated multiple PC2 tasks, those tasks will only
run on the first GPU.

This could result in unexpected behavior when a worker with multiple
GPUs is assigned multiple PC2 tasks. However, this should not surprise
existing users who upgrade: anyone already running workers with multiple
GPUs should know this and be running one worker per GPU for PC2. Now
those users can also set the GPU utilization of PC2 to less than one and
effectively run multiple PC2 processes in a single worker.

C2 is capable of utilizing multiple GPUs, and now workers can be
customized for C2 accordingly.
2021-11-30 02:06:58 +01:00
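Accounting for GPUs "like RAM and CPUs" with a float utilization can be sketched as a simple capacity pool. The `gpuPool` type and its methods are hypothetical names for illustration only; the sketch models scheduling capacity, not which physical GPU a task actually runs on (the PC2 first-GPU limitation above still applies).

```go
package main

import "fmt"

// gpuPool is a hypothetical model of per-worker GPU accounting. GPUs are
// tracked as a float so a task may claim a fraction of a GPU or several GPUs.
type gpuPool struct {
	total float64 // number of GPUs on the worker
	used  float64 // summed GPU utilization of currently running tasks
}

// tryAlloc reserves need GPUs if they fit and reports whether it succeeded.
func (p *gpuPool) tryAlloc(need float64) bool {
	if p.used+need > p.total {
		return false
	}
	p.used += need
	return true
}

// free releases a reservation previously made with tryAlloc.
func (p *gpuPool) free(need float64) { p.used -= need }

func main() {
	pool := &gpuPool{total: 2}
	pc2 := 0
	// Keep scheduling half-GPU PC2 tasks until the pool is full.
	for pool.tryAlloc(0.5) {
		pc2++
	}
	fmt.Println(pc2) // 4: four half-GPU PC2 tasks fit on two GPUs
}
```

The example values (0.5 and 2) are exactly representable in binary floating point; with arbitrary fractions, a real scheduler would likely want a small tolerance in the comparison.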
Clint Armstrong
c4f46171ae Report memory used and swap used in worker res
Attempting to report "memory used by other processes" in the MemReserved
field fails to take into account that the system's memory used
includes memory used by ongoing tasks.

To account for this properly, the worker should report the memory and
swap used; the scheduler, which is aware of a task's memory requirements,
can then determine whether there is sufficient memory available for it.
2021-11-30 02:06:58 +01:00
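One plausible shape for the scheduler-side check, once workers report memory and swap used, is sketched below. The `workerMem`/`taskMem` types and the `fits` rule (minimum memory must fit in free RAM, maximum memory must fit in free RAM plus free swap) are assumptions for illustration, not the actual lotus formula.

```go
package main

import "fmt"

// workerMem is a hypothetical snapshot of what a worker reports: totals plus
// memory and swap currently in use by anything, including running tasks.
type workerMem struct {
	physical, swap         uint64
	physicalUsed, swapUsed uint64
}

// taskMem is a hypothetical per-task requirement from the resource table.
type taskMem struct {
	minMemory, maxMemory uint64
}

// fits sketches an assumed admission rule: the task's minimum must fit in
// free physical memory, and its maximum must fit in free physical memory
// plus free swap.
func fits(w workerMem, t taskMem) bool {
	if w.physicalUsed+t.minMemory > w.physical {
		return false
	}
	return w.physicalUsed+w.swapUsed+t.maxMemory <= w.physical+w.swap
}

func main() {
	const gib = uint64(1) << 30
	w := workerMem{physical: 64 * gib, swap: 16 * gib, physicalUsed: 40 * gib}
	fmt.Println(fits(w, taskMem{minMemory: 20 * gib, maxMemory: 30 * gib})) // true
	fmt.Println(fits(w, taskMem{minMemory: 30 * gib, maxMemory: 30 * gib})) // false
}
```

Because `physicalUsed` already includes memory held by running tasks, this sketch does not add those tasks' reservations a second time, which is the double counting the commit message describes.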
Clint Armstrong
e2a1ca7caa Use cgroup limits in worker memory calculations
Worker processes may have memory limits imposed by systemd, but
/proc/meminfo shows the entire system memory regardless of these limits.
This results in the scheduler believing the worker has the entire system
memory available and allocating too many tasks to the worker.

This change attempts to read cgroup memory limits for the worker
process. It supports cgroups v1 and v2, compares the cgroup limits
against the system memory, and returns the most conservative values to
prevent the worker from being allocated too many tasks and potentially
triggering an OOM event.
2021-11-30 02:06:58 +01:00
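Taking "the most conservative value" across the system total and the cgroup limits can be sketched as a minimum over parsed limit files. The helper names are hypothetical, and the file paths in `main` are assumed typical locations when the process's cgroup sits at the root of the hierarchy; a real implementation would resolve the process's own cgroup path via /proc/self/cgroup.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCgroupLimit parses the contents of a cgroup memory limit file.
// cgroup v2 (memory.max) contains "max" when unlimited; cgroup v1
// (memory.limit_in_bytes) contains a very large number instead, which the
// min-based comparison in effectiveMemory ignores naturally.
func parseCgroupLimit(content string) (uint64, bool) {
	s := strings.TrimSpace(content)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// effectiveMemory returns the smallest of the system total and any
// parsable cgroup limits.
func effectiveMemory(system uint64, limits ...string) uint64 {
	eff := system
	for _, c := range limits {
		if n, ok := parseCgroupLimit(c); ok && n < eff {
			eff = n
		}
	}
	return eff
}

// readLimit returns a limit file's contents, or "" if it cannot be read.
func readLimit(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Assumed paths (see lead-in); missing files are simply ignored.
	v2 := readLimit("/sys/fs/cgroup/memory.max")
	v1 := readLimit("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	const system = 128 << 30 // hypothetical total from /proc/meminfo
	fmt.Println(effectiveMemory(system, v2, v1))
}
```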
Łukasz Magiera
d21c44e266 ffiwrapper: Validate PC2 by calling C1 with random seeds 2021-11-30 01:33:05 +01:00
zenground0
7d2b3f05db WIP sector storage and integration test 2021-11-29 10:24:00 -05:00
Łukasz Magiera
05aa860459 Request correct read size with startOffset in pieceProvider 2021-11-27 00:05:45 +01:00
Łukasz Magiera
743ce5a40f Add startOffset support to mock SectorMgr.ReadPiece 2021-11-26 18:48:52 +01:00
Łukasz Magiera
f6de16e95a Fix sector-storage tests 2021-11-26 18:16:53 +01:00