Commit Graph

14237 Commits

Author SHA1 Message Date
Łukasz Magiera
73f16f08e3
Merge pull request #7703 from filecoin-project/feat/scheduler-enhancements
Scheduler enhancements
2021-11-30 19:14:25 +01:00
Łukasz Magiera
330cfc33ee worker: Typo in resources cmd usage
Co-authored-by: Aayush Rajasekaran <arajasek94@gmail.com>
2021-11-30 02:06:58 +01:00
Łukasz Magiera
cf20b0b2b8 worker: Command to print resource-table env vars 2021-11-30 02:06:58 +01:00
Łukasz Magiera
001ecbb561 fix lint 2021-11-30 02:06:58 +01:00
Łukasz Magiera
a597b072b8 fix sched tests 2021-11-30 02:06:58 +01:00
Łukasz Magiera
f25efecb74 worker: Test resource table overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
6d52d8552b Fix docsgen 2021-11-30 02:06:58 +01:00
Łukasz Magiera
c9a2ff4007 cleanup worker resource overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
b961e1aab5 sched resources: Separate Parallelism defaults depending on GPU presence 2021-11-30 02:06:58 +01:00
Łukasz Magiera
36868a8749 sched: C2 is not all-core load 2021-11-30 02:06:58 +01:00
Clint Armstrong
4ef8543128 Permit workers to override resource table
In an environment with heterogeneous worker nodes, a universal resource
table for all workers does not allow effective scheduling of tasks. Some
workers may have different proof cache settings, changing the required
memory for different tasks. Some workers may have a different count of
CPUs per core-complex, changing the max parallelism of PC1.

This change allows workers to customize these parameters with
environment variables. A worker could set the environment variable
PC1_MIN_MEMORY for example to customize the minimum memory requirement
for PC1 tasks. If no environment variables are specified, the resource
table on the miner is used, except for PC1 parallelism.

If PC1_MAX_PARALLELISM is not specified, and
FIL_PROOFS_USE_MULTICORE_SDR is set, PC1_MAX_PARALLELISM will
automatically be set to FIL_PROOFS_MULTICORE_SDR_PRODUCERS + 1.
2021-11-30 02:06:58 +01:00
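Note on 4ef8543128: the PC1 parallelism fallback described in that message can be sketched roughly as below. This is only an illustration, not the code from the commit. The helper name `pc1MaxParallelism`, the `lookup` parameter, and the fallback to a table default when the producer count is missing or unparsable are all assumptions; only the environment variable names come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// pc1MaxParallelism is a hypothetical helper, not the function used in the repository.
// lookup has the same shape as os.LookupEnv so it can be replaced in tests.
func pc1MaxParallelism(lookup func(string) (string, bool), tableDefault int) int {
	// An explicit override takes priority.
	if v, ok := lookup("PC1_MAX_PARALLELISM"); ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	// Otherwise derive the value from the multicore SDR settings, if present.
	if _, ok := lookup("FIL_PROOFS_USE_MULTICORE_SDR"); ok {
		if v, ok := lookup("FIL_PROOFS_MULTICORE_SDR_PRODUCERS"); ok {
			if n, err := strconv.Atoi(v); err == nil {
				return n + 1 // producers + 1, as the commit message states
			}
		}
	}
	// Assumption: fall back to the value from the miner's resource table.
	return tableDefault
}

func main() {
	fmt.Println("PC1 max parallelism:", pc1MaxParallelism(os.LookupEnv, 1))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the rule testable without touching the process environment.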
Clint Armstrong
93e4656a27 Use a float to represent GPU utilization
Before this change workers can only be allocated one GPU task,
regardless of how much of the GPU resources that task uses, or how many
GPUs are in the system.

This makes GPUUtilization a float which can represent that a task needs
a portion, or multiple GPUs. GPUs are accounted for like RAM and CPUs so
that workers with more GPUs can be allocated more tasks.

A known issue is that PC2 cannot use multiple GPUs. And even if the
worker has multiple GPUs and is allocated multiple PC2 tasks, those
tasks will only run on the first GPU.

This could result in unexpected behavior when a worker with multiple
GPUs is assigned multiple PC2 tasks. But this should not surprise any
existing users who upgrade, as any existing users who run workers with
multiple GPUs should already know this and be running a worker per GPU
for PC2. But now those users have the freedom to customize the GPU
utilization of PC2 to be less than one and effectively run multiple PC2
processes in a single worker.

C2 is capable of utilizing multiple GPUs, and now workers can be
customized for C2 accordingly.
2021-11-30 02:06:58 +01:00
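Note on 93e4656a27: treating GPU utilization as a float means GPUs can be budgeted the same way as RAM or CPU. A minimal sketch of that accounting, assuming a hypothetical `gpuBudget` type and `tryAssign` method (neither name is taken from the repository):

```go
package main

import "fmt"

// gpuBudget is a hypothetical stand-in for a worker's GPU accounting.
type gpuBudget struct {
	total float64 // number of GPUs present on the worker
	used  float64 // sum of the GPU utilization of tasks already assigned
}

// tryAssign reserves need GPUs (possibly fractional, possibly more than one)
// and reports whether the task fits in the remaining budget.
func (b *gpuBudget) tryAssign(need float64) bool {
	if b.used+need > b.total {
		return false
	}
	b.used += need
	return true
}

func main() {
	// One GPU, tasks configured to use half a GPU each.
	b := &gpuBudget{total: 1}
	fmt.Println(b.tryAssign(0.5), b.tryAssign(0.5), b.tryAssign(0.5))
}
```

With one GPU and a utilization of 0.5 per task, the first two assignments fit and the third is rejected. A task that needs 2.0 fits only on a worker with at least two free GPUs.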
Clint Armstrong
c4f46171ae Report memory used and swap used in worker res
Attempting to report "memory used by other processes" in the MemReserved
field fails to take into account the fact that the system's memory used
includes memory used by ongoing tasks.

To properly account for this, worker should report the memory and swap
used, then the scheduler that is aware of the memory requirements for a
task can determine if there is sufficient memory available for a task.
2021-11-30 02:06:58 +01:00
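Note on c4f46171ae: one way a scheduler might combine the reported usage with its own bookkeeping is sketched below. This is an assumption about how such a check could look, not the scheduler's actual logic. The struct, its field names, and the choice to take the larger of "reported used" and "tracked by the scheduler" are hypothetical.

```go
package main

import "fmt"

// workerMem is a hypothetical view of what a worker reports, in bytes.
type workerMem struct {
	MemPhysical uint64
	MemSwap     uint64
	MemUsed     uint64
	MemSwapUsed uint64
}

// canFit reports whether a task needing `need` bytes fits on the worker.
// schedTracked is the memory the scheduler already attributes to running tasks.
func canFit(w workerMem, schedTracked, need uint64) bool {
	total := w.MemPhysical + w.MemSwap
	used := w.MemUsed + w.MemSwapUsed
	// Reported usage already includes running tasks, so take the larger of
	// the two figures instead of adding them (which would double count).
	if schedTracked > used {
		used = schedTracked
	}
	if used > total {
		return false // avoid unsigned underflow below
	}
	return need <= total-used
}

func main() {
	w := workerMem{MemPhysical: 64, MemUsed: 40}
	fmt.Println(canFit(w, 32, 24), canFit(w, 32, 25))
}
```

The point of the sketch is only that the scheduler, which knows the per-task requirements, makes the decision. The worker just reports raw totals and usage.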
Clint Armstrong
e2a1ca7caa Use cgroup limits in worker memory calculations
Worker processes may have memory limitations imposed by Systemd. But
/proc/meminfo shows the entire system memory regardless of these limits.
This results in the scheduler believing the worker has the entire system
memory available and the worker being allocated too many tasks.

This change attempts to read cgroup memory limits for the worker
process. It supports cgroups v1 and v2, and compares cgroup limits
against the system memory and returns the most conservative values to
prevent the worker from being allocated too many tasks and potentially
triggering an OOM event.
2021-11-30 02:06:58 +01:00
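Note on e2a1ca7caa: the "most conservative value" rule can be sketched as below. The function names are hypothetical and the sketch does not read any files. It assumes the caller has already read the raw limit, for example from `memory.max` (cgroups v2, where the literal `max` means no limit) or `memory.limit_in_bytes` (cgroups v1, where "no limit" appears as a very large number).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCgroupLimit parses the raw contents of a cgroup memory limit file.
// It returns false when no usable limit is present.
func parseCgroupLimit(raw string) (uint64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" { // cgroups v2 uses "max" for unlimited
		return 0, false
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// effectiveMemory returns the smaller of the system memory and the cgroup
// limit, so an unlimited or oversized cgroup value never raises the figure.
func effectiveMemory(system uint64, rawLimit string) uint64 {
	if limit, ok := parseCgroupLimit(rawLimit); ok && limit < system {
		return limit
	}
	return system
}

func main() {
	const gib = 1 << 30
	// 128 GiB host, worker limited to 64 GiB by its cgroup.
	fmt.Println(effectiveMemory(128*gib, "68719476736\n") / gib)
}
```

Taking the minimum handles the v1 "unlimited" sentinel without special casing, since that value is always larger than the physical memory.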
Łukasz Magiera
19e808fffb
Merge pull request #7710 from filecoin-project/feat/pc2-validation
ffiwrapper: Validate PC2 by calling C1 with random seeds
2021-11-30 02:05:32 +01:00
Łukasz Magiera
d21c44e266 ffiwrapper: Validate PC2 by calling C1 with random seeds 2021-11-30 01:33:05 +01:00
Jiaying Wang
01215b7d2c
Merge pull request #7709 from filecoin-project/logic-error-miner-entrypoint
fix logic error
2021-11-29 18:16:58 -05:00
Łukasz Magiera
da9b0c6735
Merge pull request #7708 from filecoin-project/feat/update-graphsync-v0.10.6
Update go-graphsync v0.10.6
2021-11-29 23:51:32 +01:00
Jiaying Wang
4d2f3375e8
Merge pull request #7699 from filecoin-project/feat/listcids-verbose
Add verbose mode to lotus-miner pieces list-cids
2021-11-29 17:43:09 -05:00
Cory Schwartz
27efbc3485 fix logic error 2021-11-29 13:52:06 -08:00
hannahhoward
d4074e45cf feat(deps): update go-graphsync v0.10.6 2021-11-29 13:29:12 -08:00
Łukasz Magiera
f39942283b
Merge pull request #7706 from filecoin-project/feat/ret-matchding-cars
retrieval: Only output matching nodes, MatchPath dagspec
2021-11-29 22:14:25 +01:00
Łukasz Magiera
468b90a4a4 cli docsgen 2021-11-29 21:50:23 +01:00
Łukasz Magiera
092e12d1be cli: boolean logic is hard 2021-11-29 21:41:38 +01:00
Łukasz Magiera
320b36495d gofmt 2021-11-29 21:39:27 +01:00
Łukasz Magiera
e4f47de6ef retrieval: Check required flags for --car-export-merkle-proof 2021-11-29 21:39:18 +01:00
Łukasz Magiera
410ecb4bbc retrieval: --car-export-merkle-proof flag for client retrieve 2021-11-29 21:37:28 +01:00
Łukasz Magiera
d019853687
review: Cleanup some comments
Co-authored-by: Peter Rabbitson <ribasushi@protocol.ai>
2021-11-29 21:29:00 +01:00
Łukasz Magiera
4d51980cb5 deps: Use tagged go-ipld-selector-text-lite 2021-11-29 21:22:30 +01:00
Łukasz Magiera
5b5e6b9e44 retrieval: DagSpec.MatchPath -> ExportMerkleProof 2021-11-29 21:14:00 +01:00
Łukasz Magiera
58a084049d retrieval: Fix traversal in ls 2021-11-29 21:08:53 +01:00
Łukasz Magiera
9538fc9723 mod tidy, docsgen 2021-11-29 20:56:40 +01:00
Łukasz Magiera
227188e908 retrieval: Test non-matching path traversal 2021-11-29 20:52:55 +01:00
Łukasz Magiera
61791b90ea retrieval: Only output matching nodes, MatchPath dagspec 2021-11-29 20:40:55 +01:00
Steven Allen
797147097c
Merge pull request #7689 from filecoin-project/disable-mplex
disable mplex stream muxer
2021-11-29 09:46:24 -08:00
Łukasz Magiera
f8b132890c Add verbose mode to lotus-miner pieces list-cids 2021-11-29 17:19:47 +01:00
Łukasz Magiera
134aee4582 mod tidy 2021-11-29 16:44:58 +01:00
Łukasz Magiera
77d75b7739
Merge pull request #7688 from filecoin-project/chore/partret_cleanup_comment_and_flow
Cleanup partial retrieval codepaths ( zero functional changes )
2021-11-29 16:39:55 +01:00
Łukasz Magiera
26c3752f48
Merge pull request #7693 from filecoin-project/feat/randacc-dagstore-mount
Make small retrieval 200x faster
2021-11-29 16:10:18 +01:00
Łukasz Magiera
4bcde2f0ff dagstore pieceReader: Cleanup reader nil check 2021-11-29 15:32:27 +01:00
Jiaying Wang
160bb0b050
Merge pull request #7698 from filecoin-project/jen/releasetomaster
Releases back to master
2021-11-26 18:44:18 -05:00
Łukasz Magiera
05aa860459 Request correct read size with startOffset in pieceProvider 2021-11-27 00:05:45 +01:00
Jiaying Wang
a4c2a20851
Merge pull request #7691 from filecoin-project/misc/rle-dump
Add RLE dump code
2021-11-26 17:15:52 -05:00
Jennifer Wang
4df9179d34 Merge branch 'releases' into jen/releasetomaster 2021-11-26 17:09:22 -05:00
Jiaying Wang
8943de27e4
Merge pull request #7695 from filecoin-project/release/v1.13.1
Release/v1.13.1
2021-11-26 16:59:54 -05:00
Jiaying Wang
fc728d9e11
Merge pull request #7694 from filecoin-project/jen/v1.13.1-prep
v1.13.1 prep
2021-11-26 16:44:46 -05:00
Jennifer Wang
f6131ce3b5 v1.13.1 prep 2021-11-26 16:33:39 -05:00
Łukasz Magiera
9110e6f632 dagstore pieceReader: add debug log on stream restart 2021-11-26 20:24:51 +01:00
Łukasz Magiera
331702cd95 Tweak MaxPieceReaderBurnBytes 2021-11-26 18:49:41 +01:00
Łukasz Magiera
743ce5a40f Add startOffset support to mock SectorMgr.ReadPiece 2021-11-26 18:48:52 +01:00