Commit Graph

361 Commits

Author SHA1 Message Date
zenground0
a6ce9c13ff WIP sector storage and integration test 2022-01-10 22:49:29 -05:00
Aayush Rajasekaran
27e21e8db9 Update deps 2022-01-10 22:34:49 -05:00
Aayush Rajasekaran
b44596e48f Plug in the FFI call 2022-01-10 22:34:43 -05:00
Aayush Rajasekaran
3e3bd52c51 Integrate v7 actors 2022-01-10 22:34:33 -05:00
Łukasz Magiera
ba3c96f8c6 stores: Reduce log spam during retrievals 2021-12-10 16:47:32 -05:00
Łukasz Magiera
dafdb7689c Fix mock ReadPiece 2021-12-10 16:47:12 -05:00
Łukasz Magiera
46ba2b6b4f fr32: Reduce MTTresh from 32M to 512k per core
This results in 64x fewer bytes allocated when spawning new readers
for larger pieces.

Results in about 30% speedup in 1G unpad benchmark on AMD TR 2950x
2021-12-10 16:47:07 -05:00
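The 64x figure in the commit above follows from the ratio of the two thresholds. A minimal sketch of that arithmetic, assuming each reader allocates a working buffer proportional to threshold × core count (the function name and the sizing rule are illustrative assumptions, not the actual fr32 code):

```go
package main

import "fmt"

// readerBufferBytes is a hypothetical sizing rule: one chunk of
// `threshold` bytes per core. The real fr32 reader may size its
// buffers differently.
func readerBufferBytes(threshold uint64, cores int) uint64 {
	return threshold * uint64(cores)
}

func main() {
	const oldThreshold = 32 << 20  // 32 MiB per core
	const newThreshold = 512 << 10 // 512 KiB per core
	const cores = 16               // assumed core count for illustration

	oldBytes := readerBufferBytes(oldThreshold, cores)
	newBytes := readerBufferBytes(newThreshold, cores)

	fmt.Printf("old: %d MiB, new: %d MiB, ratio: %dx\n",
		oldBytes>>20, newBytes>>20, oldBytes/newBytes)
}
```

Under this assumption the ratio does not depend on the core count, since it cancels out.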
Łukasz Magiera
b4c1e340ea piecereader: Avoid allocating 1024MB slices per read 2021-12-10 16:47:01 -05:00
Łukasz Magiera
a438e6fa73 piecereader: Avoid redundant roundtrips when seeking 2021-12-10 16:46:57 -05:00
Łukasz Magiera
b21d3ded2f piecereader: Move closer to storage 2021-12-10 16:46:52 -05:00
Łukasz Magiera
71329f6c41 Address Scheduler enhancements (#7703) review 2021-11-30 20:50:40 +01:00
Łukasz Magiera
001ecbb561 fix lint 2021-11-30 02:06:58 +01:00
Łukasz Magiera
a597b072b8 fix sched tests 2021-11-30 02:06:58 +01:00
Łukasz Magiera
f25efecb74 worker: Test resource table overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
6d52d8552b Fix docsgen 2021-11-30 02:06:58 +01:00
Łukasz Magiera
c9a2ff4007 cleanup worker resource overrides 2021-11-30 02:06:58 +01:00
Łukasz Magiera
b961e1aab5 sched resources: Separate Parallelism defaults depending on GPU presence 2021-11-30 02:06:58 +01:00
Łukasz Magiera
36868a8749 sched: C2 is not all-core load 2021-11-30 02:06:58 +01:00
Clint Armstrong
4ef8543128 Permit workers to override resource table
In an environment with heterogeneous worker nodes, a universal resource
table for all workers does not allow effective scheduling of tasks. Some
workers may have different proof cache settings, changing the required
memory for different tasks. Some workers may have a different count of
CPUs per core-complex, changing the max parallelism of PC1.

This change allows workers to customize these parameters with
environment variables. A worker could set the environment variable
PC1_MIN_MEMORY for example to customize the minimum memory requirement
for PC1 tasks. If no environment variables are specified, the resource
table on the miner is used, except for PC1 parallelism.

If PC1_MAX_PARALLELISM is not specified, and
FIL_PROOFS_USE_MULTICORE_SDR is set, PC1_MAX_PARALLELISM will
automatically be set to FIL_PROOFS_MULTICORE_SDR_PRODUCERS + 1.
2021-11-30 02:06:58 +01:00
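The override rules described in the commit above can be sketched as follows. This is a simplified illustration, not the worker's actual code: the `Resources` struct, `applyOverrides`, and `mapLookup` are hypothetical names, and treating any present value of FIL_PROOFS_USE_MULTICORE_SDR as "set" is an assumption. Only the environment variable names come from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// Resources is a hypothetical, trimmed-down resource table entry for PC1.
type Resources struct {
	MinMemory      uint64
	MaxParallelism int
}

// applyOverrides returns a copy of base with PC1 overrides applied.
// lookup has the same shape as os.LookupEnv, so it can be swapped in directly.
func applyOverrides(base Resources, lookup func(string) (string, bool)) Resources {
	out := base
	if v, ok := lookup("PC1_MIN_MEMORY"); ok {
		if n, err := strconv.ParseUint(v, 10, 64); err == nil {
			out.MinMemory = n
		}
	}
	if v, ok := lookup("PC1_MAX_PARALLELISM"); ok {
		if n, err := strconv.Atoi(v); err == nil {
			out.MaxParallelism = n
		}
		return out
	}
	// No explicit parallelism: derive it from the multicore SDR settings.
	// Assumption: the variable counts as "set" whenever it is present.
	if _, ok := lookup("FIL_PROOFS_USE_MULTICORE_SDR"); ok {
		if p, ok := lookup("FIL_PROOFS_MULTICORE_SDR_PRODUCERS"); ok {
			if n, err := strconv.Atoi(p); err == nil {
				out.MaxParallelism = n + 1
			}
		}
	}
	return out
}

// mapLookup adapts a map to the lookup signature for deterministic examples.
func mapLookup(m map[string]string) func(string) (string, bool) {
	return func(k string) (string, bool) {
		v, ok := m[k]
		return v, ok
	}
}

func main() {
	base := Resources{MinMemory: 56 << 30, MaxParallelism: 1}
	env := map[string]string{
		"FIL_PROOFS_USE_MULTICORE_SDR":       "1",
		"FIL_PROOFS_MULTICORE_SDR_PRODUCERS": "3",
	}
	fmt.Printf("%+v\n", applyOverrides(base, mapLookup(env)))
}
```

Passing `os.LookupEnv` as the lookup function would read the real process environment instead of the example map.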
Clint Armstrong
93e4656a27 Use a float to represent GPU utilization
Before this change workers can only be allocated one GPU task,
regardless of how much of the GPU resources that task uses, or how many
GPUs are in the system.

This makes GPUUtilization a float which can represent that a task needs
a portion, or multiple GPUs. GPUs are accounted for like RAM and CPUs so
that workers with more GPUs can be allocated more tasks.

A known issue is that PC2 cannot use multiple GPUs. Even if the
worker has multiple GPUs and is allocated multiple PC2 tasks, those
tasks will only run on the first GPU.

This could result in unexpected behavior when a worker with multiple
GPUs is assigned multiple PC2 tasks. But this should not surprise any
existing users who upgrade, as any existing users who run workers with
multiple GPUs should already know this and be running a worker per GPU
for PC2. But now those users have the freedom to customize the GPU
utilization of PC2 to be less than one and effectively run multiple PC2
processes in a single worker.

C2 is capable of utilizing multiple GPUs, and now workers can be
customized for C2 accordingly.
2021-11-30 02:06:58 +01:00
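The fractional accounting described in the commit above can be illustrated with a small sketch. The `gpuPool` type and `tryReserve` method are hypothetical names for illustration; the scheduler's real bookkeeping may differ.

```go
package main

import "fmt"

// gpuPool tracks GPU capacity the same way RAM or CPU would be tracked:
// a total amount and an amount currently reserved. Values are in "GPUs",
// so 0.5 means half of one GPU and 2 means two whole GPUs.
type gpuPool struct {
	total float64
	used  float64
}

// tryReserve reserves `need` GPUs if they fit and reports whether it did.
func (p *gpuPool) tryReserve(need float64) bool {
	if p.used+need > p.total {
		return false
	}
	p.used += need
	return true
}

func main() {
	// One GPU, with each task configured to use half of it:
	// two tasks fit, the third does not.
	p := &gpuPool{total: 1}
	for i := 0; i < 3; i++ {
		fmt.Println(p.tryReserve(0.5))
	}
}
```

With a utilization of 1 per task, the same pool behaves like the old one-task-per-GPU rule; a worker with more GPUs simply starts with a larger total.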
Clint Armstrong
c4f46171ae Report memory used and swap used in worker res
Attempting to report "memory used by other processes" in the MemReserved
field fails to take into account the fact that the system's memory used
includes memory used by ongoing tasks.

To properly account for this, the worker should report the memory and
swap used; the scheduler, which is aware of the memory requirements for
a task, can then determine if there is sufficient memory available.
2021-11-30 02:06:58 +01:00
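One way the scheduler side of this could look is sketched below. The names are hypothetical, and taking the larger of the worker-reported usage and the scheduler's own tracked reservations is an assumption made for illustration, not a description of the actual scheduler logic.

```go
package main

import "fmt"

// workerStats is a hypothetical subset of what a worker reports.
type workerStats struct {
	MemPhysical uint64 // total physical memory, in bytes
	MemUsed     uint64 // memory in use, including memory used by running tasks
}

// canSchedule reports whether a task needing `need` bytes fits.
// tracked is the memory the scheduler has already reserved for tasks
// on this worker. Assumption: the larger of the two figures is the
// safer estimate of memory in use, since reported usage already
// contains task memory and must not be added to tracked.
func canSchedule(w workerStats, tracked, need uint64) bool {
	used := w.MemUsed
	if tracked > used {
		used = tracked
	}
	if used > w.MemPhysical {
		return false
	}
	return need <= w.MemPhysical-used
}

func main() {
	w := workerStats{MemPhysical: 128 << 30, MemUsed: 70 << 30}
	fmt.Println(canSchedule(w, 64<<30, 56<<30)) // 58 GiB free, 56 GiB needed
	fmt.Println(canSchedule(w, 64<<30, 64<<30)) // 58 GiB free, 64 GiB needed
}
```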
Clint Armstrong
e2a1ca7caa Use cgroup limits in worker memory calculations
Worker processes may have memory limitations imposed by Systemd. But
/proc/meminfo shows the entire system memory regardless of these limits.
This results in the scheduler believing the worker has the entire system
memory available and the worker being allocated too many tasks.

This change attempts to read cgroup memory limits for the worker
process. It supports cgroups v1 and v2, and compares cgroup limits
against the system memory and returns the most conservative values to
prevent the worker from being allocated too many tasks and potentially
triggering an OOM event.
2021-11-30 02:06:58 +01:00
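A minimal sketch of the "most conservative value" rule from the commit above, assuming the cgroup v2 `memory.max` format (either the literal `max` or a byte count). The function names are hypothetical, and reading the actual file path for the worker's cgroup is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryMax parses the contents of a cgroup v2 memory.max file,
// which holds either "max" (no limit) or a limit in bytes.
func parseMemoryMax(contents string) (limit uint64, limited bool, err error) {
	s := strings.TrimSpace(contents)
	if s == "max" {
		return 0, false, nil
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return n, true, nil
}

// effectiveMemory returns the smaller of the system memory and the
// cgroup limit, so the worker never advertises more than it may use.
func effectiveMemory(system, limit uint64, limited bool) uint64 {
	if limited && limit < system {
		return limit
	}
	return system
}

func main() {
	const system = 256 << 30 // assumed system memory, e.g. from /proc/meminfo

	for _, contents := range []string{"max\n", "68719476736\n"} {
		limit, limited, err := parseMemoryMax(contents)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(effectiveMemory(system, limit, limited) >> 30)
	}
}
```

The same comparison covers a cgroup v1 limit read as a plain byte count: an unlimited v1 cgroup reports a very large number, which the minimum discards in favor of the system value.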
Łukasz Magiera
d21c44e266 ffiwrapper: Validate PC2 by calling C1 with random seeds 2021-11-30 01:33:05 +01:00
Łukasz Magiera
05aa860459 Request correct read size with startOffset in pieceProvider 2021-11-27 00:05:45 +01:00
Łukasz Magiera
743ce5a40f Add startOffset support to mock SectorMgr.ReadPiece 2021-11-26 18:48:52 +01:00
Łukasz Magiera
f6de16e95a Fix sector-storage tests 2021-11-26 18:16:53 +01:00
Łukasz Magiera
8d955d5f30 dagstore mount: Add random access support 2021-11-26 17:40:53 +01:00
Łukasz Magiera
8454abcf45 storage: Use 1M buffers for Tar transfers 2021-11-24 20:08:37 +01:00
Łukasz Magiera
2a1505b364 storage: Test StorageFindSector with groups 2021-11-23 16:11:04 +01:00
Łukasz Magiera
8b548ac02f storage: Check allowlists in StorageFindSector 2021-11-23 16:11:04 +01:00
Łukasz Magiera
5c77c25747 storage: Add Group tags to StorageInfo 2021-11-23 16:11:04 +01:00
Łukasz Magiera
d1a63e4173 remote store: Remove debug printf 2021-11-22 17:50:12 +01:00
Łukasz Magiera
e508055dc1 make gen 2021-10-19 11:13:23 +02:00
Łukasz Magiera
080aa3356a Fix locks in worker-tracked 2021-10-18 20:19:21 +02:00
Łukasz Magiera
70589e4406 Block work in tracked worker before it is started 2021-10-18 18:44:56 +02:00
Łukasz Magiera
261238e157 Show prepared tasks in sealing jobs 2021-10-18 18:44:56 +02:00
Łukasz Magiera
11d738eee0 Track prepared work 2021-10-18 18:44:56 +02:00
Marten Seemann
03806f7063 add missing build constraint to statfs_unix.go 2021-10-18 16:29:01 +02:00
Łukasz Magiera
f352c18290 Don't remove sector data when moving data into a shared path 2021-10-11 21:11:38 +02:00
Łukasz Magiera
9af82f2d68 sched: Fix taskDone chan deadlock 2021-10-03 17:09:43 +02:00
Łukasz Magiera
b87142ec8e wip improve scheduling of ready work 2021-10-03 10:38:08 +02:00
Łukasz Magiera
d7fbd8b67d Update proofs to v10.0.0 2021-10-01 18:38:27 +02:00
Łukasz Magiera
a8a9818043 Expose storage states on the metrics endpoint 2021-10-01 14:45:01 +02:00
Łukasz Magiera
ef03314c6d storagemgr: Cleanup workerLk around worker resources 2021-09-15 16:35:19 +02:00
Łukasz Magiera
3118bd1039 stores: Fix reserved disk usage log spam 2021-08-31 13:36:09 +02:00
Steven Allen
1cf556c3a2 feat: expose ChainGetPath on the gateway 2021-08-30 16:43:21 -07:00
Łukasz Magiera
2293ecd8e8 Reduce lotus-miner startup spam 2021-08-27 19:41:54 +02:00
Aarsh Shah
d7076778e2 integrate DAG store and CARv2 in deal-making (#6671)
This commit removes badger from the deal-making processes, and
moves to a new architecture with the dagstore as the central
component on the miner-side, and CARv2s on the client-side.

Every deal that has been handed off to the sealing subsystem becomes
a shard in the dagstore. Shards are mounted via the LotusMount, which
teaches the dagstore how to load the related piece when serving
retrievals.

When the miner starts Lotus for the first time with this patch,
we will perform a one-time migration of all active deals into the
dagstore. This is a lightweight process, and it consists simply
of registering the shards in the dagstore.

Shards are backed by the unsealed copy of the piece. This is currently
a CARv1. However, the dagstore keeps CARv2 indices for all pieces, so
when it's time to acquire a shard to serve a retrieval, the unsealed
CARv1 is joined with its index (safeguarded by the dagstore), to form
a read-only blockstore, thus taking the place of the monolithic
badger.

Data transfers have been adjusted to interface directly with CARv2 files.
On inbound transfers (client retrievals, miner storage deals), we stream
the received data into a CARv2 ReadWrite blockstore. On outbound transfers
(client storage deals, miner retrievals), we serve the data off a CARv2
ReadOnly blockstore.

Client-side imports are managed by the refactored *imports.Manager
component (when not using IPFS integration). Just like before, we use
the go-filestore library to avoid duplicating the data from the original
file in the resulting UnixFS DAG (concretely the leaves). However, the
targets of those imports are what we call "ref-CARv2s": CARv2 files placed
under the `$LOTUS_PATH/imports` directory, containing the intermediate
nodes in full, and the leaves as positional references to the original file
on disk.

Client-side retrievals are placed into CARv2 files in the location:
`$LOTUS_PATH/retrievals`.

A new set of `Dagstore*` JSON-RPC operations and `lotus-miner dagstore`
subcommands have been introduced on the miner-side to inspect and manage
the dagstore.

Despite moving to a CARv2-backed system, the IPFS integration has been
respected, and it continues to be possible to make storage deals with data
held in an IPFS node, and to perform retrievals directly into an IPFS node.

NOTE: because the "staging" and "client" Badger blockstores are no longer
used, existing imports on the client will be rendered useless. On startup,
Lotus will enumerate all imports and print WARN statements on the log for
each import that needs to be reimported. These log lines contain these
messages:

- import lacks carv2 path; import will not work; please reimport
- import has missing/broken carv2; please reimport

At the end, we will print a "sanity check completed" message indicating
the count of imports found, and how many were deemed broken.

Co-authored-by: Aarsh Shah <aarshkshah1992@gmail.com>
Co-authored-by: Dirk McCormick <dirkmdev@gmail.com>
Co-authored-by: Raúl Kripalani <raul@protocol.ai>
2021-08-16 23:34:32 +01:00
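The startup check for client imports described in the commit above could be sketched roughly as follows. The `importEntry` type and both functions are hypothetical; only the two warning messages come from the commit message. As a simplification, this sketch treats a CARv2 file as broken only when it cannot be found on disk, whereas a real check would presumably also validate the file's contents.

```go
package main

import (
	"fmt"
	"os"
)

// importEntry is a hypothetical record of one client import.
type importEntry struct {
	ID        uint64
	CARv2Path string // empty when the import predates CARv2 support
}

// checkImport returns a warning message for a broken import,
// or an empty string when the import looks usable.
func checkImport(e importEntry) string {
	if e.CARv2Path == "" {
		return "import lacks carv2 path; import will not work; please reimport"
	}
	// Simplification: only existence is checked here.
	if _, err := os.Stat(e.CARv2Path); err != nil {
		return "import has missing/broken carv2; please reimport"
	}
	return ""
}

// sanityCheck logs a warning per broken import and returns the counts.
func sanityCheck(entries []importEntry) (total, broken int) {
	for _, e := range entries {
		total++
		if msg := checkImport(e); msg != "" {
			fmt.Printf("WARN import %d: %s\n", e.ID, msg)
			broken++
		}
	}
	return total, broken
}

func main() {
	entries := []importEntry{
		{ID: 1},
		{ID: 2, CARv2Path: "/nonexistent/hypothetical.car"},
	}
	total, broken := sanityCheck(entries)
	fmt.Printf("sanity check completed: %d imports found, %d broken\n", total, broken)
}
```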
Steven Allen
18f39be3ba fix: don't check for t_aux when proving
We don't need it.
2021-08-09 11:07:35 -07:00
Anton Evangelatov
16784aa2cc remove pieceProvider from DI; small refactors 2021-07-12 11:30:26 +02:00