Do not upgrade to the latest storetheindex, because it uses go-ipfs for its
bootstrapping mechanism, which comes with dependencies that we do not
wish to upgrade in lotus just now; namely:
- go-bitswap
- go-path
- interface-go-ipfs-core
- go-libp2p-kad-dht
- go-libp2p-quic-transport
This is the very first iteration of the lotus PR template. The goal of adding a PR template is to standardize pull requests and encourage contributors to:
- write a good PR description that gives code reviewers a clear overview of what's in the PR
- use a clear PR title, as lotus generates a changelog based on it
- check that tests and documentation for the changed code are included
The PR type follows https://www.conventionalcommits.org/en/v1.0.0-beta.2/. The [contribution guideline](https://github.com/filecoin-project/lotus#contribute) should be updated with how to create a PR after the template is accepted.
In an environment with heterogeneous worker nodes, a universal resource
table for all workers does not allow effective scheduling of tasks. Some
workers may have different proof cache settings, changing the required
memory for different tasks. Some workers may have a different count of
CPUs per core-complex, changing the max parallelism of PC1.
This change allows workers to customize these parameters with
environment variables. For example, a worker could set the
PC1_MIN_MEMORY environment variable to customize the minimum memory
requirement for PC1 tasks. If no environment variables are specified,
the resource table on the miner is used, except for PC1 parallelism.
If PC1_MAX_PARALLELISM is not specified and
FIL_PROOFS_USE_MULTICORE_SDR is set, PC1_MAX_PARALLELISM will
automatically be set to FIL_PROOFS_MULTICORE_SDR_PRODUCERS + 1.
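A minimal sketch of how such an override could look, assuming a hypothetical Resources struct and helper names; this is illustrative, not the actual lotus implementation:

```go
package resources

import (
	"os"
	"strconv"
)

// Resources describes what a task type needs; the fields here are a
// simplified, illustrative subset.
type Resources struct {
	MinMemory      uint64 // bytes
	MaxParallelism int
}

// PC1Resources starts from the miner's resource-table defaults and applies
// per-worker environment overrides for PC1.
func PC1Resources(def Resources) Resources {
	res := def

	if v, ok := os.LookupEnv("PC1_MIN_MEMORY"); ok {
		if n, err := strconv.ParseUint(v, 10, 64); err == nil {
			res.MinMemory = n
		}
	}

	if v, ok := os.LookupEnv("PC1_MAX_PARALLELISM"); ok {
		if n, err := strconv.Atoi(v); err == nil {
			res.MaxParallelism = n
		}
	} else if os.Getenv("FIL_PROOFS_USE_MULTICORE_SDR") != "" {
		// One thread per SDR producer, plus one for the main PC1 thread.
		// If the producer count is not set explicitly, the proofs library's
		// own default applies and is not duplicated here.
		if producers, err := strconv.Atoi(os.Getenv("FIL_PROOFS_MULTICORE_SDR_PRODUCERS")); err == nil {
			res.MaxParallelism = producers + 1
		}
	}
	return res
}
```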
Before this change, workers could only be allocated one GPU task,
regardless of how much of the GPU's resources that task uses, or how
many GPUs are in the system.
This makes GPUUtilization a float, which can represent that a task needs
a portion of a GPU, or multiple GPUs. GPUs are accounted for like RAM
and CPUs, so that workers with more GPUs can be allocated more tasks.
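A minimal sketch of the accounting this enables, with illustrative names (not the actual lotus scheduler code):

```go
package resources

// CanAssignGPU reports whether a worker with workerGPUs devices, of which
// gpuInUse is already committed to assigned tasks, can take another task
// that needs taskGPUUtilization GPUs. A task may need a fraction of a GPU
// (e.g. 0.5) or several GPUs (e.g. 2.0).
func CanAssignGPU(workerGPUs, gpuInUse, taskGPUUtilization float64) bool {
	return gpuInUse+taskGPUUtilization <= workerGPUs
}
```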
A known issue is that PC2 cannot use multiple GPUs: even if the worker
has multiple GPUs and is allocated multiple PC2 tasks, those tasks will
only run on the first GPU.
This could result in unexpected behavior when a worker with multiple
GPUs is assigned multiple PC2 tasks. But it should not surprise existing
users who upgrade, as anyone already running workers with multiple GPUs
should know this and be running one worker per GPU for PC2. Those users
now have the freedom to set the GPU utilization of PC2 to less than one
and effectively run multiple PC2 processes in a single worker.
C2 is capable of utilizing multiple GPUs, and now workers can be
customized for C2 accordingly.
Attempting to report "memory used by other processes" in the MemReserved
field fails to account for the fact that the system's used memory
already includes memory used by ongoing tasks.
To properly account for this, the worker should report the memory and
swap in use; the scheduler, which knows the memory requirements of each
task, can then determine whether there is sufficient memory available
for a task.
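A minimal sketch of that scheduler-side check, assuming the worker reports the fields below; type and function names are illustrative:

```go
package resources

// WorkerMemory is what a worker reports about its memory state.
type WorkerMemory struct {
	MemPhysical uint64 // total physical memory, bytes
	MemSwap     uint64 // total swap, bytes
	MemUsed     uint64 // physical memory in use, including running tasks
	SwapUsed    uint64 // swap in use
}

// HasMemoryFor reports whether the worker can take a task needing
// taskMinMemory bytes. reserved is the memory the scheduler has already
// committed to tasks on this worker, which matters because a freshly
// assigned task may not have allocated all of its memory yet.
func HasMemoryFor(w WorkerMemory, reserved, taskMinMemory uint64) bool {
	// Count the larger of what is actually in use and what has been
	// promised, so neither running nor pending tasks are double-booked.
	committed := w.MemUsed + w.SwapUsed
	if reserved > committed {
		committed = reserved
	}
	total := w.MemPhysical + w.MemSwap
	return committed+taskMinMemory <= total
}
```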
Worker processes may have memory limits imposed by systemd, but
/proc/meminfo shows the entire system memory regardless of these limits.
This results in the scheduler believing the worker has the entire system
memory available, and the worker being allocated too many tasks.
This change attempts to read cgroup memory limits for the worker
process. It supports cgroups v1 and v2, compares the cgroup limits
against the system memory, and returns the most conservative values, to
prevent the worker from being allocated too many tasks and potentially
triggering an OOM event.
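A minimal sketch of reading the cgroup v1/v2 limits and taking the most conservative value; the paths assume the worker sits at the root of its cgroup hierarchy, which is a simplification of what the actual code has to handle:

```go
package resources

import (
	"os"
	"strconv"
	"strings"
)

// readCgroupLimit parses a cgroup memory limit file. "max" (cgroups v2) or a
// missing file means no limit, reported here as 0.
func readCgroupLimit(path string) uint64 {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0
	}
	s := strings.TrimSpace(string(b))
	if s == "max" {
		return 0
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// EffectiveMemoryLimit returns the smaller of the system memory and any
// cgroup limit applied to the worker process.
func EffectiveMemoryLimit(systemMemory uint64) uint64 {
	limit := systemMemory
	for _, p := range []string{
		"/sys/fs/cgroup/memory.max",                   // cgroups v2
		"/sys/fs/cgroup/memory/memory.limit_in_bytes", // cgroups v1
	} {
		if l := readCgroupLimit(p); l > 0 && l < limit {
			limit = l
		}
	}
	return limit
}
```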