Syntaxt of selection is located at
https://pkg.go.dev/github.com/ipld/go-ipld-selector-text-lite#SelectorSpecFromPath
Example use, assuming that:
- The root of the deal is a plain dag-pb unixfs directory
- The directory is not sharded
- The user wants to retrieve the first entry in that directory
lotus client retrieve --miner f0XXXXX --datamodel-path-selector 'Links/0/Hash' bafyROOTCID ~/output
For a much more elaborate example see the top of ./itests/deals_partial_retrieval_test.go
This commit removes badger from the deal-making processes, and
moves to a new architecture with the dagstore as the cental
component on the miner-side, and CARv2s on the client-side.
Every deal that has been handed off to the sealing subsystem becomes
a shard in the dagstore. Shards are mounted via the LotusMount, which
teaches the dagstore how to load the related piece when serving
retrievals.
When the miner starts the Lotus for the first time with this patch,
we will perform a one-time migration of all active deals into the
dagstore. This is a lightweight process, and it consists simply
of registering the shards in the dagstore.
Shards are backed by the unsealed copy of the piece. This is currently
a CARv1. However, the dagstore keeps CARv2 indices for all pieces, so
when it's time to acquire a shard to serve a retrieval, the unsealed
CARv1 is joined with its index (safeguarded by the dagstore), to form
a read-only blockstore, thus taking the place of the monolithic
badger.
Data transfers have been adjusted to interface directly with CARv2 files.
On inbound transfers (client retrievals, miner storage deals), we stream
the received data into a CARv2 ReadWrite blockstore. On outbound transfers
(client storage deals, miner retrievals), we serve the data off a CARv2
ReadOnly blockstore.
Client-side imports are managed by the refactored *imports.Manager
component (when not using IPFS integration). Just like it before, we use
the go-filestore library to avoid duplicating the data from the original
file in the resulting UnixFS DAG (concretely the leaves). However, the
target of those imports are what we call "ref-CARv2s": CARv2 files placed
under the `$LOTUS_PATH/imports` directory, containing the intermediate
nodes in full, and the leaves as positional references to the original file
on disk.
Client-side retrievals are placed into CARv2 files in the location:
`$LOTUS_PATH/retrievals`.
A new set of `Dagstore*` JSON-RPC operations and `lotus-miner dagstore`
subcommands have been introduced on the miner-side to inspect and manage
the dagstore.
Despite moving to a CARv2-backed system, the IPFS integration has been
respected, and it continues to be possible to make storage deals with data
held in an IPFS node, and to perform retrievals directly into an IPFS node.
NOTE: because the "staging" and "client" Badger blockstores are no longer
used, existing imports on the client will be rendered useless. On startup,
Lotus will enumerate all imports and print WARN statements on the log for
each import that needs to be reimported. These log lines contain these
messages:
- import lacks carv2 path; import will not work; please reimport
- import has missing/broken carv2; please reimport
At the end, we will print a "sanity check completed" message indicating
the count of imports found, and how many were deemed broken.
Co-authored-by: Aarsh Shah <aarshkshah1992@gmail.com>
Co-authored-by: Dirk McCormick <dirkmdev@gmail.com>
Co-authored-by: Raúl Kripalani <raul@protocol.ai>
Co-authored-by: Dirk McCormick <dirkmdev@gmail.com>
This is aproposal for an additional flag --manual-stateless-deal and a
corresponding API endpoint ClientStatelessDeal. This allows firing off
an offline-style deal against a miner without keeping further track of
it locally.
Not keeping any local state introduces the limitation of requiring free
storage deals, as there is nothing to tie the payment channel setup to.
Rationale/need for this type of flow is the case of incredibly large
sets of data nd deals, where the client and providers have prearranged
payment ahead of time, and the client has a separate-from-lotus database
of deal inventory. This way the client can use their lotus node merely
as a network gateway, without running into any limitations currently
present in both lotus as a whole and go-fil-markets in particular.
Specific context for this work is filecoin-discover, where the requirement
is to onboard ~ 12,000,000 individual deals against a pool of miners
with whom the client has prearranged a relationship.
We'd read the deal ID without synchronizing. This could (and probably did given
the history of flaky deal tests) cause us to miss events.
This patch also makes sure to always unsubscribe from events, even on error.
We were ignoring quite a few error cases, and had one case where we weren't
actually updating state where we wanted to. Unfortunately, if the linter doesn't
pass, nobody has any reason to actually check lint failures in CI.
There are three remaining XXXs marked in the code for lint.
At present, and at least for the medium term (even with the transition to NSE)
the structure of a piece (and thus commP) will remain identical for every size
of sector.
The offline deal flow would benefit greatly if the `lotus client commP`
interface is able to calculate commP without having access to a fully synced
sync, or without even being online.
This is particularly important for filecoin-discover, as we want to allow
miners to spot-check their purchased HDDs, way before they need to accept
the mainnet deal proposals.
See comment/links in node/impl/client/client.go for details on code flow
Revert multistore functionality for retrieval client -- the need to do this is to support powergate.
we actually need an option for configuring how this command works -- so that it can use the
blockstore OR the multistore