lotus/documentation/misc/Building_a_network_skeleton.md
Phi-rjan ec6d3e1483
chore: docs: nv-skeleton documentation (#11065)
2024-04-25 15:46:13 -04:00


Network Upgrade Skeleton in Lotus

This guide will walk you through the process of creating a skeleton for a network upgrade in Lotus. The process involves making changes in multiple repositories in the following order:

  1. ref-fvm
  2. filecoin-ffi
  3. go-state-types
  4. lotus

Each repository has its own set of steps that need to be followed. This guide will provide detailed instructions for each repository.

Setup

  1. Clone the ref-fvm repository.

  2. Clone the filecoin-ffi repository.

  3. Clone the go-state-types repository.

  4. In your Lotus repository, add replace github.com/filecoin-project/go-state-types => ../go-state-types to the very end of your Lotus go.mod file.

    • This ensures that your local clone of go-state-types is used. Any changes you make there will be reflected in your Lotus project.

Ref-FVM Checklist

  1. Add support for the new network version in Ref-FVM:

    • In fvm/Cargo.toml add nvXX-dev as a feature flag in the [features]-section.
    • In fvm/src/gas/price_list.rs, extend the price_list_by_network_version function to support the new network version with the nvXX-dev feature flag.
    • In fvm/src/machine/default.rs, locate the new function within your machine context. You'll find a SUPPORTED_VERSIONS constant that sets the range of supported network versions. Update this range to include the new network version. Do this by replacing the existing feature flag nvXX-dev and NetworkVersion::VXX with the new ones corresponding to your new network version.
    • In shared/src/version/mod.rs, in the NetworkVersion implementation, you will find a series of constants representing different network versions. To add a new network version, you need to declare a new constant: pub const (VXX+1): Self = Self(XX+1);

You can take a look at this Ref-FVM PR as a reference, which added the skeleton for network version 22.

Filecoin-FFI Checklist

  1. Update the TryFrom<u32> implementation for EngineVersion in rust/src/fvm/engine.rs

    • Add the new network version number (XX+1) to the existing match arm for the network version.
  2. Patch the FVM-dependency (fvm3) in rust/Cargo.toml to use the custom branch of the FVM created in the Ref-FVM Checklist.

    • Add features = ["your-ref-fvm-branch"] to tell Cargo to use your Ref-FVM branch.

You can take a look at this Filecoin-FFI PR as a reference, which added the skeleton for network version 22.

Go-State-Types Checklist

  1. Follow the go-state-types actor version checklist:

    • Copy go-state-types/builtin/vX to go-state-types/builtin/v(X+1).
    • Change all references from vX to v(X+1) in the new files.
    • Add new network version to network/version.go.
    • Add new actors version to actors/version.go.
      • Add Version(XX+1) Version = XX+1 as a constant.
      • In func VersionForNetwork add case network.Version(XX+1): return Version(XX+1), nil.
    • Add the new version to the gen step of the makefile.
      • Add $(GO_BIN) run ./builtin/v(XX+1)/gen/gen.go.

You can take a look at this Go-State-Types PR as a reference, which added the skeleton for network version 22.
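
The version mapping step above can be sketched in isolation. This is a minimal, self-contained illustration and not the go-state-types source: the types NetworkVersion and ActorsVersion are local stand-ins, and the numbers 21/22 and 12/13 are chosen only as an example of an XX to XX+1 step.

```go
package main

import "fmt"

// NetworkVersion and ActorsVersion are local stand-ins (assumptions) for the
// network version and actors version types defined in go-state-types.
type NetworkVersion uint
type ActorsVersion int

const (
	NetworkVersion21 NetworkVersion = 21 // illustrative "current" version (XX)
	NetworkVersion22 NetworkVersion = 22 // illustrative new version (XX+1)
)

const (
	ActorsVersion12 ActorsVersion = 12 // illustrative current actors version
	ActorsVersion13 ActorsVersion = 13 // illustrative new actors version
)

// VersionForNetwork maps a network version to the actors version it runs.
// Adding a skeleton means adding one more case for the new network version.
func VersionForNetwork(nv NetworkVersion) (ActorsVersion, error) {
	switch nv {
	case NetworkVersion21:
		return ActorsVersion12, nil
	case NetworkVersion22: // the case added for the new upgrade
		return ActorsVersion13, nil
	default:
		return -1, fmt.Errorf("unsupported network version %d", nv)
	}
}

func main() {
	av, err := VersionForNetwork(NetworkVersion22)
	fmt.Println(av, err)
}
```

If the new case is forgotten, the lookup falls through to the default branch and returns an error, which is a quick way to notice a missing step.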

Lotus Checklist

  1. Import new actors:

    • Create a mock actor-bundle for the new network version.
    • In /build/actors run ./pack.sh vXX+1 vXX.0.0 where XX is the current actor bundle version.
  2. Define upgrade heights in build/params_:

    • Update the following files:
      • params_2k.go
        • Set previous UpgradeXxxxxHeight = abi.ChainEpoch(-xx-1)
        • Add var UpgradeXxxxxHeight = abi.ChainEpoch(200)
        • Add UpgradeXxxxxHeight = getUpgradeHeight("LOTUS_XXXXX_HEIGHT", UpgradeXxxxxHeight)
        • Set const GenesisNetworkVersion = network.VersionXX where XX is the network version you are upgrading from.
      • params_butterfly.go
        • Set previous upgrade to var UpgradeXxxxxHeight = abi.ChainEpoch(-xx-1)
        • Add comment with ?????? signaling that the new upgrade date is unknown
        • Add const UpgradeXxxxxHeight = 999999999999999
      • params_calibnet.go
        • Add comment with ?????? signaling that the new upgrade date is unknown
        • Add const UpgradeXxxxxHeight = 999999999999999
      • params_interop.go
        • Set previous upgrade to var UpgradeXxxxxHeight = abi.ChainEpoch(-xx-1)
        • Add const UpgradeXxxxxHeight = 50
      • params_mainnet.go
        • Set previous upgrade to const UpgradeXxxxxHeight = XX
        • Add comment with ???? signaling that the new upgrade date is unknown
        • Add var UpgradeXxxxxHeight = abi.ChainEpoch(9999999999)
        • Change the LOTUS_DISABLE_XXXX env variable to the new network name
      • params_testground.go
        • Add UpgradeXxxxxHeight abi.ChainEpoch = (-xx-1)
  3. Generate adapters:

    • Update gen/inlinegen-data.json.

      • Add XX+1 to "actorVersions" and set "latestActorsVersion" to XX+1.
      • Add XX+1 to "networkVersions" and set "latestNetworkVersion" to XX+1.
    • Run make actors-gen. This generates the /chain/actors/builtin/* code, /chain/actors/policy/policy.go code, /chain/actors/version.go, and /itest/kit/ensemble_opts_nv.go.

  4. Update chain/consensus/filcns/upgrades.go.

    • Import nv(XX+1) "github.com/filecoin-project/go-state-types/builtin/v(XX+1)/migration".
    • Add Schedule (see footnote 1).
    • Add Migration (see footnote 2).
  5. Add actorstype to the NewActorRegistry in /chain/consensus/computestate.go.

    • Add inv.Register(actorstypes.Version(XX+1), vm.ActorsVersionPredicate(actorstypes.Version(XX+1)), builtin.MakeRegistry(actorstypes.Version(XX+1))).
  6. Add upgrade field to api/types.go/ForkUpgradeParams.

    • Add UpgradeXxxxxHeight abi.ChainEpoch to ForkUpgradeParams struct.
  7. Add upgrade to node/impl/full/state.go.

    • Add UpgradeXxxxxHeight: build.UpgradeXxxxxHeight,.
  8. Add network version to chain/state/statetree.go.

    • Add network.VersionXX+1 to VersionForNetwork function.
  9. Run make gen.

  10. Run make docsgen-cli.
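
Step 2 of the checklist wraps the 2k upgrade height in getUpgradeHeight so that it can be overridden through an environment variable. The sketch below shows the general idea only. It is not the Lotus implementation: ChainEpoch is a local stand-in for abi.ChainEpoch, upgradeHeight is a hypothetical helper, and the lookup function is injected so the behaviour can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ChainEpoch is a local stand-in (assumption) for abi.ChainEpoch.
type ChainEpoch int64

// upgradeHeight is a hypothetical helper: it returns def unless lookup reports
// a value for envVar, in which case that value is parsed as the height.
func upgradeHeight(lookup func(string) (string, bool), envVar string, def ChainEpoch) (ChainEpoch, error) {
	raw, ok := lookup(envVar)
	if !ok {
		return def, nil
	}
	h, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return def, fmt.Errorf("parsing %s=%q: %w", envVar, raw, err)
	}
	return ChainEpoch(h), nil
}

func main() {
	// LOTUS_XXXXX_HEIGHT is the placeholder name used in the checklist.
	h, err := upgradeHeight(os.LookupEnv, "LOTUS_XXXXX_HEIGHT", 200)
	fmt.Println(h, err)
}
```

With the variable unset, the default of 200 from params_2k.go applies. Setting it before starting a local network moves the upgrade epoch without a rebuild, which is convenient when testing the migration repeatedly.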

And you're done! This should create a network upgrade skeleton that you are able to run locally with your local go-state-types clone and a mock actors bundle. This will allow you to:

  • Have a local developer network that starts at the current network version.
  • Be able to see the Actor CIDs/Actor version for the mock actor bundle through lotus state actor-cids --network-version XX+1
  • Have a successful pre-migration.
  • Complete Migration at upgrade epoch, but fail immediately after the upgrade.

You can take a look at this Lotus PR as a reference, which added the skeleton for network version 22.
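
To reason about when the pre-migration should run on a local network, it can help to turn the three timing fields from the schedule example (StartWithin, DontStartWithin, StopWithin) into concrete epochs. The sketch below assumes each field is an offset in epochs before the upgrade height, which is how the field names read. It uses local stand-in types, does not model boundary inclusivity, and is not the stmgr implementation.

```go
package main

import "fmt"

// ChainEpoch is a local stand-in (assumption) for abi.ChainEpoch.
type ChainEpoch int64

// PreMigrationWindow mirrors the three timing fields of the schedule example.
type PreMigrationWindow struct {
	StartWithin     ChainEpoch // start at most this many epochs before the upgrade
	DontStartWithin ChainEpoch // do not start once this close to the upgrade
	StopWithin      ChainEpoch // stop once this close to the upgrade
}

// Bounds converts the offsets into absolute epochs for a given upgrade height:
// the earliest start, the latest allowed start, and the stop epoch.
func (w PreMigrationWindow) Bounds(upgrade ChainEpoch) (earliest, latestStart, stop ChainEpoch) {
	return upgrade - w.StartWithin, upgrade - w.DontStartWithin, upgrade - w.StopWithin
}

func main() {
	w := PreMigrationWindow{StartWithin: 120, DontStartWithin: 15, StopWithin: 10}
	// 200 is the 2k upgrade height used in the checklist.
	earliest, latestStart, stop := w.Bounds(200)
	fmt.Println(earliest, latestStart, stop)
}
```

Under these assumptions, an upgrade at epoch 200 gives a pre-migration that may start from epoch 80, should not be started after epoch 185, and is stopped at epoch 190.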

// TODO: Create a video-tutorial going through all the steps


  1. Here is an example of how you can add a schedule:

    {
        Height:    build.UpgradeXxxxxHeight,
        Network:   network.Version(XX+1),
        Migration: UpgradeActorsV(XX+1),
        PreMigrations: []stmgr.PreMigration{{
            PreMigration:    PreUpgradeActorsV(XX+1),
            StartWithin:     120,
            DontStartWithin: 15,
            StopWithin:      10,
        }},
        Expensive: true,
    },
    

    This schedule should be added to the DefaultUpgradeSchedule function, specifically within the updates array.

  2. Here is an example of how you can add a migration:

    func PreUpgradeActorsV(XX+1)(ctx context.Context, sm *stmgr.StateManager, cache stmgr.MigrationCache, root cid.Cid, epoch abi.ChainEpoch, ts *types.TipSet) error {
        // Use half the CPUs for pre-migration, but leave at least 3.
        workerCount := MigrationMaxWorkerCount
        if workerCount <= 4 {
            workerCount = 1
        } else {
            workerCount /= 2
        }
    
        lbts, lbRoot, err := stmgr.GetLookbackTipSetForRound(ctx, sm, ts, epoch)
        if err != nil {
            return xerrors.Errorf("error getting lookback ts for premigration: %w", err)
        }
    
        config := migration.Config{
            MaxWorkers:        uint(workerCount),
            ProgressLogPeriod: time.Minute * 5,
        }
    
        _, err = upgradeActorsV(XX+1)Common(ctx, sm, cache, lbRoot, epoch, lbts, config)
        return err
    }
    
    func UpgradeActorsV(XX+1)(ctx context.Context, sm *stmgr.StateManager, cache stmgr.MigrationCache, cb stmgr.ExecMonitor,
        root cid.Cid, epoch abi.ChainEpoch, ts *types.TipSet) (cid.Cid, error) {
        // Use all the CPUs except 3.
        workerCount := MigrationMaxWorkerCount - 3
        if workerCount <= 0 {
            workerCount = 1
        }
        config := migration.Config{
            MaxWorkers:        uint(workerCount),
            JobQueueSize:      1000,
            ResultQueueSize:   100,
            ProgressLogPeriod: 10 * time.Second,
        }
        newRoot, err := upgradeActorsV(XX+1)Common(ctx, sm, cache, root, epoch, ts, config)
        if err != nil {
            return cid.Undef, xerrors.Errorf("migrating actors v(XX+1) state: %w", err)
        }
        return newRoot, nil
    }
    
    func upgradeActorsV(XX+1)Common(
        ctx context.Context, sm *stmgr.StateManager, cache stmgr.MigrationCache,
        root cid.Cid, epoch abi.ChainEpoch, ts *types.TipSet,
        config migration.Config,
    ) (cid.Cid, error) {
        writeStore := blockstore.NewAutobatch(ctx, sm.ChainStore().StateBlockstore(), units.GiB/4)
        adtStore := store.ActorStore(ctx, writeStore)
        // ensure that the manifest is loaded in the blockstore
        if err := bundle.LoadBundles(ctx, writeStore, actorstypes.Version(XX+1)); err != nil {
            return cid.Undef, xerrors.Errorf("failed to load manifest bundle: %w", err)
        }
    
        // Load the state root.
        var stateRoot types.StateRoot
        if err := adtStore.Get(ctx, root, &stateRoot); err != nil {
            return cid.Undef, xerrors.Errorf("failed to decode state root: %w", err)
        }
    
        if stateRoot.Version != types.StateTreeVersion5 {
            return cid.Undef, xerrors.Errorf(
                "expected state root version 5 for actors v(XX+1) upgrade, got %d",
                stateRoot.Version,
            )
        }
    
        manifest, ok := actors.GetManifest(actorstypes.Version(XX+1))
        if !ok {
            return cid.Undef, xerrors.Errorf("no manifest CID for v(XX+1) upgrade")
        }
    
        // Perform the migration
        newHamtRoot, err := nv(XX+1).MigrateStateTree(ctx, adtStore, manifest, stateRoot.Actors, epoch, config,
            migrationLogger{}, cache)
        if err != nil {
            return cid.Undef, xerrors.Errorf("upgrading to actors v(XX+1): %w", err)
        }
    
        // Persist the result.
        newRoot, err := adtStore.Put(ctx, &types.StateRoot{
            Version: types.StateTreeVersion5,
            Actors:  newHamtRoot,
            Info:    stateRoot.Info,
        })
        if err != nil {
            return cid.Undef, xerrors.Errorf("failed to persist new state root: %w", err)
        }
    
        // Persists the new tree and shuts down the flush worker
        if err := writeStore.Flush(ctx); err != nil {
            return cid.Undef, xerrors.Errorf("writeStore flush failed: %w", err)
        }
    
        if err := writeStore.Shutdown(ctx); err != nil {
            return cid.Undef, xerrors.Errorf("writeStore shutdown failed: %w", err)
        }
    
        return newRoot, nil
    }
    
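
The two worker-count calculations in the migration example are easy to misread, so here they are pulled out as pure functions. This is an illustrative sketch: preMigrationWorkers and migrationWorkers are hypothetical names, and maxWorkers stands in for MigrationMaxWorkerCount.

```go
package main

import "fmt"

// preMigrationWorkers follows the pre-migration branch of the example:
// one worker when maxWorkers is 4 or fewer, otherwise half of maxWorkers.
func preMigrationWorkers(maxWorkers int) int {
	if maxWorkers <= 4 {
		return 1
	}
	return maxWorkers / 2
}

// migrationWorkers follows the migration branch of the example:
// maxWorkers minus 3, but never fewer than one worker.
func migrationWorkers(maxWorkers int) int {
	n := maxWorkers - 3
	if n <= 0 {
		return 1
	}
	return n
}

func main() {
	for _, m := range []int{2, 4, 8, 16} {
		fmt.Printf("max=%d pre=%d migration=%d\n", m, preMigrationWorkers(m), migrationWorkers(m))
	}
}
```

On small machines both functions fall back to a single worker, so a local 2k network still completes the skeleton migration, only more slowly.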