This completely removes publishing to Amazon ECR. ECR is a private
docker repository (like DockerHub), but since it's private it can only
be used internally by PL teams to launch lotus nodes on AWS
infrastructure. No one currently seems to be using it. All the usual
suspects (Boost, Lotus, Infra) have been asked specifically and said
they don't, and a post has been made in the #engres channel to try and
catch anyone else. No one responded saying we should save it.
Goreleaser checks to make sure we don't have a dirty git state when
releasing, which means the kubo download we use to set up IPFS should be
removed before release.
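For reference, a rough sketch of the kind of check involved. It assumes goreleaser treats any output from `git status --porcelain` (including untracked files such as a leftover kubo download) as a dirty state. The function name `git_tree_state` is hypothetical and is not part of goreleaser or this repo.

```shell
# Hypothetical helper: report whether a git working tree has pending changes.
# Assumption: any line printed by `git status --porcelain` counts as dirty,
# which includes untracked files left behind by a download step.
git_tree_state() {
  repo_dir="$1"
  if [ -n "$(git -C "$repo_dir" status --porcelain)" ]; then
    echo dirty
  else
    echo clean
  fi
}
```

Running `git_tree_state .` right before the release step should print `clean` once the download has been removed.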
This is a major refactor of our dockerfile to support the following:
- The lotus image will remain as is.
- The lotus-test image will be deprecated.
- The lotus-all-in-one image will also ship with the lotus-seed and lotus-fountain binaries, which it currently does not.
- The lotus-all-in-one image will be built in debug, calibnet, and butterflynet modes in addition to the (current) mainnet mode.
- The lotus-all-in-one image will now be published regularly using the following tags:
  - 1.18.0-rc1, 1.18.0-rc1-debug, 1.18.0-rc1-calibnet, 1.18.0-rc1-butterflynet. This pattern will be used for all lotus releases, including RC releases.
  - nightly, nightly-debug, nightly-calibnet, nightly-butterflynet
  - stable, stable-debug, stable-calibnet, stable-butterflynet
- Removes cargo caching (since we don't build FFI from source, this
isn't used)
- Removes npm (this isn't a build dependency, so not sure why it was
being installed)
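The tag pattern for the lotus-all-in-one image can be sketched as a small shell function. `lotus_image_tags` is a hypothetical name used only for illustration; the actual publishing logic lives in the CI config and may differ.

```shell
# Hypothetical helper: print the image tags for one release name.
# The first tag is the plain (mainnet) build; the suffixes mirror the
# debug, calibnet, and butterflynet variants listed in this commit.
lotus_image_tags() {
  base="$1"
  printf '%s\n' "$base"
  for variant in debug calibnet butterflynet; do
    printf '%s-%s\n' "$base" "$variant"
  done
}

lotus_image_tags 1.18.0-rc1
```

The same call with `nightly` or `stable` as the argument yields the other two tag families.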
This builds three separate binaries (darwin/amd64, darwin/arm64,
linux/amd64), and then combines them into a single release (including a
universal darwin binary) using goreleaser.
Also removes build-ntwk-{calibration,butterfly}
This runs the build more often so we can continue to debug any remaining
issues, and ensures that we release a new image on the 15th of this
month (since it was broken on the 1st)
The builds were erroring only in CircleCI; when run manually, the same
command worked fine. I reached out to CircleCI support and got the
following message:
>>>
The reason you are seeing this error when running in CircleCI and not while debugging with SSH is due to the -e set in #!/bin/bash -eo pipefail at the beginning of the shell while the debugging shell would just be #!/bin/bash. The -e sets to exit to the shell when any non zero [0] exit code status.
Since you say the command works when debugging with SSH you can set the shell to use /bin/bash -o pipefail using a default shell options. Here is an example:
- run:
    name: <<command name>>
    shell: /bin/bash -o pipefail
    command: |
      << some commands>>
Notice that I still added -o pipefail as that prevents errors in a pipeline from being masked.
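The difference support describes can be reproduced locally with a short sketch. The sample script (`false` followed by an `echo`) is made up for illustration and stands in for whatever command was failing in CI.

```shell
# A two-step script whose first command fails.
script='false; echo "reached the end"'

# Without -e (like the SSH debugging shell): the failure is ignored
# and the script keeps going.
out_lenient=$(bash -o pipefail -c "$script")
status_lenient=$?

# With -e (like CircleCI's default shell): bash exits at the first
# failing command, so the echo never runs.
status_strict=0
out_strict=$(bash -eo pipefail -c "$script") || status_strict=$?

echo "without -e: status=$status_lenient output='$out_lenient'"
echo "with -e:    status=$status_strict output='$out_strict'"
```

Dropping `-e` makes the job behave like the debugging shell, at the cost of no longer stopping on the first failed command.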
This is a small refactor of our workflow to test out goreleaser, a
YAML-based tool for building, packaging, and releasing go binaries on
multiple platforms. It supports building binaries for most of the platforms we
care about, including linux and macos, and also supports publishing
those binaries automatically as releases in Github, homebrew, snap, and
even apt / deb.
If this trial goes well, I think we should eventually replace the entire
release workflow with goreleaser. For now, this test is more tightly
scoped to automating only the MacOS release process, since that is the
one we have the most issues with. This PR / commit:
- Builds darwin-amd64 and darwin-arm64 binaries of lotus, lotus-miner,
and lotus-worker
- Packages them into a universal darwin binary
- Publishes those to a release in Github based on the current tag
- Uses the binaries in the release to auto-publish an updated homebrew
configuration to filecoin-project/homebrew-lotus
- Does a `dry-run` build to produce a snapshot on release branches with
no tag
- Manually generates and uploads checksums after goreleaser
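The manual checksum step could look roughly like the sketch below. The function name `write_checksums`, the directory layout, and the choice of SHA-256 are assumptions for illustration; the upload itself is left out because it depends on the release tooling in use.

```shell
# Hypothetical helper: write SHA-256 checksums for every file in a directory.
# Usage: write_checksums <artifact-dir> <output-file>
# The output file should live outside <artifact-dir> so it is not hashed too.
write_checksums() {
  dir="$1"
  out="$2"
  (
    cd "$dir" || exit 1
    # sha256sum is the usual tool on linux; shasum ships with macOS.
    if command -v sha256sum >/dev/null 2>&1; then
      sha256sum *
    else
      shasum -a 256 *
    fi
  ) > "$out"
}
```

Each line of the output file pairs a hash with an artifact name, so it can be verified later with `sha256sum -c` or `shasum -a 256 -c` from inside the artifact directory.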
jq is now already installed, either by a newer version of CircleCI's MacOS VMs or by a previous CI step.
I ran a failing macos job with ssh enabled, inspected '/usr/local/bin', and found the following output:
lrwxr-xr-x 1 distiller admin 23 Jun 22 14:50 jq -> ../Cellar/jq/1.6/bin/jq
The existing symlink causes the 'Install jq' job to fail.
Removing this job should resolve the issue.
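If an install step ever needs to come back, a guard along these lines would skip it when the tool is already on the PATH. This is an alternative sketch, not what this commit does, and `needs_install` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when a command is missing from PATH.
needs_install() {
  ! command -v "$1" >/dev/null 2>&1
}

if needs_install jq; then
  echo "jq not found; an install step would be needed"
else
  echo "jq already present at $(command -v jq); skipping install"
fi
```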