Commit Graph

1165 Commits

Author SHA1 Message Date
A. F. Dudley
d82b3fb881 Only load locally-built images into kind, auto-detect ingress
- Check stack.yml containers: field to determine which images are local builds
- Only load local images via kind load; let k8s pull registry images directly
- Add is_ingress_running() to skip ingress installation if already running
- Fixes deployment failures when public registry images aren't in local Docker

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
3bc7832d8c Fix deployment name extraction from path
When stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name),
extract just the final name component for K8s secret naming. K8s resource names must
be valid RFC 1123 subdomains and cannot contain slashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
a75138093b Add setup-repositories to key files list
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
1128c95969 Split documentation: README for users, CLAUDE.md for agents
README.md: deployment types, external stacks, commands, spec.yml reference
CLAUDE.md: implementation details, code locations, codebase navigation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
d292e7c48d Add k8s-kind architecture documentation to CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:25 -05:00
A. F. Dudley
b057969ddd Clarify create_cluster docstring: one cluster per host by design
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
ca090d2cd5 Add $generate:type:length$ token support for K8s secrets
- Add GENERATE_TOKEN_PATTERN to detect $generate:hex:N$ and $generate:base64:N$ tokens
- Add _generate_and_store_secrets() to create K8s Secrets from spec.yml config
- Modify _write_config_file() to separate secrets from regular config
- Add env_from with secretRef to container spec in cluster_info.py
- Secrets are injected directly into containers via K8s native mechanism

This enables declarative secret generation in spec.yml:
  config:
    SESSION_SECRET: $generate:hex:32$
    DB_PASSWORD: $generate:hex:16$

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
2d3721efa4 Add cluster reuse for multi-stack k8s-kind deployments
When deploying a second stack to k8s-kind, automatically reuse an existing
kind cluster instead of trying to create a new one (which would fail due
to port 80/443 conflicts).

Changes:
- helpers.py: create_cluster() now checks for existing cluster first
- deploy_k8s.py: up() captures returned cluster name and updates self

This enables deploying multiple stacks (e.g., gorbagana-rpc + trashscan-explorer)
to the same kind cluster.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
4408725b08 Fix repo root path calculation (4 parents from stack path) 2026-02-03 17:15:19 -05:00
A. F. Dudley
22d64f1e97 Add --spec-file option to restart and auto-detect GitOps spec
- Add --spec-file option to specify spec location in repo
- Auto-detect deployment/spec.yml in repo as GitOps location
- Fall back to deployment dir if no repo spec found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
14258500bc Fix restart command for GitOps deployments
- Remove init_operation() from restart - don't regenerate spec from
  commands.py defaults, use existing git-tracked spec.yml instead
- Add docs/deployment_patterns.md documenting GitOps workflow
- Add pre-commit rule to CLAUDE.md
- Fix line length issues in helpers.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
3fbd854b8c Use docker for etcd existence check (root-owned dir)
The etcd directory is root-owned, so shell test -f fails.
Use docker with volume mount to check file existence.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
e2d3c44321 Keep timestamped backup of etcd forever
Create member.backup-YYYYMMDD-HHMMSS before cleaning.
Each cluster recreation creates a new backup, preserving history.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
720e01fc75 Preserve original etcd backup until restore is verified
Move original to .bak, move new into place, then delete bak.
If anything fails before the swap, original remains intact.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
5b06cffe17 Use whitelist approach for etcd cleanup
Instead of trying to delete specific stale resources (blacklist),
keep only the valuable data (caddy TLS certs) and delete everything
else. This is more robust as we don't need to maintain a list of
all possible stale resources.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
8948f5bfec Fix etcd cleanup to use docker for root-owned files
Use docker containers with volume mounts to handle all file
operations on root-owned etcd directories, avoiding the need
for sudo on the host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
675ee87544 Clear stale CNI resources from persisted etcd before cluster creation
When etcd is persisted (for certificate backup) and a cluster is
recreated, kind tries to install CNI (kindnet) fresh but the
persisted etcd already has those resources, causing 'AlreadyExists'
errors and cluster creation failure.

This fix:
- Detects etcd mount path from kind config
- Before cluster creation, clears stale CNI resources (kindnet, coredns)
- Preserves certificate and other important data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
8d3191e4fd Fix Caddy ingress ACME email and RBAC issues
- Add acme_email_key constant for spec.yml parsing
- Add get_acme_email() method to Spec class
- Modify install_ingress_for_kind() to patch ConfigMap with email
- Pass acme-email from spec to ingress installation
- Add 'delete' verb to leases RBAC for certificate lock cleanup

The acme-email field in spec.yml was previously ignored, causing
Let's Encrypt to fail with "unable to parse email address".
The missing delete permission on leases caused lock cleanup failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
c197406cc7 feat(deploy): add deployment restart command
Add `laconic-so deployment restart` command that:
- Pulls latest code from stack git repository
- Regenerates spec.yml from stack's commands.py
- Verifies DNS if hostname changed (with --force to skip)
- Syncs deployment directory preserving cluster ID and data
- Stops and restarts deployment with --skip-cluster-management

Also stores stack-source path in deployment.yml during create
for automatic stack location on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
4713107546 docs(CLAUDE.md): add external stacks preferred guideline
Document that external stack pattern should be used when creating new
stacks for any reason, with directory structure and usage examples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
88dccdfb7c Merge pull request 'fix(deploy): merge volumes from stack init() instead of overwriting' (#985) from fix-init-volumes-merge into main
Reviewed-on: cerc-io/stack-orchestrator#985
2026-01-31 23:39:38 +00:00
A. F. Dudley
76c0c17c3b fix(deploy): merge volumes from stack init() instead of overwriting
Previously, volumes defined in a stack's commands.py init() function
were being overwritten by volumes discovered from compose files.
This prevented stacks from adding infrastructure volumes like caddy-data
that aren't defined in the compose files.

Now volumes are merged, with init() volumes taking precedence over
compose-discovered defaults.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 18:23:20 -05:00
6a2bbae250 Merge pull request 'Add --update flag to deploy create' (#984) from roysc/deployment-create-sync into main
Reviewed-on: cerc-io/stack-orchestrator#984
Reviewed-by: AFDudley <afdudley@noreply.git.vdb.to>
2026-01-31 22:46:40 +00:00
A. F. Dudley
458b548dcf fix(k8s): add hostPath support for compose host path mounts
Add support for Docker Compose host path mounts (like ../config/file:/path)
in k8s deployments. Previously these were silently skipped, causing k8s
deployments to fail when compose files used host path mounts.

Changes:
- Add helper functions for host path detection and name sanitization
- Generate kind extraMounts for host path mounts
- Create hostPath volumes in pod specs for host path mounts
- Create volumeMounts with sanitized names for host path mounts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 19:25:28 -05:00
789b2dd3a7 Add --update option to deploy create
To allow updating an existing deployment

- Check the deployment dir exists when updating
- Write to temp dir, then safely copy tree
- Don't overwrite data dir or config.env
2026-01-29 08:25:05 -06:00
55b76b9b57 Merge pull request 'multi-port-service' (#980) from multi-port-service into main
Reviewed-on: cerc-io/stack-orchestrator#980
2026-01-24 23:05:14 +00:00
A. F. Dudley
d07a3afd27 Merge origin/main into multi-port-service
Resolve conflicts:
- deployment_context.py: Keep single modify_yaml method from main
- fixturenet-optimism/commands.py: Use modify_yaml helper from main
- deployment_create.py: Keep helm-chart, network-dir, initial-peers options
- deploy_webapp.py: Update create_operation call signature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:48:11 -05:00
A. F. Dudley
a5b373da26 Check for None before creating k8s service
get_service() returns None when there are no http-proxy routes,
so we must check before calling create_namespaced_service().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:39:11 -05:00
A. F. Dudley
99db75da19 Fix invalid docker command in webapp-test
Change 'docker remove -f' to 'docker rm -f' - the 'remove' subcommand
doesn't exist in docker CLI.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:39:00 -05:00
A. F. Dudley
d4e935484f Limit test workflow PR triggers to main branch only
Previously these workflows ran on PRs to any branch. Now:
- PRs to main: run all tests (full CI gate)
- Pushes to other branches: use existing path filtering

This reduces CI load on feature branch PRs while maintaining
full test coverage for PRs targeting main.

Affected workflows:
- test-k8s-deploy.yml
- test-k8s-deployment-control.yml
- test-webapp.yml
- test-deploy.yml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:44:54 -05:00
A. F. Dudley
4f01054781 Expose all ports from http-proxy routes in k8s Service
Previously get_service() only exposed the first port from pod definition.
Now it collects all unique ports from http-proxy routes and exposes them
all in the Service spec.

This is needed for WebSocket support where RPC runs on one port (8899)
and WebSocket pubsub on another (8900) - both need to be accessible
through the ingress.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:14:48 -05:00
A. F. Dudley
811bbd9db4 Add TODO.md with planned features and refactoring
- Update stack command for continuous deployment workflow
- Separate deployer from CLI
- Separate stacks from orchestrator repo

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 10:43:12 -05:00
A. F. Dudley
8d9682eb47 Use caddy ingress class instead of nginx in cluster_info.py
The ingress annotation was still set to nginx class even though we're now
using Caddy as the ingress controller. Caddy won't pick up ingresses
annotated with the nginx class.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:41:35 -05:00
A. F. Dudley
638435873c Add port 443 mapping for kind clusters with Caddy ingress
Caddy provides automatic HTTPS with Let's Encrypt, but needs port 443
mapped from the kind container to the host. Previously only port 80 was
mapped.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:35:03 -05:00
A. F. Dudley
97a85359ff Fix helpers.py to use Caddy ingress instead of nginx
The helm-charts-with-caddy branch had the Caddy manifest file but was still
using nginx in the code. This change:

- Switch install_ingress_for_kind() to use ingress-caddy-kind-deploy.yaml
- Update wait_for_ingress_in_kind() to watch caddy-system namespace
- Use correct label selector for Caddy ingress controller pods

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:22:07 -05:00
A. F. Dudley
ffa00767d4 Add extra_args support to deploy create command
- Add @click.argument for generic args passthrough to stack commands
- Keep explicit --network-dir and --initial-peers options
- Add DeploymentContext.get_compose_file() helper
- Add DeploymentContext.modify_yaml() helper for stack commands
- Update init() to use absolute paths

This allows stack-specific create commands to receive arbitrary
arguments via: laconic-so deploy create ... -- --custom-arg value

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:06:45 -05:00
A. F. Dudley
86462c940f Fix high-memlock spec to include complete OCI runtime config
The base_runtime_spec for containerd requires a complete OCI spec,
not just the rlimits section. The minimal spec was causing runc to
fail with "open /proc/self/fd: no such file or directory" because
essential mounts and namespaces were missing.

This commit uses kind's default cri-base.json as the base and adds
the rlimits configuration on top. The spec includes all necessary
mounts, namespaces, capabilities, and kind-specific hooks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 02:12:11 -05:00
A. F. Dudley
87db167d7f Add RuntimeClass support for unlimited RLIMIT_MEMLOCK
The previous approach of mounting cri-base.json into kind nodes failed
because we didn't tell containerd to use it via containerdConfigPatches.

RuntimeClass allows different stacks to have different rlimit profiles,
which is essential since kind only supports one cluster per host and
multiple stacks share the same cluster.

Changes:
- Add containerdConfigPatches to kind-config.yml to define runtime handlers
- Create RuntimeClass resources after cluster creation
- Add runtimeClassName to pod specs based on stack's security settings
- Rename cri-base.json to high-memlock-spec.json for clarity
- Add get_runtime_class() method to Spec that auto-derives from
  unlimited-memlock setting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 01:58:38 -05:00
A. F. Dudley
dd856af2d3 Fix pyright type errors across codebase
- Add pyrightconfig.json for pyright 1.1.408 TOML parsing workaround
- Add NoReturn annotations to fatal() functions for proper type narrowing
- Add None checks and assertions after require=True get_record() calls
- Fix AttrDict class with __getattr__ for dynamic attribute access
- Add type annotations and casts for Kubernetes client objects
- Store compose config as DockerDeployer instance attributes
- Filter None values from dotenv and environment mappings
- Use hasattr/getattr patterns for optional container attributes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 01:10:36 -05:00
A. F. Dudley
cd3d908d0d Apply pre-commit linting fixes
- Format code with black (line length 88)
- Fix E501 line length errors by breaking long strings and comments
- Fix F841 unused variable (removed unused 'quiet' variable)
- Configure pyright to disable common type issues in existing codebase
  (reportGeneralTypeIssues, reportOptionalMemberAccess, etc.)
- All pre-commit hooks now pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:58:31 -05:00
A. F. Dudley
03f9acf869 Add unlimited-memlock support for Kind clusters
Add spec.yml option `security.unlimited-memlock` that configures
RLIMIT_MEMLOCK to unlimited for Kind cluster pods. This is needed
for workloads like Solana validators that require large amounts of
locked memory for memory-mapped files during snapshot decompression.

When enabled, generates a cri-base.json file with rlimits and mounts
it into the Kind node to override the default containerd runtime spec.

Also includes flake8 line-length fixes for affected files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:20:19 -05:00
A. F. Dudley
ba1aad9fa6 Add black, pyright, yamllint to pre-commit hooks
- Add black formatter (rev 23.12.1)
- Add pyright type checker (rev v1.1.345)
- Add yamllint with relaxed mode (rev v1.35.1)
- Update flake8 args: max-line-length=88, extend-ignore=E203,W503,E402
- Remove ansible-lint from dev dependencies (no ansible files in repo)
- Sync pyproject.toml flake8 config with pre-commit

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:04:15 -05:00
A. F. Dudley
dc36a6564a Fix misleading error message in load_images_into_kind 2026-01-21 19:32:53 -05:00
A. F. Dudley
c5c3fc1618 Retrigger test-container-registry CI 2026-01-21 19:28:29 -05:00
A. F. Dudley
2e384b7179 Trigger test-container-registry CI
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 19:12:05 -05:00
A. F. Dudley
b708836aa9 Add flake8 to pre-commit hooks
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 19:05:12 -05:00
A. F. Dudley
d8da9b6515 Add missing get_kind_cluster function to helpers.py
Fixes ImportError in k8s_command.py that was causing CI failure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 19:04:46 -05:00
A. F. Dudley
5a1399f2b2 Apply pre-commit linting fixes
Fix trailing whitespace and end-of-file issues across codebase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 23:16:44 -05:00
A. F. Dudley
89db6e1e92 Add Caddy ingress and k8s cluster management features
- Add Caddy ingress controller manifest for kind deployments
- Add k8s cluster list command for kind cluster management
- Add k8s_command import and registration in deploy.py
- Fix network section merge to preserve http-proxy settings
- Increase default container resources (4 CPUs, 8GB memory)
- Add UDP protocol support for K8s port definitions
- Add command/entrypoint support for K8s deployments
- Implement docker-compose variable expansion for K8s
- Set ConfigMap defaultMode to 0755 for executable scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 23:14:22 -05:00
A. F. Dudley
9bd59f29d9 Add CLAUDE.md, pre-commit config, and pyproject.toml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 22:40:59 -05:00