Commit Graph

309 Commits

Author SHA1 Message Date
A. F. Dudley
0ac886bf95 fix: chdir to repo root before create_operation in restart
The spec's "stack:" value is a relative path that must resolve from
the repo root. stack_is_external() checks Path(stack).exists() from
cwd, which fails when cwd isn't the repo root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:06:38 +00:00
A. F. Dudley
2484abfcce fix: use git rev-parse for repo root in restart command
The repo_root calculation assumed stack paths are always 4 levels deep
(stack_orchestrator/data/stacks/name). External stacks with different
nesting (e.g. stack-orchestrator/stacks/name = 3 levels) got the wrong
root, causing --spec-file resolution to fail.

Use git rev-parse --show-toplevel instead.

Fixes: so-k1k

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:03:24 +00:00
A. F. Dudley
967936e524 Multi-deployment: one k8s Deployment per pod in stack.yml
Each pod entry in stack.yml now creates its own k8s Deployment with
independent lifecycle and update strategy. Pods with PVCs get Recreate,
pods without get RollingUpdate. This enables maintenance services that
survive main pod restarts.

- cluster_info: get_deployments() builds per-pod Deployments, Services
- cluster_info: Ingress routes to correct per-pod Service
- deploy_k8s: _create_deployment() iterates all Deployments/Services
- deployment: restart swaps Ingress to maintenance service during Recreate
- spec: add maintenance-service key

Single-pod stacks are backward compatible (same resource names).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:40:45 +00:00
A. F. Dudley
6ace024cd3 fix: use replace instead of patch for k8s resource updates
Strategic merge patch preserves fields not present in the patch body.
This means removed volumes, ports, and env vars persist in the running
Deployment after a restart. Replace sends the complete spec built from
the current compose files — removed fields are actually deleted.

Affects Deployment, Service, Ingress, and NodePort updates. Service
replace preserves clusterIP (immutable field) by reading it from the
existing resource before replacing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 03:44:57 +00:00
A. F. Dudley
ea610bb8d6 Merge branch 'cv-c3c-image-flag-for-restart'
# Conflicts:
#	stack_orchestrator/deploy/k8s/deploy_k8s.py
2026-03-18 23:04:55 +00:00
A. F. Dudley
4b1fc27a1e cv-c3c: add --image flag to deployment restart command
Allows callers to override container images during restart, e.g.:
  laconic-so deployment restart --image backend=ghcr.io/org/app:sha123

The override is applied to the k8s Deployment spec before
create-or-patch. Docker/compose deployers accept the parameter
but ignore it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:42:56 +00:00
A. F. Dudley
25e5ff09d9 so-m3m: add credentials-files spec key for on-disk credential injection
_write_config_file() now reads each file listed under the credentials-files
top-level spec key and appends its contents to config.env after config vars.
Paths support ~ expansion. Missing files fail hard with sys.exit(1).

Also adds get_credentials_files() to Spec class following the same pattern
as get_image_registry_config().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:55:28 +00:00
A. F. Dudley
0e4ecc3602 refactor: rename registry-credentials to image-pull-secret in spec
The spec key `registry-credentials` was ambiguous — could mean container
registry auth or Laconic registry config. Rename to `image-pull-secret`
which matches the k8s secret name it creates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:38:31 +00:00
A. F. Dudley
dc15c0f4a5 feat: auto-generate readiness probes from http-proxy routes
Containers referenced in spec.yml http-proxy routes now get TCP
readiness probes on the proxied port. This tells k8s when a container
is actually ready to serve traffic.

Without readiness probes, k8s considers pods ready immediately after
start, which means:
- Rolling updates cut over before the app is listening
- Broken containers look "ready" and receive traffic (502s)
- kubectl rollout undo has nothing to roll back to

The probes use TCP socket checks (not HTTP) to work with any protocol.
Initial delay 5s, check every 10s, fail after 3 consecutive failures.

Closes so-l2l part C.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:43:09 +00:00
A. F. Dudley
2d11ca7bb0 feat: update-in-place deployments with rolling updates
Replace the destroy-and-recreate deployment model with in-place updates.

deploy_k8s.py: All resource creation (Deployment, Service, Ingress,
NodePort, ConfigMap) now uses create-or-update semantics. If a resource
already exists (409 Conflict), it patches instead of failing. For
Deployments, this triggers a k8s rolling update — old pods serve traffic
until new pods pass readiness checks.

deployment.py: restart() no longer calls down(). It just calls up()
which patches existing resources. No namespace deletion, no downtime
gap, no race conditions. k8s handles the rollout.

This gives:
- Zero-downtime deploys (old pods serve during rollout)
- Automatic rollback (if new pods fail readiness, rollout stalls)
- Manual rollback via kubectl rollout undo

Closes so-l2l (parts A and B).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:40:20 +00:00
A. F. Dudley
ba39c991f1 fix: create imagePullSecret in deployment namespace, not default
create_registry_secret() hardcoded namespace="default" but deployments
now run in dedicated laconic-* namespaces. The secret was invisible
to pods in the deployment namespace, causing 401 on GHCR pulls.

Accept namespace as parameter, passed from deploy_k8s.py which knows
the correct namespace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:08:52 +00:00
A. F. Dudley
0b3e5559d0 fix: wait for namespace termination in down() before returning
Reverts the label-based deletion approach — resources created by older
laconic-so lack labels, so label queries return empty results. Namespace
deletion is the only reliable cleanup.

Adds _wait_for_namespace_gone() so down() blocks until the namespace
is fully terminated. This prevents the race condition where up() tries
to create resources in a still-terminating namespace (403 Forbidden).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:49:38 +00:00
A. F. Dudley
ae2cea3410 fix: never delete namespace on deployment down
down() deleted the entire namespace when it wasn't explicitly set in
the spec. This causes a race condition on restart: up() tries to create
resources in a namespace that's still terminating, getting 403 Forbidden.

Always use _delete_resources_by_label() instead. The namespace is cheap
to keep and required for immediate up() after down(). This also matches
the shared-namespace behavior, making down() consistent regardless of
namespace configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:47:05 +00:00
A. F. Dudley
e298e7444f fix: add auto-generated header to config.env
config.env is regenerated from spec.yml on every deploy create and
restart, silently overwriting manual edits. Add a header comment
explaining this so operators know to edit spec.yml instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:24:27 +00:00
A. F. Dudley
e5a8ec5f06 fix: rename registry secret to image-pull-secret
The secret name `{app}-registry` is ambiguous — it could be a container
registry credential or a Laconic registry config. Rename to
`{app}-image-pull-secret` which clearly describes its purpose as a
Kubernetes imagePullSecret for private container registries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:33:11 +00:00
A. F. Dudley
0bbb51067c fix: set imagePullPolicy=Always for kind deployments
Kind deployments used imagePullPolicy=None (defaults to IfNotPresent),
which means the kind node caches images by tag and never re-pulls from
the local registry. After a container rebuild + registry push, the pod
keeps using the stale cached image.

Set Always for all deployment types so k8s re-pulls on every pod
restart. With a local registry this adds negligible overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 17:44:35 +00:00
A. F. Dudley
72aabe7d9a fix: deploy create --update now syncs config.env from spec
The --update path excluded config.env from the safe_copy_tree, which
meant new config vars added to spec.yml were never written to
config.env. The XXX comment already flagged this as broken.

Remove config.env from exclude_patterns so --update regenerates it
from spec.yml like the non-update path does.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 08:20:45 +00:00
afd
8a7491d3e0 Support multiple http-proxy entries in a single deployment
Some checks failed
Lint Checks / Run linter (push) Failing after 3h7m12s
Previously get_ingress() only used the first http-proxy entry,
silently ignoring additional hostnames. Now iterates over all
entries, creating an Ingress rule and TLS config per hostname.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 06:16:28 +00:00
e7483bc7d1 Add init containers, shared namespaces, per-volume sizing, and user/label support (#997)
Some checks failed
Smoke Test / Run basic test suite (push) Has been cancelled
Webapp Test / Run webapp test suite (push) Has been cancelled
Deploy Test / Run deploy test suite (push) Has been cancelled
Lint Checks / Run linter (push) Has been cancelled
Publish / Build and publish (push) Failing after 3h0m5s
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Failing after 18m13s
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 24m32s
Database Test / Run database hosting test on kind/k8s (push) Failing after 38m22s
Container Registry Test / Run contaier registry hosting test on kind/k8s (push) Failing after 1h5m50s
External Stack Test / Run external stack test suite (push) Failing after 1h27m37s
Reviewed-on: #997
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2026-03-12 10:34:45 +00:00
5af6a83fa2 Add Job and secrets support for k8s-kind deployments (#995)
Some checks failed
Lint Checks / Run linter (push) Successful in 1h39m12s
Deploy Test / Run deploy test suite (push) Failing after 3h12m50s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (push) Failing after 3h12m50s
Webapp Test / Run webapp test suite (push) Failing after 3h13m40s
Smoke Test / Run basic test suite (push) Failing after 3h2m15s
Publish / Build and publish (push) Successful in 1h4m11s
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Successful in 1h45m35s
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 1h28m25s
Database Test / Run database hosting test on kind/k8s (push) Successful in 1h6m49s
Container Registry Test / Run contaier registry hosting test on kind/k8s (push) Failing after 2h17m3s
External Stack Test / Run external stack test suite (push) Failing after 2h37m5s
Part of https://plan.wireit.in/deepstack/browse/VUL-315

Reviewed-on: #995
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
2026-03-11 03:56:21 +00:00
8cc0a9a19a add/local-test-runner (#996)
Some checks failed
Lint Checks / Run linter (push) Successful in 1m59s
Publish / Build and publish (push) Successful in 3m3s
Deploy Test / Run deploy test suite (push) Successful in 5m24s
Webapp Test / Run webapp test suite (push) Successful in 5m39s
Smoke Test / Run basic test suite (push) Successful in 5m53s
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Successful in 29m48s
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 26m37s
Database Test / Run database hosting test on kind/k8s (push) Failing after 33m44s
Container Registry Test / Run contaier registry hosting test on kind/k8s (push) Failing after 47m14s
External Stack Test / Run external stack test suite (push) Failing after 1h14m42s
Co-authored-by: A. F. Dudley <a.frederick.dudley@gmail.com>
Reviewed-on: #996
2026-03-09 20:04:58 +00:00
A. F. Dudley
019225ca18 fix(k8s): translate service names to localhost for sidecar containers
Some checks failed
Lint Checks / Run linter (push) Failing after 3s
Lint Checks / Run linter (pull_request) Failing after 4s
Deploy Test / Run deploy test suite (pull_request) Failing after 4s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 5s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 4s
Webapp Test / Run webapp test suite (pull_request) Failing after 5s
Smoke Test / Run basic test suite (pull_request) Failing after 5s
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
A. F. Dudley
d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Some checks failed
Lint Checks / Run linter (push) Failing after 4s
Deploy Test / Run deploy test suite (pull_request) Failing after 5s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 5s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 5s
Webapp Test / Run webapp test suite (pull_request) Failing after 5s
Smoke Test / Run basic test suite (pull_request) Failing after 4s
Lint Checks / Run linter (pull_request) Failing after 3s
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
A. F. Dudley
47d3d10ead fix(k8s): query resources by label in down() for proper cleanup
Some checks failed
Lint Checks / Run linter (push) Failing after 14s
Lint Checks / Run linter (pull_request) Failing after 15s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 2m3s
Deploy Test / Run deploy test suite (pull_request) Successful in 2m10s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 2m51s
Webapp Test / Run webapp test suite (pull_request) Successful in 4m0s
Smoke Test / Run basic test suite (pull_request) Successful in 3m56s
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).

Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
  the deployment config has changed or been deleted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:55:14 -05:00
A. F. Dudley
f70e87b848 Add etcd + PKI extraMounts for offline data recovery
Some checks failed
Lint Checks / Run linter (push) Failing after 13s
Lint Checks / Run linter (pull_request) Failing after 16s
Deploy Test / Run deploy test suite (pull_request) Successful in 2m18s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 2m43s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 3m31s
Smoke Test / Run basic test suite (pull_request) Successful in 4m8s
Webapp Test / Run webapp test suite (pull_request) Successful in 4m21s
Mount /var/lib/etcd and /etc/kubernetes/pki to host filesystem
so cluster state is preserved for offline recovery. Each deployment
gets its own backup directory keyed by deployment ID.

Directory structure:
  data/cluster-backups/{deployment_id}/etcd/
  data/cluster-backups/{deployment_id}/pki/

This enables extracting secrets from etcd backups using etcdctl
with the preserved PKI certificates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley
5bc6c978ac feat(k8s): support acme-email config for Caddy ingress
Adds support for configuring ACME email for Let's Encrypt certificates
in kind deployments. The email can be specified in the spec under
network.acme-email and will be used to configure the Caddy ingress
controller ConfigMap.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley
ee59918082 Allow relative volume paths for k8s-kind deployments
For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to
$DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config
generation. This provides Docker Host persistence that survives cluster
restarts.

Previously, validation threw an exception before paths could be resolved,
making it impossible to use relative paths for persistent storage.

Changes:
- deployment_create.py: Skip relative path check for k8s-kind
- cluster_info.py: Allow relative paths to reach PV generation
- docs/deployment_patterns.md: Document volume persistence patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:17:44 -05:00
A. F. Dudley
7cecf2caa6 Fix Caddy ACME email race condition by templating YAML
Previously, install_ingress_for_kind() applied the YAML (which starts
the Caddy pod with email: ""), then patched the ConfigMap afterward.
The pod had already read the empty email and Caddy doesn't hot-reload.

Now template the email into the YAML before applying, so the pod starts
with the correct email from the beginning.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
cb6fdb77a6 Rename image-registry to registry-credentials to avoid collision
The existing 'image-registry' key is used for pushing images to a remote
registry (URL string). Rename the new auth config to 'registry-credentials'
to avoid collision.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
73ba13aaa5 Add private registry authentication support
Add ability to configure private container registry credentials in spec.yml
for deployments using images from registries like GHCR.

- Add get_image_registry_config() to spec.py for parsing image-registry config
- Add create_registry_secret() to create K8s docker-registry secrets
- Update cluster_info.py to use dynamic {deployment}-registry secret names
- Update deploy_k8s.py to create registry secret before deployment
- Document feature in deployment_patterns.md

The token-env pattern keeps credentials out of git - the spec references an
environment variable name, and the actual token is passed at runtime.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
d82b3fb881 Only load locally-built images into kind, auto-detect ingress
- Check stack.yml containers: field to determine which images are local builds
- Only load local images via kind load; let k8s pull registry images directly
- Add is_ingress_running() to skip ingress installation if already running
- Fixes deployment failures when public registry images aren't in local Docker

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
3bc7832d8c Fix deployment name extraction from path
When stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name),
extract just the final name component for K8s secret naming. K8s resource names must
be valid RFC 1123 subdomains and cannot contain slashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
b057969ddd Clarify create_cluster docstring: one cluster per host by design
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
ca090d2cd5 Add $generate:type:length$ token support for K8s secrets
- Add GENERATE_TOKEN_PATTERN to detect $generate:hex:N$ and $generate:base64:N$ tokens
- Add _generate_and_store_secrets() to create K8s Secrets from spec.yml config
- Modify _write_config_file() to separate secrets from regular config
- Add env_from with secretRef to container spec in cluster_info.py
- Secrets are injected directly into containers via K8s native mechanism

This enables declarative secret generation in spec.yml:
  config:
    SESSION_SECRET: $generate:hex:32$
    DB_PASSWORD: $generate:hex:16$

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
2d3721efa4 Add cluster reuse for multi-stack k8s-kind deployments
When deploying a second stack to k8s-kind, automatically reuse an existing
kind cluster instead of trying to create a new one (which would fail due
to port 80/443 conflicts).

Changes:
- helpers.py: create_cluster() now checks for existing cluster first
- deploy_k8s.py: up() captures returned cluster name and updates self

This enables deploying multiple stacks (e.g., gorbagana-rpc + trashscan-explorer)
to the same kind cluster.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
4408725b08 Fix repo root path calculation (4 parents from stack path) 2026-02-03 17:15:19 -05:00
A. F. Dudley
22d64f1e97 Add --spec-file option to restart and auto-detect GitOps spec
- Add --spec-file option to specify spec location in repo
- Auto-detect deployment/spec.yml in repo as GitOps location
- Fall back to deployment dir if no repo spec found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
14258500bc Fix restart command for GitOps deployments
- Remove init_operation() from restart - don't regenerate spec from
  commands.py defaults, use existing git-tracked spec.yml instead
- Add docs/deployment_patterns.md documenting GitOps workflow
- Add pre-commit rule to CLAUDE.md
- Fix line length issues in helpers.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
3fbd854b8c Use docker for etcd existence check (root-owned dir)
The etcd directory is root-owned, so shell test -f fails.
Use docker with volume mount to check file existence.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
e2d3c44321 Keep timestamped backup of etcd forever
Create member.backup-YYYYMMDD-HHMMSS before cleaning.
Each cluster recreation creates a new backup, preserving history.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
720e01fc75 Preserve original etcd backup until restore is verified
Move original to .bak, move new into place, then delete bak.
If anything fails before the swap, original remains intact.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
5b06cffe17 Use whitelist approach for etcd cleanup
Instead of trying to delete specific stale resources (blacklist),
keep only the valuable data (caddy TLS certs) and delete everything
else. This is more robust as we don't need to maintain a list of
all possible stale resources.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
8948f5bfec Fix etcd cleanup to use docker for root-owned files
Use docker containers with volume mounts to handle all file
operations on root-owned etcd directories, avoiding the need
for sudo on the host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
675ee87544 Clear stale CNI resources from persisted etcd before cluster creation
When etcd is persisted (for certificate backup) and a cluster is
recreated, kind tries to install CNI (kindnet) fresh but the
persisted etcd already has those resources, causing 'AlreadyExists'
errors and cluster creation failure.

This fix:
- Detects etcd mount path from kind config
- Before cluster creation, clears stale CNI resources (kindnet, coredns)
- Preserves certificate and other important data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
8d3191e4fd Fix Caddy ingress ACME email and RBAC issues
- Add acme_email_key constant for spec.yml parsing
- Add get_acme_email() method to Spec class
- Modify install_ingress_for_kind() to patch ConfigMap with email
- Pass acme-email from spec to ingress installation
- Add 'delete' verb to leases RBAC for certificate lock cleanup

The acme-email field in spec.yml was previously ignored, causing
Let's Encrypt to fail with "unable to parse email address".
The missing delete permission on leases caused lock cleanup failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
c197406cc7 feat(deploy): add deployment restart command
Add `laconic-so deployment restart` command that:
- Pulls latest code from stack git repository
- Regenerates spec.yml from stack's commands.py
- Verifies DNS if hostname changed (with --force to skip)
- Syncs deployment directory preserving cluster ID and data
- Stops and restarts deployment with --skip-cluster-management

Also stores stack-source path in deployment.yml during create
for automatic stack location on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
76c0c17c3b fix(deploy): merge volumes from stack init() instead of overwriting
All checks were successful
Lint Checks / Run linter (push) Successful in 14s
Lint Checks / Run linter (pull_request) Successful in 16s
Deploy Test / Run deploy test suite (pull_request) Successful in 2m23s
Webapp Test / Run webapp test suite (pull_request) Successful in 4m5s
Smoke Test / Run basic test suite (pull_request) Successful in 4m7s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 6m51s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 2m0s
Previously, volumes defined in a stack's commands.py init() function
were being overwritten by volumes discovered from compose files.
This prevented stacks from adding infrastructure volumes like caddy-data
that aren't defined in the compose files.

Now volumes are merged, with init() volumes taking precedence over
compose-discovered defaults.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 18:23:20 -05:00
A. F. Dudley
458b548dcf fix(k8s): add hostPath support for compose host path mounts
All checks were successful
Smoke Test / Run basic test suite (pull_request) Successful in 3m52s
Lint Checks / Run linter (pull_request) Successful in 14s
Webapp Test / Run webapp test suite (pull_request) Successful in 3m59s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 4m19s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 1m54s
Deploy Test / Run deploy test suite (pull_request) Successful in 2m4s
Add support for Docker Compose host path mounts (like ../config/file:/path)
in k8s deployments. Previously these were silently skipped, causing k8s
deployments to fail when compose files used host path mounts.

Changes:
- Add helper functions for host path detection and name sanitization
- Generate kind extraMounts for host path mounts
- Create hostPath volumes in pod specs for host path mounts
- Create volumeMounts with sanitized names for host path mounts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 19:25:28 -05:00
789b2dd3a7 Add --update option to deploy create
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 1h21m32s
Deploy Test / Run deploy test suite (pull_request) Successful in 1h43m56s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 2h4m55s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 2h27m42s
Webapp Test / Run webapp test suite (pull_request) Successful in 2h13m35s
Smoke Test / Run basic test suite (pull_request) Successful in 2h17m41s
To allow updating an existing deployment

- Check the deployment dir exists when updating
- Write to temp dir, then safely copy tree
- Don't overwrite data dir or config.env
2026-01-29 08:25:05 -06:00
A. F. Dudley
d07a3afd27 Merge origin/main into multi-port-service
All checks were successful
Lint Checks / Run linter (push) Successful in 24m22s
Lint Checks / Run linter (pull_request) Successful in 23m2s
Deploy Test / Run deploy test suite (pull_request) Successful in 25m37s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 28m31s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 27m46s
Webapp Test / Run webapp test suite (pull_request) Successful in 27m34s
Smoke Test / Run basic test suite (pull_request) Successful in 28m59s
Resolve conflicts:
- deployment_context.py: Keep single modify_yaml method from main
- fixturenet-optimism/commands.py: Use modify_yaml helper from main
- deployment_create.py: Keep helm-chart, network-dir, initial-peers options
- deploy_webapp.py: Update create_operation call signature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:48:11 -05:00