stack-orchestrator

Archived

Author	SHA1	Message	Date
A. F. Dudley	967936e524	Multi-deployment: one k8s Deployment per pod in stack.yml Each pod entry in stack.yml now creates its own k8s Deployment with independent lifecycle and update strategy. Pods with PVCs get Recreate, pods without get RollingUpdate. This enables maintenance services that survive main pod restarts. - cluster_info: get_deployments() builds per-pod Deployments, Services - cluster_info: Ingress routes to correct per-pod Service - deploy_k8s: _create_deployment() iterates all Deployments/Services - deployment: restart swaps Ingress to maintenance service during Recreate - spec: add maintenance-service key Single-pod stacks are backward compatible (same resource names). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 01:40:45 +00:00
A. F. Dudley	6ace024cd3	fix: use replace instead of patch for k8s resource updates Strategic merge patch preserves fields not present in the patch body. This means removed volumes, ports, and env vars persist in the running Deployment after a restart. Replace sends the complete spec built from the current compose files — removed fields are actually deleted. Affects Deployment, Service, Ingress, and NodePort updates. Service replace preserves clusterIP (immutable field) by reading it from the existing resource before replacing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 03:44:57 +00:00
A. F. Dudley	ea610bb8d6	Merge branch 'cv-c3c-image-flag-for-restart' # Conflicts: # stack_orchestrator/deploy/k8s/deploy_k8s.py	2026-03-18 23:04:55 +00:00
A. F. Dudley	4b1fc27a1e	cv-c3c: add --image flag to deployment restart command Allows callers to override container images during restart, e.g.: laconic-so deployment restart --image backend=ghcr.io/org/app:sha123 The override is applied to the k8s Deployment spec before create-or-patch. Docker/compose deployers accept the parameter but ignore it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 22:42:56 +00:00
A. F. Dudley	25e5ff09d9	so-m3m: add credentials-files spec key for on-disk credential injection _write_config_file() now reads each file listed under the credentials-files top-level spec key and appends its contents to config.env after config vars. Paths support ~ expansion. Missing files fail hard with sys.exit(1). Also adds get_credentials_files() to Spec class following the same pattern as get_image_registry_config(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 21:55:28 +00:00
A. F. Dudley	0e4ecc3602	refactor: rename registry-credentials to image-pull-secret in spec The spec key `registry-credentials` was ambiguous — could mean container registry auth or Laconic registry config. Rename to `image-pull-secret` which matches the k8s secret name it creates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 21:38:31 +00:00
A. F. Dudley	dc15c0f4a5	feat: auto-generate readiness probes from http-proxy routes Containers referenced in spec.yml http-proxy routes now get TCP readiness probes on the proxied port. This tells k8s when a container is actually ready to serve traffic. Without readiness probes, k8s considers pods ready immediately after start, which means: - Rolling updates cut over before the app is listening - Broken containers look "ready" and receive traffic (502s) - kubectl rollout undo has nothing to roll back to The probes use TCP socket checks (not HTTP) to work with any protocol. Initial delay 5s, check every 10s, fail after 3 consecutive failures. Closes so-l2l part C. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 19:43:09 +00:00
A. F. Dudley	2d11ca7bb0	feat: update-in-place deployments with rolling updates Replace the destroy-and-recreate deployment model with in-place updates. deploy_k8s.py: All resource creation (Deployment, Service, Ingress, NodePort, ConfigMap) now uses create-or-update semantics. If a resource already exists (409 Conflict), it patches instead of failing. For Deployments, this triggers a k8s rolling update — old pods serve traffic until new pods pass readiness checks. deployment.py: restart() no longer calls down(). It just calls up() which patches existing resources. No namespace deletion, no downtime gap, no race conditions. k8s handles the rollout. This gives: - Zero-downtime deploys (old pods serve during rollout) - Automatic rollback (if new pods fail readiness, rollout stalls) - Manual rollback via kubectl rollout undo Closes so-l2l (parts A and B). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 19:40:20 +00:00
A. F. Dudley	ba39c991f1	fix: create imagePullSecret in deployment namespace, not default create_registry_secret() hardcoded namespace="default" but deployments now run in dedicated laconic-* namespaces. The secret was invisible to pods in the deployment namespace, causing 401 on GHCR pulls. Accept namespace as parameter, passed from deploy_k8s.py which knows the correct namespace. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 19:08:52 +00:00
A. F. Dudley	0b3e5559d0	fix: wait for namespace termination in down() before returning Reverts the label-based deletion approach — resources created by older laconic-so lack labels, so label queries return empty results. Namespace deletion is the only reliable cleanup. Adds _wait_for_namespace_gone() so down() blocks until the namespace is fully terminated. This prevents the race condition where up() tries to create resources in a still-terminating namespace (403 Forbidden). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:49:38 +00:00
A. F. Dudley	ae2cea3410	fix: never delete namespace on deployment down down() deleted the entire namespace when it wasn't explicitly set in the spec. This causes a race condition on restart: up() tries to create resources in a namespace that's still terminating, getting 403 Forbidden. Always use _delete_resources_by_label() instead. The namespace is cheap to keep and required for immediate up() after down(). This also matches the shared-namespace behavior, making down() consistent regardless of namespace configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:47:05 +00:00
A. F. Dudley	e298e7444f	fix: add auto-generated header to config.env config.env is regenerated from spec.yml on every deploy create and restart, silently overwriting manual edits. Add a header comment explaining this so operators know to edit spec.yml instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 18:24:27 +00:00
A. F. Dudley	e5a8ec5f06	fix: rename registry secret to image-pull-secret The secret name `{app}-registry` is ambiguous — it could be a container registry credential or a Laconic registry config. Rename to `{app}-image-pull-secret` which clearly describes its purpose as a Kubernetes imagePullSecret for private container registries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 15:33:11 +00:00
A. F. Dudley	0bbb51067c	fix: set imagePullPolicy=Always for kind deployments Kind deployments used imagePullPolicy=None (defaults to IfNotPresent), which means the kind node caches images by tag and never re-pulls from the local registry. After a container rebuild + registry push, the pod keeps using the stale cached image. Set Always for all deployment types so k8s re-pulls on every pod restart. With a local registry this adds negligible overhead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 17:44:35 +00:00
A. F. Dudley	72aabe7d9a	fix: deploy create --update now syncs config.env from spec The --update path excluded config.env from the safe_copy_tree, which meant new config vars added to spec.yml were never written to config.env. The XXX comment already flagged this as broken. Remove config.env from exclude_patterns so --update regenerates it from spec.yml like the non-update path does. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 08:20:45 +00:00
afd	8a7491d3e0	Support multiple http-proxy entries in a single deployment Previously get_ingress() only used the first http-proxy entry, silently ignoring additional hostnames. Now iterates over all entries, creating an Ingress rule and TLS config per hostname. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 06:16:28 +00:00
Prathamesh Musale	e7483bc7d1	Add init containers, shared namespaces, per-volume sizing, and user/label support (#997) Reviewed-on: #997 Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com> Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>	2026-03-12 10:34:45 +00:00
Prathamesh Musale	5af6a83fa2	Add Job and secrets support for k8s-kind deployments (#995) Part of https://plan.wireit.in/deepstack/browse/VUL-315 Reviewed-on: #995 Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com> Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>	2026-03-11 03:56:21 +00:00
AFDudley	8cc0a9a19a	add/local-test-runner (#996) Co-authored-by: A. F. Dudley <a.frederick.dudley@gmail.com> Reviewed-on: #996	2026-03-09 20:04:58 +00:00
AFDudley	4a1b5d86fd	Merge pull request 'fix(k8s): translate service names to localhost for sidecar containers' (#989) from fix-sidecar-localhost into main Reviewed-on: #989	2026-02-03 23:13:27 +00:00
A. F. Dudley	019225ca18	fix(k8s): translate service names to localhost for sidecar containers In docker-compose, services can reference each other by name (e.g., 'db:5432'). In Kubernetes, when multiple containers are in the same pod (sidecars), they share the same network namespace and must use 'localhost' instead. This fix adds translate_sidecar_service_names() which replaces docker-compose service name references with 'localhost' in environment variable values for containers that share the same pod. Fixes issue where multi-container pods fail because one container tries to connect to a sibling using the compose service name instead of localhost. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 18:10:32 -05:00
AFDudley	0296da6f64	Merge pull request 'feat(k8s): namespace-per-deployment for resource isolation and cleanup' (#988) from feat-namespace-per-deployment into main Reviewed-on: #988	2026-02-03 23:09:16 +00:00
A. F. Dudley	d913926144	feat(k8s): namespace-per-deployment for resource isolation and cleanup Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}). This provides: - Resource isolation between deployments on the same cluster - Simplified cleanup: deleting the namespace cascades to all namespaced resources - No orphaned resources possible when deployment IDs change Changes: - Set k8s_namespace based on deployment name in __init__ - Add _ensure_namespace() to create namespace before deploying resources - Add _delete_namespace() for cleanup - Simplify down() to just delete PVs (cluster-scoped) and the namespace - Fix hardcoded "default" namespace in logs function Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 18:04:52 -05:00
AFDudley	b41e0cb2f5	Merge pull request 'fix(k8s): query resources by label in down() for proper cleanup' (#987) from fix-down-cleanup-by-label into main Reviewed-on: #987	2026-02-03 22:57:52 +00:00
A. F. Dudley	47d3d10ead	fix(k8s): query resources by label in down() for proper cleanup Previously, down() generated resource names from the deployment config and deleted those specific names. This failed to clean up orphaned resources when deployment IDs changed (e.g., after force_redeploy). Changes: - Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV - Refactor down() to query K8s by label selector instead of generating names - This ensures all resources for a deployment are cleaned up, even if the deployment config has changed or been deleted Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:55:14 -05:00
AFDudley	21d47908cc	Merge pull request 'feat(k8s): ACME email fix, etcd persistence, volume paths' (#986) from fix-caddy-acme-email-rbac into main Reviewed-on: #986	2026-02-03 22:31:47 +00:00
A. F. Dudley	f70e87b848	Add etcd + PKI extraMounts for offline data recovery Mount /var/lib/etcd and /etc/kubernetes/pki to host filesystem so cluster state is preserved for offline recovery. Each deployment gets its own backup directory keyed by deployment ID. Directory structure: data/cluster-backups/{deployment_id}/etcd/ data/cluster-backups/{deployment_id}/pki/ This enables extracting secrets from etcd backups using etcdctl with the preserved PKI certificates. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:19:52 -05:00
A. F. Dudley	5bc6c978ac	feat(k8s): support acme-email config for Caddy ingress Adds support for configuring ACME email for Let's Encrypt certificates in kind deployments. The email can be specified in the spec under network.acme-email and will be used to configure the Caddy ingress controller ConfigMap. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:19:52 -05:00
A. F. Dudley	ee59918082	Allow relative volume paths for k8s-kind deployments For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to $DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config generation. This provides Docker Host persistence that survives cluster restarts. Previously, validation threw an exception before paths could be resolved, making it impossible to use relative paths for persistent storage. Changes: - deployment_create.py: Skip relative path check for k8s-kind - cluster_info.py: Allow relative paths to reach PV generation - docs/deployment_patterns.md: Document volume persistence patterns Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:17:44 -05:00
A. F. Dudley	581ceaea94	docs: Add cluster and volume management section Document that: - Volumes persist across cluster deletion by design - Only use --delete-volumes when explicitly requested - Multiple deployments share one kind cluster - Use --skip-cluster-management to stop single deployment Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	7cecf2caa6	Fix Caddy ACME email race condition by templating YAML Previously, install_ingress_for_kind() applied the YAML (which starts the Caddy pod with email: ""), then patched the ConfigMap afterward. The pod had already read the empty email and Caddy doesn't hot-reload. Now template the email into the YAML before applying, so the pod starts with the correct email from the beginning. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	cb6fdb77a6	Rename image-registry to registry-credentials to avoid collision The existing 'image-registry' key is used for pushing images to a remote registry (URL string). Rename the new auth config to 'registry-credentials' to avoid collision. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	73ba13aaa5	Add private registry authentication support Add ability to configure private container registry credentials in spec.yml for deployments using images from registries like GHCR. - Add get_image_registry_config() to spec.py for parsing image-registry config - Add create_registry_secret() to create K8s docker-registry secrets - Update cluster_info.py to use dynamic {deployment}-registry secret names - Update deploy_k8s.py to create registry secret before deployment - Document feature in deployment_patterns.md The token-env pattern keeps credentials out of git - the spec references an environment variable name, and the actual token is passed at runtime. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	d82b3fb881	Only load locally-built images into kind, auto-detect ingress - Check stack.yml containers: field to determine which images are local builds - Only load local images via kind load; let k8s pull registry images directly - Add is_ingress_running() to skip ingress installation if already running - Fixes deployment failures when public registry images aren't in local Docker Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	3bc7832d8c	Fix deployment name extraction from path When stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name), extract just the final name component for K8s secret naming. K8s resource names must be valid RFC 1123 subdomains and cannot contain slashes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	a75138093b	Add setup-repositories to key files list Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	1128c95969	Split documentation: README for users, CLAUDE.md for agents README.md: deployment types, external stacks, commands, spec.yml reference CLAUDE.md: implementation details, code locations, codebase navigation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:26 -05:00
A. F. Dudley	d292e7c48d	Add k8s-kind architecture documentation to CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:16:25 -05:00
A. F. Dudley	b057969ddd	Clarify create_cluster docstring: one cluster per host by design Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	ca090d2cd5	Add $generate:type:length$ token support for K8s secrets - Add GENERATE_TOKEN_PATTERN to detect $generate:hex:N$ and $generate:base64:N$ tokens - Add _generate_and_store_secrets() to create K8s Secrets from spec.yml config - Modify _write_config_file() to separate secrets from regular config - Add env_from with secretRef to container spec in cluster_info.py - Secrets are injected directly into containers via K8s native mechanism This enables declarative secret generation in spec.yml: config: SESSION_SECRET: $generate:hex:32$ DB_PASSWORD: $generate:hex:16$ Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	2d3721efa4	Add cluster reuse for multi-stack k8s-kind deployments When deploying a second stack to k8s-kind, automatically reuse an existing kind cluster instead of trying to create a new one (which would fail due to port 80/443 conflicts). Changes: - helpers.py: create_cluster() now checks for existing cluster first - deploy_k8s.py: up() captures returned cluster name and updates self This enables deploying multiple stacks (e.g., gorbagana-rpc + trashscan-explorer) to the same kind cluster. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	4408725b08	Fix repo root path calculation (4 parents from stack path)	2026-02-03 17:15:19 -05:00
A. F. Dudley	22d64f1e97	Add --spec-file option to restart and auto-detect GitOps spec - Add --spec-file option to specify spec location in repo - Auto-detect deployment/spec.yml in repo as GitOps location - Fall back to deployment dir if no repo spec found Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	14258500bc	Fix restart command for GitOps deployments - Remove init_operation() from restart - don't regenerate spec from commands.py defaults, use existing git-tracked spec.yml instead - Add docs/deployment_patterns.md documenting GitOps workflow - Add pre-commit rule to CLAUDE.md - Fix line length issues in helpers.py Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	3fbd854b8c	Use docker for etcd existence check (root-owned dir) The etcd directory is root-owned, so shell test -f fails. Use docker with volume mount to check file existence. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	e2d3c44321	Keep timestamped backup of etcd forever Create member.backup-YYYYMMDD-HHMMSS before cleaning. Each cluster recreation creates a new backup, preserving history. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	720e01fc75	Preserve original etcd backup until restore is verified Move original to .bak, move new into place, then delete bak. If anything fails before the swap, original remains intact. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	5b06cffe17	Use whitelist approach for etcd cleanup Instead of trying to delete specific stale resources (blacklist), keep only the valuable data (caddy TLS certs) and delete everything else. This is more robust as we don't need to maintain a list of all possible stale resources. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	8948f5bfec	Fix etcd cleanup to use docker for root-owned files Use docker containers with volume mounts to handle all file operations on root-owned etcd directories, avoiding the need for sudo on the host. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00
A. F. Dudley	675ee87544	Clear stale CNI resources from persisted etcd before cluster creation When etcd is persisted (for certificate backup) and a cluster is recreated, kind tries to install CNI (kindnet) fresh but the persisted etcd already has those resources, causing 'AlreadyExists' errors and cluster creation failure. This fix: - Detects etcd mount path from kind config - Before cluster creation, clears stale CNI resources (kindnet, coredns) - Preserves certificate and other important data Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:15:19 -05:00


1198 Commits