Compare commits

...

32 Commits

Author SHA1 Message Date
Snake Game Developer
90e32ffd60 Support image-overrides in spec for testing
Spec can override container images:
  image-overrides:
    dumpster-kubo: ghcr.io/.../dumpster-kubo:test-tag

Merged with CLI overrides (CLI wins). Enables testing with
GHCR-pushed test tags without modifying compose files.

Also reverts the image-pull-policy spec key (not needed —
the fix is to use proper GHCR tags, not IfNotPresent).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:02:23 +00:00
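The spec/CLI merge described above reduces to a plain dict merge where CLI entries win. A minimal sketch (the helper name is illustrative, not the actual laconic-so function):

```python
def merge_image_overrides(spec_overrides, cli_overrides):
    """Merge image overrides from the spec with --image CLI flags.

    Hypothetical helper: spec-level entries form the base and
    CLI-provided entries win on conflict, per the commit message.
    """
    merged = dict(spec_overrides or {})
    merged.update(cli_overrides or {})
    return merged
```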
Snake Game Developer
1052a1d4e7 Support image-pull-policy in spec (default: Always)
Testing specs can set image-pull-policy: IfNotPresent so kind-loaded
local images are used instead of pulling from the registry. Production
specs omit the key and get the default Always behavior.

Root cause: with Always, k8s pulled the GHCR kubo image (with baked
R2 endpoint) instead of the locally-built image (with https://s3:443),
causing kubo to connect to R2 directly and get Unauthorized.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 20:17:06 +00:00
Snake Game Developer
f93541f7db Fix CA cert mounting: subPath for Go, expanduser for configmaps
- CA certs mounted via subPath into /etc/ssl/certs/ so Go's x509
  picks them up (directory mount replaces the entire dir)
- get_configmaps() now expands ~ in paths via os.path.expanduser()
- Both changes discovered during testing with mkcert + MinIO

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 19:27:14 +00:00
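The subPath detail matters because a plain directory mount at /etc/ssl/certs/ would hide the distro CA bundle. A sketch of building the per-file mounts (dict shapes mirror the k8s volumeMount fields; names are illustrative):

```python
import os

def ca_cert_volume_mounts(cert_paths):
    """Mount each CA file individually via subPath into /etc/ssl/certs/.

    A whole-directory mount would replace the distro bundle, so each
    cert gets its own subPath entry; ~ is expanded as in get_configmaps().
    """
    mounts = []
    for path in cert_paths:
        name = os.path.basename(os.path.expanduser(path))
        mounts.append({
            "name": "ca-certs",
            "mountPath": f"/etc/ssl/certs/{name}",
            "subPath": name,
        })
    return mounts
```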
Snake Game Developer
713a81c245 Add external-services and ca-certificates spec keys
New spec.yml features for routing external service dependencies:

external-services:
  s3:
    host: example.com      # ExternalName Service (production)
    port: 443
  # ...or, in testing mode, the same key takes the selector form:
  s3:
    selector: {app: mock}  # headless Service + Endpoints (testing)
    namespace: mock-ns
    port: 443

ca-certificates:
  - ~/.local/share/mkcert/rootCA.pem  # testing only

laconic-so creates the appropriate k8s Service type per mode:
- host mode: ExternalName (DNS CNAME to external provider)
- selector mode: headless Service + Endpoints with pod IPs
  discovered from the target namespace at deploy time

ca-certificates mounts CA files into all containers at
/etc/ssl/certs/ and sets NODE_EXTRA_CA_CERTS for Node/Bun.

Also includes the previously committed PV Released state fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:25:47 +00:00
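The per-mode Service selection above can be sketched as a small dispatch on which key the entry carries (function name and return values are illustrative):

```python
def external_service_mode(svc):
    """Pick the k8s Service type for an external-services entry.

    Sketch of the mode selection described above; key names follow
    the spec example, everything else is an assumption.
    """
    if "host" in svc:
        return "ExternalName"        # DNS CNAME to the external provider
    if "selector" in svc:
        return "headless+endpoints"  # pod IPs discovered at deploy time
    raise ValueError("external-service entry needs 'host' or 'selector'")
```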
Snake Game Developer
98ff221a21 Fix PV rebinding after deployment stop/start cycle
deployment stop deletes the namespace (and PVCs) but preserves PVs
by default. On the next deployment start, PVs are in Released state
with a stale claimRef pointing at the deleted PVC. New PVCs cannot
bind to Released PVs, so pods get stuck in Pending.

Clear the claimRef on any Released PV during _create_volume_data()
so the PV returns to Available and can accept new PVC bindings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 07:47:23 +00:00
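The claimRef fix is a simple state transition on each Released PV. A dict-based sketch (the real code does this via the kubernetes client inside _create_volume_data()):

```python
def clear_released_claimrefs(pvs):
    """Reset claimRef on Released PVs so they return to Available.

    Operates on plain dicts shaped like V1PersistentVolume objects;
    a cleared claimRef lets new PVCs bind to the volume again.
    """
    for pv in pvs:
        if pv.get("status", {}).get("phase") == "Released":
            pv["spec"]["claimRef"] = None
    return pvs
```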
A. F. Dudley
7141dc7637 file so-p3p: laconic-so should manage Caddy ingress image lifecycle
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 00:30:46 +00:00
A. F. Dudley
2555df06b5 fix: use patched Caddy ingress image with ACME storage fix
Switch from caddy/ingress:latest to ghcr.io/laconicnetwork/caddy-ingress:latest
which has the List()/Stat() fix for secret_store. This fixes multi-domain
ACME provisioning deadlock where the second domain's cert request fails
because List() returns mangled keys and Stat() returns wrong IsTerminal.

Source: LaconicNetwork/ingress@109d69a (fix/acme-account-reuse branch)

Fixes: so-o2o (partially — etcd backup investigation still needed)
Closes: ds-v22v (Caddy sequential provisioning no longer needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:31:39 +00:00
A. F. Dudley
24cf22fea5 File pebbles: mount propagation merge + etcd cert backup broken
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 23:01:20 +00:00
A. F. Dudley
8d03083d0d feat: add kind-mount-root for unified Kind extraMount
When kind-mount-root is set in spec.yml, emit a single extraMount
mapping the root to /mnt instead of per-volume mounts. This allows
adding new volumes without recreating the Kind cluster.

Volumes whose host path is under the root skip individual extraMounts
and their PV paths resolve to /mnt/{relative_path}. Volumes outside
the root keep individual extraMounts as before.

Cherry-picked from branch enya-ac868cc4-kind-mount-propagation-fix
(commits b6d6ad81, 929bdab8) and adapted for current main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 21:28:40 +00:00
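The path rule above (under the root: resolve to /mnt/<relative>, no individual extraMount; outside: keep the old behavior) can be sketched as:

```python
from pathlib import Path

def resolve_volume_mount(host_path, mount_root):
    """Decide how a volume maps into the Kind node under kind-mount-root.

    Returns (node_path, needs_extra_mount). Names are illustrative,
    not the actual laconic-so function.
    """
    host = Path(host_path)
    root = Path(mount_root)
    try:
        rel = host.relative_to(root)
        return str(Path("/mnt") / rel), False  # covered by the single root extraMount
    except ValueError:
        return str(host), True                 # individual extraMount as before
```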
A. F. Dudley
9109cfb7a1 feat: add token-file option for image-pull-secret registry auth
Adds token-file key to image-pull-secret spec config. Reads the
registry token from a file on disk instead of requiring an environment
variable. File path supports ~ expansion. Falls back to token-env
if token-file is not set or file doesn't exist.

This lets operators store the GHCR token in ~/.credentials/ alongside
other secrets, removing the need for ansible to pass REGISTRY_TOKEN
as an env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:30:44 +00:00
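The file-then-env fallback described above, as a minimal sketch (function name is illustrative):

```python
import os

def read_registry_token(token_file=None, token_env=None):
    """Resolve the image-pull-secret token: file first, env fallback.

    token-file supports ~ expansion; if it is unset or the file does
    not exist, fall back to the token-env environment variable.
    """
    if token_file:
        path = os.path.expanduser(token_file)
        if os.path.exists(path):
            with open(path) as f:
                return f.read().strip()
    if token_env:
        return os.environ.get(token_env)
    return None
```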
A. F. Dudley
61afeb255c fix: keep cwd at repo root through entire restart, revert try/except
The stack path in spec.yml is relative — both create_operation and
up_operation need cwd at the repo root for stack_is_external() to
resolve it. Move os.chdir(prev_cwd) to after up_operation completes
instead of between the two operations.

Reverts the SystemExit catch in call_stack_deploy_start — the root
cause was cwd, not the hook.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:54:46 +00:00
A. F. Dudley
32f6e57b70 fix: ConfigMap volumes don't force Recreate strategy + resilient hooks
Two fixes for multi-deployment:

1. _pod_has_pvcs now excludes ConfigMap volumes from PVC detection.
   Pods with only ConfigMap volumes (like maintenance) correctly get
   RollingUpdate strategy instead of Recreate.

2. call_stack_deploy_start catches SystemExit when stack path doesn't
   resolve from cwd (common during restart). Most stacks don't have
   deploy hooks, so this is non-fatal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:51:58 +00:00
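Fix 1 boils down to: a pod counts as "has PVCs" only if it uses at least one volume that is not a ConfigMap. A sketch of the strategy selection (names illustrative):

```python
def update_strategy(pod_volume_names, configmap_volume_names):
    """Choose the Deployment update strategy for a pod.

    ConfigMap volumes no longer count as PVCs, so ConfigMap-only pods
    (like the maintenance pod) get RollingUpdate; pods with real PVCs
    keep Recreate.
    """
    has_pvc = any(v not in configmap_volume_names for v in pod_volume_names)
    return "Recreate" if has_pvc else "RollingUpdate"
```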
A. F. Dudley
6923e1c23b refactor: extract methods from K8sDeployer.up to fix C901 complexity
Split up() into _setup_cluster(), _create_ingress(), _create_nodeports().
Reduces cyclomatic complexity below the flake8 threshold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:20:50 +00:00
A. F. Dudley
5b8303f8f9 fix: resolve stack path from repo root + update deploy test
- chdir to git repo root before create_operation so relative stack
  paths in spec.yml resolve correctly via stack_is_external()
- Update deploy test: config.env is now regenerated from spec on
  --update (matching 72aabe7d behavior), verify backup exists

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:14:47 +00:00
A. F. Dudley
0ac886bf95 fix: chdir to repo root before create_operation in restart
The spec's "stack:" value is a relative path that must resolve from
the repo root. stack_is_external() checks Path(stack).exists() from
cwd, which fails when cwd isn't the repo root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:06:38 +00:00
A. F. Dudley
2484abfcce fix: use git rev-parse for repo root in restart command
The repo_root calculation assumed stack paths are always 4 levels deep
(stack_orchestrator/data/stacks/name). External stacks with different
nesting (e.g. stack-orchestrator/stacks/name = 3 levels) got the wrong
root, causing --spec-file resolution to fail.

Use git rev-parse --show-toplevel instead.

Fixes: so-k1k

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 15:03:24 +00:00
A. F. Dudley
967936e524 Multi-deployment: one k8s Deployment per pod in stack.yml
Each pod entry in stack.yml now creates its own k8s Deployment with
independent lifecycle and update strategy. Pods with PVCs get Recreate,
pods without get RollingUpdate. This enables maintenance services that
survive main pod restarts.

- cluster_info: get_deployments() builds per-pod Deployments, Services
- cluster_info: Ingress routes to correct per-pod Service
- deploy_k8s: _create_deployment() iterates all Deployments/Services
- deployment: restart swaps Ingress to maintenance service during Recreate
- spec: add maintenance-service key

Single-pod stacks are backward compatible (same resource names).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 01:40:45 +00:00
A. F. Dudley
6ace024cd3 fix: use replace instead of patch for k8s resource updates
Strategic merge patch preserves fields not present in the patch body.
This means removed volumes, ports, and env vars persist in the running
Deployment after a restart. Replace sends the complete spec built from
the current compose files — removed fields are actually deleted.

Affects Deployment, Service, Ingress, and NodePort updates. Service
replace preserves clusterIP (immutable field) by reading it from the
existing resource before replacing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 03:44:57 +00:00
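The clusterIP carry-over described above can be sketched with plain dicts (the real code reads the live Service via the kubernetes client before calling replace):

```python
def prepare_service_replace(existing, desired):
    """Carry the immutable clusterIP into the replacement Service spec.

    replace would otherwise be rejected by the API server, since
    clusterIP cannot change on an existing Service. Returns a copy;
    the desired spec is not mutated.
    """
    out = dict(desired)
    out["spec"] = dict(desired.get("spec", {}))
    out["spec"]["clusterIP"] = existing["spec"]["clusterIP"]
    return out
```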
A. F. Dudley
ea610bb8d6 Merge branch 'cv-c3c-image-flag-for-restart'
# Conflicts:
#	stack_orchestrator/deploy/k8s/deploy_k8s.py
2026-03-18 23:04:55 +00:00
A. F. Dudley
4b1fc27a1e cv-c3c: add --image flag to deployment restart command
Allows callers to override container images during restart, e.g.:
  laconic-so deployment restart --image backend=ghcr.io/org/app:sha123

The override is applied to the k8s Deployment spec before
create-or-patch. Docker/compose deployers accept the parameter
but ignore it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:42:56 +00:00
A. F. Dudley
25e5ff09d9 so-m3m: add credentials-files spec key for on-disk credential injection
_write_config_file() now reads each file listed under the credentials-files
top-level spec key and appends its contents to config.env after config vars.
Paths support ~ expansion. Missing files fail hard with sys.exit(1).

Also adds get_credentials_files() to Spec class following the same pattern
as get_image_registry_config().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:55:28 +00:00
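The append behavior above, as a minimal sketch of what _write_config_file() does with the credentials-files list (function name is illustrative; the real code writes to config.env on disk):

```python
import os

def append_credentials_files(config_text, credentials_files):
    """Append each credentials file's contents after the config vars.

    Paths support ~ expansion; a missing file is a hard error, modeled
    here as SystemExit(1) to match the described sys.exit(1).
    """
    parts = [config_text]
    for path in credentials_files:
        expanded = os.path.expanduser(path)
        if not os.path.exists(expanded):
            raise SystemExit(1)  # missing files fail hard
        with open(expanded) as f:
            parts.append(f.read())
    return "".join(parts)
```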
A. F. Dudley
0e4ecc3602 refactor: rename registry-credentials to image-pull-secret in spec
The spec key `registry-credentials` was ambiguous — could mean container
registry auth or Laconic registry config. Rename to `image-pull-secret`
which matches the k8s secret name it creates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 21:38:31 +00:00
A. F. Dudley
dc15c0f4a5 feat: auto-generate readiness probes from http-proxy routes
Containers referenced in spec.yml http-proxy routes now get TCP
readiness probes on the proxied port. This tells k8s when a container
is actually ready to serve traffic.

Without readiness probes, k8s considers pods ready immediately after
start, which means:
- Rolling updates cut over before the app is listening
- Broken containers look "ready" and receive traffic (502s)
- kubectl rollout undo has nothing to roll back to

The probes use TCP socket checks (not HTTP) to work with any protocol.
Initial delay 5s, check every 10s, fail after 3 consecutive failures.

Closes so-l2l part C.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:43:09 +00:00
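The probe settings named in the message (TCP socket check, 5s initial delay, 10s period, 3 failures) map directly onto the k8s probe fields. A dict-shaped sketch (the real code presumably builds a kubernetes client V1Probe object):

```python
def tcp_readiness_probe(port):
    """TCP readiness probe for a proxied container port.

    Values come from the commit message; the dict mirrors the
    Kubernetes probe schema.
    """
    return {
        "tcpSocket": {"port": port},
        "initialDelaySeconds": 5,
        "periodSeconds": 10,
        "failureThreshold": 3,
    }
```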
A. F. Dudley
2d11ca7bb0 feat: update-in-place deployments with rolling updates
Replace the destroy-and-recreate deployment model with in-place updates.

deploy_k8s.py: All resource creation (Deployment, Service, Ingress,
NodePort, ConfigMap) now uses create-or-update semantics. If a resource
already exists (409 Conflict), it patches instead of failing. For
Deployments, this triggers a k8s rolling update — old pods serve traffic
until new pods pass readiness checks.

deployment.py: restart() no longer calls down(). It just calls up()
which patches existing resources. No namespace deletion, no downtime
gap, no race conditions. k8s handles the rollout.

This gives:
- Zero-downtime deploys (old pods serve during rollout)
- Automatic rollback (if new pods fail readiness, rollout stalls)
- Manual rollback via kubectl rollout undo

Closes so-l2l (parts A and B).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:40:20 +00:00
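The create-or-update semantics above follow a standard try-create/catch-409/patch pattern. A self-contained sketch, with a stand-in exception class in place of kubernetes.client.exceptions.ApiException:

```python
class ApiException(Exception):
    """Stand-in for kubernetes.client.exceptions.ApiException."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def create_or_update(create_fn, patch_fn, body):
    """Try to create a resource; on 409 Conflict patch it instead.

    For Deployments the patch path triggers a k8s rolling update;
    any other API error is re-raised.
    """
    try:
        return create_fn(body)
    except ApiException as e:
        if e.status == 409:
            return patch_fn(body)
        raise
```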
A. F. Dudley
ba39c991f1 fix: create imagePullSecret in deployment namespace, not default
create_registry_secret() hardcoded namespace="default" but deployments
now run in dedicated laconic-* namespaces. The secret was invisible
to pods in the deployment namespace, causing 401 on GHCR pulls.

Accept namespace as parameter, passed from deploy_k8s.py which knows
the correct namespace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:08:52 +00:00
A. F. Dudley
0b3e5559d0 fix: wait for namespace termination in down() before returning
Reverts the label-based deletion approach — resources created by older
laconic-so lack labels, so label queries return empty results. Namespace
deletion is the only reliable cleanup.

Adds _wait_for_namespace_gone() so down() blocks until the namespace
is fully terminated. This prevents the race condition where up() tries
to create resources in a still-terminating namespace (403 Forbidden).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:49:38 +00:00
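The blocking wait described above is a poll loop with a deadline. A sketch of _wait_for_namespace_gone(), where namespace_exists stands in for a kubernetes read_namespace call and the timeout/interval defaults are assumptions:

```python
import time

def wait_for_namespace_gone(namespace_exists, timeout=120, interval=2):
    """Block until the namespace is fully terminated, or time out.

    Returns True once the namespace is gone, False if it is still
    terminating when the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not namespace_exists():
            return True
        time.sleep(interval)
    return False
```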
A. F. Dudley
ae2cea3410 fix: never delete namespace on deployment down
down() deleted the entire namespace when it wasn't explicitly set in
the spec. This causes a race condition on restart: up() tries to create
resources in a namespace that's still terminating, getting 403 Forbidden.

Always use _delete_resources_by_label() instead. The namespace is cheap
to keep and required for immediate up() after down(). This also matches
the shared-namespace behavior, making down() consistent regardless of
namespace configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:47:05 +00:00
A. F. Dudley
e298e7444f fix: add auto-generated header to config.env
config.env is regenerated from spec.yml on every deploy create and
restart, silently overwriting manual edits. Add a header comment
explaining this so operators know to edit spec.yml instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 18:24:27 +00:00
A. F. Dudley
e5a8ec5f06 fix: rename registry secret to image-pull-secret
The secret name `{app}-registry` is ambiguous — it could be a container
registry credential or a Laconic registry config. Rename to
`{app}-image-pull-secret` which clearly describes its purpose as a
Kubernetes imagePullSecret for private container registries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 15:33:11 +00:00
A. F. Dudley
0bbb51067c fix: set imagePullPolicy=Always for kind deployments
Kind deployments used imagePullPolicy=None (defaults to IfNotPresent),
which means the kind node caches images by tag and never re-pulls from
the local registry. After a container rebuild + registry push, the pod
keeps using the stale cached image.

Set Always for all deployment types so k8s re-pulls on every pod
restart. With a local registry this adds negligible overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 17:44:35 +00:00
A. F. Dudley
72aabe7d9a fix: deploy create --update now syncs config.env from spec
The --update path excluded config.env from the safe_copy_tree, which
meant new config vars added to spec.yml were never written to
config.env. The XXX comment already flagged this as broken.

Remove config.env from exclude_patterns so --update regenerates it
from spec.yml like the non-update path does.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 08:20:45 +00:00
afd
8a7491d3e0 Support multiple http-proxy entries in a single deployment
Previously get_ingress() only used the first http-proxy entry,
silently ignoring additional hostnames. Now iterates over all
entries, creating an Ingress rule and TLS config per hostname.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 06:16:28 +00:00
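The fix above replaces "take the first entry" with a loop that emits one Ingress rule and one TLS host per entry. A sketch assuming the spec's `host-name` key (other shapes here are illustrative):

```python
def build_ingress_rules(http_proxy_entries):
    """One Ingress rule and one TLS host per http-proxy entry.

    Previously only the first entry was used; now every hostname
    gets a rule and TLS config.
    """
    rules, tls_hosts = [], []
    for entry in http_proxy_entries:
        host = entry["host-name"]
        rules.append({"host": host, "routes": entry.get("routes", [])})
        tls_hosts.append(host)
    return rules, tls_hosts
```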
17 changed files with 3551 additions and 369 deletions

.gitignore (vendored)

@@ -8,3 +8,4 @@ __pycache__
package
stack_orchestrator/data/build_tag.txt
/build
.worktrees

.pebbles/.gitignore (new file, vendored)

@@ -0,0 +1 @@
pebbles.db

.pebbles/config.json (new file)

@@ -0,0 +1 @@
{"project": "stack-orchestrator", "prefix": "so"}

.pebbles/events.jsonl (new file)

@@ -0,0 +1,10 @@
{"type": "create", "timestamp": "2026-03-18T14:45:07.038870Z", "issue_id": "so-a1a", "payload": {"title": "deploy create should support external credential injection", "type": "feature", "priority": "2", "description": "deploy create generates config.env but provides no mechanism to inject external credentials (API keys, tokens, etc.) at creation time. Operators must append to config.env after the fact, which mutates a build artifact. deploy create should accept --credentials-file or similar to include secrets in the generated config.env."}}
{"type": "create", "timestamp": "2026-03-18T14:45:07.038942Z", "issue_id": "so-b2b", "payload": {"title": "REGISTRY_TOKEN / imagePullSecret flow undocumented", "type": "bug", "priority": "2", "description": "create_registry_secret() exists in deployment_create.py and is called during up(), but REGISTRY_TOKEN is not documented in spec.yml or any user-facing docs. The restart command warns \"Registry token env var REGISTRY_TOKEN not set, skipping registry secret\" but doesn't explain how to set it. For GHCR private images, this is required and the flow from spec.yml -> config.env -> imagePullSecret needs documentation."}}
{"type": "create", "timestamp": "2026-03-18T19:10:00.000000Z", "issue_id": "so-k1k", "payload": {"title": "Stack path resolution differs between deploy create and deployment restart", "type": "bug", "priority": "2", "description": "deploy create resolves --stack as a relative path from cwd. deployment restart resolves --stack-path as absolute, then computes repo_root as 4 parents up (assuming stack_orchestrator/data/stacks/name structure). External stacks with different nesting depths (e.g. stack-orchestrator/stacks/name = 3 levels) get wrong repo_root, causing --spec-file resolution to fail. The two commands should use the same path resolution logic."}}
{"type": "create", "timestamp": "2026-03-18T19:25:00.000000Z", "issue_id": "so-l2l", "payload": {"title": "deployment restart should update in place, not delete/recreate", "type": "bug", "priority": "1", "description": "deployment restart deletes the entire namespace then recreates everything from scratch. This causes:\n\n1. **Downtime** — nothing serves traffic between delete and successful recreate\n2. **No rollback** — deleting the namespace destroys ReplicaSet revision history\n3. **Race conditions** — namespace may still be terminating when up() tries to create\n4. **Cascading failures** — if ANY container fails to start, the entire site is down with no fallback\n\nFix: three changes needed.\n\n**A. up() should create-or-update, not just create.** Use patch/apply semantics for Deployments, Services, Ingresses. When the pod spec changes (new env vars, new image), k8s creates a new ReplicaSet, scales it up, waits for readiness probes, then scales the old one down. Old pods serve traffic until new pods are healthy.\n\n**B. down() should never delete the namespace on restart.** Only on explicit teardown. The namespace owns the revision history. Current code: _delete_namespace() on every down(). Should: delete individual resources by label for teardown, do nothing for restart (let update-in-place handle it).\n\n**C. All containers need readiness probes.** Without them k8s considers pods ready immediately, defeating rolling update safety. laconic-so should generate readiness probes from the http-proxy routes in spec.yml (if a container has an http route, probe that port).\n\nWith these changes, k8s native rolling updates provide zero-downtime deploys and automatic rollback (if new pods fail readiness, rollout stalls, old pods keep serving).\n\nSource files:\n- deploy_k8s.py: up(), down(), _create_deployment(), _delete_namespace()\n- cluster_info.py: pod spec generation (needs readiness probes)\n- deployment.py: restart() orchestration"}}
{"type": "create", "timestamp": "2026-03-18T20:15:03.000000Z", "issue_id": "so-m3m", "payload": {"title": "Add credentials-files spec key for on-disk credential injection", "type": "feature", "priority": "1", "description": "deployment restart regenerates config.env from spec.yml, wiping credentials that were appended from on-disk files (e.g. ~/.credentials/*.env). Operators must append credentials after deploy create, which is fragile and breaks on restart.\n\nFix: New top-level spec key credentials-files. _write_config_file() reads each file and appends its contents to config.env after writing config vars. Files are read at deploy time from the deployment host.\n\nSpec syntax:\n credentials-files:\n - ~/.credentials/dumpster-secrets.env\n - ~/.credentials/dumpster-r2.env\n\nFiles:\n- deploy/spec.py: add get_credentials_files() returning list of paths\n- deploy/deployment_create.py: in _write_config_file(), after writing config vars, read and append each credentials file (expand ~ to home dir)\n\nAlso update dumpster-stack spec.yml to use the new key and remove the ansible credential append workaround from woodburn_deployer (group_vars/all.yml credentials_env_files, stack_deploy role append tasks, restart_dumpster.yml credential steps). Those cleanups are in the woodburn_deployer repo."}}
{"type":"status_update","timestamp":"2026-03-18T21:54:12.59148256Z","issue_id":"so-m3m","payload":{"status":"in_progress"}}
{"type":"close","timestamp":"2026-03-18T21:55:31.6035544Z","issue_id":"so-m3m","payload":{}}
{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-n1n", "payload": {"title": "Merge kind-mount-propagation branch — HostToContainer propagation for extraMounts", "type": "feature", "priority": "2", "description": "The kind-mount-root feature was cherry-picked to main (commit 8d03083d) but the mount propagation fix (commit 929bdab8 on branch enya-ac868cc4-kind-mount-propagation-fix) adds HostToContainer propagation so host submounts propagate into the Kind node. This is needed for ZFS child datasets and tmpfs mounts under the root. Cherry-pick 929bdab8 to main."}}
{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-o2o", "payload": {"title": "etcd cert backup not persisting across cluster deletion", "type": "bug", "priority": "1", "description": "The extraMount for etcd at data/cluster-backups/<id>/etcd is configured but after cluster deletion the directory is empty. Caddy TLS certificates stored in etcd are lost. Either etcd isn't writing to the host mount, or the cleanup code is deleting the backup. Investigate _clean_etcd_keeping_certs in helpers.py."}}
{"type": "create", "timestamp": "2026-03-21T00:20:00.000000Z", "issue_id": "so-p3p", "payload": {"title": "laconic-so should manage Caddy ingress image lifecycle", "type": "feature", "priority": "2", "description": "The Caddy ingress controller image is hardcoded in ingress-caddy-kind-deploy.yaml. There's no mechanism to update it without manual kubectl commands or cluster recreation. laconic-so should: 1) Allow spec.yml to specify a custom Caddy image, 2) Support updating the Caddy image as part of deployment restart, 3) Set strategy: Recreate on the Caddy Deployment (hostPort pods can't do RollingUpdate). This would let cryovial or similar tooling trigger Caddy updates through the normal deployment pipeline."}}

@@ -46,3 +46,6 @@ runtime_class_key = "runtime-class"
high_memlock_runtime = "high-memlock"
high_memlock_spec_filename = "high-memlock-spec.json"
acme_email_key = "acme-email"
kind_mount_root_key = "kind-mount-root"
external_services_key = "external-services"
ca_certificates_key = "ca-certificates"

@@ -186,8 +186,8 @@ spec:
         operator: Equal
       containers:
       - name: caddy-ingress-controller
-        image: caddy/ingress:latest
-        imagePullPolicy: IfNotPresent
+        image: ghcr.io/laconicnetwork/caddy-ingress:latest
+        imagePullPolicy: Always
         ports:
         - name: http
           containerPort: 80

@@ -48,7 +48,7 @@ class DockerDeployer(Deployer):
         self.compose_project_name = compose_project_name
         self.compose_env_file = compose_env_file

-    def up(self, detach, skip_cluster_management, services):
+    def up(self, detach, skip_cluster_management, services, image_overrides=None):
         if not opts.o.dry_run:
             try:
                 return self.docker.compose.up(detach=detach, services=services)

@@ -137,7 +137,11 @@ def create_deploy_context(
 def up_operation(
-    ctx, services_list, stay_attached=False, skip_cluster_management=False
+    ctx,
+    services_list,
+    stay_attached=False,
+    skip_cluster_management=False,
+    image_overrides=None,
 ):
     global_context = ctx.parent.parent.obj
     deploy_context = ctx.obj
@@ -156,6 +160,7 @@ def up_operation(
         detach=not stay_attached,
         skip_cluster_management=skip_cluster_management,
         services=services_list,
+        image_overrides=image_overrides,
     )
     for post_start_command in cluster_context.post_start_commands:
         _run_command(global_context, cluster_context.cluster, post_start_command)

@@ -20,7 +20,7 @@ from typing import Optional
 class Deployer(ABC):
     @abstractmethod
-    def up(self, detach, skip_cluster_management, services):
+    def up(self, detach, skip_cluster_management, services, image_overrides=None):
         pass

     @abstractmethod

@@ -17,7 +17,7 @@ import click
from pathlib import Path
import subprocess
import sys
import time
from stack_orchestrator import constants
from stack_orchestrator.deploy.images import push_images_operation
from stack_orchestrator.deploy.deploy import (
@@ -248,8 +248,13 @@ def run_job(ctx, job_name, helm_release):
"--expected-ip",
help="Expected IP for DNS verification (if different from egress)",
)
@click.option(
"--image",
multiple=True,
help="Override container image: container=image",
)
@click.pass_context
def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
def restart(ctx, stack_path, spec_file, config_file, force, expected_ip, image):
"""Pull latest code and restart deployment using git-tracked spec.
GitOps workflow:
@@ -276,6 +281,17 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
deployment_context: DeploymentContext = ctx.obj
# Parse --image flags into a dict of container_name -> image
image_overrides = {}
for entry in image:
if "=" not in entry:
raise click.BadParameter(
f"Invalid --image format '{entry}', expected container=image",
param_hint="'--image'",
)
container_name, image_ref = entry.split("=", 1)
image_overrides[container_name] = image_ref
# Get current spec info (before git pull)
current_spec = deployment_context.spec
current_http_proxy = current_spec.get_http_proxy()
@@ -322,9 +338,22 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
# Determine spec file location
# Priority: --spec-file argument > repo's deployment/spec.yml > deployment dir
# Stack path is like: repo/stack_orchestrator/data/stacks/stack-name
# So repo root is 4 parents up
repo_root = stack_source.parent.parent.parent.parent
# Find repo root via git rather than assuming a fixed directory depth.
git_root_result = subprocess.run(
["git", "rev-parse", "--show-toplevel"],
cwd=stack_source,
capture_output=True,
text=True,
)
if git_root_result.returncode == 0:
repo_root = Path(git_root_result.stdout.strip())
else:
# Fallback: walk up from stack_source looking for .git
repo_root = stack_source
while repo_root != repo_root.parent:
if (repo_root / ".git").exists():
break
repo_root = repo_root.parent
if spec_file:
# Spec file relative to repo root
spec_file_path = repo_root / spec_file
@@ -368,7 +397,14 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
print("\n[2/4] Hostname unchanged, skipping DNS verification")
# Step 3: Sync deployment directory with spec
# The spec's "stack:" value is often a relative path (e.g.
# "stack-orchestrator/stacks/dumpster") that must resolve from the
# repo root. Change cwd so stack_is_external() sees it correctly.
print("\n[3/4] Syncing deployment directory...")
import os
prev_cwd = os.getcwd()
os.chdir(repo_root)
deploy_ctx = make_deploy_context(ctx)
create_operation(
deployment_command_context=deploy_ctx,
@@ -378,28 +414,216 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
network_dir=None,
initial_peers=None,
)
# Reload deployment context with updated spec
deployment_context.init(deployment_context.deployment_dir)
ctx.obj = deployment_context
# Stop deployment
print("\n[4/4] Restarting deployment...")
# Apply updated deployment.
# If maintenance-service is configured, swap Ingress to maintenance
# backend during the Recreate window so users see a branded page
# instead of bare 502s.
print("\n[4/4] Applying deployment update...")
ctx.obj = make_deploy_context(ctx)
down_operation(
ctx, delete_volumes=False, extra_args_list=[], skip_cluster_management=True
)
# Brief pause to ensure clean shutdown
time.sleep(5)
# Check for maintenance service in the (reloaded) spec
maintenance_svc = deployment_context.spec.get_maintenance_service()
if maintenance_svc:
print(f"Maintenance service configured: {maintenance_svc}")
_restart_with_maintenance(
ctx, deployment_context, maintenance_svc, image_overrides
)
else:
up_operation(
ctx,
services_list=None,
stay_attached=False,
skip_cluster_management=True,
image_overrides=image_overrides or None,
)
# Start deployment
up_operation(
ctx, services_list=None, stay_attached=False, skip_cluster_management=True
)
# Restore cwd after both create_operation and up_operation have run.
# Both need the relative stack path to resolve from repo_root.
os.chdir(prev_cwd)
print("\n=== Restart Complete ===")
print("Deployment restarted with git-tracked configuration.")
print("Deployment updated via rolling update.")
if new_hostname and new_hostname != current_hostname:
print(f"\nNew hostname: {new_hostname}")
print("Caddy will automatically provision TLS certificate.")
def _restart_with_maintenance(
ctx, deployment_context, maintenance_svc, image_overrides
):
"""Restart with Ingress swap to maintenance service during Recreate.
Flow:
1. Deploy all pods (including maintenance pod) with up_operation
2. Patch Ingress: swap all route backends to maintenance service
3. Scale main (non-maintenance) Deployments to 0
4. Scale main Deployments back up (triggers Recreate with new spec)
5. Wait for readiness
6. Patch Ingress: restore original backends
This ensures the maintenance pod is already running before we touch
the Ingress, and the main pods get a clean Recreate.
"""
import time
from kubernetes.client.exceptions import ApiException
from stack_orchestrator.deploy.deploy import up_operation
# Step 1: Apply the full deployment (creates/updates all pods + services)
# This ensures maintenance pod exists before we swap Ingress to it.
up_operation(
ctx,
services_list=None,
stay_attached=False,
skip_cluster_management=True,
image_overrides=image_overrides or None,
)
# Parse maintenance service spec: "container-name:port"
maint_container = maintenance_svc.split(":")[0]
maint_port = int(maintenance_svc.split(":")[1])
# Connect to k8s API
deploy_ctx = ctx.obj
deployer = deploy_ctx.deployer
deployer.connect_api()
namespace = deployer.k8s_namespace
app_name = deployer.cluster_info.app_name
networking_api = deployer.networking_api
apps_api = deployer.apps_api
ingress_name = f"{app_name}-ingress"
# Step 2: Read current Ingress and save original backends
try:
ingress = networking_api.read_namespaced_ingress(
name=ingress_name, namespace=namespace
)
except ApiException:
print("Warning: No Ingress found, skipping maintenance swap")
return
# Resolve which service the maintenance container belongs to
maint_service_name = deployer.cluster_info._resolve_service_name_for_container(
maint_container
)
# Save original backends for restoration
original_backends = []
for rule in ingress.spec.rules:
rule_backends = []
for path in rule.http.paths:
rule_backends.append(
{
"name": path.backend.service.name,
"port": path.backend.service.port.number,
}
)
original_backends.append(rule_backends)
# Patch all Ingress backends to point to maintenance service
print("Swapping Ingress to maintenance service...")
for rule in ingress.spec.rules:
for path in rule.http.paths:
path.backend.service.name = maint_service_name
path.backend.service.port.number = maint_port
networking_api.replace_namespaced_ingress(
name=ingress_name, namespace=namespace, body=ingress
)
print("Ingress now points to maintenance service")
# Step 3: Find main (non-maintenance) Deployments and scale to 0
# then back up to trigger a clean Recreate
deployments_resp = apps_api.list_namespaced_deployment(
namespace=namespace, label_selector=f"app={app_name}"
)
main_deployments = []
for dep in deployments_resp.items:
dep_name = dep.metadata.name
# Skip maintenance deployments
component = (dep.metadata.labels or {}).get("app.kubernetes.io/component", "")
is_maintenance = maint_container in component
if not is_maintenance:
main_deployments.append(dep_name)
if main_deployments:
# Scale down main deployments
for dep_name in main_deployments:
print(f"Scaling down {dep_name}...")
apps_api.patch_namespaced_deployment_scale(
name=dep_name,
namespace=namespace,
body={"spec": {"replicas": 0}},
)
# Wait for pods to terminate
print("Waiting for main pods to terminate...")
deadline = time.monotonic() + 120
while time.monotonic() < deadline:
pods = deployer.core_api.list_namespaced_pod(
namespace=namespace,
label_selector=f"app={app_name}",
)
# Count non-maintenance pods
active = sum(
1
for p in pods.items
if p.metadata
and p.metadata.deletion_timestamp is None
and not any(
maint_container in (c.name or "") for c in (p.spec.containers or [])
)
)
if active == 0:
break
time.sleep(2)
# Scale back up
replicas = deployment_context.spec.get_replicas()
for dep_name in main_deployments:
print(f"Scaling up {dep_name} to {replicas} replicas...")
apps_api.patch_namespaced_deployment_scale(
name=dep_name,
namespace=namespace,
body={"spec": {"replicas": replicas}},
)
# Step 5: Wait for readiness
print("Waiting for main pods to become ready...")
deadline = time.monotonic() + 300
while time.monotonic() < deadline:
all_ready = True
for dep_name in main_deployments:
dep = apps_api.read_namespaced_deployment(
name=dep_name, namespace=namespace
)
ready = dep.status.ready_replicas or 0
desired = dep.spec.replicas or 1
if ready < desired:
all_ready = False
break
if all_ready:
break
time.sleep(5)
# Step 6: Restore original Ingress backends
print("Restoring original Ingress backends...")
ingress = networking_api.read_namespaced_ingress(
name=ingress_name, namespace=namespace
)
for i, rule in enumerate(ingress.spec.rules):
for j, path in enumerate(rule.http.paths):
if i < len(original_backends) and j < len(original_backends[i]):
path.backend.service.name = original_backends[i][j]["name"]
path.backend.service.port.number = original_backends[i][j]["port"]
networking_api.replace_namespaced_ingress(
name=ingress_name, namespace=namespace, body=ingress
)
print("Ingress restored to original backends")
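The heart of the maintenance swap above is saving the Ingress backends as a nested list (one inner list per rule, preserving path order) and restoring them by index afterwards. A minimal sketch of that round-trip, using plain dicts as stand-ins for the kubernetes client objects (the dict shapes here are hypothetical, not the real API types):

```python
# Save/patch/restore of ingress backends, sketched with plain dicts.

def save_backends(rules):
    # One inner list per rule, preserving path order.
    return [
        [{"name": p["name"], "port": p["port"]} for p in rule["paths"]]
        for rule in rules
    ]

def point_all_at(rules, name, port):
    # Swap every route backend to the maintenance service.
    for rule in rules:
        for p in rule["paths"]:
            p["name"], p["port"] = name, port

def restore_backends(rules, saved):
    # Index-based restore; bounds-checked like the code above.
    for i, rule in enumerate(rules):
        for j, p in enumerate(rule["paths"]):
            if i < len(saved) and j < len(saved[i]):
                p["name"] = saved[i][j]["name"]
                p["port"] = saved[i][j]["port"]

rules = [{"paths": [{"name": "app-svc", "port": 8080}]}]
saved = save_backends(rules)
point_all_at(rules, "maint-svc", 8081)
restore_backends(rules, saved)
# rules[0]["paths"][0] is back to {"name": "app-svc", "port": 8080}
```

Because `save_backends` copies the name/port values rather than keeping references, mutating the live rules during the swap cannot corrupt the saved state.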



@@ -577,7 +577,9 @@ def _generate_and_store_secrets(config_vars: dict, deployment_name: str):
return secrets
def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
def create_registry_secret(
spec: Spec, deployment_name: str, namespace: str = "default"
) -> Optional[str]:
"""Create K8s docker-registry secret from spec + environment.
Reads registry configuration from spec.yml and creates a Kubernetes
@@ -586,6 +588,7 @@ def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
Args:
spec: The deployment spec containing image-registry config
deployment_name: Name of the deployment (used for secret naming)
namespace: K8s namespace to create the secret in
Returns:
The secret name if created, None if no registry config
@@ -599,16 +602,29 @@ def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
server = registry_config.get("server")
username = registry_config.get("username")
token_env = registry_config.get("token-env")
token_file = registry_config.get("token-file")
if not all([server, username, token_env]):
if not server or not username:
return None
if not token_env and not token_file:
return None
# Type narrowing for pyright - we've validated these aren't None above
assert token_env is not None
token = os.environ.get(token_env)
# Resolve token: file takes precedence over env var
token = None
if token_file:
token_path = os.path.expanduser(token_file)
if os.path.exists(token_path):
with open(token_path) as f:
token = f.read().strip()
else:
print(f"Warning: Registry token file '{token_path}' not found")
if not token and token_env:
token = os.environ.get(token_env)
if not token:
source = token_file or token_env
print(
f"Warning: Registry token env var '{token_env}' not set, "
f"Warning: Registry token not available from '{source}', "
"skipping registry secret"
)
return None
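The new token resolution order (file first, then environment variable) can be restated as a small standalone function. This is an illustrative sketch, not the shipped helper; the function name is made up:

```python
import os

def resolve_token(token_file=None, token_env=None):
    """Hypothetical restatement of the resolution order above:
    a readable, non-empty token file wins over the env var."""
    token = None
    if token_file:
        path = os.path.expanduser(token_file)
        if os.path.exists(path):
            with open(path) as f:
                token = f.read().strip()
    if not token and token_env:
        token = os.environ.get(token_env)
    return token
```

Note the `if not token` fall-through: an empty or missing token file still lets the env var supply the credential, matching the code above.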
@@ -620,7 +636,7 @@ def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
}
# Secret name derived from deployment name
secret_name = f"{deployment_name}-registry"
secret_name = f"{deployment_name}-image-pull-secret"
# Load kube config
try:
@@ -633,7 +649,6 @@ def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
return None
v1 = client.CoreV1Api()
namespace = "default"
k8s_secret = client.V1Secret(
metadata=client.V1ObjectMeta(name=secret_name),
@@ -675,6 +690,15 @@ def _write_config_file(
# Write non-secret config to config.env (exclude $generate:...$ tokens)
with open(config_env_file, "w") as output_file:
output_file.write(
"# AUTO-GENERATED by laconic-so from spec.yml config section.\n"
"# Source: stack_orchestrator/deploy/deployment_create.py"
" _write_config_file()\n"
"# Do not edit — changes will be overwritten on deploy create"
" or restart.\n"
"# To change config, edit the config section in your spec.yml"
" and redeploy.\n"
)
if config_vars:
for variable_name, variable_value in config_vars.items():
# Skip variables with generate tokens - they go to K8s Secret
@@ -684,6 +708,19 @@
continue
output_file.write(f"{variable_name}={variable_value}\n")
# Append contents of credentials files listed in spec
credentials_files = spec_content.get("credentials-files", []) or []
for cred_path_str in credentials_files:
cred_path = Path(cred_path_str).expanduser()
if not cred_path.exists():
print(f"Error: credentials file does not exist: {cred_path}")
sys.exit(1)
output_file.write(f"# From credentials file: {cred_path_str}\n")
contents = cred_path.read_text()
output_file.write(contents)
if not contents.endswith("\n"):
output_file.write("\n")
def _write_kube_config_file(external_path: Path, internal_path: Path):
if not external_path.exists():
@@ -835,9 +872,7 @@ def create_operation(
# Copy from temp to deployment dir, excluding data volumes
# and backing up changed files.
# Exclude data/* to avoid touching user data volumes.
# Exclude config file to preserve deployment settings
# (XXX breaks passing config vars from spec)
exclude_patterns = ["data", "data/*", constants.config_file_name]
exclude_patterns = ["data", "data/*"]
_safe_copy_tree(
temp_dir, deployment_dir_path, exclude_patterns=exclude_patterns
)
@@ -1032,12 +1067,8 @@ def _write_deployment_files(
for configmap in parsed_spec.get_configmaps():
source_config_dir = resolve_config_dir(stack_name, configmap)
if os.path.exists(source_config_dir):
destination_config_dir = target_dir.joinpath(
"configmaps", configmap
)
copytree(
source_config_dir, destination_config_dir, dirs_exist_ok=True
)
destination_config_dir = target_dir.joinpath("configmaps", configmap)
copytree(source_config_dir, destination_config_dir, dirs_exist_ok=True)
# Copy the job files into the target dir
jobs = get_job_list(parsed_stack)


@@ -82,7 +82,14 @@ class ClusterInfo:
def __init__(self) -> None:
self.parsed_job_yaml_map = {}
def int(self, pod_files: List[str], compose_env_file, deployment_name, spec: Spec, stack_name=""):
def int(
self,
pod_files: List[str],
compose_env_file,
deployment_name,
spec: Spec,
stack_name="",
):
self.parsed_pod_yaml_map = parsed_pod_files_map_from_file_names(pod_files)
# Find the set of images in the pods
self.image_set = images_for_deployment(pod_files)
@@ -160,67 +167,99 @@ class ClusterInfo:
nodeports.append(service)
return nodeports
def _resolve_service_name_for_container(self, container_name: str) -> str:
"""Resolve the k8s Service name that routes to a given container.
For multi-pod stacks, each pod has its own Service. We find which
pod file contains this container and return the corresponding
service name. For single-pod stacks, returns the legacy service name.
"""
pod_files = list(self.parsed_pod_yaml_map.keys())
multi_pod = len(pod_files) > 1
if not multi_pod:
return f"{self.app_name}-service"
for pod_file in pod_files:
pod = self.parsed_pod_yaml_map[pod_file]
if container_name in pod.get("services", {}):
pod_name = self._pod_name_from_file(pod_file)
return f"{self.app_name}-{pod_name}-service"
# Fallback: container not found in any pod file
return f"{self.app_name}-service"
def get_ingress(
self, use_tls=False, certificate=None, cluster_issuer="letsencrypt-prod"
self, use_tls=False, certificates=None, cluster_issuer="letsencrypt-prod"
):
# No ingress for a deployment that has no http-proxy defined, for now
http_proxy_info_list = self.spec.get_http_proxy()
ingress = None
if http_proxy_info_list:
# TODO: handle multiple definitions
http_proxy_info = http_proxy_info_list[0]
if opts.o.debug:
print(f"http-proxy: {http_proxy_info}")
# TODO: good enough parsing for webapp deployment for now
host_name = http_proxy_info["host-name"]
rules = []
tls = (
[
client.V1IngressTLS(
hosts=certificate["spec"]["dnsNames"]
if certificate
else [host_name],
secret_name=certificate["spec"]["secretName"]
if certificate
else f"{self.app_name}-tls",
)
]
if use_tls
else None
)
paths = []
for route in http_proxy_info["routes"]:
path = route["path"]
proxy_to = route["proxy-to"]
tls = [] if use_tls else None
for http_proxy_info in http_proxy_info_list:
if opts.o.debug:
print(f"proxy config: {path} -> {proxy_to}")
# proxy_to has the form <service>:<port>
proxy_to_port = int(proxy_to.split(":")[1])
paths.append(
client.V1HTTPIngressPath(
path_type="Prefix",
path=path,
backend=client.V1IngressBackend(
service=client.V1IngressServiceBackend(
# TODO: this looks wrong
name=f"{self.app_name}-service",
# TODO: pull port number from the service
port=client.V1ServiceBackendPort(number=proxy_to_port),
)
),
print(f"http-proxy: {http_proxy_info}")
host_name = http_proxy_info["host-name"]
certificate = (certificates or {}).get(host_name)
if use_tls:
tls.append(
client.V1IngressTLS(
hosts=(
certificate["spec"]["dnsNames"]
if certificate
else [host_name]
),
secret_name=(
certificate["spec"]["secretName"]
if certificate
else f"{self.app_name}-{host_name}-tls"
),
)
)
paths = []
for route in http_proxy_info["routes"]:
path = route["path"]
proxy_to = route["proxy-to"]
if opts.o.debug:
print(f"proxy config: {path} -> {proxy_to}")
# proxy_to has the form <service>:<port>
container_name = proxy_to.split(":")[0]
proxy_to_port = int(proxy_to.split(":")[1])
service_name = self._resolve_service_name_for_container(
container_name
)
paths.append(
client.V1HTTPIngressPath(
path_type="Prefix",
path=path,
backend=client.V1IngressBackend(
service=client.V1IngressServiceBackend(
name=service_name,
port=client.V1ServiceBackendPort(
number=proxy_to_port
),
)
),
)
)
rules.append(
client.V1IngressRule(
host=host_name,
http=client.V1HTTPIngressRuleValue(paths=paths),
)
)
rules.append(
client.V1IngressRule(
host=host_name, http=client.V1HTTPIngressRuleValue(paths=paths)
)
)
spec = client.V1IngressSpec(tls=tls, rules=rules)
ingress_annotations = {
"kubernetes.io/ingress.class": "caddy",
}
if not certificate:
if not certificates:
ingress_annotations["cert-manager.io/cluster-issuer"] = cluster_issuer
ingress = client.V1Ingress(
@@ -233,6 +272,28 @@ class ClusterInfo:
)
return ingress
def _get_readiness_probe_ports(self) -> dict:
"""Map container names to TCP readiness probe ports.
Derives probe ports from http-proxy routes in the spec. If a container
has an http-proxy route (proxy-to: container:port), we probe that port.
This tells k8s when the container is ready to serve traffic, which is
required for safe rolling updates.
"""
probe_ports: dict = {}
http_proxy_list = self.spec.get_http_proxy()
if http_proxy_list:
for http_proxy in http_proxy_list:
for route in http_proxy.get("routes", []):
proxy_to = route.get("proxy-to", "")
if ":" in proxy_to:
container, port_str = proxy_to.rsplit(":", 1)
port = int(port_str)
# Use the first route's port for each container
if container not in probe_ports:
probe_ports[container] = port
return probe_ports
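The `_get_readiness_probe_ports` parsing above is first-route-wins per container. A short sketch with hypothetical route data shows the resulting map:

```python
# Hypothetical http-proxy routes; the first route seen for a container wins.
routes = [
    {"proxy-to": "dumpster:8080"},
    {"proxy-to": "dumpster:9090"},  # ignored: dumpster already mapped
    {"proxy-to": "web:3000"},
]
probe_ports = {}
for route in routes:
    proxy_to = route.get("proxy-to", "")
    if ":" in proxy_to:
        container, port_str = proxy_to.rsplit(":", 1)
        # setdefault keeps the first port seen for each container
        probe_ports.setdefault(container, int(port_str))
print(probe_ports)  # {'dumpster': 8080, 'web': 3000}
```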
# TODO: support multiple services
def get_service(self):
# Collect all ports from http-proxy routes
@@ -288,8 +349,7 @@ class ClusterInfo:
# Per-volume resources override global, which overrides default.
vol_resources = (
self.spec.get_volume_resources_for(volume_name)
or global_resources
self.spec.get_volume_resources_for(volume_name) or global_resources
)
labels = {
@@ -329,6 +389,7 @@ class ClusterInfo:
print(f"{cfg_map_name} not in pod files")
continue
cfg_map_path = os.path.expanduser(cfg_map_path)
if not cfg_map_path.startswith("/") and self.spec.file_path is not None:
cfg_map_path = os.path.join(
os.path.dirname(str(self.spec.file_path)), cfg_map_path
@@ -391,12 +452,15 @@ class ClusterInfo:
continue
vol_resources = (
self.spec.get_volume_resources_for(volume_name)
or global_resources
self.spec.get_volume_resources_for(volume_name) or global_resources
)
if self.spec.is_kind_deployment():
host_path = client.V1HostPathVolumeSource(
path=get_kind_pv_bind_mount_path(volume_name)
path=get_kind_pv_bind_mount_path(
volume_name,
kind_mount_root=self.spec.get_kind_mount_root(),
host_path=volume_path,
)
)
else:
host_path = client.V1HostPathVolumeSource(path=volume_path)
@@ -467,6 +531,7 @@ class ClusterInfo:
containers = []
init_containers = []
services = {}
readiness_probe_ports = self._get_readiness_probe_ports()
global_resources = self.spec.get_container_resources()
if not global_resources:
global_resources = DEFAULT_CONTAINER_RESOURCES
@@ -527,9 +592,7 @@ class ClusterInfo:
if self.spec.get_image_registry() is not None
else image
)
volume_mounts = volume_mounts_for_service(
parsed_yaml_map, service_name
)
volume_mounts = volume_mounts_for_service(parsed_yaml_map, service_name)
# Handle command/entrypoint from compose file
# In docker-compose: entrypoint -> k8s command, command -> k8s args
container_command = None
@@ -565,6 +628,16 @@ class ClusterInfo:
container_resources = self._resolve_container_resources(
container_name, service_info, global_resources
)
# Readiness probe from http-proxy routes
readiness_probe = None
probe_port = readiness_probe_ports.get(container_name)
if probe_port:
readiness_probe = client.V1Probe(
tcp_socket=client.V1TCPSocketAction(port=probe_port),
initial_delay_seconds=5,
period_seconds=10,
failure_threshold=3,
)
container = client.V1Container(
name=container_name,
image=image_to_use,
@@ -575,14 +648,19 @@ class ClusterInfo:
env_from=env_from,
ports=container_ports if container_ports else None,
volume_mounts=volume_mounts,
readiness_probe=readiness_probe,
security_context=client.V1SecurityContext(
privileged=self.spec.get_privileged(),
run_as_user=int(service_info["user"]) if "user" in service_info else None,
capabilities=client.V1Capabilities(
add=self.spec.get_capabilities()
)
if self.spec.get_capabilities()
else None,
run_as_user=(
int(service_info["user"])
if "user" in service_info
else None
),
capabilities=(
client.V1Capabilities(add=self.spec.get_capabilities())
if self.spec.get_capabilities()
else None
),
),
resources=to_k8s_resource_requirements(container_resources),
)
@@ -591,33 +669,53 @@ class ClusterInfo:
svc_labels = service_info.get("labels", {})
if isinstance(svc_labels, list):
# docker-compose labels can be a list of "key=value"
svc_labels = dict(
item.split("=", 1) for item in svc_labels
)
is_init = str(
svc_labels.get("laconic.init-container", "")
).lower() in ("true", "1", "yes")
svc_labels = dict(item.split("=", 1) for item in svc_labels)
is_init = str(svc_labels.get("laconic.init-container", "")).lower() in (
"true",
"1",
"yes",
)
if is_init:
init_containers.append(container)
else:
containers.append(container)
volumes = volumes_for_pod_files(
parsed_yaml_map, self.spec, self.app_name
)
volumes = volumes_for_pod_files(parsed_yaml_map, self.spec, self.app_name)
return containers, init_containers, services, volumes
# TODO: put things like image pull policy into an object-scope struct
def get_deployment(self, image_pull_policy: Optional[str] = None):
containers, init_containers, services, volumes = self._build_containers(
self.parsed_pod_yaml_map, image_pull_policy
)
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-registry"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
def _pod_name_from_file(self, pod_file: str) -> str:
"""Extract pod name from compose file path.
docker-compose-dumpster.yml -> dumpster
docker-compose-dumpster-maintenance.yml -> dumpster-maintenance
"""
import os
base = os.path.basename(pod_file)
name = base
if name.startswith("docker-compose-"):
name = name[len("docker-compose-") :]
if name.endswith(".yml"):
name = name[: -len(".yml")]
elif name.endswith(".yaml"):
name = name[: -len(".yaml")]
return name
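The filename-to-pod-name mapping can be exercised standalone. This is a restatement of the helper above for illustration (the free function name is made up):

```python
import os

def pod_name_from_file(pod_file: str) -> str:
    # Same stripping rules as _pod_name_from_file above.
    name = os.path.basename(pod_file)
    if name.startswith("docker-compose-"):
        name = name[len("docker-compose-"):]
    if name.endswith(".yml"):
        name = name[: -len(".yml")]
    elif name.endswith(".yaml"):
        name = name[: -len(".yaml")]
    return name

assert pod_name_from_file("stacks/x/docker-compose-dumpster.yml") == "dumpster"
assert pod_name_from_file("docker-compose-dumpster-maintenance.yaml") == "dumpster-maintenance"
```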
def _pod_has_pvcs(self, parsed_pod_file: Any) -> bool:
"""Check if a parsed compose file declares volumes that become PVCs.
Excludes volumes that are ConfigMaps (declared in spec.configmaps),
since those don't require Recreate strategy.
"""
volumes = parsed_pod_file.get("volumes", {})
configmaps = set(self.spec.get_configmaps().keys())
pvc_volumes = [v for v in volumes if v not in configmaps]
return len(pvc_volumes) > 0
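The PVC check above drives the strategy choice in `get_deployments`: compose volumes declared as configmaps in the spec do not force Recreate. A sketch with hypothetical volume names:

```python
# Hypothetical compose volumes and spec configmaps.
volumes = {"dumpster-data": {}, "app-config": {}}
configmaps = {"app-config"}  # declared in spec.configmaps

# Volumes not backed by ConfigMaps become PVCs.
pvc_volumes = [v for v in volumes if v not in configmaps]

# PVC-backed pods cannot roll (ReadWriteOnce), so they get Recreate.
strategy = "Recreate" if pvc_volumes else "RollingUpdate"
print(strategy)  # Recreate: dumpster-data is a real PVC
```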
def _build_common_pod_metadata(self, services: dict) -> tuple:
"""Build shared annotations, labels, affinity, tolerations for pods.
Returns (annotations, labels, affinity, tolerations).
"""
annotations = None
labels = {"app": self.app_name}
if self.stack_name:
@@ -639,7 +737,6 @@ class ClusterInfo:
if self.spec.get_node_affinities():
affinities = []
for rule in self.spec.get_node_affinities():
# TODO add some input validation here
label_name = rule["label"]
label_value = rule["value"]
affinities.append(
@@ -662,7 +759,6 @@ class ClusterInfo:
if self.spec.get_node_tolerations():
tolerations = []
for toleration in self.spec.get_node_tolerations():
# TODO add some input validation here
toleration_key = toleration["key"]
toleration_value = toleration["value"]
tolerations.append(
@@ -674,37 +770,224 @@ class ClusterInfo:
)
)
use_host_network = self._any_service_has_host_network()
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
spec=client.V1PodSpec(
containers=containers,
init_containers=init_containers or None,
image_pull_secrets=image_pull_secrets,
volumes=volumes,
affinity=affinity,
tolerations=tolerations,
runtime_class_name=self.spec.get_runtime_class(),
host_network=use_host_network or None,
dns_policy=("ClusterFirstWithHostNet" if use_host_network else None),
),
)
spec = client.V1DeploymentSpec(
replicas=self.spec.get_replicas(),
template=template,
selector={"matchLabels": {"app": self.app_name}},
)
return annotations, labels, affinity, tolerations
deployment = client.V1Deployment(
api_version="apps/v1",
kind="Deployment",
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-deployment",
labels={"app": self.app_name, **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {})},
),
spec=spec,
)
return deployment
# TODO: put things like image pull policy into an object-scope struct
def get_deployment(self, image_pull_policy: Optional[str] = None):
"""Build a single k8s Deployment from all pod files (legacy behavior).
When only one pod is defined in the stack, this is equivalent to
get_deployments()[0]. Kept for backward compatibility.
"""
deployments = self.get_deployments(image_pull_policy)
if not deployments:
return None
# Legacy: return the first (and usually only) deployment
return deployments[0]
def get_deployments(
self, image_pull_policy: Optional[str] = None
) -> List[client.V1Deployment]:
"""Build one k8s Deployment per pod file.
Each pod file (docker-compose-<name>.yml) becomes its own Deployment
with independent lifecycle and update strategy:
- Pods with PVCs get strategy=Recreate (can't do rolling updates
with ReadWriteOnce volumes)
- Pods without PVCs get strategy=RollingUpdate
This enables maintenance services to survive main pod restarts.
"""
if not self.parsed_pod_yaml_map:
return []
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-image-pull-secret"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
use_host_network = self._any_service_has_host_network()
pod_files = list(self.parsed_pod_yaml_map.keys())
# Single pod file: preserve legacy naming ({app_name}-deployment)
# Multiple pod files: use {app_name}-{pod_name}-deployment
multi_pod = len(pod_files) > 1
deployments = []
for pod_file in pod_files:
pod_name = self._pod_name_from_file(pod_file)
single_pod_map = {pod_file: self.parsed_pod_yaml_map[pod_file]}
containers, init_containers, services, volumes = self._build_containers(
single_pod_map, image_pull_policy
)
annotations, labels, affinity, tolerations = (
self._build_common_pod_metadata(services)
)
# Add pod-name label so Services can target specific pods
if multi_pod:
labels["app.kubernetes.io/component"] = pod_name
has_pvcs = self._pod_has_pvcs(self.parsed_pod_yaml_map[pod_file])
if has_pvcs:
strategy = client.V1DeploymentStrategy(type="Recreate")
else:
strategy = client.V1DeploymentStrategy(
type="RollingUpdate",
rolling_update=client.V1RollingUpdateDeployment(
max_unavailable=0, max_surge=1
),
)
# Pod selector: for multi-pod, select by both app and component
selector_labels = {"app": self.app_name}
if multi_pod:
selector_labels["app.kubernetes.io/component"] = pod_name
# Add CA certificate volume and env vars if configured
_ca_secret, ca_volume, ca_mounts, ca_envs = (
self.get_ca_certificate_resources()
)
if ca_volume:
volumes.append(ca_volume)
for container in containers:
if container.volume_mounts is None:
container.volume_mounts = []
container.volume_mounts.extend(ca_mounts)
if container.env is None:
container.env = []
container.env.extend(ca_envs)
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
spec=client.V1PodSpec(
containers=containers,
init_containers=init_containers or None,
image_pull_secrets=image_pull_secrets,
volumes=volumes,
affinity=affinity,
tolerations=tolerations,
runtime_class_name=self.spec.get_runtime_class(),
host_network=use_host_network or None,
dns_policy=(
"ClusterFirstWithHostNet" if use_host_network else None
),
),
)
if multi_pod:
deployment_name = f"{self.app_name}-{pod_name}-deployment"
else:
deployment_name = f"{self.app_name}-deployment"
spec = client.V1DeploymentSpec(
replicas=self.spec.get_replicas(),
template=template,
selector={"matchLabels": selector_labels},
strategy=strategy,
)
deployment = client.V1Deployment(
api_version="apps/v1",
kind="Deployment",
metadata=client.V1ObjectMeta(
name=deployment_name,
labels={
"app": self.app_name,
**(
{
"app.kubernetes.io/stack": self.stack_name,
}
if self.stack_name
else {}
),
**(
{"app.kubernetes.io/component": pod_name}
if multi_pod
else {}
),
},
),
spec=spec,
)
deployments.append(deployment)
return deployments
def get_services(self) -> List[client.V1Service]:
"""Build per-pod ClusterIP Services for multi-pod stacks.
Each pod's containers get their own Service so Ingress can route
to specific pods. For single-pod stacks, returns a list with one
service matching the legacy get_service() behavior.
"""
pod_files = list(self.parsed_pod_yaml_map.keys())
multi_pod = len(pod_files) > 1
if not multi_pod:
# Legacy: single service for all pods
svc = self.get_service()
return [svc] if svc else []
# Multi-pod: one service per pod, only for pods that have
# ports referenced by http-proxy routes
http_proxy_list = self.spec.get_http_proxy()
if not http_proxy_list:
return []
# Build map: container_name -> port from http-proxy routes
container_ports: dict = {}
for http_proxy in http_proxy_list:
for route in http_proxy.get("routes", []):
proxy_to = route.get("proxy-to", "")
if ":" in proxy_to:
container, port_str = proxy_to.rsplit(":", 1)
port = int(port_str)
if container not in container_ports:
container_ports[container] = set()
container_ports[container].add(port)
# Build map: pod_file -> set of service names in that pod
pod_services_map: dict = {}
for pod_file in pod_files:
pod = self.parsed_pod_yaml_map[pod_file]
pod_services_map[pod_file] = set(pod.get("services", {}).keys())
services = []
for pod_file in pod_files:
pod_name = self._pod_name_from_file(pod_file)
svc_names = pod_services_map[pod_file]
# Collect ports from http-proxy that belong to this pod's containers
ports_set: Set[int] = set()
for svc_name in svc_names:
if svc_name in container_ports:
ports_set.update(container_ports[svc_name])
if not ports_set:
continue
service_ports = [
client.V1ServicePort(port=p, target_port=p, name=f"port-{p}")
for p in sorted(ports_set)
]
service = client.V1Service(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{pod_name}-service",
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
type="ClusterIP",
ports=service_ports,
selector={
"app": self.app_name,
"app.kubernetes.io/component": pod_name,
},
),
)
services.append(service)
return services
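The port grouping in `get_services` can be sketched with hypothetical pod and route data: only pods whose containers appear in an http-proxy route produce a Service, and each Service carries the union of its containers' routed ports.

```python
# Hypothetical inputs: routed ports per container, containers per pod.
container_ports = {"dumpster": {8080, 9090}, "ui": {3000}}
pod_services = {
    "dumpster": {"dumpster"},
    "ui": {"ui", "helper"},  # helper has no route: contributes no ports
    "worker": {"worker"},    # no routed ports at all: no Service emitted
}

services = {}
for pod_name, svc_names in pod_services.items():
    ports = set()
    for svc in svc_names:
        ports.update(container_ports.get(svc, set()))
    if ports:
        services[f"app-{pod_name}-service"] = sorted(ports)
print(services)
# {'app-dumpster-service': [8080, 9090], 'app-ui-service': [3000]}
```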
def get_jobs(self, image_pull_policy: Optional[str] = None) -> List[client.V1Job]:
"""Build k8s Job objects from parsed job compose files.
@@ -720,7 +1003,7 @@ class ClusterInfo:
jobs = []
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-registry"
secret_name = f"{self.app_name}-image-pull-secret"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
@@ -728,8 +1011,8 @@ class ClusterInfo:
for job_file in self.parsed_job_yaml_map:
# Build containers for this single job file
single_job_map = {job_file: self.parsed_job_yaml_map[job_file]}
containers, init_containers, _services, volumes = (
self._build_containers(single_job_map, image_pull_policy)
containers, init_containers, _services, volumes = self._build_containers(
single_job_map, image_pull_policy
)
# Derive job name from file path: docker-compose-<name>.yml -> <name>
@@ -737,7 +1020,7 @@ class ClusterInfo:
# Strip docker-compose- prefix and .yml suffix
job_name = base
if job_name.startswith("docker-compose-"):
job_name = job_name[len("docker-compose-"):]
job_name = job_name[len("docker-compose-") :]
if job_name.endswith(".yml"):
job_name = job_name[: -len(".yml")]
elif job_name.endswith(".yaml"):
@@ -747,12 +1030,14 @@ class ClusterInfo:
# picked up by pods_in_deployment() which queries app={app_name}.
pod_labels = {
"app": f"{self.app_name}-job",
**({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {}),
**(
{"app.kubernetes.io/stack": self.stack_name}
if self.stack_name
else {}
),
}
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(
labels=pod_labels
),
metadata=client.V1ObjectMeta(labels=pod_labels),
spec=client.V1PodSpec(
containers=containers,
init_containers=init_containers or None,
@@ -765,7 +1050,14 @@ class ClusterInfo:
template=template,
backoff_limit=0,
)
job_labels = {"app": self.app_name, **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {})}
job_labels = {
"app": self.app_name,
**(
{"app.kubernetes.io/stack": self.stack_name}
if self.stack_name
else {}
),
}
job = client.V1Job(
api_version="batch/v1",
kind="Job",
@@ -778,3 +1070,130 @@ class ClusterInfo:
jobs.append(job)
return jobs
def get_external_service_resources(self) -> List:
"""Build k8s Services (and Endpoints) for external-services in spec.
Two modes:
- host mode: ExternalName Service (DNS CNAME to external host)
- selector mode: headless Service + Endpoints (cross-namespace
routing to a mock pod, IP discovered at deploy time)
Returns a flat list of k8s resource objects (Services + Endpoints).
"""
ext_services = self.spec.get_external_services()
if not ext_services:
return []
resources = []
for name, config in ext_services.items():
port = config.get("port", 443)
if "host" in config:
# ExternalName: DNS CNAME to external host
svc = client.V1Service(
metadata=client.V1ObjectMeta(
name=name,
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
type="ExternalName",
external_name=config["host"],
ports=[
client.V1ServicePort(port=port, name=f"port-{port}")
],
),
)
resources.append(svc)
elif "selector" in config and "namespace" in config:
# Cross-namespace headless Service + Endpoints.
# The Endpoints IP is populated in deploy_k8s.py at deploy
# time by querying the target namespace for matching pods.
svc = client.V1Service(
metadata=client.V1ObjectMeta(
name=name,
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
cluster_ip="None",
ports=[
client.V1ServicePort(port=port, name=f"port-{port}")
],
),
)
resources.append(svc)
# Endpoints object is created in deploy_k8s.py after pod
# IP discovery — we just return the Service here.
return resources
def get_ca_certificate_resources(self) -> tuple:
"""Build k8s Secret and volume mount config for CA certificates.
Returns (secret, volume, volume_mount, env_vars) or (None, ...) if
no CA certificates are configured. The caller must add the volume
and mount to all containers, and the env vars to all containers.
"""
ca_files = self.spec.get_ca_certificates()
if not ca_files:
return None, None, None, []
# Concatenate all CA files into one Secret
secret_data = {}
for i, ca_path in enumerate(ca_files):
expanded = os.path.expanduser(ca_path)
if not os.path.exists(expanded):
print(f"Warning: CA certificate file not found: {expanded}")
continue
with open(expanded, "rb") as f:
ca_bytes = f.read()
key = f"laconic-extra-ca-{i}.pem"
secret_data[key] = base64.b64encode(ca_bytes).decode()
if not secret_data:
return None, None, None, []
secret_name = f"{self.app_name}-ca-certificates"
secret = client.V1Secret(
metadata=client.V1ObjectMeta(
name=secret_name,
labels={"app": self.app_name},
),
data=secret_data,
)
volume = client.V1Volume(
name="laconic-ca-certs",
secret=client.V1SecretVolumeSource(
secret_name=secret_name,
),
)
# Mount each CA file into /etc/ssl/certs/ (Go reads this dir)
# Mount each CA file directly into /etc/ssl/certs/ using subPath
# so Go's x509 package picks them up (it reads *.pem from that dir).
# Also return env vars for Node/Bun containers.
volume_mounts = []
first_mount_path = None
for key in secret_data.keys():
mount_path = f"/etc/ssl/certs/{key}"
if first_mount_path is None:
first_mount_path = mount_path
volume_mounts.append(
client.V1VolumeMount(
name="laconic-ca-certs",
mount_path=mount_path,
sub_path=key,
read_only=True,
)
)
env_vars = [
client.V1EnvVar(
name="NODE_EXTRA_CA_CERTS",
value=first_mount_path,
),
]
return secret, volume, volume_mounts, env_vars
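As a sanity check of the subPath layout, here is the mount list for two CA files, with plain dicts standing in for the kubernetes client objects (the key names mirror the code above; this is an illustration, not the real `V1VolumeMount` objects):

```python
# Plain-dict sketch of the per-file subPath mounts built above.
# Each Secret key becomes a single file under /etc/ssl/certs/,
# leaving the distro's existing CA bundle in place.
secret_data = {f"laconic-extra-ca-{i}.pem": "<base64-cert>" for i in range(2)}

volume_mounts = [
    {
        "name": "laconic-ca-certs",
        "mountPath": f"/etc/ssl/certs/{key}",  # file-level target path
        "subPath": key,  # mount only this key, not the whole Secret dir
        "readOnly": True,
    }
    for key in secret_data
]

print([m["mountPath"] for m in volume_mounts])
```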


@ -115,6 +115,7 @@ class K8sDeployer(Deployer):
) -> None:
self.type = type
self.skip_cluster_management = False
self.image_overrides = None
self.k8s_namespace = "default" # Will be overridden below if context exists
# TODO: workaround pending refactoring above to cope with being
# created with a null deployment_context
@ -122,9 +123,13 @@ class K8sDeployer(Deployer):
return
self.deployment_dir = deployment_context.deployment_dir
self.deployment_context = deployment_context
self.kind_cluster_name = (
deployment_context.spec.get_kind_cluster_name() or compose_project_name
)
# Use spec namespace if provided, otherwise derive from cluster-id
self.k8s_namespace = (
deployment_context.spec.get_namespace() or f"laconic-{compose_project_name}"
)
self.cluster_info = ClusterInfo()
# stack.name may be an absolute path (from spec "stack:" key after
# path resolution). Extract just the directory basename for labels.
@ -204,6 +209,43 @@ class K8sDeployer(Deployer):
else:
raise
def _wait_for_namespace_gone(self, timeout_seconds: int = 120):
"""Wait for namespace to finish terminating."""
if opts.o.dry_run:
return
import time
deadline = time.monotonic() + timeout_seconds
while time.monotonic() < deadline:
try:
ns = self.core_api.read_namespace(name=self.k8s_namespace)
if ns.status and ns.status.phase == "Terminating":
if opts.o.debug:
print(
f"Waiting for namespace {self.k8s_namespace}"
" to finish terminating..."
)
time.sleep(2)
continue
# Namespace exists and is Active — shouldn't happen after delete
break
except ApiException as e:
if e.status == 404:
# Gone — success
return
raise
# If we get here, namespace still exists after timeout
try:
self.core_api.read_namespace(name=self.k8s_namespace)
print(
f"Warning: namespace {self.k8s_namespace} still exists"
f" after {timeout_seconds}s"
)
except ApiException as e:
if e.status == 404:
return
raise
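The deadline loop in `_wait_for_namespace_gone` follows a generic poll-until pattern. A minimal standalone version (the `wait_until` helper and the simulated 404 check are hypothetical, not part of the codebase):

```python
import time

def wait_until(predicate, timeout_seconds=5.0, interval=0.01):
    # Same shape as _wait_for_namespace_gone: poll until predicate()
    # succeeds or the monotonic deadline passes, then report the result.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())

calls = {"n": 0}

def namespace_gone():
    # Stand-in for the read_namespace 404 check: "gone" on the third poll.
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(namespace_gone))
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments.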
def _delete_resources_by_label(self, label_selector: str, delete_volumes: bool):
"""Delete only this stack's resources from a shared namespace."""
ns = self.k8s_namespace
@ -232,7 +274,8 @@ class K8sDeployer(Deployer):
for job in jobs.items:
print(f"Deleting Job {job.metadata.name}")
self.batch_api.delete_namespaced_job(
name=job.metadata.name,
namespace=ns,
body=client.V1DeleteOptions(propagation_policy="Background"),
)
except ApiException as e:
@ -303,7 +346,22 @@ class K8sDeployer(Deployer):
name=pv.metadata.name
)
if pv_resp:
# If PV is in Released state (stale claimRef from a
# previous deployment), clear the claimRef so a new
# PVC can bind to it. This happens after stop+start
# because stop deletes the namespace (and PVCs) but
# preserves PVs by default.
if pv_resp.status and pv_resp.status.phase == "Released":
print(
f"PV {pv.metadata.name} is Released, "
"clearing claimRef for rebinding"
)
pv_resp.spec.claim_ref = None
self.core_api.patch_persistent_volume(
name=pv.metadata.name,
body={"spec": {"claimRef": None}},
)
elif opts.o.debug:
print("PVs already present:")
print(f"{pv_resp}")
continue
@ -347,12 +405,148 @@ class K8sDeployer(Deployer):
if opts.o.debug:
print(f"Sending this ConfigMap: {cfg_map}")
if not opts.o.dry_run:
cm_name = cfg_map.metadata.name
try:
self.core_api.create_namespaced_config_map(
body=cfg_map, namespace=self.k8s_namespace
)
except ApiException as e:
if e.status == 409:
self.core_api.patch_namespaced_config_map(
name=cm_name,
namespace=self.k8s_namespace,
body=cfg_map,
)
else:
raise
def _create_external_services(self):
"""Create k8s Services for external-services declared in the spec.
For host mode: ExternalName Service (DNS CNAME).
For selector mode: headless Service + Endpoints with pod IPs
discovered from the target namespace.
"""
resources = self.cluster_info.get_external_service_resources()
ext_services = self.cluster_info.spec.get_external_services()
for resource in resources:
if opts.o.dry_run:
print(f"Dry run: would create external service: {resource.metadata.name}")
continue
svc_name = resource.metadata.name
try:
self.core_api.create_namespaced_service(
body=resource, namespace=self.k8s_namespace
)
print(f"Created external service '{svc_name}'")
except ApiException as e:
if e.status == 409:
self.core_api.replace_namespaced_service(
name=svc_name,
namespace=self.k8s_namespace,
body=resource,
)
print(f"Updated external service '{svc_name}'")
else:
raise
# Create Endpoints for selector-mode services
for name, config in ext_services.items():
if "selector" not in config or "namespace" not in config:
continue
if opts.o.dry_run:
continue
target_ns = config["namespace"]
selector = config["selector"]
port = config.get("port", 443)
# Build label selector string from dict
label_selector = ",".join(f"{k}={v}" for k, v in selector.items())
# Discover pod IPs in target namespace
pods = self.core_api.list_namespaced_pod(
namespace=target_ns, label_selector=label_selector
)
pod_ips = [
p.status.pod_ip
for p in pods.items
if p.status and p.status.pod_ip
]
if not pod_ips:
print(
f"Warning: no pods found in {target_ns} matching "
f"{label_selector} for external service '{name}'"
)
continue
endpoints = client.V1Endpoints(
metadata=client.V1ObjectMeta(
name=name,
labels={"app": self.cluster_info.app_name},
),
subsets=[
client.V1EndpointSubset(
addresses=[
client.V1EndpointAddress(ip=ip) for ip in pod_ips
],
ports=[
client.CoreV1EndpointPort(
port=port, name=f"port-{port}"
)
],
)
],
)
try:
self.core_api.create_namespaced_endpoints(
body=endpoints, namespace=self.k8s_namespace
)
print(f"Created endpoints for '{name}': {pod_ips}")
except ApiException as e:
if e.status == 409:
self.core_api.replace_namespaced_endpoints(
name=name,
namespace=self.k8s_namespace,
body=endpoints,
)
print(f"Updated endpoints for '{name}': {pod_ips}")
else:
raise
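For selector-mode entries, the selector dict is flattened into the comma-separated string form that `list_namespaced_pod` expects. A quick illustration (the selector values are hypothetical):

```python
# Hypothetical selector taken from an external-services spec entry.
selector = {"app": "mock", "tier": "s3"}

# Same join as in the endpoint-discovery code above; Python dicts preserve
# insertion order, so the resulting selector string is deterministic.
label_selector = ",".join(f"{k}={v}" for k, v in selector.items())
print(label_selector)  # app=mock,tier=s3
```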
def _create_ca_certificates(self):
"""Create k8s Secret for CA certificates declared in the spec.
The Secret is mounted into containers by get_deployments() in
cluster_info.py. This method just ensures the Secret exists.
"""
ca_secret, _, _, _ = self.cluster_info.get_ca_certificate_resources()
if not ca_secret:
return
if opts.o.dry_run:
print(f"Dry run: would create CA certificate secret")
return
secret_name = ca_secret.metadata.name
try:
self.core_api.create_namespaced_secret(
body=ca_secret, namespace=self.k8s_namespace
)
print(f"Created CA certificate secret '{secret_name}'")
except ApiException as e:
if e.status == 409:
self.core_api.replace_namespaced_secret(
name=secret_name,
namespace=self.k8s_namespace,
body=ca_secret,
)
print(f"Updated CA certificate secret '{secret_name}'")
else:
raise
def _create_deployment(self):
# Skip if there are no pods to deploy (e.g. jobs-only stacks)
@ -360,48 +554,109 @@ class K8sDeployer(Deployer):
if opts.o.debug:
print("No pods defined, skipping Deployment creation")
return
# Process compose files into Deployments (one per pod file)
# image-pull-policy from spec, default Always (production).
# Testing specs use IfNotPresent so kind-loaded local images are used.
pull_policy = self.cluster_info.spec.get("image-pull-policy", "Always")
deployments = self.cluster_info.get_deployments(image_pull_policy=pull_policy)
for deployment in deployments:
# Apply image overrides if provided
if self.image_overrides:
for container in deployment.spec.template.spec.containers:
if container.name in self.image_overrides:
container.image = self.image_overrides[container.name]
if opts.o.debug:
print(
f"Overriding image for {container.name}:"
f" {container.image}"
)
# Create or update the k8s Deployment
if opts.o.debug:
print("Deployment created:")
meta = deployment_resp.metadata
spec = deployment_resp.spec
if meta and spec and spec.template.spec:
ns = meta.namespace
name = meta.name
gen = meta.generation
containers = spec.template.spec.containers
img = containers[0].image if containers else None
print(f"{ns} {name} {gen} {img}")
print(f"Sending this deployment: {deployment}")
if not opts.o.dry_run:
name = deployment.metadata.name
try:
deployment_resp = cast(
client.V1Deployment,
self.apps_api.create_namespaced_deployment(
body=deployment, namespace=self.k8s_namespace
),
)
strategy = (
deployment.spec.strategy.type
if deployment.spec.strategy
else "default"
)
print(f"Created Deployment {name} (strategy: {strategy})")
except ApiException as e:
if e.status == 409:
# Already exists — replace to ensure removed fields
# (volumes, mounts, env vars) are actually deleted.
existing = self.apps_api.read_namespaced_deployment(
name=name, namespace=self.k8s_namespace
)
deployment.metadata.resource_version = (
existing.metadata.resource_version
)
deployment_resp = cast(
client.V1Deployment,
self.apps_api.replace_namespaced_deployment(
name=name,
namespace=self.k8s_namespace,
body=deployment,
),
)
print(f"Updated Deployment {name} (rolling update)")
else:
raise
if opts.o.debug:
meta = deployment_resp.metadata
spec = deployment_resp.spec
if meta and spec and spec.template.spec:
containers = spec.template.spec.containers
img = containers[0].image if containers else None
print(
f" {meta.namespace} {meta.name}"
f" gen={meta.generation} {img}"
)
# Create Services (one per pod for multi-pod, or one for single-pod)
services = self.cluster_info.get_services()
for service in services:
if opts.o.debug:
print("Service created:")
print(f"{service_resp}")
print(f"Sending this service: {service}")
if service and not opts.o.dry_run:
svc_name = service.metadata.name
try:
service_resp = self.core_api.create_namespaced_service(
namespace=self.k8s_namespace, body=service
)
print(f"Created Service {svc_name}")
except ApiException as e:
if e.status == 409:
# Replace to ensure removed ports are deleted.
# Must preserve clusterIP (immutable) and resourceVersion.
existing = self.core_api.read_namespaced_service(
name=svc_name, namespace=self.k8s_namespace
)
service.metadata.resource_version = (
existing.metadata.resource_version
)
service.spec.cluster_ip = existing.spec.cluster_ip
service_resp = self.core_api.replace_namespaced_service(
name=svc_name,
namespace=self.k8s_namespace,
body=service,
)
print(f"Updated Service {svc_name}")
else:
raise
if opts.o.debug:
print(f" {service_resp}")
def _create_jobs(self):
# Process job compose files into k8s Jobs
jobs = self.cluster_info.get_jobs(image_pull_policy="Always")
for job in jobs:
if opts.o.debug:
print(f"Sending this job: {job}")
@ -453,107 +708,149 @@ class K8sDeployer(Deployer):
return cert
return None
def _setup_cluster(self):
"""Create/reuse kind cluster, load images, ensure namespace."""
if self.is_kind() and not self.skip_cluster_management:
kind_config = str(
self.deployment_dir.joinpath(constants.kind_config_filename)
)
actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
if actual_cluster != self.kind_cluster_name:
self.kind_cluster_name = actual_cluster
# Only load locally-built images into kind
local_containers = self.deployment_context.stack.obj.get("containers", [])
if local_containers:
local_images = {
img
for img in self.cluster_info.image_set
if any(c in img for c in local_containers)
}
if local_images:
load_images_into_kind(self.kind_cluster_name, local_images)
self.connect_api()
self._ensure_namespace()
if self.is_kind() and not self.skip_cluster_management:
if not is_ingress_running():
install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
wait_for_ingress_in_kind()
if self.cluster_info.spec.get_unlimited_memlock():
_create_runtime_class(
constants.high_memlock_runtime,
constants.high_memlock_runtime,
)
def _create_ingress(self):
"""Create or update Ingress with TLS certificate lookup."""
http_proxy_info = self.cluster_info.spec.get_http_proxy()
use_tls = http_proxy_info and not self.is_kind()
certificates = None
if use_tls:
certificates = {}
for proxy in http_proxy_info:
host_name = proxy["host-name"]
cert = self._find_certificate_for_host_name(host_name)
if cert:
certificates[host_name] = cert
if opts.o.debug:
print(f"Using existing certificate for {host_name}: {cert}")
ingress = self.cluster_info.get_ingress(
use_tls=use_tls, certificates=certificates
)
if ingress:
if opts.o.debug:
print(f"Sending this ingress: {ingress}")
if not opts.o.dry_run:
ing_name = ingress.metadata.name
try:
self.networking_api.create_namespaced_ingress(
namespace=self.k8s_namespace, body=ingress
)
print(f"Created Ingress {ing_name}")
except ApiException as e:
if e.status == 409:
existing = self.networking_api.read_namespaced_ingress(
name=ing_name, namespace=self.k8s_namespace
)
ingress.metadata.resource_version = (
existing.metadata.resource_version
)
self.networking_api.replace_namespaced_ingress(
name=ing_name,
namespace=self.k8s_namespace,
body=ingress,
)
print(f"Updated Ingress {ing_name}")
else:
raise
else:
if opts.o.debug:
print("No ingress configured")
def _create_nodeports(self):
"""Create or update NodePort services."""
nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
for nodeport in nodeports:
if opts.o.debug:
print(f"Sending this nodeport: {nodeport}")
if not opts.o.dry_run:
np_name = nodeport.metadata.name
try:
self.core_api.create_namespaced_service(
namespace=self.k8s_namespace, body=nodeport
)
except ApiException as e:
if e.status == 409:
existing = self.core_api.read_namespaced_service(
name=np_name, namespace=self.k8s_namespace
)
nodeport.metadata.resource_version = (
existing.metadata.resource_version
)
nodeport.spec.cluster_ip = existing.spec.cluster_ip
self.core_api.replace_namespaced_service(
name=np_name,
namespace=self.k8s_namespace,
body=nodeport,
)
else:
raise
def up(self, detach, skip_cluster_management, services, image_overrides=None):
# Merge spec-level image overrides with CLI overrides
spec_overrides = self.cluster_info.spec.get("image-overrides", {})
if spec_overrides:
if image_overrides:
spec_overrides.update(image_overrides) # CLI wins
image_overrides = spec_overrides
self.image_overrides = image_overrides
self.skip_cluster_management = skip_cluster_management
if not opts.o.dry_run:
self._setup_cluster()
else:
print("Dry run mode enabled, skipping k8s API connect")
# Create registry secret if configured
from stack_orchestrator.deploy.deployment_create import create_registry_secret
create_registry_secret(
self.cluster_info.spec, self.cluster_info.app_name, self.k8s_namespace
)
self._create_volume_data()
self._create_external_services()
self._create_ca_certificates()
self._create_deployment()
self._create_jobs()
self._create_ingress()
self._create_nodeports()
# Call start() hooks — stacks can create additional k8s resources
if self.deployment_context:
from stack_orchestrator.deploy.deployment_create import (
call_stack_deploy_start,
)
call_stack_deploy_start(self.deployment_context)
def down(self, timeout, volumes, skip_cluster_management):
@ -565,9 +862,7 @@ class K8sDeployer(Deployer):
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
if volumes:
try:
pvs = self.core_api.list_persistent_volume(label_selector=app_label)
for pv in pvs.items:
if opts.o.debug:
print(f"Deleting PV: {pv.metadata.name}")
@ -579,14 +874,14 @@ class K8sDeployer(Deployer):
if opts.o.debug:
print(f"Error listing PVs: {e}")
# Delete the namespace to ensure clean slate.
# Resources created by older laconic-so versions lack labels, so
# label-based deletion can't find them. Namespace deletion is the
# only reliable cleanup.
self._delete_namespace()
# Wait for namespace to finish terminating before returning,
# so that up() can recreate it immediately.
self._wait_for_namespace_gone()
if self.is_kind() and not self.skip_cluster_management:
# Destroy the kind cluster
@ -711,14 +1006,18 @@ class K8sDeployer(Deployer):
def logs(self, services, tail, follow, stream):
self.connect_api()
pods = pods_in_deployment(
self.core_api, self.cluster_info.app_name, namespace=self.k8s_namespace
)
if len(pods) > 1:
print("Warning: more than one pod in the deployment")
if len(pods) == 0:
log_data = "******* Pods not running ********\n"
else:
k8s_pod_name = pods[0]
containers = containers_in_pod(
self.core_api, k8s_pod_name, namespace=self.k8s_namespace
)
# If pod not started, logs request below will throw an exception
try:
log_data = ""
@ -741,48 +1040,49 @@ class K8sDeployer(Deployer):
print("No pods defined, skipping update")
return
self.connect_api()
ref_deployments = self.cluster_info.get_deployments()
for ref_deployment in ref_deployments:
if not ref_deployment or not ref_deployment.metadata:
continue
ref_name = ref_deployment.metadata.name
if not ref_name:
continue
deployment = cast(
client.V1Deployment,
self.apps_api.read_namespaced_deployment(
name=ref_name, namespace=self.k8s_namespace
),
)
if not deployment.spec or not deployment.spec.template:
continue
template_spec = deployment.spec.template.spec
if not template_spec or not template_spec.containers:
continue
ref_spec = ref_deployment.spec
if ref_spec and ref_spec.template and ref_spec.template.spec:
ref_containers = ref_spec.template.spec.containers
if ref_containers:
new_env = ref_containers[0].env
for container in template_spec.containers:
old_env = container.env
if old_env != new_env:
container.env = new_env
template_meta = deployment.spec.template.metadata
if template_meta:
template_meta.annotations = {
"kubectl.kubernetes.io/restartedAt": datetime.utcnow()
.replace(tzinfo=timezone.utc)
.isoformat()
}
self.apps_api.patch_namespaced_deployment(
name=ref_name,
namespace=self.k8s_namespace,
body=deployment,
)
def run(
self,
@ -817,9 +1117,7 @@ class K8sDeployer(Deployer):
else:
# Non-Helm path: create job from ClusterInfo
self.connect_api()
jobs = self.cluster_info.get_jobs(image_pull_policy="Always")
# Find the matching job by name
target_name = f"{self.cluster_info.app_name}-job-{job_name}"
matched_job = None


@ -393,7 +393,9 @@ def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
raise DeployerException(f"kind load docker-image failed: {result}")
def pods_in_deployment(
core_api: client.CoreV1Api, deployment_name: str, namespace: str = "default"
):
pods = []
pod_response = core_api.list_namespaced_pod(
namespace=namespace, label_selector=f"app={deployment_name}"
@ -406,7 +408,9 @@ def pods_in_deployment(core_api: client.CoreV1Api, deployment_name: str, namespa
return pods
def containers_in_pod(
core_api: client.CoreV1Api, pod_name: str, namespace: str = "default"
) -> List[str]:
containers: List[str] = []
pod_response = cast(
client.V1Pod, core_api.read_namespaced_pod(pod_name, namespace=namespace)
@ -440,7 +444,20 @@ def named_volumes_from_pod_files(parsed_pod_files):
return named_volumes
def get_kind_pv_bind_mount_path(
volume_name: str,
kind_mount_root: Optional[str] = None,
host_path: Optional[str] = None,
):
"""Get the path inside the Kind node for a PV.
When kind-mount-root is set and the volume's host path is under
that root, return /mnt/{relative_path} so it resolves through the
single root extraMount. Otherwise fall back to /mnt/{volume_name}.
"""
if kind_mount_root and host_path and host_path.startswith(kind_mount_root):
rel = os.path.relpath(host_path, kind_mount_root)
return f"/mnt/{rel}"
return f"/mnt/{volume_name}"
@ -563,6 +580,7 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
volume_definitions = []
volume_host_path_map = _get_host_paths_for_volumes(deployment_context)
seen_host_path_mounts = set() # Track to avoid duplicate mounts
kind_mount_root = deployment_context.spec.get_kind_mount_root()
# Cluster state backup for offline data recovery (unique per deployment)
# etcd contains all k8s state; PKI certs needed to decrypt etcd offline
@ -583,6 +601,16 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
f" - hostPath: {pki_host_path}\n" f" containerPath: /etc/kubernetes/pki\n"
)
# When kind-mount-root is set, emit a single extraMount for the root.
# Individual volumes whose host path starts with the root are covered
# by this single mount and don't need their own extraMount entries.
mount_root_emitted = False
if kind_mount_root:
volume_definitions.append(
f" - hostPath: {kind_mount_root}\n" f" containerPath: /mnt\n"
)
mount_root_emitted = True
# Note these paths are relative to the location of the pod files (at present)
# So we need to fix up to make them correct and absolute because kind assumes
# relative to the cwd.
@ -642,6 +670,12 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
volume_host_path_map[volume_name],
deployment_dir,
)
# Skip individual extraMount if covered
# by the kind-mount-root single mount
if mount_root_emitted and str(host_path).startswith(
kind_mount_root
):
continue
container_path = get_kind_pv_bind_mount_path(
volume_name
)
@ -978,7 +1012,7 @@ def translate_sidecar_service_names(
def envs_from_environment_variables_map(
map: Mapping[str, str],
) -> List[client.V1EnvVar]:
result = []
for env_var, env_val in map.items():


@ -98,16 +98,17 @@ class Spec:
def get_image_registry(self):
return self.obj.get(constants.image_registry_key)
def get_credentials_files(self) -> typing.List[str]:
"""Returns list of credential file paths to append to config.env."""
return self.obj.get("credentials-files", [])
def get_image_registry_config(self) -> typing.Optional[typing.Dict]:
"""Returns registry auth config: {server, username, token-env}.
Used for private container registries like GHCR. The token-env field
specifies an environment variable containing the API token/PAT.
Note: Uses the 'image-pull-secret' key to avoid collision with the
'image-registry' key, which is for pushing images.
"""
return self.obj.get("image-pull-secret")
def get_volumes(self):
return self.obj.get(constants.volumes_key, {})
@ -170,15 +171,13 @@ class Spec:
Returns the per-volume Resources if found, otherwise None.
The caller should fall back to get_volume_resources() then the default.
"""
vol_section = self.obj.get(constants.resources_key, {}).get(
constants.volumes_key, {}
)
if volume_name not in vol_section:
return None
entry = vol_section[volume_name]
if isinstance(entry, dict) and ("reservations" in entry or "limits" in entry):
return Resources(entry)
return None
@ -265,5 +264,46 @@ class Spec:
def is_kind_deployment(self):
return self.get_deployment_type() in [constants.k8s_kind_deploy_type]
def get_kind_mount_root(self) -> typing.Optional[str]:
"""Return kind-mount-root path or None.
When set, laconic-so emits a single Kind extraMount mapping this
host path to /mnt inside the Kind node. Volumes with host paths
under this root resolve to /mnt/{relative_path} and don't need
individual extraMounts. This allows adding new volumes without
recreating the Kind cluster.
"""
return self.obj.get(constants.kind_mount_root_key)
def get_maintenance_service(self) -> typing.Optional[str]:
"""Return maintenance-service value (e.g. 'dumpster-maintenance:8000') or None.
When set, the restart command swaps Ingress backends to this service
during the main pod Recreate, so users see a branded maintenance page
instead of a bare 502.
"""
return self.obj.get("maintenance-service")
def get_external_services(self) -> typing.Dict[str, typing.Dict]:
"""Return external-services config from spec.
Each entry maps a service name to its routing config:
- host mode: {host: "example.com", port: 443}
ExternalName k8s Service (DNS CNAME)
- selector mode: {selector: {app: "foo"}, namespace: "ns", port: 443}
Headless Service + Endpoints (cross-namespace routing to mock pod)
"""
return self.obj.get(constants.external_services_key, {})
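The two modes side by side in spec.yml form (service, host, and namespace names here are hypothetical):

```yaml
external-services:
  s3:
    host: s3.example.com   # ExternalName Service (production)
    port: 443
  vendor-api:
    selector:              # headless Service + Endpoints (testing)
      app: mock-vendor
    namespace: mock-ns
    port: 443
```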
def get_ca_certificates(self) -> typing.List[str]:
"""Return list of CA certificate file paths to trust.
Used in testing specs to inject mkcert root CAs so containers
trust TLS certs on mock services. Files are mounted into all
containers at /etc/ssl/certs/ and NODE_EXTRA_CA_CERTS is set.
Production specs omit this key entirely.
"""
return self.obj.get(constants.ca_certificates_key, [])
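A testing-spec fragment for this key (the file path is hypothetical; `~` is expanded by get_configmaps()/expanduser):

```yaml
ca-certificates:
  - ~/.local/share/mkcert/rootCA.pem
```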
def is_docker_deployment(self):
return self.get_deployment_type() in [constants.compose_deploy_type]


@ -141,28 +141,35 @@ echo "$test_config_file_changed_content" > "$test_config_file"
test_unchanged_config="$test_deployment_dir/config/test/script.sh"
# Modify spec file to simulate an update
sed -i.bak 's/CERC_TEST_PARAM_3: FAST/CERC_TEST_PARAM_3: FASTER/' $test_deployment_spec
# Save config.env before update (to verify it gets backed up)
config_env_file="$test_deployment_dir/config.env"
config_env_persistent_content="PERSISTENT_VALUE=should-not-be-overwritten-$(date +%s)"
echo "$config_env_persistent_content" >> "$config_env_file"
original_config_env_content=$(<$config_env_file)
# Run sync to update deployment files without destroying data
$TEST_TARGET_SO --stack test deploy create --spec-file $test_deployment_spec --deployment-dir $test_deployment_dir --update
# Verify config.env was regenerated from spec (reflects the FASTER change)
synced_config_env_content=$(<$config_env_file)
if [ "$synced_config_env_content" == "$original_config_env_content" ]; then
echo "deployment update test: config.env preserved - passed"
if [[ "$synced_config_env_content" == *"CERC_TEST_PARAM_3=FASTER"* ]]; then
echo "deployment update test: config.env regenerated from spec - passed"
else
echo "deployment update test: config.env was overwritten - FAILED"
echo "Expected: $original_config_env_content"
echo "deployment update test: config.env not regenerated - FAILED"
echo "Expected CERC_TEST_PARAM_3=FASTER in config.env"
echo "Got: $synced_config_env_content"
exit 1
fi
# Verify old config.env was backed up
config_env_backup="${config_env_file}.bak"
if [ -f "$config_env_backup" ]; then
echo "deployment update test: config.env backed up - passed"
else
echo "deployment update test: config.env backup not created - FAILED"
exit 1
fi
# Verify the spec file was updated in deployment dir
updated_deployed_spec=$(<$test_deployment_dir/spec.yml)
if [[ "$updated_deployed_spec" == *"FASTER"* ]]; then

uv.lock generated (2108 lines; diff suppressed because it is too large)