Compare commits
1 commit
afd-dumpst ... main

| Author | SHA1 | Date |
|---|---|---|
|  | 33d3474d7d |  |

.gitignore (vendored): 1 change
```diff
@@ -8,4 +8,3 @@ __pycache__
 package
 stack_orchestrator/data/build_tag.txt
 /build
-.worktrees
```
.pebbles/.gitignore (vendored): 1 change
```diff
@@ -1 +0,0 @@
-pebbles.db
```
```diff
@@ -1 +0,0 @@
-{"project": "stack-orchestrator", "prefix": "so"}
```
```diff
@@ -1,10 +0,0 @@
-{"type": "create", "timestamp": "2026-03-18T14:45:07.038870Z", "issue_id": "so-a1a", "payload": {"title": "deploy create should support external credential injection", "type": "feature", "priority": "2", "description": "deploy create generates config.env but provides no mechanism to inject external credentials (API keys, tokens, etc.) at creation time. Operators must append to config.env after the fact, which mutates a build artifact. deploy create should accept --credentials-file or similar to include secrets in the generated config.env."}}
-{"type": "create", "timestamp": "2026-03-18T14:45:07.038942Z", "issue_id": "so-b2b", "payload": {"title": "REGISTRY_TOKEN / imagePullSecret flow undocumented", "type": "bug", "priority": "2", "description": "create_registry_secret() exists in deployment_create.py and is called during up(), but REGISTRY_TOKEN is not documented in spec.yml or any user-facing docs. The restart command warns \"Registry token env var REGISTRY_TOKEN not set, skipping registry secret\" but doesn't explain how to set it. For GHCR private images, this is required and the flow from spec.yml -> config.env -> imagePullSecret needs documentation."}}
-{"type": "create", "timestamp": "2026-03-18T19:10:00.000000Z", "issue_id": "so-k1k", "payload": {"title": "Stack path resolution differs between deploy create and deployment restart", "type": "bug", "priority": "2", "description": "deploy create resolves --stack as a relative path from cwd. deployment restart resolves --stack-path as absolute, then computes repo_root as 4 parents up (assuming stack_orchestrator/data/stacks/name structure). External stacks with different nesting depths (e.g. stack-orchestrator/stacks/name = 3 levels) get wrong repo_root, causing --spec-file resolution to fail. The two commands should use the same path resolution logic."}}
-{"type": "create", "timestamp": "2026-03-18T19:25:00.000000Z", "issue_id": "so-l2l", "payload": {"title": "deployment restart should update in place, not delete/recreate", "type": "bug", "priority": "1", "description": "deployment restart deletes the entire namespace then recreates everything from scratch. This causes:\n\n1. **Downtime** — nothing serves traffic between delete and successful recreate\n2. **No rollback** — deleting the namespace destroys ReplicaSet revision history\n3. **Race conditions** — namespace may still be terminating when up() tries to create\n4. **Cascading failures** — if ANY container fails to start, the entire site is down with no fallback\n\nFix: three changes needed.\n\n**A. up() should create-or-update, not just create.** Use patch/apply semantics for Deployments, Services, Ingresses. When the pod spec changes (new env vars, new image), k8s creates a new ReplicaSet, scales it up, waits for readiness probes, then scales the old one down. Old pods serve traffic until new pods are healthy.\n\n**B. down() should never delete the namespace on restart.** Only on explicit teardown. The namespace owns the revision history. Current code: _delete_namespace() on every down(). Should: delete individual resources by label for teardown, do nothing for restart (let update-in-place handle it).\n\n**C. All containers need readiness probes.** Without them k8s considers pods ready immediately, defeating rolling update safety. laconic-so should generate readiness probes from the http-proxy routes in spec.yml (if a container has an http route, probe that port).\n\nWith these changes, k8s native rolling updates provide zero-downtime deploys and automatic rollback (if new pods fail readiness, rollout stalls, old pods keep serving).\n\nSource files:\n- deploy_k8s.py: up(), down(), _create_deployment(), _delete_namespace()\n- cluster_info.py: pod spec generation (needs readiness probes)\n- deployment.py: restart() orchestration"}}
-{"type": "create", "timestamp": "2026-03-18T20:15:03.000000Z", "issue_id": "so-m3m", "payload": {"title": "Add credentials-files spec key for on-disk credential injection", "type": "feature", "priority": "1", "description": "deployment restart regenerates config.env from spec.yml, wiping credentials that were appended from on-disk files (e.g. ~/.credentials/*.env). Operators must append credentials after deploy create, which is fragile and breaks on restart.\n\nFix: New top-level spec key credentials-files. _write_config_file() reads each file and appends its contents to config.env after writing config vars. Files are read at deploy time from the deployment host.\n\nSpec syntax:\n  credentials-files:\n    - ~/.credentials/dumpster-secrets.env\n    - ~/.credentials/dumpster-r2.env\n\nFiles:\n- deploy/spec.py: add get_credentials_files() returning list of paths\n- deploy/deployment_create.py: in _write_config_file(), after writing config vars, read and append each credentials file (expand ~ to home dir)\n\nAlso update dumpster-stack spec.yml to use the new key and remove the ansible credential append workaround from woodburn_deployer (group_vars/all.yml credentials_env_files, stack_deploy role append tasks, restart_dumpster.yml credential steps). Those cleanups are in the woodburn_deployer repo."}}
-{"type":"status_update","timestamp":"2026-03-18T21:54:12.59148256Z","issue_id":"so-m3m","payload":{"status":"in_progress"}}
-{"type":"close","timestamp":"2026-03-18T21:55:31.6035544Z","issue_id":"so-m3m","payload":{}}
-{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-n1n", "payload": {"title": "Merge kind-mount-propagation branch — HostToContainer propagation for extraMounts", "type": "feature", "priority": "2", "description": "The kind-mount-root feature was cherry-picked to main (commit 8d03083d) but the mount propagation fix (commit 929bdab8 on branch enya-ac868cc4-kind-mount-propagation-fix) adds HostToContainer propagation so host submounts propagate into the Kind node. This is needed for ZFS child datasets and tmpfs mounts under the root. Cherry-pick 929bdab8 to main."}}
-{"type": "create", "timestamp": "2026-03-20T23:05:00.000000Z", "issue_id": "so-o2o", "payload": {"title": "etcd cert backup not persisting across cluster deletion", "type": "bug", "priority": "1", "description": "The extraMount for etcd at data/cluster-backups/<id>/etcd is configured but after cluster deletion the directory is empty. Caddy TLS certificates stored in etcd are lost. Either etcd isn't writing to the host mount, or the cleanup code is deleting the backup. Investigate _clean_etcd_keeping_certs in helpers.py."}}
-{"type": "create", "timestamp": "2026-03-21T00:20:00.000000Z", "issue_id": "so-p3p", "payload": {"title": "laconic-so should manage Caddy ingress image lifecycle", "type": "feature", "priority": "2", "description": "The Caddy ingress controller image is hardcoded in ingress-caddy-kind-deploy.yaml. There's no mechanism to update it without manual kubectl commands or cluster recreation. laconic-so should: 1) Allow spec.yml to specify a custom Caddy image, 2) Support updating the Caddy image as part of deployment restart, 3) Set strategy: Recreate on the Caddy Deployment (hostPort pods can't do RollingUpdate). This would let cryovial or similar tooling trigger Caddy updates through the normal deployment pipeline."}}
```
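Issue so-l2l above argues that every container needs a readiness probe generated from the http-proxy routes in spec.yml (if a container has an http route, probe that port). A minimal standalone sketch of that idea; the function name and route dict shape are assumptions for illustration, not the actual laconic-so API:

```python
# Hypothetical sketch of the readiness-probe generation proposed in so-l2l:
# a route like {"path": "/", "service": "webapp:3000"} yields an httpGet
# probe on port 3000 for the "webapp" container.
def readiness_probe_for(container_name, http_proxy_routes):
    """Return a k8s readinessProbe dict if the container has an http route."""
    for route in http_proxy_routes:
        svc_name, _, svc_port = route.get("service", "").partition(":")
        if svc_name == container_name and svc_port:
            return {
                "httpGet": {"path": route.get("path", "/"), "port": int(svc_port)},
                "initialDelaySeconds": 5,
                "periodSeconds": 10,
            }
    return None  # no http route: caller falls back to no probe

routes = [{"path": "/", "service": "webapp:3000"}]
probe = readiness_probe_for("webapp", routes)
```

Without such probes Kubernetes marks pods ready immediately, which is exactly what defeats the rolling-update safety the issue describes.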
```diff
@@ -46,6 +46,3 @@ runtime_class_key = "runtime-class"
 high_memlock_runtime = "high-memlock"
 high_memlock_spec_filename = "high-memlock-spec.json"
 acme_email_key = "acme-email"
-kind_mount_root_key = "kind-mount-root"
-external_services_key = "external-services"
-ca_certificates_key = "ca-certificates"
```
```diff
@@ -186,8 +186,8 @@ spec:
           operator: Equal
       containers:
         - name: caddy-ingress-controller
-          image: ghcr.io/laconicnetwork/caddy-ingress:latest
-          imagePullPolicy: Always
+          image: caddy/ingress:latest
+          imagePullPolicy: IfNotPresent
           ports:
             - name: http
               containerPort: 80
```
```diff
@@ -48,7 +48,7 @@ class DockerDeployer(Deployer):
         self.compose_project_name = compose_project_name
         self.compose_env_file = compose_env_file

-    def up(self, detach, skip_cluster_management, services, image_overrides=None):
+    def up(self, detach, skip_cluster_management, services):
         if not opts.o.dry_run:
             try:
                 return self.docker.compose.up(detach=detach, services=services)
```
```diff
@@ -137,11 +137,7 @@ def create_deploy_context(


 def up_operation(
-    ctx,
-    services_list,
-    stay_attached=False,
-    skip_cluster_management=False,
-    image_overrides=None,
+    ctx, services_list, stay_attached=False, skip_cluster_management=False
 ):
     global_context = ctx.parent.parent.obj
     deploy_context = ctx.obj
```
```diff
@@ -160,7 +156,6 @@ def up_operation(
         detach=not stay_attached,
         skip_cluster_management=skip_cluster_management,
         services=services_list,
-        image_overrides=image_overrides,
     )
     for post_start_command in cluster_context.post_start_commands:
         _run_command(global_context, cluster_context.cluster, post_start_command)
```
```diff
@@ -20,7 +20,7 @@ from typing import Optional

 class Deployer(ABC):
     @abstractmethod
-    def up(self, detach, skip_cluster_management, services, image_overrides=None):
+    def up(self, detach, skip_cluster_management, services):
         pass

     @abstractmethod
```
```diff
@@ -17,7 +17,7 @@ import click
 from pathlib import Path
 import subprocess
 import sys
-
+import time
 from stack_orchestrator import constants
 from stack_orchestrator.deploy.images import push_images_operation
 from stack_orchestrator.deploy.deploy import (
```
```diff
@@ -248,13 +248,8 @@ def run_job(ctx, job_name, helm_release):
     "--expected-ip",
     help="Expected IP for DNS verification (if different from egress)",
 )
-@click.option(
-    "--image",
-    multiple=True,
-    help="Override container image: container=image",
-)
 @click.pass_context
-def restart(ctx, stack_path, spec_file, config_file, force, expected_ip, image):
+def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
     """Pull latest code and restart deployment using git-tracked spec.

     GitOps workflow:
```
```diff
@@ -281,17 +276,6 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip, image):

     deployment_context: DeploymentContext = ctx.obj

-    # Parse --image flags into a dict of container_name -> image
-    image_overrides = {}
-    for entry in image:
-        if "=" not in entry:
-            raise click.BadParameter(
-                f"Invalid --image format '{entry}', expected container=image",
-                param_hint="'--image'",
-            )
-        container_name, image_ref = entry.split("=", 1)
-        image_overrides[container_name] = image_ref
-
     # Get current spec info (before git pull)
     current_spec = deployment_context.spec
     current_http_proxy = current_spec.get_http_proxy()
```
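The removed block above parses repeated `--image container=image` flags into an overrides dict. The same logic, extracted into a standalone helper for illustration (the click wiring is omitted and `parse_image_overrides` is a hypothetical name):

```python
# Standalone version of the removed --image parsing: each flag is
# "container=image". An image reference may itself contain "=" after the
# first one (rarely, in digests or query-like tags), hence split("=", 1).
def parse_image_overrides(entries):
    overrides = {}
    for entry in entries:
        if "=" not in entry:
            raise ValueError(
                f"Invalid --image format '{entry}', expected container=image"
            )
        container_name, image_ref = entry.split("=", 1)
        overrides[container_name] = image_ref
    return overrides

result = parse_image_overrides(["web=ghcr.io/acme/web:v2", "api=acme/api:latest"])
```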
```diff
@@ -338,22 +322,9 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip, image):

     # Determine spec file location
     # Priority: --spec-file argument > repo's deployment/spec.yml > deployment dir
-    # Find repo root via git rather than assuming a fixed directory depth.
-    git_root_result = subprocess.run(
-        ["git", "rev-parse", "--show-toplevel"],
-        cwd=stack_source,
-        capture_output=True,
-        text=True,
-    )
-    if git_root_result.returncode == 0:
-        repo_root = Path(git_root_result.stdout.strip())
-    else:
-        # Fallback: walk up from stack_source looking for .git
-        repo_root = stack_source
-        while repo_root != repo_root.parent:
-            if (repo_root / ".git").exists():
-                break
-            repo_root = repo_root.parent
+    # Stack path is like: repo/stack_orchestrator/data/stacks/stack-name
+    # So repo root is 4 parents up
+    repo_root = stack_source.parent.parent.parent.parent
     if spec_file:
         # Spec file relative to repo root
         spec_file_path = repo_root / spec_file
```
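The old side of this hunk replaces the fixed-depth `parent.parent.parent.parent` (which issue so-k1k flags as wrong for stacks nested at other depths) with git-based resolution plus a `.git`-marker walk-up. The fallback can be exercised in isolation; `find_repo_root` is a hypothetical helper and the temporary tree stands in for a real checkout:

```python
# Depth-independent repo-root discovery: walk up from the stack directory
# until a .git marker is found. If none exists, the loop stops at the
# filesystem root and returns it unchanged.
import tempfile
from pathlib import Path

def find_repo_root(start):
    repo_root = start
    while repo_root != repo_root.parent:
        if (repo_root / ".git").exists():
            break
        repo_root = repo_root.parent
    return repo_root

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".git").mkdir()
    # 3 levels deep, the external-stack layout that breaks the 4-parents assumption
    stack = root / "stack-orchestrator" / "stacks" / "dumpster"
    stack.mkdir(parents=True)
    assert find_repo_root(stack) == root
```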
```diff
@@ -397,14 +368,7 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip, image):
     print("\n[2/4] Hostname unchanged, skipping DNS verification")

     # Step 3: Sync deployment directory with spec
-    # The spec's "stack:" value is often a relative path (e.g.
-    # "stack-orchestrator/stacks/dumpster") that must resolve from the
-    # repo root. Change cwd so stack_is_external() sees it correctly.
     print("\n[3/4] Syncing deployment directory...")
-    import os
-
-    prev_cwd = os.getcwd()
-    os.chdir(repo_root)
     deploy_ctx = make_deploy_context(ctx)
     create_operation(
         deployment_command_context=deploy_ctx,
```
```diff
@@ -414,216 +378,28 @@ def restart(ctx, stack_path, spec_file, config_file, force, expected_ip, image):
         network_dir=None,
         initial_peers=None,
     )

     # Reload deployment context with updated spec
     deployment_context.init(deployment_context.deployment_dir)
     ctx.obj = deployment_context

-    # Apply updated deployment.
-    # If maintenance-service is configured, swap Ingress to maintenance
-    # backend during the Recreate window so users see a branded page
-    # instead of bare 502s.
-    print("\n[4/4] Applying deployment update...")
+    # Stop deployment
+    print("\n[4/4] Restarting deployment...")
     ctx.obj = make_deploy_context(ctx)
+    down_operation(
+        ctx, delete_volumes=False, extra_args_list=[], skip_cluster_management=True
+    )

-    # Check for maintenance service in the (reloaded) spec
-    maintenance_svc = deployment_context.spec.get_maintenance_service()
-    if maintenance_svc:
-        print(f"Maintenance service configured: {maintenance_svc}")
-        _restart_with_maintenance(
-            ctx, deployment_context, maintenance_svc, image_overrides
-        )
-    else:
-        up_operation(
-            ctx,
-            services_list=None,
-            stay_attached=False,
-            skip_cluster_management=True,
-            image_overrides=image_overrides or None,
-        )
+    # Brief pause to ensure clean shutdown
+    time.sleep(5)

-    # Restore cwd after both create_operation and up_operation have run.
-    # Both need the relative stack path to resolve from repo_root.
-    os.chdir(prev_cwd)
+    # Start deployment
+    up_operation(
+        ctx, services_list=None, stay_attached=False, skip_cluster_management=True
+    )

     print("\n=== Restart Complete ===")
-    print("Deployment updated via rolling update.")
+    print("Deployment restarted with git-tracked configuration.")
     if new_hostname and new_hostname != current_hostname:
         print(f"\nNew hostname: {new_hostname}")
         print("Caddy will automatically provision TLS certificate.")
-
-
-def _restart_with_maintenance(
-    ctx, deployment_context, maintenance_svc, image_overrides
-):
-    """Restart with Ingress swap to maintenance service during Recreate.
-
-    Flow:
-    1. Deploy all pods (including maintenance pod) with up_operation
-    2. Patch Ingress: swap all route backends to maintenance service
-    3. Scale main (non-maintenance) Deployments to 0
-    4. Scale main Deployments back up (triggers Recreate with new spec)
-    5. Wait for readiness
-    6. Patch Ingress: restore original backends
-
-    This ensures the maintenance pod is already running before we touch
-    the Ingress, and the main pods get a clean Recreate.
-    """
-    import time
-
-    from kubernetes.client.exceptions import ApiException
-
-    from stack_orchestrator.deploy.deploy import up_operation
-
-    # Step 1: Apply the full deployment (creates/updates all pods + services)
-    # This ensures maintenance pod exists before we swap Ingress to it.
-    up_operation(
-        ctx,
-        services_list=None,
-        stay_attached=False,
-        skip_cluster_management=True,
-        image_overrides=image_overrides or None,
-    )
-
-    # Parse maintenance service spec: "container-name:port"
-    maint_container = maintenance_svc.split(":")[0]
-    maint_port = int(maintenance_svc.split(":")[1])
-
-    # Connect to k8s API
-    deploy_ctx = ctx.obj
-    deployer = deploy_ctx.deployer
-    deployer.connect_api()
-    namespace = deployer.k8s_namespace
-    app_name = deployer.cluster_info.app_name
-    networking_api = deployer.networking_api
-    apps_api = deployer.apps_api
-
-    ingress_name = f"{app_name}-ingress"
-
-    # Step 2: Read current Ingress and save original backends
-    try:
-        ingress = networking_api.read_namespaced_ingress(
-            name=ingress_name, namespace=namespace
-        )
-    except ApiException:
-        print("Warning: No Ingress found, skipping maintenance swap")
-        return
-
-    # Resolve which service the maintenance container belongs to
-    maint_service_name = deployer.cluster_info._resolve_service_name_for_container(
-        maint_container
-    )
-
-    # Save original backends for restoration
-    original_backends = []
-    for rule in ingress.spec.rules:
-        rule_backends = []
-        for path in rule.http.paths:
-            rule_backends.append(
-                {
-                    "name": path.backend.service.name,
-                    "port": path.backend.service.port.number,
-                }
-            )
-        original_backends.append(rule_backends)
-
-    # Patch all Ingress backends to point to maintenance service
-    print("Swapping Ingress to maintenance service...")
-    for rule in ingress.spec.rules:
-        for path in rule.http.paths:
-            path.backend.service.name = maint_service_name
-            path.backend.service.port.number = maint_port
-
-    networking_api.replace_namespaced_ingress(
-        name=ingress_name, namespace=namespace, body=ingress
-    )
-    print("Ingress now points to maintenance service")
-
-    # Step 3: Find main (non-maintenance) Deployments and scale to 0
-    # then back up to trigger a clean Recreate
-    deployments_resp = apps_api.list_namespaced_deployment(
-        namespace=namespace, label_selector=f"app={app_name}"
-    )
-    main_deployments = []
-    for dep in deployments_resp.items:
-        dep_name = dep.metadata.name
-        # Skip maintenance deployments
-        component = (dep.metadata.labels or {}).get("app.kubernetes.io/component", "")
-        is_maintenance = maint_container in component
-        if not is_maintenance:
-            main_deployments.append(dep_name)
-
-    if main_deployments:
-        # Scale down main deployments
-        for dep_name in main_deployments:
-            print(f"Scaling down {dep_name}...")
-            apps_api.patch_namespaced_deployment_scale(
-                name=dep_name,
-                namespace=namespace,
-                body={"spec": {"replicas": 0}},
-            )
-
-        # Wait for pods to terminate
-        print("Waiting for main pods to terminate...")
-        deadline = time.monotonic() + 120
-        while time.monotonic() < deadline:
-            pods = deployer.core_api.list_namespaced_pod(
-                namespace=namespace,
-                label_selector=f"app={app_name}",
-            )
-            # Count non-maintenance pods
-            active = sum(
-                1
-                for p in pods.items
-                if p.metadata
-                and p.metadata.deletion_timestamp is None
-                and not any(
-                    maint_container in (c.name or "") for c in (p.spec.containers or [])
-                )
-            )
-            if active == 0:
-                break
-            time.sleep(2)
-
-        # Scale back up
-        replicas = deployment_context.spec.get_replicas()
-        for dep_name in main_deployments:
-            print(f"Scaling up {dep_name} to {replicas} replicas...")
-            apps_api.patch_namespaced_deployment_scale(
-                name=dep_name,
-                namespace=namespace,
-                body={"spec": {"replicas": replicas}},
-            )
-
-        # Step 5: Wait for readiness
-        print("Waiting for main pods to become ready...")
-        deadline = time.monotonic() + 300
-        while time.monotonic() < deadline:
-            all_ready = True
-            for dep_name in main_deployments:
-                dep = apps_api.read_namespaced_deployment(
-                    name=dep_name, namespace=namespace
-                )
-                ready = dep.status.ready_replicas or 0
-                desired = dep.spec.replicas or 1
-                if ready < desired:
-                    all_ready = False
-                    break
-            if all_ready:
-                break
-            time.sleep(5)
-
-    # Step 6: Restore original Ingress backends
-    print("Restoring original Ingress backends...")
-    ingress = networking_api.read_namespaced_ingress(
-        name=ingress_name, namespace=namespace
-    )
-    for i, rule in enumerate(ingress.spec.rules):
-        for j, path in enumerate(rule.http.paths):
-            if i < len(original_backends) and j < len(original_backends[i]):
-                path.backend.service.name = original_backends[i][j]["name"]
-                path.backend.service.port.number = original_backends[i][j]["port"]
-
-    networking_api.replace_namespaced_ingress(
-        name=ingress_name, namespace=namespace, body=ingress
-    )
-    print("Ingress restored to original backends")
```
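The save/patch/restore dance in the removed `_restart_with_maintenance()` can be illustrated with plain dicts standing in for the kubernetes client's Ingress objects (the data shapes and service names here are simplified assumptions, not the real API objects):

```python
# Step 2 / Step 6 of the removed function, reduced to data manipulation:
# snapshot every (service, port) backend, point all routes at the
# maintenance backend, then restore from the snapshot by index.
rules = [
    {"paths": [{"service": "web", "port": 3000}, {"service": "api", "port": 8000}]},
]

# Save original backends per rule/path (Step 2)
original = [[(p["service"], p["port"]) for p in r["paths"]] for r in rules]

# Swap everything to the maintenance backend
for r in rules:
    for p in r["paths"]:
        p["service"], p["port"] = "maintenance", 8080

assert all(p["service"] == "maintenance" for r in rules for p in r["paths"])

# Restore from the saved snapshot (Step 6)
for i, r in enumerate(rules):
    for j, p in enumerate(r["paths"]):
        p["service"], p["port"] = original[i][j]
```

Snapshotting by index is what lets the restore survive the window where the live Ingress object has been mutated and re-read.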
```diff
@@ -577,9 +577,7 @@ def _generate_and_store_secrets(config_vars: dict, deployment_name: str):
     return secrets


-def create_registry_secret(
-    spec: Spec, deployment_name: str, namespace: str = "default"
-) -> Optional[str]:
+def create_registry_secret(spec: Spec, deployment_name: str, namespace: str = "default") -> Optional[str]:
     """Create K8s docker-registry secret from spec + environment.

     Reads registry configuration from spec.yml and creates a Kubernetes
```
```diff
@@ -588,7 +586,7 @@ def create_registry_secret(
     Args:
         spec: The deployment spec containing image-registry config
         deployment_name: Name of the deployment (used for secret naming)
-        namespace: K8s namespace to create the secret in
+        namespace: Kubernetes namespace to create the secret in

     Returns:
         The secret name if created, None if no registry config
```
```diff
@@ -602,29 +600,16 @@ def create_registry_secret(
     server = registry_config.get("server")
     username = registry_config.get("username")
     token_env = registry_config.get("token-env")
-    token_file = registry_config.get("token-file")

-    if not server or not username:
-        return None
-    if not token_env and not token_file:
+    if not all([server, username, token_env]):
         return None

-    # Resolve token: file takes precedence over env var
-    token = None
-    if token_file:
-        token_path = os.path.expanduser(token_file)
-        if os.path.exists(token_path):
-            with open(token_path) as f:
-                token = f.read().strip()
-        else:
-            print(f"Warning: Registry token file '{token_path}' not found")
-    if not token and token_env:
-        token = os.environ.get(token_env)
+    # Type narrowing for pyright - we've validated these aren't None above
+    assert token_env is not None
+    token = os.environ.get(token_env)

     if not token:
-        source = token_file or token_env
         print(
-            f"Warning: Registry token not available from '{source}', "
+            f"Warning: Registry token env var '{token_env}' not set, "
             "skipping registry secret"
         )
         return None
```
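The old side of this hunk resolves the registry token with file-over-env precedence. That precedence can be reproduced standalone; `resolve_token` is a hypothetical helper extracted for the demo, and `REGISTRY_TOKEN` is the env var named in issue so-b2b:

```python
# File-over-env token resolution, as on the old side of the hunk: a
# readable token-file wins; a missing or unreadable file falls back to
# the environment variable.
import os
import tempfile

def resolve_token(token_file, token_env):
    token = None
    if token_file:
        path = os.path.expanduser(token_file)
        if os.path.exists(path):
            with open(path) as f:
                token = f.read().strip()
    if not token and token_env:
        token = os.environ.get(token_env)
    return token

with tempfile.NamedTemporaryFile("w", suffix=".token", delete=False) as f:
    f.write("ghp_from_file\n")
os.environ["REGISTRY_TOKEN"] = "ghp_from_env"
assert resolve_token(f.name, "REGISTRY_TOKEN") == "ghp_from_file"  # file wins
os.unlink(f.name)
```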
```diff
@@ -636,7 +621,7 @@ def create_registry_secret(
     }

     # Secret name derived from deployment name
-    secret_name = f"{deployment_name}-image-pull-secret"
+    secret_name = f"{deployment_name}-registry"

     # Load kube config
     try:
```
```diff
@@ -690,15 +675,6 @@ def _write_config_file(

     # Write non-secret config to config.env (exclude $generate:...$ tokens)
     with open(config_env_file, "w") as output_file:
-        output_file.write(
-            "# AUTO-GENERATED by laconic-so from spec.yml config section.\n"
-            "# Source: stack_orchestrator/deploy/deployment_create.py"
-            " _write_config_file()\n"
-            "# Do not edit — changes will be overwritten on deploy create"
-            " or restart.\n"
-            "# To change config, edit the config section in your spec.yml"
-            " and redeploy.\n"
-        )
         if config_vars:
             for variable_name, variable_value in config_vars.items():
                 # Skip variables with generate tokens - they go to K8s Secret
```
```diff
@@ -708,19 +684,6 @@ def _write_config_file(
                     continue
                 output_file.write(f"{variable_name}={variable_value}\n")

-        # Append contents of credentials files listed in spec
-        credentials_files = spec_content.get("credentials-files", []) or []
-        for cred_path_str in credentials_files:
-            cred_path = Path(cred_path_str).expanduser()
-            if not cred_path.exists():
-                print(f"Error: credentials file does not exist: {cred_path}")
-                sys.exit(1)
-            output_file.write(f"# From credentials file: {cred_path_str}\n")
-            contents = cred_path.read_text()
-            output_file.write(contents)
-            if not contents.endswith("\n"):
-                output_file.write("\n")
-

 def _write_kube_config_file(external_path: Path, internal_path: Path):
     if not external_path.exists():
```
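The removed credentials append (the `credentials-files` feature from issue so-m3m) can be exercised against an in-memory buffer instead of a real config.env. `append_credentials` is a hypothetical extraction of the loop above, and the file names are invented for the demo:

```python
# Each file listed under the spec's credentials-files key is read and
# appended to config.env after the config vars, with a trailing newline
# guaranteed so the next entry starts on its own line.
import io
import tempfile
from pathlib import Path

def append_credentials(output_file, credentials_files):
    for cred_path_str in credentials_files:
        cred_path = Path(cred_path_str).expanduser()
        if not cred_path.exists():
            raise FileNotFoundError(f"credentials file does not exist: {cred_path}")
        output_file.write(f"# From credentials file: {cred_path_str}\n")
        contents = cred_path.read_text()
        output_file.write(contents)
        if not contents.endswith("\n"):
            output_file.write("\n")

with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("API_KEY=abc123")  # deliberately no trailing newline

buf = io.StringIO("EXISTING_VAR=1\n")
buf.seek(0, io.SEEK_END)  # append after the already-written config vars
append_credentials(buf, [f.name])
Path(f.name).unlink()
```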
```diff
@@ -872,7 +835,9 @@ def create_operation(
     # Copy from temp to deployment dir, excluding data volumes
     # and backing up changed files.
     # Exclude data/* to avoid touching user data volumes.
-    exclude_patterns = ["data", "data/*"]
+    # Exclude config file to preserve deployment settings
+    # (XXX breaks passing config vars from spec)
+    exclude_patterns = ["data", "data/*", constants.config_file_name]
     _safe_copy_tree(
         temp_dir, deployment_dir_path, exclude_patterns=exclude_patterns
     )
```
@@ -1067,8 +1032,12 @@ def _write_deployment_files(
     for configmap in parsed_spec.get_configmaps():
         source_config_dir = resolve_config_dir(stack_name, configmap)
         if os.path.exists(source_config_dir):
-            destination_config_dir = target_dir.joinpath("configmaps", configmap)
-            copytree(source_config_dir, destination_config_dir, dirs_exist_ok=True)
+            destination_config_dir = target_dir.joinpath(
+                "configmaps", configmap
+            )
+            copytree(
+                source_config_dir, destination_config_dir, dirs_exist_ok=True
+            )

     # Copy the job files into the target dir
     jobs = get_job_list(parsed_stack)
@@ -82,14 +82,7 @@ class ClusterInfo:
     def __init__(self) -> None:
         self.parsed_job_yaml_map = {}

-    def int(
-        self,
-        pod_files: List[str],
-        compose_env_file,
-        deployment_name,
-        spec: Spec,
-        stack_name="",
-    ):
+    def int(self, pod_files: List[str], compose_env_file, deployment_name, spec: Spec, stack_name=""):
         self.parsed_pod_yaml_map = parsed_pod_files_map_from_file_names(pod_files)
         # Find the set of images in the pods
         self.image_set = images_for_deployment(pod_files)
@@ -167,99 +160,67 @@ class ClusterInfo:
                 nodeports.append(service)
         return nodeports

-    def _resolve_service_name_for_container(self, container_name: str) -> str:
-        """Resolve the k8s Service name that routes to a given container.
-
-        For multi-pod stacks, each pod has its own Service. We find which
-        pod file contains this container and return the corresponding
-        service name. For single-pod stacks, returns the legacy service name.
-        """
-        pod_files = list(self.parsed_pod_yaml_map.keys())
-        multi_pod = len(pod_files) > 1
-
-        if not multi_pod:
-            return f"{self.app_name}-service"
-
-        for pod_file in pod_files:
-            pod = self.parsed_pod_yaml_map[pod_file]
-            if container_name in pod.get("services", {}):
-                pod_name = self._pod_name_from_file(pod_file)
-                return f"{self.app_name}-{pod_name}-service"
-
-        # Fallback: container not found in any pod file
-        return f"{self.app_name}-service"
-
     def get_ingress(
-        self, use_tls=False, certificates=None, cluster_issuer="letsencrypt-prod"
+        self, use_tls=False, certificate=None, cluster_issuer="letsencrypt-prod"
     ):
         # No ingress for a deployment that has no http-proxy defined, for now
         http_proxy_info_list = self.spec.get_http_proxy()
         ingress = None
         if http_proxy_info_list:
+            # TODO: handle multiple definitions
+            http_proxy_info = http_proxy_info_list[0]
+            if opts.o.debug:
+                print(f"http-proxy: {http_proxy_info}")
+            # TODO: good enough parsing for webapp deployment for now
+            host_name = http_proxy_info["host-name"]
             rules = []
-            tls = [] if use_tls else None
-
-            for http_proxy_info in http_proxy_info_list:
-                if opts.o.debug:
-                    print(f"http-proxy: {http_proxy_info}")
-                host_name = http_proxy_info["host-name"]
-                certificate = (certificates or {}).get(host_name)
-
-                if use_tls:
-                    tls.append(
-                        client.V1IngressTLS(
-                            hosts=(
-                                certificate["spec"]["dnsNames"]
-                                if certificate
-                                else [host_name]
-                            ),
-                            secret_name=(
-                                certificate["spec"]["secretName"]
-                                if certificate
-                                else f"{self.app_name}-{host_name}-tls"
-                            ),
-                        )
-                    )
-
-                paths = []
-                for route in http_proxy_info["routes"]:
-                    path = route["path"]
-                    proxy_to = route["proxy-to"]
-                    if opts.o.debug:
-                        print(f"proxy config: {path} -> {proxy_to}")
-                    # proxy_to has the form <service>:<port>
-                    container_name = proxy_to.split(":")[0]
-                    proxy_to_port = int(proxy_to.split(":")[1])
-                    service_name = self._resolve_service_name_for_container(
-                        container_name
-                    )
-                    paths.append(
-                        client.V1HTTPIngressPath(
-                            path_type="Prefix",
-                            path=path,
-                            backend=client.V1IngressBackend(
-                                service=client.V1IngressServiceBackend(
-                                    name=service_name,
-                                    port=client.V1ServiceBackendPort(
-                                        number=proxy_to_port
-                                    ),
-                                )
-                            ),
-                        )
-                    )
-                rules.append(
-                    client.V1IngressRule(
-                        host=host_name,
-                        http=client.V1HTTPIngressRuleValue(paths=paths),
-                    )
-                )
+            tls = (
+                [
+                    client.V1IngressTLS(
+                        hosts=certificate["spec"]["dnsNames"]
+                        if certificate
+                        else [host_name],
+                        secret_name=certificate["spec"]["secretName"]
+                        if certificate
+                        else f"{self.app_name}-tls",
+                    )
+                ]
+                if use_tls
+                else None
+            )
+            paths = []
+            for route in http_proxy_info["routes"]:
+                path = route["path"]
+                proxy_to = route["proxy-to"]
+                if opts.o.debug:
+                    print(f"proxy config: {path} -> {proxy_to}")
+                # proxy_to has the form <service>:<port>
+                proxy_to_port = int(proxy_to.split(":")[1])
+                paths.append(
+                    client.V1HTTPIngressPath(
+                        path_type="Prefix",
+                        path=path,
+                        backend=client.V1IngressBackend(
+                            service=client.V1IngressServiceBackend(
+                                # TODO: this looks wrong
+                                name=f"{self.app_name}-service",
+                                # TODO: pull port number from the service
+                                port=client.V1ServiceBackendPort(number=proxy_to_port),
+                            )
+                        ),
+                    )
+                )
+            rules.append(
+                client.V1IngressRule(
+                    host=host_name, http=client.V1HTTPIngressRuleValue(paths=paths)
+                )
+            )
             spec = client.V1IngressSpec(tls=tls, rules=rules)

             ingress_annotations = {
                 "kubernetes.io/ingress.class": "caddy",
             }
-            if not certificates:
+            if not certificate:
                 ingress_annotations["cert-manager.io/cluster-issuer"] = cluster_issuer

             ingress = client.V1Ingress(
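The removed multi-host `get_ingress` variant chose TLS hosts and the secret name per host: a pre-issued Certificate (looked up by host name in the `certificates` dict) wins, otherwise a per-host secret name is derived for cert-manager to populate. A standalone sketch of that selection, with `tls_for_host` as a hypothetical helper name:

```python
from typing import Optional


def tls_for_host(certificates: Optional[dict], host_name: str, app_name: str) -> tuple:
    """Return (hosts, secret_name) for one V1IngressTLS entry.

    A pre-issued Certificate object for the host takes precedence; the
    fallback secret name follows the removed code's {app}-{host}-tls pattern.
    """
    certificate = (certificates or {}).get(host_name)
    if certificate:
        return certificate["spec"]["dnsNames"], certificate["spec"]["secretName"]
    return [host_name], f"{app_name}-{host_name}-tls"
```

With no certificate present, cert-manager's cluster-issuer annotation (added in the same code path) causes the fallback secret to be issued on demand.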
@@ -272,28 +233,6 @@ class ClusterInfo:
         )
         return ingress

-    def _get_readiness_probe_ports(self) -> dict:
-        """Map container names to TCP readiness probe ports.
-
-        Derives probe ports from http-proxy routes in the spec. If a container
-        has an http-proxy route (proxy-to: container:port), we probe that port.
-        This tells k8s when the container is ready to serve traffic, which is
-        required for safe rolling updates.
-        """
-        probe_ports: dict = {}
-        http_proxy_list = self.spec.get_http_proxy()
-        if http_proxy_list:
-            for http_proxy in http_proxy_list:
-                for route in http_proxy.get("routes", []):
-                    proxy_to = route.get("proxy-to", "")
-                    if ":" in proxy_to:
-                        container, port_str = proxy_to.rsplit(":", 1)
-                        port = int(port_str)
-                        # Use the first route's port for each container
-                        if container not in probe_ports:
-                            probe_ports[container] = port
-        return probe_ports
-
     # TODO: suppoprt multiple services
     def get_service(self):
         # Collect all ports from http-proxy routes
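The `_get_readiness_probe_ports` helper deleted above parses `proxy-to: <container>:<port>` routes, keeping the first port seen per container. The same parsing as a free function, for illustration:

```python
from typing import Optional


def probe_ports_from_http_proxy(http_proxy_list: Optional[list]) -> dict:
    """Map container name -> TCP readiness-probe port from http-proxy routes."""
    probe_ports: dict = {}
    for http_proxy in http_proxy_list or []:
        for route in http_proxy.get("routes", []):
            proxy_to = route.get("proxy-to", "")
            if ":" in proxy_to:
                container, port_str = proxy_to.rsplit(":", 1)
                # First route wins for each container
                probe_ports.setdefault(container, int(port_str))
    return probe_ports
```

`rsplit(":", 1)` splits on the last colon, so a container name containing a colon would still yield the trailing port.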
@@ -349,7 +288,8 @@ class ClusterInfo:

         # Per-volume resources override global, which overrides default.
         vol_resources = (
-            self.spec.get_volume_resources_for(volume_name) or global_resources
+            self.spec.get_volume_resources_for(volume_name)
+            or global_resources
         )

         labels = {
@@ -389,7 +329,6 @@ class ClusterInfo:
                 print(f"{cfg_map_name} not in pod files")
                 continue

-            cfg_map_path = os.path.expanduser(cfg_map_path)
             if not cfg_map_path.startswith("/") and self.spec.file_path is not None:
                 cfg_map_path = os.path.join(
                     os.path.dirname(str(self.spec.file_path)), cfg_map_path
@@ -452,15 +391,12 @@ class ClusterInfo:
                 continue

             vol_resources = (
-                self.spec.get_volume_resources_for(volume_name) or global_resources
+                self.spec.get_volume_resources_for(volume_name)
+                or global_resources
             )
             if self.spec.is_kind_deployment():
                 host_path = client.V1HostPathVolumeSource(
-                    path=get_kind_pv_bind_mount_path(
-                        volume_name,
-                        kind_mount_root=self.spec.get_kind_mount_root(),
-                        host_path=volume_path,
-                    )
+                    path=get_kind_pv_bind_mount_path(volume_name)
                 )
             else:
                 host_path = client.V1HostPathVolumeSource(path=volume_path)
@@ -531,7 +467,6 @@ class ClusterInfo:
         containers = []
         init_containers = []
         services = {}
-        readiness_probe_ports = self._get_readiness_probe_ports()
         global_resources = self.spec.get_container_resources()
         if not global_resources:
             global_resources = DEFAULT_CONTAINER_RESOURCES
@@ -592,7 +527,9 @@ class ClusterInfo:
                 if self.spec.get_image_registry() is not None
                 else image
             )
-            volume_mounts = volume_mounts_for_service(parsed_yaml_map, service_name)
+            volume_mounts = volume_mounts_for_service(
+                parsed_yaml_map, service_name
+            )
             # Handle command/entrypoint from compose file
             # In docker-compose: entrypoint -> k8s command, command -> k8s args
             container_command = None
@@ -628,16 +565,6 @@ class ClusterInfo:
             container_resources = self._resolve_container_resources(
                 container_name, service_info, global_resources
             )
-            # Readiness probe from http-proxy routes
-            readiness_probe = None
-            probe_port = readiness_probe_ports.get(container_name)
-            if probe_port:
-                readiness_probe = client.V1Probe(
-                    tcp_socket=client.V1TCPSocketAction(port=probe_port),
-                    initial_delay_seconds=5,
-                    period_seconds=10,
-                    failure_threshold=3,
-                )
             container = client.V1Container(
                 name=container_name,
                 image=image_to_use,
@@ -648,19 +575,14 @@ class ClusterInfo:
                 env_from=env_from,
                 ports=container_ports if container_ports else None,
                 volume_mounts=volume_mounts,
-                readiness_probe=readiness_probe,
                 security_context=client.V1SecurityContext(
                     privileged=self.spec.get_privileged(),
-                    run_as_user=(
-                        int(service_info["user"])
-                        if "user" in service_info
-                        else None
-                    ),
-                    capabilities=(
-                        client.V1Capabilities(add=self.spec.get_capabilities())
-                        if self.spec.get_capabilities()
-                        else None
-                    ),
+                    run_as_user=int(service_info["user"]) if "user" in service_info else None,
+                    capabilities=client.V1Capabilities(
+                        add=self.spec.get_capabilities()
+                    )
+                    if self.spec.get_capabilities()
+                    else None,
                 ),
                 resources=to_k8s_resource_requirements(container_resources),
             )
@@ -669,53 +591,33 @@ class ClusterInfo:
             svc_labels = service_info.get("labels", {})
             if isinstance(svc_labels, list):
                 # docker-compose labels can be a list of "key=value"
-                svc_labels = dict(item.split("=", 1) for item in svc_labels)
-            is_init = str(svc_labels.get("laconic.init-container", "")).lower() in (
-                "true",
-                "1",
-                "yes",
-            )
+                svc_labels = dict(
+                    item.split("=", 1) for item in svc_labels
+                )
+            is_init = str(
+                svc_labels.get("laconic.init-container", "")
+            ).lower() in ("true", "1", "yes")
             if is_init:
                 init_containers.append(container)
             else:
                 containers.append(container)
-        volumes = volumes_for_pod_files(parsed_yaml_map, self.spec, self.app_name)
+        volumes = volumes_for_pod_files(
+            parsed_yaml_map, self.spec, self.app_name
+        )
         return containers, init_containers, services, volumes

-    def _pod_name_from_file(self, pod_file: str) -> str:
-        """Extract pod name from compose file path.
-
-        docker-compose-dumpster.yml -> dumpster
-        docker-compose-dumpster-maintenance.yml -> dumpster-maintenance
-        """
-        import os
-
-        base = os.path.basename(pod_file)
-        name = base
-        if name.startswith("docker-compose-"):
-            name = name[len("docker-compose-") :]
-        if name.endswith(".yml"):
-            name = name[: -len(".yml")]
-        elif name.endswith(".yaml"):
-            name = name[: -len(".yaml")]
-        return name
-
-    def _pod_has_pvcs(self, parsed_pod_file: Any) -> bool:
-        """Check if a parsed compose file declares volumes that become PVCs.
-
-        Excludes volumes that are ConfigMaps (declared in spec.configmaps),
-        since those don't require Recreate strategy.
-        """
-        volumes = parsed_pod_file.get("volumes", {})
-        configmaps = set(self.spec.get_configmaps().keys())
-        pvc_volumes = [v for v in volumes if v not in configmaps]
-        return len(pvc_volumes) > 0
-
-    def _build_common_pod_metadata(self, services: dict) -> tuple:
-        """Build shared annotations, labels, affinity, tolerations for pods.
-
-        Returns (annotations, labels, affinity, tolerations).
-        """
+    # TODO: put things like image pull policy into an object-scope struct
+    def get_deployment(self, image_pull_policy: Optional[str] = None):
+        containers, init_containers, services, volumes = self._build_containers(
+            self.parsed_pod_yaml_map, image_pull_policy
+        )
+        registry_config = self.spec.get_image_registry_config()
+        if registry_config:
+            secret_name = f"{self.app_name}-registry"
+            image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
+        else:
+            image_pull_secrets = []
+
         annotations = None
         labels = {"app": self.app_name}
         if self.stack_name:
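The deleted `_pod_name_from_file` helper derives a pod name from a compose file path (`docker-compose-dumpster.yml -> dumpster`). The same stripping logic as a free function, for illustration:

```python
import os


def pod_name_from_file(pod_file: str) -> str:
    """docker-compose-<name>.yml (or .yaml) -> <name>"""
    name = os.path.basename(pod_file)
    if name.startswith("docker-compose-"):
        name = name[len("docker-compose-"):]
    # Check .yml before .yaml; a name ending in ".yaml" does not end in ".yml"
    if name.endswith(".yml"):
        name = name[: -len(".yml")]
    elif name.endswith(".yaml"):
        name = name[: -len(".yaml")]
    return name
```

The derived name feeds the `{app_name}-{pod_name}-deployment` and `{app_name}-{pod_name}-service` naming used elsewhere in the removed multi-pod code.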
@@ -737,6 +639,7 @@ class ClusterInfo:
         if self.spec.get_node_affinities():
             affinities = []
             for rule in self.spec.get_node_affinities():
+                # TODO add some input validation here
                 label_name = rule["label"]
                 label_value = rule["value"]
                 affinities.append(
@@ -759,6 +662,7 @@ class ClusterInfo:
         if self.spec.get_node_tolerations():
             tolerations = []
             for toleration in self.spec.get_node_tolerations():
+                # TODO add some input validation here
                 toleration_key = toleration["key"]
                 toleration_value = toleration["value"]
                 tolerations.append(
@@ -770,224 +674,37 @@ class ClusterInfo:
                     )
                 )

-        return annotations, labels, affinity, tolerations
-
-    # TODO: put things like image pull policy into an object-scope struct
-    def get_deployment(self, image_pull_policy: Optional[str] = None):
-        """Build a single k8s Deployment from all pod files (legacy behavior).
-
-        When only one pod is defined in the stack, this is equivalent to
-        get_deployments()[0]. Kept for backward compatibility.
-        """
-        deployments = self.get_deployments(image_pull_policy)
-        if not deployments:
-            return None
-        # Legacy: return the first (and usually only) deployment
-        return deployments[0]
-
-    def get_deployments(
-        self, image_pull_policy: Optional[str] = None
-    ) -> List[client.V1Deployment]:
-        """Build one k8s Deployment per pod file.
-
-        Each pod file (docker-compose-<name>.yml) becomes its own Deployment
-        with independent lifecycle and update strategy:
-        - Pods with PVCs get strategy=Recreate (can't do rolling updates
-          with ReadWriteOnce volumes)
-        - Pods without PVCs get strategy=RollingUpdate
-
-        This enables maintenance services to survive main pod restarts.
-        """
-        if not self.parsed_pod_yaml_map:
-            return []
-
-        registry_config = self.spec.get_image_registry_config()
-        if registry_config:
-            secret_name = f"{self.app_name}-image-pull-secret"
-            image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
-        else:
-            image_pull_secrets = []
-
         use_host_network = self._any_service_has_host_network()
-        pod_files = list(self.parsed_pod_yaml_map.keys())
-
-        # Single pod file: preserve legacy naming ({app_name}-deployment)
-        # Multiple pod files: use {app_name}-{pod_name}-deployment
-        multi_pod = len(pod_files) > 1
-
-        deployments = []
-        for pod_file in pod_files:
-            pod_name = self._pod_name_from_file(pod_file)
-            single_pod_map = {pod_file: self.parsed_pod_yaml_map[pod_file]}
-            containers, init_containers, services, volumes = self._build_containers(
-                single_pod_map, image_pull_policy
-            )
-            annotations, labels, affinity, tolerations = (
-                self._build_common_pod_metadata(services)
-            )
-
-            # Add pod-name label so Services can target specific pods
-            if multi_pod:
-                labels["app.kubernetes.io/component"] = pod_name
-
-            has_pvcs = self._pod_has_pvcs(self.parsed_pod_yaml_map[pod_file])
-            if has_pvcs:
-                strategy = client.V1DeploymentStrategy(type="Recreate")
-            else:
-                strategy = client.V1DeploymentStrategy(
-                    type="RollingUpdate",
-                    rolling_update=client.V1RollingUpdateDeployment(
-                        max_unavailable=0, max_surge=1
-                    ),
-                )
-
-            # Pod selector: for multi-pod, select by both app and component
-            selector_labels = {"app": self.app_name}
-            if multi_pod:
-                selector_labels["app.kubernetes.io/component"] = pod_name
-
-            # Add CA certificate volume and env vars if configured
-            _ca_secret, ca_volume, ca_mounts, ca_envs = (
-                self.get_ca_certificate_resources()
-            )
-            if ca_volume:
-                volumes.append(ca_volume)
-                for container in containers:
-                    if container.volume_mounts is None:
-                        container.volume_mounts = []
-                    container.volume_mounts.extend(ca_mounts)
-                    if container.env is None:
-                        container.env = []
-                    container.env.extend(ca_envs)
-
-            template = client.V1PodTemplateSpec(
-                metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
-                spec=client.V1PodSpec(
-                    containers=containers,
-                    init_containers=init_containers or None,
-                    image_pull_secrets=image_pull_secrets,
-                    volumes=volumes,
-                    affinity=affinity,
-                    tolerations=tolerations,
-                    runtime_class_name=self.spec.get_runtime_class(),
-                    host_network=use_host_network or None,
-                    dns_policy=(
-                        "ClusterFirstWithHostNet" if use_host_network else None
-                    ),
-                ),
-            )
-
-            if multi_pod:
-                deployment_name = f"{self.app_name}-{pod_name}-deployment"
-            else:
-                deployment_name = f"{self.app_name}-deployment"
-
-            spec = client.V1DeploymentSpec(
-                replicas=self.spec.get_replicas(),
-                template=template,
-                selector={"matchLabels": selector_labels},
-                strategy=strategy,
-            )
-
-            deployment = client.V1Deployment(
-                api_version="apps/v1",
-                kind="Deployment",
-                metadata=client.V1ObjectMeta(
-                    name=deployment_name,
-                    labels={
-                        "app": self.app_name,
-                        **(
-                            {
-                                "app.kubernetes.io/stack": self.stack_name,
-                            }
-                            if self.stack_name
-                            else {}
-                        ),
-                        **(
-                            {"app.kubernetes.io/component": pod_name}
-                            if multi_pod
-                            else {}
-                        ),
-                    },
-                ),
-                spec=spec,
-            )
-            deployments.append(deployment)
-
-        return deployments
-
-    def get_services(self) -> List[client.V1Service]:
-        """Build per-pod ClusterIP Services for multi-pod stacks.
-
-        Each pod's containers get their own Service so Ingress can route
-        to specific pods. For single-pod stacks, returns a list with one
-        service matching the legacy get_service() behavior.
-        """
-        pod_files = list(self.parsed_pod_yaml_map.keys())
-        multi_pod = len(pod_files) > 1
-
-        if not multi_pod:
-            # Legacy: single service for all pods
-            svc = self.get_service()
-            return [svc] if svc else []
-
-        # Multi-pod: one service per pod, only for pods that have
-        # ports referenced by http-proxy routes
-        http_proxy_list = self.spec.get_http_proxy()
-        if not http_proxy_list:
-            return []
-
-        # Build map: container_name -> port from http-proxy routes
-        container_ports: dict = {}
-        for http_proxy in http_proxy_list:
-            for route in http_proxy.get("routes", []):
-                proxy_to = route.get("proxy-to", "")
-                if ":" in proxy_to:
-                    container, port_str = proxy_to.rsplit(":", 1)
-                    port = int(port_str)
-                    if container not in container_ports:
-                        container_ports[container] = set()
-                    container_ports[container].add(port)
-
-        # Build map: pod_file -> set of service names in that pod
-        pod_services_map: dict = {}
-        for pod_file in pod_files:
-            pod = self.parsed_pod_yaml_map[pod_file]
-            pod_services_map[pod_file] = set(pod.get("services", {}).keys())
-
-        services = []
-        for pod_file in pod_files:
-            pod_name = self._pod_name_from_file(pod_file)
-            svc_names = pod_services_map[pod_file]
-            # Collect ports from http-proxy that belong to this pod's containers
-            ports_set: Set[int] = set()
-            for svc_name in svc_names:
-                if svc_name in container_ports:
-                    ports_set.update(container_ports[svc_name])
-
-            if not ports_set:
-                continue
-
-            service_ports = [
-                client.V1ServicePort(port=p, target_port=p, name=f"port-{p}")
-                for p in sorted(ports_set)
-            ]
-            service = client.V1Service(
-                metadata=client.V1ObjectMeta(
-                    name=f"{self.app_name}-{pod_name}-service",
-                    labels={"app": self.app_name},
-                ),
-                spec=client.V1ServiceSpec(
-                    type="ClusterIP",
-                    ports=service_ports,
-                    selector={
-                        "app": self.app_name,
-                        "app.kubernetes.io/component": pod_name,
-                    },
-                ),
-            )
-            services.append(service)
-        return services
-
+        template = client.V1PodTemplateSpec(
+            metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
+            spec=client.V1PodSpec(
+                containers=containers,
+                init_containers=init_containers or None,
+                image_pull_secrets=image_pull_secrets,
+                volumes=volumes,
+                affinity=affinity,
+                tolerations=tolerations,
+                runtime_class_name=self.spec.get_runtime_class(),
+                host_network=use_host_network or None,
+                dns_policy=("ClusterFirstWithHostNet" if use_host_network else None),
+            ),
+        )
+        spec = client.V1DeploymentSpec(
+            replicas=self.spec.get_replicas(),
+            template=template,
+            selector={"matchLabels": {"app": self.app_name}},
+        )
+
+        deployment = client.V1Deployment(
+            api_version="apps/v1",
+            kind="Deployment",
+            metadata=client.V1ObjectMeta(
+                name=f"{self.app_name}-deployment",
+                labels={"app": self.app_name, **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {})},
+            ),
+            spec=spec,
+        )
+        return deployment

     def get_jobs(self, image_pull_policy: Optional[str] = None) -> List[client.V1Job]:
         """Build k8s Job objects from parsed job compose files.
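The removed `get_deployments` picks an update strategy per pod: Recreate when the pod owns PVC-backed volumes (ReadWriteOnce volumes cannot be shared across rolling-update replicas), RollingUpdate otherwise, with ConfigMap-backed volumes excluded from the PVC count. A minimal sketch of that decision, with a hypothetical function name:

```python
def update_strategy_for_pod(parsed_pod_file: dict, configmap_names: set) -> str:
    """Return the Deployment strategy type for one parsed compose file.

    Volumes declared in the compose file become PVCs, except those that the
    spec maps to ConfigMaps; only real PVCs force Recreate.
    """
    volumes = parsed_pod_file.get("volumes", {})
    pvc_volumes = [v for v in volumes if v not in configmap_names]
    return "Recreate" if pvc_volumes else "RollingUpdate"
```

In the removed code the RollingUpdate branch also set `max_unavailable=0, max_surge=1`, i.e. the new pod must become ready before the old one is taken down, which is why the readiness probes deleted elsewhere in this diff mattered.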
@@ -1003,7 +720,7 @@ class ClusterInfo:
         jobs = []
         registry_config = self.spec.get_image_registry_config()
         if registry_config:
-            secret_name = f"{self.app_name}-image-pull-secret"
+            secret_name = f"{self.app_name}-registry"
             image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
         else:
             image_pull_secrets = []
@@ -1011,8 +728,8 @@ class ClusterInfo:
         for job_file in self.parsed_job_yaml_map:
             # Build containers for this single job file
             single_job_map = {job_file: self.parsed_job_yaml_map[job_file]}
-            containers, init_containers, _services, volumes = self._build_containers(
-                single_job_map, image_pull_policy
+            containers, init_containers, _services, volumes = (
+                self._build_containers(single_job_map, image_pull_policy)
             )

             # Derive job name from file path: docker-compose-<name>.yml -> <name>
@@ -1020,7 +737,7 @@ class ClusterInfo:
             # Strip docker-compose- prefix and .yml suffix
             job_name = base
             if job_name.startswith("docker-compose-"):
-                job_name = job_name[len("docker-compose-") :]
+                job_name = job_name[len("docker-compose-"):]
             if job_name.endswith(".yml"):
                 job_name = job_name[: -len(".yml")]
             elif job_name.endswith(".yaml"):
@@ -1030,14 +747,12 @@ class ClusterInfo:
             # picked up by pods_in_deployment() which queries app={app_name}.
             pod_labels = {
                 "app": f"{self.app_name}-job",
-                **(
-                    {"app.kubernetes.io/stack": self.stack_name}
-                    if self.stack_name
-                    else {}
-                ),
+                **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {}),
             }
             template = client.V1PodTemplateSpec(
-                metadata=client.V1ObjectMeta(labels=pod_labels),
+                metadata=client.V1ObjectMeta(
+                    labels=pod_labels
+                ),
                 spec=client.V1PodSpec(
                     containers=containers,
                     init_containers=init_containers or None,
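Both sides of the hunk above build the job pod labels with the same conditional-merge idiom, `**({...} if cond else {})`, which only adds the stack label when a stack name is known. A tiny standalone sketch of that idiom (function name hypothetical):

```python
def pod_labels_for_job(app_name: str, stack_name: str = "") -> dict:
    # The stack label is merged in only when a stack name is set;
    # an empty dict unpacks to nothing.
    return {
        "app": f"{app_name}-job",
        **({"app.kubernetes.io/stack": stack_name} if stack_name else {}),
    }
```

This keeps label dicts free of empty-valued keys, which matters because label selectors match on exact key/value pairs.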
@@ -1050,14 +765,7 @@ class ClusterInfo:
                 template=template,
                 backoff_limit=0,
             )
-            job_labels = {
-                "app": self.app_name,
-                **(
-                    {"app.kubernetes.io/stack": self.stack_name}
-                    if self.stack_name
-                    else {}
-                ),
-            }
+            job_labels = {"app": self.app_name, **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {})}
             job = client.V1Job(
                 api_version="batch/v1",
                 kind="Job",
@@ -1070,130 +778,3 @@ class ClusterInfo:
             jobs.append(job)

         return jobs
-
-    def get_external_service_resources(self) -> List:
-        """Build k8s Services (and Endpoints) for external-services in spec.
-
-        Two modes:
-        - host mode: ExternalName Service (DNS CNAME to external host)
-        - selector mode: headless Service + Endpoints (cross-namespace
-          routing to a mock pod, IP discovered at deploy time)
-
-        Returns a flat list of k8s resource objects (Services + Endpoints).
-        """
-        ext_services = self.spec.get_external_services()
-        if not ext_services:
-            return []
-
-        resources = []
-        for name, config in ext_services.items():
-            port = config.get("port", 443)
-
-            if "host" in config:
-                # ExternalName: DNS CNAME to external host
-                svc = client.V1Service(
-                    metadata=client.V1ObjectMeta(
-                        name=name,
-                        labels={"app": self.app_name},
-                    ),
-                    spec=client.V1ServiceSpec(
-                        type="ExternalName",
-                        external_name=config["host"],
-                        ports=[
-                            client.V1ServicePort(port=port, name=f"port-{port}")
-                        ],
-                    ),
-                )
-                resources.append(svc)
-
-            elif "selector" in config and "namespace" in config:
-                # Cross-namespace headless Service + Endpoints.
-                # The Endpoints IP is populated in deploy_k8s.py at deploy
-                # time by querying the target namespace for matching pods.
-                svc = client.V1Service(
-                    metadata=client.V1ObjectMeta(
-                        name=name,
-                        labels={"app": self.app_name},
-                    ),
-                    spec=client.V1ServiceSpec(
-                        cluster_ip="None",
-                        ports=[
-                            client.V1ServicePort(port=port, name=f"port-{port}")
-                        ],
-                    ),
-                )
-                resources.append(svc)
-                # Endpoints object is created in deploy_k8s.py after pod
-                # IP discovery — we just return the Service here.
-
-        return resources
-
-    def get_ca_certificate_resources(self) -> tuple:
-        """Build k8s Secret and volume mount config for CA certificates.
-
-        Returns (secret, volume, volume_mount, env_vars) or (None, ...) if
-        no CA certificates are configured. The caller must add the volume
-        and mount to all containers, and the env vars to all containers.
-        """
-        ca_files = self.spec.get_ca_certificates()
-        if not ca_files:
-            return None, None, None, []
-
-        # Concatenate all CA files into one Secret
-        secret_data = {}
-        for i, ca_path in enumerate(ca_files):
-            expanded = os.path.expanduser(ca_path)
-            if not os.path.exists(expanded):
-                print(f"Warning: CA certificate file not found: {expanded}")
-                continue
-            with open(expanded, "rb") as f:
-                ca_bytes = f.read()
-            key = f"laconic-extra-ca-{i}.pem"
-            secret_data[key] = base64.b64encode(ca_bytes).decode()
-
-        if not secret_data:
-            return None, None, None, []
-
-        secret_name = f"{self.app_name}-ca-certificates"
-        secret = client.V1Secret(
-            metadata=client.V1ObjectMeta(
-                name=secret_name,
-                labels={"app": self.app_name},
-            ),
-            data=secret_data,
-        )
-
-        volume = client.V1Volume(
-            name="laconic-ca-certs",
-            secret=client.V1SecretVolumeSource(
-                secret_name=secret_name,
-            ),
-        )
-
-        # Mount each CA file directly into /etc/ssl/certs/ using subPath
-        # so Go's x509 package picks them up (it reads *.pem from that dir).
-        # Also return env vars for Node/Bun containers.
-        volume_mounts = []
-        first_mount_path = None
-        for key in secret_data.keys():
-            mount_path = f"/etc/ssl/certs/{key}"
-            if first_mount_path is None:
-                first_mount_path = mount_path
-            volume_mounts.append(
-                client.V1VolumeMount(
-                    name="laconic-ca-certs",
-                    mount_path=mount_path,
-                    sub_path=key,
-                    read_only=True,
-                )
-            )
-
-        env_vars = [
-            client.V1EnvVar(
-                name="NODE_EXTRA_CA_CERTS",
-                value=first_mount_path,
-            ),
-        ]
-
-        return secret, volume, volume_mounts, env_vars
@@ -115,7 +115,6 @@ class K8sDeployer(Deployer):
     ) -> None:
         self.type = type
         self.skip_cluster_management = False
-        self.image_overrides = None
         self.k8s_namespace = "default"  # Will be overridden below if context exists
         # TODO: workaround pending refactoring above to cope with being
         # created with a null deployment_context
@@ -123,13 +122,9 @@ class K8sDeployer(Deployer):
             return
         self.deployment_dir = deployment_context.deployment_dir
         self.deployment_context = deployment_context
-        self.kind_cluster_name = (
-            deployment_context.spec.get_kind_cluster_name() or compose_project_name
-        )
+        self.kind_cluster_name = deployment_context.spec.get_kind_cluster_name() or compose_project_name
         # Use spec namespace if provided, otherwise derive from cluster-id
-        self.k8s_namespace = (
-            deployment_context.spec.get_namespace() or f"laconic-{compose_project_name}"
-        )
+        self.k8s_namespace = deployment_context.spec.get_namespace() or f"laconic-{compose_project_name}"
         self.cluster_info = ClusterInfo()
         # stack.name may be an absolute path (from spec "stack:" key after
         # path resolution). Extract just the directory basename for labels.
@@ -209,43 +204,6 @@ class K8sDeployer(Deployer):
             else:
                 raise

-    def _wait_for_namespace_gone(self, timeout_seconds: int = 120):
-        """Wait for namespace to finish terminating."""
-        if opts.o.dry_run:
-            return
-        import time
-
-        deadline = time.monotonic() + timeout_seconds
-        while time.monotonic() < deadline:
-            try:
-                ns = self.core_api.read_namespace(name=self.k8s_namespace)
-                if ns.status and ns.status.phase == "Terminating":
-                    if opts.o.debug:
-                        print(
-                            f"Waiting for namespace {self.k8s_namespace}"
-                            " to finish terminating..."
-                        )
-                    time.sleep(2)
-                    continue
-                # Namespace exists and is Active — shouldn't happen after delete
-                break
-            except ApiException as e:
-                if e.status == 404:
-                    # Gone — success
-                    return
-                raise
-        # If we get here, namespace still exists after timeout
-        try:
-            self.core_api.read_namespace(name=self.k8s_namespace)
-            print(
-                f"Warning: namespace {self.k8s_namespace} still exists"
-                f" after {timeout_seconds}s"
-            )
-        except ApiException as e:
-            if e.status == 404:
-                return
-            raise
-
     def _delete_resources_by_label(self, label_selector: str, delete_volumes: bool):
         """Delete only this stack's resources from a shared namespace."""
         ns = self.k8s_namespace
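The removed `_wait_for_namespace_gone` is a bounded poll loop: check, sleep, re-check until a deadline, with one final check after the loop. Its skeleton, generalized (the helper name and the predicate are illustrative stand-ins):

```python
import time

def wait_until(predicate, timeout_seconds: float = 120, interval: float = 2) -> bool:
    # Poll until predicate() is true or the deadline passes; the final
    # check after the loop mirrors the method's post-timeout re-read.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()

# e.g. wait_until(lambda: namespace_is_gone(core_api, ns), timeout_seconds=120)
```
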
@@ -274,8 +232,7 @@ class K8sDeployer(Deployer):
             for job in jobs.items:
                 print(f"Deleting Job {job.metadata.name}")
                 self.batch_api.delete_namespaced_job(
-                    name=job.metadata.name,
-                    namespace=ns,
+                    name=job.metadata.name, namespace=ns,
                     body=client.V1DeleteOptions(propagation_policy="Background"),
                 )
         except ApiException as e:
@@ -346,22 +303,7 @@ class K8sDeployer(Deployer):
                         name=pv.metadata.name
                     )
                     if pv_resp:
-                        # If PV is in Released state (stale claimRef from a
-                        # previous deployment), clear the claimRef so a new
-                        # PVC can bind to it. This happens after stop+start
-                        # because stop deletes the namespace (and PVCs) but
-                        # preserves PVs by default.
-                        if pv_resp.status and pv_resp.status.phase == "Released":
-                            print(
-                                f"PV {pv.metadata.name} is Released, "
-                                "clearing claimRef for rebinding"
-                            )
-                            pv_resp.spec.claim_ref = None
-                            self.core_api.patch_persistent_volume(
-                                name=pv.metadata.name,
-                                body={"spec": {"claimRef": None}},
-                            )
-                        elif opts.o.debug:
+                        if opts.o.debug:
                             print("PVs already present:")
                             print(f"{pv_resp}")
                         continue
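The claimRef-clearing branch deleted here can be exercised without a cluster by stubbing the API object. `FakeCoreApi` and the `SimpleNamespace` PV below are test stand-ins, not project code; the patch body matches the one in the diff:

```python
from types import SimpleNamespace

def clear_released_claim_ref(core_api, pv) -> bool:
    # A PV left in "Released" keeps a stale claimRef from the old PVC;
    # patching spec.claimRef to null lets a new PVC bind to it again.
    if pv.status and pv.status.phase == "Released":
        core_api.patch_persistent_volume(
            name=pv.metadata.name,
            body={"spec": {"claimRef": None}},
        )
        return True
    return False

class FakeCoreApi:
    # Records patch calls instead of talking to a real API server.
    def __init__(self):
        self.patched = []

    def patch_persistent_volume(self, name, body):
        self.patched.append((name, body))
```
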
@@ -405,148 +347,12 @@ class K8sDeployer(Deployer):
             if opts.o.debug:
                 print(f"Sending this ConfigMap: {cfg_map}")
             if not opts.o.dry_run:
-                cm_name = cfg_map.metadata.name
-                try:
-                    self.core_api.create_namespaced_config_map(
-                        body=cfg_map, namespace=self.k8s_namespace
-                    )
-                except ApiException as e:
-                    if e.status == 409:
-                        self.core_api.patch_namespaced_config_map(
-                            name=cm_name,
-                            namespace=self.k8s_namespace,
-                            body=cfg_map,
-                        )
-                    else:
-                        raise
-
-    def _create_external_services(self):
-        """Create k8s Services for external-services declared in the spec.
-
-        For host mode: ExternalName Service (DNS CNAME).
-        For selector mode: headless Service + Endpoints with pod IPs
-        discovered from the target namespace.
-        """
-        resources = self.cluster_info.get_external_service_resources()
-        ext_services = self.cluster_info.spec.get_external_services()
-
-        for resource in resources:
-            if opts.o.dry_run:
-                print(f"Dry run: would create external service: {resource.metadata.name}")
-                continue
-
-            svc_name = resource.metadata.name
-            try:
-                self.core_api.create_namespaced_service(
-                    body=resource, namespace=self.k8s_namespace
-                )
-                print(f"Created external service '{svc_name}'")
-            except ApiException as e:
-                if e.status == 409:
-                    self.core_api.replace_namespaced_service(
-                        name=svc_name,
-                        namespace=self.k8s_namespace,
-                        body=resource,
-                    )
-                    print(f"Updated external service '{svc_name}'")
-                else:
-                    raise
-
-        # Create Endpoints for selector-mode services
-        for name, config in ext_services.items():
-            if "selector" not in config or "namespace" not in config:
-                continue
-            if opts.o.dry_run:
-                continue
-
-            target_ns = config["namespace"]
-            selector = config["selector"]
-            port = config.get("port", 443)
-
-            # Build label selector string from dict
-            label_selector = ",".join(f"{k}={v}" for k, v in selector.items())
-
-            # Discover pod IPs in target namespace
-            pods = self.core_api.list_namespaced_pod(
-                namespace=target_ns, label_selector=label_selector
-            )
-            pod_ips = [
-                p.status.pod_ip
-                for p in pods.items
-                if p.status and p.status.pod_ip
-            ]
-
-            if not pod_ips:
-                print(
-                    f"Warning: no pods found in {target_ns} matching "
-                    f"{label_selector} for external service '{name}'"
-                )
-                continue
-
-            endpoints = client.V1Endpoints(
-                metadata=client.V1ObjectMeta(
-                    name=name,
-                    labels={"app": self.cluster_info.app_name},
-                ),
-                subsets=[
-                    client.V1EndpointSubset(
-                        addresses=[
-                            client.V1EndpointAddress(ip=ip) for ip in pod_ips
-                        ],
-                        ports=[
-                            client.CoreV1EndpointPort(
-                                port=port, name=f"port-{port}"
-                            )
-                        ],
-                    )
-                ],
-            )
-
-            try:
-                self.core_api.create_namespaced_endpoints(
-                    body=endpoints, namespace=self.k8s_namespace
-                )
-                print(f"Created endpoints for '{name}' → {pod_ips}")
-            except ApiException as e:
-                if e.status == 409:
-                    self.core_api.replace_namespaced_endpoints(
-                        name=name,
-                        namespace=self.k8s_namespace,
-                        body=endpoints,
-                    )
-                    print(f"Updated endpoints for '{name}' → {pod_ips}")
-                else:
-                    raise
-
-    def _create_ca_certificates(self):
-        """Create k8s Secret for CA certificates declared in the spec.
-
-        The Secret is mounted into containers by get_deployments() in
-        cluster_info.py. This method just ensures the Secret exists.
-        """
-        ca_secret, _, _, _ = self.cluster_info.get_ca_certificate_resources()
-        if not ca_secret:
-            return
-        if opts.o.dry_run:
-            print(f"Dry run: would create CA certificate secret")
-            return
-
-        secret_name = ca_secret.metadata.name
-        try:
-            self.core_api.create_namespaced_secret(
-                body=ca_secret, namespace=self.k8s_namespace
-            )
-            print(f"Created CA certificate secret '{secret_name}'")
-        except ApiException as e:
-            if e.status == 409:
-                self.core_api.replace_namespaced_secret(
-                    name=secret_name,
-                    namespace=self.k8s_namespace,
-                    body=ca_secret,
-                )
-                print(f"Updated CA certificate secret '{secret_name}'")
-            else:
-                raise
+                cfg_rsp = self.core_api.create_namespaced_config_map(
+                    body=cfg_map, namespace=self.k8s_namespace
+                )
+                if opts.o.debug:
+                    print("ConfigMap created:")
+                    print(f"{cfg_rsp}")

     def _create_deployment(self):
         # Skip if there are no pods to deploy (e.g. jobs-only stacks)
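The deleted branch applies the same create-then-handle-409 shape to ConfigMaps, Secrets, Services, Endpoints, and Ingresses. The pattern in isolation; `ApiException` here is a minimal stand-in for `kubernetes.client.rest.ApiException`:

```python
class ApiException(Exception):
    # Stand-in carrying only the HTTP status the pattern inspects.
    def __init__(self, status: int):
        super().__init__(f"status={status}")
        self.status = status

def create_or_replace(create, replace, body) -> str:
    # Try create; on 409 Conflict the object already exists, so replace
    # it instead. Any other API error propagates unchanged.
    try:
        create(body)
        return "created"
    except ApiException as e:
        if e.status == 409:
            replace(body)
            return "replaced"
        raise
```
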
@@ -554,109 +360,48 @@ class K8sDeployer(Deployer):
             if opts.o.debug:
                 print("No pods defined, skipping Deployment creation")
             return
-        # Process compose files into Deployments (one per pod file)
-        # image-pull-policy from spec, default Always (production).
-        # Testing specs use IfNotPresent so kind-loaded local images are used.
-        pull_policy = self.cluster_info.spec.get("image-pull-policy", "Always")
-        deployments = self.cluster_info.get_deployments(image_pull_policy=pull_policy)
-        for deployment in deployments:
-            # Apply image overrides if provided
-            if self.image_overrides:
-                for container in deployment.spec.template.spec.containers:
-                    if container.name in self.image_overrides:
-                        container.image = self.image_overrides[container.name]
-                        if opts.o.debug:
-                            print(
-                                f"Overriding image for {container.name}:"
-                                f" {container.image}"
-                            )
-            # Create or update the k8s Deployment
-            if opts.o.debug:
-                print(f"Sending this deployment: {deployment}")
-            if not opts.o.dry_run:
-                name = deployment.metadata.name
-                try:
-                    deployment_resp = cast(
-                        client.V1Deployment,
-                        self.apps_api.create_namespaced_deployment(
-                            body=deployment, namespace=self.k8s_namespace
-                        ),
-                    )
-                    strategy = (
-                        deployment.spec.strategy.type
-                        if deployment.spec.strategy
-                        else "default"
-                    )
-                    print(f"Created Deployment {name} (strategy: {strategy})")
-                except ApiException as e:
-                    if e.status == 409:
-                        # Already exists — replace to ensure removed fields
-                        # (volumes, mounts, env vars) are actually deleted.
-                        existing = self.apps_api.read_namespaced_deployment(
-                            name=name, namespace=self.k8s_namespace
-                        )
-                        deployment.metadata.resource_version = (
-                            existing.metadata.resource_version
-                        )
-                        deployment_resp = cast(
-                            client.V1Deployment,
-                            self.apps_api.replace_namespaced_deployment(
-                                name=name,
-                                namespace=self.k8s_namespace,
-                                body=deployment,
-                            ),
-                        )
-                        print(f"Updated Deployment {name} (rolling update)")
-                    else:
-                        raise
-                if opts.o.debug:
-                    meta = deployment_resp.metadata
-                    spec = deployment_resp.spec
-                    if meta and spec and spec.template.spec:
-                        containers = spec.template.spec.containers
-                        img = containers[0].image if containers else None
-                        print(
-                            f" {meta.namespace} {meta.name}"
-                            f" gen={meta.generation} {img}"
-                        )
-
-        # Create Services (one per pod for multi-pod, or one for single-pod)
-        services = self.cluster_info.get_services()
-        for service in services:
-            if opts.o.debug:
-                print(f"Sending this service: {service}")
-            if service and not opts.o.dry_run:
-                svc_name = service.metadata.name
-                try:
-                    service_resp = self.core_api.create_namespaced_service(
-                        namespace=self.k8s_namespace, body=service
-                    )
-                    print(f"Created Service {svc_name}")
-                except ApiException as e:
-                    if e.status == 409:
-                        # Replace to ensure removed ports are deleted.
-                        # Must preserve clusterIP (immutable) and resourceVersion.
-                        existing = self.core_api.read_namespaced_service(
-                            name=svc_name, namespace=self.k8s_namespace
-                        )
-                        service.metadata.resource_version = (
-                            existing.metadata.resource_version
-                        )
-                        service.spec.cluster_ip = existing.spec.cluster_ip
-                        service_resp = self.core_api.replace_namespaced_service(
-                            name=svc_name,
-                            namespace=self.k8s_namespace,
-                            body=service,
-                        )
-                        print(f"Updated Service {svc_name}")
-                    else:
-                        raise
-                if opts.o.debug:
-                    print(f" {service_resp}")
+        # Process compose files into a Deployment
+        deployment = self.cluster_info.get_deployment(
+            image_pull_policy=None if self.is_kind() else "Always"
+        )
+        # Create the k8s objects
+        if opts.o.debug:
+            print(f"Sending this deployment: {deployment}")
+        if not opts.o.dry_run:
+            deployment_resp = cast(
+                client.V1Deployment,
+                self.apps_api.create_namespaced_deployment(
+                    body=deployment, namespace=self.k8s_namespace
+                ),
+            )
+            if opts.o.debug:
+                print("Deployment created:")
+                meta = deployment_resp.metadata
+                spec = deployment_resp.spec
+                if meta and spec and spec.template.spec:
+                    ns = meta.namespace
+                    name = meta.name
+                    gen = meta.generation
+                    containers = spec.template.spec.containers
+                    img = containers[0].image if containers else None
+                    print(f"{ns} {name} {gen} {img}")
+
+        service = self.cluster_info.get_service()
+        if opts.o.debug:
+            print(f"Sending this service: {service}")
+        if service and not opts.o.dry_run:
+            service_resp = self.core_api.create_namespaced_service(
+                namespace=self.k8s_namespace, body=service
+            )
+            if opts.o.debug:
+                print("Service created:")
+                print(f"{service_resp}")

     def _create_jobs(self):
         # Process job compose files into k8s Jobs
-        jobs = self.cluster_info.get_jobs(image_pull_policy="Always")
+        jobs = self.cluster_info.get_jobs(
+            image_pull_policy=None if self.is_kind() else "Always"
+        )
         for job in jobs:
             if opts.o.debug:
                 print(f"Sending this job: {job}")
@@ -708,149 +453,107 @@ class K8sDeployer(Deployer):
                 return cert
         return None

-    def _setup_cluster(self):
-        """Create/reuse kind cluster, load images, ensure namespace."""
-        if self.is_kind() and not self.skip_cluster_management:
-            kind_config = str(
-                self.deployment_dir.joinpath(constants.kind_config_filename)
-            )
-            actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
-            if actual_cluster != self.kind_cluster_name:
-                self.kind_cluster_name = actual_cluster
-            # Only load locally-built images into kind
-            local_containers = self.deployment_context.stack.obj.get("containers", [])
-            if local_containers:
-                local_images = {
-                    img
-                    for img in self.cluster_info.image_set
-                    if any(c in img for c in local_containers)
-                }
-                if local_images:
-                    load_images_into_kind(self.kind_cluster_name, local_images)
-        self.connect_api()
-        self._ensure_namespace()
-        if self.is_kind() and not self.skip_cluster_management:
-            if not is_ingress_running():
-                install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
-                wait_for_ingress_in_kind()
-            if self.cluster_info.spec.get_unlimited_memlock():
-                _create_runtime_class(
-                    constants.high_memlock_runtime,
-                    constants.high_memlock_runtime,
-                )
-
-    def _create_ingress(self):
-        """Create or update Ingress with TLS certificate lookup."""
-        http_proxy_info = self.cluster_info.spec.get_http_proxy()
-        use_tls = http_proxy_info and not self.is_kind()
-        certificates = None
-        if use_tls:
-            certificates = {}
-            for proxy in http_proxy_info:
-                host_name = proxy["host-name"]
-                cert = self._find_certificate_for_host_name(host_name)
-                if cert:
-                    certificates[host_name] = cert
-                    if opts.o.debug:
-                        print(f"Using existing certificate for {host_name}: {cert}")
-
-        ingress = self.cluster_info.get_ingress(
-            use_tls=use_tls, certificates=certificates
-        )
-        if ingress:
-            if opts.o.debug:
-                print(f"Sending this ingress: {ingress}")
-            if not opts.o.dry_run:
-                ing_name = ingress.metadata.name
-                try:
-                    self.networking_api.create_namespaced_ingress(
-                        namespace=self.k8s_namespace, body=ingress
-                    )
-                    print(f"Created Ingress {ing_name}")
-                except ApiException as e:
-                    if e.status == 409:
-                        existing = self.networking_api.read_namespaced_ingress(
-                            name=ing_name, namespace=self.k8s_namespace
-                        )
-                        ingress.metadata.resource_version = (
-                            existing.metadata.resource_version
-                        )
-                        self.networking_api.replace_namespaced_ingress(
-                            name=ing_name,
-                            namespace=self.k8s_namespace,
-                            body=ingress,
-                        )
-                        print(f"Updated Ingress {ing_name}")
-                    else:
-                        raise
-        else:
-            if opts.o.debug:
-                print("No ingress configured")
-
-    def _create_nodeports(self):
-        """Create or update NodePort services."""
-        nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
-        for nodeport in nodeports:
-            if opts.o.debug:
-                print(f"Sending this nodeport: {nodeport}")
-            if not opts.o.dry_run:
-                np_name = nodeport.metadata.name
-                try:
-                    self.core_api.create_namespaced_service(
-                        namespace=self.k8s_namespace, body=nodeport
-                    )
-                except ApiException as e:
-                    if e.status == 409:
-                        existing = self.core_api.read_namespaced_service(
-                            name=np_name, namespace=self.k8s_namespace
-                        )
-                        nodeport.metadata.resource_version = (
-                            existing.metadata.resource_version
-                        )
-                        nodeport.spec.cluster_ip = existing.spec.cluster_ip
-                        self.core_api.replace_namespaced_service(
-                            name=np_name,
-                            namespace=self.k8s_namespace,
-                            body=nodeport,
-                        )
-                    else:
-                        raise
-
-    def up(self, detach, skip_cluster_management, services, image_overrides=None):
-        # Merge spec-level image overrides with CLI overrides
-        spec_overrides = self.cluster_info.spec.get("image-overrides", {})
-        if spec_overrides:
-            if image_overrides:
-                spec_overrides.update(image_overrides)  # CLI wins
-            image_overrides = spec_overrides
-        self.image_overrides = image_overrides
+    def up(self, detach, skip_cluster_management, services):
         self.skip_cluster_management = skip_cluster_management
         if not opts.o.dry_run:
-            self._setup_cluster()
+            if self.is_kind() and not self.skip_cluster_management:
+                # Create the kind cluster (or reuse existing one)
+                kind_config = str(
+                    self.deployment_dir.joinpath(constants.kind_config_filename)
+                )
+                actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
+                if actual_cluster != self.kind_cluster_name:
+                    # An existing cluster was found, use it instead
+                    self.kind_cluster_name = actual_cluster
+                # Only load locally-built images into kind
+                # Registry images (docker.io, ghcr.io, etc.) will be pulled by k8s
+                local_containers = self.deployment_context.stack.obj.get(
+                    "containers", []
+                )
+                if local_containers:
+                    # Filter image_set to only images matching local containers
+                    local_images = {
+                        img
+                        for img in self.cluster_info.image_set
+                        if any(c in img for c in local_containers)
+                    }
+                    if local_images:
+                        load_images_into_kind(self.kind_cluster_name, local_images)
+                # Note: if no local containers defined, all images come from registries
+            self.connect_api()
+            # Create deployment-specific namespace for resource isolation
+            self._ensure_namespace()
+            if self.is_kind() and not self.skip_cluster_management:
+                # Configure ingress controller (not installed by default in kind)
+                # Skip if already running (idempotent for shared cluster)
+                if not is_ingress_running():
+                    install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
+                    # Wait for ingress to start
+                    # (deployment provisioning will fail unless this is done)
+                    wait_for_ingress_in_kind()
+                # Create RuntimeClass if unlimited_memlock is enabled
+                if self.cluster_info.spec.get_unlimited_memlock():
+                    _create_runtime_class(
+                        constants.high_memlock_runtime,
+                        constants.high_memlock_runtime,
+                    )
+
         else:
             print("Dry run mode enabled, skipping k8s API connect")

         # Create registry secret if configured
         from stack_orchestrator.deploy.deployment_create import create_registry_secret

-        create_registry_secret(
-            self.cluster_info.spec, self.cluster_info.app_name, self.k8s_namespace
-        )
+        create_registry_secret(self.cluster_info.spec, self.cluster_info.app_name, self.k8s_namespace)

         self._create_volume_data()
-        self._create_external_services()
-        self._create_ca_certificates()
         self._create_deployment()
         self._create_jobs()
-        self._create_ingress()
-        self._create_nodeports()
+        http_proxy_info = self.cluster_info.spec.get_http_proxy()
+        # Note: we don't support tls for kind (enabling tls causes errors)
+        use_tls = http_proxy_info and not self.is_kind()
+        certificate = (
+            self._find_certificate_for_host_name(http_proxy_info[0]["host-name"])
+            if use_tls
+            else None
+        )
+        if opts.o.debug:
+            if certificate:
+                print(f"Using existing certificate: {certificate}")
+
+        ingress = self.cluster_info.get_ingress(
+            use_tls=use_tls, certificate=certificate
+        )
+        if ingress:
+            if opts.o.debug:
+                print(f"Sending this ingress: {ingress}")
+            if not opts.o.dry_run:
+                ingress_resp = self.networking_api.create_namespaced_ingress(
+                    namespace=self.k8s_namespace, body=ingress
+                )
+                if opts.o.debug:
+                    print("Ingress created:")
+                    print(f"{ingress_resp}")
+        else:
+            if opts.o.debug:
+                print("No ingress configured")
+
+        nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
+        for nodeport in nodeports:
+            if opts.o.debug:
+                print(f"Sending this nodeport: {nodeport}")
+            if not opts.o.dry_run:
+                nodeport_resp = self.core_api.create_namespaced_service(
+                    namespace=self.k8s_namespace, body=nodeport
+                )
+                if opts.o.debug:
+                    print("NodePort created:")
+                    print(f"{nodeport_resp}")

         # Call start() hooks — stacks can create additional k8s resources
         if self.deployment_context:
-            from stack_orchestrator.deploy.deployment_create import (
-                call_stack_deploy_start,
-            )
+            from stack_orchestrator.deploy.deployment_create import call_stack_deploy_start
             call_stack_deploy_start(self.deployment_context)

     def down(self, timeout, volumes, skip_cluster_management):
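Both versions of `up()` side-load only locally built images into kind and leave registry images for the kubelet to pull. That filter on its own (the function name is illustrative):

```python
def kind_local_images(image_set: set[str], local_containers: list[str]) -> set[str]:
    # Keep only images whose name mentions a locally built container;
    # everything else (docker.io, ghcr.io, ...) is pulled by k8s itself.
    return {img for img in image_set if any(c in img for c in local_containers)}
```
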
@@ -862,7 +565,9 @@ class K8sDeployer(Deployer):
         # PersistentVolumes are cluster-scoped (not namespaced), so delete by label
         if volumes:
             try:
-                pvs = self.core_api.list_persistent_volume(label_selector=app_label)
+                pvs = self.core_api.list_persistent_volume(
+                    label_selector=app_label
+                )
                 for pv in pvs.items:
                     if opts.o.debug:
                         print(f"Deleting PV: {pv.metadata.name}")
@@ -874,14 +579,14 @@ class K8sDeployer(Deployer):
                 if opts.o.debug:
                     print(f"Error listing PVs: {e}")

-        # Delete the namespace to ensure clean slate.
-        # Resources created by older laconic-so versions lack labels, so
-        # label-based deletion can't find them. Namespace deletion is the
-        # only reliable cleanup.
-        self._delete_namespace()
-        # Wait for namespace to finish terminating before returning,
-        # so that up() can recreate it immediately.
-        self._wait_for_namespace_gone()
+        # When namespace is explicitly set in the spec, it may be shared with
+        # other stacks — delete only this stack's resources by label.
+        # Otherwise the namespace is owned by this deployment, delete it entirely.
+        shared_namespace = self.deployment_context.spec.get_namespace() is not None
+        if shared_namespace:
+            self._delete_resources_by_label(app_label, volumes)
+        else:
+            self._delete_namespace()

         if self.is_kind() and not self.skip_cluster_management:
             # Destroy the kind cluster
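The teardown change in the hunk above reduces to a small decision, sketched here as a standalone function (illustrative only; the real code calls private K8sDeployer helpers, and the function name here is hypothetical):

```python
def choose_teardown(namespace_set_in_spec: bool) -> str:
    # A namespace named explicitly in spec.yml may be shared with other
    # stacks, so only objects carrying this deployment's app label are
    # deleted. A generated namespace is owned outright and dropped whole.
    if namespace_set_in_spec:
        return "delete-by-label"
    return "delete-namespace"
```

The label-based branch is what makes multiple stacks able to coexist in one explicitly named namespace.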
@@ -1006,18 +711,14 @@ class K8sDeployer(Deployer):

     def logs(self, services, tail, follow, stream):
         self.connect_api()
-        pods = pods_in_deployment(
-            self.core_api, self.cluster_info.app_name, namespace=self.k8s_namespace
-        )
+        pods = pods_in_deployment(self.core_api, self.cluster_info.app_name, namespace=self.k8s_namespace)
         if len(pods) > 1:
             print("Warning: more than one pod in the deployment")
         if len(pods) == 0:
             log_data = "******* Pods not running ********\n"
         else:
             k8s_pod_name = pods[0]
-            containers = containers_in_pod(
-                self.core_api, k8s_pod_name, namespace=self.k8s_namespace
-            )
+            containers = containers_in_pod(self.core_api, k8s_pod_name, namespace=self.k8s_namespace)
             # If pod not started, logs request below will throw an exception
             try:
                 log_data = ""
@@ -1040,49 +741,48 @@ class K8sDeployer(Deployer):
             print("No pods defined, skipping update")
             return
         self.connect_api()
-        ref_deployments = self.cluster_info.get_deployments()
-        for ref_deployment in ref_deployments:
-            if not ref_deployment or not ref_deployment.metadata:
-                continue
-            ref_name = ref_deployment.metadata.name
-            if not ref_name:
-                continue
-
-            deployment = cast(
-                client.V1Deployment,
-                self.apps_api.read_namespaced_deployment(
-                    name=ref_name, namespace=self.k8s_namespace
-                ),
-            )
-            if not deployment.spec or not deployment.spec.template:
-                continue
-            template_spec = deployment.spec.template.spec
-            if not template_spec or not template_spec.containers:
-                continue
-
-            ref_spec = ref_deployment.spec
-            if ref_spec and ref_spec.template and ref_spec.template.spec:
-                ref_containers = ref_spec.template.spec.containers
-                if ref_containers:
-                    new_env = ref_containers[0].env
-                    for container in template_spec.containers:
-                        old_env = container.env
-                        if old_env != new_env:
-                            container.env = new_env
-
-            template_meta = deployment.spec.template.metadata
-            if template_meta:
-                template_meta.annotations = {
-                    "kubectl.kubernetes.io/restartedAt": datetime.utcnow()
-                    .replace(tzinfo=timezone.utc)
-                    .isoformat()
-                }
-
-            self.apps_api.patch_namespaced_deployment(
-                name=ref_name,
-                namespace=self.k8s_namespace,
-                body=deployment,
-            )
+        ref_deployment = self.cluster_info.get_deployment()
+        if not ref_deployment or not ref_deployment.metadata:
+            return
+        ref_name = ref_deployment.metadata.name
+        if not ref_name:
+            return
+
+        deployment = cast(
+            client.V1Deployment,
+            self.apps_api.read_namespaced_deployment(
+                name=ref_name, namespace=self.k8s_namespace
+            ),
+        )
+        if not deployment.spec or not deployment.spec.template:
+            return
+        template_spec = deployment.spec.template.spec
+        if not template_spec or not template_spec.containers:
+            return
+
+        ref_spec = ref_deployment.spec
+        if ref_spec and ref_spec.template and ref_spec.template.spec:
+            ref_containers = ref_spec.template.spec.containers
+            if ref_containers:
+                new_env = ref_containers[0].env
+                for container in template_spec.containers:
+                    old_env = container.env
+                    if old_env != new_env:
+                        container.env = new_env
+
+        template_meta = deployment.spec.template.metadata
+        if template_meta:
+            template_meta.annotations = {
+                "kubectl.kubernetes.io/restartedAt": datetime.utcnow()
+                .replace(tzinfo=timezone.utc)
+                .isoformat()
+            }
+
+        self.apps_api.patch_namespaced_deployment(
+            name=ref_name,
+            namespace=self.k8s_namespace,
+            body=deployment,
+        )

     def run(
         self,
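The annotation patched in update() above is the same one `kubectl rollout restart` sets: bumping any pod-template annotation changes the template hash, which makes the Deployment controller roll the pods. A minimal sketch of just the annotation value (hypothetical helper name):

```python
from datetime import datetime, timezone

def restarted_at_annotation() -> dict:
    # A fresh timestamp guarantees the annotation value differs from the
    # previous one, so every call triggers a new rollout.
    ts = datetime.now(timezone.utc).isoformat()
    return {"kubectl.kubernetes.io/restartedAt": ts}
```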
@@ -1117,7 +817,9 @@ class K8sDeployer(Deployer):
         else:
             # Non-Helm path: create job from ClusterInfo
             self.connect_api()
-            jobs = self.cluster_info.get_jobs(image_pull_policy="Always")
+            jobs = self.cluster_info.get_jobs(
+                image_pull_policy=None if self.is_kind() else "Always"
+            )
             # Find the matching job by name
             target_name = f"{self.cluster_info.app_name}-job-{job_name}"
             matched_job = None
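The image_pull_policy change above has a one-line rationale: Kind clusters get images side-loaded via `kind load docker-image`, so forcing `Always` would attempt a registry pull that cannot succeed for local-only images. A sketch of the rule (hypothetical helper name):

```python
from typing import Optional

def job_image_pull_policy(is_kind: bool) -> Optional[str]:
    # On Kind, leave the policy unset; Kubernetes then defaults to
    # IfNotPresent for non-:latest tags and uses the side-loaded image.
    return None if is_kind else "Always"
```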
@@ -393,9 +393,7 @@ def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
         raise DeployerException(f"kind load docker-image failed: {result}")


-def pods_in_deployment(
-    core_api: client.CoreV1Api, deployment_name: str, namespace: str = "default"
-):
+def pods_in_deployment(core_api: client.CoreV1Api, deployment_name: str, namespace: str = "default"):
     pods = []
     pod_response = core_api.list_namespaced_pod(
         namespace=namespace, label_selector=f"app={deployment_name}"
@@ -408,9 +406,7 @@ def pods_in_deployment(
     return pods


-def containers_in_pod(
-    core_api: client.CoreV1Api, pod_name: str, namespace: str = "default"
-) -> List[str]:
+def containers_in_pod(core_api: client.CoreV1Api, pod_name: str, namespace: str = "default") -> List[str]:
     containers: List[str] = []
     pod_response = cast(
         client.V1Pod, core_api.read_namespaced_pod(pod_name, namespace=namespace)
@@ -444,20 +440,7 @@ def named_volumes_from_pod_files(parsed_pod_files):
     return named_volumes


-def get_kind_pv_bind_mount_path(
-    volume_name: str,
-    kind_mount_root: Optional[str] = None,
-    host_path: Optional[str] = None,
-):
-    """Get the path inside the Kind node for a PV.
-
-    When kind-mount-root is set and the volume's host path is under
-    that root, return /mnt/{relative_path} so it resolves through the
-    single root extraMount. Otherwise fall back to /mnt/{volume_name}.
-    """
-    if kind_mount_root and host_path and host_path.startswith(kind_mount_root):
-        rel = os.path.relpath(host_path, kind_mount_root)
-        return f"/mnt/{rel}"
+def get_kind_pv_bind_mount_path(volume_name: str):
     return f"/mnt/{volume_name}"

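The path mapping removed in the hunk above is a pure function and can be exercised standalone; a minimal sketch of the same logic:

```python
import os
from typing import Optional

def kind_pv_path(volume_name: str,
                 kind_mount_root: Optional[str] = None,
                 host_path: Optional[str] = None) -> str:
    # Host paths under kind-mount-root resolve through the single /mnt
    # extraMount; anything else falls back to a per-volume mount point.
    if kind_mount_root and host_path and host_path.startswith(kind_mount_root):
        return f"/mnt/{os.path.relpath(host_path, kind_mount_root)}"
    return f"/mnt/{volume_name}"
```

For example, with root `/srv/volumes`, a volume at `/srv/volumes/pg/data` maps to `/mnt/pg/data`, while a volume outside the root keeps its own `/mnt/{volume_name}` mount.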
@@ -580,7 +563,6 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
     volume_definitions = []
     volume_host_path_map = _get_host_paths_for_volumes(deployment_context)
     seen_host_path_mounts = set()  # Track to avoid duplicate mounts
-    kind_mount_root = deployment_context.spec.get_kind_mount_root()

     # Cluster state backup for offline data recovery (unique per deployment)
     # etcd contains all k8s state; PKI certs needed to decrypt etcd offline
@@ -601,16 +583,6 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
         f" - hostPath: {pki_host_path}\n" f"   containerPath: /etc/kubernetes/pki\n"
     )

-    # When kind-mount-root is set, emit a single extraMount for the root.
-    # Individual volumes whose host path starts with the root are covered
-    # by this single mount and don't need their own extraMount entries.
-    mount_root_emitted = False
-    if kind_mount_root:
-        volume_definitions.append(
-            f" - hostPath: {kind_mount_root}\n" f"   containerPath: /mnt\n"
-        )
-        mount_root_emitted = True
-
     # Note these paths are relative to the location of the pod files (at present)
     # So we need to fix up to make them correct and absolute because kind assumes
     # relative to the cwd.
@@ -670,12 +642,6 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
                         volume_host_path_map[volume_name],
                         deployment_dir,
                     )
-                    # Skip individual extraMount if covered
-                    # by the kind-mount-root single mount
-                    if mount_root_emitted and str(host_path).startswith(
-                        kind_mount_root
-                    ):
-                        continue
                     container_path = get_kind_pv_bind_mount_path(
                         volume_name
                     )
@@ -1012,7 +978,7 @@ def translate_sidecar_service_names(


 def envs_from_environment_variables_map(
-    map: Mapping[str, str],
+    map: Mapping[str, str]
 ) -> List[client.V1EnvVar]:
     result = []
     for env_var, env_val in map.items():
@@ -98,17 +98,16 @@ class Spec:
     def get_image_registry(self):
         return self.obj.get(constants.image_registry_key)

-    def get_credentials_files(self) -> typing.List[str]:
-        """Returns list of credential file paths to append to config.env."""
-        return self.obj.get("credentials-files", [])
-
     def get_image_registry_config(self) -> typing.Optional[typing.Dict]:
         """Returns registry auth config: {server, username, token-env}.

         Used for private container registries like GHCR. The token-env field
         specifies an environment variable containing the API token/PAT.
+
+        Note: Uses 'registry-credentials' key to avoid collision with
+        'image-registry' key which is for pushing images.
         """
-        return self.obj.get("image-pull-secret")
+        return self.obj.get("registry-credentials")

     def get_volumes(self):
         return self.obj.get(constants.volumes_key, {})
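The renamed 'registry-credentials' key above feeds the imagePullSecret flow. Resolving it might look like the following sketch (hypothetical helper; the env mapping is injected as a parameter only to keep the example testable):

```python
from typing import Mapping, Optional

def resolve_registry_auth(spec: dict, env: Mapping[str, str]) -> Optional[dict]:
    cfg = spec.get("registry-credentials")
    if not cfg:
        return None
    # token-env names an environment variable holding the PAT/API token,
    # so the secret itself never lands in spec.yml.
    return {
        "server": cfg["server"],
        "username": cfg["username"],
        "token": env.get(cfg["token-env"]),
    }
```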
@@ -171,13 +170,15 @@ class Spec:
         Returns the per-volume Resources if found, otherwise None.
         The caller should fall back to get_volume_resources() then the default.
         """
-        vol_section = self.obj.get(constants.resources_key, {}).get(
-            constants.volumes_key, {}
-        )
+        vol_section = (
+            self.obj.get(constants.resources_key, {}).get(constants.volumes_key, {})
+        )
         if volume_name not in vol_section:
             return None
         entry = vol_section[volume_name]
-        if isinstance(entry, dict) and ("reservations" in entry or "limits" in entry):
+        if isinstance(entry, dict) and (
+            "reservations" in entry or "limits" in entry
+        ):
             return Resources(entry)
         return None

@@ -264,46 +265,5 @@ class Spec:
     def is_kind_deployment(self):
         return self.get_deployment_type() in [constants.k8s_kind_deploy_type]

-    def get_kind_mount_root(self) -> typing.Optional[str]:
-        """Return kind-mount-root path or None.
-
-        When set, laconic-so emits a single Kind extraMount mapping this
-        host path to /mnt inside the Kind node. Volumes with host paths
-        under this root resolve to /mnt/{relative_path} and don't need
-        individual extraMounts. This allows adding new volumes without
-        recreating the Kind cluster.
-        """
-        return self.obj.get(constants.kind_mount_root_key)
-
-    def get_maintenance_service(self) -> typing.Optional[str]:
-        """Return maintenance-service value (e.g. 'dumpster-maintenance:8000') or None.
-
-        When set, the restart command swaps Ingress backends to this service
-        during the main pod Recreate, so users see a branded maintenance page
-        instead of a bare 502.
-        """
-        return self.obj.get("maintenance-service")
-
-    def get_external_services(self) -> typing.Dict[str, typing.Dict]:
-        """Return external-services config from spec.
-
-        Each entry maps a service name to its routing config:
-        - host mode: {host: "example.com", port: 443}
-          → ExternalName k8s Service (DNS CNAME)
-        - selector mode: {selector: {app: "foo"}, namespace: "ns", port: 443}
-          → Headless Service + Endpoints (cross-namespace routing to mock pod)
-        """
-        return self.obj.get(constants.external_services_key, {})
-
-    def get_ca_certificates(self) -> typing.List[str]:
-        """Return list of CA certificate file paths to trust.
-
-        Used in testing specs to inject mkcert root CAs so containers
-        trust TLS certs on mock services. Files are mounted into all
-        containers at /etc/ssl/certs/ and NODE_EXTRA_CA_CERTS is set.
-        Production specs omit this key entirely.
-        """
-        return self.obj.get(constants.ca_certificates_key, [])
-
     def is_docker_deployment(self):
         return self.get_deployment_type() in [constants.compose_deploy_type]
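The removed get_external_services docstring distinguishes two routing modes; the dispatch between them is a simple key check, sketched here with illustrative return values:

```python
def external_service_mode(entry: dict) -> str:
    # host mode     -> ExternalName Service (DNS CNAME)
    # selector mode -> headless Service + Endpoints (cross-namespace routing)
    if "host" in entry:
        return "external-name"
    if "selector" in entry:
        return "headless-endpoints"
    raise ValueError("external-services entry needs 'host' or 'selector'")
```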
@@ -141,35 +141,28 @@ echo "$test_config_file_changed_content" > "$test_config_file"
 test_unchanged_config="$test_deployment_dir/config/test/script.sh"

 # Modify spec file to simulate an update
-sed -i.bak 's/CERC_TEST_PARAM_3: FAST/CERC_TEST_PARAM_3: FASTER/' $test_deployment_spec
+sed -i.bak 's/CERC_TEST_PARAM_3:/CERC_TEST_PARAM_3: FASTER/' $test_deployment_spec

-# Save config.env before update (to verify it gets backed up)
+# Create/modify config.env to test it isn't overwritten during sync
 config_env_file="$test_deployment_dir/config.env"
+config_env_persistent_content="PERSISTENT_VALUE=should-not-be-overwritten-$(date +%s)"
+echo "$config_env_persistent_content" >> "$config_env_file"
 original_config_env_content=$(<$config_env_file)

 # Run sync to update deployment files without destroying data
 $TEST_TARGET_SO --stack test deploy create --spec-file $test_deployment_spec --deployment-dir $test_deployment_dir --update

-# Verify config.env was regenerated from spec (reflects the FASTER change)
+# Verify config.env was not overwritten
 synced_config_env_content=$(<$config_env_file)
-if [[ "$synced_config_env_content" == *"CERC_TEST_PARAM_3=FASTER"* ]]; then
-    echo "deployment update test: config.env regenerated from spec - passed"
+if [ "$synced_config_env_content" == "$original_config_env_content" ]; then
+    echo "deployment update test: config.env preserved - passed"
 else
-    echo "deployment update test: config.env not regenerated - FAILED"
-    echo "Expected CERC_TEST_PARAM_3=FASTER in config.env"
+    echo "deployment update test: config.env was overwritten - FAILED"
+    echo "Expected: $original_config_env_content"
     echo "Got: $synced_config_env_content"
     exit 1
 fi

-# Verify old config.env was backed up
-config_env_backup="${config_env_file}.bak"
-if [ -f "$config_env_backup" ]; then
-    echo "deployment update test: config.env backed up - passed"
-else
-    echo "deployment update test: config.env backup not created - FAILED"
-    exit 1
-fi
-
 # Verify the spec file was updated in deployment dir
 updated_deployed_spec=$(<$test_deployment_dir/spec.yml)
 if [[ "$updated_deployed_spec" == *"FASTER"* ]]; then