Compare commits

..

35 Commits

Author SHA1 Message Date
4a1b5d86fd Merge pull request 'fix(k8s): translate service names to localhost for sidecar containers' (#989) from fix-sidecar-localhost into main
Some checks failed
Lint Checks / Run linter (push) Successful in 16s
Publish / Build and publish (push) Successful in 29s
Deploy Test / Run deploy test suite (push) Successful in 2m10s
Webapp Test / Run webapp test suite (push) Successful in 3m51s
Smoke Test / Run basic test suite (push) Successful in 3m51s
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Failing after 1m46s
Database Test / Run database hosting test on kind/k8s (push) Failing after 2m6s
Container Registry Test / Run container registry hosting test on kind/k8s (push) Failing after 2m36s
External Stack Test / Run external stack test suite (push) Failing after 1m59s
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Successful in 20m41s
Reviewed-on: #989
2026-02-03 23:13:27 +00:00
A. F. Dudley
019225ca18 fix(k8s): translate service names to localhost for sidecar containers
Some checks failed
Lint Checks / Run linter (push) Failing after 3s
Lint Checks / Run linter (pull_request) Failing after 4s
Deploy Test / Run deploy test suite (pull_request) Failing after 4s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 5s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 4s
Webapp Test / Run webapp test suite (pull_request) Failing after 5s
Smoke Test / Run basic test suite (pull_request) Failing after 5s
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.

This fix adds translate_sidecar_service_names() which replaces docker-compose
service name references with 'localhost' in environment variable values for
containers that share the same pod.

Fixes an issue where multi-container pods fail because one container tries to
connect to a sibling using the compose service name instead of localhost.
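
A condensed sketch of the translation (mirroring the helpers.py change shown further down in this diff; the sample env value is hypothetical):

```python
import re
from typing import Dict, List

def translate_sidecar_service_names(envs: Dict[str, str], siblings: List[str]) -> Dict[str, str]:
    """Rewrite references to sibling compose services as localhost."""
    result = {}
    for key, val in envs.items():
        new_val = str(val)
        for name in siblings:
            # Matches 'db' or 'db:5432', including inside URLs like postgres://user:pass@db:5432/app
            pattern = rf"\b{re.escape(name)}(:\d+)?\b"
            new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
        result[key] = new_val
    return result

print(translate_sidecar_service_names(
    {"DATABASE_URL": "postgres://user:pass@db:5432/app"}, ["db"]
))
# -> {'DATABASE_URL': 'postgres://user:pass@localhost:5432/app'}
```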

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:10:32 -05:00
0296da6f64 Merge pull request 'feat(k8s): namespace-per-deployment for resource isolation and cleanup' (#988) from feat-namespace-per-deployment into main
Some checks failed
Lint Checks / Run linter (push) Failing after 5s
Deploy Test / Run deploy test suite (push) Failing after 5s
Publish / Build and publish (push) Failing after 6s
Webapp Test / Run webapp test suite (push) Failing after 5s
Smoke Test / Run basic test suite (push) Failing after 5s
Reviewed-on: #988
2026-02-03 23:09:16 +00:00
A. F. Dudley
d913926144 feat(k8s): namespace-per-deployment for resource isolation and cleanup
Some checks failed
Lint Checks / Run linter (push) Failing after 4s
Deploy Test / Run deploy test suite (pull_request) Failing after 5s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 5s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 5s
Webapp Test / Run webapp test suite (pull_request) Failing after 5s
Smoke Test / Run basic test suite (pull_request) Failing after 4s
Lint Checks / Run linter (pull_request) Failing after 3s
Each deployment now gets its own Kubernetes namespace (laconic-{deployment_id}).
This provides:
- Resource isolation between deployments on the same cluster
- Simplified cleanup: deleting the namespace cascades to all namespaced resources
- No orphaned resources possible when deployment IDs change

Changes:
- Set k8s_namespace based on deployment name in __init__
- Add _ensure_namespace() to create namespace before deploying resources
- Add _delete_namespace() for cleanup
- Simplify down() to just delete PVs (cluster-scoped) and the namespace
- Fix hardcoded "default" namespace in logs function
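
In kubernetes-client terms the lifecycle is roughly the following (a trimmed sketch of the _ensure_namespace()/_delete_namespace() additions shown in the deploy_k8s.py diff below; the deployment id is hypothetical):

```python
from kubernetes import client, config
from kubernetes.client.exceptions import ApiException

def ensure_namespace(core_api: client.CoreV1Api, namespace: str) -> None:
    try:
        core_api.read_namespace(name=namespace)  # already exists, nothing to do
    except ApiException as e:
        if e.status != 404:
            raise
        core_api.create_namespace(
            body=client.V1Namespace(metadata=client.V1ObjectMeta(name=namespace))
        )

def delete_namespace(core_api: client.CoreV1Api, namespace: str) -> None:
    # Deleting the namespace cascades to every namespaced resource inside it
    core_api.delete_namespace(name=namespace)

config.load_kube_config()
api = client.CoreV1Api()
ensure_namespace(api, "laconic-my-deployment")  # hypothetical deployment id
```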

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 18:04:52 -05:00
b41e0cb2f5 Merge pull request 'fix(k8s): query resources by label in down() for proper cleanup' (#987) from fix-down-cleanup-by-label into main
Some checks failed
Lint Checks / Run linter (push) Failing after 17s
Publish / Build and publish (push) Successful in 27s
Deploy Test / Run deploy test suite (push) Successful in 2m13s
Smoke Test / Run basic test suite (push) Successful in 3m54s
Webapp Test / Run webapp test suite (push) Successful in 4m13s
Reviewed-on: #987
2026-02-03 22:57:52 +00:00
A. F. Dudley
47d3d10ead fix(k8s): query resources by label in down() for proper cleanup
Some checks failed
Lint Checks / Run linter (push) Failing after 14s
Lint Checks / Run linter (pull_request) Failing after 15s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 2m3s
Deploy Test / Run deploy test suite (pull_request) Successful in 2m10s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 2m51s
Webapp Test / Run webapp test suite (pull_request) Successful in 4m0s
Smoke Test / Run basic test suite (pull_request) Successful in 3m56s
Previously, down() generated resource names from the deployment config
and deleted those specific names. This failed to clean up orphaned
resources when deployment IDs changed (e.g., after force_redeploy).

Changes:
- Add 'app' label to all resources: Ingress, Service, NodePort, ConfigMap, PV
- Refactor down() to query K8s by label selector instead of generating names
- This ensures all resources for a deployment are cleaned up, even if
  the deployment config has changed or been deleted
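
A minimal sketch of the label-selector cleanup for the cluster-scoped PVs (namespaced resources follow the same pattern; the app name here is hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()
core_api = client.CoreV1Api()
app_name = "laconic-example"  # hypothetical deployment/app name

# Query by label instead of regenerating resource names from the deployment config
pvs = core_api.list_persistent_volume(label_selector=f"app={app_name}")
for pv in pvs.items:
    print(f"Deleting PV: {pv.metadata.name}")
    core_api.delete_persistent_volume(name=pv.metadata.name)
```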

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:55:14 -05:00
21d47908cc Merge pull request 'feat(k8s): ACME email fix, etcd persistence, volume paths' (#986) from fix-caddy-acme-email-rbac into main
Some checks failed
Lint Checks / Run linter (push) Failing after 16s
Publish / Build and publish (push) Successful in 29s
Deploy Test / Run deploy test suite (push) Successful in 2m10s
Webapp Test / Run webapp test suite (push) Successful in 3m46s
Smoke Test / Run basic test suite (push) Successful in 3m47s
Reviewed-on: #986
2026-02-03 22:31:47 +00:00
A. F. Dudley
f70e87b848 Add etcd + PKI extraMounts for offline data recovery
Some checks failed
Lint Checks / Run linter (push) Failing after 13s
Lint Checks / Run linter (pull_request) Failing after 16s
Deploy Test / Run deploy test suite (pull_request) Successful in 2m18s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 2m43s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 3m31s
Smoke Test / Run basic test suite (pull_request) Successful in 4m8s
Webapp Test / Run webapp test suite (pull_request) Successful in 4m21s
Mount /var/lib/etcd and /etc/kubernetes/pki to the host filesystem
so cluster state is preserved for offline recovery. Each deployment
gets its own backup directory keyed by deployment ID.

Directory structure:
  data/cluster-backups/{deployment_id}/etcd/
  data/cluster-backups/{deployment_id}/pki/

This enables extracting secrets from etcd backups using etcdctl
with the preserved PKI certificates.
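
Roughly how the extra kind mounts are generated (a sketch mirroring the _generate_kind_mounts() addition in the helpers.py diff below; the paths are hypothetical):

```python
from pathlib import Path

def backup_mounts(deployment_dir: Path, deployment_id: str) -> list:
    backup_dir = deployment_dir / "data" / "cluster-backups" / deployment_id
    return [
        f"  - hostPath: {backup_dir / 'etcd'}\n    containerPath: /var/lib/etcd\n",
        f"  - hostPath: {backup_dir / 'pki'}\n    containerPath: /etc/kubernetes/pki\n",
    ]

print("".join(backup_mounts(Path("/srv/deployments/rpc"), "abc123")))  # hypothetical values
```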

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley
5bc6c978ac feat(k8s): support acme-email config for Caddy ingress
Adds support for configuring ACME email for Let's Encrypt certificates
in kind deployments. The email can be specified in the spec under
network.acme-email and will be used to configure the Caddy ingress
controller ConfigMap.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:19:52 -05:00
A. F. Dudley
ee59918082 Allow relative volume paths for k8s-kind deployments
For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to
$DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config
generation. This provides persistence on the Docker host that survives
cluster restarts.

Previously, validation threw an exception before paths could be resolved,
making it impossible to use relative paths for persistent storage.

Changes:
- deployment_create.py: Skip relative path check for k8s-kind
- cluster_info.py: Allow relative paths to reach PV generation
- docs/deployment_patterns.md: Document volume persistence patterns
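
Illustrative sketch of the resolution (the real helper is _make_absolute_host_path(); its exact signature may differ):

```python
from pathlib import Path

def make_absolute_host_path(path: Path, deployment_dir: Path) -> Path:
    if path.is_absolute():
        return path
    # Relative volume paths are anchored at the deployment directory
    return (deployment_dir / path).resolve()

# Hypothetical deployment directory:
print(make_absolute_host_path(Path("./data/rpc-config"), Path("/srv/deployments/rpc")))
# -> /srv/deployments/rpc/data/rpc-config
```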

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:17:44 -05:00
A. F. Dudley
581ceaea94 docs: Add cluster and volume management section
Document that:
- Volumes persist across cluster deletion by design
- Only use --delete-volumes when explicitly requested
- Multiple deployments share one kind cluster
- Use --skip-cluster-management to stop single deployment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
7cecf2caa6 Fix Caddy ACME email race condition by templating YAML
Previously, install_ingress_for_kind() applied the YAML (which starts
the Caddy pod with email: ""), then patched the ConfigMap afterward.
The pod had already read the empty email and Caddy doesn't hot-reload.

Now template the email into the YAML before applying, so the pod starts
with the correct email from the beginning.
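
Condensed sketch of the new flow (see install_ingress_for_kind() in the helpers.py diff below; the manifest path is a hypothetical example):

```python
import yaml
from kubernetes import client, config, utils

def install_ingress(manifest_path: str, acme_email: str) -> None:
    with open(manifest_path) as f:
        content = f.read()
    if acme_email:
        # The pod reads this value once at startup, so set it before applying
        content = content.replace('email: ""', f'email: "{acme_email}"')
    config.load_kube_config()
    utils.create_from_yaml(client.ApiClient(), yaml_objects=list(yaml.safe_load_all(content)))

install_ingress("caddy-ingress.yaml", "admin@example.com")  # hypothetical file name
```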

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
cb6fdb77a6 Rename image-registry to registry-credentials to avoid collision
The existing 'image-registry' key is used for pushing images to a remote
registry (URL string). Rename the new auth config to 'registry-credentials'
to avoid collision.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
73ba13aaa5 Add private registry authentication support
Add ability to configure private container registry credentials in spec.yml
for deployments using images from registries like GHCR.

- Add get_image_registry_config() to spec.py for parsing image-registry config
- Add create_registry_secret() to create K8s docker-registry secrets
- Update cluster_info.py to use dynamic {deployment}-registry secret names
- Update deploy_k8s.py to create registry secret before deployment
- Document feature in deployment_patterns.md

The token-env pattern keeps credentials out of git - the spec references an
environment variable name, and the actual token is passed at runtime.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
d82b3fb881 Only load locally-built images into kind, auto-detect ingress
- Check stack.yml containers: field to determine which images are local builds
- Only load local images via kind load; let k8s pull registry images directly
- Add is_ingress_running() to skip ingress installation if already running
- Fixes deployment failures when public registry images aren't in local Docker
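
Sketch of the image filtering described above (mirrors the up() change in the deploy_k8s.py diff below; example values are hypothetical):

```python
from typing import List, Set

def select_local_images(image_set: Set[str], local_containers: List[str]) -> Set[str]:
    # Only images corresponding to containers built locally from the stack are
    # loaded into kind; everything else is pulled from its registry by k8s.
    return {img for img in image_set if any(c in img for c in local_containers)}

images = {"cerc/my-app:local", "postgres:15"}
print(select_local_images(images, ["cerc/my-app"]))  # -> {'cerc/my-app:local'}
```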

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
3bc7832d8c Fix deployment name extraction from path
When stack: field in spec.yml contains a path (e.g., stack_orchestrator/data/stacks/name),
extract just the final name component for K8s secret naming. K8s resource names must
be valid RFC 1123 subdomains and cannot contain slashes.
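
The extraction in miniature (mirrors the _write_deployment_files() change in the deployment_create.py diff below):

```python
from pathlib import Path

stack = "stack_orchestrator/data/stacks/name"
deployment_name = Path(stack).name.replace("_", "-")
print(deployment_name)                 # -> "name"
print(f"{deployment_name}-registry")   # valid RFC 1123 secret name
```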

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
a75138093b Add setup-repositories to key files list
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
1128c95969 Split documentation: README for users, CLAUDE.md for agents
README.md: deployment types, external stacks, commands, spec.yml reference
CLAUDE.md: implementation details, code locations, codebase navigation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:26 -05:00
A. F. Dudley
d292e7c48d Add k8s-kind architecture documentation to CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:16:25 -05:00
A. F. Dudley
b057969ddd Clarify create_cluster docstring: one cluster per host by design
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
ca090d2cd5 Add $generate:type:length$ token support for K8s secrets
- Add GENERATE_TOKEN_PATTERN to detect $generate:hex:N$ and $generate:base64:N$ tokens
- Add _generate_and_store_secrets() to create K8s Secrets from spec.yml config
- Modify _write_config_file() to separate secrets from regular config
- Add env_from with secretRef to container spec in cluster_info.py
- Secrets are injected directly into containers via K8s native mechanism

This enables declarative secret generation in spec.yml:
  config:
    SESSION_SECRET: $generate:hex:32$
    DB_PASSWORD: $generate:hex:16$
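
Sketch of the token handling (mirrors GENERATE_TOKEN_PATTERN and _generate_and_store_secrets() in the deployment_create.py diff below; the base64 branch is omitted):

```python
import re
from secrets import token_hex

GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")

def generate_secret(value: str):
    match = GENERATE_TOKEN_PATTERN.search(value)
    if not match:
        return None  # plain config value; written to config.env as usual
    _kind, length = match.group(1), int(match.group(2))
    return token_hex(length)  # 'hex' shown; the real code also handles 'base64'

print(generate_secret("$generate:hex:32$"))  # 64 hex characters, stored in a K8s Secret
```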

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
2d3721efa4 Add cluster reuse for multi-stack k8s-kind deployments
When deploying a second stack to k8s-kind, automatically reuse an existing
kind cluster instead of trying to create a new one (which would fail due
to port 80/443 conflicts).

Changes:
- helpers.py: create_cluster() now checks for existing cluster first
- deploy_k8s.py: up() captures returned cluster name and updates self

This enables deploying multiple stacks (e.g., gorbagana-rpc + trashscan-explorer)
to the same kind cluster.
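
A minimal sketch of the reuse check, assuming the existing-cluster lookup shells out to `kind get clusters` (the real helpers are get_kind_cluster() and create_cluster() in helpers.py):

```python
import subprocess

def create_or_reuse_cluster(name: str, config_file: str) -> str:
    # Assumption: enumerate clusters via the kind CLI
    existing = subprocess.run(
        ["kind", "get", "clusters"], capture_output=True, text=True
    ).stdout.split()
    if existing:
        print(f"Using existing cluster: {existing[0]}")
        return existing[0]
    print(f"Creating new cluster: {name}")
    subprocess.run(
        ["kind", "create", "cluster", "--name", name, "--config", config_file],
        check=True,
    )
    return name
```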

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
4408725b08 Fix repo root path calculation (4 parents from stack path) 2026-02-03 17:15:19 -05:00
A. F. Dudley
22d64f1e97 Add --spec-file option to restart and auto-detect GitOps spec
- Add --spec-file option to specify spec location in repo
- Auto-detect deployment/spec.yml in repo as GitOps location
- Fall back to deployment dir if no repo spec found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
14258500bc Fix restart command for GitOps deployments
- Remove init_operation() from restart - don't regenerate spec from
  commands.py defaults, use existing git-tracked spec.yml instead
- Add docs/deployment_patterns.md documenting GitOps workflow
- Add pre-commit rule to CLAUDE.md
- Fix line length issues in helpers.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
3fbd854b8c Use docker for etcd existence check (root-owned dir)
The etcd directory is root-owned, so shell test -f fails.
Use docker with volume mount to check file existence.
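
Illustration of the approach (not the exact helper; the alpine image and paths are assumptions for the example):

```python
import subprocess

def root_owned_file_exists(host_dir: str, relative_path: str) -> bool:
    # Run the check inside a container so no sudo is needed on the host;
    # 'alpine' is just an example image, not necessarily what the helper uses.
    result = subprocess.run(
        ["docker", "run", "--rm", "-v", f"{host_dir}:/target",
         "alpine", "test", "-f", f"/target/{relative_path}"]
    )
    return result.returncode == 0

print(root_owned_file_exists("/srv/backups/etcd", "member/snap/db"))  # hypothetical path
```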

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
e2d3c44321 Keep timestamped backup of etcd forever
Create member.backup-YYYYMMDD-HHMMSS before cleaning.
Each cluster recreation creates a new backup, preserving history.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
720e01fc75 Preserve original etcd backup until restore is verified
Move original to .bak, move new into place, then delete bak.
If anything fails before the swap, original remains intact.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
5b06cffe17 Use whitelist approach for etcd cleanup
Instead of trying to delete specific stale resources (blacklist),
keep only the valuable data (caddy TLS certs) and delete everything
else. This is more robust as we don't need to maintain a list of
all possible stale resources.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
8948f5bfec Fix etcd cleanup to use docker for root-owned files
Use docker containers with volume mounts to handle all file
operations on root-owned etcd directories, avoiding the need
for sudo on the host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
675ee87544 Clear stale CNI resources from persisted etcd before cluster creation
When etcd is persisted (for certificate backup) and a cluster is
recreated, kind tries to install CNI (kindnet) fresh but the
persisted etcd already has those resources, causing 'AlreadyExists'
errors and cluster creation failure.

This fix:
- Detects etcd mount path from kind config
- Before cluster creation, clears stale CNI resources (kindnet, coredns)
- Preserves certificate and other important data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
8d3191e4fd Fix Caddy ingress ACME email and RBAC issues
- Add acme_email_key constant for spec.yml parsing
- Add get_acme_email() method to Spec class
- Modify install_ingress_for_kind() to patch ConfigMap with email
- Pass acme-email from spec to ingress installation
- Add 'delete' verb to leases RBAC for certificate lock cleanup

The acme-email field in spec.yml was previously ignored, causing
Let's Encrypt to fail with "unable to parse email address".
The missing delete permission on leases caused lock cleanup failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
c197406cc7 feat(deploy): add deployment restart command
Add `laconic-so deployment restart` command that:
- Pulls latest code from stack git repository
- Regenerates spec.yml from stack's commands.py
- Verifies DNS if hostname changed (with --force to skip)
- Syncs deployment directory preserving cluster ID and data
- Stops and restarts deployment with --skip-cluster-management

Also stores stack-source path in deployment.yml during create
for automatic stack location on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
A. F. Dudley
4713107546 docs(CLAUDE.md): add external stacks preferred guideline
Document that external stack pattern should be used when creating new
stacks for any reason, with directory structure and usage examples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:15:19 -05:00
88dccdfb7c Merge pull request 'fix(deploy): merge volumes from stack init() instead of overwriting' (#985) from fix-init-volumes-merge into main
Some checks failed
Lint Checks / Run linter (push) Successful in 16s
Publish / Build and publish (push) Successful in 29s
Deploy Test / Run deploy test suite (push) Successful in 2m5s
Smoke Test / Run basic test suite (push) Successful in 3m46s
Webapp Test / Run webapp test suite (push) Successful in 3m46s
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Successful in 17m15s
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Successful in 2m54s
Database Test / Run database hosting test on kind/k8s (push) Successful in 4m3s
Container Registry Test / Run container registry hosting test on kind/k8s (push) Failing after 3m25s
External Stack Test / Run external stack test suite (push) Failing after 2m18s
Reviewed-on: #985
2026-01-31 23:39:38 +00:00
8 changed files with 654 additions and 128 deletions

View File

@@ -44,6 +44,76 @@ This project follows principles inspired by literate programming, where developm
This approach treats the human-AI collaboration as a form of **conversational literate programming** where understanding emerges through dialogue before code implementation.
## External Stacks Preferred
When creating new stacks for any reason, **use the external stack pattern** rather than adding stacks directly to this repository.
External stacks follow this structure:
```
my-stack/
└── stack-orchestrator/
    ├── stacks/
    │   └── my-stack/
    │       ├── stack.yml
    │       └── README.md
    ├── compose/
    │   └── docker-compose-my-stack.yml
    └── config/
        └── my-stack/
            └── (config files)
```
### Usage
```bash
# Fetch external stack
laconic-so fetch-stack github.com/org/my-stack
# Use external stack
STACK_PATH=~/cerc/my-stack/stack-orchestrator/stacks/my-stack
laconic-so --stack $STACK_PATH deploy init --output spec.yml
laconic-so --stack $STACK_PATH deploy create --spec-file spec.yml --deployment-dir deployment
laconic-so deployment --dir deployment start
```
### Examples
- `zenith-karma-stack` - Karma watcher deployment
- `urbit-stack` - Fake Urbit ship for testing
- `zenith-desk-stack` - Desk deployment stack
## Architecture: k8s-kind Deployments
### One Cluster Per Host
One Kind cluster per host by design. Never request or expect separate clusters.
- `create_cluster()` in `helpers.py` reuses any existing cluster
- `cluster-id` in deployment.yml is an identifier, not a cluster request
- All deployments share: ingress controller, etcd, certificates
### Stack Resolution
- External stacks detected via `Path(stack).exists()` in `util.py`
- Config/compose resolution: external path first, then internal fallback
- External path structure: `stack_orchestrator/data/stacks/<name>/stack.yml`
### Secret Generation Implementation
- `GENERATE_TOKEN_PATTERN` in `deployment_create.py` matches `$generate:type:length$`
- `_generate_and_store_secrets()` creates K8s Secret
- `cluster_info.py` adds `envFrom` with `secretRef` to containers
- Non-secret config written to `config.env`
### Repository Cloning
`setup-repositories --git-ssh` clones repos defined in stack.yml's `repos:` field. Requires SSH agent.
### Key Files (for codebase navigation)
- `repos/setup_repositories.py`: `setup-repositories` command (git clone)
- `deployment_create.py`: `deploy create` command, secret generation
- `deployment.py`: `deployment start/stop/restart` commands
- `deploy_k8s.py`: K8s deployer, cluster management calls
- `helpers.py`: `create_cluster()`, etcd cleanup, kind operations
- `cluster_info.py`: K8s resource generation (Deployment, Service, Ingress)
## Insights and Observations
### Design Principles

View File

@@ -71,6 +71,59 @@ The various [stacks](/stack_orchestrator/data/stacks) each contain instructions
- [laconicd with console and CLI](stack_orchestrator/data/stacks/fixturenet-laconic-loaded)
- [kubo (IPFS)](stack_orchestrator/data/stacks/kubo)
## Deployment Types
- **compose**: Docker Compose on local machine
- **k8s**: External Kubernetes cluster (requires kubeconfig)
- **k8s-kind**: Local Kubernetes via Kind - one cluster per host, shared by all deployments
## External Stacks
Stacks can live in external git repositories. Required structure:
```
<repo>/
  stack_orchestrator/data/
    stacks/<stack-name>/stack.yml
    compose/docker-compose-<pod-name>.yml
    deployment/spec.yml
```
## Deployment Commands
```bash
# Create deployment from spec
laconic-so --stack <path> deploy create --spec-file <spec.yml> --deployment-dir <dir>
# Start (creates cluster on first run)
laconic-so deployment --dir <dir> start
# GitOps restart (git pull + redeploy, preserves data)
laconic-so deployment --dir <dir> restart
# Stop
laconic-so deployment --dir <dir> stop
```
## spec.yml Reference
```yaml
stack: stack-name-or-path
deploy-to: k8s-kind
network:
  http-proxy:
    - host-name: app.example.com
      routes:
        - path: /
          proxy-to: service-name:port
  acme-email: admin@example.com
config:
  ENV_VAR: value
  SECRET_VAR: $generate:hex:32$  # Auto-generated, stored in K8s Secret
volumes:
  volume-name:
```
## Contributing
See the [CONTRIBUTING.md](/docs/CONTRIBUTING.md) for developer mode install.

View File

@@ -76,6 +76,94 @@ git pull # Get latest spec.yml from your operator repo
laconic-so deployment --dir my-deployment restart
```
## Private Registry Authentication
For deployments using images from private container registries (e.g., GitHub Container Registry), configure authentication in your spec.yml:
### Configuration
Add a `registry-credentials` section to your spec.yml:
```yaml
registry-credentials:
  server: ghcr.io
  username: your-org-or-username
  token-env: REGISTRY_TOKEN
```
**Fields:**
- `server`: The registry hostname (e.g., `ghcr.io`, `docker.io`, `gcr.io`)
- `username`: Registry username (for GHCR, use your GitHub username or org name)
- `token-env`: Name of the environment variable containing your API token/PAT
### Token Environment Variable
The `token-env` pattern keeps credentials out of version control. Set the environment variable when running `deployment start`:
```bash
export REGISTRY_TOKEN="your-personal-access-token"
laconic-so deployment --dir my-deployment start
```
For GHCR, create a Personal Access Token (PAT) with `read:packages` scope.
### Ansible Integration
When using Ansible for deployments, pass the token from a credentials file:
```yaml
- name: Start deployment
  ansible.builtin.command:
    cmd: laconic-so deployment --dir {{ deployment_dir }} start
  environment:
    REGISTRY_TOKEN: "{{ lookup('file', '~/.credentials/ghcr_token') }}"
```
### How It Works
1. laconic-so reads the `registry-credentials` config from spec.yml
2. Creates a Kubernetes `docker-registry` secret named `{deployment}-registry`
3. The deployment's pods reference this secret for image pulls
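A condensed sketch of step 2 (the actual implementation is create_registry_secret() in deployment_create.py; error handling and the update-on-conflict path are omitted, and the example values are hypothetical):
```python
import base64
import json
import os
from kubernetes import client, config

def build_registry_secret(server: str, username: str, token_env: str, deployment_name: str) -> client.V1Secret:
    token = os.environ[token_env]
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    docker_config = {"auths": {server: {"username": username, "password": token, "auth": auth}}}
    return client.V1Secret(
        metadata=client.V1ObjectMeta(name=f"{deployment_name}-registry"),
        data={".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()},
        type="kubernetes.io/dockerconfigjson",
    )

config.load_kube_config()
secret = build_registry_secret("ghcr.io", "my-org", "REGISTRY_TOKEN", "my-app")
client.CoreV1Api().create_namespaced_secret("default", secret)
```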
## Cluster and Volume Management
### Stopping Deployments
The `deployment stop` command has two important flags:
```bash
# Default: stops deployment, deletes cluster, PRESERVES volumes
laconic-so deployment --dir my-deployment stop
# Explicitly delete volumes (USE WITH CAUTION)
laconic-so deployment --dir my-deployment stop --delete-volumes
```
### Volume Persistence
Volumes persist across cluster deletion by design. This is important because:
- **Data survives cluster recreation**: Ledger data, databases, and other state are preserved
- **Faster recovery**: No need to re-sync or rebuild data after cluster issues
- **Safe cluster upgrades**: Delete and recreate cluster without data loss
**Only use `--delete-volumes` when:**
- You explicitly want to start fresh with no data
- The user specifically requests volume deletion
- You're cleaning up a test/dev environment completely
### Shared Cluster Architecture
In kind deployments, multiple stacks share a single cluster:
- First `deployment start` creates the cluster
- Subsequent deployments reuse the existing cluster
- `deployment stop` on ANY deployment deletes the shared cluster
- Other deployments will fail until the cluster is recreated
To stop a single deployment without affecting the cluster:
```bash
laconic-so deployment --dir my-deployment stop --skip-cluster-management
```
## Volume Persistence in k8s-kind
k8s-kind has 3 storage layers:

View File

@@ -15,7 +15,10 @@
import click
from importlib import util
import json
import os
import re
import base64
from pathlib import Path
from typing import List, Optional
import random
@@ -484,15 +487,180 @@ def init_operation(
get_yaml().dump(spec_file_content, output_file)
def _write_config_file(spec_file: Path, config_env_file: Path):
# Token pattern: $generate:hex:32$ or $generate:base64:16$
GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")
def _generate_and_store_secrets(config_vars: dict, deployment_name: str):
"""Generate secrets for $generate:...$ tokens and store in K8s Secret.
Called by `deploy create` - generates fresh secrets and stores them.
Returns the generated secrets dict for reference.
"""
from kubernetes import client, config as k8s_config
secrets = {}
for name, value in config_vars.items():
if not isinstance(value, str):
continue
match = GENERATE_TOKEN_PATTERN.search(value)
if not match:
continue
secret_type, length = match.group(1), int(match.group(2))
if secret_type == "hex":
secrets[name] = token_hex(length)
elif secret_type == "base64":
secrets[name] = base64.b64encode(os.urandom(length)).decode()
else:
secrets[name] = token_hex(length)
if not secrets:
return secrets
# Store in K8s Secret
try:
k8s_config.load_kube_config()
except Exception:
# Fall back to in-cluster config if available
try:
k8s_config.load_incluster_config()
except Exception:
print(
"Warning: Could not load kube config, secrets will not be stored in K8s"
)
return secrets
v1 = client.CoreV1Api()
secret_name = f"{deployment_name}-generated-secrets"
namespace = "default"
secret_data = {k: base64.b64encode(v.encode()).decode() for k, v in secrets.items()}
k8s_secret = client.V1Secret(
metadata=client.V1ObjectMeta(name=secret_name), data=secret_data, type="Opaque"
)
try:
v1.create_namespaced_secret(namespace, k8s_secret)
num_secrets = len(secrets)
print(f"Created K8s Secret '{secret_name}' with {num_secrets} secret(s)")
except client.exceptions.ApiException as e:
if e.status == 409: # Already exists
v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
num_secrets = len(secrets)
print(f"Updated K8s Secret '{secret_name}' with {num_secrets} secret(s)")
else:
raise
return secrets
def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
"""Create K8s docker-registry secret from spec + environment.
Reads registry configuration from spec.yml and creates a Kubernetes
secret of type kubernetes.io/dockerconfigjson for image pulls.
Args:
spec: The deployment spec containing image-registry config
deployment_name: Name of the deployment (used for secret naming)
Returns:
The secret name if created, None if no registry config
"""
from kubernetes import client, config as k8s_config
registry_config = spec.get_image_registry_config()
if not registry_config:
return None
server = registry_config.get("server")
username = registry_config.get("username")
token_env = registry_config.get("token-env")
if not all([server, username, token_env]):
return None
# Type narrowing for pyright - we've validated these aren't None above
assert token_env is not None
token = os.environ.get(token_env)
if not token:
print(
f"Warning: Registry token env var '{token_env}' not set, "
"skipping registry secret"
)
return None
# Create dockerconfigjson format (Docker API uses "password" field for tokens)
auth = base64.b64encode(f"{username}:{token}".encode()).decode()
docker_config = {
"auths": {server: {"username": username, "password": token, "auth": auth}}
}
# Secret name derived from deployment name
secret_name = f"{deployment_name}-registry"
# Load kube config
try:
k8s_config.load_kube_config()
except Exception:
try:
k8s_config.load_incluster_config()
except Exception:
print("Warning: Could not load kube config, registry secret not created")
return None
v1 = client.CoreV1Api()
namespace = "default"
k8s_secret = client.V1Secret(
metadata=client.V1ObjectMeta(name=secret_name),
data={
".dockerconfigjson": base64.b64encode(
json.dumps(docker_config).encode()
).decode()
},
type="kubernetes.io/dockerconfigjson",
)
try:
v1.create_namespaced_secret(namespace, k8s_secret)
print(f"Created registry secret '{secret_name}' for {server}")
except client.exceptions.ApiException as e:
if e.status == 409: # Already exists
v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
print(f"Updated registry secret '{secret_name}' for {server}")
else:
raise
return secret_name
def _write_config_file(
spec_file: Path, config_env_file: Path, deployment_name: Optional[str] = None
):
spec_content = get_parsed_deployment_spec(spec_file)
# Note: we want to write an empty file even if we have no config variables
config_vars = spec_content.get("config", {}) or {}
# Generate and store secrets in K8s if deployment_name provided and tokens exist
if deployment_name and config_vars:
has_generate_tokens = any(
isinstance(v, str) and GENERATE_TOKEN_PATTERN.search(v)
for v in config_vars.values()
)
if has_generate_tokens:
_generate_and_store_secrets(config_vars, deployment_name)
# Write non-secret config to config.env (exclude $generate:...$ tokens)
with open(config_env_file, "w") as output_file:
if "config" in spec_content and spec_content["config"]:
config_vars = spec_content["config"]
if config_vars:
for variable_name, variable_value in config_vars.items():
output_file.write(f"{variable_name}={variable_value}\n")
if config_vars:
for variable_name, variable_value in config_vars.items():
# Skip variables with generate tokens - they go to K8s Secret
if isinstance(variable_value, str) and GENERATE_TOKEN_PATTERN.search(
variable_value
):
continue
output_file.write(f"{variable_name}={variable_value}\n")
def _write_kube_config_file(external_path: Path, internal_path: Path):
@@ -760,7 +928,12 @@ def _write_deployment_files(
_create_deployment_file(target_dir, stack_source=stack_source)
# Copy any config variables from the spec file into an env file suitable for compose
_write_config_file(spec_file, target_dir.joinpath(constants.config_file_name))
# Use stack_name as deployment_name for K8s secret naming
# Extract just the name part if stack_name is a path ("path/to/stack" -> "stack")
deployment_name = Path(stack_name).name.replace("_", "-")
_write_config_file(
spec_file, target_dir.joinpath(constants.config_file_name), deployment_name
)
# Copy any k8s config file into the target dir
if deployment_type == "k8s":

View File

@@ -31,6 +31,7 @@ from stack_orchestrator.deploy.k8s.helpers import (
envs_from_environment_variables_map,
envs_from_compose_file,
merge_envs,
translate_sidecar_service_names,
)
from stack_orchestrator.deploy.deploy_util import (
parsed_pod_files_map_from_file_names,
@@ -125,7 +126,8 @@ class ClusterInfo:
name=(
f"{self.app_name}-nodeport-"
f"{pod_port}-{protocol.lower()}"
)
),
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
type="NodePort",
@@ -208,7 +210,9 @@ class ClusterInfo:
ingress = client.V1Ingress(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-ingress", annotations=ingress_annotations
name=f"{self.app_name}-ingress",
labels={"app": self.app_name},
annotations=ingress_annotations,
),
spec=spec,
)
@@ -238,7 +242,10 @@ class ClusterInfo:
]
service = client.V1Service(
metadata=client.V1ObjectMeta(name=f"{self.app_name}-service"),
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-service",
labels={"app": self.app_name},
),
spec=client.V1ServiceSpec(
type="ClusterIP",
ports=service_ports,
@@ -320,7 +327,7 @@ class ClusterInfo:
spec = client.V1ConfigMap(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{cfg_map_name}",
labels={"configmap-label": cfg_map_name},
labels={"app": self.app_name, "configmap-label": cfg_map_name},
),
binary_data=data,
)
@@ -377,7 +384,10 @@ class ClusterInfo:
pv = client.V1PersistentVolume(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{volume_name}",
labels={"volume-label": f"{self.app_name}-{volume_name}"},
labels={
"app": self.app_name,
"volume-label": f"{self.app_name}-{volume_name}",
},
),
spec=spec,
)
@@ -430,6 +440,12 @@ class ClusterInfo:
if "environment" in service_info
else self.environment_variables.map
)
# Translate docker-compose service names to localhost for sidecars
# All services in the same pod share the network namespace
sibling_services = [s for s in services.keys() if s != service_name]
merged_envs = translate_sidecar_service_names(
merged_envs, sibling_services
)
envs = envs_from_environment_variables_map(merged_envs)
if opts.o.debug:
print(f"Merged envs: {envs}")
@@ -457,6 +473,16 @@ class ClusterInfo:
if "command" in service_info:
cmd = service_info["command"]
container_args = cmd if isinstance(cmd, list) else cmd.split()
# Add env_from to pull secrets from K8s Secret
secret_name = f"{self.app_name}-generated-secrets"
env_from = [
client.V1EnvFromSource(
secret_ref=client.V1SecretEnvSource(
name=secret_name,
optional=True, # Don't fail if no secrets
)
)
]
container = client.V1Container(
name=container_name,
image=image_to_use,
@ -464,6 +490,7 @@ class ClusterInfo:
command=container_command,
args=container_args,
env=envs,
env_from=env_from,
ports=container_ports if container_ports else None,
volume_mounts=volume_mounts,
security_context=client.V1SecurityContext(
@@ -480,7 +507,12 @@ class ClusterInfo:
volumes = volumes_for_pod_files(
self.parsed_pod_yaml_map, self.spec, self.app_name
)
image_pull_secrets = [client.V1LocalObjectReference(name="laconic-registry")]
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-registry"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
annotations = None
labels = {"app": self.app_name}

View File

@@ -29,6 +29,7 @@ from stack_orchestrator.deploy.k8s.helpers import (
from stack_orchestrator.deploy.k8s.helpers import (
install_ingress_for_kind,
wait_for_ingress_in_kind,
is_ingress_running,
)
from stack_orchestrator.deploy.k8s.helpers import (
pods_in_deployment,
@@ -95,7 +96,7 @@ class K8sDeployer(Deployer):
core_api: client.CoreV1Api
apps_api: client.AppsV1Api
networking_api: client.NetworkingV1Api
k8s_namespace: str = "default"
k8s_namespace: str
kind_cluster_name: str
skip_cluster_management: bool
cluster_info: ClusterInfo
@@ -112,6 +113,7 @@
) -> None:
self.type = type
self.skip_cluster_management = False
self.k8s_namespace = "default" # Will be overridden below if context exists
# TODO: workaround pending refactoring above to cope with being
# created with a null deployment_context
if deployment_context is None:
@@ -119,6 +121,8 @@
self.deployment_dir = deployment_context.deployment_dir
self.deployment_context = deployment_context
self.kind_cluster_name = compose_project_name
# Use deployment-specific namespace for resource isolation and easy cleanup
self.k8s_namespace = f"laconic-{compose_project_name}"
self.cluster_info = ClusterInfo()
self.cluster_info.int(
compose_files,
@@ -148,6 +152,46 @@
self.apps_api = client.AppsV1Api()
self.custom_obj_api = client.CustomObjectsApi()
def _ensure_namespace(self):
"""Create the deployment namespace if it doesn't exist."""
if opts.o.dry_run:
print(f"Dry run: would create namespace {self.k8s_namespace}")
return
try:
self.core_api.read_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} already exists")
except ApiException as e:
if e.status == 404:
# Create the namespace
ns = client.V1Namespace(
metadata=client.V1ObjectMeta(
name=self.k8s_namespace,
labels={"app": self.cluster_info.app_name},
)
)
self.core_api.create_namespace(body=ns)
if opts.o.debug:
print(f"Created namespace {self.k8s_namespace}")
else:
raise
def _delete_namespace(self):
"""Delete the deployment namespace and all resources within it."""
if opts.o.dry_run:
print(f"Dry run: would delete namespace {self.k8s_namespace}")
return
try:
self.core_api.delete_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Deleted namespace {self.k8s_namespace}")
except ApiException as e:
if e.status == 404:
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} not found")
else:
raise
def _create_volume_data(self):
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
@@ -289,22 +333,40 @@
self.skip_cluster_management = skip_cluster_management
if not opts.o.dry_run:
if self.is_kind() and not self.skip_cluster_management:
# Create the kind cluster
create_cluster(
self.kind_cluster_name,
str(self.deployment_dir.joinpath(constants.kind_config_filename)),
# Create the kind cluster (or reuse existing one)
kind_config = str(
self.deployment_dir.joinpath(constants.kind_config_filename)
)
# Ensure the referenced containers are copied into kind
load_images_into_kind(
self.kind_cluster_name, self.cluster_info.image_set
actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
if actual_cluster != self.kind_cluster_name:
# An existing cluster was found, use it instead
self.kind_cluster_name = actual_cluster
# Only load locally-built images into kind
# Registry images (docker.io, ghcr.io, etc.) will be pulled by k8s
local_containers = self.deployment_context.stack.obj.get(
"containers", []
)
if local_containers:
# Filter image_set to only images matching local containers
local_images = {
img
for img in self.cluster_info.image_set
if any(c in img for c in local_containers)
}
if local_images:
load_images_into_kind(self.kind_cluster_name, local_images)
# Note: if no local containers defined, all images come from registries
self.connect_api()
# Create deployment-specific namespace for resource isolation
self._ensure_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Configure ingress controller (not installed by default in kind)
install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
# Wait for ingress to start
# (deployment provisioning will fail unless this is done)
wait_for_ingress_in_kind()
# Skip if already running (idempotent for shared cluster)
if not is_ingress_running():
install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
# Wait for ingress to start
# (deployment provisioning will fail unless this is done)
wait_for_ingress_in_kind()
# Create RuntimeClass if unlimited_memlock is enabled
if self.cluster_info.spec.get_unlimited_memlock():
_create_runtime_class(
@@ -315,6 +377,11 @@
else:
print("Dry run mode enabled, skipping k8s API connect")
# Create registry secret if configured
from stack_orchestrator.deploy.deployment_create import create_registry_secret
create_registry_secret(self.cluster_info.spec, self.cluster_info.app_name)
self._create_volume_data()
self._create_deployment()
@@ -359,107 +426,30 @@
print("NodePort created:")
print(f"{nodeport_resp}")
def down(self, timeout, volumes, skip_cluster_management): # noqa: C901
def down(self, timeout, volumes, skip_cluster_management):
self.skip_cluster_management = skip_cluster_management
self.connect_api()
# Delete the k8s objects
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
if volumes:
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
for pv in pvs:
if opts.o.debug:
print(f"Deleting this pv: {pv}")
try:
pv_resp = self.core_api.delete_persistent_volume(
name=pv.metadata.name
)
try:
pvs = self.core_api.list_persistent_volume(
label_selector=f"app={self.cluster_info.app_name}"
)
for pv in pvs.items:
if opts.o.debug:
print("PV deleted:")
print(f"{pv_resp}")
except ApiException as e:
_check_delete_exception(e)
# Figure out the PVCs for this deployment
pvcs = self.cluster_info.get_pvcs()
for pvc in pvcs:
print(f"Deleting PV: {pv.metadata.name}")
try:
self.core_api.delete_persistent_volume(name=pv.metadata.name)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
if opts.o.debug:
print(f"Deleting this pvc: {pvc}")
try:
pvc_resp = self.core_api.delete_namespaced_persistent_volume_claim(
name=pvc.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("PVCs deleted:")
print(f"{pvc_resp}")
except ApiException as e:
_check_delete_exception(e)
print(f"Error listing PVs: {e}")
# Figure out the ConfigMaps for this deployment
cfg_maps = self.cluster_info.get_configmaps()
for cfg_map in cfg_maps:
if opts.o.debug:
print(f"Deleting this ConfigMap: {cfg_map}")
try:
cfg_map_resp = self.core_api.delete_namespaced_config_map(
name=cfg_map.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("ConfigMap deleted:")
print(f"{cfg_map_resp}")
except ApiException as e:
_check_delete_exception(e)
deployment = self.cluster_info.get_deployment()
if opts.o.debug:
print(f"Deleting this deployment: {deployment}")
if deployment and deployment.metadata and deployment.metadata.name:
try:
self.apps_api.delete_namespaced_deployment(
name=deployment.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
service = self.cluster_info.get_service()
if opts.o.debug:
print(f"Deleting service: {service}")
if service and service.metadata and service.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=service.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
ingress = self.cluster_info.get_ingress(use_tls=not self.is_kind())
if ingress and ingress.metadata and ingress.metadata.name:
if opts.o.debug:
print(f"Deleting this ingress: {ingress}")
try:
self.networking_api.delete_namespaced_ingress(
name=ingress.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No ingress to delete")
nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
for nodeport in nodeports:
if opts.o.debug:
print(f"Deleting this nodeport: {nodeport}")
if nodeport.metadata and nodeport.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=nodeport.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No nodeport to delete")
# Delete the deployment namespace - this cascades to all namespaced resources
# (PVCs, ConfigMaps, Deployments, Services, Ingresses, etc.)
self._delete_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Destroy the kind cluster
@@ -597,7 +587,7 @@ class K8sDeployer(Deployer):
log_data = ""
for container in containers:
container_log = self.core_api.read_namespaced_pod_log(
k8s_pod_name, namespace="default", container=container
k8s_pod_name, namespace=self.k8s_namespace, container=container
)
container_log_lines = container_log.splitlines()
for line in container_log_lines:

View File

@@ -14,11 +14,13 @@
# along with this program. If not, see <http:#www.gnu.org/licenses/>.
from kubernetes import client, utils, watch
from kubernetes.client.exceptions import ApiException
import os
from pathlib import Path
import subprocess
import re
from typing import Set, Mapping, List, Optional, cast
import yaml
from stack_orchestrator.util import get_k8s_dir, error_exit
from stack_orchestrator.opts import opts
@@ -262,20 +264,61 @@ def _clean_etcd_keeping_certs(etcd_path: str) -> bool:
def create_cluster(name: str, config_file: str):
"""Create or reuse the single kind cluster for this host.
There is only one kind cluster per host by design. Multiple deployments
share this cluster. If a cluster already exists, it is reused.
Args:
name: Cluster name (used only when creating the first cluster)
config_file: Path to kind config file (used only when creating)
Returns:
The name of the cluster being used
"""
existing = get_kind_cluster()
if existing:
print(f"Using existing cluster: {existing}")
return existing
# Clean persisted etcd, keeping only TLS certificates
etcd_path = _get_etcd_host_path_from_kind_config(config_file)
if etcd_path:
_clean_etcd_keeping_certs(etcd_path)
print(f"Creating new cluster: {name}")
result = _run_command(f"kind create cluster --name {name} --config {config_file}")
if result.returncode != 0:
raise DeployerException(f"kind create cluster failed: {result}")
return name
def destroy_cluster(name: str):
_run_command(f"kind delete cluster --name {name}")
def is_ingress_running() -> bool:
"""Check if the Caddy ingress controller is already running in the cluster."""
try:
core_v1 = client.CoreV1Api()
pods = core_v1.list_namespaced_pod(
namespace="caddy-system",
label_selector=(
"app.kubernetes.io/name=caddy-ingress-controller,"
"app.kubernetes.io/component=controller"
),
)
for pod in pods.items:
if pod.status and pod.status.container_statuses:
if pod.status.container_statuses[0].ready is True:
if opts.o.debug:
print("Caddy ingress controller already running")
return True
return False
except ApiException:
return False
def wait_for_ingress_in_kind():
core_v1 = client.CoreV1Api()
for i in range(20):
@@ -311,22 +354,34 @@ def install_ingress_for_kind(acme_email: str = ""):
)
if opts.o.debug:
print("Installing Caddy ingress controller in kind cluster")
utils.create_from_yaml(api_client, yaml_file=ingress_install)
# Patch ConfigMap with acme email if provided
# Template the YAML with email before applying
with open(ingress_install) as f:
yaml_content = f.read()
if acme_email:
core_v1 = client.CoreV1Api()
configmap = core_v1.read_namespaced_config_map(
yaml_content = yaml_content.replace('email: ""', f'email: "{acme_email}"')
if opts.o.debug:
print(f"Configured Caddy with ACME email: {acme_email}")
# Apply templated YAML
yaml_objects = list(yaml.safe_load_all(yaml_content))
utils.create_from_yaml(api_client, yaml_objects=yaml_objects)
# Patch ConfigMap with ACME email if provided
if acme_email:
if opts.o.debug:
print(f"Configuring ACME email: {acme_email}")
core_api = client.CoreV1Api()
configmap = core_api.read_namespaced_config_map(
name="caddy-ingress-controller-configmap", namespace="caddy-system"
)
configmap.data["email"] = acme_email
core_v1.patch_namespaced_config_map(
core_api.patch_namespaced_config_map(
name="caddy-ingress-controller-configmap",
namespace="caddy-system",
body=configmap,
)
if opts.o.debug:
print(f"Patched Caddy ConfigMap with email: {acme_email}")
def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
@@ -509,6 +564,25 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
volume_host_path_map = _get_host_paths_for_volumes(deployment_context)
seen_host_path_mounts = set() # Track to avoid duplicate mounts
# Cluster state backup for offline data recovery (unique per deployment)
# etcd contains all k8s state; PKI certs needed to decrypt etcd offline
deployment_id = deployment_context.id
backup_subdir = f"cluster-backups/{deployment_id}"
etcd_host_path = _make_absolute_host_path(
Path(f"./data/{backup_subdir}/etcd"), deployment_dir
)
volume_definitions.append(
f" - hostPath: {etcd_host_path}\n" f" containerPath: /var/lib/etcd\n"
)
pki_host_path = _make_absolute_host_path(
Path(f"./data/{backup_subdir}/pki"), deployment_dir
)
volume_definitions.append(
f" - hostPath: {pki_host_path}\n" f" containerPath: /etc/kubernetes/pki\n"
)
# Note these paths are relative to the location of the pod files (at present)
# So we need to fix up to make them correct and absolute because kind assumes
# relative to the cwd.
@@ -868,6 +942,41 @@
return result
def translate_sidecar_service_names(
envs: Mapping[str, str], sibling_service_names: List[str]
) -> Mapping[str, str]:
"""Translate docker-compose service names to localhost for sidecar containers.
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.
This function replaces service name references with 'localhost' in env values.
"""
import re
if not sibling_service_names:
return envs
result = {}
for env_var, env_val in envs.items():
if env_val is None:
result[env_var] = env_val
continue
new_val = str(env_val)
for service_name in sibling_service_names:
# Match service name followed by optional port (e.g., 'db:5432', 'db')
# Handle URLs like: postgres://user:pass@db:5432/dbname
# and simple refs like: db:5432 or just db
pattern = rf"\b{re.escape(service_name)}(:\d+)?\b"
new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
result[env_var] = new_val
return result
def envs_from_environment_variables_map(
map: Mapping[str, str]
) -> List[client.V1EnvVar]:

View File

@@ -98,6 +98,17 @@ class Spec:
def get_image_registry(self):
return self.obj.get(constants.image_registry_key)
def get_image_registry_config(self) -> typing.Optional[typing.Dict]:
"""Returns registry auth config: {server, username, token-env}.
Used for private container registries like GHCR. The token-env field
specifies an environment variable containing the API token/PAT.
Note: Uses 'registry-credentials' key to avoid collision with
'image-registry' key which is for pushing images.
"""
return self.obj.get("registry-credentials")
def get_volumes(self):
return self.obj.get(constants.volumes_key, {})