Compare commits

..

12 Commits

Author SHA1 Message Date
A. F. Dudley
5a0f573b0e Allow relative volume paths for k8s-kind deployments
For k8s-kind, relative paths (e.g., ./data/rpc-config) are resolved to
$DEPLOYMENT_DIR/path by _make_absolute_host_path() during kind config
generation. This provides persistence on the Docker host that survives
cluster restarts.

Previously, validation threw an exception before paths could be resolved,
making it impossible to use relative paths for persistent storage.
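
A minimal sketch of the resolution rule described above, assuming the helper simply joins relative paths onto the deployment directory (the real `_make_absolute_host_path()` in helpers.py may differ):

```python
from pathlib import Path

def _make_absolute_host_path(path: Path, deployment_dir: Path) -> Path:
    """Resolve a relative volume path against the deployment directory (assumed body)."""
    if path.is_absolute():
        return path
    # e.g. ./data/rpc-config -> $DEPLOYMENT_DIR/data/rpc-config
    return (deployment_dir / path).resolve()
```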

Changes:
- deployment_create.py: Skip relative path check for k8s-kind
- cluster_info.py: Allow relative paths to reach PV generation
- docs/deployment_patterns.md: Document volume persistence patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 23:26:13 -05:00
A. F. Dudley
ee89f9e87d Fix repo root path calculation (4 parents from stack path)
2026-02-02 22:49:19 -05:00
A. F. Dudley
600eb93b4d Add --spec-file option to restart and auto-detect GitOps spec
- Add --spec-file option to specify spec location in repo
- Auto-detect deployment/spec.yml in repo as GitOps location
- Fall back to deployment dir if no repo spec found

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:48:19 -05:00
A. F. Dudley
ca3153bb78 Fix restart command for GitOps deployments
- Remove init_operation() from restart - don't regenerate spec from
  commands.py defaults, use existing git-tracked spec.yml instead
- Add docs/deployment_patterns.md documenting GitOps workflow
- Add pre-commit rule to CLAUDE.md
- Fix line length issues in helpers.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:18:19 -05:00
A. F. Dudley
be334ca39f Use docker for etcd existence check (root-owned dir)
The etcd directory is root-owned, so a shell `test -f` on the host fails.
Use docker with a volume mount to check file existence instead.
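
A hedged sketch of such a check from Python, using a throwaway container (the `alpine` image and helper name are illustrative, not the project's code):

```python
import subprocess

def file_exists_in_root_owned_dir(host_dir: str, rel_path: str) -> bool:
    """Run `test -f` inside a container that mounts the directory, so the check
    succeeds even when the host user cannot read the root-owned files."""
    result = subprocess.run(
        ["docker", "run", "--rm", "-v", f"{host_dir}:/mnt:ro",
         "alpine", "test", "-f", f"/mnt/{rel_path}"],
        capture_output=True,
    )
    return result.returncode == 0
```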

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:31:45 -05:00
A. F. Dudley
0213ec5d7d Keep timestamped backup of etcd forever
Create member.backup-YYYYMMDD-HHMMSS before cleaning.
Each cluster recreation creates a new backup, preserving history.
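
A rough sketch of the naming scheme, assuming direct filesystem access (the real code routes this through docker because the directory is root-owned):

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_etcd_member(etcd_path: Path) -> Path:
    """Copy etcd's member/ directory to a timestamped sibling before cleanup."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = etcd_path / f"member.backup-{stamp}"
    shutil.copytree(etcd_path / "member", backup)
    return backup
```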

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:30:13 -05:00
A. F. Dudley
a4d8592815 Preserve original etcd backup until restore is verified
Move the original to .bak, move the new data into place, then delete the .bak.
If anything fails before the swap, the original remains intact.
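
A minimal sketch of that swap order (names are illustrative; the real code operates through docker on the root-owned directory):

```python
import shutil
from pathlib import Path

def swap_in_restored_copy(original: Path, restored: Path) -> None:
    """Replace `original` with `restored`, keeping the original until the swap succeeds."""
    bak = original.with_name(original.name + ".bak")
    original.rename(bak)            # 1. move the original aside
    try:
        restored.rename(original)   # 2. move the new data into place
    except Exception:
        bak.rename(original)        # swap failed: put the original back
        raise
    if bak.is_dir():                # 3. only now delete the .bak
        shutil.rmtree(bak)
    else:
        bak.unlink()
```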

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:28:53 -05:00
A. F. Dudley
8d6e50b3ae Use whitelist approach for etcd cleanup
Instead of trying to delete specific stale resources (a blacklist),
keep only the valuable data (Caddy TLS certs) and delete everything
else. This is more robust because we don't need to maintain a list of
all possible stale resources.
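
A generic sketch of the whitelist idea: enumerate keys, keep only a small set of prefixes, delete the rest (the prefix shown for the Caddy TLS data is an assumption):

```python
from typing import Iterable, List

# Assumed location of the Caddy TLS secrets inside etcd's key space.
KEEP_PREFIXES = ("/registry/secrets/caddy-system/",)

def keys_to_delete(all_keys: Iterable[str]) -> List[str]:
    """Whitelist filter: anything not explicitly kept gets deleted."""
    return [key for key in all_keys if not key.startswith(KEEP_PREFIXES)]
```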

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:27:59 -05:00
A. F. Dudley
51e65857b9 Fix etcd cleanup to use docker for root-owned files
Use docker containers with volume mounts to handle all file
operations on root-owned etcd directories, avoiding the need
for sudo on the host.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:22:41 -05:00
A. F. Dudley
ba9f51116d Clear stale CNI resources from persisted etcd before cluster creation
When etcd is persisted (for certificate backup) and a cluster is
recreated, kind tries to install the CNI (kindnet) fresh, but the
persisted etcd already contains those resources, causing 'AlreadyExists'
errors and cluster creation failure.

This fix:
- Detects etcd mount path from kind config
- Before cluster creation, clears stale CNI resources (kindnet, coredns)
- Preserves certificate and other important data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:21:00 -05:00
A. F. Dudley
5214bc8c0c Fix Caddy ingress ACME email and RBAC issues
- Add acme_email_key constant for spec.yml parsing
- Add get_acme_email() method to Spec class
- Modify install_ingress_for_kind() to patch ConfigMap with email
- Pass acme-email from spec to ingress installation
- Add 'delete' verb to leases RBAC for certificate lock cleanup

The acme-email field in spec.yml was previously ignored, causing
Let's Encrypt to fail with "unable to parse email address".
The missing delete permission on leases caused lock cleanup failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:13:10 -05:00
A. F. Dudley
10716e9d44 feat(deploy): add deployment restart command
Add `laconic-so deployment restart` command that:
- Pulls latest code from stack git repository
- Regenerates spec.yml from stack's commands.py
- Verifies DNS if hostname changed (with --force to skip)
- Syncs deployment directory preserving cluster ID and data
- Stops and restarts deployment with --skip-cluster-management

Also stores the stack-source path in deployment.yml during create
so the stack can be located automatically on restart.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 19:05:27 -05:00
8 changed files with 129 additions and 655 deletions

View File

@ -44,76 +44,6 @@ This project follows principles inspired by literate programming, where developm
This approach treats the human-AI collaboration as a form of **conversational literate programming** where understanding emerges through dialogue before code implementation.
## External Stacks Preferred
When creating new stacks for any reason, **use the external stack pattern** rather than adding stacks directly to this repository.
External stacks follow this structure:
```
my-stack/
└── stack-orchestrator/
    ├── stacks/
    │   └── my-stack/
    │       ├── stack.yml
    │       └── README.md
    ├── compose/
    │   └── docker-compose-my-stack.yml
    └── config/
        └── my-stack/
            └── (config files)
```
### Usage
```bash
# Fetch external stack
laconic-so fetch-stack github.com/org/my-stack
# Use external stack
STACK_PATH=~/cerc/my-stack/stack-orchestrator/stacks/my-stack
laconic-so --stack $STACK_PATH deploy init --output spec.yml
laconic-so --stack $STACK_PATH deploy create --spec-file spec.yml --deployment-dir deployment
laconic-so deployment --dir deployment start
```
### Examples
- `zenith-karma-stack` - Karma watcher deployment
- `urbit-stack` - Fake Urbit ship for testing
- `zenith-desk-stack` - Desk deployment stack
## Architecture: k8s-kind Deployments
### One Cluster Per Host
One Kind cluster per host by design. Never request or expect separate clusters.
- `create_cluster()` in `helpers.py` reuses any existing cluster
- `cluster-id` in deployment.yml is an identifier, not a cluster request
- All deployments share: ingress controller, etcd, certificates
### Stack Resolution
- External stacks detected via `Path(stack).exists()` in `util.py`
- Config/compose resolution: external path first, then internal fallback
- External path structure: `stack_orchestrator/data/stacks/<name>/stack.yml`
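
A hedged sketch of that lookup order (the function name and the internal-directory argument are illustrative):

```python
from pathlib import Path

def resolve_stack_dir(stack: str, internal_stacks_dir: Path) -> Path:
    """External stack path first, then fall back to the built-in stacks directory."""
    external = Path(stack)
    if external.exists():
        return external                     # external stack passed as a filesystem path
    return internal_stacks_dir / stack      # e.g. stack_orchestrator/data/stacks/<name>
```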
### Secret Generation Implementation
- `GENERATE_TOKEN_PATTERN` in `deployment_create.py` matches `$generate:type:length$`
- `_generate_and_store_secrets()` creates K8s Secret
- `cluster_info.py` adds `envFrom` with `secretRef` to containers
- Non-secret config written to `config.env`
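
The token regex appears verbatim later in this diff; a quick check of how it matches:

```python
import re

GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")

match = GENERATE_TOKEN_PATTERN.search("$generate:hex:32$")
assert match is not None
secret_type, length = match.group(1), int(match.group(2))  # ("hex", 32)
```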
### Repository Cloning
`setup-repositories --git-ssh` clones repos defined in stack.yml's `repos:` field. Requires SSH agent.
### Key Files (for codebase navigation)
- `repos/setup_repositories.py`: `setup-repositories` command (git clone)
- `deployment_create.py`: `deploy create` command, secret generation
- `deployment.py`: `deployment start/stop/restart` commands
- `deploy_k8s.py`: K8s deployer, cluster management calls
- `helpers.py`: `create_cluster()`, etcd cleanup, kind operations
- `cluster_info.py`: K8s resource generation (Deployment, Service, Ingress)
## Insights and Observations
### Design Principles

View File

@ -71,59 +71,6 @@ The various [stacks](/stack_orchestrator/data/stacks) each contain instructions
- [laconicd with console and CLI](stack_orchestrator/data/stacks/fixturenet-laconic-loaded)
- [kubo (IPFS)](stack_orchestrator/data/stacks/kubo)
## Deployment Types
- **compose**: Docker Compose on local machine
- **k8s**: External Kubernetes cluster (requires kubeconfig)
- **k8s-kind**: Local Kubernetes via Kind - one cluster per host, shared by all deployments
## External Stacks
Stacks can live in external git repositories. Required structure:
```
<repo>/
  stack_orchestrator/data/
    stacks/<stack-name>/stack.yml
    compose/docker-compose-<pod-name>.yml
  deployment/spec.yml
```
## Deployment Commands
```bash
# Create deployment from spec
laconic-so --stack <path> deploy create --spec-file <spec.yml> --deployment-dir <dir>
# Start (creates cluster on first run)
laconic-so deployment --dir <dir> start
# GitOps restart (git pull + redeploy, preserves data)
laconic-so deployment --dir <dir> restart
# Stop
laconic-so deployment --dir <dir> stop
```
## spec.yml Reference
```yaml
stack: stack-name-or-path
deploy-to: k8s-kind
network:
  http-proxy:
    - host-name: app.example.com
      routes:
        - path: /
          proxy-to: service-name:port
acme-email: admin@example.com
config:
  ENV_VAR: value
  SECRET_VAR: $generate:hex:32$  # Auto-generated, stored in K8s Secret
volumes:
  volume-name:
```
## Contributing
See the [CONTRIBUTING.md](/docs/CONTRIBUTING.md) for developer mode install.

View File

@ -76,94 +76,6 @@ git pull # Get latest spec.yml from your operator repo
laconic-so deployment --dir my-deployment restart
```
## Private Registry Authentication
For deployments using images from private container registries (e.g., GitHub Container Registry), configure authentication in your spec.yml:
### Configuration
Add a `registry-credentials` section to your spec.yml:
```yaml
registry-credentials:
  server: ghcr.io
  username: your-org-or-username
  token-env: REGISTRY_TOKEN
```
**Fields:**
- `server`: The registry hostname (e.g., `ghcr.io`, `docker.io`, `gcr.io`)
- `username`: Registry username (for GHCR, use your GitHub username or org name)
- `token-env`: Name of the environment variable containing your API token/PAT
### Token Environment Variable
The `token-env` pattern keeps credentials out of version control. Set the environment variable when running `deployment start`:
```bash
export REGISTRY_TOKEN="your-personal-access-token"
laconic-so deployment --dir my-deployment start
```
For GHCR, create a Personal Access Token (PAT) with `read:packages` scope.
### Ansible Integration
When using Ansible for deployments, pass the token from a credentials file:
```yaml
- name: Start deployment
  ansible.builtin.command:
    cmd: laconic-so deployment --dir {{ deployment_dir }} start
  environment:
    REGISTRY_TOKEN: "{{ lookup('file', '~/.credentials/ghcr_token') }}"
```
### How It Works
1. laconic-so reads the `registry-credentials` config from spec.yml
2. Creates a Kubernetes `docker-registry` secret named `{deployment}-registry`
3. The deployment's pods reference this secret for image pulls
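
The secret created in step 2 carries a standard `.dockerconfigjson` payload; a small sketch of its shape (mirroring the `create_registry_secret()` change shown later in this diff):

```python
import base64
import json

def docker_config_json(server: str, username: str, token: str) -> str:
    """Build the .dockerconfigjson payload for a kubernetes.io/dockerconfigjson secret."""
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    return json.dumps(
        {"auths": {server: {"username": username, "password": token, "auth": auth}}}
    )
```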
## Cluster and Volume Management
### Stopping Deployments
The `deployment stop` command has two important flags:
```bash
# Default: stops deployment, deletes cluster, PRESERVES volumes
laconic-so deployment --dir my-deployment stop
# Explicitly delete volumes (USE WITH CAUTION)
laconic-so deployment --dir my-deployment stop --delete-volumes
```
### Volume Persistence
Volumes persist across cluster deletion by design. This is important because:
- **Data survives cluster recreation**: Ledger data, databases, and other state are preserved
- **Faster recovery**: No need to re-sync or rebuild data after cluster issues
- **Safe cluster upgrades**: Delete and recreate cluster without data loss
**Only use `--delete-volumes` when:**
- You explicitly want to start fresh with no data
- The user specifically requests volume deletion
- You're cleaning up a test/dev environment completely
### Shared Cluster Architecture
In kind deployments, multiple stacks share a single cluster:
- First `deployment start` creates the cluster
- Subsequent deployments reuse the existing cluster
- `deployment stop` on ANY deployment deletes the shared cluster
- Other deployments will fail until cluster is recreated
To stop a single deployment without affecting the cluster:
```bash
laconic-so deployment --dir my-deployment stop --skip-cluster-management
```
## Volume Persistence in k8s-kind
k8s-kind has 3 storage layers:

View File

@ -15,10 +15,7 @@
import click
from importlib import util
import json
import os
import re
import base64
from pathlib import Path
from typing import List, Optional
import random
@ -487,180 +484,15 @@ def init_operation(
get_yaml().dump(spec_file_content, output_file)
# Token pattern: $generate:hex:32$ or $generate:base64:16$
GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")
def _generate_and_store_secrets(config_vars: dict, deployment_name: str):
"""Generate secrets for $generate:...$ tokens and store in K8s Secret.
Called by `deploy create` - generates fresh secrets and stores them.
Returns the generated secrets dict for reference.
"""
from kubernetes import client, config as k8s_config
secrets = {}
for name, value in config_vars.items():
if not isinstance(value, str):
continue
match = GENERATE_TOKEN_PATTERN.search(value)
if not match:
continue
secret_type, length = match.group(1), int(match.group(2))
if secret_type == "hex":
secrets[name] = token_hex(length)
elif secret_type == "base64":
secrets[name] = base64.b64encode(os.urandom(length)).decode()
else:
secrets[name] = token_hex(length)
if not secrets:
return secrets
# Store in K8s Secret
try:
k8s_config.load_kube_config()
except Exception:
# Fall back to in-cluster config if available
try:
k8s_config.load_incluster_config()
except Exception:
print(
"Warning: Could not load kube config, secrets will not be stored in K8s"
)
return secrets
v1 = client.CoreV1Api()
secret_name = f"{deployment_name}-generated-secrets"
namespace = "default"
secret_data = {k: base64.b64encode(v.encode()).decode() for k, v in secrets.items()}
k8s_secret = client.V1Secret(
metadata=client.V1ObjectMeta(name=secret_name), data=secret_data, type="Opaque"
)
try:
v1.create_namespaced_secret(namespace, k8s_secret)
num_secrets = len(secrets)
print(f"Created K8s Secret '{secret_name}' with {num_secrets} secret(s)")
except client.exceptions.ApiException as e:
if e.status == 409: # Already exists
v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
num_secrets = len(secrets)
print(f"Updated K8s Secret '{secret_name}' with {num_secrets} secret(s)")
else:
raise
return secrets
def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
"""Create K8s docker-registry secret from spec + environment.
Reads registry configuration from spec.yml and creates a Kubernetes
secret of type kubernetes.io/dockerconfigjson for image pulls.
Args:
spec: The deployment spec containing image-registry config
deployment_name: Name of the deployment (used for secret naming)
Returns:
The secret name if created, None if no registry config
"""
from kubernetes import client, config as k8s_config
registry_config = spec.get_image_registry_config()
if not registry_config:
return None
server = registry_config.get("server")
username = registry_config.get("username")
token_env = registry_config.get("token-env")
if not all([server, username, token_env]):
return None
# Type narrowing for pyright - we've validated these aren't None above
assert token_env is not None
token = os.environ.get(token_env)
if not token:
print(
f"Warning: Registry token env var '{token_env}' not set, "
"skipping registry secret"
)
return None
# Create dockerconfigjson format (Docker API uses "password" field for tokens)
auth = base64.b64encode(f"{username}:{token}".encode()).decode()
docker_config = {
"auths": {server: {"username": username, "password": token, "auth": auth}}
}
# Secret name derived from deployment name
secret_name = f"{deployment_name}-registry"
# Load kube config
try:
k8s_config.load_kube_config()
except Exception:
try:
k8s_config.load_incluster_config()
except Exception:
print("Warning: Could not load kube config, registry secret not created")
return None
v1 = client.CoreV1Api()
namespace = "default"
k8s_secret = client.V1Secret(
metadata=client.V1ObjectMeta(name=secret_name),
data={
".dockerconfigjson": base64.b64encode(
json.dumps(docker_config).encode()
).decode()
},
type="kubernetes.io/dockerconfigjson",
)
try:
v1.create_namespaced_secret(namespace, k8s_secret)
print(f"Created registry secret '{secret_name}' for {server}")
except client.exceptions.ApiException as e:
if e.status == 409: # Already exists
v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
print(f"Updated registry secret '{secret_name}' for {server}")
else:
raise
return secret_name
def _write_config_file(
spec_file: Path, config_env_file: Path, deployment_name: Optional[str] = None
):
def _write_config_file(spec_file: Path, config_env_file: Path):
spec_content = get_parsed_deployment_spec(spec_file)
config_vars = spec_content.get("config", {}) or {}
# Generate and store secrets in K8s if deployment_name provided and tokens exist
if deployment_name and config_vars:
has_generate_tokens = any(
isinstance(v, str) and GENERATE_TOKEN_PATTERN.search(v)
for v in config_vars.values()
)
if has_generate_tokens:
_generate_and_store_secrets(config_vars, deployment_name)
# Write non-secret config to config.env (exclude $generate:...$ tokens)
# Note: we want to write an empty file even if we have no config variables
with open(config_env_file, "w") as output_file:
if config_vars:
for variable_name, variable_value in config_vars.items():
# Skip variables with generate tokens - they go to K8s Secret
if isinstance(variable_value, str) and GENERATE_TOKEN_PATTERN.search(
variable_value
):
continue
output_file.write(f"{variable_name}={variable_value}\n")
if "config" in spec_content and spec_content["config"]:
config_vars = spec_content["config"]
if config_vars:
for variable_name, variable_value in config_vars.items():
output_file.write(f"{variable_name}={variable_value}\n")
def _write_kube_config_file(external_path: Path, internal_path: Path):
@ -928,12 +760,7 @@ def _write_deployment_files(
_create_deployment_file(target_dir, stack_source=stack_source)
# Copy any config variables from the spec file into an env file suitable for compose
# Use stack_name as deployment_name for K8s secret naming
# Extract just the name part if stack_name is a path ("path/to/stack" -> "stack")
deployment_name = Path(stack_name).name.replace("_", "-")
_write_config_file(
spec_file, target_dir.joinpath(constants.config_file_name), deployment_name
)
_write_config_file(spec_file, target_dir.joinpath(constants.config_file_name))
# Copy any k8s config file into the target dir
if deployment_type == "k8s":

View File

@ -31,7 +31,6 @@ from stack_orchestrator.deploy.k8s.helpers import (
envs_from_environment_variables_map,
envs_from_compose_file,
merge_envs,
translate_sidecar_service_names,
)
from stack_orchestrator.deploy.deploy_util import (
parsed_pod_files_map_from_file_names,
@ -126,8 +125,7 @@ class ClusterInfo:
name=(
f"{self.app_name}-nodeport-"
f"{pod_port}-{protocol.lower()}"
),
labels={"app": self.app_name},
)
),
spec=client.V1ServiceSpec(
type="NodePort",
@ -210,9 +208,7 @@ class ClusterInfo:
ingress = client.V1Ingress(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-ingress",
labels={"app": self.app_name},
annotations=ingress_annotations,
name=f"{self.app_name}-ingress", annotations=ingress_annotations
),
spec=spec,
)
@ -242,10 +238,7 @@ class ClusterInfo:
]
service = client.V1Service(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-service",
labels={"app": self.app_name},
),
metadata=client.V1ObjectMeta(name=f"{self.app_name}-service"),
spec=client.V1ServiceSpec(
type="ClusterIP",
ports=service_ports,
@ -327,7 +320,7 @@ class ClusterInfo:
spec = client.V1ConfigMap(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{cfg_map_name}",
labels={"app": self.app_name, "configmap-label": cfg_map_name},
labels={"configmap-label": cfg_map_name},
),
binary_data=data,
)
@ -384,10 +377,7 @@ class ClusterInfo:
pv = client.V1PersistentVolume(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{volume_name}",
labels={
"app": self.app_name,
"volume-label": f"{self.app_name}-{volume_name}",
},
labels={"volume-label": f"{self.app_name}-{volume_name}"},
),
spec=spec,
)
@ -440,12 +430,6 @@ class ClusterInfo:
if "environment" in service_info
else self.environment_variables.map
)
# Translate docker-compose service names to localhost for sidecars
# All services in the same pod share the network namespace
sibling_services = [s for s in services.keys() if s != service_name]
merged_envs = translate_sidecar_service_names(
merged_envs, sibling_services
)
envs = envs_from_environment_variables_map(merged_envs)
if opts.o.debug:
print(f"Merged envs: {envs}")
@ -473,16 +457,6 @@ class ClusterInfo:
if "command" in service_info:
cmd = service_info["command"]
container_args = cmd if isinstance(cmd, list) else cmd.split()
# Add env_from to pull secrets from K8s Secret
secret_name = f"{self.app_name}-generated-secrets"
env_from = [
client.V1EnvFromSource(
secret_ref=client.V1SecretEnvSource(
name=secret_name,
optional=True, # Don't fail if no secrets
)
)
]
container = client.V1Container(
name=container_name,
image=image_to_use,
@ -490,7 +464,6 @@ class ClusterInfo:
command=container_command,
args=container_args,
env=envs,
env_from=env_from,
ports=container_ports if container_ports else None,
volume_mounts=volume_mounts,
security_context=client.V1SecurityContext(
@ -507,12 +480,7 @@ class ClusterInfo:
volumes = volumes_for_pod_files(
self.parsed_pod_yaml_map, self.spec, self.app_name
)
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-registry"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
image_pull_secrets = [client.V1LocalObjectReference(name="laconic-registry")]
annotations = None
labels = {"app": self.app_name}

View File

@ -29,7 +29,6 @@ from stack_orchestrator.deploy.k8s.helpers import (
from stack_orchestrator.deploy.k8s.helpers import (
install_ingress_for_kind,
wait_for_ingress_in_kind,
is_ingress_running,
)
from stack_orchestrator.deploy.k8s.helpers import (
pods_in_deployment,
@ -96,7 +95,7 @@ class K8sDeployer(Deployer):
core_api: client.CoreV1Api
apps_api: client.AppsV1Api
networking_api: client.NetworkingV1Api
k8s_namespace: str
k8s_namespace: str = "default"
kind_cluster_name: str
skip_cluster_management: bool
cluster_info: ClusterInfo
@ -113,7 +112,6 @@ class K8sDeployer(Deployer):
) -> None:
self.type = type
self.skip_cluster_management = False
self.k8s_namespace = "default" # Will be overridden below if context exists
# TODO: workaround pending refactoring above to cope with being
# created with a null deployment_context
if deployment_context is None:
@ -121,8 +119,6 @@ class K8sDeployer(Deployer):
self.deployment_dir = deployment_context.deployment_dir
self.deployment_context = deployment_context
self.kind_cluster_name = compose_project_name
# Use deployment-specific namespace for resource isolation and easy cleanup
self.k8s_namespace = f"laconic-{compose_project_name}"
self.cluster_info = ClusterInfo()
self.cluster_info.int(
compose_files,
@ -152,46 +148,6 @@ class K8sDeployer(Deployer):
self.apps_api = client.AppsV1Api()
self.custom_obj_api = client.CustomObjectsApi()
def _ensure_namespace(self):
"""Create the deployment namespace if it doesn't exist."""
if opts.o.dry_run:
print(f"Dry run: would create namespace {self.k8s_namespace}")
return
try:
self.core_api.read_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} already exists")
except ApiException as e:
if e.status == 404:
# Create the namespace
ns = client.V1Namespace(
metadata=client.V1ObjectMeta(
name=self.k8s_namespace,
labels={"app": self.cluster_info.app_name},
)
)
self.core_api.create_namespace(body=ns)
if opts.o.debug:
print(f"Created namespace {self.k8s_namespace}")
else:
raise
def _delete_namespace(self):
"""Delete the deployment namespace and all resources within it."""
if opts.o.dry_run:
print(f"Dry run: would delete namespace {self.k8s_namespace}")
return
try:
self.core_api.delete_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Deleted namespace {self.k8s_namespace}")
except ApiException as e:
if e.status == 404:
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} not found")
else:
raise
def _create_volume_data(self):
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
@ -333,40 +289,22 @@ class K8sDeployer(Deployer):
self.skip_cluster_management = skip_cluster_management
if not opts.o.dry_run:
if self.is_kind() and not self.skip_cluster_management:
# Create the kind cluster (or reuse existing one)
kind_config = str(
self.deployment_dir.joinpath(constants.kind_config_filename)
# Create the kind cluster
create_cluster(
self.kind_cluster_name,
str(self.deployment_dir.joinpath(constants.kind_config_filename)),
)
actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
if actual_cluster != self.kind_cluster_name:
# An existing cluster was found, use it instead
self.kind_cluster_name = actual_cluster
# Only load locally-built images into kind
# Registry images (docker.io, ghcr.io, etc.) will be pulled by k8s
local_containers = self.deployment_context.stack.obj.get(
"containers", []
# Ensure the referenced containers are copied into kind
load_images_into_kind(
self.kind_cluster_name, self.cluster_info.image_set
)
if local_containers:
# Filter image_set to only images matching local containers
local_images = {
img
for img in self.cluster_info.image_set
if any(c in img for c in local_containers)
}
if local_images:
load_images_into_kind(self.kind_cluster_name, local_images)
# Note: if no local containers defined, all images come from registries
self.connect_api()
# Create deployment-specific namespace for resource isolation
self._ensure_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Configure ingress controller (not installed by default in kind)
# Skip if already running (idempotent for shared cluster)
if not is_ingress_running():
install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
# Wait for ingress to start
# (deployment provisioning will fail unless this is done)
wait_for_ingress_in_kind()
install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
# Wait for ingress to start
# (deployment provisioning will fail unless this is done)
wait_for_ingress_in_kind()
# Create RuntimeClass if unlimited_memlock is enabled
if self.cluster_info.spec.get_unlimited_memlock():
_create_runtime_class(
@ -377,11 +315,6 @@ class K8sDeployer(Deployer):
else:
print("Dry run mode enabled, skipping k8s API connect")
# Create registry secret if configured
from stack_orchestrator.deploy.deployment_create import create_registry_secret
create_registry_secret(self.cluster_info.spec, self.cluster_info.app_name)
self._create_volume_data()
self._create_deployment()
@ -426,30 +359,107 @@ class K8sDeployer(Deployer):
print("NodePort created:")
print(f"{nodeport_resp}")
def down(self, timeout, volumes, skip_cluster_management):
def down(self, timeout, volumes, skip_cluster_management): # noqa: C901
self.skip_cluster_management = skip_cluster_management
self.connect_api()
# Delete the k8s objects
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
if volumes:
try:
pvs = self.core_api.list_persistent_volume(
label_selector=f"app={self.cluster_info.app_name}"
)
for pv in pvs.items:
if opts.o.debug:
print(f"Deleting PV: {pv.metadata.name}")
try:
self.core_api.delete_persistent_volume(name=pv.metadata.name)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
for pv in pvs:
if opts.o.debug:
print(f"Error listing PVs: {e}")
print(f"Deleting this pv: {pv}")
try:
pv_resp = self.core_api.delete_persistent_volume(
name=pv.metadata.name
)
if opts.o.debug:
print("PV deleted:")
print(f"{pv_resp}")
except ApiException as e:
_check_delete_exception(e)
# Delete the deployment namespace - this cascades to all namespaced resources
# (PVCs, ConfigMaps, Deployments, Services, Ingresses, etc.)
self._delete_namespace()
# Figure out the PVCs for this deployment
pvcs = self.cluster_info.get_pvcs()
for pvc in pvcs:
if opts.o.debug:
print(f"Deleting this pvc: {pvc}")
try:
pvc_resp = self.core_api.delete_namespaced_persistent_volume_claim(
name=pvc.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("PVCs deleted:")
print(f"{pvc_resp}")
except ApiException as e:
_check_delete_exception(e)
# Figure out the ConfigMaps for this deployment
cfg_maps = self.cluster_info.get_configmaps()
for cfg_map in cfg_maps:
if opts.o.debug:
print(f"Deleting this ConfigMap: {cfg_map}")
try:
cfg_map_resp = self.core_api.delete_namespaced_config_map(
name=cfg_map.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("ConfigMap deleted:")
print(f"{cfg_map_resp}")
except ApiException as e:
_check_delete_exception(e)
deployment = self.cluster_info.get_deployment()
if opts.o.debug:
print(f"Deleting this deployment: {deployment}")
if deployment and deployment.metadata and deployment.metadata.name:
try:
self.apps_api.delete_namespaced_deployment(
name=deployment.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
service = self.cluster_info.get_service()
if opts.o.debug:
print(f"Deleting service: {service}")
if service and service.metadata and service.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=service.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
ingress = self.cluster_info.get_ingress(use_tls=not self.is_kind())
if ingress and ingress.metadata and ingress.metadata.name:
if opts.o.debug:
print(f"Deleting this ingress: {ingress}")
try:
self.networking_api.delete_namespaced_ingress(
name=ingress.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No ingress to delete")
nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
for nodeport in nodeports:
if opts.o.debug:
print(f"Deleting this nodeport: {nodeport}")
if nodeport.metadata and nodeport.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=nodeport.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No nodeport to delete")
if self.is_kind() and not self.skip_cluster_management:
# Destroy the kind cluster
@ -587,7 +597,7 @@ class K8sDeployer(Deployer):
log_data = ""
for container in containers:
container_log = self.core_api.read_namespaced_pod_log(
k8s_pod_name, namespace=self.k8s_namespace, container=container
k8s_pod_name, namespace="default", container=container
)
container_log_lines = container_log.splitlines()
for line in container_log_lines:

View File

@ -14,13 +14,11 @@
# along with this program. If not, see <http:#www.gnu.org/licenses/>.
from kubernetes import client, utils, watch
from kubernetes.client.exceptions import ApiException
import os
from pathlib import Path
import subprocess
import re
from typing import Set, Mapping, List, Optional, cast
import yaml
from stack_orchestrator.util import get_k8s_dir, error_exit
from stack_orchestrator.opts import opts
@ -264,61 +262,20 @@ def _clean_etcd_keeping_certs(etcd_path: str) -> bool:
def create_cluster(name: str, config_file: str):
"""Create or reuse the single kind cluster for this host.
There is only one kind cluster per host by design. Multiple deployments
share this cluster. If a cluster already exists, it is reused.
Args:
name: Cluster name (used only when creating the first cluster)
config_file: Path to kind config file (used only when creating)
Returns:
The name of the cluster being used
"""
existing = get_kind_cluster()
if existing:
print(f"Using existing cluster: {existing}")
return existing
# Clean persisted etcd, keeping only TLS certificates
etcd_path = _get_etcd_host_path_from_kind_config(config_file)
if etcd_path:
_clean_etcd_keeping_certs(etcd_path)
print(f"Creating new cluster: {name}")
result = _run_command(f"kind create cluster --name {name} --config {config_file}")
if result.returncode != 0:
raise DeployerException(f"kind create cluster failed: {result}")
return name
def destroy_cluster(name: str):
_run_command(f"kind delete cluster --name {name}")
def is_ingress_running() -> bool:
"""Check if the Caddy ingress controller is already running in the cluster."""
try:
core_v1 = client.CoreV1Api()
pods = core_v1.list_namespaced_pod(
namespace="caddy-system",
label_selector=(
"app.kubernetes.io/name=caddy-ingress-controller,"
"app.kubernetes.io/component=controller"
),
)
for pod in pods.items:
if pod.status and pod.status.container_statuses:
if pod.status.container_statuses[0].ready is True:
if opts.o.debug:
print("Caddy ingress controller already running")
return True
return False
except ApiException:
return False
def wait_for_ingress_in_kind():
core_v1 = client.CoreV1Api()
for i in range(20):
@ -354,34 +311,22 @@ def install_ingress_for_kind(acme_email: str = ""):
)
if opts.o.debug:
print("Installing Caddy ingress controller in kind cluster")
utils.create_from_yaml(api_client, yaml_file=ingress_install)
# Template the YAML with email before applying
with open(ingress_install) as f:
yaml_content = f.read()
# Patch ConfigMap with acme email if provided
if acme_email:
yaml_content = yaml_content.replace('email: ""', f'email: "{acme_email}"')
if opts.o.debug:
print(f"Configured Caddy with ACME email: {acme_email}")
# Apply templated YAML
yaml_objects = list(yaml.safe_load_all(yaml_content))
utils.create_from_yaml(api_client, yaml_objects=yaml_objects)
# Patch ConfigMap with ACME email if provided
if acme_email:
if opts.o.debug:
print(f"Configuring ACME email: {acme_email}")
core_api = client.CoreV1Api()
configmap = core_api.read_namespaced_config_map(
core_v1 = client.CoreV1Api()
configmap = core_v1.read_namespaced_config_map(
name="caddy-ingress-controller-configmap", namespace="caddy-system"
)
configmap.data["email"] = acme_email
core_api.patch_namespaced_config_map(
core_v1.patch_namespaced_config_map(
name="caddy-ingress-controller-configmap",
namespace="caddy-system",
body=configmap,
)
if opts.o.debug:
print(f"Patched Caddy ConfigMap with email: {acme_email}")
def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
@ -564,25 +509,6 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
volume_host_path_map = _get_host_paths_for_volumes(deployment_context)
seen_host_path_mounts = set() # Track to avoid duplicate mounts
# Cluster state backup for offline data recovery (unique per deployment)
# etcd contains all k8s state; PKI certs needed to decrypt etcd offline
deployment_id = deployment_context.id
backup_subdir = f"cluster-backups/{deployment_id}"
etcd_host_path = _make_absolute_host_path(
Path(f"./data/{backup_subdir}/etcd"), deployment_dir
)
volume_definitions.append(
f" - hostPath: {etcd_host_path}\n" f" containerPath: /var/lib/etcd\n"
)
pki_host_path = _make_absolute_host_path(
Path(f"./data/{backup_subdir}/pki"), deployment_dir
)
volume_definitions.append(
f" - hostPath: {pki_host_path}\n" f" containerPath: /etc/kubernetes/pki\n"
)
# Note these paths are relative to the location of the pod files (at present)
# So we need to fix up to make them correct and absolute because kind assumes
# relative to the cwd.
@ -942,41 +868,6 @@ def envs_from_compose_file(
return result
def translate_sidecar_service_names(
envs: Mapping[str, str], sibling_service_names: List[str]
) -> Mapping[str, str]:
"""Translate docker-compose service names to localhost for sidecar containers.
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.
This function replaces service name references with 'localhost' in env values.
"""
import re
if not sibling_service_names:
return envs
result = {}
for env_var, env_val in envs.items():
if env_val is None:
result[env_var] = env_val
continue
new_val = str(env_val)
for service_name in sibling_service_names:
# Match service name followed by optional port (e.g., 'db:5432', 'db')
# Handle URLs like: postgres://user:pass@db:5432/dbname
# and simple refs like: db:5432 or just db
pattern = rf"\b{re.escape(service_name)}(:\d+)?\b"
new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
result[env_var] = new_val
return result
def envs_from_environment_variables_map(
map: Mapping[str, str]
) -> List[client.V1EnvVar]:

View File

@ -98,17 +98,6 @@ class Spec:
def get_image_registry(self):
return self.obj.get(constants.image_registry_key)
def get_image_registry_config(self) -> typing.Optional[typing.Dict]:
"""Returns registry auth config: {server, username, token-env}.
Used for private container registries like GHCR. The token-env field
specifies an environment variable containing the API token/PAT.
Note: Uses 'registry-credentials' key to avoid collision with
'image-registry' key which is for pushing images.
"""
return self.obj.get("registry-credentials")
def get_volumes(self):
return self.obj.get(constants.volumes_key, {})