Compare commits


No commits in common. "main" and "roysc/deployment-create-sync" have entirely different histories.

27 changed files with 253 additions and 2113 deletions


@ -8,7 +8,6 @@ NEVER assume your hypotheses are true without evidence
ALWAYS clearly state when something is a hypothesis
ALWAYS use evidence from the systems you're interacting with to support your claims and hypotheses
ALWAYS run `pre-commit run --all-files` before committing changes
## Key Principles
@ -44,76 +43,6 @@ This project follows principles inspired by literate programming, where developm
This approach treats the human-AI collaboration as a form of **conversational literate programming** where understanding emerges through dialogue before code implementation.
## External Stacks Preferred
When creating new stacks for any reason, **use the external stack pattern** rather than adding stacks directly to this repository.
External stacks follow this structure:
```
my-stack/
└── stack-orchestrator/
    ├── stacks/
    │   └── my-stack/
    │       ├── stack.yml
    │       └── README.md
    ├── compose/
    │   └── docker-compose-my-stack.yml
    └── config/
        └── my-stack/
            └── (config files)
```
### Usage
```bash
# Fetch external stack
laconic-so fetch-stack github.com/org/my-stack
# Use external stack
STACK_PATH=~/cerc/my-stack/stack-orchestrator/stacks/my-stack
laconic-so --stack $STACK_PATH deploy init --output spec.yml
laconic-so --stack $STACK_PATH deploy create --spec-file spec.yml --deployment-dir deployment
laconic-so deployment --dir deployment start
```
### Examples
- `zenith-karma-stack` - Karma watcher deployment
- `urbit-stack` - Fake Urbit ship for testing
- `zenith-desk-stack` - Desk deployment stack
## Architecture: k8s-kind Deployments
### One Cluster Per Host
One Kind cluster per host by design. Never request or expect separate clusters.
- `create_cluster()` in `helpers.py` reuses any existing cluster
- `cluster-id` in deployment.yml is an identifier, not a cluster request
- All deployments share: ingress controller, etcd, certificates
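The reuse behavior can be sketched as a pure function (hypothetical names; the real `create_cluster()` in `helpers.py` shells out to kind):

```python
def ensure_cluster(requested_name: str, existing_clusters: list) -> str:
    """Reuse any existing kind cluster instead of creating a second one.

    existing_clusters would come from `kind get clusters` in practice.
    """
    if existing_clusters:
        # One cluster per host: ignore the requested name and reuse
        return existing_clusters[0]
    # First deployment on this host: a new cluster would be created here
    return requested_name
```

This is why a `cluster-id` in deployment.yml never results in a second cluster: it only names the deployment within the shared cluster.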
### Stack Resolution
- External stacks detected via `Path(stack).exists()` in `util.py`
- Config/compose resolution: external path first, then internal fallback
- External path structure: `stack_orchestrator/data/stacks/<name>/stack.yml`
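A sketch of this resolution order (illustrative helper; the actual check in `util.py` is `Path(stack).exists()`):

```python
from pathlib import Path

def resolve_stack_dir(stack: str, internal_data_dir: Path) -> Path:
    """Resolve a stack name or path: external path first, internal fallback."""
    external = Path(stack)
    if external.exists():
        # External stack: a filesystem path was passed on the command line
        return external
    # Internal stack bundled under stack_orchestrator/data/stacks/<name>
    return internal_data_dir / "stacks" / stack
```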
### Secret Generation Implementation
- `GENERATE_TOKEN_PATTERN` in `deployment_create.py` matches `$generate:type:length$`
- `_generate_and_store_secrets()` creates K8s Secret
- `cluster_info.py` adds `envFrom` with `secretRef` to containers
- Non-secret config written to `config.env`
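The token handling can be illustrated with the pattern from `deployment_create.py`; `split_config` is a hypothetical helper showing the split between generated secrets and plain env vars (it only implements the default `hex` type):

```python
import re
from secrets import token_hex

# Same pattern as deployment_create.py: matches $generate:type:length$
GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")

def split_config(config_vars: dict):
    """Split config vars into generated secrets and plain env values."""
    secrets, plain = {}, {}
    for name, value in config_vars.items():
        match = GENERATE_TOKEN_PATTERN.search(value) if isinstance(value, str) else None
        if match:
            # Generated value goes to the K8s Secret, not config.env
            length = int(match.group(2))
            secrets[name] = token_hex(length)
        else:
            plain[name] = value
    return secrets, plain
```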
### Repository Cloning
`setup-repositories --git-ssh` clones repos defined in stack.yml's `repos:` field. Requires SSH agent.
### Key Files (for codebase navigation)
- `repos/setup_repositories.py`: `setup-repositories` command (git clone)
- `deployment_create.py`: `deploy create` command, secret generation
- `deployment.py`: `deployment start/stop/restart` commands
- `deploy_k8s.py`: K8s deployer, cluster management calls
- `helpers.py`: `create_cluster()`, etcd cleanup, kind operations
- `cluster_info.py`: K8s resource generation (Deployment, Service, Ingress)
## Insights and Observations
### Design Principles


@ -71,59 +71,6 @@ The various [stacks](/stack_orchestrator/data/stacks) each contain instructions
- [laconicd with console and CLI](stack_orchestrator/data/stacks/fixturenet-laconic-loaded)
- [kubo (IPFS)](stack_orchestrator/data/stacks/kubo)
## Deployment Types
- **compose**: Docker Compose on local machine
- **k8s**: External Kubernetes cluster (requires kubeconfig)
- **k8s-kind**: Local Kubernetes via Kind - one cluster per host, shared by all deployments
## External Stacks
Stacks can live in external git repositories. Required structure:
```
<repo>/
  stack_orchestrator/data/
    stacks/<stack-name>/stack.yml
    compose/docker-compose-<pod-name>.yml
  deployment/spec.yml
```
## Deployment Commands
```bash
# Create deployment from spec
laconic-so --stack <path> deploy create --spec-file <spec.yml> --deployment-dir <dir>
# Start (creates cluster on first run)
laconic-so deployment --dir <dir> start
# GitOps restart (git pull + redeploy, preserves data)
laconic-so deployment --dir <dir> restart
# Stop
laconic-so deployment --dir <dir> stop
```
## spec.yml Reference
```yaml
stack: stack-name-or-path
deploy-to: k8s-kind
network:
  http-proxy:
    - host-name: app.example.com
      routes:
        - path: /
          proxy-to: service-name:port
  acme-email: admin@example.com
config:
  ENV_VAR: value
  SECRET_VAR: $generate:hex:32$  # Auto-generated, stored in K8s Secret
volumes:
  volume-name:
```
## Contributing
See the [CONTRIBUTING.md](/docs/CONTRIBUTING.md) for developer mode install.

TODO.md

@ -7,25 +7,6 @@ We need an "update stack" command in stack orchestrator and cleaner documentatio
**Context**: Currently, `deploy init` generates a spec file and `deploy create` creates a deployment directory. The `deployment update` command (added by Thomas Lackey) only syncs env vars and restarts - it doesn't regenerate configurations. There's a gap in the workflow for updating stack configurations after initial deployment.
## Bugs
### `deploy create` doesn't auto-generate volume mappings for new pods
When a new pod is added to `stack.yml` (e.g. `monitoring`), `deploy create`
does not generate default host path mappings in spec.yml for the new pod's
volumes. The deployment then fails at scheduling because the PVCs don't exist.
**Expected**: `deploy create` enumerates all volumes from all compose files
in the stack and generates default host paths for any that aren't already
mapped in the spec.yml `volumes:` section.
**Actual**: Only volumes already in spec.yml get PVs. New volumes are silently
missing, causing `FailedScheduling: persistentvolumeclaim not found`.
**Workaround**: Manually add volume entries to spec.yml and create host dirs.
**Files**: `deployment_create.py` (`_write_config_file`, volume handling)
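A sketch of the expected behavior (hypothetical helper; assumes the compose files are already parsed into dicts):

```python
def fill_missing_volumes(compose_files: list, spec_volumes: dict) -> dict:
    """Add default host-path mappings for volumes missing from spec.yml.

    compose_files: parsed docker-compose documents (dicts)
    spec_volumes: the spec.yml 'volumes:' mapping, updated in place
    """
    for compose in compose_files:
        for name in (compose.get("volumes") or {}):
            if name not in spec_volumes:
                # Default deployment-relative host path, matching existing conventions
                spec_volumes[name] = f"./data/{name}"
    return spec_volumes
```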
## Architecture Refactoring
### Separate Deployer from Stack Orchestrator CLI


@ -68,7 +68,7 @@ $ laconic-so build-npms --include <package-name> --force-rebuild
## deploy
The `deploy` command group manages persistent deployments. The general workflow is `deploy init` to generate a spec file, then `deploy create` to create a deployment directory from the spec, then runtime commands like `deployment start` and `deployment stop`.
The `deploy` command group manages persistent deployments. The general workflow is `deploy init` to generate a spec file, then `deploy create` to create a deployment directory from the spec, then runtime commands like `deploy up` and `deploy down`.
### deploy init
@ -101,91 +101,35 @@ Options:
- `--spec-file` (required): spec file to use
- `--deployment-dir`: target directory for deployment files
- `--update`: update an existing deployment directory, preserving data volumes and env file. Changed files are backed up with a `.bak` suffix. The deployment's `config.env` and `deployment.yml` are also preserved.
- `--helm-chart`: generate Helm chart instead of deploying (k8s only)
- `--network-dir`: network configuration supplied in this directory
- `--initial-peers`: initial set of persistent peers
## deployment
### deploy up
Runtime commands for managing a created deployment. Use `--dir` to specify the deployment directory.
### deployment start
Start a deployment (`up` is a legacy alias):
Start a deployment:
```
$ laconic-so deployment --dir <deployment-dir> start
$ laconic-so deployment --dir <deployment-dir> up
```
Options:
- `--stay-attached` / `--detatch-terminal`: attach to container stdout (default: detach)
- `--skip-cluster-management` / `--perform-cluster-management`: skip kind cluster creation/teardown (default: perform management). Only affects k8s-kind deployments. Use this when multiple stacks share a single cluster.
### deploy down
### deployment stop
Stop a deployment (`down` is a legacy alias):
Stop a deployment:
```
$ laconic-so deployment --dir <deployment-dir> stop
$ laconic-so deployment --dir <deployment-dir> down
```
Use `--delete-volumes` to also remove data volumes.
Options:
- `--delete-volumes` / `--preserve-volumes`: delete data volumes on stop (default: preserve)
- `--skip-cluster-management` / `--perform-cluster-management`: skip kind cluster teardown (default: perform management). Use this to stop a single deployment without destroying a shared cluster.
### deployment restart
Restart a deployment with a GitOps-aware workflow. Pulls the latest stack code, syncs the deployment directory from the git-tracked spec, and restarts services:
```
$ laconic-so deployment --dir <deployment-dir> restart
```
See [deployment_patterns.md](deployment_patterns.md) for the recommended GitOps workflow.
### deployment ps
### deploy ps
Show running services:
```
$ laconic-so deployment --dir <deployment-dir> ps
```
### deployment logs
### deploy logs
View service logs:
```
$ laconic-so deployment --dir <deployment-dir> logs
```
Use `-f` to follow and `-n <count>` to tail.
### deployment exec
Execute a command in a running service container:
```
$ laconic-so deployment --dir <deployment-dir> exec <service-name> "<command>"
```
### deployment status
Show deployment status:
```
$ laconic-so deployment --dir <deployment-dir> status
```
### deployment port
Show mapped ports for a service:
```
$ laconic-so deployment --dir <deployment-dir> port <service-name> <port>
```
### deployment push-images
Push deployment images to a registry:
```
$ laconic-so deployment --dir <deployment-dir> push-images
```
### deployment run-job
Run a one-time job in the deployment:
```
$ laconic-so deployment --dir <deployment-dir> run-job <job-name>
```


@ -1,202 +0,0 @@
# Deployment Patterns
## GitOps Pattern
For production deployments, we recommend a GitOps approach where your deployment configuration is tracked in version control.
### Overview
- **spec.yml is your source of truth**: Maintain it in your operator repository
- **Don't regenerate on every restart**: Run `deploy init` once, then customize and commit
- **Use restart for updates**: The restart command respects your git-tracked spec.yml
### Workflow
1. **Initial setup**: Run `deploy init` once to generate a spec.yml template
2. **Customize and commit**: Edit spec.yml with your configuration (hostnames, resources, etc.) and commit to your operator repo
3. **Deploy from git**: Use the committed spec.yml for deployments
4. **Update via git**: Make changes in git, then restart to apply
```bash
# Initial setup (run once)
laconic-so --stack my-stack deploy init --output spec.yml
# Customize for your environment
vim spec.yml # Set hostname, resources, etc.
# Commit to your operator repository
git add spec.yml
git commit -m "Add my-stack deployment configuration"
git push
# On deployment server: deploy from git-tracked spec
laconic-so --stack my-stack deploy create \
--spec-file /path/to/operator-repo/spec.yml \
--deployment-dir my-deployment
laconic-so deployment --dir my-deployment start
```
### Updating Deployments
When you need to update a deployment:
```bash
# 1. Make changes in your operator repo
vim /path/to/operator-repo/spec.yml
git commit -am "Update configuration"
git push
# 2. On deployment server: pull and restart
cd /path/to/operator-repo && git pull
laconic-so deployment --dir my-deployment restart
```
The `restart` command:
- Pulls latest code from the stack repository
- Uses your git-tracked spec.yml (does NOT regenerate from defaults)
- Syncs the deployment directory
- Restarts services
### Anti-patterns
**Don't do this:**
```bash
# BAD: Regenerating spec on every deployment
laconic-so --stack my-stack deploy init --output spec.yml
laconic-so deploy create --spec-file spec.yml ...
```
This overwrites your customizations with defaults from the stack's `commands.py`.
**Do this instead:**
```bash
# GOOD: Use your git-tracked spec
git pull # Get latest spec.yml from your operator repo
laconic-so deployment --dir my-deployment restart
```
## Private Registry Authentication
For deployments using images from private container registries (e.g., GitHub Container Registry), configure authentication in your spec.yml:
### Configuration
Add a `registry-credentials` section to your spec.yml:
```yaml
registry-credentials:
  server: ghcr.io
  username: your-org-or-username
  token-env: REGISTRY_TOKEN
```
**Fields:**
- `server`: The registry hostname (e.g., `ghcr.io`, `docker.io`, `gcr.io`)
- `username`: Registry username (for GHCR, use your GitHub username or org name)
- `token-env`: Name of the environment variable containing your API token/PAT
### Token Environment Variable
The `token-env` pattern keeps credentials out of version control. Set the environment variable when running `deployment start`:
```bash
export REGISTRY_TOKEN="your-personal-access-token"
laconic-so deployment --dir my-deployment start
```
For GHCR, create a Personal Access Token (PAT) with `read:packages` scope.
### Ansible Integration
When using Ansible for deployments, pass the token from a credentials file:
```yaml
- name: Start deployment
  ansible.builtin.command:
    cmd: laconic-so deployment --dir {{ deployment_dir }} start
  environment:
    REGISTRY_TOKEN: "{{ lookup('file', '~/.credentials/ghcr_token') }}"
```
### How It Works
1. laconic-so reads the `registry-credentials` config from spec.yml
2. Creates a Kubernetes `docker-registry` secret named `{deployment}-registry`
3. The deployment's pods reference this secret for image pulls
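The secret payload is a standard dockerconfigjson document; a minimal sketch of building it (mirroring the logic in `deployment_create.py`):

```python
import base64
import json

def make_dockerconfigjson(server: str, username: str, token: str) -> str:
    """Build the base64 .dockerconfigjson payload for an image-pull secret."""
    # Docker clients accept tokens in the "password" field
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    config = {"auths": {server: {"username": username, "password": token, "auth": auth}}}
    return base64.b64encode(json.dumps(config).encode()).decode()
```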
## Cluster and Volume Management
### Stopping Deployments
The `deployment stop` command has two important flags:
```bash
# Default: stops deployment, deletes cluster, PRESERVES volumes
laconic-so deployment --dir my-deployment stop
# Explicitly delete volumes (USE WITH CAUTION)
laconic-so deployment --dir my-deployment stop --delete-volumes
```
### Volume Persistence
Volumes persist across cluster deletion by design. This is important because:
- **Data survives cluster recreation**: Ledger data, databases, and other state are preserved
- **Faster recovery**: No need to re-sync or rebuild data after cluster issues
- **Safe cluster upgrades**: Delete and recreate cluster without data loss
**Only use `--delete-volumes` when:**
- You explicitly want to start fresh with no data
- The user specifically requests volume deletion
- You're cleaning up a test/dev environment completely
### Shared Cluster Architecture
In kind deployments, multiple stacks share a single cluster:
- First `deployment start` creates the cluster
- Subsequent deployments reuse the existing cluster
- `deployment stop` on ANY deployment deletes the shared cluster
- Other deployments will fail until cluster is recreated
To stop a single deployment without affecting the cluster:
```bash
laconic-so deployment --dir my-deployment stop --skip-cluster-management
```
## Volume Persistence in k8s-kind
k8s-kind has 3 storage layers:
- **Docker Host**: The physical server running Docker
- **Kind Node**: A Docker container simulating a k8s node
- **Pod Container**: Your workload
For k8s-kind, volumes with paths are mounted from Docker Host → Kind Node → Pod via extraMounts.
| spec.yml volume | Storage Location | Survives Pod Restart | Survives Cluster Restart |
|-----------------|------------------|---------------------|-------------------------|
| `vol:` (empty) | Kind Node PVC | ✅ | ❌ |
| `vol: ./data/x` | Docker Host | ✅ | ✅ |
| `vol: /abs/path`| Docker Host | ✅ | ✅ |
**Recommendation**: Always use paths for data you want to keep. Relative paths
(e.g., `./data/rpc-config`) resolve to `$DEPLOYMENT_DIR/data/rpc-config` on the
Docker Host.
### Example
```yaml
# In spec.yml
volumes:
  rpc-config: ./data/rpc-config  # Persists to $DEPLOYMENT_DIR/data/rpc-config
  chain-data: ./data/chain       # Persists to $DEPLOYMENT_DIR/data/chain
  temp-cache:                    # Empty = Kind Node PVC (lost on cluster delete)
```
### The Antipattern
Empty-path volumes appear persistent because they survive pod restarts (data lives
in Kind Node container). However, this data is lost when the kind cluster is
recreated. This "false persistence" has caused data loss when operators assumed
their data was safe.


@ -29,7 +29,6 @@ network_key = "network"
http_proxy_key = "http-proxy"
image_registry_key = "image-registry"
configmaps_key = "configmaps"
secrets_key = "secrets"
resources_key = "resources"
volumes_key = "volumes"
security_key = "security"
@ -45,4 +44,3 @@ unlimited_memlock_key = "unlimited-memlock"
runtime_class_key = "runtime-class"
high_memlock_runtime = "high-memlock"
high_memlock_spec_filename = "high-memlock-spec.json"
acme_email_key = "acme-email"


@ -1,5 +0,0 @@
services:
  test-job:
    image: cerc/test-container:local
    entrypoint: /bin/sh
    command: ["-c", "echo 'Job completed successfully'"]


@ -93,7 +93,6 @@ rules:
- get
- create
- update
- delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding


@ -21,7 +21,7 @@ from stack_orchestrator.deploy.deploy_util import VolumeMapping, run_container_c
from pathlib import Path
default_spec_file_content = """config:
  test_variable_1: test-value-1
  test-variable-1: test-value-1
"""


@ -7,5 +7,3 @@ containers:
- cerc/test-container
pods:
- test
jobs:
- test-job


@ -35,7 +35,6 @@ from stack_orchestrator.util import (
    get_dev_root_path,
    stack_is_in_deployment,
    resolve_compose_file,
    get_job_list,
)
from stack_orchestrator.deploy.deployer import DeployerException
from stack_orchestrator.deploy.deployer_factory import getDeployer
@ -131,7 +130,6 @@ def create_deploy_context(
        compose_files=cluster_context.compose_files,
        compose_project_name=cluster_context.cluster,
        compose_env_file=cluster_context.env_file,
        job_compose_files=cluster_context.job_compose_files,
    )
    return DeployCommandContext(stack, cluster_context, deployer)
@ -405,7 +403,7 @@ def _make_cluster_context(ctx, stack, include, exclude, cluster, env_file):
    stack_config = get_parsed_stack_config(stack)
    if stack_config is not None:
        # TODO: syntax check the input here
        pods_in_scope = stack_config.get("pods") or []
        pods_in_scope = stack_config["pods"]
        cluster_config = (
            stack_config["config"] if "config" in stack_config else None
        )
@ -479,22 +477,6 @@ def _make_cluster_context(ctx, stack, include, exclude, cluster, env_file):
    if ctx.verbose:
        print(f"files: {compose_files}")
    # Gather job compose files (from compose-jobs/ directory in deployment)
    job_compose_files = []
    if deployment and stack:
        stack_config = get_parsed_stack_config(stack)
        if stack_config:
            jobs = get_job_list(stack_config)
            compose_jobs_dir = stack.joinpath("compose-jobs")
            for job in jobs:
                job_file_name = os.path.join(
                    compose_jobs_dir, f"docker-compose-{job}.yml"
                )
                if os.path.exists(job_file_name):
                    job_compose_files.append(job_file_name)
    if ctx.verbose:
        print(f"job files: {job_compose_files}")
    return ClusterContext(
        ctx,
        cluster,
@ -503,7 +485,6 @@ def _make_cluster_context(ctx, stack, include, exclude, cluster, env_file):
        post_start_commands,
        cluster_config,
        env_file,
        job_compose_files=job_compose_files if job_compose_files else None,
    )


@ -29,7 +29,6 @@ class ClusterContext:
    post_start_commands: List[str]
    config: Optional[str]
    env_file: Optional[str]
    job_compose_files: Optional[List[str]] = None

@dataclass


@ -34,12 +34,7 @@ def getDeployerConfigGenerator(type: str, deployment_context):
def getDeployer(
    type: str,
    deployment_context,
    compose_files,
    compose_project_name,
    compose_env_file,
    job_compose_files=None,
    type: str, deployment_context, compose_files, compose_project_name, compose_env_file
):
    if type == "compose" or type is None:
        return DockerDeployer(
@ -59,7 +54,6 @@ def getDeployer(
            compose_files,
            compose_project_name,
            compose_env_file,
            job_compose_files=job_compose_files,
        )
    else:
        print(f"ERROR: deploy-to {type} is not valid")


@ -15,9 +15,7 @@
import click
from pathlib import Path
import subprocess
import sys
import time
from stack_orchestrator import constants
from stack_orchestrator.deploy.images import push_images_operation
from stack_orchestrator.deploy.deploy import (
@ -230,176 +228,3 @@ def run_job(ctx, job_name, helm_release):
    ctx.obj = make_deploy_context(ctx)
    run_job_operation(ctx, job_name, helm_release)
@command.command()
@click.option("--stack-path", help="Path to stack git repo (overrides stored path)")
@click.option(
    "--spec-file", help="Path to GitOps spec.yml in repo (e.g., deployment/spec.yml)"
)
@click.option("--config-file", help="Config file to pass to deploy init")
@click.option(
    "--force",
    is_flag=True,
    default=False,
    help="Skip DNS verification",
)
@click.option(
    "--expected-ip",
    help="Expected IP for DNS verification (if different from egress)",
)
@click.pass_context
def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
    """Pull latest code and restart deployment using git-tracked spec.

    GitOps workflow:
    1. Operator maintains spec.yml in their git repository
    2. This command pulls latest code (including updated spec.yml)
    3. If hostname changed, verifies DNS routes to this server
    4. Syncs deployment directory with the git-tracked spec
    5. Stops and restarts the deployment

    Data volumes are always preserved. The cluster is never destroyed.

    Stack source resolution (in order):
    1. --stack-path argument (if provided)
    2. stack-source field in deployment.yml (if stored)
    3. Error if neither available

    Note: spec.yml should be maintained in git, not regenerated from
    commands.py on each restart. Use 'deploy init' only for initial
    spec generation, then customize and commit to your operator repo.
    """
    from stack_orchestrator.util import get_yaml, get_parsed_deployment_spec
    from stack_orchestrator.deploy.deployment_create import create_operation
    from stack_orchestrator.deploy.dns_probe import verify_dns_via_probe

    deployment_context: DeploymentContext = ctx.obj
    # Get current spec info (before git pull)
    current_spec = deployment_context.spec
    current_http_proxy = current_spec.get_http_proxy()
    current_hostname = (
        current_http_proxy[0]["host-name"] if current_http_proxy else None
    )
    # Resolve stack source path
    if stack_path:
        stack_source = Path(stack_path).resolve()
    else:
        # Try to get from deployment.yml
        deployment_file = (
            deployment_context.deployment_dir / constants.deployment_file_name
        )
        deployment_data = get_yaml().load(open(deployment_file))
        stack_source_str = deployment_data.get("stack-source")
        if not stack_source_str:
            print(
                "Error: No stack-source in deployment.yml and --stack-path not provided"
            )
            print("Use --stack-path to specify the stack git repository location")
            sys.exit(1)
        stack_source = Path(stack_source_str)
    if not stack_source.exists():
        print(f"Error: Stack source path does not exist: {stack_source}")
        sys.exit(1)
    print("=== Deployment Restart ===")
    print(f"Deployment dir: {deployment_context.deployment_dir}")
    print(f"Stack source: {stack_source}")
    print(f"Current hostname: {current_hostname}")
    # Step 1: Git pull (brings in updated spec.yml from operator's repo)
    print("\n[1/4] Pulling latest code from stack repository...")
    git_result = subprocess.run(
        ["git", "pull"], cwd=stack_source, capture_output=True, text=True
    )
    if git_result.returncode != 0:
        print(f"Git pull failed: {git_result.stderr}")
        sys.exit(1)
    print(f"Git pull: {git_result.stdout.strip()}")
    # Determine spec file location
    # Priority: --spec-file argument > repo's deployment/spec.yml > deployment dir
    # Stack path is like: repo/stack_orchestrator/data/stacks/stack-name
    # So repo root is 4 parents up
    repo_root = stack_source.parent.parent.parent.parent
    if spec_file:
        # Spec file relative to repo root
        spec_file_path = repo_root / spec_file
    else:
        # Try standard GitOps location in repo
        gitops_spec = repo_root / "deployment" / "spec.yml"
        if gitops_spec.exists():
            spec_file_path = gitops_spec
        else:
            # Fall back to deployment directory
            spec_file_path = deployment_context.deployment_dir / "spec.yml"
    if not spec_file_path.exists():
        print(f"Error: spec.yml not found at {spec_file_path}")
        print("For GitOps, add spec.yml to your repo at deployment/spec.yml")
        print("Or specify --spec-file with path relative to repo root")
        sys.exit(1)
    print(f"Using spec: {spec_file_path}")
    # Parse spec to check for hostname changes
    new_spec_obj = get_parsed_deployment_spec(str(spec_file_path))
    new_http_proxy = new_spec_obj.get("network", {}).get("http-proxy", [])
    new_hostname = new_http_proxy[0]["host-name"] if new_http_proxy else None
    print(f"Spec hostname: {new_hostname}")
    # Step 2: DNS verification (only if hostname changed)
    if new_hostname and new_hostname != current_hostname:
        print(f"\n[2/4] Hostname changed: {current_hostname} -> {new_hostname}")
        if force:
            print("DNS verification skipped (--force)")
        else:
            print("Verifying DNS via probe...")
            if not verify_dns_via_probe(new_hostname):
                print(f"\nDNS verification failed for {new_hostname}")
                print("Ensure DNS is configured before restarting.")
                print("Use --force to skip this check.")
                sys.exit(1)
    else:
        print("\n[2/4] Hostname unchanged, skipping DNS verification")
    # Step 3: Sync deployment directory with spec
    print("\n[3/4] Syncing deployment directory...")
    deploy_ctx = make_deploy_context(ctx)
    create_operation(
        deployment_command_context=deploy_ctx,
        spec_file=str(spec_file_path),
        deployment_dir=str(deployment_context.deployment_dir),
        update=True,
        network_dir=None,
        initial_peers=None,
    )
    # Reload deployment context with updated spec
    deployment_context.init(deployment_context.deployment_dir)
    ctx.obj = deployment_context
    # Stop deployment
    print("\n[4/4] Restarting deployment...")
    ctx.obj = make_deploy_context(ctx)
    down_operation(
        ctx, delete_volumes=False, extra_args_list=[], skip_cluster_management=True
    )
    # Brief pause to ensure clean shutdown
    time.sleep(5)
    # Start deployment
    up_operation(
        ctx, services_list=None, stay_attached=False, skip_cluster_management=True
    )
    print("\n=== Restart Complete ===")
    print("Deployment restarted with git-tracked configuration.")
    if new_hostname and new_hostname != current_hostname:
        print(f"\nNew hostname: {new_hostname}")
        print("Caddy will automatically provision TLS certificate.")


@ -15,12 +15,9 @@
import click
from importlib import util
import json
import os
import re
import base64
from pathlib import Path
from typing import List, Optional
from typing import List
import random
from shutil import copy, copyfile, copytree, rmtree
from secrets import token_hex
@ -265,25 +262,6 @@ def call_stack_deploy_create(deployment_context, extra_args):
            imported_stack.create(deployment_context, extra_args)

def call_stack_deploy_start(deployment_context):
    """Call start() hooks after k8s deployments and jobs are created.

    The start() hook receives the DeploymentContext, allowing stacks to
    create additional k8s resources (Services, etc.) in the deployment namespace.
    The namespace can be derived as f"laconic-{deployment_context.id}".
    """
    python_file_paths = _commands_plugin_paths(deployment_context.stack.name)
    for python_file_path in python_file_paths:
        if python_file_path.exists():
            spec = util.spec_from_file_location("commands", python_file_path)
            if spec is None or spec.loader is None:
                continue
            imported_stack = util.module_from_spec(spec)
            spec.loader.exec_module(imported_stack)
            if _has_method(imported_stack, "start"):
                imported_stack.start(deployment_context)
# Inspect the pod yaml to find config files referenced in subdirectories
# other than the one associated with the pod
def _find_extra_config_dirs(parsed_pod_file, pod):
@ -490,15 +468,9 @@ def init_operation(
    else:
        volume_descriptors[named_volume] = f"./data/{named_volume}"
    if volume_descriptors:
        # Merge with existing volumes from stack init()
        # init() volumes take precedence over compose defaults
        orig_volumes = spec_file_content.get("volumes", {})
        spec_file_content["volumes"] = {**volume_descriptors, **orig_volumes}
        spec_file_content["volumes"] = volume_descriptors
    if configmap_descriptors:
        spec_file_content["configmaps"] = configmap_descriptors
    if "k8s" in deployer_type:
        if "secrets" not in spec_file_content:
            spec_file_content["secrets"] = {}
    if opts.o.debug:
        print(
@ -509,180 +481,15 @@ def init_operation(
get_yaml().dump(spec_file_content, output_file)
# Token pattern: $generate:hex:32$ or $generate:base64:16$
GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")
def _generate_and_store_secrets(config_vars: dict, deployment_name: str):
    """Generate secrets for $generate:...$ tokens and store in K8s Secret.

    Called by `deploy create` - generates fresh secrets and stores them.
    Returns the generated secrets dict for reference.
    """
    from kubernetes import client, config as k8s_config

    secrets = {}
    for name, value in config_vars.items():
        if not isinstance(value, str):
            continue
        match = GENERATE_TOKEN_PATTERN.search(value)
        if not match:
            continue
        secret_type, length = match.group(1), int(match.group(2))
        if secret_type == "hex":
            secrets[name] = token_hex(length)
        elif secret_type == "base64":
            secrets[name] = base64.b64encode(os.urandom(length)).decode()
        else:
            secrets[name] = token_hex(length)
    if not secrets:
        return secrets
    # Store in K8s Secret
    try:
        k8s_config.load_kube_config()
    except Exception:
        # Fall back to in-cluster config if available
        try:
            k8s_config.load_incluster_config()
        except Exception:
            print(
                "Warning: Could not load kube config, secrets will not be stored in K8s"
            )
            return secrets
    v1 = client.CoreV1Api()
    secret_name = f"{deployment_name}-generated-secrets"
    namespace = "default"
    secret_data = {k: base64.b64encode(v.encode()).decode() for k, v in secrets.items()}
    k8s_secret = client.V1Secret(
        metadata=client.V1ObjectMeta(name=secret_name), data=secret_data, type="Opaque"
    )
    try:
        v1.create_namespaced_secret(namespace, k8s_secret)
        num_secrets = len(secrets)
        print(f"Created K8s Secret '{secret_name}' with {num_secrets} secret(s)")
    except client.exceptions.ApiException as e:
        if e.status == 409:  # Already exists
            v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
            num_secrets = len(secrets)
            print(f"Updated K8s Secret '{secret_name}' with {num_secrets} secret(s)")
        else:
            raise
    return secrets
def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
"""Create K8s docker-registry secret from spec + environment.
Reads registry configuration from spec.yml and creates a Kubernetes
secret of type kubernetes.io/dockerconfigjson for image pulls.
Args:
spec: The deployment spec containing image-registry config
deployment_name: Name of the deployment (used for secret naming)
Returns:
The secret name if created, None if no registry config
"""
from kubernetes import client, config as k8s_config
registry_config = spec.get_image_registry_config()
if not registry_config:
return None
server = registry_config.get("server")
username = registry_config.get("username")
token_env = registry_config.get("token-env")
if not all([server, username, token_env]):
return None
# Type narrowing for pyright - we've validated these aren't None above
assert token_env is not None
token = os.environ.get(token_env)
if not token:
print(
f"Warning: Registry token env var '{token_env}' not set, "
"skipping registry secret"
)
return None
# Create dockerconfigjson format (Docker API uses "password" field for tokens)
auth = base64.b64encode(f"{username}:{token}".encode()).decode()
docker_config = {
"auths": {server: {"username": username, "password": token, "auth": auth}}
}
# Secret name derived from deployment name
secret_name = f"{deployment_name}-registry"
# Load kube config
try:
k8s_config.load_kube_config()
except Exception:
try:
k8s_config.load_incluster_config()
except Exception:
print("Warning: Could not load kube config, registry secret not created")
return None
v1 = client.CoreV1Api()
namespace = "default"
k8s_secret = client.V1Secret(
metadata=client.V1ObjectMeta(name=secret_name),
data={
".dockerconfigjson": base64.b64encode(
json.dumps(docker_config).encode()
).decode()
},
type="kubernetes.io/dockerconfigjson",
)
try:
v1.create_namespaced_secret(namespace, k8s_secret)
print(f"Created registry secret '{secret_name}' for {server}")
except client.exceptions.ApiException as e:
if e.status == 409: # Already exists
v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
print(f"Updated registry secret '{secret_name}' for {server}")
else:
raise
return secret_name
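For reference, the dockerconfigjson payload assembled above can be exercised in isolation. This is a sketch (the helper name is illustrative, not part of the module), but the structure matches what a `kubernetes.io/dockerconfigjson` secret expects:

```python
import base64
import json

def build_docker_config(server: str, username: str, token: str) -> dict:
    # "auth" is base64("username:token"); registries treat tokens as passwords
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    return {"auths": {server: {"username": username, "password": token, "auth": auth}}}

cfg = build_docker_config("registry.example.com", "bot", "s3cret")
# The secret's ".dockerconfigjson" key holds this JSON, base64-encoded
payload = base64.b64encode(json.dumps(cfg).encode()).decode()
decoded = json.loads(base64.b64decode(payload))
```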
def _write_config_file(
spec_file: Path, config_env_file: Path, deployment_name: Optional[str] = None
):
def _write_config_file(spec_file: Path, config_env_file: Path):
spec_content = get_parsed_deployment_spec(spec_file)
config_vars = spec_content.get("config", {}) or {}
# Generate and store secrets in K8s if deployment_name provided and tokens exist
if deployment_name and config_vars:
has_generate_tokens = any(
isinstance(v, str) and GENERATE_TOKEN_PATTERN.search(v)
for v in config_vars.values()
)
if has_generate_tokens:
_generate_and_store_secrets(config_vars, deployment_name)
# Write non-secret config to config.env (exclude $generate:...$ tokens)
# Note: we want to write an empty file even if we have no config variables
with open(config_env_file, "w") as output_file:
if config_vars:
for variable_name, variable_value in config_vars.items():
# Skip variables with generate tokens - they go to K8s Secret
if isinstance(variable_value, str) and GENERATE_TOKEN_PATTERN.search(
variable_value
):
continue
output_file.write(f"{variable_name}={variable_value}\n")
if "config" in spec_content and spec_content["config"]:
config_vars = spec_content["config"]
if config_vars:
for variable_name, variable_value in config_vars.items():
output_file.write(f"{variable_name}={variable_value}\n")
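The token filtering above can be sketched standalone. The exact regex is an assumption (the real `GENERATE_TOKEN_PATTERN` is defined elsewhere in the module), but it shows the split between plain config written to `config.env` and tokenized values destined for a K8s Secret:

```python
import re

# Hypothetical pattern for tokens like "$generate:password:32$"
GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:[^$]*\$")

config_vars = {"DB_HOST": "db", "DB_PASSWORD": "$generate:password:32$"}

# Plain values go to config.env; tokenized values become K8s Secret entries
plain = {
    k: v for k, v in config_vars.items()
    if not (isinstance(v, str) and GENERATE_TOKEN_PATTERN.search(v))
}
env_lines = [f"{k}={v}" for k, v in plain.items()]
```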
def _write_kube_config_file(external_path: Path, internal_path: Path):
@@ -697,14 +504,11 @@ def _copy_files_to_directory(file_paths: List[Path], directory: Path):
copy(path, os.path.join(directory, os.path.basename(path)))
def _create_deployment_file(deployment_dir: Path, stack_source: Optional[Path] = None):
def _create_deployment_file(deployment_dir: Path):
deployment_file_path = deployment_dir.joinpath(constants.deployment_file_name)
cluster = f"{constants.cluster_name_prefix}{token_hex(8)}"
deployment_content = {constants.cluster_id_key: cluster}
if stack_source:
deployment_content["stack-source"] = str(stack_source)
with open(deployment_file_path, "w") as output_file:
get_yaml().dump(deployment_content, output_file)
output_file.write(f"{constants.cluster_id_key}: {cluster}\n")
def _check_volume_definitions(spec):
@@ -712,14 +516,10 @@ def _check_volume_definitions(spec):
for volume_name, volume_path in spec.get_volumes().items():
if volume_path:
if not os.path.isabs(volume_path):
# For k8s-kind: allow relative paths, they'll be resolved
# by _make_absolute_host_path() during kind config generation
if not spec.is_kind_deployment():
deploy_type = spec.get_deployment_type()
raise Exception(
f"Relative path {volume_path} for volume "
f"{volume_name} not supported for {deploy_type}"
)
raise Exception(
f"Relative path {volume_path} for volume {volume_name} not "
f"supported for deployment type {spec.get_deployment_type()}"
)
@click.command()
@@ -813,15 +613,11 @@ def create_operation(
generate_helm_chart(stack_name, spec_file, deployment_dir_path)
return # Exit early for helm chart generation
# Resolve stack source path for restart capability
stack_source = get_stack_path(stack_name)
if update:
# Sync mode: write to temp dir, then copy to deployment dir with backups
temp_dir = Path(tempfile.mkdtemp(prefix="deployment-sync-"))
try:
# Write deployment files to temp dir
# (skip deployment.yml to preserve cluster ID)
# Write deployment files to temp dir (skip deployment.yml to preserve cluster ID)
_write_deployment_files(
temp_dir,
Path(spec_file),
@@ -829,14 +625,12 @@
stack_name,
deployment_type,
include_deployment_file=False,
stack_source=stack_source,
)
# Copy from temp to deployment dir, excluding data volumes
# and backing up changed files.
# Exclude data/* to avoid touching user data volumes.
# Exclude config file to preserve deployment settings
# (XXX breaks passing config vars from spec)
# Copy from temp to deployment dir, excluding data volumes and backing up changed files
# Exclude data/* to avoid touching user data volumes
# Exclude config file to preserve deployment settings (XXX breaks passing config vars
# from spec. could warn about this or not exclude...)
exclude_patterns = ["data", "data/*", constants.config_file_name]
_safe_copy_tree(
temp_dir, deployment_dir_path, exclude_patterns=exclude_patterns
@@ -853,7 +647,6 @@
stack_name,
deployment_type,
include_deployment_file=True,
stack_source=stack_source,
)
# Delegate to the stack's Python code
@@ -874,7 +667,7 @@
)
def _safe_copy_tree(src: Path, dst: Path, exclude_patterns: Optional[List[str]] = None):
def _safe_copy_tree(src: Path, dst: Path, exclude_patterns: List[str] = None):
"""
Recursively copy a directory tree, backing up changed files with .bak suffix.
@@ -925,7 +718,6 @@
stack_name: str,
deployment_type: str,
include_deployment_file: bool = True,
stack_source: Optional[Path] = None,
):
"""
Write deployment files to target directory.
@@ -935,8 +727,7 @@
:param parsed_spec: Parsed spec object
:param stack_name: Name of stack
:param deployment_type: Type of deployment
:param include_deployment_file: Whether to create deployment.yml (skip for update)
:param stack_source: Path to stack source (git repo) for restart capability
:param include_deployment_file: Whether to create deployment.yml file (skip for update)
"""
stack_file = get_stack_path(stack_name).joinpath(constants.stack_file_name)
parsed_stack = get_parsed_stack_config(stack_name)
@@ -947,15 +738,10 @@
# Create deployment file if requested
if include_deployment_file:
_create_deployment_file(target_dir, stack_source=stack_source)
_create_deployment_file(target_dir)
# Copy any config variables from the spec file into an env file suitable for compose
# Use stack_name as deployment_name for K8s secret naming
# Extract just the name part if stack_name is a path ("path/to/stack" -> "stack")
deployment_name = Path(stack_name).name.replace("_", "-")
_write_config_file(
spec_file, target_dir.joinpath(constants.config_file_name), deployment_name
)
_write_config_file(spec_file, target_dir.joinpath(constants.config_file_name))
# Copy any k8s config file into the target dir
if deployment_type == "k8s":
@@ -1004,11 +790,20 @@ def _write_deployment_files(
script_paths = get_pod_script_paths(parsed_stack, pod)
_copy_files_to_directory(script_paths, destination_script_dir)
if not parsed_spec.is_kubernetes_deployment():
if parsed_spec.is_kubernetes_deployment():
for configmap in parsed_spec.get_configmaps():
source_config_dir = resolve_config_dir(stack_name, configmap)
if os.path.exists(source_config_dir):
destination_config_dir = target_dir.joinpath(
"configmaps", configmap
)
copytree(
source_config_dir, destination_config_dir, dirs_exist_ok=True
)
else:
# TODO:
# This is odd - looks up config dir that matches a volume name,
# then copies as a mount dir?
# AFAICT not used by or relevant to any existing stack - roy
# this is odd - looks up config dir that matches a volume name, then copies as a mount dir?
# AFAICT this is not used by or relevant to any existing stack - roy
# TODO: We should probably only do this if the volume is marked :ro.
for volume_name, volume_path in parsed_spec.get_volumes().items():
@@ -1026,22 +821,9 @@ def _write_deployment_files(
dirs_exist_ok=True,
)
# Copy configmap directories for k8s deployments (outside the pod loop
# so this works for jobs-only stacks too)
if parsed_spec.is_kubernetes_deployment():
for configmap in parsed_spec.get_configmaps():
source_config_dir = resolve_config_dir(stack_name, configmap)
if os.path.exists(source_config_dir):
destination_config_dir = target_dir.joinpath(
"configmaps", configmap
)
copytree(
source_config_dir, destination_config_dir, dirs_exist_ok=True
)
# Copy the job files into the target dir
# Copy the job files into the target dir (for Docker deployments)
jobs = get_job_list(parsed_stack)
if jobs:
if jobs and not parsed_spec.is_kubernetes_deployment():
destination_compose_jobs_dir = target_dir.joinpath("compose-jobs")
os.makedirs(destination_compose_jobs_dir, exist_ok=True)
for job in jobs:


@@ -1,159 +0,0 @@
# Copyright © 2024 Vulcanize
# SPDX-License-Identifier: AGPL-3.0
"""DNS verification via temporary ingress probe."""
import secrets
import socket
import time
from typing import Optional
import requests
from kubernetes import client
def get_server_egress_ip() -> str:
"""Get this server's public egress IP via ipify."""
response = requests.get("https://api.ipify.org", timeout=10)
response.raise_for_status()
return response.text.strip()
def resolve_hostname(hostname: str) -> list[str]:
"""Resolve hostname to list of IP addresses."""
try:
_, _, ips = socket.gethostbyname_ex(hostname)
return ips
except socket.gaierror:
return []
def verify_dns_simple(hostname: str, expected_ip: Optional[str] = None) -> bool:
"""Simple DNS verification - check hostname resolves to expected IP.
If expected_ip not provided, uses server's egress IP.
Returns True if hostname resolves to expected IP.
"""
resolved_ips = resolve_hostname(hostname)
if not resolved_ips:
print(f"DNS FAIL: {hostname} does not resolve")
return False
if expected_ip is None:
expected_ip = get_server_egress_ip()
if expected_ip in resolved_ips:
print(f"DNS OK: {hostname} -> {resolved_ips} (includes {expected_ip})")
return True
else:
print(f"DNS WARN: {hostname} -> {resolved_ips} (expected {expected_ip})")
return False
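`resolve_hostname` and the simple check can be tried directly. As a quick sanity exercise, `localhost` resolves on essentially every host, while the reserved `.invalid` TLD is guaranteed not to resolve:

```python
import socket

def resolve_hostname(hostname: str) -> list:
    # Mirrors the helper above: all A records for the name, or [] on failure
    try:
        _, _, ips = socket.gethostbyname_ex(hostname)
        return ips
    except socket.gaierror:
        return []

local_ips = resolve_hostname("localhost")
missing = resolve_hostname("nonexistent.invalid")
```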
def create_probe_ingress(hostname: str, namespace: str = "default") -> str:
"""Create a temporary ingress for DNS probing.
Returns the probe token that the ingress will respond with.
"""
token = secrets.token_hex(16)
networking_api = client.NetworkingV1Api()
# Create a simple ingress that Caddy will pick up
ingress = client.V1Ingress(
metadata=client.V1ObjectMeta(
name="laconic-dns-probe",
annotations={
"kubernetes.io/ingress.class": "caddy",
"laconic.com/probe-token": token,
},
),
spec=client.V1IngressSpec(
rules=[
client.V1IngressRule(
host=hostname,
http=client.V1HTTPIngressRuleValue(
paths=[
client.V1HTTPIngressPath(
path="/.well-known/laconic-probe",
path_type="Exact",
backend=client.V1IngressBackend(
service=client.V1IngressServiceBackend(
name="caddy-ingress-controller",
port=client.V1ServiceBackendPort(number=80),
)
),
)
]
),
)
]
),
)
networking_api.create_namespaced_ingress(namespace=namespace, body=ingress)
return token
def delete_probe_ingress(namespace: str = "default"):
"""Delete the temporary probe ingress."""
networking_api = client.NetworkingV1Api()
try:
networking_api.delete_namespaced_ingress(
name="laconic-dns-probe", namespace=namespace
)
except client.exceptions.ApiException:
pass # Ignore if already deleted
def verify_dns_via_probe(
hostname: str, namespace: str = "default", timeout: int = 30, poll_interval: int = 2
) -> bool:
"""Verify DNS by creating temp ingress and probing it.
This definitively proves that traffic to the hostname reaches this cluster.
Args:
hostname: The hostname to verify
namespace: Kubernetes namespace for probe ingress
timeout: Total seconds to wait for probe to succeed
poll_interval: Seconds between probe attempts
Returns:
True if probe succeeds, False otherwise
"""
# First check DNS resolves at all
if not resolve_hostname(hostname):
print(f"DNS FAIL: {hostname} does not resolve")
return False
print(f"Creating probe ingress for {hostname}...")
create_probe_ingress(hostname, namespace)
try:
# Wait for Caddy to pick up the ingress
time.sleep(3)
# Poll until success or timeout
probe_url = f"http://{hostname}/.well-known/laconic-probe"
start_time = time.time()
last_error = None
while time.time() - start_time < timeout:
try:
response = requests.get(probe_url, timeout=5)
# For now, just verify we get a response from this cluster
# A more robust check would verify a unique token
if response.status_code < 500:
print(f"DNS PROBE OK: {hostname} routes to this cluster")
return True
except requests.RequestException as e:
last_error = e
time.sleep(poll_interval)
print(f"DNS PROBE FAIL: {hostname} - {last_error}")
return False
finally:
print("Cleaning up probe ingress...")
delete_probe_ingress(namespace)
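The retry structure of `verify_dns_via_probe` — poll until success or deadline, remembering the last error — generalizes to a small helper. This is a sketch of the pattern, not part of the module:

```python
import time

def poll_until(check, timeout: float = 5.0, interval: float = 0.01):
    # Poll `check` until it returns True or the deadline passes;
    # exceptions are swallowed and remembered, like the RequestException catch above
    deadline = time.time() + timeout
    last_error = None
    while time.time() < deadline:
        try:
            if check():
                return True, last_error
        except Exception as e:
            last_error = e
        time.sleep(interval)
    return False, last_error

calls = {"n": 0}

def flaky_probe():
    calls["n"] += 1
    return calls["n"] >= 3  # succeeds on the third attempt

ok, err = poll_until(flaky_probe, timeout=2.0)
```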


@@ -31,7 +31,6 @@ from stack_orchestrator.deploy.k8s.helpers import (
envs_from_environment_variables_map,
envs_from_compose_file,
merge_envs,
translate_sidecar_service_names,
)
from stack_orchestrator.deploy.deploy_util import (
parsed_pod_files_map_from_file_names,
@@ -72,17 +71,15 @@ def to_k8s_resource_requirements(resources: Resources) -> client.V1ResourceRequi
class ClusterInfo:
parsed_pod_yaml_map: Any
parsed_job_yaml_map: Any
image_set: Set[str] = set()
app_name: str
stack_name: str
environment_variables: DeployEnvVars
spec: Spec
def __init__(self) -> None:
self.parsed_job_yaml_map = {}
pass
def int(self, pod_files: List[str], compose_env_file, deployment_name, spec: Spec, stack_name=""):
def int(self, pod_files: List[str], compose_env_file, deployment_name, spec: Spec):
self.parsed_pod_yaml_map = parsed_pod_files_map_from_file_names(pod_files)
# Find the set of images in the pods
self.image_set = images_for_deployment(pod_files)
@@ -92,23 +89,10 @@ class ClusterInfo:
}
self.environment_variables = DeployEnvVars(env_vars)
self.app_name = deployment_name
self.stack_name = stack_name
self.spec = spec
if opts.o.debug:
print(f"Env vars: {self.environment_variables.map}")
def init_jobs(self, job_files: List[str]):
"""Initialize parsed job YAML map from job compose files."""
self.parsed_job_yaml_map = parsed_pod_files_map_from_file_names(job_files)
if opts.o.debug:
print(f"Parsed job yaml map: {self.parsed_job_yaml_map}")
def _all_named_volumes(self) -> list:
"""Return named volumes from both pod and job compose files."""
volumes = named_volumes_from_pod_files(self.parsed_pod_yaml_map)
volumes.extend(named_volumes_from_pod_files(self.parsed_job_yaml_map))
return volumes
def get_nodeports(self):
nodeports = []
for pod_name in self.parsed_pod_yaml_map:
@@ -141,8 +125,7 @@ class ClusterInfo:
name=(
f"{self.app_name}-nodeport-"
f"{pod_port}-{protocol.lower()}"
),
labels={"app": self.app_name},
)
),
spec=client.V1ServiceSpec(
type="NodePort",
@@ -225,9 +208,7 @@ class ClusterInfo:
ingress = client.V1Ingress(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-ingress",
labels={"app": self.app_name},
annotations=ingress_annotations,
name=f"{self.app_name}-ingress", annotations=ingress_annotations
),
spec=spec,
)
@@ -257,10 +238,7 @@ class ClusterInfo:
]
service = client.V1Service(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-service",
labels={"app": self.app_name},
),
metadata=client.V1ObjectMeta(name=f"{self.app_name}-service"),
spec=client.V1ServiceSpec(
type="ClusterIP",
ports=service_ports,
@@ -272,26 +250,20 @@ class ClusterInfo:
def get_pvcs(self):
result = []
spec_volumes = self.spec.get_volumes()
named_volumes = self._all_named_volumes()
global_resources = self.spec.get_volume_resources()
if not global_resources:
global_resources = DEFAULT_VOLUME_RESOURCES
named_volumes = named_volumes_from_pod_files(self.parsed_pod_yaml_map)
resources = self.spec.get_volume_resources()
if not resources:
resources = DEFAULT_VOLUME_RESOURCES
if opts.o.debug:
print(f"Spec Volumes: {spec_volumes}")
print(f"Named Volumes: {named_volumes}")
print(f"Resources: {global_resources}")
print(f"Resources: {resources}")
for volume_name, volume_path in spec_volumes.items():
if volume_name not in named_volumes:
if opts.o.debug:
print(f"{volume_name} not in pod files")
continue
# Per-volume resources override global, which overrides default.
vol_resources = (
self.spec.get_volume_resources_for(volume_name)
or global_resources
)
labels = {
"app": self.app_name,
"volume-label": f"{self.app_name}-{volume_name}",
@@ -307,7 +279,7 @@ class ClusterInfo:
spec = client.V1PersistentVolumeClaimSpec(
access_modes=["ReadWriteOnce"],
storage_class_name=storage_class_name,
resources=to_k8s_resource_requirements(vol_resources),
resources=to_k8s_resource_requirements(resources),
volume_name=k8s_volume_name,
)
pvc = client.V1PersistentVolumeClaim(
@@ -322,7 +294,7 @@ class ClusterInfo:
def get_configmaps(self):
result = []
spec_configmaps = self.spec.get_configmaps()
named_volumes = self._all_named_volumes()
named_volumes = named_volumes_from_pod_files(self.parsed_pod_yaml_map)
for cfg_map_name, cfg_map_path in spec_configmaps.items():
if cfg_map_name not in named_volumes:
if opts.o.debug:
@@ -348,7 +320,7 @@ class ClusterInfo:
spec = client.V1ConfigMap(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{cfg_map_name}",
labels={"app": self.app_name, "configmap-label": cfg_map_name},
labels={"configmap-label": cfg_map_name},
),
binary_data=data,
)
@@ -358,10 +330,10 @@ class ClusterInfo:
def get_pvs(self):
result = []
spec_volumes = self.spec.get_volumes()
named_volumes = self._all_named_volumes()
global_resources = self.spec.get_volume_resources()
if not global_resources:
global_resources = DEFAULT_VOLUME_RESOURCES
named_volumes = named_volumes_from_pod_files(self.parsed_pod_yaml_map)
resources = self.spec.get_volume_resources()
if not resources:
resources = DEFAULT_VOLUME_RESOURCES
for volume_name, volume_path in spec_volumes.items():
# We only need to create a volume if it is fully qualified HostPath.
# Otherwise, we create the PVC and expect the node to allocate the volume
@@ -380,20 +352,12 @@ class ClusterInfo:
continue
if not os.path.isabs(volume_path):
# For k8s-kind, allow relative paths:
# - PV uses /mnt/{volume_name} (path inside kind node)
# - extraMounts resolve the relative path to Docker Host
if not self.spec.is_kind_deployment():
print(
f"WARNING: {volume_name}:{volume_path} is not absolute, "
"cannot bind volume."
)
continue
print(
f"WARNING: {volume_name}:{volume_path} is not absolute, "
"cannot bind volume."
)
continue
vol_resources = (
self.spec.get_volume_resources_for(volume_name)
or global_resources
)
if self.spec.is_kind_deployment():
host_path = client.V1HostPathVolumeSource(
path=get_kind_pv_bind_mount_path(volume_name)
@@ -403,75 +367,28 @@ class ClusterInfo:
spec = client.V1PersistentVolumeSpec(
storage_class_name="manual",
access_modes=["ReadWriteOnce"],
capacity=to_k8s_resource_requirements(vol_resources).requests,
capacity=to_k8s_resource_requirements(resources).requests,
host_path=host_path,
)
pv = client.V1PersistentVolume(
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-{volume_name}",
labels={
"app": self.app_name,
"volume-label": f"{self.app_name}-{volume_name}",
},
labels={"volume-label": f"{self.app_name}-{volume_name}"},
),
spec=spec,
)
result.append(pv)
return result
def _any_service_has_host_network(self):
# TODO: put things like image pull policy into an object-scope struct
def get_deployment(self, image_pull_policy: Optional[str] = None):
containers = []
services = {}
resources = self.spec.get_container_resources()
if not resources:
resources = DEFAULT_CONTAINER_RESOURCES
for pod_name in self.parsed_pod_yaml_map:
pod = self.parsed_pod_yaml_map[pod_name]
for svc in pod.get("services", {}).values():
if svc.get("network_mode") == "host":
return True
return False
def _resolve_container_resources(
self, container_name: str, service_info: dict, global_resources: Resources
) -> Resources:
"""Resolve resources for a container using layered priority.
Priority: spec per-container > compose deploy.resources
> spec global > DEFAULT
"""
# 1. Check spec.yml for per-container override
per_container = self.spec.get_container_resources_for(container_name)
if per_container:
return per_container
# 2. Check compose service_info for deploy.resources
deploy_block = service_info.get("deploy", {})
compose_resources = deploy_block.get("resources", {}) if deploy_block else {}
if compose_resources:
return Resources(compose_resources)
# 3. Fall back to spec.yml global (already resolved with DEFAULT fallback)
return global_resources
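The three-level fallback implemented by `_resolve_container_resources` can be checked with plain dicts standing in for `Resources` objects (the shapes here are illustrative):

```python
def resolve_resources(per_container, compose_deploy, global_default):
    # Priority: spec per-container > compose deploy.resources > spec global/default
    if per_container:
        return per_container
    compose_resources = (compose_deploy or {}).get("resources")
    if compose_resources:
        return compose_resources
    return global_default

default = {"limits": {"memory": "1G"}}
from_compose = resolve_resources(None, {"resources": {"limits": {"memory": "512M"}}}, default)
from_spec = resolve_resources({"limits": {"memory": "2G"}}, {"resources": {"limits": {"memory": "512M"}}}, default)
fallback = resolve_resources(None, {}, default)
```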
def _build_containers(
self,
parsed_yaml_map: Any,
image_pull_policy: Optional[str] = None,
) -> tuple:
"""Build k8s container specs from parsed compose YAML.
Returns a tuple of (containers, init_containers, services, volumes)
where:
- containers: list of V1Container objects
- init_containers: list of V1Container objects for init containers
(compose services with label ``laconic.init-container: "true"``)
- services: the last services dict processed (used for annotations/labels)
- volumes: list of V1Volume objects
"""
containers = []
init_containers = []
services = {}
global_resources = self.spec.get_container_resources()
if not global_resources:
global_resources = DEFAULT_CONTAINER_RESOURCES
for pod_name in parsed_yaml_map:
pod = parsed_yaml_map[pod_name]
services = pod["services"]
for service_name in services:
container_name = service_name
@@ -509,12 +426,6 @@ class ClusterInfo:
if "environment" in service_info
else self.environment_variables.map
)
# Translate docker-compose service names to localhost for sidecars
# All services in the same pod share the network namespace
sibling_services = [s for s in services.keys() if s != service_name]
merged_envs = translate_sidecar_service_names(
merged_envs, sibling_services
)
envs = envs_from_environment_variables_map(merged_envs)
if opts.o.debug:
print(f"Merged envs: {envs}")
@@ -528,7 +439,7 @@ class ClusterInfo:
else image
)
volume_mounts = volume_mounts_for_service(
parsed_yaml_map, service_name
self.parsed_pod_yaml_map, service_name
)
# Handle command/entrypoint from compose file
# In docker-compose: entrypoint -> k8s command, command -> k8s args
@@ -542,29 +453,6 @@ class ClusterInfo:
if "command" in service_info:
cmd = service_info["command"]
container_args = cmd if isinstance(cmd, list) else cmd.split()
# Add env_from to pull secrets from K8s Secret
secret_name = f"{self.app_name}-generated-secrets"
env_from = [
client.V1EnvFromSource(
secret_ref=client.V1SecretEnvSource(
name=secret_name,
optional=True, # Don't fail if no secrets
)
)
]
# Mount user-declared secrets from spec.yml
for user_secret_name in self.spec.get_secrets():
env_from.append(
client.V1EnvFromSource(
secret_ref=client.V1SecretEnvSource(
name=user_secret_name,
optional=True,
)
)
)
container_resources = self._resolve_container_resources(
container_name, service_info, global_resources
)
container = client.V1Container(
name=container_name,
image=image_to_use,
@@ -572,56 +460,26 @@ class ClusterInfo:
command=container_command,
args=container_args,
env=envs,
env_from=env_from,
ports=container_ports if container_ports else None,
volume_mounts=volume_mounts,
security_context=client.V1SecurityContext(
privileged=self.spec.get_privileged(),
run_as_user=int(service_info["user"]) if "user" in service_info else None,
capabilities=client.V1Capabilities(
add=self.spec.get_capabilities()
)
if self.spec.get_capabilities()
else None,
),
resources=to_k8s_resource_requirements(container_resources),
resources=to_k8s_resource_requirements(resources),
)
# Services with laconic.init-container label become
# k8s init containers instead of regular containers.
svc_labels = service_info.get("labels", {})
if isinstance(svc_labels, list):
# docker-compose labels can be a list of "key=value"
svc_labels = dict(
item.split("=", 1) for item in svc_labels
)
is_init = str(
svc_labels.get("laconic.init-container", "")
).lower() in ("true", "1", "yes")
if is_init:
init_containers.append(container)
else:
containers.append(container)
containers.append(container)
volumes = volumes_for_pod_files(
parsed_yaml_map, self.spec, self.app_name
self.parsed_pod_yaml_map, self.spec, self.app_name
)
return containers, init_containers, services, volumes
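Compose `labels` may be either a mapping or a list of `key=value` strings, and the init-container check above depends on normalizing both forms. A standalone sketch of that normalization:

```python
def labels_to_dict(labels):
    # docker-compose accepts labels as {"k": "v"} or ["k=v", ...]
    if isinstance(labels, list):
        return dict(item.split("=", 1) for item in labels)
    return dict(labels or {})

def is_init_container(service_info: dict) -> bool:
    labels = labels_to_dict(service_info.get("labels", {}))
    return str(labels.get("laconic.init-container", "")).lower() in ("true", "1", "yes")

from_list = is_init_container({"labels": ["laconic.init-container=true"]})
from_map = is_init_container({"labels": {"laconic.init-container": "no"}})
```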
# TODO: put things like image pull policy into an object-scope struct
def get_deployment(self, image_pull_policy: Optional[str] = None):
containers, init_containers, services, volumes = self._build_containers(
self.parsed_pod_yaml_map, image_pull_policy
)
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-registry"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
image_pull_secrets = [client.V1LocalObjectReference(name="laconic-registry")]
annotations = None
labels = {"app": self.app_name}
if self.stack_name:
labels["app.kubernetes.io/stack"] = self.stack_name
affinity = None
tolerations = None
@@ -674,19 +532,15 @@ class ClusterInfo:
)
)
use_host_network = self._any_service_has_host_network()
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(annotations=annotations, labels=labels),
spec=client.V1PodSpec(
containers=containers,
init_containers=init_containers or None,
image_pull_secrets=image_pull_secrets,
volumes=volumes,
affinity=affinity,
tolerations=tolerations,
runtime_class_name=self.spec.get_runtime_class(),
host_network=use_host_network or None,
dns_policy=("ClusterFirstWithHostNet" if use_host_network else None),
),
)
spec = client.V1DeploymentSpec(
@@ -698,83 +552,7 @@ class ClusterInfo:
deployment = client.V1Deployment(
api_version="apps/v1",
kind="Deployment",
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-deployment",
labels={"app": self.app_name, **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {})},
),
metadata=client.V1ObjectMeta(name=f"{self.app_name}-deployment"),
spec=spec,
)
return deployment
def get_jobs(self, image_pull_policy: Optional[str] = None) -> List[client.V1Job]:
"""Build k8s Job objects from parsed job compose files.
Each job compose file produces a V1Job with:
- restartPolicy: Never
- backoffLimit: 0
- Name: {app_name}-job-{job_name}
"""
if not self.parsed_job_yaml_map:
return []
jobs = []
registry_config = self.spec.get_image_registry_config()
if registry_config:
secret_name = f"{self.app_name}-registry"
image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
else:
image_pull_secrets = []
for job_file in self.parsed_job_yaml_map:
# Build containers for this single job file
single_job_map = {job_file: self.parsed_job_yaml_map[job_file]}
containers, init_containers, _services, volumes = (
self._build_containers(single_job_map, image_pull_policy)
)
# Derive job name from file path: docker-compose-<name>.yml -> <name>
base = os.path.basename(job_file)
# Strip docker-compose- prefix and .yml suffix
job_name = base
if job_name.startswith("docker-compose-"):
job_name = job_name[len("docker-compose-"):]
if job_name.endswith(".yml"):
job_name = job_name[: -len(".yml")]
elif job_name.endswith(".yaml"):
job_name = job_name[: -len(".yaml")]
# Use a distinct app label for job pods so they don't get
# picked up by pods_in_deployment() which queries app={app_name}.
pod_labels = {
"app": f"{self.app_name}-job",
**({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {}),
}
template = client.V1PodTemplateSpec(
metadata=client.V1ObjectMeta(
labels=pod_labels
),
spec=client.V1PodSpec(
containers=containers,
init_containers=init_containers or None,
image_pull_secrets=image_pull_secrets,
volumes=volumes,
restart_policy="Never",
),
)
job_spec = client.V1JobSpec(
template=template,
backoff_limit=0,
)
job_labels = {"app": self.app_name, **({"app.kubernetes.io/stack": self.stack_name} if self.stack_name else {})}
job = client.V1Job(
api_version="batch/v1",
kind="Job",
metadata=client.V1ObjectMeta(
name=f"{self.app_name}-job-{job_name}",
labels=job_labels,
),
spec=job_spec,
)
jobs.append(job)
return jobs
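The job-name derivation in `get_jobs` (strip the `docker-compose-` prefix and the `.yml`/`.yaml` suffix from the file name) is easy to verify in isolation; this sketch mirrors that logic:

```python
import os

def job_name_from_file(job_file: str) -> str:
    # "compose-jobs/docker-compose-migrate.yml" -> "migrate"
    name = os.path.basename(job_file)
    if name.startswith("docker-compose-"):
        name = name[len("docker-compose-"):]
    for suffix in (".yml", ".yaml"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name

migrate = job_name_from_file("compose-jobs/docker-compose-migrate.yml")
seed = job_name_from_file("docker-compose-db-seed.yaml")
```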


@@ -29,7 +29,6 @@ from stack_orchestrator.deploy.k8s.helpers import (
from stack_orchestrator.deploy.k8s.helpers import (
install_ingress_for_kind,
wait_for_ingress_in_kind,
is_ingress_running,
)
from stack_orchestrator.deploy.k8s.helpers import (
pods_in_deployment,
@@ -95,9 +94,8 @@ class K8sDeployer(Deployer):
type: str
core_api: client.CoreV1Api
apps_api: client.AppsV1Api
batch_api: client.BatchV1Api
networking_api: client.NetworkingV1Api
k8s_namespace: str
k8s_namespace: str = "default"
kind_cluster_name: str
skip_cluster_management: bool
cluster_info: ClusterInfo
@@ -111,39 +109,26 @@
compose_files,
compose_project_name,
compose_env_file,
job_compose_files=None,
) -> None:
self.type = type
self.skip_cluster_management = False
self.k8s_namespace = "default" # Will be overridden below if context exists
# TODO: workaround pending refactoring above to cope with being
# created with a null deployment_context
if deployment_context is None:
return
self.deployment_dir = deployment_context.deployment_dir
self.deployment_context = deployment_context
self.kind_cluster_name = deployment_context.spec.get_kind_cluster_name() or compose_project_name
# Use spec namespace if provided, otherwise derive from cluster-id
self.k8s_namespace = deployment_context.spec.get_namespace() or f"laconic-{compose_project_name}"
self.kind_cluster_name = compose_project_name
self.cluster_info = ClusterInfo()
# stack.name may be an absolute path (from spec "stack:" key after
# path resolution). Extract just the directory basename for labels.
raw_name = deployment_context.stack.name if deployment_context else ""
stack_name = Path(raw_name).name if raw_name else ""
self.cluster_info.int(
compose_files,
compose_env_file,
compose_project_name,
deployment_context.spec,
stack_name=stack_name,
)
# Initialize job compose files if provided
if job_compose_files:
self.cluster_info.init_jobs(job_compose_files)
if opts.o.debug:
print(f"Deployment dir: {deployment_context.deployment_dir}")
print(f"Compose files: {compose_files}")
print(f"Job compose files: {job_compose_files}")
print(f"Project name: {compose_project_name}")
print(f"Env file: {compose_env_file}")
print(f"Type: {type}")
@@ -161,136 +146,8 @@ class K8sDeployer(Deployer):
self.core_api = client.CoreV1Api()
self.networking_api = client.NetworkingV1Api()
self.apps_api = client.AppsV1Api()
self.batch_api = client.BatchV1Api()
self.custom_obj_api = client.CustomObjectsApi()
def _ensure_namespace(self):
"""Create the deployment namespace if it doesn't exist."""
if opts.o.dry_run:
print(f"Dry run: would create namespace {self.k8s_namespace}")
return
try:
self.core_api.read_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} already exists")
except ApiException as e:
if e.status == 404:
# Create the namespace
ns = client.V1Namespace(
metadata=client.V1ObjectMeta(
name=self.k8s_namespace,
labels={"app": self.cluster_info.app_name},
)
)
self.core_api.create_namespace(body=ns)
if opts.o.debug:
print(f"Created namespace {self.k8s_namespace}")
else:
raise
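The read-then-create-on-404 pattern in `_ensure_namespace` makes repeated calls idempotent. The idea can be demonstrated against a fake API object (a stand-in for the kubernetes client, not its real interface):

```python
class ApiError(Exception):
    def __init__(self, status):
        self.status = status

class FakeCoreV1:
    # Minimal stand-in for the CoreV1Api namespace calls
    def __init__(self):
        self.namespaces = set()
    def read_namespace(self, name):
        if name not in self.namespaces:
            raise ApiError(404)
    def create_namespace(self, name):
        self.namespaces.add(name)

def ensure_namespace(api, name):
    # Read first; only a 404 means "create", anything else propagates
    try:
        api.read_namespace(name=name)
    except ApiError as e:
        if e.status == 404:
            api.create_namespace(name)
        else:
            raise

api = FakeCoreV1()
ensure_namespace(api, "laconic-demo")
ensure_namespace(api, "laconic-demo")  # second call is a no-op
```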
def _delete_namespace(self):
"""Delete the deployment namespace and all resources within it."""
if opts.o.dry_run:
print(f"Dry run: would delete namespace {self.k8s_namespace}")
return
try:
self.core_api.delete_namespace(name=self.k8s_namespace)
if opts.o.debug:
print(f"Deleted namespace {self.k8s_namespace}")
except ApiException as e:
if e.status == 404:
if opts.o.debug:
print(f"Namespace {self.k8s_namespace} not found")
else:
raise
def _delete_resources_by_label(self, label_selector: str, delete_volumes: bool):
"""Delete only this stack's resources from a shared namespace."""
ns = self.k8s_namespace
if opts.o.dry_run:
print(f"Dry run: would delete resources with {label_selector} in {ns}")
return
# Deployments
try:
deps = self.apps_api.list_namespaced_deployment(
namespace=ns, label_selector=label_selector
)
for dep in deps.items:
print(f"Deleting Deployment {dep.metadata.name}")
self.apps_api.delete_namespaced_deployment(
name=dep.metadata.name, namespace=ns
)
except ApiException as e:
_check_delete_exception(e)
# Jobs
try:
jobs = self.batch_api.list_namespaced_job(
namespace=ns, label_selector=label_selector
)
for job in jobs.items:
print(f"Deleting Job {job.metadata.name}")
self.batch_api.delete_namespaced_job(
name=job.metadata.name, namespace=ns,
body=client.V1DeleteOptions(propagation_policy="Background"),
)
except ApiException as e:
_check_delete_exception(e)
# Services (NodePorts created by SO)
try:
svcs = self.core_api.list_namespaced_service(
namespace=ns, label_selector=label_selector
)
for svc in svcs.items:
print(f"Deleting Service {svc.metadata.name}")
self.core_api.delete_namespaced_service(
name=svc.metadata.name, namespace=ns
)
except ApiException as e:
_check_delete_exception(e)
# Ingresses
try:
ings = self.networking_api.list_namespaced_ingress(
namespace=ns, label_selector=label_selector
)
for ing in ings.items:
print(f"Deleting Ingress {ing.metadata.name}")
self.networking_api.delete_namespaced_ingress(
name=ing.metadata.name, namespace=ns
)
except ApiException as e:
_check_delete_exception(e)
# ConfigMaps
try:
cms = self.core_api.list_namespaced_config_map(
namespace=ns, label_selector=label_selector
)
for cm in cms.items:
print(f"Deleting ConfigMap {cm.metadata.name}")
self.core_api.delete_namespaced_config_map(
name=cm.metadata.name, namespace=ns
)
except ApiException as e:
_check_delete_exception(e)
# PVCs (only if --delete-volumes)
if delete_volumes:
try:
pvcs = self.core_api.list_namespaced_persistent_volume_claim(
namespace=ns, label_selector=label_selector
)
for pvc in pvcs.items:
print(f"Deleting PVC {pvc.metadata.name}")
self.core_api.delete_namespaced_persistent_volume_claim(
name=pvc.metadata.name, namespace=ns
)
except ApiException as e:
_check_delete_exception(e)
def _create_volume_data(self):
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
@@ -355,11 +212,6 @@ class K8sDeployer(Deployer):
print(f"{cfg_rsp}")
def _create_deployment(self):
# Skip if there are no pods to deploy (e.g. jobs-only stacks)
if not self.cluster_info.parsed_pod_yaml_map:
if opts.o.debug:
print("No pods defined, skipping Deployment creation")
return
# Process compose files into a Deployment
deployment = self.cluster_info.get_deployment(
image_pull_policy=None if self.is_kind() else "Always"
@@ -397,26 +249,6 @@ class K8sDeployer(Deployer):
print("Service created:")
print(f"{service_resp}")
def _create_jobs(self):
# Process job compose files into k8s Jobs
jobs = self.cluster_info.get_jobs(
image_pull_policy=None if self.is_kind() else "Always"
)
for job in jobs:
if opts.o.debug:
print(f"Sending this job: {job}")
if not opts.o.dry_run:
job_resp = self.batch_api.create_namespaced_job(
body=job, namespace=self.k8s_namespace
)
if opts.o.debug:
print("Job created:")
if job_resp.metadata:
print(
f" {job_resp.metadata.namespace} "
f"{job_resp.metadata.name}"
)
def _find_certificate_for_host_name(self, host_name):
all_certificates = self.custom_obj_api.list_namespaced_custom_object(
group="cert-manager.io",
@@ -457,40 +289,22 @@ class K8sDeployer(Deployer):
self.skip_cluster_management = skip_cluster_management
if not opts.o.dry_run:
if self.is_kind() and not self.skip_cluster_management:
# Create the kind cluster (or reuse existing one)
kind_config = str(
self.deployment_dir.joinpath(constants.kind_config_filename)
# Create the kind cluster
create_cluster(
self.kind_cluster_name,
str(self.deployment_dir.joinpath(constants.kind_config_filename)),
)
actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
if actual_cluster != self.kind_cluster_name:
# An existing cluster was found, use it instead
self.kind_cluster_name = actual_cluster
# Only load locally-built images into kind
# Registry images (docker.io, ghcr.io, etc.) will be pulled by k8s
local_containers = self.deployment_context.stack.obj.get(
"containers", []
# Ensure the referenced containers are copied into kind
load_images_into_kind(
self.kind_cluster_name, self.cluster_info.image_set
)
if local_containers:
# Filter image_set to only images matching local containers
local_images = {
img
for img in self.cluster_info.image_set
if any(c in img for c in local_containers)
}
if local_images:
load_images_into_kind(self.kind_cluster_name, local_images)
# Note: if no local containers defined, all images come from registries
self.connect_api()
# Create deployment-specific namespace for resource isolation
self._ensure_namespace()
if self.is_kind() and not self.skip_cluster_management:
# Configure ingress controller (not installed by default in kind)
# Skip if already running (idempotent for shared cluster)
if not is_ingress_running():
install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
# Wait for ingress to start
# (deployment provisioning will fail unless this is done)
wait_for_ingress_in_kind()
install_ingress_for_kind()
# Wait for ingress to start
# (deployment provisioning will fail unless this is done)
wait_for_ingress_in_kind()
# Create RuntimeClass if unlimited_memlock is enabled
if self.cluster_info.spec.get_unlimited_memlock():
_create_runtime_class(
@@ -501,14 +315,8 @@ class K8sDeployer(Deployer):
else:
print("Dry run mode enabled, skipping k8s API connect")
# Create registry secret if configured
from stack_orchestrator.deploy.deployment_create import create_registry_secret
create_registry_secret(self.cluster_info.spec, self.cluster_info.app_name)
self._create_volume_data()
self._create_deployment()
self._create_jobs()
http_proxy_info = self.cluster_info.spec.get_http_proxy()
# Note: we don't support tls for kind (enabling tls causes errors)
@@ -551,42 +359,107 @@ class K8sDeployer(Deployer):
print("NodePort created:")
print(f"{nodeport_resp}")
# Call start() hooks — stacks can create additional k8s resources
if self.deployment_context:
from stack_orchestrator.deploy.deployment_create import call_stack_deploy_start
call_stack_deploy_start(self.deployment_context)
def down(self, timeout, volumes, skip_cluster_management):
def down(self, timeout, volumes, skip_cluster_management): # noqa: C901
self.skip_cluster_management = skip_cluster_management
self.connect_api()
# Delete the k8s objects
app_label = f"app={self.cluster_info.app_name}"
# PersistentVolumes are cluster-scoped (not namespaced), so delete by label
if volumes:
try:
pvs = self.core_api.list_persistent_volume(
label_selector=app_label
)
for pv in pvs.items:
if opts.o.debug:
print(f"Deleting PV: {pv.metadata.name}")
try:
self.core_api.delete_persistent_volume(name=pv.metadata.name)
except ApiException as e:
_check_delete_exception(e)
except ApiException as e:
# Create the host-path-mounted PVs for this deployment
pvs = self.cluster_info.get_pvs()
for pv in pvs:
if opts.o.debug:
print(f"Error listing PVs: {e}")
print(f"Deleting this pv: {pv}")
try:
pv_resp = self.core_api.delete_persistent_volume(
name=pv.metadata.name
)
if opts.o.debug:
print("PV deleted:")
print(f"{pv_resp}")
except ApiException as e:
_check_delete_exception(e)
# When namespace is explicitly set in the spec, it may be shared with
# other stacks — delete only this stack's resources by label.
# Otherwise the namespace is owned by this deployment, delete it entirely.
shared_namespace = self.deployment_context.spec.get_namespace() is not None
if shared_namespace:
self._delete_resources_by_label(app_label, volumes)
# Figure out the PVCs for this deployment
pvcs = self.cluster_info.get_pvcs()
for pvc in pvcs:
if opts.o.debug:
print(f"Deleting this pvc: {pvc}")
try:
pvc_resp = self.core_api.delete_namespaced_persistent_volume_claim(
name=pvc.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("PVCs deleted:")
print(f"{pvc_resp}")
except ApiException as e:
_check_delete_exception(e)
# Figure out the ConfigMaps for this deployment
cfg_maps = self.cluster_info.get_configmaps()
for cfg_map in cfg_maps:
if opts.o.debug:
print(f"Deleting this ConfigMap: {cfg_map}")
try:
cfg_map_resp = self.core_api.delete_namespaced_config_map(
name=cfg_map.metadata.name, namespace=self.k8s_namespace
)
if opts.o.debug:
print("ConfigMap deleted:")
print(f"{cfg_map_resp}")
except ApiException as e:
_check_delete_exception(e)
deployment = self.cluster_info.get_deployment()
if opts.o.debug:
print(f"Deleting this deployment: {deployment}")
if deployment and deployment.metadata and deployment.metadata.name:
try:
self.apps_api.delete_namespaced_deployment(
name=deployment.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
service = self.cluster_info.get_service()
if opts.o.debug:
print(f"Deleting service: {service}")
if service and service.metadata and service.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=service.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
ingress = self.cluster_info.get_ingress(use_tls=not self.is_kind())
if ingress and ingress.metadata and ingress.metadata.name:
if opts.o.debug:
print(f"Deleting this ingress: {ingress}")
try:
self.networking_api.delete_namespaced_ingress(
name=ingress.metadata.name, namespace=self.k8s_namespace
)
except ApiException as e:
_check_delete_exception(e)
else:
self._delete_namespace()
if opts.o.debug:
print("No ingress to delete")
nodeports: List[client.V1Service] = self.cluster_info.get_nodeports()
for nodeport in nodeports:
if opts.o.debug:
print(f"Deleting this nodeport: {nodeport}")
if nodeport.metadata and nodeport.metadata.name:
try:
self.core_api.delete_namespaced_service(
namespace=self.k8s_namespace, name=nodeport.metadata.name
)
except ApiException as e:
_check_delete_exception(e)
else:
if opts.o.debug:
print("No nodeport to delete")
if self.is_kind() and not self.skip_cluster_management:
# Destroy the kind cluster
@@ -711,20 +584,20 @@ class K8sDeployer(Deployer):
def logs(self, services, tail, follow, stream):
self.connect_api()
pods = pods_in_deployment(self.core_api, self.cluster_info.app_name, namespace=self.k8s_namespace)
pods = pods_in_deployment(self.core_api, self.cluster_info.app_name)
if len(pods) > 1:
print("Warning: more than one pod in the deployment")
if len(pods) == 0:
log_data = "******* Pods not running ********\n"
else:
k8s_pod_name = pods[0]
containers = containers_in_pod(self.core_api, k8s_pod_name, namespace=self.k8s_namespace)
containers = containers_in_pod(self.core_api, k8s_pod_name)
# If pod not started, logs request below will throw an exception
try:
log_data = ""
for container in containers:
container_log = self.core_api.read_namespaced_pod_log(
k8s_pod_name, namespace=self.k8s_namespace, container=container
k8s_pod_name, namespace="default", container=container
)
container_log_lines = container_log.splitlines()
for line in container_log_lines:
@@ -736,10 +609,6 @@ class K8sDeployer(Deployer):
return log_stream_from_string(log_data)
def update(self):
if not self.cluster_info.parsed_pod_yaml_map:
if opts.o.debug:
print("No pods defined, skipping update")
return
self.connect_api()
ref_deployment = self.cluster_info.get_deployment()
if not ref_deployment or not ref_deployment.metadata:
@@ -800,43 +669,26 @@ class K8sDeployer(Deployer):
def run_job(self, job_name: str, helm_release: Optional[str] = None):
if not opts.o.dry_run:
from stack_orchestrator.deploy.k8s.helm.job_runner import run_helm_job
# Check if this is a helm-based deployment
chart_dir = self.deployment_dir / "chart"
if chart_dir.exists():
from stack_orchestrator.deploy.k8s.helm.job_runner import run_helm_job
if not chart_dir.exists():
# TODO: Implement job support for compose-based K8s deployments
raise Exception(
f"Job support is only available for helm-based "
f"deployments. Chart directory not found: {chart_dir}"
)
# Run the job using the helm job runner
run_helm_job(
chart_dir=chart_dir,
job_name=job_name,
release=helm_release,
namespace=self.k8s_namespace,
timeout=600,
verbose=opts.o.verbose,
)
else:
# Non-Helm path: create job from ClusterInfo
self.connect_api()
jobs = self.cluster_info.get_jobs(
image_pull_policy=None if self.is_kind() else "Always"
)
# Find the matching job by name
target_name = f"{self.cluster_info.app_name}-job-{job_name}"
matched_job = None
for job in jobs:
if job.metadata and job.metadata.name == target_name:
matched_job = job
break
if matched_job is None:
raise Exception(
f"Job '{job_name}' not found. Available jobs: "
f"{[j.metadata.name for j in jobs if j.metadata]}"
)
if opts.o.debug:
print(f"Creating job: {target_name}")
self.batch_api.create_namespaced_job(
body=matched_job, namespace=self.k8s_namespace
)
# Run the job using the helm job runner
run_helm_job(
chart_dir=chart_dir,
job_name=job_name,
release=helm_release,
namespace=self.k8s_namespace,
timeout=600,
verbose=opts.o.verbose,
)
def is_kind(self):
return self.type == "k8s-kind"

View File

@@ -14,13 +14,11 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from kubernetes import client, utils, watch
from kubernetes.client.exceptions import ApiException
import os
from pathlib import Path
import subprocess
import re
from typing import Set, Mapping, List, Optional, cast
import yaml
from stack_orchestrator.util import get_k8s_dir, error_exit
from stack_orchestrator.opts import opts
@@ -98,227 +96,16 @@ def _run_command(command: str):
return result
def _get_etcd_host_path_from_kind_config(config_file: str) -> Optional[str]:
"""Extract etcd host path from kind config extraMounts."""
import yaml
try:
with open(config_file, "r") as f:
config = yaml.safe_load(f)
except Exception:
return None
nodes = config.get("nodes", [])
for node in nodes:
extra_mounts = node.get("extraMounts", [])
for mount in extra_mounts:
if mount.get("containerPath") == "/var/lib/etcd":
return mount.get("hostPath")
return None
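The lookup above just walks `nodes` → `extraMounts` in the parsed kind config. A minimal standalone sketch, with an inline dict standing in for the loaded YAML file (helper name and sample path are illustrative):

```python
def etcd_host_path(config: dict):
    # Return the hostPath mounted at /var/lib/etcd, or None if no node has one.
    for node in config.get("nodes", []):
        for mount in node.get("extraMounts", []):
            if mount.get("containerPath") == "/var/lib/etcd":
                return mount.get("hostPath")
    return None

config = {
    "kind": "Cluster",
    "nodes": [{
        "role": "control-plane",
        "extraMounts": [{
            "hostPath": "/srv/deploy/data/cluster-backups/abc123/etcd",
            "containerPath": "/var/lib/etcd",
        }],
    }],
}
print(etcd_host_path(config))
```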
def _clean_etcd_keeping_certs(etcd_path: str) -> bool:
"""Clean persisted etcd, keeping only TLS certificates.
When etcd is persisted and a cluster is recreated, kind tries to install
resources fresh but they already exist. Instead of trying to delete
specific stale resources (blacklist), we keep only the valuable data
(caddy TLS certs) and delete everything else (whitelist approach).
The etcd image is distroless (no shell), so we extract the statically-linked
etcdctl binary and run it from alpine which has shell support.
Returns True if cleanup succeeded, False if no action needed or failed.
"""
db_path = Path(etcd_path) / "member" / "snap" / "db"
# Check existence using docker since etcd dir is root-owned
check_cmd = (
f"docker run --rm -v {etcd_path}:/etcd:ro alpine:3.19 "
"test -f /etcd/member/snap/db"
)
check_result = subprocess.run(check_cmd, shell=True, capture_output=True)
if check_result.returncode != 0:
if opts.o.debug:
print(f"No etcd snapshot at {db_path}, skipping cleanup")
return False
if opts.o.debug:
print(f"Cleaning persisted etcd at {etcd_path}, keeping only TLS certs")
etcd_image = "gcr.io/etcd-development/etcd:v3.5.9"
temp_dir = "/tmp/laconic-etcd-cleanup"
# Whitelist: prefixes to KEEP - everything else gets deleted
keep_prefixes = "/registry/secrets/caddy-system"
# The etcd image is distroless (no shell). We extract the statically-linked
# etcdctl binary and run it from alpine which has shell + jq support.
cleanup_script = f"""
set -e
ALPINE_IMAGE="alpine:3.19"
# Cleanup previous runs
docker rm -f laconic-etcd-cleanup 2>/dev/null || true
docker rm -f etcd-extract 2>/dev/null || true
docker run --rm -v /tmp:/tmp $ALPINE_IMAGE rm -rf {temp_dir}
# Create temp dir
docker run --rm -v /tmp:/tmp $ALPINE_IMAGE mkdir -p {temp_dir}
# Extract etcdctl binary (it's statically linked)
docker create --name etcd-extract {etcd_image}
docker cp etcd-extract:/usr/local/bin/etcdctl /tmp/etcdctl-bin
docker rm etcd-extract
docker run --rm -v /tmp/etcdctl-bin:/src:ro -v {temp_dir}:/dst $ALPINE_IMAGE \
sh -c "cp /src /dst/etcdctl && chmod +x /dst/etcdctl"
# Copy db to temp location
docker run --rm \
-v {etcd_path}:/etcd:ro \
-v {temp_dir}:/tmp-work \
$ALPINE_IMAGE cp /etcd/member/snap/db /tmp-work/etcd-snapshot.db
# Restore snapshot
docker run --rm -v {temp_dir}:/work {etcd_image} \
etcdutl snapshot restore /work/etcd-snapshot.db \
--data-dir=/work/etcd-data --skip-hash-check 2>/dev/null
# Start temp etcd (runs the etcd binary, no shell needed)
docker run -d --name laconic-etcd-cleanup \
-v {temp_dir}/etcd-data:/etcd-data \
-v {temp_dir}:/backup \
{etcd_image} etcd \
--data-dir=/etcd-data \
--listen-client-urls=http://0.0.0.0:2379 \
--advertise-client-urls=http://localhost:2379
sleep 3
# Use alpine with extracted etcdctl to run commands (alpine has shell + jq)
# Export caddy secrets
docker run --rm \
-v {temp_dir}:/backup \
--network container:laconic-etcd-cleanup \
$ALPINE_IMAGE sh -c \
'/backup/etcdctl get --prefix "{keep_prefixes}" -w json \
> /backup/kept.json 2>/dev/null || echo "{{}}" > /backup/kept.json'
# Delete ALL registry keys
docker run --rm \
-v {temp_dir}:/backup \
--network container:laconic-etcd-cleanup \
$ALPINE_IMAGE /backup/etcdctl del --prefix /registry
# Restore kept keys using jq
docker run --rm \
-v {temp_dir}:/backup \
--network container:laconic-etcd-cleanup \
$ALPINE_IMAGE sh -c '
apk add --no-cache jq >/dev/null 2>&1
jq -r ".kvs[] | @base64" /backup/kept.json 2>/dev/null | \
while read encoded; do
key=$(echo $encoded | base64 -d | jq -r ".key" | base64 -d)
val=$(echo $encoded | base64 -d | jq -r ".value" | base64 -d)
echo "$val" | /backup/etcdctl put "$key"
done
' || true
# Save cleaned snapshot
docker exec laconic-etcd-cleanup \
etcdctl snapshot save /etcd-data/cleaned-snapshot.db
docker stop laconic-etcd-cleanup
docker rm laconic-etcd-cleanup
# Restore to temp location first to verify it works
docker run --rm \
-v {temp_dir}/etcd-data/cleaned-snapshot.db:/data/db:ro \
-v {temp_dir}:/restore \
{etcd_image} \
etcdutl snapshot restore /data/db --data-dir=/restore/new-etcd \
--skip-hash-check 2>/dev/null
# Create timestamped backup of original (kept forever)
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
docker run --rm -v {etcd_path}:/etcd $ALPINE_IMAGE \
cp -a /etcd/member /etcd/member.backup-$TIMESTAMP
# Replace original with cleaned version
docker run --rm -v {etcd_path}:/etcd -v {temp_dir}:/tmp-work $ALPINE_IMAGE \
sh -c "rm -rf /etcd/member && mv /tmp-work/new-etcd/member /etcd/member"
# Cleanup temp files (but NOT the timestamped backup in etcd_path)
docker run --rm -v /tmp:/tmp $ALPINE_IMAGE rm -rf {temp_dir}
rm -f /tmp/etcdctl-bin
"""
result = subprocess.run(cleanup_script, shell=True, capture_output=True, text=True)
if result.returncode != 0:
if opts.o.debug:
print(f"Warning: etcd cleanup failed: {result.stderr}")
return False
if opts.o.debug:
print("Cleaned etcd, kept only TLS certificates")
return True
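The whitelist strategy is easier to see without the docker plumbing. A toy in-memory model of the etcd keyspace (key names illustrative, not the real registry contents): keep only the protected prefixes, drop everything else.

```python
# Whitelist approach: keep only keys under protected prefixes; all other
# registry keys are deleted so kind can reinstall resources fresh.
KEEP_PREFIXES = ("/registry/secrets/caddy-system",)

def clean_keyspace(kv: dict) -> dict:
    return {k: v for k, v in kv.items() if k.startswith(KEEP_PREFIXES)}

store = {
    "/registry/secrets/caddy-system/tls-cert": "cert-bytes",
    "/registry/deployments/default/webapp": "stale",
    "/registry/pods/default/webapp-abc": "stale",
}
print(sorted(clean_keyspace(store)))
```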
def create_cluster(name: str, config_file: str):
"""Create or reuse the single kind cluster for this host.
There is only one kind cluster per host by design. Multiple deployments
share this cluster. If a cluster already exists, it is reused.
Args:
name: Cluster name (used only when creating the first cluster)
config_file: Path to kind config file (used only when creating)
Returns:
The name of the cluster being used
"""
existing = get_kind_cluster()
if existing:
print(f"Using existing cluster: {existing}")
return existing
# Clean persisted etcd, keeping only TLS certificates
etcd_path = _get_etcd_host_path_from_kind_config(config_file)
if etcd_path:
_clean_etcd_keeping_certs(etcd_path)
print(f"Creating new cluster: {name}")
result = _run_command(f"kind create cluster --name {name} --config {config_file}")
if result.returncode != 0:
raise DeployerException(f"kind create cluster failed: {result}")
return name
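The one-cluster-per-host reuse semantics can be summarized without invoking kind at all; a sketch with the CLI calls stubbed out (function name and stub are hypothetical, the decision shape mirrors the code above):

```python
def create_or_reuse(name, existing_clusters):
    # One kind cluster per host: if any cluster exists, reuse it and
    # ignore the requested name; otherwise create one with that name.
    if existing_clusters:
        return existing_clusters[0]
    # (real code: clean persisted etcd, then run `kind create cluster ...`)
    return name

print(create_or_reuse("deploy-a", []))            # creates deploy-a
print(create_or_reuse("deploy-b", ["deploy-a"]))  # reuses deploy-a
```

Callers must therefore use the returned name, not the name they asked for.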
def destroy_cluster(name: str):
_run_command(f"kind delete cluster --name {name}")
def is_ingress_running() -> bool:
"""Check if the Caddy ingress controller is already running in the cluster."""
try:
core_v1 = client.CoreV1Api()
pods = core_v1.list_namespaced_pod(
namespace="caddy-system",
label_selector=(
"app.kubernetes.io/name=caddy-ingress-controller,"
"app.kubernetes.io/component=controller"
),
)
for pod in pods.items:
if pod.status and pod.status.container_statuses:
if pod.status.container_statuses[0].ready is True:
if opts.o.debug:
print("Caddy ingress controller already running")
return True
return False
except ApiException:
return False
def wait_for_ingress_in_kind():
core_v1 = client.CoreV1Api()
for i in range(20):
@@ -345,7 +132,7 @@ def wait_for_ingress_in_kind():
error_exit("ERROR: Timed out waiting for Caddy ingress to become ready")
def install_ingress_for_kind(acme_email: str = ""):
def install_ingress_for_kind():
api_client = client.ApiClient()
ingress_install = os.path.abspath(
get_k8s_dir().joinpath(
@@ -354,34 +141,7 @@ def install_ingress_for_kind(acme_email: str = ""):
)
if opts.o.debug:
print("Installing Caddy ingress controller in kind cluster")
# Template the YAML with email before applying
with open(ingress_install) as f:
yaml_content = f.read()
if acme_email:
yaml_content = yaml_content.replace('email: ""', f'email: "{acme_email}"')
if opts.o.debug:
print(f"Configured Caddy with ACME email: {acme_email}")
# Apply templated YAML
yaml_objects = list(yaml.safe_load_all(yaml_content))
utils.create_from_yaml(api_client, yaml_objects=yaml_objects)
# Patch ConfigMap with ACME email if provided
if acme_email:
if opts.o.debug:
print(f"Configuring ACME email: {acme_email}")
core_api = client.CoreV1Api()
configmap = core_api.read_namespaced_config_map(
name="caddy-ingress-controller-configmap", namespace="caddy-system"
)
configmap.data["email"] = acme_email
core_api.patch_namespaced_config_map(
name="caddy-ingress-controller-configmap",
namespace="caddy-system",
body=configmap,
)
utils.create_from_yaml(api_client, yaml_file=ingress_install)
def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
@@ -393,10 +153,10 @@ def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
raise DeployerException(f"kind load docker-image failed: {result}")
def pods_in_deployment(core_api: client.CoreV1Api, deployment_name: str, namespace: str = "default"):
def pods_in_deployment(core_api: client.CoreV1Api, deployment_name: str):
pods = []
pod_response = core_api.list_namespaced_pod(
namespace=namespace, label_selector=f"app={deployment_name}"
namespace="default", label_selector=f"app={deployment_name}"
)
if opts.o.debug:
print(f"pod_response: {pod_response}")
@@ -406,10 +166,10 @@ def pods_in_deployment(core_api: client.CoreV1Api, deployment_name: str, namespa
return pods
def containers_in_pod(core_api: client.CoreV1Api, pod_name: str, namespace: str = "default") -> List[str]:
def containers_in_pod(core_api: client.CoreV1Api, pod_name: str) -> List[str]:
containers: List[str] = []
pod_response = cast(
client.V1Pod, core_api.read_namespaced_pod(pod_name, namespace=namespace)
client.V1Pod, core_api.read_namespaced_pod(pod_name, namespace="default")
)
if opts.o.debug:
print(f"pod_response: {pod_response}")
@@ -564,25 +324,6 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
volume_host_path_map = _get_host_paths_for_volumes(deployment_context)
seen_host_path_mounts = set() # Track to avoid duplicate mounts
# Cluster state backup for offline data recovery (unique per deployment)
# etcd contains all k8s state; PKI certs needed to decrypt etcd offline
deployment_id = deployment_context.id
backup_subdir = f"cluster-backups/{deployment_id}"
etcd_host_path = _make_absolute_host_path(
Path(f"./data/{backup_subdir}/etcd"), deployment_dir
)
volume_definitions.append(
f" - hostPath: {etcd_host_path}\n" f" containerPath: /var/lib/etcd\n"
)
pki_host_path = _make_absolute_host_path(
Path(f"./data/{backup_subdir}/pki"), deployment_dir
)
volume_definitions.append(
f" - hostPath: {pki_host_path}\n" f" containerPath: /etc/kubernetes/pki\n"
)
# Note: these paths are relative to the location of the pod files (at present),
# so we need to fix them up to be absolute, because kind resolves relative
# paths against the cwd.
@@ -942,41 +683,6 @@ def envs_from_compose_file(
return result
def translate_sidecar_service_names(
envs: Mapping[str, str], sibling_service_names: List[str]
) -> Mapping[str, str]:
"""Translate docker-compose service names to localhost for sidecar containers.
In docker-compose, services can reference each other by name (e.g., 'db:5432').
In Kubernetes, when multiple containers are in the same pod (sidecars), they
share the same network namespace and must use 'localhost' instead.
This function replaces service name references with 'localhost' in env values.
"""
import re
if not sibling_service_names:
return envs
result = {}
for env_var, env_val in envs.items():
if env_val is None:
result[env_var] = env_val
continue
new_val = str(env_val)
for service_name in sibling_service_names:
# Match service name followed by optional port (e.g., 'db:5432', 'db')
# Handle URLs like: postgres://user:pass@db:5432/dbname
# and simple refs like: db:5432 or just db
pattern = rf"\b{re.escape(service_name)}(:\d+)?\b"
new_val = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', new_val)
result[env_var] = new_val
return result
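The substitution above is self-contained regex work; a runnable sketch of the same logic applied to sample compose-style env values:

```python
import re
from typing import List, Mapping

def translate(envs: Mapping[str, str], siblings: List[str]) -> Mapping[str, str]:
    # Replace compose service-name references with localhost, preserving any
    # :port suffix, since sidecars share the pod's network namespace.
    out = {}
    for key, val in envs.items():
        if val is None:
            out[key] = val
            continue
        s = str(val)
        for name in siblings:
            pattern = rf"\b{re.escape(name)}(:\d+)?\b"
            s = re.sub(pattern, lambda m: f'localhost{m.group(1) or ""}', s)
        out[key] = s
    return out

envs = {"DATABASE_URL": "postgres://user:pass@db:5432/app", "CACHE": "redis:6379"}
print(translate(envs, ["db", "redis"]))
# {'DATABASE_URL': 'postgres://user:pass@localhost:5432/app', 'CACHE': 'localhost:6379'}
```

Note the word boundaries: `db` is rewritten in `...@db:5432/...` but a name like `dbname` is left alone.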
def envs_from_environment_variables_map(
map: Mapping[str, str]
) -> List[client.V1EnvVar]:

View File

@@ -98,99 +98,25 @@ class Spec:
def get_image_registry(self):
return self.obj.get(constants.image_registry_key)
def get_image_registry_config(self) -> typing.Optional[typing.Dict]:
"""Returns registry auth config: {server, username, token-env}.
Used for private container registries like GHCR. The token-env field
specifies an environment variable containing the API token/PAT.
Note: Uses 'registry-credentials' key to avoid collision with
'image-registry' key which is for pushing images.
"""
return self.obj.get("registry-credentials")
def get_volumes(self):
return self.obj.get(constants.volumes_key, {})
def get_configmaps(self):
return self.obj.get(constants.configmaps_key, {})
def get_secrets(self):
return self.obj.get(constants.secrets_key, {})
def get_container_resources(self):
return Resources(
self.obj.get(constants.resources_key, {}).get("containers", {})
)
def get_container_resources_for(
self, container_name: str
) -> typing.Optional[Resources]:
"""Look up per-container resource overrides from spec.yml.
Checks resources.containers.<container_name> in the spec. Returns None
if no per-container override exists (caller falls back to other sources).
"""
containers_block = self.obj.get(constants.resources_key, {}).get(
"containers", {}
)
if container_name in containers_block:
entry = containers_block[container_name]
# Only treat it as a per-container override if it's a dict with
# reservations/limits nested inside (not a top-level global key)
if isinstance(entry, dict) and (
"reservations" in entry or "limits" in entry
):
return Resources(entry)
return None
def get_volume_resources(self):
return Resources(
self.obj.get(constants.resources_key, {}).get(constants.volumes_key, {})
)
def get_volume_resources_for(self, volume_name: str) -> typing.Optional[Resources]:
"""Look up per-volume resource overrides from spec.yml.
Supports two formats under resources.volumes:
Global (original):
resources:
volumes:
reservations:
storage: 5Gi
Per-volume (new):
resources:
volumes:
my-volume:
reservations:
storage: 10Gi
Returns the per-volume Resources if found, otherwise None.
The caller should fall back to get_volume_resources() then the default.
"""
vol_section = (
self.obj.get(constants.resources_key, {}).get(constants.volumes_key, {})
)
if volume_name not in vol_section:
return None
entry = vol_section[volume_name]
if isinstance(entry, dict) and (
"reservations" in entry or "limits" in entry
):
return Resources(entry)
return None
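The global-vs-per-volume disambiguation can be sketched directly against the two spec shapes from the docstring, with dict literals standing in for the parsed spec.yml (helper name hypothetical):

```python
# resources.volumes may mix a global reservations block with per-volume entries.
vol_section = {
    "reservations": {"storage": "5Gi"},                   # global default
    "my-volume": {"reservations": {"storage": "10Gi"}},   # per-volume override
}

def volume_override(section: dict, name: str):
    # Only treat an entry as an override if it nests reservations/limits;
    # otherwise the caller falls back to the global block, then the default.
    entry = section.get(name)
    if isinstance(entry, dict) and ("reservations" in entry or "limits" in entry):
        return entry
    return None

print(volume_override(vol_section, "my-volume"))  # {'reservations': {'storage': '10Gi'}}
print(volume_override(vol_section, "other"))      # None
```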
def get_http_proxy(self):
return self.obj.get(constants.network_key, {}).get(constants.http_proxy_key, [])
def get_namespace(self):
return self.obj.get("namespace")
def get_kind_cluster_name(self):
return self.obj.get("kind-cluster-name")
def get_annotations(self):
return self.obj.get(constants.annotations_key, {})
@@ -253,9 +179,6 @@ class Spec:
def get_deployment_type(self):
return self.obj.get(constants.deploy_to_key)
def get_acme_email(self):
return self.obj.get(constants.network_key, {}).get(constants.acme_email_key, "")
def is_kubernetes_deployment(self):
return self.get_deployment_type() in [
constants.k8s_kind_deploy_type,

View File

@@ -19,7 +19,7 @@ from pathlib import Path
from urllib.parse import urlparse
from tempfile import NamedTemporaryFile
from stack_orchestrator.util import error_exit, global_options2, get_yaml
from stack_orchestrator.util import error_exit, global_options2
from stack_orchestrator.deploy.deployment_create import init_operation, create_operation
from stack_orchestrator.deploy.deploy import create_deploy_context
from stack_orchestrator.deploy.deploy_types import DeployCommandContext
@@ -41,23 +37,19 @@ def _fixup_container_tag(deployment_dir: str, image: str):
def _fixup_url_spec(spec_file_name: str, url: str):
# url is like: https://example.com/path
parsed_url = urlparse(url)
http_proxy_spec = f"""
http-proxy:
- host-name: {parsed_url.hostname}
routes:
- path: '{parsed_url.path if parsed_url.path else "/"}'
proxy-to: webapp:80
"""
spec_file_path = Path(spec_file_name)
yaml = get_yaml()
with open(spec_file_path) as rfile:
contents = yaml.load(rfile)
contents.setdefault("network", {})["http-proxy"] = [
{
"host-name": parsed_url.hostname,
"routes": [
{
"path": parsed_url.path if parsed_url.path else "/",
"proxy-to": "webapp:80",
}
],
}
]
contents = rfile.read()
contents = contents + http_proxy_spec
with open(spec_file_path, "w") as wfile:
yaml.dump(contents, wfile)
wfile.write(contents)
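The structured `network.http-proxy` entry can be built directly from the URL with `urlparse`; a minimal sketch of that mapping (helper name hypothetical, target `webapp:80` as in the code above):

```python
from urllib.parse import urlparse

def http_proxy_entry(url: str) -> dict:
    # Map a deployment URL onto the spec's http-proxy structure.
    p = urlparse(url)
    return {
        "host-name": p.hostname,
        "routes": [{"path": p.path if p.path else "/", "proxy-to": "webapp:80"}],
    }

print(http_proxy_entry("https://example.com/path"))
# {'host-name': 'example.com', 'routes': [{'path': '/path', 'proxy-to': 'webapp:80'}]}
```

A URL with no path gets the `/` route, matching the fallback in the code above.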
def create_deployment(

View File

@@ -75,8 +75,6 @@ def get_parsed_stack_config(stack):
def get_pod_list(parsed_stack):
# Handle both old and new format
if "pods" not in parsed_stack or not parsed_stack["pods"]:
return []
pods = parsed_stack["pods"]
if type(pods[0]) is str:
result = pods
@@ -105,7 +103,7 @@ def get_job_list(parsed_stack):
def get_plugin_code_paths(stack) -> List[Path]:
parsed_stack = get_parsed_stack_config(stack)
pods = parsed_stack.get("pods") or []
pods = parsed_stack["pods"]
result: Set[Path] = set()
for pod in pods:
if type(pod) is str:
@@ -155,16 +153,15 @@ def resolve_job_compose_file(stack, job_name: str):
if proposed_file.exists():
return proposed_file
# If we don't find it fall through to the internal case
data_dir = Path(__file__).absolute().parent.joinpath("data")
compose_jobs_base = data_dir.joinpath("compose-jobs")
# TODO: Add internal compose-jobs directory support if needed
# For now, jobs are expected to be in external stacks only
compose_jobs_base = Path(stack).parent.parent.joinpath("compose-jobs")
return compose_jobs_base.joinpath(f"docker-compose-{job_name}.yml")
def get_pod_file_path(stack, parsed_stack, pod_name: str):
pods = parsed_stack.get("pods") or []
pods = parsed_stack["pods"]
result = None
if not pods:
return result
if type(pods[0]) is str:
result = resolve_compose_file(stack, pod_name)
else:
@@ -192,9 +189,9 @@ def get_job_file_path(stack, parsed_stack, job_name: str):
def get_pod_script_paths(parsed_stack, pod_name: str):
pods = parsed_stack.get("pods") or []
pods = parsed_stack["pods"]
result = []
if not pods or not type(pods[0]) is str:
if not type(pods[0]) is str:
for pod in pods:
if pod["name"] == pod_name:
pod_root_dir = os.path.join(
@@ -210,9 +207,9 @@ def get_pod_script_paths(parsed_stack, pod_name: str):
def pod_has_scripts(parsed_stack, pod_name: str):
pods = parsed_stack.get("pods") or []
pods = parsed_stack["pods"]
result = False
if not pods or type(pods[0]) is str:
if type(pods[0]) is str:
result = False
else:
for pod in pods:

View File

@@ -105,15 +105,6 @@ fi
# Add a config file to be picked up by the ConfigMap before starting.
echo "dbfc7a4d-44a7-416d-b5f3-29842cc47650" > $test_deployment_dir/configmaps/test-config/test_config
# Add secrets to the deployment spec (references a pre-existing k8s Secret by name).
# deploy init already writes an empty 'secrets: {}' key, so we replace it
# rather than appending (ruamel.yaml rejects duplicate keys).
deployment_spec_file=${test_deployment_dir}/spec.yml
sed -i 's/^secrets: {}$/secrets:\n test-secret:\n - TEST_SECRET_KEY/' ${deployment_spec_file}
# Get the deployment ID for kubectl queries
deployment_id=$(cat ${test_deployment_dir}/deployment.yml | cut -d ' ' -f 2)
echo "deploy create output file test: passed"
# Try to start the deployment
$TEST_TARGET_SO deployment --dir $test_deployment_dir start
@@ -175,71 +166,12 @@ else
delete_cluster_exit
fi
# --- New feature tests: namespace, labels, jobs, secrets ---
# Check that the pod is in the deployment-specific namespace (not default)
ns_pod_count=$(kubectl get pods -n laconic-${deployment_id} -l app=${deployment_id} --no-headers 2>/dev/null | wc -l)
if [ "$ns_pod_count" -gt 0 ]; then
echo "namespace isolation test: passed"
else
echo "namespace isolation test: FAILED"
echo "Expected pod in namespace laconic-${deployment_id}"
delete_cluster_exit
fi
# Check that the stack label is set on the pod
stack_label_count=$(kubectl get pods -n laconic-${deployment_id} -l app.kubernetes.io/stack=test --no-headers 2>/dev/null | wc -l)
if [ "$stack_label_count" -gt 0 ]; then
echo "stack label test: passed"
else
echo "stack label test: FAILED"
delete_cluster_exit
fi
# Check that the job completed successfully
for i in {1..30}; do
job_status=$(kubectl get job ${deployment_id}-job-test-job -n laconic-${deployment_id} -o jsonpath='{.status.succeeded}' 2>/dev/null || true)
if [ "$job_status" == "1" ]; then
break
fi
sleep 2
done
if [ "$job_status" == "1" ]; then
echo "job completion test: passed"
else
echo "job completion test: FAILED"
echo "Job status.succeeded: ${job_status}"
delete_cluster_exit
fi
# Check that the secrets spec results in an envFrom secretRef on the pod
secret_ref=$(kubectl get pod -n laconic-${deployment_id} -l app=${deployment_id} \
-o jsonpath='{.items[0].spec.containers[0].envFrom[?(@.secretRef.name=="test-secret")].secretRef.name}' 2>/dev/null || true)
if [ "$secret_ref" == "test-secret" ]; then
echo "secrets envFrom test: passed"
else
echo "secrets envFrom test: FAILED"
echo "Expected secretRef 'test-secret', got: ${secret_ref}"
delete_cluster_exit
fi
# Stop then start again and check the volume was preserved.
# Use --skip-cluster-management to reuse the existing kind cluster instead of
# destroying and recreating it (which fails on CI runners due to stale etcd/certs
# and cgroup detection issues).
# Use --delete-volumes to clear PVs so fresh PVCs can bind on restart.
# Bind-mount data survives on the host filesystem; provisioner volumes are recreated fresh.
$TEST_TARGET_SO deployment --dir $test_deployment_dir stop --delete-volumes --skip-cluster-management
# Wait for the namespace to be fully terminated before restarting.
# Without this, 'start' fails with 403 Forbidden because the namespace
# is still in Terminating state.
for i in {1..60}; do
if ! kubectl get namespace laconic-${deployment_id} 2>/dev/null | grep -q .; then
break
fi
sleep 2
done
$TEST_TARGET_SO deployment --dir $test_deployment_dir start --skip-cluster-management
# Stop then start again and check the volume was preserved
$TEST_TARGET_SO deployment --dir $test_deployment_dir stop
# Sleep a bit just in case
# sleep for longer to check if that's why the subsequent create cluster fails
sleep 20
$TEST_TARGET_SO deployment --dir $test_deployment_dir start
wait_for_pods_started
wait_for_log_output
sleep 1
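The polling loops added above (job completion, namespace termination) share one shape: retry a probe command with a fixed delay until it succeeds or attempts run out. A generic helper capturing that shape (a sketch, not part of the diff; the `wait_for` name is hypothetical):

```shell
# Retry a command until it succeeds: wait_for <tries> <delay-seconds> <cmd...>
# Returns 0 on the first success, 1 if all attempts fail.
wait_for() {
  local tries="$1" delay="$2"; shift 2
  local i
  for ((i = 0; i < tries; i++)); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}

# e.g. the namespace-termination wait could be expressed as:
#   wait_for 60 2 bash -c "! kubectl get namespace laconic-${deployment_id} 2>/dev/null | grep -q ."
```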
@@ -252,9 +184,8 @@ else
delete_cluster_exit
fi
# Provisioner volumes are destroyed when PVs are deleted (--delete-volumes on stop).
# Unlike bind-mount volumes whose data persists on the host, provisioner storage
# is gone, so the volume appears fresh after restart.
# These volumes will be completely destroyed by the kind delete/create, because they lived inside
# the kind container. So, unlike the bind-mount case, they will appear fresh after the restart.
log_output_11=$( $TEST_TARGET_SO deployment --dir $test_deployment_dir logs )
if [[ "$log_output_11" == *"/data2 filesystem is fresh"* ]]; then
echo "Fresh provisioner volumes test: passed"

View File

@@ -206,7 +206,7 @@ fi
# The deployment's pod should be scheduled onto node: worker3
# Check that's what happened
# Get the node onto which the stack pod has been deployed
deployment_node=$(kubectl get pods -n laconic-${deployment_id} -l app=${deployment_id} -o=jsonpath='{.items..spec.nodeName}')
deployment_node=$(kubectl get pods -l app=${deployment_id} -o=jsonpath='{.items..spec.nodeName}')
expected_node=${deployment_id}-worker3
echo "Stack pod deployed to node: ${deployment_node}"
if [[ ${deployment_node} == ${expected_node} ]]; then

View File

@@ -1,5 +1,5 @@
#!/usr/bin/env bash
# TODO: handle ARM
curl --silent -Lo ./kind https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-amd64
curl --silent -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin
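The `TODO: handle ARM` above could be addressed by mapping `uname -m` onto kind's release artifact names. A sketch only: the URL pattern is taken from the existing download line, and the actual fetch is left commented out:

```shell
ARCH=$(uname -m)
case "$ARCH" in
  x86_64)        KIND_ARCH=amd64 ;;
  aarch64|arm64) KIND_ARCH=arm64 ;;
  *) echo "unsupported architecture: $ARCH" >&2; exit 1 ;;
esac
KIND_URL="https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-${KIND_ARCH}"
echo "would fetch: $KIND_URL"
# curl --silent -Lo ./kind "$KIND_URL" && chmod +x ./kind && mv ./kind /usr/local/bin
```

The same `uname -m` mapping would apply to the kubectl download in the next file, whose URL ends in `linux/amd64/kubectl`.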

View File

@@ -1,6 +1,5 @@
#!/usr/bin/env bash
# TODO: handle ARM
# Pin kubectl to match Kind's default k8s version (v1.31.x)
curl --silent -LO "https://dl.k8s.io/release/v1.31.2/bin/linux/amd64/kubectl"
curl --silent -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
mv ./kubectl /usr/local/bin

View File

@@ -1,53 +0,0 @@
#!/bin/bash
# Run a test suite locally in an isolated venv.
#
# Usage:
# ./tests/scripts/run-test-local.sh <test-script>
#
# Examples:
# ./tests/scripts/run-test-local.sh tests/webapp-test/run-webapp-test.sh
# ./tests/scripts/run-test-local.sh tests/smoke-test/run-smoke-test.sh
# ./tests/scripts/run-test-local.sh tests/k8s-deploy/run-deploy-test.sh
#
# The script creates a temporary venv, installs shiv, builds the laconic-so
# package, runs the requested test, then cleans up.
set -euo pipefail
if [ $# -lt 1 ]; then
echo "Usage: $0 <test-script> [args...]"
exit 1
fi
TEST_SCRIPT="$1"
shift
if [ ! -f "$TEST_SCRIPT" ]; then
echo "Error: $TEST_SCRIPT not found"
exit 1
fi
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
VENV_DIR=$(mktemp -d /tmp/so-test-XXXXXX)
cleanup() {
echo "Cleaning up venv: $VENV_DIR"
rm -rf "$VENV_DIR"
}
trap cleanup EXIT
cd "$REPO_DIR"
echo "==> Creating venv in $VENV_DIR"
python3 -m venv "$VENV_DIR"
source "$VENV_DIR/bin/activate"
echo "==> Installing shiv"
pip install -q shiv
echo "==> Building laconic-so package"
./scripts/create_build_tag_file.sh
./scripts/build_shiv_package.sh
echo "==> Running: $TEST_SCRIPT $*"
exec "./$TEST_SCRIPT" "$@"