forked from cerc-io/stack-orchestrator

Merge pull request 'feat(k8s): ACME email fix, etcd persistence, volume paths' (#986) from fix-caddy-acme-email-rbac into main

Reviewed-on: cerc-io/stack-orchestrator#986

commit 21d47908cc

CLAUDE.md (71 lines changed)
@@ -8,6 +8,7 @@ NEVER assume your hypotheses are true without evidence

ALWAYS clearly state when something is a hypothesis
ALWAYS use evidence from the systems you're interacting with to support your claims and hypotheses
+ALWAYS run `pre-commit run --all-files` before committing changes

## Key Principles

@@ -43,6 +44,76 @@ This project follows principles inspired by literate programming, where development

This approach treats the human-AI collaboration as a form of **conversational literate programming** where understanding emerges through dialogue before code implementation.

+## External Stacks Preferred
+
+When creating new stacks for any reason, **use the external stack pattern** rather than adding stacks directly to this repository.
+
+External stacks follow this structure:
+
+```
+my-stack/
+└── stack-orchestrator/
+    ├── stacks/
+    │   └── my-stack/
+    │       ├── stack.yml
+    │       └── README.md
+    ├── compose/
+    │   └── docker-compose-my-stack.yml
+    └── config/
+        └── my-stack/
+            └── (config files)
+```
+
+### Usage
+
+```bash
+# Fetch external stack
+laconic-so fetch-stack github.com/org/my-stack
+
+# Use external stack
+STACK_PATH=~/cerc/my-stack/stack-orchestrator/stacks/my-stack
+laconic-so --stack $STACK_PATH deploy init --output spec.yml
+laconic-so --stack $STACK_PATH deploy create --spec-file spec.yml --deployment-dir deployment
+laconic-so deployment --dir deployment start
+```
+
+### Examples
+
+- `zenith-karma-stack` - Karma watcher deployment
+- `urbit-stack` - Fake Urbit ship for testing
+- `zenith-desk-stack` - Desk deployment stack
+
+## Architecture: k8s-kind Deployments
+
+### One Cluster Per Host
+
+One Kind cluster per host by design. Never request or expect separate clusters.
+
+- `create_cluster()` in `helpers.py` reuses any existing cluster
+- `cluster-id` in deployment.yml is an identifier, not a cluster request
+- All deployments share: ingress controller, etcd, certificates
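A quick way to observe this on a running host (the cluster name shown is illustrative; actual names use a generated hex id):

```bash
# However many deployments exist, kind should report a single shared cluster
kind get clusters
# laconic-1a2b3c4d5e6f7a8b
```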

+### Stack Resolution
+
+- External stacks detected via `Path(stack).exists()` in `util.py`
+- Config/compose resolution: external path first, then internal fallback
+- External path structure: `stack_orchestrator/data/stacks/<name>/stack.yml`
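A minimal self-contained sketch of that resolution order (the constant and fallback location are illustrative, not the actual util.py code):

```python
from pathlib import Path

INTERNAL_STACKS = Path("stack_orchestrator/data/stacks")  # illustrative location

def resolve_stack(stack: str) -> Path:
    # External stack: the argument is an existing filesystem path
    if Path(stack).exists():
        return Path(stack)
    # Internal fallback: a named stack shipped with the tool
    return INTERNAL_STACKS / stack
```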

+### Secret Generation Implementation
+
+- `GENERATE_TOKEN_PATTERN` in `deployment_create.py` matches `$generate:type:length$`
+- `_generate_and_store_secrets()` creates K8s Secret
+- `cluster_info.py` adds `envFrom` with `secretRef` to containers
+- Non-secret config written to `config.env`
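For example (the regex below is the one defined later in this commit; the sample value is illustrative):

```python
import re

# Same regex as GENERATE_TOKEN_PATTERN in deployment_create.py
pattern = re.compile(r"\$generate:(\w+):(\d+)\$")

match = pattern.search("$generate:hex:32$")
assert match is not None
secret_type, length = match.group(1), int(match.group(2))
# -> ("hex", 32): a 32-byte hex token is generated and stored in a K8s Secret
```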

+### Repository Cloning
+
+`setup-repositories --git-ssh` clones repos defined in stack.yml's `repos:` field. Requires SSH agent.
+
+### Key Files (for codebase navigation)
+
+- `repos/setup_repositories.py`: `setup-repositories` command (git clone)
+- `deployment_create.py`: `deploy create` command, secret generation
+- `deployment.py`: `deployment start/stop/restart` commands
+- `deploy_k8s.py`: K8s deployer, cluster management calls
+- `helpers.py`: `create_cluster()`, etcd cleanup, kind operations
+- `cluster_info.py`: K8s resource generation (Deployment, Service, Ingress)
+
## Insights and Observations

### Design Principles
README.md (53 lines changed)

@@ -71,6 +71,59 @@ The various [stacks](/stack_orchestrator/data/stacks) each contain instructions

- [laconicd with console and CLI](stack_orchestrator/data/stacks/fixturenet-laconic-loaded)
- [kubo (IPFS)](stack_orchestrator/data/stacks/kubo)

+## Deployment Types
+
+- **compose**: Docker Compose on local machine
+- **k8s**: External Kubernetes cluster (requires kubeconfig)
+- **k8s-kind**: Local Kubernetes via Kind - one cluster per host, shared by all deployments
+
+## External Stacks
+
+Stacks can live in external git repositories. Required structure:
+
+```
+<repo>/
+  stack_orchestrator/data/
+    stacks/<stack-name>/stack.yml
+    compose/docker-compose-<pod-name>.yml
+  deployment/spec.yml
+```
+
+## Deployment Commands
+
+```bash
+# Create deployment from spec
+laconic-so --stack <path> deploy create --spec-file <spec.yml> --deployment-dir <dir>
+
+# Start (creates cluster on first run)
+laconic-so deployment --dir <dir> start
+
+# GitOps restart (git pull + redeploy, preserves data)
+laconic-so deployment --dir <dir> restart
+
+# Stop
+laconic-so deployment --dir <dir> stop
+```
+
+## spec.yml Reference
+
+```yaml
+stack: stack-name-or-path
+deploy-to: k8s-kind
+network:
+  http-proxy:
+    - host-name: app.example.com
+      routes:
+        - path: /
+          proxy-to: service-name:port
+acme-email: admin@example.com
+config:
+  ENV_VAR: value
+  SECRET_VAR: $generate:hex:32$  # Auto-generated, stored in K8s Secret
+volumes:
+  volume-name:
+```
+
## Contributing

See the [CONTRIBUTING.md](/docs/CONTRIBUTING.md) for developer mode install.
docs/deployment_patterns.md (new file, 202 lines)

@@ -0,0 +1,202 @@

# Deployment Patterns

## GitOps Pattern

For production deployments, we recommend a GitOps approach where your deployment configuration is tracked in version control.

### Overview

- **spec.yml is your source of truth**: Maintain it in your operator repository
- **Don't regenerate on every restart**: Run `deploy init` once, then customize and commit
- **Use restart for updates**: The restart command respects your git-tracked spec.yml

### Workflow

1. **Initial setup**: Run `deploy init` once to generate a spec.yml template
2. **Customize and commit**: Edit spec.yml with your configuration (hostnames, resources, etc.) and commit to your operator repo
3. **Deploy from git**: Use the committed spec.yml for deployments
4. **Update via git**: Make changes in git, then restart to apply

```bash
# Initial setup (run once)
laconic-so --stack my-stack deploy init --output spec.yml

# Customize for your environment
vim spec.yml  # Set hostname, resources, etc.

# Commit to your operator repository
git add spec.yml
git commit -m "Add my-stack deployment configuration"
git push

# On deployment server: deploy from git-tracked spec
laconic-so deploy create \
  --spec-file /path/to/operator-repo/spec.yml \
  --deployment-dir my-deployment

laconic-so deployment --dir my-deployment start
```

### Updating Deployments

When you need to update a deployment:

```bash
# 1. Make changes in your operator repo
vim /path/to/operator-repo/spec.yml
git commit -am "Update configuration"
git push

# 2. On deployment server: pull and restart
cd /path/to/operator-repo && git pull
laconic-so deployment --dir my-deployment restart
```

The `restart` command:

- Pulls latest code from the stack repository
- Uses your git-tracked spec.yml (does NOT regenerate from defaults)
- Syncs the deployment directory
- Restarts services

### Anti-patterns

**Don't do this:**

```bash
# BAD: Regenerating spec on every deployment
laconic-so --stack my-stack deploy init --output spec.yml
laconic-so deploy create --spec-file spec.yml ...
```

This overwrites your customizations with defaults from the stack's `commands.py`.

**Do this instead:**

```bash
# GOOD: Use your git-tracked spec
git pull  # Get latest spec.yml from your operator repo
laconic-so deployment --dir my-deployment restart
```

## Private Registry Authentication

For deployments using images from private container registries (e.g., GitHub Container Registry), configure authentication in your spec.yml.

### Configuration

Add a `registry-credentials` section to your spec.yml:

```yaml
registry-credentials:
  server: ghcr.io
  username: your-org-or-username
  token-env: REGISTRY_TOKEN
```

**Fields:**

- `server`: The registry hostname (e.g., `ghcr.io`, `docker.io`, `gcr.io`)
- `username`: Registry username (for GHCR, use your GitHub username or org name)
- `token-env`: Name of the environment variable containing your API token/PAT

### Token Environment Variable

The `token-env` pattern keeps credentials out of version control. Set the environment variable when running `deployment start`:

```bash
export REGISTRY_TOKEN="your-personal-access-token"
laconic-so deployment --dir my-deployment start
```

For GHCR, create a Personal Access Token (PAT) with `read:packages` scope.

### Ansible Integration

When using Ansible for deployments, pass the token from a credentials file:

```yaml
- name: Start deployment
  ansible.builtin.command:
    cmd: laconic-so deployment --dir {{ deployment_dir }} start
  environment:
    REGISTRY_TOKEN: "{{ lookup('file', '~/.credentials/ghcr_token') }}"
```

### How It Works

1. laconic-so reads the `registry-credentials` config from spec.yml
2. Creates a Kubernetes `docker-registry` secret named `{deployment}-registry`
3. The deployment's pods reference this secret for image pulls
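To confirm the secret was created (the deployment name is a placeholder):

```bash
kubectl get secret my-deployment-registry -o jsonpath='{.type}'
# kubernetes.io/dockerconfigjson
```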

## Cluster and Volume Management

### Stopping Deployments

The `deployment stop` command has two important flags:

```bash
# Default: stops deployment, deletes cluster, PRESERVES volumes
laconic-so deployment --dir my-deployment stop

# Explicitly delete volumes (USE WITH CAUTION)
laconic-so deployment --dir my-deployment stop --delete-volumes
```

### Volume Persistence

Volumes persist across cluster deletion by design. This is important because:

- **Data survives cluster recreation**: Ledger data, databases, and other state are preserved
- **Faster recovery**: No need to re-sync or rebuild data after cluster issues
- **Safe cluster upgrades**: Delete and recreate cluster without data loss

**Only use `--delete-volumes` when:**

- You explicitly want to start fresh with no data
- The user specifically requests volume deletion
- You're cleaning up a test/dev environment completely

### Shared Cluster Architecture

In kind deployments, multiple stacks share a single cluster:

- First `deployment start` creates the cluster
- Subsequent deployments reuse the existing cluster
- `deployment stop` on ANY deployment deletes the shared cluster
- Other deployments will fail until cluster is recreated

To stop a single deployment without affecting the cluster:

```bash
laconic-so deployment --dir my-deployment stop --skip-cluster-management
```

## Volume Persistence in k8s-kind

k8s-kind has 3 storage layers:

- **Docker Host**: The physical server running Docker
- **Kind Node**: A Docker container simulating a k8s node
- **Pod Container**: Your workload

For k8s-kind, volumes with paths are mounted from Docker Host → Kind Node → Pod via extraMounts.

| spec.yml volume  | Storage Location | Survives Pod Restart | Survives Cluster Restart |
|------------------|------------------|----------------------|--------------------------|
| `vol:` (empty)   | Kind Node PVC    | ✅                   | ❌                       |
| `vol: ./data/x`  | Docker Host      | ✅                   | ✅                       |
| `vol: /abs/path` | Docker Host      | ✅                   | ✅                       |

**Recommendation**: Always use paths for data you want to keep. Relative paths (e.g., `./data/rpc-config`) resolve to `$DEPLOYMENT_DIR/data/rpc-config` on the Docker Host.

### Example

```yaml
# In spec.yml
volumes:
  rpc-config: ./data/rpc-config  # Persists to $DEPLOYMENT_DIR/data/rpc-config
  chain-data: ./data/chain       # Persists to $DEPLOYMENT_DIR/data/chain
  temp-cache:                    # Empty = Kind Node PVC (lost on cluster delete)
```

### The Antipattern

Empty-path volumes appear persistent because they survive pod restarts (data lives in Kind Node container). However, this data is lost when the kind cluster is recreated. This "false persistence" has caused data loss when operators assumed their data was safe.
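A quick way to check where a volume's data actually lives (`<kind-node>` is a placeholder for the kind node container; the `/mnt/<volume-name>` location matches the kind PV layout used later in this commit):

```bash
ls "$DEPLOYMENT_DIR/data/"        # host-path volumes: survive cluster deletion
docker exec <kind-node> ls /mnt/  # node-local PVC data: lost with the cluster
```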

@@ -44,3 +44,4 @@ unlimited_memlock_key = "unlimited-memlock"
runtime_class_key = "runtime-class"
high_memlock_runtime = "high-memlock"
high_memlock_spec_filename = "high-memlock-spec.json"
+acme_email_key = "acme-email"
@@ -93,6 +93,7 @@ rules:
  - get
  - create
  - update
+ - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
@@ -15,7 +15,9 @@

import click
from pathlib import Path
+import subprocess
import sys
+import time
from stack_orchestrator import constants
from stack_orchestrator.deploy.images import push_images_operation
from stack_orchestrator.deploy.deploy import (
@@ -228,3 +230,176 @@ def run_job(ctx, job_name, helm_release):

    ctx.obj = make_deploy_context(ctx)
    run_job_operation(ctx, job_name, helm_release)
+
+
+@command.command()
+@click.option("--stack-path", help="Path to stack git repo (overrides stored path)")
+@click.option(
+    "--spec-file", help="Path to GitOps spec.yml in repo (e.g., deployment/spec.yml)"
+)
+@click.option("--config-file", help="Config file to pass to deploy init")
+@click.option(
+    "--force",
+    is_flag=True,
+    default=False,
+    help="Skip DNS verification",
+)
+@click.option(
+    "--expected-ip",
+    help="Expected IP for DNS verification (if different from egress)",
+)
+@click.pass_context
+def restart(ctx, stack_path, spec_file, config_file, force, expected_ip):
+    """Pull latest code and restart deployment using git-tracked spec.
+
+    GitOps workflow:
+    1. Operator maintains spec.yml in their git repository
+    2. This command pulls latest code (including updated spec.yml)
+    3. If hostname changed, verifies DNS routes to this server
+    4. Syncs deployment directory with the git-tracked spec
+    5. Stops and restarts the deployment
+
+    Data volumes are always preserved. The cluster is never destroyed.
+
+    Stack source resolution (in order):
+    1. --stack-path argument (if provided)
+    2. stack-source field in deployment.yml (if stored)
+    3. Error if neither available
+
+    Note: spec.yml should be maintained in git, not regenerated from
+    commands.py on each restart. Use 'deploy init' only for initial
+    spec generation, then customize and commit to your operator repo.
+    """
+    from stack_orchestrator.util import get_yaml, get_parsed_deployment_spec
+    from stack_orchestrator.deploy.deployment_create import create_operation
+    from stack_orchestrator.deploy.dns_probe import verify_dns_via_probe
+
+    deployment_context: DeploymentContext = ctx.obj
+
+    # Get current spec info (before git pull)
+    current_spec = deployment_context.spec
+    current_http_proxy = current_spec.get_http_proxy()
+    current_hostname = (
+        current_http_proxy[0]["host-name"] if current_http_proxy else None
+    )
+
+    # Resolve stack source path
+    if stack_path:
+        stack_source = Path(stack_path).resolve()
+    else:
+        # Try to get from deployment.yml
+        deployment_file = (
+            deployment_context.deployment_dir / constants.deployment_file_name
+        )
+        deployment_data = get_yaml().load(open(deployment_file))
+        stack_source_str = deployment_data.get("stack-source")
+        if not stack_source_str:
+            print(
+                "Error: No stack-source in deployment.yml and --stack-path not provided"
+            )
+            print("Use --stack-path to specify the stack git repository location")
+            sys.exit(1)
+        stack_source = Path(stack_source_str)
+
+    if not stack_source.exists():
+        print(f"Error: Stack source path does not exist: {stack_source}")
+        sys.exit(1)
+
+    print("=== Deployment Restart ===")
+    print(f"Deployment dir: {deployment_context.deployment_dir}")
+    print(f"Stack source: {stack_source}")
+    print(f"Current hostname: {current_hostname}")
+
+    # Step 1: Git pull (brings in updated spec.yml from operator's repo)
+    print("\n[1/4] Pulling latest code from stack repository...")
+    git_result = subprocess.run(
+        ["git", "pull"], cwd=stack_source, capture_output=True, text=True
+    )
+    if git_result.returncode != 0:
+        print(f"Git pull failed: {git_result.stderr}")
+        sys.exit(1)
+    print(f"Git pull: {git_result.stdout.strip()}")
+
+    # Determine spec file location
+    # Priority: --spec-file argument > repo's deployment/spec.yml > deployment dir
+    # Stack path is like: repo/stack_orchestrator/data/stacks/stack-name
+    # So repo root is 4 parents up
+    repo_root = stack_source.parent.parent.parent.parent
+    if spec_file:
+        # Spec file relative to repo root
+        spec_file_path = repo_root / spec_file
+    else:
+        # Try standard GitOps location in repo
+        gitops_spec = repo_root / "deployment" / "spec.yml"
+        if gitops_spec.exists():
+            spec_file_path = gitops_spec
+        else:
+            # Fall back to deployment directory
+            spec_file_path = deployment_context.deployment_dir / "spec.yml"
+
+    if not spec_file_path.exists():
+        print(f"Error: spec.yml not found at {spec_file_path}")
+        print("For GitOps, add spec.yml to your repo at deployment/spec.yml")
+        print("Or specify --spec-file with path relative to repo root")
+        sys.exit(1)
+
+    print(f"Using spec: {spec_file_path}")
+
+    # Parse spec to check for hostname changes
+    new_spec_obj = get_parsed_deployment_spec(str(spec_file_path))
+    new_http_proxy = new_spec_obj.get("network", {}).get("http-proxy", [])
+    new_hostname = new_http_proxy[0]["host-name"] if new_http_proxy else None
+
+    print(f"Spec hostname: {new_hostname}")
+
+    # Step 2: DNS verification (only if hostname changed)
+    if new_hostname and new_hostname != current_hostname:
+        print(f"\n[2/4] Hostname changed: {current_hostname} -> {new_hostname}")
+        if force:
+            print("DNS verification skipped (--force)")
+        else:
+            print("Verifying DNS via probe...")
+            if not verify_dns_via_probe(new_hostname):
+                print(f"\nDNS verification failed for {new_hostname}")
+                print("Ensure DNS is configured before restarting.")
+                print("Use --force to skip this check.")
+                sys.exit(1)
+    else:
+        print("\n[2/4] Hostname unchanged, skipping DNS verification")
+
+    # Step 3: Sync deployment directory with spec
+    print("\n[3/4] Syncing deployment directory...")
+    deploy_ctx = make_deploy_context(ctx)
+    create_operation(
+        deployment_command_context=deploy_ctx,
+        spec_file=str(spec_file_path),
+        deployment_dir=str(deployment_context.deployment_dir),
+        update=True,
+        network_dir=None,
+        initial_peers=None,
+    )
+
+    # Reload deployment context with updated spec
+    deployment_context.init(deployment_context.deployment_dir)
+    ctx.obj = deployment_context
+
+    # Stop deployment
+    print("\n[4/4] Restarting deployment...")
+    ctx.obj = make_deploy_context(ctx)
+    down_operation(
+        ctx, delete_volumes=False, extra_args_list=[], skip_cluster_management=True
+    )
+
+    # Brief pause to ensure clean shutdown
+    time.sleep(5)
+
+    # Start deployment
+    up_operation(
+        ctx, services_list=None, stay_attached=False, skip_cluster_management=True
+    )
+
+    print("\n=== Restart Complete ===")
+    print("Deployment restarted with git-tracked configuration.")
+    if new_hostname and new_hostname != current_hostname:
+        print(f"\nNew hostname: {new_hostname}")
+        print("Caddy will automatically provision TLS certificate.")
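Illustrative invocations of the command added above (deployment dir and spec path are placeholders):

```bash
laconic-so deployment --dir my-deployment restart
laconic-so deployment --dir my-deployment restart --spec-file deployment/spec.yml
laconic-so deployment --dir my-deployment restart --force  # skip DNS verification
```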

@@ -15,9 +15,12 @@

import click
from importlib import util
+import json
import os
+import re
+import base64
from pathlib import Path
-from typing import List
+from typing import List, Optional
import random
from shutil import copy, copyfile, copytree, rmtree
from secrets import token_hex

@@ -484,15 +487,180 @@ def init_operation(
    get_yaml().dump(spec_file_content, output_file)


-def _write_config_file(spec_file: Path, config_env_file: Path):
+# Token pattern: $generate:hex:32$ or $generate:base64:16$
+GENERATE_TOKEN_PATTERN = re.compile(r"\$generate:(\w+):(\d+)\$")
+
+
+def _generate_and_store_secrets(config_vars: dict, deployment_name: str):
+    """Generate secrets for $generate:...$ tokens and store in K8s Secret.
+
+    Called by `deploy create` - generates fresh secrets and stores them.
+    Returns the generated secrets dict for reference.
+    """
+    from kubernetes import client, config as k8s_config
+
+    secrets = {}
+    for name, value in config_vars.items():
+        if not isinstance(value, str):
+            continue
+        match = GENERATE_TOKEN_PATTERN.search(value)
+        if not match:
+            continue
+
+        secret_type, length = match.group(1), int(match.group(2))
+        if secret_type == "hex":
+            secrets[name] = token_hex(length)
+        elif secret_type == "base64":
+            secrets[name] = base64.b64encode(os.urandom(length)).decode()
+        else:
+            secrets[name] = token_hex(length)
+
+    if not secrets:
+        return secrets
+
+    # Store in K8s Secret
+    try:
+        k8s_config.load_kube_config()
+    except Exception:
+        # Fall back to in-cluster config if available
+        try:
+            k8s_config.load_incluster_config()
+        except Exception:
+            print(
+                "Warning: Could not load kube config, secrets will not be stored in K8s"
+            )
+            return secrets
+
+    v1 = client.CoreV1Api()
+    secret_name = f"{deployment_name}-generated-secrets"
+    namespace = "default"
+
+    secret_data = {k: base64.b64encode(v.encode()).decode() for k, v in secrets.items()}
+    k8s_secret = client.V1Secret(
+        metadata=client.V1ObjectMeta(name=secret_name), data=secret_data, type="Opaque"
+    )
+
+    try:
+        v1.create_namespaced_secret(namespace, k8s_secret)
+        num_secrets = len(secrets)
+        print(f"Created K8s Secret '{secret_name}' with {num_secrets} secret(s)")
+    except client.exceptions.ApiException as e:
+        if e.status == 409:  # Already exists
+            v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
+            num_secrets = len(secrets)
+            print(f"Updated K8s Secret '{secret_name}' with {num_secrets} secret(s)")
+        else:
+            raise
+
+    return secrets
+
+
+def create_registry_secret(spec: Spec, deployment_name: str) -> Optional[str]:
+    """Create K8s docker-registry secret from spec + environment.
+
+    Reads registry configuration from spec.yml and creates a Kubernetes
+    secret of type kubernetes.io/dockerconfigjson for image pulls.
+
+    Args:
+        spec: The deployment spec containing image-registry config
+        deployment_name: Name of the deployment (used for secret naming)
+
+    Returns:
+        The secret name if created, None if no registry config
+    """
+    from kubernetes import client, config as k8s_config
+
+    registry_config = spec.get_image_registry_config()
+    if not registry_config:
+        return None
+
+    server = registry_config.get("server")
+    username = registry_config.get("username")
+    token_env = registry_config.get("token-env")
+
+    if not all([server, username, token_env]):
+        return None
+
+    # Type narrowing for pyright - we've validated these aren't None above
+    assert token_env is not None
+    token = os.environ.get(token_env)
+    if not token:
+        print(
+            f"Warning: Registry token env var '{token_env}' not set, "
+            "skipping registry secret"
+        )
+        return None
+
+    # Create dockerconfigjson format (Docker API uses "password" field for tokens)
+    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
+    docker_config = {
+        "auths": {server: {"username": username, "password": token, "auth": auth}}
+    }
+
+    # Secret name derived from deployment name
+    secret_name = f"{deployment_name}-registry"
+
+    # Load kube config
+    try:
+        k8s_config.load_kube_config()
+    except Exception:
+        try:
+            k8s_config.load_incluster_config()
+        except Exception:
+            print("Warning: Could not load kube config, registry secret not created")
+            return None
+
+    v1 = client.CoreV1Api()
+    namespace = "default"
+
+    k8s_secret = client.V1Secret(
+        metadata=client.V1ObjectMeta(name=secret_name),
+        data={
+            ".dockerconfigjson": base64.b64encode(
+                json.dumps(docker_config).encode()
+            ).decode()
+        },
+        type="kubernetes.io/dockerconfigjson",
+    )
+
+    try:
+        v1.create_namespaced_secret(namespace, k8s_secret)
+        print(f"Created registry secret '{secret_name}' for {server}")
+    except client.exceptions.ApiException as e:
+        if e.status == 409:  # Already exists
+            v1.replace_namespaced_secret(secret_name, namespace, k8s_secret)
+            print(f"Updated registry secret '{secret_name}' for {server}")
+        else:
+            raise
+
+    return secret_name
+
+
+def _write_config_file(
+    spec_file: Path, config_env_file: Path, deployment_name: Optional[str] = None
+):
    spec_content = get_parsed_deployment_spec(spec_file)
-    # Note: we want to write an empty file even if we have no config variables
+    config_vars = spec_content.get("config", {}) or {}
+
+    # Generate and store secrets in K8s if deployment_name provided and tokens exist
+    if deployment_name and config_vars:
+        has_generate_tokens = any(
+            isinstance(v, str) and GENERATE_TOKEN_PATTERN.search(v)
+            for v in config_vars.values()
+        )
+        if has_generate_tokens:
+            _generate_and_store_secrets(config_vars, deployment_name)
+
+    # Write non-secret config to config.env (exclude $generate:...$ tokens)
    with open(config_env_file, "w") as output_file:
-        if "config" in spec_content and spec_content["config"]:
-            config_vars = spec_content["config"]
-            if config_vars:
-                for variable_name, variable_value in config_vars.items():
-                    output_file.write(f"{variable_name}={variable_value}\n")
+        if config_vars:
+            for variable_name, variable_value in config_vars.items():
+                # Skip variables with generate tokens - they go to K8s Secret
+                if isinstance(variable_value, str) and GENERATE_TOKEN_PATTERN.search(
+                    variable_value
+                ):
+                    continue
+                output_file.write(f"{variable_name}={variable_value}\n")


def _write_kube_config_file(external_path: Path, internal_path: Path):
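To illustrate the split `_write_config_file` now produces (variable names and values are hypothetical):

```yaml
config:
  APP_PORT: "8080"            # plain value: written to config.env
  API_KEY: $generate:hex:16$  # token: skipped in config.env, generated and
                              # stored in the "<name>-generated-secrets" Secret
```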
@@ -507,11 +675,14 @@ def _copy_files_to_directory(file_paths: List[Path], directory: Path):
        copy(path, os.path.join(directory, os.path.basename(path)))


-def _create_deployment_file(deployment_dir: Path):
+def _create_deployment_file(deployment_dir: Path, stack_source: Optional[Path] = None):
    deployment_file_path = deployment_dir.joinpath(constants.deployment_file_name)
    cluster = f"{constants.cluster_name_prefix}{token_hex(8)}"
+    deployment_content = {constants.cluster_id_key: cluster}
+    if stack_source:
+        deployment_content["stack-source"] = str(stack_source)
    with open(deployment_file_path, "w") as output_file:
-        output_file.write(f"{constants.cluster_id_key}: {cluster}\n")
+        get_yaml().dump(deployment_content, output_file)


def _check_volume_definitions(spec):

@@ -519,10 +690,14 @@ def _check_volume_definitions(spec):
    for volume_name, volume_path in spec.get_volumes().items():
        if volume_path:
            if not os.path.isabs(volume_path):
-                raise Exception(
-                    f"Relative path {volume_path} for volume {volume_name} not "
-                    f"supported for deployment type {spec.get_deployment_type()}"
-                )
+                # For k8s-kind: allow relative paths, they'll be resolved
+                # by _make_absolute_host_path() during kind config generation
+                if not spec.is_kind_deployment():
+                    deploy_type = spec.get_deployment_type()
+                    raise Exception(
+                        f"Relative path {volume_path} for volume "
+                        f"{volume_name} not supported for {deploy_type}"
+                    )


@click.command()

@@ -616,11 +791,15 @@ def create_operation(
        generate_helm_chart(stack_name, spec_file, deployment_dir_path)
        return  # Exit early for helm chart generation

+    # Resolve stack source path for restart capability
+    stack_source = get_stack_path(stack_name)
+
    if update:
        # Sync mode: write to temp dir, then copy to deployment dir with backups
        temp_dir = Path(tempfile.mkdtemp(prefix="deployment-sync-"))
        try:
-            # Write deployment files to temp dir (skip deployment.yml to preserve cluster ID)
+            # Write deployment files to temp dir
+            # (skip deployment.yml to preserve cluster ID)
            _write_deployment_files(
                temp_dir,
                Path(spec_file),

@@ -628,12 +807,14 @@ def create_operation(
                stack_name,
                deployment_type,
                include_deployment_file=False,
+                stack_source=stack_source,
            )

-            # Copy from temp to deployment dir, excluding data volumes and backing up changed files
-            # Exclude data/* to avoid touching user data volumes
-            # Exclude config file to preserve deployment settings (XXX breaks passing config vars
-            # from spec. could warn about this or not exclude...)
+            # Copy from temp to deployment dir, excluding data volumes
+            # and backing up changed files.
+            # Exclude data/* to avoid touching user data volumes.
+            # Exclude config file to preserve deployment settings
+            # (XXX breaks passing config vars from spec)
            exclude_patterns = ["data", "data/*", constants.config_file_name]
            _safe_copy_tree(
                temp_dir, deployment_dir_path, exclude_patterns=exclude_patterns

@@ -650,6 +831,7 @@ def create_operation(
            stack_name,
            deployment_type,
            include_deployment_file=True,
+            stack_source=stack_source,
        )

        # Delegate to the stack's Python code

@@ -670,7 +852,7 @@ def create_operation(
    )


-def _safe_copy_tree(src: Path, dst: Path, exclude_patterns: List[str] = None):
+def _safe_copy_tree(src: Path, dst: Path, exclude_patterns: Optional[List[str]] = None):
    """
    Recursively copy a directory tree, backing up changed files with .bak suffix.

@@ -721,6 +903,7 @@ def _write_deployment_files(
    stack_name: str,
    deployment_type: str,
    include_deployment_file: bool = True,
+    stack_source: Optional[Path] = None,
):
    """
    Write deployment files to target directory.

@@ -730,7 +913,8 @@ def _write_deployment_files(
    :param parsed_spec: Parsed spec object
    :param stack_name: Name of stack
    :param deployment_type: Type of deployment
-    :param include_deployment_file: Whether to create deployment.yml file (skip for update)
+    :param include_deployment_file: Whether to create deployment.yml (skip for update)
+    :param stack_source: Path to stack source (git repo) for restart capability
    """
    stack_file = get_stack_path(stack_name).joinpath(constants.stack_file_name)
    parsed_stack = get_parsed_stack_config(stack_name)

@@ -741,10 +925,15 @@ def _write_deployment_files(

    # Create deployment file if requested
    if include_deployment_file:
-        _create_deployment_file(target_dir)
+        _create_deployment_file(target_dir, stack_source=stack_source)

    # Copy any config variables from the spec file into an env file suitable for compose
-    _write_config_file(spec_file, target_dir.joinpath(constants.config_file_name))
+    # Use stack_name as deployment_name for K8s secret naming
+    # Extract just the name part if stack_name is a path ("path/to/stack" -> "stack")
+    deployment_name = Path(stack_name).name.replace("_", "-")
+    _write_config_file(
+        spec_file, target_dir.joinpath(constants.config_file_name), deployment_name
+    )

    # Copy any k8s config file into the target dir
    if deployment_type == "k8s":

@@ -805,8 +994,9 @@ def _write_deployment_files(
            )
        else:
            # TODO:
-            # this is odd - looks up config dir that matches a volume name, then copies as a mount dir?
-            # AFAICT this is not used by or relevant to any existing stack - roy
+            # This is odd - looks up config dir that matches a volume name,
+            # then copies as a mount dir?
+            # AFAICT not used by or relevant to any existing stack - roy

            # TODO: We should probably only do this if the volume is marked :ro.
            for volume_name, volume_path in parsed_spec.get_volumes().items():
stack_orchestrator/deploy/dns_probe.py (new file, 159 lines)

@@ -0,0 +1,159 @@

# Copyright © 2024 Vulcanize
# SPDX-License-Identifier: AGPL-3.0

"""DNS verification via temporary ingress probe."""

import secrets
import socket
import time
from typing import Optional
import requests
from kubernetes import client


def get_server_egress_ip() -> str:
    """Get this server's public egress IP via ipify."""
    response = requests.get("https://api.ipify.org", timeout=10)
    response.raise_for_status()
    return response.text.strip()


def resolve_hostname(hostname: str) -> list[str]:
    """Resolve hostname to list of IP addresses."""
    try:
        _, _, ips = socket.gethostbyname_ex(hostname)
        return ips
    except socket.gaierror:
        return []


def verify_dns_simple(hostname: str, expected_ip: Optional[str] = None) -> bool:
    """Simple DNS verification - check hostname resolves to expected IP.

    If expected_ip not provided, uses server's egress IP.
    Returns True if hostname resolves to expected IP.
    """
    resolved_ips = resolve_hostname(hostname)
    if not resolved_ips:
        print(f"DNS FAIL: {hostname} does not resolve")
        return False

    if expected_ip is None:
        expected_ip = get_server_egress_ip()

    if expected_ip in resolved_ips:
        print(f"DNS OK: {hostname} -> {resolved_ips} (includes {expected_ip})")
        return True
    else:
        print(f"DNS WARN: {hostname} -> {resolved_ips} (expected {expected_ip})")
        return False


def create_probe_ingress(hostname: str, namespace: str = "default") -> str:
    """Create a temporary ingress for DNS probing.

    Returns the probe token that the ingress will respond with.
    """
    token = secrets.token_hex(16)

    networking_api = client.NetworkingV1Api()

    # Create a simple ingress that Caddy will pick up
    ingress = client.V1Ingress(
        metadata=client.V1ObjectMeta(
            name="laconic-dns-probe",
            annotations={
                "kubernetes.io/ingress.class": "caddy",
                "laconic.com/probe-token": token,
            },
        ),
        spec=client.V1IngressSpec(
            rules=[
                client.V1IngressRule(
                    host=hostname,
                    http=client.V1HTTPIngressRuleValue(
                        paths=[
                            client.V1HTTPIngressPath(
                                path="/.well-known/laconic-probe",
                                path_type="Exact",
                                backend=client.V1IngressBackend(
                                    service=client.V1IngressServiceBackend(
                                        name="caddy-ingress-controller",
                                        port=client.V1ServiceBackendPort(number=80),
                                    )
                                ),
                            )
                        ]
                    ),
                )
            ]
        ),
    )

    networking_api.create_namespaced_ingress(namespace=namespace, body=ingress)
    return token


def delete_probe_ingress(namespace: str = "default"):
    """Delete the temporary probe ingress."""
    networking_api = client.NetworkingV1Api()
    try:
        networking_api.delete_namespaced_ingress(
            name="laconic-dns-probe", namespace=namespace
        )
    except client.exceptions.ApiException:
        pass  # Ignore if already deleted


def verify_dns_via_probe(
    hostname: str, namespace: str = "default", timeout: int = 30, poll_interval: int = 2
) -> bool:
    """Verify DNS by creating temp ingress and probing it.

    This definitively proves that traffic to the hostname reaches this cluster.

    Args:
        hostname: The hostname to verify
        namespace: Kubernetes namespace for probe ingress
        timeout: Total seconds to wait for probe to succeed
        poll_interval: Seconds between probe attempts

    Returns:
        True if probe succeeds, False otherwise
    """
    # First check DNS resolves at all
    if not resolve_hostname(hostname):
        print(f"DNS FAIL: {hostname} does not resolve")
        return False

    print(f"Creating probe ingress for {hostname}...")
    create_probe_ingress(hostname, namespace)

    try:
        # Wait for Caddy to pick up the ingress
        time.sleep(3)

        # Poll until success or timeout
        probe_url = f"http://{hostname}/.well-known/laconic-probe"
        start_time = time.time()
        last_error = None

        while time.time() - start_time < timeout:
            try:
                response = requests.get(probe_url, timeout=5)
                # For now, just verify we get a response from this cluster
                # A more robust check would verify a unique token
                if response.status_code < 500:
                    print(f"DNS PROBE OK: {hostname} routes to this cluster")
                    return True
            except requests.RequestException as e:
                last_error = e

            time.sleep(poll_interval)

        print(f"DNS PROBE FAIL: {hostname} - {last_error}")
        return False

    finally:
        print("Cleaning up probe ingress...")
        delete_probe_ingress(namespace)
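Illustrative usage of the new module (the hostname is an example; a kube client config must already be loaded):

```python
from stack_orchestrator.deploy.dns_probe import verify_dns_via_probe

if not verify_dns_via_probe("app.example.com", timeout=60):
    raise SystemExit("DNS does not route to this cluster yet")
```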
@@ -352,11 +352,15 @@ class ClusterInfo:
                continue

            if not os.path.isabs(volume_path):
-                print(
-                    f"WARNING: {volume_name}:{volume_path} is not absolute, "
-                    "cannot bind volume."
-                )
-                continue
+                # For k8s-kind, allow relative paths:
+                # - PV uses /mnt/{volume_name} (path inside kind node)
+                # - extraMounts resolve the relative path to Docker Host
+                if not self.spec.is_kind_deployment():
+                    print(
+                        f"WARNING: {volume_name}:{volume_path} is not absolute, "
+                        "cannot bind volume."
+                    )
+                    continue

            if self.spec.is_kind_deployment():
                host_path = client.V1HostPathVolumeSource(

@@ -453,6 +457,16 @@ class ClusterInfo:
            if "command" in service_info:
                cmd = service_info["command"]
                container_args = cmd if isinstance(cmd, list) else cmd.split()
+            # Add env_from to pull secrets from K8s Secret
+            secret_name = f"{self.app_name}-generated-secrets"
+            env_from = [
+                client.V1EnvFromSource(
+                    secret_ref=client.V1SecretEnvSource(
+                        name=secret_name,
+                        optional=True,  # Don't fail if no secrets
+                    )
+                )
+            ]
            container = client.V1Container(
                name=container_name,
                image=image_to_use,

@@ -460,6 +474,7 @@ class ClusterInfo:
                command=container_command,
                args=container_args,
                env=envs,
+                env_from=env_from,
                ports=container_ports if container_ports else None,
                volume_mounts=volume_mounts,
                security_context=client.V1SecurityContext(
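The resulting container spec looks roughly like this (the secret name is illustrative):

```yaml
envFrom:
  - secretRef:
      name: my-app-generated-secrets
      optional: true  # pods still start when no secrets were generated
```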
@@ -476,7 +491,12 @@ class ClusterInfo:
        volumes = volumes_for_pod_files(
            self.parsed_pod_yaml_map, self.spec, self.app_name
        )
-        image_pull_secrets = [client.V1LocalObjectReference(name="laconic-registry")]
+        registry_config = self.spec.get_image_registry_config()
+        if registry_config:
+            secret_name = f"{self.app_name}-registry"
+            image_pull_secrets = [client.V1LocalObjectReference(name=secret_name)]
+        else:
+            image_pull_secrets = []

        annotations = None
        labels = {"app": self.app_name}
@@ -29,6 +29,7 @@ from stack_orchestrator.deploy.k8s.helpers import (
from stack_orchestrator.deploy.k8s.helpers import (
    install_ingress_for_kind,
    wait_for_ingress_in_kind,
+    is_ingress_running,
)
from stack_orchestrator.deploy.k8s.helpers import (
    pods_in_deployment,

@@ -289,22 +290,38 @@ class K8sDeployer(Deployer):
        self.skip_cluster_management = skip_cluster_management
        if not opts.o.dry_run:
            if self.is_kind() and not self.skip_cluster_management:
-                # Create the kind cluster
-                create_cluster(
-                    self.kind_cluster_name,
-                    str(self.deployment_dir.joinpath(constants.kind_config_filename)),
-                )
-                # Ensure the referenced containers are copied into kind
-                load_images_into_kind(
-                    self.kind_cluster_name, self.cluster_info.image_set
-                )
+                # Create the kind cluster (or reuse existing one)
+                kind_config = str(
+                    self.deployment_dir.joinpath(constants.kind_config_filename)
+                )
+                actual_cluster = create_cluster(self.kind_cluster_name, kind_config)
+                if actual_cluster != self.kind_cluster_name:
+                    # An existing cluster was found, use it instead
+                    self.kind_cluster_name = actual_cluster
+                # Only load locally-built images into kind
+                # Registry images (docker.io, ghcr.io, etc.) will be pulled by k8s
+                local_containers = self.deployment_context.stack.obj.get(
+                    "containers", []
+                )
+                if local_containers:
+                    # Filter image_set to only images matching local containers
+                    local_images = {
+                        img
+                        for img in self.cluster_info.image_set
+                        if any(c in img for c in local_containers)
+                    }
+                    if local_images:
+                        load_images_into_kind(self.kind_cluster_name, local_images)
+                # Note: if no local containers defined, all images come from registries
            self.connect_api()
            if self.is_kind() and not self.skip_cluster_management:
                # Configure ingress controller (not installed by default in kind)
-                install_ingress_for_kind()
-                # Wait for ingress to start
-                # (deployment provisioning will fail unless this is done)
-                wait_for_ingress_in_kind()
+                # Skip if already running (idempotent for shared cluster)
+                if not is_ingress_running():
+                    install_ingress_for_kind(self.cluster_info.spec.get_acme_email())
+                    # Wait for ingress to start
+                    # (deployment provisioning will fail unless this is done)
+                    wait_for_ingress_in_kind()
                # Create RuntimeClass if unlimited_memlock is enabled
                if self.cluster_info.spec.get_unlimited_memlock():
                    _create_runtime_class(
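A sketch of the image filter above with illustrative values:

```python
containers = ["cerc/my-app"]
image_set = {"cerc/my-app:local", "postgres:15"}
local_images = {img for img in image_set if any(c in img for c in containers)}
assert local_images == {"cerc/my-app:local"}  # postgres:15 is pulled by k8s
```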
@@ -315,6 +332,11 @@ class K8sDeployer(Deployer):
        else:
            print("Dry run mode enabled, skipping k8s API connect")

+        # Create registry secret if configured
+        from stack_orchestrator.deploy.deployment_create import create_registry_secret
+
+        create_registry_secret(self.cluster_info.spec, self.cluster_info.app_name)
+
        self._create_volume_data()
        self._create_deployment()
@@ -14,11 +14,13 @@
# along with this program. If not, see <http:#www.gnu.org/licenses/>.

from kubernetes import client, utils, watch
+from kubernetes.client.exceptions import ApiException
import os
from pathlib import Path
import subprocess
import re
from typing import Set, Mapping, List, Optional, cast
+import yaml

from stack_orchestrator.util import get_k8s_dir, error_exit
from stack_orchestrator.opts import opts

@@ -96,16 +98,227 @@ def _run_command(command: str):
    return result


+def _get_etcd_host_path_from_kind_config(config_file: str) -> Optional[str]:
+    """Extract etcd host path from kind config extraMounts."""
+    import yaml
+
+    try:
+        with open(config_file, "r") as f:
+            config = yaml.safe_load(f)
+    except Exception:
+        return None
+
+    nodes = config.get("nodes", [])
+    for node in nodes:
+        extra_mounts = node.get("extraMounts", [])
+        for mount in extra_mounts:
+            if mount.get("containerPath") == "/var/lib/etcd":
+                return mount.get("hostPath")
+    return None
+
+
+def _clean_etcd_keeping_certs(etcd_path: str) -> bool:
+    """Clean persisted etcd, keeping only TLS certificates.
+
+    When etcd is persisted and a cluster is recreated, kind tries to install
+    resources fresh but they already exist. Instead of trying to delete
+    specific stale resources (blacklist), we keep only the valuable data
+    (caddy TLS certs) and delete everything else (whitelist approach).
+
+    The etcd image is distroless (no shell), so we extract the statically-linked
+    etcdctl binary and run it from alpine which has shell support.
+
+    Returns True if cleanup succeeded, False if no action needed or failed.
+    """
+    db_path = Path(etcd_path) / "member" / "snap" / "db"
+    # Check existence using docker since etcd dir is root-owned
+    check_cmd = (
+        f"docker run --rm -v {etcd_path}:/etcd:ro alpine:3.19 "
+        "test -f /etcd/member/snap/db"
+    )
+    check_result = subprocess.run(check_cmd, shell=True, capture_output=True)
+    if check_result.returncode != 0:
+        if opts.o.debug:
+            print(f"No etcd snapshot at {db_path}, skipping cleanup")
+        return False
+
+    if opts.o.debug:
+        print(f"Cleaning persisted etcd at {etcd_path}, keeping only TLS certs")
+
+    etcd_image = "gcr.io/etcd-development/etcd:v3.5.9"
+    temp_dir = "/tmp/laconic-etcd-cleanup"
+
+    # Whitelist: prefixes to KEEP - everything else gets deleted
+    keep_prefixes = "/registry/secrets/caddy-system"
+
+    # The etcd image is distroless (no shell). We extract the statically-linked
+    # etcdctl binary and run it from alpine which has shell + jq support.
+    cleanup_script = f"""
+set -e
+ALPINE_IMAGE="alpine:3.19"
+
+# Cleanup previous runs
+docker rm -f laconic-etcd-cleanup 2>/dev/null || true
+docker rm -f etcd-extract 2>/dev/null || true
+docker run --rm -v /tmp:/tmp $ALPINE_IMAGE rm -rf {temp_dir}
+
+# Create temp dir
+docker run --rm -v /tmp:/tmp $ALPINE_IMAGE mkdir -p {temp_dir}
+
+# Extract etcdctl binary (it's statically linked)
+docker create --name etcd-extract {etcd_image}
+docker cp etcd-extract:/usr/local/bin/etcdctl /tmp/etcdctl-bin
+docker rm etcd-extract
+docker run --rm -v /tmp/etcdctl-bin:/src:ro -v {temp_dir}:/dst $ALPINE_IMAGE \
+    sh -c "cp /src /dst/etcdctl && chmod +x /dst/etcdctl"
+
+# Copy db to temp location
+docker run --rm \
+    -v {etcd_path}:/etcd:ro \
+    -v {temp_dir}:/tmp-work \
+    $ALPINE_IMAGE cp /etcd/member/snap/db /tmp-work/etcd-snapshot.db
+
+# Restore snapshot
+docker run --rm -v {temp_dir}:/work {etcd_image} \
+    etcdutl snapshot restore /work/etcd-snapshot.db \
+    --data-dir=/work/etcd-data --skip-hash-check 2>/dev/null
+
+# Start temp etcd (runs the etcd binary, no shell needed)
+docker run -d --name laconic-etcd-cleanup \
+    -v {temp_dir}/etcd-data:/etcd-data \
+    -v {temp_dir}:/backup \
+    {etcd_image} etcd \
+    --data-dir=/etcd-data \
+    --listen-client-urls=http://0.0.0.0:2379 \
+    --advertise-client-urls=http://localhost:2379
+
+sleep 3
+
+# Use alpine with extracted etcdctl to run commands (alpine has shell + jq)
+# Export caddy secrets
+docker run --rm \
+    -v {temp_dir}:/backup \
+    --network container:laconic-etcd-cleanup \
+    $ALPINE_IMAGE sh -c \
+    '/backup/etcdctl get --prefix "{keep_prefixes}" -w json \
+    > /backup/kept.json 2>/dev/null || echo "{{}}" > /backup/kept.json'
+
+# Delete ALL registry keys
+docker run --rm \
+    -v {temp_dir}:/backup \
||||||
|
--network container:laconic-etcd-cleanup \
|
||||||
|
$ALPINE_IMAGE /backup/etcdctl del --prefix /registry
|
||||||
|
|
||||||
|
# Restore kept keys using jq
|
||||||
|
docker run --rm \
|
||||||
|
-v {temp_dir}:/backup \
|
||||||
|
--network container:laconic-etcd-cleanup \
|
||||||
|
$ALPINE_IMAGE sh -c '
|
||||||
|
apk add --no-cache jq >/dev/null 2>&1
|
||||||
|
jq -r ".kvs[] | @base64" /backup/kept.json 2>/dev/null | \
|
||||||
|
while read encoded; do
|
||||||
|
key=$(echo $encoded | base64 -d | jq -r ".key" | base64 -d)
|
||||||
|
val=$(echo $encoded | base64 -d | jq -r ".value" | base64 -d)
|
||||||
|
echo "$val" | /backup/etcdctl put "$key"
|
||||||
|
done
|
||||||
|
' || true
|
||||||
|
|
||||||
|
# Save cleaned snapshot
|
||||||
|
docker exec laconic-etcd-cleanup \
|
||||||
|
etcdctl snapshot save /etcd-data/cleaned-snapshot.db
|
||||||
|
|
||||||
|
docker stop laconic-etcd-cleanup
|
||||||
|
docker rm laconic-etcd-cleanup
|
||||||
|
|
||||||
|
# Restore to temp location first to verify it works
|
||||||
|
docker run --rm \
|
||||||
|
-v {temp_dir}/etcd-data/cleaned-snapshot.db:/data/db:ro \
|
||||||
|
-v {temp_dir}:/restore \
|
||||||
|
{etcd_image} \
|
||||||
|
etcdutl snapshot restore /data/db --data-dir=/restore/new-etcd \
|
||||||
|
--skip-hash-check 2>/dev/null
|
||||||
|
|
||||||
|
# Create timestamped backup of original (kept forever)
|
||||||
|
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
|
||||||
|
docker run --rm -v {etcd_path}:/etcd $ALPINE_IMAGE \
|
||||||
|
cp -a /etcd/member /etcd/member.backup-$TIMESTAMP
|
||||||
|
|
||||||
|
# Replace original with cleaned version
|
||||||
|
docker run --rm -v {etcd_path}:/etcd -v {temp_dir}:/tmp-work $ALPINE_IMAGE \
|
||||||
|
sh -c "rm -rf /etcd/member && mv /tmp-work/new-etcd/member /etcd/member"
|
||||||
|
|
||||||
|
# Cleanup temp files (but NOT the timestamped backup in etcd_path)
|
||||||
|
docker run --rm -v /tmp:/tmp $ALPINE_IMAGE rm -rf {temp_dir}
|
||||||
|
rm -f /tmp/etcdctl-bin
|
||||||
|
"""
|
||||||
|
|
||||||
|
result = subprocess.run(cleanup_script, shell=True, capture_output=True, text=True)
|
||||||
|
if result.returncode != 0:
|
||||||
|
if opts.o.debug:
|
||||||
|
print(f"Warning: etcd cleanup failed: {result.stderr}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if opts.o.debug:
|
||||||
|
print("Cleaned etcd, kept only TLS certificates")
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
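The whitelist idea in miniature, as a conceptual sketch only (the real implementation above operates on a live etcd via etcdctl, not on a dict):

```python
# Keep keys under the whitelisted prefix; drop everything else.
state = {
    "/registry/secrets/caddy-system/cert-abc": "<tls cert>",
    "/registry/pods/default/stale-pod": "<stale>",
    "/registry/services/default/stale-svc": "<stale>",
}
keep_prefix = "/registry/secrets/caddy-system"
cleaned = {k: v for k, v in state.items() if k.startswith(keep_prefix)}
# cleaned == {"/registry/secrets/caddy-system/cert-abc": "<tls cert>"}
```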
def create_cluster(name: str, config_file: str):
    """Create or reuse the single kind cluster for this host.

    There is only one kind cluster per host by design. Multiple deployments
    share this cluster. If a cluster already exists, it is reused.

    Args:
        name: Cluster name (used only when creating the first cluster)
        config_file: Path to kind config file (used only when creating)

    Returns:
        The name of the cluster being used
    """
    existing = get_kind_cluster()
    if existing:
        print(f"Using existing cluster: {existing}")
        return existing

    # Clean persisted etcd, keeping only TLS certificates
    etcd_path = _get_etcd_host_path_from_kind_config(config_file)
    if etcd_path:
        _clean_etcd_keeping_certs(etcd_path)

    print(f"Creating new cluster: {name}")
    result = _run_command(f"kind create cluster --name {name} --config {config_file}")
    if result.returncode != 0:
        raise DeployerException(f"kind create cluster failed: {result}")
    return name

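Usage in brief — a sketch with illustrative cluster names and paths:

```python
# First deployment on a fresh host creates the cluster.
name = create_cluster("laconic-abc123", "/srv/deployments/demo/kind-config.yml")

# A later deployment on the same host reuses it; the new name/config are ignored.
same = create_cluster("laconic-def456", "/srv/deployments/other/kind-config.yml")
assert same == name
```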
def destroy_cluster(name: str):
    _run_command(f"kind delete cluster --name {name}")

def is_ingress_running() -> bool:
    """Check if the Caddy ingress controller is already running in the cluster."""
    try:
        core_v1 = client.CoreV1Api()
        pods = core_v1.list_namespaced_pod(
            namespace="caddy-system",
            label_selector=(
                "app.kubernetes.io/name=caddy-ingress-controller,"
                "app.kubernetes.io/component=controller"
            ),
        )
        for pod in pods.items:
            if pod.status and pod.status.container_statuses:
                if pod.status.container_statuses[0].ready is True:
                    if opts.o.debug:
                        print("Caddy ingress controller already running")
                    return True
        return False
    except ApiException:
        return False

def wait_for_ingress_in_kind():
    core_v1 = client.CoreV1Api()
    for i in range(20):
@ -132,7 +345,7 @@ def wait_for_ingress_in_kind():
    error_exit("ERROR: Timed out waiting for Caddy ingress to become ready")


def install_ingress_for_kind(acme_email: str = ""):
    api_client = client.ApiClient()
    ingress_install = os.path.abspath(
        get_k8s_dir().joinpath(
@ -141,7 +354,34 @@
    )
    if opts.o.debug:
        print("Installing Caddy ingress controller in kind cluster")

    # Template the YAML with the email before applying
    with open(ingress_install) as f:
        yaml_content = f.read()

    if acme_email:
        yaml_content = yaml_content.replace('email: ""', f'email: "{acme_email}"')
        if opts.o.debug:
            print(f"Configured Caddy with ACME email: {acme_email}")

    # Apply the templated YAML
    yaml_objects = list(yaml.safe_load_all(yaml_content))
    utils.create_from_yaml(api_client, yaml_objects=yaml_objects)

    # Patch the ConfigMap with the ACME email if provided
    if acme_email:
        if opts.o.debug:
            print(f"Configuring ACME email: {acme_email}")
        core_api = client.CoreV1Api()
        configmap = core_api.read_namespaced_config_map(
            name="caddy-ingress-controller-configmap", namespace="caddy-system"
        )
        configmap.data["email"] = acme_email
        core_api.patch_namespaced_config_map(
            name="caddy-ingress-controller-configmap",
            namespace="caddy-system",
            body=configmap,
        )


def load_images_into_kind(kind_cluster_name: str, image_set: Set[str]):
@ -324,6 +564,25 @@ def _generate_kind_mounts(parsed_pod_files, deployment_dir, deployment_context):
    volume_host_path_map = _get_host_paths_for_volumes(deployment_context)
    seen_host_path_mounts = set()  # Track to avoid duplicate mounts

    # Cluster state backup for offline data recovery (unique per deployment).
    # etcd contains all k8s state; the PKI certs are needed to decrypt etcd offline.
    deployment_id = deployment_context.id
    backup_subdir = f"cluster-backups/{deployment_id}"

    etcd_host_path = _make_absolute_host_path(
        Path(f"./data/{backup_subdir}/etcd"), deployment_dir
    )
    volume_definitions.append(
        f"  - hostPath: {etcd_host_path}\n" f"    containerPath: /var/lib/etcd\n"
    )

    pki_host_path = _make_absolute_host_path(
        Path(f"./data/{backup_subdir}/pki"), deployment_dir
    )
    volume_definitions.append(
        f"  - hostPath: {pki_host_path}\n" f"    containerPath: /etc/kubernetes/pki\n"
    )

    # Note these paths are relative to the location of the pod files (at present),
    # so we need to fix them up to be correct and absolute, because kind assumes
    # they are relative to the cwd.
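For illustration, a sketch of the kind-config fragment those definitions produce (the deployment id and directory are hypothetical):

```python
from pathlib import Path

# Hypothetical values, for illustration only.
etcd_host_path = Path("/srv/deployments/demo/data/cluster-backups/abc123/etcd")
fragment = f"  - hostPath: {etcd_host_path}\n" f"    containerPath: /var/lib/etcd\n"
print(fragment)
#   - hostPath: /srv/deployments/demo/data/cluster-backups/abc123/etcd
#     containerPath: /var/lib/etcd
```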
@ -98,6 +98,17 @@ class Spec:
    def get_image_registry(self):
        return self.obj.get(constants.image_registry_key)

    def get_image_registry_config(self) -> typing.Optional[typing.Dict]:
        """Returns registry auth config: {server, username, token-env}.

        Used for private container registries like GHCR. The token-env field
        specifies an environment variable containing the API token/PAT.

        Note: Uses the 'registry-credentials' key to avoid collision with the
        'image-registry' key, which is for pushing images.
        """
        return self.obj.get("registry-credentials")

    def get_volumes(self):
        return self.obj.get(constants.volumes_key, {})
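A sketch of what a `registry-credentials` section might look like in a deployment spec, based on the fields named in the docstring of `get_image_registry_config` above (the values, and the assumption of a flat mapping, are illustrative):

```python
import os

import yaml

# Illustrative spec fragment; field names follow the docstring, values are
# hypothetical.
spec_obj = yaml.safe_load("""
registry-credentials:
  server: ghcr.io
  username: my-org
  token-env: GHCR_TOKEN
""")

config = spec_obj.get("registry-credentials")
token = os.environ.get(config["token-env"])  # PAT resolved at deploy time
```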
@ -117,6 +128,9 @@ class Spec:
    def get_http_proxy(self):
        return self.obj.get(constants.network_key, {}).get(constants.http_proxy_key, [])

    def get_acme_email(self):
        return self.obj.get(constants.network_key, {}).get("acme-email", "")

    def get_annotations(self):
        return self.obj.get(constants.annotations_key, {})
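The corresponding spec fragment, sketched with a hypothetical address (the `network`/`acme-email` keys match the accessor above):

```python
import yaml

# Hypothetical spec fragment: the ACME email lives under the network section.
spec_obj = yaml.safe_load("""
network:
  acme-email: ops@example.com
""")
assert spec_obj.get("network", {}).get("acme-email", "") == "ops@example.com"
```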
@ -179,6 +193,9 @@ class Spec:
    def get_deployment_type(self):
        return self.obj.get(constants.deploy_to_key)

    def get_acme_email(self):
        # Duplicates the get_acme_email added earlier in this class; being
        # later in the class body, this version (using constants.acme_email_key)
        # is the one that takes effect.
        return self.obj.get(constants.network_key, {}).get(constants.acme_email_key, "")

    def is_kubernetes_deployment(self):
        return self.get_deployment_type() in [
            constants.k8s_kind_deploy_type,