Add Job and secrets support for k8s-kind deployments #995

Merged
prathamesh merged 26 commits from feature/k8s-jobs into main 2026-03-11 03:56:22 +00:00

26 Commits

Author SHA1 Message Date
aac317503e fix(test): wait for namespace termination before restart
All checks were successful
Lint Checks / Run linter (pull_request) Successful in 6m10s
Deploy Test / Run deploy test suite (pull_request) Successful in 16m28s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 25m15s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 27m14s
Webapp Test / Run webapp test suite (pull_request) Successful in 36m47s
Smoke Test / Run basic test suite (pull_request) Successful in 35m54s
Replace fixed sleep with a polling loop that waits for the deployment
namespace to be fully deleted. Without this, the start command fails
with 403 Forbidden because k8s rejects resource creation in a
namespace that is still terminating.
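The polling pattern described can be sketched in Python (illustrative only — the actual test is a shell script, and the helper name here is hypothetical):

```python
import subprocess
import time

def wait_until_gone(check_cmd: list, timeout: float = 120, interval: float = 5) -> bool:
    """Poll until check_cmd exits non-zero, i.e. the resource is no longer found.

    For a namespace, check_cmd would be e.g. ["kubectl", "get", "namespace", ns]:
    kubectl exits non-zero with NotFound once termination finishes.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(check_cmd, capture_output=True)
        if result.returncode != 0:
            return True  # resource gone; safe to recreate
        time.sleep(interval)
    return False  # still present after the timeout
```

Unlike a fixed sleep, this returns as soon as the namespace is actually deleted, and fails loudly (returns False) if termination hangs past the timeout.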

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 07:01:27 +00:00
b85c12e4da fix(test): use --skip-cluster-management for stop/start volume test
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 2m50s
Deploy Test / Run deploy test suite (pull_request) Successful in 8m27s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 9m20s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 10m56s
Webapp Test / Run webapp test suite (pull_request) Failing after 14m12s
Smoke Test / Run basic test suite (pull_request) Failing after 15m12s
Recreating a kind cluster in the same CI run fails due to stale
etcd/certs and cgroup detection issues. Use --skip-cluster-management
to reuse the existing cluster, and --delete-volumes to clear PVs so
fresh PVCs can bind on restart.

The volume retention semantics are preserved: bind-mount host path
data survives (filesystem is old), provisioner volumes are fresh
(PVs were deleted).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:49:42 +00:00
a1c6c35834 style: wrap long line in cluster_info.py to fix flake8 E501
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 1m58s
Deploy Test / Run deploy test suite (pull_request) Successful in 4m19s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 5m33s
Webapp Test / Run webapp test suite (pull_request) Successful in 6m47s
Smoke Test / Run basic test suite (pull_request) Successful in 7m30s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 9m20s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:31:25 +00:00
91f4e5fe38 fix(k8s): use distinct app label for job pods
Some checks failed
Lint Checks / Run linter (pull_request) Failing after 1m34s
Deploy Test / Run deploy test suite (pull_request) Successful in 3m17s
Smoke Test / Run basic test suite (pull_request) Successful in 4m13s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 4m33s
Webapp Test / Run webapp test suite (pull_request) Successful in 4m57s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 5m20s
Job pod templates used the same app={deployment_id} label as
deployment pods, causing pods_in_deployment() to return both.
This made the logs command warn about multiple pods and pick
the wrong one.

Use app={deployment_id}-job for job pod templates so they are
not matched by pods_in_deployment(). The Job metadata itself
retains the original app label for stack-level queries.
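The labeling split can be sketched with plain dicts (illustrative; the real code builds kubernetes client objects, and these function names are hypothetical):

```python
def deployment_pod_labels(deployment_id: str) -> dict:
    # deployment pods keep the plain app label that pods_in_deployment() selects on
    return {"app": deployment_id}

def job_pod_labels(deployment_id: str) -> dict:
    # job pod templates get a distinct app value so the deployment selector skips them
    return {"app": f"{deployment_id}-job"}

def matches_deployment_selector(labels: dict, deployment_id: str) -> bool:
    # dict-level equivalent of the label selector app={deployment_id}
    return labels.get("app") == deployment_id
```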

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:26:03 +00:00
68ef9de016 fix(k8s): resolve internal job compose files from data/compose-jobs
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 2m5s
Deploy Test / Run deploy test suite (pull_request) Successful in 4m43s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 5m4s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 6m26s
Webapp Test / Run webapp test suite (pull_request) Successful in 7m36s
Smoke Test / Run basic test suite (pull_request) Successful in 5m32s
resolve_job_compose_file() used Path(stack).parent.parent for the
internal fallback, which resolved to data/stacks/compose-jobs/ instead
of data/compose-jobs/. This meant deploy create couldn't find job
compose files for internal stacks, so they were never copied to the
deployment directory and never created as k8s Jobs.

Use the same data directory resolution pattern as resolve_compose_file.
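The path bug can be illustrated with pathlib (a sketch: the directory layout matches the message, but the function names and compose-file naming convention are assumptions):

```python
from pathlib import Path

def job_compose_path_buggy(stack_file: str, job: str) -> Path:
    # BUG: for a stack file at data/stacks/<name>/stack.yml,
    # parent.parent is data/stacks, not data
    return Path(stack_file).parent.parent / "compose-jobs" / f"docker-compose-{job}.yml"

def job_compose_path_fixed(data_dir: str, job: str) -> Path:
    # resolve against the data directory itself, mirroring resolve_compose_file
    return Path(data_dir) / "compose-jobs" / f"docker-compose-{job}.yml"
```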

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 06:15:15 +00:00
a1b5220e40 fix(test): prevent set -e from killing kubectl queries in test checks
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 2m5s
Deploy Test / Run deploy test suite (pull_request) Successful in 4m52s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 6m31s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 6m58s
Webapp Test / Run webapp test suite (pull_request) Successful in 7m49s
Smoke Test / Run basic test suite (pull_request) Successful in 6m15s
kubectl commands that query jobs or pod specs exit non-zero when the
resource doesn't exist yet. Under set -e, a bare command substitution
like var=$(kubectl ...) aborts the script silently. Add || true so
the polling loop and assertion logic can handle failures gracefully.
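The same tolerant-query idea, expressed as a Python analogue of `out=$(kubectl ...) || true` (illustrative; the test itself is shell, and the helper name is hypothetical):

```python
import subprocess

def tolerant_query(args: list) -> str:
    # never raise or abort on a non-zero exit (resource not created yet);
    # return empty output and let the polling/assertion logic decide
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""
```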

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:59:35 +00:00
464215c72a fix(test): replace empty secrets key instead of appending duplicate
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 3m19s
Deploy Test / Run deploy test suite (pull_request) Successful in 5m57s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 6m32s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 7m31s
Webapp Test / Run webapp test suite (pull_request) Successful in 8m1s
Smoke Test / Run basic test suite (pull_request) Successful in 7m38s
deploy init already writes 'secrets: {}' into the spec file. The test
was appending a second secrets block via heredoc, which ruamel.yaml
rejects as a duplicate key. Use sed to replace the empty value instead.
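The edit can be sketched as a text replacement (Python analogue of the sed command; the secret-entry format shown is a hypothetical example, not the documented spec syntax):

```python
def fill_secrets(spec_text: str, secret_name: str) -> str:
    # replace the empty mapping written by `deploy init` rather than
    # appending a second secrets: block, which would be a duplicate key
    return spec_text.replace("secrets: {}", f"secrets:\n  {secret_name}:")
```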

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:34:37 +00:00
108f13a09b fix(test): wait for kind cluster cleanup before recreating
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 1m58s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 4m11s
Deploy Test / Run deploy test suite (pull_request) Successful in 4m43s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 6m22s
Webapp Test / Run webapp test suite (pull_request) Successful in 7m42s
Smoke Test / Run basic test suite (pull_request) Successful in 6m26s
Replace the fixed `sleep 20` with a polling loop that waits for
`kind get clusters` to report no clusters. The previous approach
was flaky on CI runners where Docker takes longer to tear down
cgroup hierarchies after `kind delete cluster`.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:26:48 +00:00
d64046df55 Revert "fix(test): reuse kind cluster on stop/start cycle in deploy test"
This reverts commit 35f179b755.
2026-03-10 05:24:00 +00:00
35f179b755 fix(test): reuse kind cluster on stop/start cycle in deploy test
Use --skip-cluster-management to avoid destroying and recreating the
kind cluster during the stop/start volume retention test. The second
kind create fails on some CI runners due to cgroup detection issues.

Use --delete-volumes to clear PVs so fresh PVCs can bind on restart.
Bind-mount data survives on the host filesystem; provisioner volumes
are recreated fresh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:13:26 +00:00
1375f209d3 test(k8s): add tests for jobs, secrets, labels, and namespace isolation
Add a job compose file for the test stack and extend the k8s deploy
test to verify new features:
- Namespace isolation: pod exists in laconic-{id}, not default
- Stack labels: app.kubernetes.io/stack label set on pods
- Job completion: test-job runs to completion (status.succeeded=1)
- Secrets: spec secrets: key results in envFrom secretRef on pod

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 05:06:31 +00:00
241cd75671 fix(test): use deployment namespace in k8s control test
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 1m55s
Deploy Test / Run deploy test suite (pull_request) Successful in 4m46s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Failing after 6m2s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Successful in 6m13s
Webapp Test / Run webapp test suite (pull_request) Successful in 7m2s
Smoke Test / Run basic test suite (pull_request) Successful in 5m22s
The deployment control test queried pods with raw kubectl but never
specified a namespace. Since pods now live in laconic-{deployment_id}
instead of default, the query returned empty results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:53:52 +00:00
183a188874 ci: upgrade Kind to v0.25.0 and pin kubectl to v1.31.2
Some checks failed
Lint Checks / Run linter (pull_request) Successful in 2m1s
Deploy Test / Run deploy test suite (pull_request) Successful in 4m42s
K8s Deployment Control Test / Run deployment control suite on kind/k8s (pull_request) Failing after 6m39s
Webapp Test / Run webapp test suite (pull_request) Successful in 6m55s
K8s Deploy Test / Run deploy test suite on kind/k8s (pull_request) Successful in 7m7s
Smoke Test / Run basic test suite (pull_request) Successful in 5m20s
Kind v0.20.0 defaults to k8s v1.27.3 which fails on newer CI runners
(kubelet cgroups issue). Upgrade to Kind v0.25.0 (k8s v1.31.2) and
pin kubectl to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
517e102830 fix(k8s): use deployment namespace for pod/container lookups
pods_in_deployment() and containers_in_pod() hardcoded
namespace="default", but pods are created in the deployment-specific
namespace (laconic-{cluster-id}). This caused logs() to return
"Pods not running" even when pods were healthy.

Add namespace parameter to both functions and pass
self.k8s_namespace from the logs() caller.
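The signature change can be sketched as follows (a sketch modeled on the kubernetes Python client's `CoreV1Api.list_namespaced_pod`; the surrounding details are assumptions):

```python
def pods_in_deployment(core_api, deployment_id: str, namespace: str = "default"):
    # previously the namespace was hardcoded to "default"; pods actually live
    # in the deployment-specific namespace (e.g. laconic-{cluster-id}),
    # so callers such as logs() now pass self.k8s_namespace explicitly
    return core_api.list_namespaced_pod(
        namespace, label_selector=f"app={deployment_id}").items
```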

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
ef07b2c86e k8s: extract basename from stack path for labels
Stack.name contains the full absolute path from the spec file's
"stack:" key (e.g. /home/.../stacks/hyperlane-minio). K8s labels
must be <= 63 bytes and alphanumeric. Extract just the directory
basename (e.g. "hyperlane-minio") before using it as a label value.
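A minimal sketch of the extraction (function name hypothetical; the truncation is a defensive extra, not stated in the commit):

```python
from pathlib import Path

def stack_label_value(stack_path: str) -> str:
    # use only the directory basename; a full path contains '/' and can
    # easily exceed the 63-character limit on k8s label values
    return Path(stack_path).name[:63]
```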

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
7c8a4d91e7 k8s: add start() hook for post-deployment k8s resource creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
b8702f0bfc k8s: add app.kubernetes.io/stack label to pods and jobs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
d7a742032e fix(webapp): use YAML round-trip instead of raw string append in _fixup_url_spec
The secrets: {} key added by init_operation for k8s deployments became
the last key in the spec file, breaking the raw string append that
assumed network: was always last. Replace with proper YAML load/modify/dump.
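The load/modify/dump approach can be sketched with PyYAML for brevity (the commit uses ruamel.yaml, and the key layout shown here is hypothetical):

```python
import yaml

def fixup_url_spec(spec_text: str, url: str) -> str:
    # parse, modify, and re-serialize instead of appending raw strings
    # that assume a particular key (e.g. network:) is last in the file
    spec = yaml.safe_load(spec_text)
    spec["http-proxy"] = [{"host-name": url}]  # hypothetical key layout
    return yaml.safe_dump(spec)
```

Because the whole document is round-tripped, it no longer matters whether `secrets: {}` or any other key happens to come last.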

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
b77037c73d fix: remove shadowed os import in cluster_info
Inline `import os` at line 663 shadowed the top-level import,
causing flake8 F402.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
8d65cb13a0 fix(k8s): copy configmap dirs for jobs-only stacks during deploy create
The k8s configmap directory copying was inside the `for pod in pods:`
loop. For jobs-only stacks (no pods), the loop never executes, so
configmap files were never copied into the deployment directory.
The ConfigMaps were created as empty objects, leaving volume mounts
with no files.

Move the k8s configmap copying outside the pod loop so it runs
regardless of whether the stack has pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
47f3068e70 fix(k8s): include job volumes in PVC/ConfigMap/PV creation
For jobs-only stacks, named_volumes_from_pod_files() returned empty
because it only scanned parsed_pod_yaml_map. This caused ConfigMaps
and PVCs declared in the spec to be silently skipped.

- Add _all_named_volumes() helper that scans both pod and job maps
- Guard update() against empty parsed_pod_yaml_map (uncaught 404)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
9b304b8990 fix(k8s): remove job-name label that conflicts with k8s auto-label
Kubernetes automatically adds a job-name label to Job pod templates
matching the full Job name. Our custom job-name label used the short
name, causing a 422 validation error. Let k8s manage this label.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
e0a8477326 fix(k8s): skip Deployment creation for jobs-only stacks
When a stack defines only jobs: (no pods:), the parsed_pod_yaml_map
is empty. Creating a Deployment with no containers causes a 422 error
from the k8s API. Skip Deployment creation when there are no pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
74deb3f8d6 feat(k8s): add Job support for non-Helm k8s-kind deployments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:40:06 +00:00
589ed3cf69 docs: update CLI reference to match actual code
cli.md:
- Document `start`/`stop` as preferred commands (`up`/`down` as legacy)
- Add --skip-cluster-management flag for start and stop
- Add --delete-volumes flag for stop
- Add missing subcommands: restart, exec, status, port, push-images, run-job
- Add --helm-chart option to deploy create
- Reorganize deploy vs deployment sections for clarity

deployment_patterns.md:
- Add missing --stack flag to deploy create example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:38:49 +00:00
641052558a feat: add secrets support for k8s deployments
Adds a `secrets:` key to spec.yml that references pre-existing k8s
Secrets by name. SO mounts them as envFrom.secretRef on all pod
containers. Secret contents are managed out-of-band by the operator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 04:38:48 +00:00