From 3b9b06a99b35626274f329a6830de6ccfdb80db0 Mon Sep 17 00:00:00 2001
From: Ronni Baslund
Date: Wed, 10 Jun 2026 12:19:47 +0200
Subject: [PATCH] docs(runbook): app tier + push-to-deploy CI/CD flow

Bring the runbook up to the 2026-06-10 state: app tier + CI/CD in
current state, a Deploy flow section (push to main = release, rollback,
break-glass, required Gitea secrets), reproduce steps 8-9 (app tier
secrets+apply, CI runner + ci-deployer with the runner gotchas),
per-router ACME-safe redirect instead of the old global one,
platform-api key read-back for Bitwarden, and a pruned TODO list.
---
 infrastructure/production/RUNBOOK.md | 104 +++++++++++++++++++++++----
 1 file changed, 90 insertions(+), 14 deletions(-)

diff --git a/infrastructure/production/RUNBOOK.md b/infrastructure/production/RUNBOOK.md
index c98e74f..343ba79 100644
--- a/infrastructure/production/RUNBOOK.md
+++ b/infrastructure/production/RUNBOOK.md
@@ -9,7 +9,7 @@ bottom to rebuild it. Per-layer detail lives in `host/README.md`,
 > and stored in **Bitwarden**. See "Secrets" below for how to read the live
 > values back out of the cluster.
 
-## Current state (built 2026-06-08)
+## Current state (built 2026-06-08, app tier + CI/CD 2026-06-10)
 
 - **Host:** hardened via `host/bootstrap.sh` — `dezky` admin user, **key-only
   SSH** (no root, no passwords), k3s-safe nftables firewall (SSH/6443 → mgmt
@@ -23,8 +23,42 @@ bottom to rebuild it. Per-layer detail lives in `host/README.md`,
 - **Data tier** (`dezky-data` ns) — Postgres 16, Mongo 7, Redis 7 as StatefulSets
   on Longhorn PVCs. Postgres holds the `authentik` + `ocis` DBs.
 - **Authentik** (`dezky-auth` ns) — live at https://auth.dezky.eu (LE cert),
-  image `2026.5.2`, on our Postgres/Redis. `akadmin` bootstrap login.
-- **Traefik** — global HTTP→HTTPS 308 redirect (`fleet/traefik/`).
+  chart pinned `2026.5.2`, on our Postgres/Redis. Portal + operator OIDC app
+  blueprints applied (`fleet/authentik/blueprints/`).
+- **Stalwart** (host, not k3s) — mail on the bare host; JMAP management API + reachable from pods at `http://10.42.0.1:8080` (cni0 gateway). +- **Traefik** — per-router HTTP→HTTPS redirect via `redirectScheme` + Middleware on each Ingress (`web,websecure` entrypoints). **No global + entrypoint redirect** — that breaks cert-manager HTTP-01 (`fleet/traefik/`). +- **App tier** (`dezky-apps` ns) — portal (`app.dezky.eu`), platform-api + (`api.dezky.eu`), booking (`booking.dezky.eu`), operator + (`operator.dezky.eu`). See `fleet/README.md`. +- **CI/CD** (`gitea-runner` ns) — in-cluster `gitea/runner:1.0.8` + dind + sidecar. **Push to main = deploy** (see "Deploy flow" below). +- **Registry hygiene** — Gitea package cleanup rule (user-level, Container + type): keep newest 5 versions per image + `latest`, remove older than 7 + days. Applied by Gitea's daily cleanup cron. + +## Deploy flow (day-to-day) + +Push to `main` on Gitea → `.gitea/workflows/ci.yml` runs in-cluster: +**typecheck + test → docker build + push** (each app image tagged `:latest` + +the commit SHA, to `git.lastcloud.io/ronnibaslund/dezky/`) → **deploy** +(`kustomize edit set image` pins the SHA, `kubectl apply -k fleet/apps`, +waits for rollouts). No GitOps controller, no manual steps. Push-to-live is +~2 min with a warm build cache, 5–10 min after a runner pod restart (the dind +layer cache is an emptyDir). + +- **Watch:** repo → Actions in Gitea, or + `kubectl -n dezky-apps get deploy -o wide` (image column shows the SHA). +- **Rollback:** re-run an older green run from the Gitea Actions UI, or + `kubectl -n dezky-apps set image deploy/ =git.lastcloud.io/ronnibaslund/dezky/:`. +- **Break-glass (runner down):** `kubectl apply -k fleet/apps/` by hand — + manifests reference `:latest`. 
+- **Gitea Actions secrets** (repo Settings → Actions → Secrets): + `KUBECONFIG_B64` (ci-deployer kubeconfig, see step 9) and `REGISTRY_TOKEN` + (Gitea PAT with package read+write — the per-job GITHUB_TOKEN is NOT + accepted by the container registry). ## Reproduce from scratch @@ -91,11 +125,46 @@ See `fleet/authentik/README.md`. Create `dezky-auth` ns + `authentik-secret` generated), then `kubectl apply -f fleet/authentik/helmchart.yaml`. Reachable at https://auth.dezky.eu; first login `akadmin` / `AUTHENTIK_BOOTSTRAP_PASSWORD`. -### 7. Traefik — global HTTP→HTTPS redirect +### 7. Traefik — per-router HTTPS redirect (ACME-safe) ```bash +# NO global entrypoint redirect — it would 301 the HTTP-01 challenge before +# cert-manager's solver router can answer it. Redirect lives per-Ingress via +# a redirectScheme Middleware instead (applied with each tier's kustomize). kubectl apply -f fleet/traefik/helmchartconfig.yaml kubectl -n kube-system delete job helm-install-traefik # force the controller to re-run with merged values -# verify: curl -sI http://auth.dezky.eu -> 308 -> https://auth.dezky.eu/ +# verify: curl -sI http://app.dezky.eu -> 301 https://... AND new certs still issue +``` + +### 8. App tier (portal · platform-api · booking · operator) +```bash +# Secrets first (out-of-band, values from Bitwarden / generated — see +# fleet/README.md "Required env / secrets" + fleet/apps/secrets.example.yaml): +# portal-secrets, booking-secrets, operator-secrets, platform-api-secrets +kubectl apply -k fleet/apps/ +kubectl -n dezky-apps get pods # all Running once images exist in the registry +``` + +### 9. CI runner + push-to-deploy +```bash +# In-cluster Gitea Actions runner (gitea/runner + privileged dind sidecar). +# Registration token from Gitea: Settings → Actions → Runners → Create token. 
+kubectl create namespace gitea-runner --dry-run=client -o yaml | kubectl apply -f - +kubectl -n gitea-runner create secret generic gitea-runner-token \ + --from-literal=token= +kubectl apply -f fleet/ci/gitea-runner.yaml + +# Deploy ServiceAccount + kubeconfig for the pipeline's deploy job: +kubectl apply -f fleet/ci/ci-deployer.yaml +# mint the kubeconfig (full recipe in fleet/README.md "Deploy") and store it +# as the KUBECONFIG_B64 repo secret; create a Gitea PAT with package +# read+write and store as REGISTRY_TOKEN. + +# Gotchas baked into fleet/ci/gitea-runner.yaml — don't "simplify" them away: +# - gitea/runner 1.x (NOT act_runner 0.2.x: Gitea 1.26 never marks its jobs +# complete, which freezes runs at "Complete job"). +# - dind shares /var/run with the runner: jobs can only get a docker host +# by bind-mounting a UNIX socket (tcp://+TLS can't be mounted). +# - docker:24-dind (moby 27 has a cgroup-v2 teardown deadlock). ``` ## Secrets — read live values for Bitwarden @@ -107,21 +176,28 @@ k postgres-secret AUTHENTIK_DB_PASSWORD # must match Authentik's DB config k postgres-secret OCIS_DB_PASSWORD # must match OCIS's DB config k mongo-secret root-password k redis-secret REDIS_PASSWORD + +a(){ kubectl -n dezky-apps get secret platform-api-secrets -o jsonpath="{.data.$1}" | base64 -d; echo; } +a SCHEDULING_CREDENTIAL_KEY # AES key for stored scheduling creds — losing it orphans them +a AUDIT_SIGNING_KEY # audit hash-chain key — rotation closes the segment ``` ## Still TODO (next layers) -1. **Authentik** — ✅ deployed (`auth.dezky.eu`). Remaining: OIDC app - blueprints (portal + operator, with prod redirect URLs + client secrets) and - the cosmetic rebrand. See `fleet/authentik/README.md`. -2. **OCIS** (files) — uses the `ocis` Postgres DB + Hetzner Object Storage (S3). -3. **Apps** — `fleet/apps/` (portal · platform-api · booking) + their secrets. -4. **Stalwart** (host) — `host/stalwart/install.sh`; needs DNS + PTR. -5. 
**Backups** — Longhorn → Hetzner Object Storage (`fleet/longhorn/README.md`), +1. **OCIS** (files) — uses the `ocis` Postgres DB + Hetzner Object Storage + (S3). platform-api already carries placeholder `OCIS_*` config + (`fleet/apps/platform-api-config.yaml`) — swap in real values when live. +2. **Audit cold storage** — Hetzner Object Storage bucket + real + `AUDIT_COLD_*` keys in `platform-api-secrets`; flip `ARCHIVE_ENABLED`. +3. **Backups** — Longhorn → Hetzner Object Storage (`fleet/longhorn/README.md`), plus host Restic for the mail store + etcd snapshots, plus pg_dump/mongodump CronJobs. -6. **DNS** — A records `api`/`app`/`booking`/`auth`/`mail`.dezky.eu → 46.4.78.187, - and PTR for mail. +4. **Stripe live keys** — billing is dark-launched off + (`BILLING_STRIPE_ENABLED: "false"` in the app config). + +Done since first build: ✅ Authentik + OIDC blueprints · ✅ Stalwart on the +host · ✅ app tier (incl. operator) · ✅ CI/CD push-to-deploy · ✅ DNS A +records (`api`/`app`/`booking`/`auth`/`mail`/`operator`).dezky.eu. ## Access cheatsheet - SSH: `ssh dezky@46.4.78.187` (key only). Root SSH disabled.
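
Step 7's `# verify:` comment (`curl -sI http://app.dezky.eu -> 301 https://...`) can be turned into a repeatable check. The function below is a minimal sketch, not something in the repo: `expect_https_redirect` is a hypothetical name, and it assumes bash plus `curl -sI` output on stdin. It accepts either 301 or 308, since the old global redirect answered 308 while the runbook now expects 301 from the per-router `redirectScheme` middleware.

```bash
# Hypothetical helper, not in the repo: reads `curl -sI` output on stdin and
# succeeds only for a 301/308 whose Location header points at https://.
expect_https_redirect() {
  local line proto rest status="" location=""
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                   # curl prints the raw CRLF header lines
    case $line in
      HTTP/*)
        # Status line, e.g. "HTTP/1.1 301 Moved Permanently"; keep the first.
        if [[ -z $status ]]; then
          read -r proto status rest <<<"$line"
        fi
        ;;
      [Ll]ocation:*)
        location=${line#*:}              # drop the header name...
        location=${location# }           # ...and the space after the colon
        ;;
    esac
  done
  case $status in
    301|308) ;;
    *) echo "FAIL: expected 301/308, got '${status:-none}'" >&2; return 1 ;;
  esac
  if [[ $location != https://* ]]; then
    echo "FAIL: Location '${location:-missing}' is not https://" >&2
    return 1
  fi
  echo "OK: $status -> $location"
}
```

Usage: `curl -sI http://app.dezky.eu | expect_https_redirect`. This only covers the redirect half of the step 7 check; whether new certs still issue is a separate look at cert-manager, for example `kubectl get certificate -A` and its READY column.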