dezky/infrastructure/production/RUNBOOK.md
Ronni Baslund 406e2ca78b
feat(infra): deploy Authentik (auth.dezky.eu) + global HTTP→HTTPS redirect
- Authentik on the in-cluster Postgres/Redis (mirrors the dev compose config:
  external DB/Redis, error-reporting off, update-check off, bootstrap admin),
  via the k3s Helm controller; Ingress + cert-manager letsencrypt-prod. Live at
  https://auth.dezky.eu (image 2026.5.2). Secrets generated on-box (Bitwarden).
- Traefik HelmChartConfig: global :80 -> :443 (308) redirect via
  additionalArguments (to=:443, HTTP-01-safe).
- RUNBOOK updated.

Deferred (mirror remaining dev bits): OIDC app blueprints (portal/operator with
prod URLs) + the cosmetic "Powered by Dezky" rebrand.
2026-06-08 19:00:07 +02:00

Dezky production — node1 build runbook

The actual, reproducible order used to stand up node1.dezky.eu (Hetzner AX41, 46.4.78.187, Ubuntu 24.04). If the box is lost, follow this top to bottom to rebuild it. Per-layer detail lives in host/README.md, fleet/cert-manager/, fleet/longhorn/, fleet/data/.

Secrets are never in git. They're generated with openssl rand -hex 24 and stored in Bitwarden. See "Secrets" below for how to read the live values back out of the cluster.
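
As a minimal sketch of that convention: openssl rand -hex 24 emits 24 random bytes as 48 lowercase hex characters, so a value can be sanity-checked before it is pasted into Bitwarden. The helper names gen_secret and is_valid_secret are hypothetical and do not exist in the repo.

```shell
# Hypothetical helpers (not in the repo) illustrating the secret convention.

# 24 random bytes, hex-encoded: always 48 lowercase hex characters.
gen_secret() {
  openssl rand -hex 24
}

# Succeeds only for a string of exactly 48 lowercase hex characters.
is_valid_secret() {
  case "$1" in
    '' | *[!0-9a-f]*) return 1 ;;
  esac
  [ "${#1}" -eq 48 ]
}
```

For example, is_valid_secret "$(gen_secret)" should succeed, while a truncated copy-paste fails the length check.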

Current state (built 2026-06-08)

  • Host: hardened via host/bootstrap.sh, which sets up the dezky admin user, key-only SSH (no root, no passwords), a k3s-safe nftables firewall (SSH/6443 → mgmt IPs 46.32.144.38/46.32.144.45; 80/443+mail → world), fail2ban, unattended-upgrades, and open-iscsi+iscsid (Longhorn prereq). dezky has NOPASSWD sudo (/etc/sudoers.d/90-dezky).
  • k3s v1.33.11 — single node (control-plane/etcd/worker), registered in Rancher (91.99.122.153).
  • Longhorn — default StorageClass, numberOfReplicas: 1 (single node).
  • cert-manager + letsencrypt-staging / letsencrypt-prod (HTTP-01/Traefik).
  • Data tier (dezky-data ns) — Postgres 16, Mongo 7, Redis 7 as StatefulSets on Longhorn PVCs. Postgres holds the authentik + ocis DBs.
  • Authentik (dezky-auth ns) — live at https://auth.dezky.eu (LE cert), image 2026.5.2, on our Postgres/Redis. akadmin bootstrap login.
  • Traefik — global HTTP→HTTPS 308 redirect (fleet/traefik/).

Reproduce from scratch

1. Host layer

# from laptop
scp -r infrastructure/production/host root@<ip>:/opt/dezky-host
# copy/fill config.env on the box (gitignored — MGMT IPs, ADMIN_SSH_PUBKEY,
# RANCHER_* token/checksum, STALWART_*, RESTIC_*)
ssh root@<ip> 'cd /opt/dezky-host && ./bootstrap.sh'
# set a console/sudo password for the admin user, then (optional) NOPASSWD.
# -t allocates a TTY so the passwd and sudo password prompts work over ssh:
ssh -t root@<ip> 'passwd dezky'
ssh -t dezky@<ip> "echo 'dezky ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/90-dezky && sudo chmod 0440 /etc/sudoers.d/90-dezky"
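
Because config.env is gitignored and filled in by hand, a pre-flight check before running bootstrap.sh may save a half-configured host. The sketch below is an assumption, not part of the repo: check_env_keys is a hypothetical helper, and apart from ADMIN_SSH_PUBKEY the exact key names in config.env are not given here, so pass whichever keys host/README.md actually requires.

```shell
# Hypothetical pre-flight helper (not in the repo).
# Usage: check_env_keys FILE KEY...
# Reports every KEY that is missing or empty in a KEY=value env file and
# returns non-zero if any is, so it can gate ./bootstrap.sh.
check_env_keys() {
  local file="$1" key missing=0
  shift
  for key in "$@"; do
    # Require KEY= followed by a real value; KEY=, KEY="" and KEY='' count as empty.
    if ! grep -Eq "^${key}=[\"']?[^\"'[:space:]]" "$file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the box this could be used as: check_env_keys /opt/dezky-host/config.env ADMIN_SSH_PUBKEY && ./bootstrap.sh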

2. k3s + kubectl access

ssh dezky@<ip>
sudo /opt/dezky-host/k3s/register.sh           # joins the Rancher Custom (K3s) cluster
kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes   # -> Ready
# give dezky a kubeconfig:
mkdir -p ~/.kube && sudo install -m 600 -o dezky -g dezky /etc/rancher/k3s/k3s.yaml ~/.kube/config
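
The "-> Ready" check above is done by eye. A scriptable version, as a sketch: all_nodes_ready is a hypothetical helper that parses the STATUS column of kubectl get nodes and assumes the kubeconfig installed in this step.

```shell
# Hypothetical helper (not in the repo).
# Succeeds only if at least one node is listed and every node's STATUS
# column is exactly "Ready" (so "NotReady" or "Ready,SchedulingDisabled" fail).
all_nodes_ready() {
  kubectl get nodes --no-headers |
    awk '$2 != "Ready" { bad = 1 } END { exit (NR == 0 || bad) }'
}
```

A rebuild script could poll it, for example: until all_nodes_ready; do sleep 5; done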

3. Longhorn (storage)

sudo apt-get install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid   # (bootstrap.sh does this now)
helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace \
  --version 1.12.0 -f fleet/longhorn/values.yaml        # replica=1, default class
# one default SC only:
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl -n longhorn-system patch settings.longhorn.io default-replica-count --type=merge -p '{"value":"1"}'
kubectl get storageclass        # only 'longhorn (default)'
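
The "one default SC only" condition can also be checked mechanically. This is a sketch under the assumption that the default class is marked with the standard storageclass.kubernetes.io/is-default-class annotation (the same one patched above); default_storageclasses is a hypothetical helper name.

```shell
# Hypothetical helper (not in the repo).
# Prints the name of every StorageClass annotated as default, one per line.
# The dots in the annotation key are escaped because JSONPath would
# otherwise treat them as path separators.
default_storageclasses() {
  kubectl get storageclass -o \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
    awk -F '\t' '$2 == "true" { print $1 }'
}
```

After this step the function should print a single line, longhorn. Two lines would mean local-path is still marked as default.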

4. cert-manager + issuers

kubectl apply -f fleet/cert-manager/cert-manager.yaml
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
kubectl apply -f fleet/cert-manager/cluster-issuer.yaml
kubectl get clusterissuer       # both READY=True

5. Data tier

kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
# secrets — generate fresh, store in Bitwarden:
kubectl -n dezky-data create secret generic postgres-secret \
  --from-literal=POSTGRES_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=AUTHENTIK_DB_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=OCIS_DB_PASSWORD=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic mongo-secret \
  --from-literal=root-username=dezky --from-literal=root-password=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic redis-secret \
  --from-literal=REDIS_PASSWORD=$(openssl rand -hex 24)
kubectl apply -k fleet/data/
kubectl -n dezky-data get pods,pvc      # all Running, PVCs Bound on longhorn
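
One caveat when re-running this step: kubectl create secret fails if the Secret already exists, and replacing it with freshly generated passwords would likely leave them out of sync with databases already initialized on the PVCs. A guard like the following sketch avoids that; ensure_secret is a hypothetical helper, not something the repo provides.

```shell
# Hypothetical helper (not in the repo).
# Usage: ensure_secret NAMESPACE NAME [kubectl create secret generic args...]
# Creates the Secret only when it does not exist yet, so a re-run never
# overwrites passwords the databases were already initialized with.
ensure_secret() {
  local ns="$1" name="$2"
  shift 2
  if kubectl -n "$ns" get secret "$name" >/dev/null 2>&1; then
    echo "secret ${ns}/${name} already exists, leaving it untouched"
    return 0
  fi
  kubectl -n "$ns" create secret generic "$name" "$@"
}
```

For example: ensure_secret dezky-data redis-secret --from-literal=REDIS_PASSWORD="$(openssl rand -hex 24)"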

6. Authentik (IdP)

See fleet/authentik/README.md. Create the dezky-auth namespace and authentik-secret (the DB and Redis passwords are read back from dezky-data so they match; the SECRET_KEY and bootstrap credentials are freshly generated), then run kubectl apply -f fleet/authentik/helmchart.yaml. Authentik is then reachable at https://auth.dezky.eu; the first login is akadmin with AUTHENTIK_BOOTSTRAP_PASSWORD.
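
A rough sketch of the secret step, for orientation only. The key names written into authentik-secret are assumptions modeled on Authentik's environment variable names, and read_secret and create_authentik_secret are hypothetical helpers; fleet/authentik/README.md remains the authoritative procedure.

```shell
# Sketch only: the key names inside authentik-secret are ASSUMPTIONS
# (modeled on Authentik's environment variables), not taken from the repo.

# Print one decoded key of a Secret: read_secret NAMESPACE NAME KEY
read_secret() {
  kubectl -n "$1" get secret "$2" -o jsonpath="{.data.$3}" | base64 -d
}

create_authentik_secret() {
  local db_pw redis_pw
  # Read the existing passwords back so Authentik matches the data tier.
  db_pw=$(read_secret dezky-data postgres-secret AUTHENTIK_DB_PASSWORD)
  redis_pw=$(read_secret dezky-data redis-secret REDIS_PASSWORD)
  if [ -z "$db_pw" ] || [ -z "$redis_pw" ]; then
    echo "could not read DB/Redis password from dezky-data" >&2
    return 1
  fi
  kubectl create namespace dezky-auth --dry-run=client -o yaml | kubectl apply -f -
  # SECRET_KEY and the bootstrap password are generated fresh.
  kubectl -n dezky-auth create secret generic authentik-secret \
    --from-literal=AUTHENTIK_POSTGRESQL__PASSWORD="$db_pw" \
    --from-literal=AUTHENTIK_REDIS__PASSWORD="$redis_pw" \
    --from-literal=AUTHENTIK_SECRET_KEY="$(openssl rand -hex 24)" \
    --from-literal=AUTHENTIK_BOOTSTRAP_PASSWORD="$(openssl rand -hex 24)"
}
```

Reading the passwords back, rather than generating new ones, is the point of the step: Postgres and Redis were already initialized with the values stored in dezky-data.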

7. Traefik — global HTTP→HTTPS redirect

kubectl apply -f fleet/traefik/helmchartconfig.yaml
kubectl -n kube-system delete job helm-install-traefik   # force the controller to re-run with merged values
# verify: curl -sI http://auth.dezky.eu  ->  308 -> https://auth.dezky.eu/
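
The manual curl check can be turned into a pass/fail test. A minimal sketch, assuming the redirect target is exactly https://<host>/ as in the verify comment; check_https_redirect is a hypothetical helper name.

```shell
# Hypothetical helper (not in the repo).
# Usage: check_https_redirect HOST
# Succeeds only if http://HOST/ answers 308 with Location: https://HOST/
check_https_redirect() {
  local host="$1" headers status location
  headers=$(curl -sI "http://${host}/") || return 1
  # Status code is the second field of the first response line.
  status=$(printf '%s\n' "$headers" | awk 'NR == 1 { print $2 }')
  # Header lines end in CRLF; strip the CR before comparing.
  location=$(printf '%s\n' "$headers" | tr -d '\r' |
    awk 'tolower($1) == "location:" { print $2 }')
  [ "$status" = "308" ] && [ "$location" = "https://${host}/" ]
}
```

For example: check_https_redirect auth.dezky.eu && echo "redirect OK"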

Secrets — read live values for Bitwarden

k(){ kubectl -n dezky-data get secret "$1" -o jsonpath="{.data.$2}" | base64 -d; echo; }
k postgres-secret POSTGRES_PASSWORD
k postgres-secret AUTHENTIK_DB_PASSWORD     # must match Authentik's DB config
k postgres-secret OCIS_DB_PASSWORD          # must match OCIS's DB config
k mongo-secret root-password
k redis-secret REDIS_PASSWORD

Still TODO (next layers)

  1. Authentik deployed (auth.dezky.eu). Remaining: OIDC app blueprints (portal + operator, with prod redirect URLs + client secrets) and the cosmetic rebrand. See fleet/authentik/README.md.
  2. OCIS (files) — uses the ocis Postgres DB + Hetzner Object Storage (S3).
  3. Apps: fleet/apps/ (portal · platform-api · booking) + their secrets.
  4. Stalwart (host) — host/stalwart/install.sh; needs DNS + PTR.
  5. Backups — Longhorn → Hetzner Object Storage (fleet/longhorn/README.md), plus host Restic for the mail store + etcd snapshots, plus pg_dump/mongodump CronJobs.
  6. DNS — A records api/app/booking/auth/mail.dezky.eu → 46.4.78.187, and PTR for mail.
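
For the DNS item, a sketch of how the A records could be verified once they exist. check_dns is a hypothetical helper; it assumes dig is installed on the machine running the check and compares the last line of dig +short, which skips any CNAME hop.

```shell
# Hypothetical helper (not in the repo).
# Usage: check_dns EXPECTED_IP NAME...
# Reports every NAME whose A record does not resolve to EXPECTED_IP and
# returns non-zero if any does not.
check_dns() {
  local ip="$1" name rc=0
  shift
  for name in "$@"; do
    if [ "$(dig +short "$name" A | tail -n 1)" != "$ip" ]; then
      echo "DNS mismatch: ${name}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_dns 46.4.78.187 api.dezky.eu app.dezky.eu booking.dezky.eu auth.dezky.eu mail.dezky.eu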

Access cheatsheet

  • SSH: ssh dezky@46.4.78.187 (key only). Root SSH disabled.
  • kubectl: works as dezky (kubeconfig at ~/.kube/config).
  • Out-of-band if locked out: Hetzner Robot KVM/LARA or Rescue System.
  • The level=warning … 50-rancher.yaml: permission denied from kubectl is harmless noise (k3s kubectl probing a root-only config dir).