feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier

Adds the production cluster foundation (authored + applied live on node1):

- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass, Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden, never in git.
2026-06-08 18:39:31 +02:00
parent 65a68ee126
commit 153d7053ca
17 changed files with 733 additions and 1 deletions
@@ -0,0 +1,68 @@
+# fleet/longhorn — block storage for the data tier
+
+Longhorn provides the `longhorn` StorageClass that the data tier (Postgres /
+Mongo / Redis) and other stateful apps use. Single node for now (replica = 1):
+durability is the same as local disk, but you gain **snapshots** and **off-box
+backups to Hetzner Object Storage**, plus a clean path to multi-node later.
+
+You install Longhorn; this dir holds the **config** (`values.yaml`) + the backup
+credential template.
+
+## 1. Host prerequisite (every node)
+Every node needs `open-iscsi` with a running `iscsid`, plus `nfs-common`.
+`../../host/bootstrap.sh` now installs these, but node1 was bootstrapped before
+that was added, so install them **now** on node1:
+```bash
+sudo apt-get install -y open-iscsi nfs-common
+sudo systemctl enable --now iscsid
+systemctl is-active iscsid          # -> active
+```
+(Optional but recommended) run Longhorn's environment check before installing:
+```bash
+curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.12.0/scripts/environment_check.sh | bash
+```
+
+## 2. Install (your step) with this config
+```bash
+helm repo add longhorn https://charts.longhorn.io && helm repo update
+helm install longhorn longhorn/longhorn \
+  -n longhorn-system --create-namespace \
+  --version 1.12.0 -f values.yaml
+kubectl -n longhorn-system rollout status deploy/longhorn-driver-deployer
+kubectl get storageclass            # 'longhorn' present + (default)
+```
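+
+For orientation, a single-node `values.yaml` along the lines described here
+might look like the sketch below. The file itself is not shown in this README,
+so treat this as an assumption about its contents rather than a copy of it. The
+keys are from the upstream Longhorn Helm chart and may differ by chart version.
+```yaml
+# Sketch only: assumed shape of values.yaml for a single-node install.
+persistence:
+  defaultClass: true            # make 'longhorn' the default StorageClass
+  defaultClassReplicaCount: 1   # one node, so one replica per volume
+  reclaimPolicy: Retain         # keep the volume when its PVC is deleted
+defaultSettings:
+  defaultReplicaCount: 1        # default for volumes created from the Longhorn UI
+ingress:
+  enabled: false                # UI stays unexposed
+```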
+
+## 3. Make Longhorn the only default StorageClass
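+`values.yaml` sets Longhorn as default, so drop the default flag from k3s's
+bundled `local-path` class to avoid having two. k3s manages that class from a
+packaged manifest and may restore the annotation on restart, so re-check after
+a k3s restart: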
+```bash
+kubectl patch storageclass local-path \
+  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
+kubectl get storageclass            # only 'longhorn' shows (default)
+```
+
+## 4. Backups → Hetzner Object Storage (S3)
+1. In Hetzner: create a bucket (e.g. `dezky-longhorn`) + an S3 key pair; note the
+   endpoint (`https://fsn1.your-objectstorage.com`).
+2. Fill + apply `backup-secret.example.yaml` (creds → Bitwarden).
+3. Set the backup target (UI: **Settings → General**, or uncomment in
+   `values.yaml` + upgrade):
+   - Backup Target: `s3://dezky-longhorn@fsn1/`
+   - Backup Target Credential Secret: `longhorn-backup-secret`
+4. Add a **RecurringJob** (UI → Recurring Job, or a `RecurringJob` CR): e.g. a
+   nightly `backup` with retention 14, applied to the `default` volume group so
+   every PV is backed up off-box.
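+
+Steps 2 and 4 could be expressed as manifests roughly like the sketch below.
+The Secret keys are the ones Longhorn reads for S3-compatible targets; the
+repo's `backup-secret.example.yaml` is not shown here, so its actual contents
+may differ. The job name `nightly-backup` and the 03:00 schedule are
+placeholders.
+```yaml
+# Sketch only: placeholder values, never commit real credentials.
+apiVersion: v1
+kind: Secret
+metadata:
+  name: longhorn-backup-secret
+  namespace: longhorn-system
+type: Opaque
+stringData:
+  AWS_ACCESS_KEY_ID: "<access-key>"
+  AWS_SECRET_ACCESS_KEY: "<secret-key>"
+  AWS_ENDPOINTS: "https://fsn1.your-objectstorage.com"
+---
+apiVersion: longhorn.io/v1beta2
+kind: RecurringJob
+metadata:
+  name: nightly-backup          # hypothetical name
+  namespace: longhorn-system
+spec:
+  task: backup
+  cron: "0 3 * * *"             # nightly at 03:00 (assumed time)
+  retain: 14                    # keep the last 14 backups per volume
+  concurrency: 1                # back up one volume at a time
+  groups:
+    - default                   # volumes with no other recurring job assigned
+```
+Apply with `kubectl apply -f <file>` once the real key pair is filled in
+locally.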
+
+## How this changes the backup story
+Longhorn now owns volume-level snapshots + S3 backups, so the host `restic`
+layer no longer needs to capture `/var/lib/rancher/k3s/storage` (local-path).
+Keep restic for the **host** bits (Stalwart mail store, k3s etcd snapshots), and
+still take **logical DB dumps** (`pg_dump`/`mongodump`) into a Longhorn PVC —
+Longhorn backs that up to S3 and a logical dump is what you actually restore
+from. (Crash-consistent block snapshots of a live DB are a last resort.)
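+
+A logical dump into a Longhorn PVC could be scheduled with a CronJob roughly
+like the sketch below. Every name in it (`pg-dump`, the `postgres` host and
+user, `db-dumps`, `postgres-secret`, the `authentik` database) is hypothetical
+and needs to match the real data-tier manifests; only the `dezky-data`
+namespace comes from this commit.
+```yaml
+# Sketch only: nightly pg_dump to a Longhorn-backed PVC.
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: pg-dump                       # hypothetical
+  namespace: dezky-data
+spec:
+  schedule: "30 1 * * *"              # assumed time, ideally ahead of the Longhorn backup job
+  concurrencyPolicy: Forbid           # never run two dumps at once
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: pg-dump
+              image: postgres:16
+              command: ["/bin/sh", "-c"]
+              args:
+                - |
+                  pg_dump -h postgres -U postgres -Fc \
+                    -f "/dumps/authentik-$(date +%F).dump" authentik
+              env:
+                - name: PGPASSWORD
+                  valueFrom:
+                    secretKeyRef:
+                      name: postgres-secret   # hypothetical
+                      key: POSTGRES_PASSWORD  # hypothetical
+              volumeMounts:
+                - name: dumps
+                  mountPath: /dumps
+          volumes:
+            - name: dumps
+              persistentVolumeClaim:
+                claimName: db-dumps           # hypothetical PVC on the 'longhorn' class
+```
+The sketch does not prune old dumps; a retention step would be needed before
+the PVC fills up.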
+
+## Notes
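+- Bump `defaultReplicaCount` to 2–3 in `values.yaml` (helm upgrade) once more
+  nodes join. The new count applies to volumes created afterwards; existing
+  volumes likely need their replica count raised individually before Longhorn
+  schedules the extra replicas.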
+- The UI Ingress is intentionally **off** — it's full storage admin. Gate it
+  behind an IP allowlist or Authentik before exposing it.