dezky/infrastructure/production/fleet/longhorn/README.md
Ronni Baslund 153d7053ca
feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier
Adds the production cluster foundation (authored + applied live on node1):
- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod
  ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass,
  Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init),
  MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden — never in git.
2026-06-08 18:39:31 +02:00


fleet/longhorn — block storage for the data tier

Longhorn provides the longhorn StorageClass that the data tier (Postgres / Mongo / Redis) and other stateful apps use. Single node for now (replica = 1): durability is the same as local disk, but you gain snapshots and off-box backups to Hetzner Object Storage, plus a clean path to multi-node later.

You install Longhorn; this dir holds the config (values.yaml) + the backup credential template.
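
values.yaml itself is not reproduced in this README. As a rough orientation only, the sketch below writes a file with the settings this document mentions (replica count 1, Longhorn as default StorageClass, UI Ingress off). The Retain reclaim policy is taken from the commit description, the function name and output path are hypothetical, and the key names should be checked against the chart version actually installed.

```shell
# Writes an illustrative values file; NOT the real values.yaml from this dir.
# Usage: write_values_sketch [output-path]
write_values_sketch() {
  out="${1:-values.sketch.yaml}"
  cat > "$out" <<'EOF'
persistence:
  defaultClass: true            # make 'longhorn' the default StorageClass
  defaultClassReplicaCount: 1   # single node: one replica per volume
  reclaimPolicy: Retain         # assumption: keep the volume when a PVC is deleted
defaultSettings:
  defaultReplicaCount: 1        # default for volumes created outside the StorageClass
ingress:
  enabled: false                # UI stays unexposed (see Notes)
EOF
}
```

Comparing such a sketch with the real file (for example with diff) is a quick way to spot settings that drifted.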

1. Host prerequisite (every node)

Longhorn needs open-iscsi with a running iscsid, plus nfs-common, on every node. They are already baked into ../../host/bootstrap.sh, but node1 has already been bootstrapped, so install them there now:

sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable --now iscsid
systemctl is-active iscsid          # -> active
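
Since the prerequisite applies to every node, a check that can be rerun on each one may be handy. A minimal sketch, assuming a Debian/Ubuntu host with dpkg and systemd; the function name is hypothetical:

```shell
# Returns 0 when both packages are installed and iscsid is active, 1 otherwise.
check_longhorn_prereqs() {
  missing=0
  for pkg in open-iscsi nfs-common; do
    # dpkg -s exits non-zero when the package is not installed
    if ! dpkg -s "$pkg" >/dev/null 2>&1; then
      echo "missing package: $pkg" >&2
      missing=1
    fi
  done
  # is-active --quiet prints nothing and only sets the exit status
  if ! systemctl is-active --quiet iscsid; then
    echo "iscsid is not active" >&2
    missing=1
  fi
  return "$missing"
}
```

Calling check_longhorn_prereqs on a node and looking at its exit status gives a yes/no answer before Longhorn is installed or a new node joins.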

(Optional but recommended) run Longhorn's environment check before installing:

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.12.0/scripts/environment_check.sh | bash

2. Install (your step) with this config

helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn \
  -n longhorn-system --create-namespace \
  --version 1.12.0 -f values.yaml
kubectl -n longhorn-system rollout status deploy/longhorn-driver-deployer
kubectl get storageclass            # 'longhorn' present + (default)

3. Make Longhorn the only default StorageClass

values.yaml sets Longhorn as default — now drop k3s's local-path default so there aren't two:

kubectl patch storageclass local-path \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl get storageclass            # only 'longhorn' shows (default)
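
The end state of this step can also be checked mechanically rather than by reading the table. A minimal sketch, assuming kubectl is pointed at the cluster; the function names are hypothetical:

```shell
# Prints the name of every StorageClass annotated as default, one per line.
default_storageclasses() {
  kubectl get storageclass \
    -o jsonpath='{range .items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")]}{.metadata.name}{"\n"}{end}'
}

# Succeeds only when 'longhorn' is the single default StorageClass.
assert_only_longhorn_default() {
  defaults="$(default_storageclasses)"
  if [ "$defaults" = "longhorn" ]; then
    echo "ok: longhorn is the only default StorageClass"
  else
    echo "unexpected default StorageClass set: ${defaults:-<none>}" >&2
    return 1
  fi
}
```

It is worth rerunning assert_only_longhorn_default after any later change to either StorageClass, since two defaults make PVC provisioning ambiguous.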

4. Backups → Hetzner Object Storage (S3)

  1. In Hetzner: create a bucket (e.g. dezky-longhorn) + an S3 key pair; note the endpoint (https://fsn1.your-objectstorage.com).
  2. Fill in a copy of backup-secret.example.yaml with the S3 key pair and endpoint, then apply it; keep the real credentials in Bitwarden, not in git.
  3. Set the backup target (UI: Settings → General, or uncomment in values.yaml + upgrade):
    • Backup Target: s3://dezky-longhorn@fsn1/
    • Backup Target Credential Secret: longhorn-backup-secret
  4. Add a RecurringJob (UI → Recurring Job, or a RecurringJob CR): e.g. a nightly backup with retention 14, applied to the default volume group so every PV is backed up off-box.
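
For step 4, the RecurringJob CR route could look roughly like the sketch below. It only prints a manifest; the job name, the 02:00 schedule and the concurrency value are hypothetical, while retention 14 and the default group come from the step above. The apiVersion is the one used by recent Longhorn releases and should be confirmed against the installed version.

```shell
# Prints a Longhorn RecurringJob manifest for a nightly backup.
# Usage: render_nightly_backup_job [retain-count]
render_nightly_backup_job() {
  retain="${1:-14}"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-backup          # hypothetical name
  namespace: longhorn-system
spec:
  task: backup                  # back up off-box, not just snapshot
  cron: "0 2 * * *"             # hypothetical schedule: every night at 02:00
  groups:
    - default                   # volumes in the default group
  retain: ${retain}
  concurrency: 1                # hypothetical: one volume at a time
EOF
}
```

Piping the output into kubectl apply -f - would create the job; reviewing the rendered manifest first keeps the schedule and retention visible in one place.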

How this changes the backup story

Longhorn now owns volume-level snapshots and S3 backups, so the host restic layer no longer needs to capture /var/lib/rancher/k3s/storage (local-path). Keep restic for the host bits (Stalwart mail store, k3s etcd snapshots). Still take logical DB dumps (pg_dump/mongodump) into a Longhorn PVC: Longhorn backs that volume up to S3, and the logical dump is what you actually restore from. Crash-consistent block snapshots of a live DB are a last resort.
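
A logical dump into a Longhorn-backed directory can be sketched as follows. The pod name postgres-0, the postgres user and the destination directory are assumptions, not names confirmed by this README; the dezky-data namespace is taken from the commit description.

```shell
# Dumps one Postgres database in custom format into dest_dir.
# Usage: dump_postgres_db <database> <dest_dir>
# Prints the path of the finished dump; leaves no partial file on failure.
dump_postgres_db() {
  db="$1"
  dest_dir="$2"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  out="${dest_dir}/${db}-${stamp}.dump"
  # Write to a .partial file first so an interrupted dump never looks valid.
  if kubectl -n "${PG_NAMESPACE:-dezky-data}" exec "${PG_POD:-postgres-0}" -- \
       pg_dump -U "${PG_USER:-postgres}" -Fc "$db" > "${out}.partial"; then
    mv "${out}.partial" "$out"
    echo "$out"
  else
    rm -f "${out}.partial"
    return 1
  fi
}
```

With dest_dir on a mounted Longhorn PVC, the recurring Longhorn backup carries the dump to S3, and pg_restore can read the custom-format file when a restore is needed.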

Notes

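  • Bump defaultReplicaCount to 2 or 3 in values.yaml (helm upgrade) once more nodes join. The new default applies to newly created volumes; existing volumes keep their current replica count until it is raised per volume, after which Longhorn builds the additional replicas.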
  • The UI Ingress is intentionally off — it's full storage admin. Gate it behind an IP allowlist or Authentik before exposing it.
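
When more nodes join and the replica count is raised, it helps to see which existing volumes are still below the new target. A minimal read-only sketch, assuming kubectl access to the longhorn-system namespace; the function name is hypothetical:

```shell
# Lists Longhorn volumes whose configured replica count is below the target.
# Usage: volumes_below_replicas <target>
volumes_below_replicas() {
  target="$1"
  kubectl -n longhorn-system get volumes.longhorn.io --no-headers \
    -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.numberOfReplicas |
    awk -v target="$target" '$2 + 0 < target + 0 { print $1 }'
}
```

Each listed volume can then be updated in the Longhorn UI or by changing spec.numberOfReplicas on its Volume resource; an empty result means every volume already meets the target.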