feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier

Adds the production cluster foundation (authored + applied live on node1):

- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass, Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden, never in git.
2026-06-08 18:39:31 +02:00
parent 65a68ee126
commit 153d7053ca
17 changed files with 733 additions and 1 deletions
@@ -37,6 +37,9 @@ data/
 # But keep app-level data/ dirs — operator carries mock fixtures there.
 !apps/*/data/
 !apps/*/data/**
 # ...and the production fleet data-tier manifests (k8s YAML, not volume data).
 !infrastructure/production/fleet/data/
 !infrastructure/production/fleet/data/**
 # Coverage
 coverage/
@@ -0,0 +1,114 @@
 # Dezky production — node1 build runbook
 The actual, reproducible order used to stand up **node1.dezky.eu** (Hetzner
 AX41, `46.4.78.187`, Ubuntu 24.04). If the box is lost, follow this top to
 bottom to rebuild it. Per-layer detail lives in `host/README.md`,
 `fleet/cert-manager/`, `fleet/longhorn/`, `fleet/data/`.
 > Secrets are **never** in git. They're generated with `openssl rand -hex 24`
 > and stored in **Bitwarden**. See "Secrets" below for how to read the live
 > values back out of the cluster.
 ## Current state (built 2026-06-08)
 - **Host:** hardened via `host/bootstrap.sh` — `dezky` admin user, **key-only
  SSH** (no root, no passwords), k3s-safe nftables firewall (SSH/6443 → mgmt
  IPs `46.32.144.38`/`46.32.144.45`; 80/443+mail → world), fail2ban,
  unattended-upgrades, `open-iscsi`+`iscsid` (Longhorn prereq).
  `dezky` has **NOPASSWD sudo** (`/etc/sudoers.d/90-dezky`).
 - **k3s** v1.33.11 — single node (control-plane/etcd/worker), registered in
  Rancher (`91.99.122.153`).
 - **Longhorn** — default StorageClass, `numberOfReplicas: 1` (single node).
 - **cert-manager** + `letsencrypt-staging` / `letsencrypt-prod` (HTTP-01/Traefik).
 - **Data tier** (`dezky-data` ns) — Postgres 16, Mongo 7, Redis 7 as
  StatefulSets on Longhorn PVCs. Postgres holds the `authentik` + `ocis` DBs.
 ## Reproduce from scratch
 ### 1. Host layer
 ```bash
 # from laptop
 scp -r infrastructure/production/host root@<ip>:/opt/dezky-host
 # copy/fill config.env on the box (gitignored — MGMT IPs, ADMIN_SSH_PUBKEY,
 # RANCHER_* token/checksum, STALWART_*, RESTIC_*)
 ssh root@<ip> 'cd /opt/dezky-host && ./bootstrap.sh'
 # set a console/sudo password for the admin user, then (optional) NOPASSWD.
 # -t allocates a TTY so passwd and sudo can prompt for the password:
 ssh -t root@<ip> 'passwd dezky'
 ssh -t dezky@<ip> "echo 'dezky ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/90-dezky && sudo chmod 0440 /etc/sudoers.d/90-dezky"
 ```
 ### 2. k3s + kubectl access
 ```bash
 ssh dezky@<ip>
 sudo /opt/dezky-host/k3s/register.sh           # joins the Rancher Custom (K3s) cluster
 kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes   # -> Ready
 # give dezky a kubeconfig:
 mkdir -p ~/.kube && sudo install -m 600 -o dezky -g dezky /etc/rancher/k3s/k3s.yaml ~/.kube/config
 ```
 ### 3. Longhorn (storage)
 ```bash
 sudo apt-get install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid   # (bootstrap.sh does this now)
 helm repo add longhorn https://charts.longhorn.io && helm repo update
 helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace \
  --version 1.12.0 -f fleet/longhorn/values.yaml        # replica=1, default class
 # one default SC only:
 kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
 kubectl -n longhorn-system patch settings.longhorn.io default-replica-count --type=merge -p '{"value":"1"}'
 kubectl get storageclass        # only 'longhorn (default)'
 ```
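The "one default SC only" check can be scripted rather than eyeballed. A minimal sketch, assuming the default table output of `kubectl get storageclass`, where the default class carries `(default)` after its name; `count_default_sc` is a hypothetical helper, not something in this repo:

```bash
# Hypothetical helper: count StorageClasses flagged "(default)" in the
# table that `kubectl get storageclass` prints. Reads the table on stdin.
count_default_sc() {
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so `|| true` keeps the helper usable under `set -e`.
  grep -c '(default)' || true
}

# Usage on the node (needs a working kubeconfig):
#   [ "$(kubectl get storageclass | count_default_sc)" -eq 1 ] \
#     || echo "expected exactly one default StorageClass" >&2
```

If the count is 2, the `local-path` patch above has not been applied yet.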
 ### 4. cert-manager + issuers
 ```bash
 kubectl apply -f fleet/cert-manager/cert-manager.yaml
 kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
 kubectl apply -f fleet/cert-manager/cluster-issuer.yaml
 kubectl get clusterissuer       # both READY=True
 ```
 ### 5. Data tier
 ```bash
 kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
 # secrets — generate fresh, store in Bitwarden:
 kubectl -n dezky-data create secret generic postgres-secret \
  --from-literal=POSTGRES_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=AUTHENTIK_DB_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=OCIS_DB_PASSWORD=$(openssl rand -hex 24)
 kubectl -n dezky-data create secret generic mongo-secret \
  --from-literal=root-username=dezky --from-literal=root-password=$(openssl rand -hex 24)
 kubectl -n dezky-data create secret generic redis-secret \
  --from-literal=REDIS_PASSWORD=$(openssl rand -hex 24)
 kubectl apply -k fleet/data/
 kubectl -n dezky-data get pods,pvc      # all Running, PVCs Bound on longhorn
 ```
 ## Secrets — read live values for Bitwarden
 ```bash
 k(){ kubectl -n dezky-data get secret "$1" -o jsonpath="{.data.$2}" | base64 -d; echo; }
 k postgres-secret POSTGRES_PASSWORD
 k postgres-secret AUTHENTIK_DB_PASSWORD     # must match Authentik's DB config
 k postgres-secret OCIS_DB_PASSWORD          # must match OCIS's DB config
 k mongo-secret root-password
 k redis-secret REDIS_PASSWORD
 ```
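Because step 5 generates each password inline, the values exist only in the cluster until they are read back with `k`. An alternative sketch generates a password into a shell variable first, so it can be copied to Bitwarden before the secret is created. `gen_pw` is a hypothetical helper, and the `/dev/urandom` fallback is an assumption for machines without `openssl`:

```bash
# Hypothetical helper: print a 48-character hex password (24 random bytes),
# the same shape as `openssl rand -hex 24`.
gen_pw() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 24
  else
    # Fallback (assumption): read 24 bytes from the kernel CSPRNG, hex-encode
    # them with od, and strip the spaces and newlines od adds.
    head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Usage (on the node):
#   REDIS_PASSWORD=$(gen_pw)
#   echo "$REDIS_PASSWORD"        # copy into Bitwarden first
#   kubectl -n dezky-data create secret generic redis-secret \
#     --from-literal=REDIS_PASSWORD="$REDIS_PASSWORD"
```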
 ## Still TODO (next layers)
 1. **Authentik** (`auth.dezky.eu`) — OIDC for the portal; uses the `authentik`
   Postgres DB + Redis.
 2. **OCIS** (files) — uses the `ocis` Postgres DB + Hetzner Object Storage (S3).
 3. **Apps** — `fleet/apps/` (portal · platform-api · booking) + their secrets.
 4. **Stalwart** (host) — `host/stalwart/install.sh`; needs DNS + PTR.
 5. **Backups** — Longhorn → Hetzner Object Storage (`fleet/longhorn/README.md`),
   plus host Restic for the mail store + etcd snapshots, plus pg_dump/mongodump
   CronJobs.
 6. **DNS** — A records `api`/`app`/`booking`/`auth`/`mail`.dezky.eu → 46.4.78.187,
   and PTR for mail.
 ## Access cheatsheet
 - SSH: `ssh dezky@46.4.78.187` (key only). Root SSH disabled.
 - kubectl: works as `dezky` (kubeconfig at `~/.kube/config`).
 - Out-of-band if locked out: Hetzner Robot KVM/LARA or Rescue System.
 - The `level=warning … 50-rancher.yaml: permission denied` from kubectl is
  harmless noise (k3s kubectl probing a root-only config dir).
@@ -0,0 +1,29 @@
 # fleet/cert-manager — TLS for the cluster
 cert-manager + ACME ClusterIssuers. Installs via the **k3s built-in Helm
 controller** (no Helm CLI needed), then defines `letsencrypt-staging` and
 `letsencrypt-prod` (HTTP-01 through the bundled Traefik).
 ## Apply order (matters — issuers need the CRDs first)
 ```bash
 # 1) Install cert-manager
 kubectl apply -f cert-manager.yaml
 # 2) Wait until it's up (CRDs + webhook ready)
 kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
 kubectl -n cert-manager get pods
 # 3) Create the issuers
 kubectl apply -f cluster-issuer.yaml
 kubectl get clusterissuer            # both should report READY=True
 ```
 ## Notes
 - ACME email is `info@dezky.eu` — change in `cluster-issuer.yaml` if needed.
 - **Test with `letsencrypt-staging` first** (set an Ingress annotation
  `cert-manager.io/cluster-issuer: letsencrypt-staging`) to avoid burning the
  strict prod rate limits, then switch the apps to `letsencrypt-prod`.
 - HTTP-01 requires each hostname's DNS A record → `46.4.78.187` and port 80
  open (the firewall already allows 80). A cert won't issue until DNS resolves.
 - The app Ingresses (`fleet/apps/`) already reference `letsencrypt-prod`.
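
As an illustration of the staging-first flow, the sketch below prints a throwaway Ingress that requests a certificate from `letsencrypt-staging`. The function name, the `tls-test` resource names, the hostname, and the backend Service (assumed to listen on port 80) are all placeholders, not objects from this repo:

```bash
# Hypothetical helper: print an Ingress that asks cert-manager for a staging
# certificate. $1 = hostname, $2 = backend Service name (port 80 assumed).
staging_ingress() {
  local host="$1" service="$2"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tls-test
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: traefik
  tls:
    - hosts: ["${host}"]
      secretName: tls-test-cert
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: 80
EOF
}

# Usage:
#   staging_ingress test.example.org my-service | kubectl apply -n <namespace> -f -
#   kubectl -n <namespace> get certificate      # wait for READY=True
```

Once the staging certificate reports ready, the same Ingress can be pointed at `letsencrypt-prod` by changing the annotation value.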
@@ -0,0 +1,37 @@
 # cert-manager, installed via the k3s built-in Helm controller
 # (helm.cattle.io/v1). k3s watches HelmChart resources in any namespace and
 # runs a `helm install` Job for them — no Helm CLI needed on your laptop.
 #
 # The chart installs its own CRDs (crds.enabled=true). Apply this first and
 # wait for the cert-manager pods to be Running/Ready before applying the
 # ClusterIssuers (cluster-issuer.yaml) — the issuers need the CRDs + webhook.
 apiVersion: helm.cattle.io/v1
 kind: HelmChart
 metadata:
  name: cert-manager
  namespace: kube-system
 spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  # Pin a version; bump to the latest stable when you upgrade.
  version: v1.16.2
  targetNamespace: cert-manager
  createNamespace: true
  valuesContent: |-
    crds:
      enabled: true
    # Single-node box — keep the footprint modest.
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
    webhook:
      resources:
        requests:
          cpu: 10m
          memory: 32Mi
    cainjector:
      resources:
        requests:
          cpu: 10m
          memory: 64Mi
@@ -0,0 +1,43 @@
 # ACME ClusterIssuers (HTTP-01 via the k3s-bundled Traefik ingress).
 #
 # Apply ONLY after cert-manager is Running:
 #   kubectl -n cert-manager rollout status deploy/cert-manager-webhook
 #
 # Two issuers:
 #   - letsencrypt-staging : use while testing (generous rate limits, UNTRUSTED
 #     certs). Point an Ingress at this first to prove the HTTP-01 flow works.
 #   - letsencrypt-prod    : the real one the app Ingresses reference. Switch to
 #     it once staging issues cleanly, to avoid burning Let's Encrypt's strict
 #     prod rate limits on misconfigurations.
 #
 # HTTP-01 needs the hostname to resolve to this box (DNS A record -> 46.4.78.187)
 # and port 80 reachable. Port 80 is already open to the world in the firewall;
 # the app A records are still a TODO in RUNBOOK.md, so no cert issues until
 # they exist.
 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 metadata:
  name: letsencrypt-staging
 spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: info@dezky.eu
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
 ---
 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 metadata:
  name: letsencrypt-prod
 spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@dezky.eu
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
@@ -0,0 +1,49 @@
 # fleet/data — in-cluster data tier
 PostgreSQL 16 (Authentik + OCIS), MongoDB 7 (portal/platform-api) and Redis 7
 (cache/sessions) as single-node StatefulSets on **Longhorn** volumes
 (`storageClassName: longhorn` — see `../longhorn/`), in the `dezky-data`
 namespace. Mirrors the dev docker-compose stack. Self-hosted on the box — no
 external/managed DBs (EU-sovereign).
 > Prereq: Longhorn must be installed and its `longhorn` StorageClass present
 > before applying these (the PVCs request it). See `../longhorn/README.md`.
 Stable in-cluster DNS:
 - `postgres.dezky-data.svc.cluster.local:5432`
 - `mongo.dezky-data.svc.cluster.local:27017`
 - `redis.dezky-data.svc.cluster.local:6379`
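
For reference, in-cluster clients would combine these names with the credentials from the secrets roughly as follows. This is a sketch: `data_tier_uri` is a hypothetical helper, the URI schemes are the conventional ones for each database, and `authSource=admin` assumes the Mongo root user created through `MONGO_INITDB_ROOT_USERNAME`. Hex passwords need no URL-encoding; anything else would.

```bash
# Hypothetical helper: build a connection URI for one of the data-tier
# services. $1 = postgres|mongo|redis, $2 = user, $3 = password, $4 = database.
data_tier_uri() {
  local svc="$1" user="$2" pw="$3" db="${4:-}"
  local domain="dezky-data.svc.cluster.local"
  case "$svc" in
    postgres) printf 'postgresql://%s:%s@postgres.%s:5432/%s\n' "$user" "$pw" "$domain" "$db" ;;
    mongo)    printf 'mongodb://%s:%s@mongo.%s:27017/%s?authSource=admin\n' "$user" "$pw" "$domain" "$db" ;;
    # Redis here is protected by requirepass only, so the user part stays empty.
    redis)    printf 'redis://:%s@redis.%s:6379/0\n' "$pw" "$domain" ;;
    *)        echo "unknown service: $svc" >&2; return 1 ;;
  esac
}

# Usage:
#   data_tier_uri postgres authentik "$AUTHENTIK_DB_PASSWORD" authentik
```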
 ## Apply
 ```bash
 # 1) Secrets first (out-of-band — NOT in git). Generate values with openssl.
 cp secrets.example.yaml /tmp/data-secrets.yaml
 $EDITOR /tmp/data-secrets.yaml         # fill every REPLACE_* (openssl rand -hex 24)
 kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
 kubectl apply -f /tmp/data-secrets.yaml && rm /tmp/data-secrets.yaml
 # 2) The data tier
 kubectl apply -k .
 # 3) Watch them come up
 kubectl -n dezky-data rollout status statefulset/postgres
 kubectl -n dezky-data rollout status statefulset/mongo
 kubectl -n dezky-data rollout status statefulset/redis
 kubectl -n dezky-data get pods,pvc
 ```
 ## Notes
 - **Postgres init runs once** (empty data dir): `postgres-init` ConfigMap
  creates the `authentik` + `ocis` databases/roles using
  `AUTHENTIK_DB_PASSWORD` / `OCIS_DB_PASSWORD` from the secret. If you change
  those passwords later, alter the roles in SQL — re-init won't re-run on an
  existing volume.
 - Store all generated passwords in **Bitwarden**. `AUTHENTIK_DB_PASSWORD` /
  `OCIS_DB_PASSWORD` must match what you later give Authentik and OCIS.
 - **Backups:** Longhorn snapshots these volumes and backs them up to Hetzner
  Object Storage (S3); see `../longhorn/README.md`. Block snapshots of a live DB are
  crash-consistent at best, so also run `pg_dump`/`mongodump` CronJobs (added
  next) into a Longhorn PVC; restore from those logical dumps, not the raw
  data dirs.
 - Single replica each — fine for one node. HA/replicas are a later concern.
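
The first note implies a manual step whenever a role password is rotated. A minimal sketch of that step, assuming passwords stay hex strings as produced by `openssl rand -hex 24` and that the pod is `postgres-0` (the usual name for replica 0 of the `postgres` StatefulSet); `alter_role_sql` is a hypothetical helper:

```bash
# Hypothetical helper: print the SQL that sets a new password on a role.
# $1 = role name, $2 = new password (hex only).
alter_role_sql() {
  local role="$1" pw="$2"
  # Refuse anything that is not a plain identifier / hex string, so the
  # values can be interpolated into SQL without quoting problems.
  case "$role" in ''|*[!a-z0-9_]*) echo "bad role name" >&2; return 1 ;; esac
  case "$pw"   in ''|*[!0-9a-f]*)  echo "password must be hex" >&2; return 1 ;; esac
  printf "ALTER ROLE %s WITH PASSWORD '%s';\n" "$role" "$pw"
}

# Usage (psql connects over the pod's local socket as the superuser):
#   NEW_PW=$(openssl rand -hex 24)
#   alter_role_sql authentik "$NEW_PW" \
#     | kubectl -n dezky-data exec -i postgres-0 -- psql -U postgres -v ON_ERROR_STOP=1
```

Afterwards, update the matching key in `postgres-secret`, the consuming app's config, and Bitwarden so all copies stay in sync.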
@@ -0,0 +1,12 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 namespace: dezky-data
 # Non-secret resources only. Real secrets (filled in from the
 # secrets.example.yaml template) are applied out-of-band and deliberately NOT
 # listed here, the same pattern as apps/.
 resources:
  - namespace.yaml
  - postgres-init.yaml
  - postgres.yaml
  - mongodb.yaml
  - redis.yaml
@@ -0,0 +1,78 @@
 # MongoDB 7 — portal / platform-api application data (mirrors the dev stack).
 # Single-node StatefulSet on a Longhorn volume. App DBs/collections are
 # created by the apps on first use; root creds come from mongo-secret.
 apiVersion: v1
 kind: Service
 metadata:
  name: mongo
  namespace: dezky-data
 spec:
  clusterIP: None          # headless: stable DNS mongo.dezky-data:27017
  selector:
    app: mongo
  ports:
    - name: mongo
      port: 27017
      targetPort: 27017
 ---
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
  name: mongo
  namespace: dezky-data
 spec:
  serviceName: mongo
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:7
          args: ["--bind_ip_all"]
          ports:
            - containerPort: 27017
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: root-username
            - name: MONGO_INITDB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: root-password
          volumeMounts:
            - name: data
              mountPath: /data/db
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 1Gi
          readinessProbe:
            exec:
              command: ["mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
            initialDelaySeconds: 15
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
            initialDelaySeconds: 30
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 20Gi
@@ -0,0 +1,6 @@
 apiVersion: v1
 kind: Namespace
 metadata:
  name: dezky-data
  labels:
    app.kubernetes.io/part-of: dezky
@@ -0,0 +1,20 @@
 # Runs once, on first Postgres init (empty data dir), via the official image's
 # /docker-entrypoint-initdb.d hook. Creates the per-service databases + roles
 # Authentik and OCIS need. Passwords come from the postgres-secret env (see
 # secrets.example.yaml) — never hard-code them here.
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: postgres-init
  namespace: dezky-data
 data:
  10-extra-databases.sh: |
    #!/bin/bash
    set -euo pipefail
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
      CREATE ROLE authentik LOGIN PASSWORD '${AUTHENTIK_DB_PASSWORD}';
      CREATE DATABASE authentik OWNER authentik;
      CREATE ROLE ocis LOGIN PASSWORD '${OCIS_DB_PASSWORD}';
      CREATE DATABASE ocis OWNER ocis;
    EOSQL
@@ -0,0 +1,82 @@
 # PostgreSQL 16 — shared RDBMS for Authentik + OCIS (mirrors the dev stack).
 # Single-node StatefulSet on a Longhorn volume. Logical dumps for backup will
 # come from a pg_dump CronJob writing to a Longhorn PVC (see README.md).
 apiVersion: v1
 kind: Service
 metadata:
  name: postgres
  namespace: dezky-data
 spec:
  clusterIP: None          # headless: stable DNS postgres.dezky-data:5432
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
 ---
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
  name: postgres
  namespace: dezky-data
 spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      # No fsGroup needed: the postgres image entrypoint runs as root and
      # chowns PGDATA to the postgres user before stepping down.
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata   # subdir avoids lost+found clash
          envFrom:
            - secretRef:
                name: postgres-secret    # POSTGRES_PASSWORD, AUTHENTIK_DB_PASSWORD, OCIS_DB_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
            - name: init
              mountPath: /docker-entrypoint-initdb.d
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 1Gi
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 10
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 30
            periodSeconds: 20
      volumes:
        - name: init
          configMap:
            name: postgres-init
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
@@ -0,0 +1,78 @@
 # Redis 7 — cache / session store (Authentik, and available to the apps).
 # Password-protected (requirepass) even in-cluster; AOF persistence on a small
 # Longhorn volume so sessions survive restarts.
 apiVersion: v1
 kind: Service
 metadata:
  name: redis
  namespace: dezky-data
 spec:
  clusterIP: None          # headless: stable DNS redis.dezky-data:6379
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
 ---
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
  name: redis
  namespace: dezky-data
 spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          command: ["redis-server"]
          args:
            - "--requirepass"
            - "$(REDIS_PASSWORD)"
            - "--appendonly"
            - "yes"
          ports:
            - containerPort: 6379
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-secret
                  key: REDIS_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              memory: 256Mi
          readinessProbe:
            exec:
              command: ["sh", "-c", 'redis-cli -a "$REDIS_PASSWORD" ping']
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["sh", "-c", 'redis-cli -a "$REDIS_PASSWORD" ping']
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 2Gi
@@ -0,0 +1,39 @@
 # SECRET TEMPLATE for the data tier — copy, fill, apply OUT-OF-BAND.
 # NEVER commit real values. Excluded from kustomization.yaml on purpose.
 #
 #   cp secrets.example.yaml /tmp/data-secrets.yaml
 #   # fill every REPLACE_* (openssl rand -hex 24)
 #   kubectl apply -f /tmp/data-secrets.yaml && rm /tmp/data-secrets.yaml
 #
 # Record these in Bitwarden — losing them locks you out of the DBs. The
 # AUTHENTIK_DB_PASSWORD / OCIS_DB_PASSWORD must match what you give Authentik
 # and OCIS in their own configs.
 apiVersion: v1
 kind: Secret
 metadata:
  name: postgres-secret
  namespace: dezky-data
 type: Opaque
 stringData:
  POSTGRES_PASSWORD: REPLACE_superuser_pw      # openssl rand -hex 24
  AUTHENTIK_DB_PASSWORD: REPLACE_authentik_pw  # openssl rand -hex 24
  OCIS_DB_PASSWORD: REPLACE_ocis_pw            # openssl rand -hex 24
 ---
 apiVersion: v1
 kind: Secret
 metadata:
  name: mongo-secret
  namespace: dezky-data
 type: Opaque
 stringData:
  root-username: dezky
  root-password: REPLACE_mongo_root_pw         # openssl rand -hex 24
 ---
 apiVersion: v1
 kind: Secret
 metadata:
  name: redis-secret
  namespace: dezky-data
 type: Opaque
 stringData:
  REDIS_PASSWORD: REPLACE_redis_pw             # openssl rand -hex 24
@@ -0,0 +1,68 @@
 # fleet/longhorn — block storage for the data tier
 Longhorn provides the `longhorn` StorageClass that the data tier (Postgres /
 Mongo / Redis) and other stateful apps use. Single node for now (replica = 1):
 durability is the same as local disk, but you gain **snapshots** and **off-box
 backups to Hetzner Object Storage**, plus a clean path to multi-node later.
 You install Longhorn; this dir holds the **config** (`values.yaml`) + the backup
 credential template.
 ## 1. Host prerequisite (every node)
 `open-iscsi` + a running `iscsid`, and `nfs-common`. `../../host/bootstrap.sh`
 now installs these, but node1 was bootstrapped before that change, so install
 them by hand on node1:
 ```bash
 sudo apt-get install -y open-iscsi nfs-common
 sudo systemctl enable --now iscsid
 systemctl is-active iscsid          # -> active
 ```
 (Optional but recommended) run Longhorn's environment check before installing:
 ```bash
 curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.12.0/scripts/environment_check.sh | bash
 ```
 ## 2. Install (your step) with this config
 ```bash
 helm repo add longhorn https://charts.longhorn.io && helm repo update
 helm install longhorn longhorn/longhorn \
  -n longhorn-system --create-namespace \
  --version 1.12.0 -f values.yaml
 kubectl -n longhorn-system rollout status deploy/longhorn-driver-deployer
 kubectl get storageclass            # 'longhorn' present + (default)
 ```
 ## 3. Make Longhorn the only default StorageClass
 `values.yaml` sets Longhorn as default — now drop k3s's local-path default so
 there aren't two:
 ```bash
 kubectl patch storageclass local-path \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
 kubectl get storageclass            # only 'longhorn' shows (default)
 ```
 ## 4. Backups → Hetzner Object Storage (S3)
 1. In Hetzner: create a bucket (e.g. `dezky-longhorn`) + an S3 key pair; note the
   endpoint (`https://fsn1.your-objectstorage.com`).
 2. Fill + apply `backup-secret.example.yaml` (creds → Bitwarden).
 3. Set the backup target (UI: **Settings → General**, or uncomment in
   `values.yaml` + upgrade):
   - Backup Target: `s3://dezky-longhorn@fsn1/`
   - Backup Target Credential Secret: `longhorn-backup-secret`
 4. Add a **RecurringJob** (UI → Recurring Job, or a `RecurringJob` CR): e.g. a
   nightly `backup` with retention 14, applied to the `default` volume group so
   every PV is backed up off-box.
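
Step 4 can be expressed as a `RecurringJob` CR instead of UI clicks. The sketch below prints one; the job name `nightly-backup`, the 02:00 schedule, and `concurrency: 1` are assumptions chosen for illustration, while the retention of 14 and the `default` group come from the step above:

```bash
# Hypothetical helper: print a Longhorn RecurringJob that backs up every
# volume in the "default" group each night. $1 = backups to retain (default 14).
nightly_backup_job() {
  local retain="${1:-14}"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-backup
  namespace: longhorn-system
spec:
  task: backup
  cron: "0 2 * * *"
  groups: ["default"]
  retain: ${retain}
  concurrency: 1
EOF
}

# Usage (after the backup target and credential secret are set):
#   nightly_backup_job | kubectl apply -f -
#   kubectl -n longhorn-system get recurringjobs.longhorn.io
```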
 ## How this changes the backup story
 Longhorn now owns volume-level snapshots + S3 backups, so the host `restic`
 layer no longer needs to capture `/var/lib/rancher/k3s/storage` (local-path).
 Keep restic for the **host** bits (Stalwart mail store, k3s etcd snapshots), and
 still take **logical DB dumps** (`pg_dump`/`mongodump`) into a Longhorn PVC —
 Longhorn backs that up to S3 and a logical dump is what you actually restore
 from. (Crash-consistent block snapshots of a live DB are a last resort.)
 ## Notes
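 - Bump `defaultReplicaCount` and `persistence.defaultClassReplicaCount` to 2–3
  in `values.yaml` (helm upgrade) once more nodes join. The new defaults apply
  to volumes created afterwards; raise the replica count on existing volumes
  separately (Longhorn UI or the Volume CR).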
 - The UI Ingress is intentionally **off** — it's full storage admin. Gate it
  behind an IP allowlist or Authentik before exposing it.
@@ -0,0 +1,28 @@
 # Longhorn backup target credentials → Hetzner Object Storage (S3-compatible).
 # Template — fill + apply OUT-OF-BAND, never commit real keys. Store the keys
 # in Bitwarden.
 #
 #   1. Create a bucket (e.g. dezky-longhorn) + an S3 key pair in Hetzner Cloud
 #      Console → Object Storage. Note the endpoint, e.g.:
 #        Falkenstein  https://fsn1.your-objectstorage.com
 #        Nuremberg    https://nbg1.your-objectstorage.com
 #        Helsinki     https://hel1.your-objectstorage.com
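 #   2. Copy this file, fill it in, apply it, then delete the filled copy:
 #        cp backup-secret.example.yaml /tmp/longhorn-backup-secret.yaml
 #        kubectl apply -f /tmp/longhorn-backup-secret.yaml && rm /tmp/longhorn-backup-secret.yaml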
 #   3. Set the backup target (UI: Settings → General, or in values.yaml):
 #        Backup Target:            s3://dezky-longhorn@fsn1/
 #        Backup Target Credential: longhorn-backup-secret
 #      (The "@fsn1" region tag is just a label for non-AWS S3; the real endpoint
 #       comes from AWS_ENDPOINTS below.)
 apiVersion: v1
 kind: Secret
 metadata:
  name: longhorn-backup-secret
  namespace: longhorn-system
 type: Opaque
 stringData:
  AWS_ACCESS_KEY_ID: REPLACE_hetzner_s3_access_key
  AWS_SECRET_ACCESS_KEY: REPLACE_hetzner_s3_secret_key
  AWS_ENDPOINTS: https://fsn1.your-objectstorage.com
  # Hetzner Object Storage uses virtual-hosted-style addressing.
  VIRTUAL_HOSTED_STYLE: "true"
@@ -0,0 +1,42 @@
 # Longhorn Helm values — single-node config for the dezky AX41 (node1).
 # You install Longhorn; feed it these values, e.g.:
 #
 #   helm repo add longhorn https://charts.longhorn.io && helm repo update
 #   helm install longhorn longhorn/longhorn \
 #     -n longhorn-system --create-namespace \
 #     --version 1.12.0 -f values.yaml
 #
 # (Or paste this into Rancher → Apps → Longhorn → Edit YAML.)
 #
 # Host prereq (added to bootstrap.sh): open-iscsi + a running iscsid + nfs-common
 # on EVERY node. Verify: `systemctl is-active iscsid` → active.
 defaultSettings:
  # Single node → 1 replica. No cross-node redundancy yet (durability is the
  # same as local disk, but you gain snapshots + off-box backups). Bump to 2–3
  # once you add nodes; existing volumes keep their replica count until you
  # raise it per volume.
  defaultReplicaCount: 1
  # Replica data lives here on the AX41 NVMe.
  defaultDataPath: /var/lib/longhorn
  # Don't pack the disk to 100%.
  storageMinimalAvailablePercentage: 15
  storageOverProvisioningPercentage: 100
  # Tidy up orphaned replicas automatically.
  orphanResourceAutoDeletion: "replica-data"
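  # ── Backups → Hetzner Object Storage (set after creating the bucket+secret;
  #    see README). Can also be set in the UI under Settings → General. ──
  # NOTE: recent Longhorn charts (1.8+, as far as we know) take these two keys
  # under a top-level `defaultBackupStore:` block instead of `defaultSettings:`;
  # check the values.yaml of the chart version you install before uncommenting.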
  # backupTarget: s3://dezky-longhorn@fsn1/
  # backupTargetCredentialSecret: longhorn-backup-secret
 persistence:
  # Make Longhorn the DEFAULT StorageClass so PVCs land on it automatically.
  # ALSO unset local-path's default flag (one default only — see README).
  defaultClass: true
  defaultClassReplicaCount: 1
  # Databases: keep the volume if a PVC is deleted, until you reclaim it by hand.
  reclaimPolicy: Retain
 # The Longhorn UI is full storage admin — keep its Ingress OFF until you decide
 # how to protect it (IP allowlist at Traefik, or behind Authentik forward-auth).
 ingress:
  enabled: false
@@ -63,8 +63,12 @@ apt-get upgrade -y -qq
 apt-get install -y -qq \
    nftables fail2ban unattended-upgrades apt-listchanges \
    curl ca-certificates gnupg htop tmux vim chrony \
    open-iscsi nfs-common \
    >/dev/null
-ok "Base packages installed."
+# Longhorn requires a running iscsid on every node; nfs-common is needed for
 # RWX volumes / NFS backup targets.
 systemctl enable --now iscsid >/dev/null 2>&1 || true
 ok "Base packages installed (incl. Longhorn prereqs: open-iscsi, nfs-common)."
 # ── Step 2: hostname + timezone + time sync ────────────────────────────────
 info "Step 2: Hostname, timezone (UTC), time sync..."