feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier
ci / typecheck (map[dir:apps/website name:website]) (push) Failing after 10m58s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Failing after 11m56s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 14m0s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Adds the production cluster foundation (authored + applied live on node1):

- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass, Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden, never in git.
@@ -0,0 +1,68 @@
# fleet/longhorn: block storage for the data tier

Longhorn provides the `longhorn` StorageClass that the data tier (Postgres /
Mongo / Redis) and other stateful apps use. Single node for now (replica = 1):
durability is the same as local disk, but you gain **snapshots** and **off-box
backups to Hetzner Object Storage**, plus a clean path to multi-node later.

You install Longhorn; this dir holds the **config** (`values.yaml`) + the backup
credential template (`backup-secret.example.yaml`).

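A PVC picks up Longhorn by naming the `longhorn` StorageClass. It can also rely on the default class once step 3 leaves only one. The sketch below prints such a claim and applies nothing. `longhorn_pvc` and its arguments are hypothetical, and ReadWriteOnce is assumed because that is what a single database pod needs.

```bash
# Hypothetical helper: print a PersistentVolumeClaim backed by Longhorn.
# Usage: longhorn_pvc <name> <size> [namespace]   (namespace defaults to "default")
longhorn_pvc() {
  local name="$1" size="$2" namespace="${3:-default}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: ${size}
EOF
}
```

For a quick test volume, run `longhorn_pvc scratch 1Gi | kubectl apply -f -` (`scratch` is a placeholder name). Because `values.yaml` sets `reclaimPolicy: Retain`, running `kubectl delete pvc scratch` leaves the underlying Longhorn volume in place. You then delete it by hand.
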
## 1. Host prerequisite (every node)

Longhorn needs `open-iscsi` with a running `iscsid`, plus `nfs-common`. Both are
now installed by `../../host/bootstrap.sh`, but node1 was bootstrapped before
they were added, so install them by hand **now** on node1:

```bash
sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable --now iscsid
systemctl is-active iscsid   # -> active
```

(Optional but recommended) run Longhorn's environment check before installing:

```bash
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.12.0/scripts/environment_check.sh | bash
```

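You may want to re-check a node later, for example one that joins after this runbook. A small sketch can confirm that the binaries those packages ship are present and that `iscsid` is active. It assumes `iscsiadm` comes from `open-iscsi` and `mount.nfs` from `nfs-common`, and that both live in an `sbin` directory, so run it from a root shell. Both function names are hypothetical.

```bash
# Print each command name from "$@" that is not found on PATH, one per line.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Hypothetical host check: iscsiadm (open-iscsi) and mount.nfs (nfs-common)
# must be on PATH (assumes a root shell where /usr/sbin and /sbin are on PATH),
# and the iscsid service must be active.
check_longhorn_host() {
  local missing
  missing="$(missing_cmds iscsiadm mount.nfs)"
  if [ -n "$missing" ]; then
    # Intentionally unquoted so each missing name prints on its own line.
    printf 'missing: %s\n' $missing >&2
    return 1
  fi
  systemctl is-active --quiet iscsid || {
    echo 'iscsid is not active' >&2
    return 1
  }
}
```

In a root shell (`sudo -i`), source the functions and run `check_longhorn_host && echo ok`.
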
## 2. Install Longhorn with this config (your step)

```bash
helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn \
  -n longhorn-system --create-namespace \
  --version 1.12.0 -f values.yaml
kubectl -n longhorn-system rollout status deploy/longhorn-driver-deployer
kubectl get storageclass   # 'longhorn' present + (default)
```

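Later changes, such as setting the backup target or bumping replicas, edit `values.yaml` and re-run Helm. One option is to use a single idempotent command for both cases: `helm upgrade --install` installs the release when it is absent and upgrades it otherwise. In the sketch below, `run` and `longhorn_apply` are hypothetical helpers, and setting `DRY_RUN=1` only prints the commands.

```bash
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install Longhorn if absent, upgrade it otherwise, then wait for the driver.
# Defaults mirror the pinned chart version and values file used in this README.
longhorn_apply() {
  local version="${1:-1.12.0}" values="${2:-values.yaml}"
  run helm upgrade --install longhorn longhorn/longhorn \
    -n longhorn-system --create-namespace \
    --version "$version" -f "$values" || return
  run kubectl -n longhorn-system rollout status deploy/longhorn-driver-deployer
}
```

Do the `helm repo add` once, then run `longhorn_apply` from this directory for the first install and after every `values.yaml` edit. `DRY_RUN=1 longhorn_apply` previews the commands.
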
## 3. Make Longhorn the only default StorageClass

`values.yaml` sets Longhorn as default; now drop k3s's `local-path` default so
there aren't two:

```bash
kubectl patch storageclass local-path \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl get storageclass   # only 'longhorn' shows (default)
```

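Here is a scriptable version of that last check, as a sketch. It parses the plain `kubectl get storageclass` table, where kubectl appends `(default)` to the NAME column, so it depends on that column layout. Both function names are hypothetical.

```bash
# Read `kubectl get storageclass` output on stdin and print every class
# that kubectl marks as "(default)" in the NAME column.
default_storageclasses() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Succeed only if exactly one default exists and it is the expected one.
assert_single_default() {
  local expected="${1:-longhorn}" defaults
  defaults="$(default_storageclasses)"
  [ "$defaults" = "$expected" ]
}
```

`kubectl get storageclass | assert_single_default longhorn` exits 0 only when `longhorn` is the sole default. Right after step 2 it fails, because `local-path` is still marked as a default too.
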
## 4. Backups → Hetzner Object Storage (S3)

1. In Hetzner: create a bucket (e.g. `dezky-longhorn`) + an S3 key pair; note
   the endpoint (e.g. `https://fsn1.your-objectstorage.com`).
2. Fill in and apply a copy of `backup-secret.example.yaml` kept outside the
   repo (store the credentials in Bitwarden).
3. Set the backup target (UI: **Settings → General**, or uncomment in
   `values.yaml` + run a Helm upgrade):
   - Backup Target: `s3://dezky-longhorn@fsn1/`
   - Backup Target Credential Secret: `longhorn-backup-secret`
4. Add a **RecurringJob** (UI → Recurring Job, or a `RecurringJob` CR), e.g. a
   nightly `backup` with retention 14, applied to the `default` group. That
   group covers every volume without its own recurring-job assignment, so each
   such PV is backed up off-box.

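Steps 2 and 4 can be scripted. The sketch below assumes the filled secret sits at `/tmp/longhorn-backup-secret.yaml`, the path the template's own comments use. `assert_filled` and `recurring_backup_job` are hypothetical helpers. The job name, the `0 3 * * *` schedule and `concurrency: 1` are assumptions. The remaining fields (`task`, `cron`, `groups`, `retain`) follow Longhorn's `longhorn.io/v1beta2` RecurringJob CR.

```bash
# Step 2 guard: fail if the file is unreadable or still has REPLACE_ placeholders.
assert_filled() {
  local file="$1"
  if [ ! -r "$file" ]; then
    printf '%s: not readable\n' "$file" >&2
    return 1
  fi
  if grep -q 'REPLACE_' "$file"; then
    printf '%s still contains REPLACE_ placeholders\n' "$file" >&2
    return 1
  fi
}

# Step 4: print a RecurringJob that backs up every volume in the "default"
# group on the given cron schedule and keeps the last $retain backups.
# Defaults (name, 03:00 schedule, retain 14) are assumptions; adjust to taste.
recurring_backup_job() {
  local name="${1:-nightly-backup}" cron="${2:-0 3 * * *}" retain="${3:-14}"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: ${name}
  namespace: longhorn-system
spec:
  task: backup
  cron: "${cron}"
  groups:
    - default
  retain: ${retain}
  concurrency: 1
EOF
}
```

For step 2, run `assert_filled /tmp/longhorn-backup-secret.yaml && kubectl apply -f /tmp/longhorn-backup-secret.yaml`. Once the backup target is set, apply the job with `recurring_backup_job | kubectl apply -f -`.
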
## How this changes the backup story

Longhorn now owns volume-level snapshots + S3 backups, so the host `restic`
layer no longer needs to capture `/var/lib/rancher/k3s/storage` (local-path).
Keep restic for the **host** bits (Stalwart mail store, k3s etcd snapshots), and
still take **logical DB dumps** (`pg_dump`/`mongodump`) into a Longhorn PVC:
Longhorn backs that up to S3, and a logical dump is what you actually restore
from. (Crash-consistent block snapshots of a live DB are a last resort.)

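One way to produce those dumps inside the cluster is a Kubernetes CronJob that runs `pg_dump` into a dump PVC. This is a hypothetical sketch. The Service `postgres`, the user `postgres`, the Secret `postgres-auth` (key `password`) and the PVC `db-dumps` are assumed names, not resources defined in this commit. The `postgres:16` image is chosen so `pg_dump` matches the Postgres 16 server, and the 02:30 schedule is an assumption meant to land before a later nightly backup.

```bash
# Hypothetical generator: print a CronJob that writes a custom-format pg_dump
# of one database into the (assumed) Longhorn-backed PVC "db-dumps".
# Usage: pg_dump_cronjob <database> [namespace]   (namespace defaults to dezky-data)
pg_dump_cronjob() {
  local db="$1" ns="${2:-dezky-data}"
  [ -n "$db" ] || { echo 'usage: pg_dump_cronjob <database> [namespace]' >&2; return 1; }
  # \$(date +%F) stays literal here and is evaluated by /bin/sh in the pod.
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-dump-${db}
  namespace: ${ns}
spec:
  schedule: "30 2 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h postgres -U postgres -Fc -f /backup/${db}-\$(date +%F).dump ${db}
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-auth
                      key: password
              volumeMounts:
                - name: dumps
                  mountPath: /backup
          volumes:
            - name: dumps
              persistentVolumeClaim:
                claimName: db-dumps
EOF
}
```

Each `-Fc` dump is restored with `pg_restore`. The sketch never prunes old dumps, so add a cleanup step or size the PVC for them. A `mongodump` job would follow the same shape.
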
## Notes

- Once more nodes join, bump `defaultReplicaCount` and
  `persistence.defaultClassReplicaCount` to 2–3 in `values.yaml` (Helm upgrade).
  These defaults only apply to volumes created afterwards; raise the replica
  count of existing volumes individually (e.g. in the UI).
- The UI Ingress is intentionally **off**: it's full storage admin. Gate it
  behind an IP allowlist or Authentik before exposing it.

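For existing volumes, one option is the sketch below, which patches `spec.numberOfReplicas` on the Longhorn `Volume` CRs. It assumes Longhorn accepts this direct CR edit; the UI's per-volume replica action is the documented route. `set_replicas` is a hypothetical helper, and `DRY_RUN=1` only prints the commands.

```bash
# Hypothetical helper: set numberOfReplicas on the named Longhorn volumes.
# Usage: set_replicas <count> <volume>...   (DRY_RUN=1 prints instead of patching)
set_replicas() {
  local count="$1" vol patch
  case "$count" in
    ''|*[!0-9]*) echo 'usage: set_replicas <count> <volume>...' >&2; return 1 ;;
  esac
  shift
  patch="{\"spec\":{\"numberOfReplicas\":${count}}}"
  for vol in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      printf 'kubectl -n longhorn-system patch volumes.longhorn.io %s --type merge -p %s\n' "$vol" "$patch"
    else
      kubectl -n longhorn-system patch volumes.longhorn.io "$vol" --type merge -p "$patch" || return
    fi
  done
}
```

To preview the change for every volume, run `DRY_RUN=1 set_replicas 2 $(kubectl -n longhorn-system get volumes.longhorn.io -o name | cut -d/ -f2)`.
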
@@ -0,0 +1,28 @@
# Longhorn backup target credentials → Hetzner Object Storage (S3-compatible).
# Template: fill in + apply OUT-OF-BAND; never commit real keys. Store the keys
# in Bitwarden.
#
# 1. Create a bucket (e.g. dezky-longhorn) + an S3 key pair in Hetzner Cloud
#    Console → Object Storage. Note the endpoint, e.g.:
#      Falkenstein  https://fsn1.your-objectstorage.com
#      Nuremberg    https://nbg1.your-objectstorage.com
#      Helsinki     https://hel1.your-objectstorage.com
# 2. Copy this file outside the repo (e.g. /tmp/longhorn-backup-secret.yaml),
#    fill it in, and apply:
#      kubectl apply -f /tmp/longhorn-backup-secret.yaml
# 3. Set the backup target (UI: Settings → General, or in values.yaml):
#      Backup Target:                   s3://dezky-longhorn@fsn1/
#      Backup Target Credential Secret: longhorn-backup-secret
#    (The "@fsn1" region tag is just a label for non-AWS S3; the real endpoint
#    comes from AWS_ENDPOINTS below.)
apiVersion: v1
kind: Secret
metadata:
  name: longhorn-backup-secret
  namespace: longhorn-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: REPLACE_hetzner_s3_access_key
  AWS_SECRET_ACCESS_KEY: REPLACE_hetzner_s3_secret_key
  # Must be the endpoint of the location the bucket was created in.
  AWS_ENDPOINTS: https://fsn1.your-objectstorage.com
  # Hetzner Object Storage uses virtual-hosted-style addressing.
  VIRTUAL_HOSTED_STYLE: "true"

@@ -0,0 +1,42 @@
# Longhorn Helm values: single-node config for the dezky AX41 (node1).
# You install Longhorn; feed it these values, e.g.:
#
#   helm repo add longhorn https://charts.longhorn.io && helm repo update
#   helm install longhorn longhorn/longhorn \
#     -n longhorn-system --create-namespace \
#     --version 1.12.0 -f values.yaml
#
# (Or paste this into Rancher → Apps → Longhorn → Edit YAML.)
#
# Host prereq (added to bootstrap.sh): open-iscsi + a running iscsid + nfs-common
# on EVERY node. Verify: `systemctl is-active iscsid` → active.

defaultSettings:
  # Single node → 1 replica. No cross-node redundancy yet (durability is the
  # same as local disk, but you gain snapshots + off-box backups). Bump to 2–3,
  # together with persistence.defaultClassReplicaCount, once you add nodes.
  # This only affects new volumes; raise existing volumes individually.
  defaultReplicaCount: 1
  # Replica data lives here on the AX41 NVMe.
  defaultDataPath: /var/lib/longhorn
  # Don't pack the disk to 100%: keep 15% free, and don't schedule more than
  # the disk's usable capacity (100 = no over-provisioning).
  storageMinimalAvailablePercentage: 15
  storageOverProvisioningPercentage: 100
  # Tidy up orphaned replicas automatically.
  orphanResourceAutoDeletion: "replica-data"
  # ── Backups → Hetzner Object Storage (set after creating the bucket+secret;
  #    see README). Can also be set in the UI under Settings → General. ──
  # backupTarget: s3://dezky-longhorn@fsn1/
  # backupTargetCredentialSecret: longhorn-backup-secret

persistence:
  # Make Longhorn the DEFAULT StorageClass so PVCs land on it automatically.
  # ALSO unset local-path's default flag (one default only; see README).
  defaultClass: true
  defaultClassReplicaCount: 1
  # Databases: keep the volume if a PVC is deleted, until you reclaim it by hand.
  reclaimPolicy: Retain

# The Longhorn UI is full storage admin; keep its Ingress OFF until you decide
# how to protect it (IP allowlist at Traefik, or behind Authentik forward-auth).
ingress:
  enabled: false