feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier
ci / typecheck (map[dir:apps/website name:website]) (push) Failing after 10m58s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Failing after 11m56s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 14m0s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Adds the production cluster foundation (authored + applied live on node1):

- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod
  ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass,
  Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB
  init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden — never in git.
@@ -37,6 +37,9 @@ data/
 # But keep app-level data/ dirs — operator carries mock fixtures there.
 !apps/*/data/
 !apps/*/data/**
+# ...and the production fleet data-tier manifests (k8s YAML, not volume data).
+!infrastructure/production/fleet/data/
+!infrastructure/production/fleet/data/**
 
 # Coverage
 coverage/
@@ -0,0 +1,114 @@
# Dezky production — node1 build runbook

The actual, reproducible order used to stand up **node1.dezky.eu** (Hetzner
AX41, `46.4.78.187`, Ubuntu 24.04). If the box is lost, follow this top to
bottom to rebuild it. Per-layer detail lives in `host/README.md`,
`fleet/cert-manager/`, `fleet/longhorn/`, `fleet/data/`.

> Secrets are **never** in git. They're generated with `openssl rand -hex 24`
> and stored in **Bitwarden**. See "Secrets" below for how to read the live
> values back out of the cluster.

## Current state (built 2026-06-08)

- **Host:** hardened via `host/bootstrap.sh` — `dezky` admin user, **key-only
  SSH** (no root, no passwords), k3s-safe nftables firewall (SSH/6443 → mgmt
  IPs `46.32.144.38`/`46.32.144.45`; 80/443+mail → world), fail2ban,
  unattended-upgrades, `open-iscsi`+`iscsid` (Longhorn prereq).
  `dezky` has **NOPASSWD sudo** (`/etc/sudoers.d/90-dezky`).
- **k3s** v1.33.11 — single node (control-plane/etcd/worker), registered in
  Rancher (`91.99.122.153`).
- **Longhorn** — default StorageClass, `numberOfReplicas: 1` (single node).
- **cert-manager** + `letsencrypt-staging` / `letsencrypt-prod` (HTTP-01/Traefik).
- **Data tier** (`dezky-data` ns) — Postgres 16, Mongo 7, Redis 7 as
  StatefulSets on Longhorn PVCs. Postgres holds the `authentik` + `ocis` DBs.

## Reproduce from scratch

### 1. Host layer
```bash
# from laptop
scp -r infrastructure/production/host root@<ip>:/opt/dezky-host
# copy/fill config.env on the box (gitignored — MGMT IPs, ADMIN_SSH_PUBKEY,
# RANCHER_* token/checksum, STALWART_*, RESTIC_*)
ssh root@<ip> 'cd /opt/dezky-host && ./bootstrap.sh'
# set a console/sudo password for the admin user, then (optional) NOPASSWD (-t: sudo needs a tty to prompt):
ssh root@<ip> 'passwd dezky'
ssh -t dezky@<ip> "echo 'dezky ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/90-dezky && sudo chmod 0440 /etc/sudoers.d/90-dezky"
```

### 2. k3s + kubectl access
```bash
ssh dezky@<ip>
sudo /opt/dezky-host/k3s/register.sh          # joins the Rancher Custom (K3s) cluster
kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes   # -> Ready
# give dezky a kubeconfig:
mkdir -p ~/.kube && sudo install -m 600 -o dezky -g dezky /etc/rancher/k3s/k3s.yaml ~/.kube/config
```

### 3. Longhorn (storage)
```bash
sudo apt-get install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid   # (bootstrap.sh does this now)
helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace \
  --version 1.12.0 -f fleet/longhorn/values.yaml   # replica=1, default class
# one default SC only:
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl -n longhorn-system patch settings.longhorn.io default-replica-count --type=merge -p '{"value":"1"}'
kubectl get storageclass        # only 'longhorn (default)'
```

### 4. cert-manager + issuers
```bash
kubectl apply -f fleet/cert-manager/cert-manager.yaml
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
kubectl apply -f fleet/cert-manager/cluster-issuer.yaml
kubectl get clusterissuer        # both READY=True
```

### 5. Data tier
```bash
kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
# secrets — generate fresh, store in Bitwarden:
kubectl -n dezky-data create secret generic postgres-secret \
  --from-literal=POSTGRES_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=AUTHENTIK_DB_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=OCIS_DB_PASSWORD=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic mongo-secret \
  --from-literal=root-username=dezky --from-literal=root-password=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic redis-secret \
  --from-literal=REDIS_PASSWORD=$(openssl rand -hex 24)
kubectl apply -k fleet/data/
kubectl -n dezky-data get pods,pvc   # all Running, PVCs Bound on longhorn
```

## Secrets — read live values for Bitwarden

```bash
k(){ kubectl -n dezky-data get secret "$1" -o jsonpath="{.data.$2}" | base64 -d; echo; }
k postgres-secret POSTGRES_PASSWORD
k postgres-secret AUTHENTIK_DB_PASSWORD   # must match Authentik's DB config
k postgres-secret OCIS_DB_PASSWORD        # must match OCIS's DB config
k mongo-secret    root-password
k redis-secret    REDIS_PASSWORD
```
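
Re-running step 5 on a box that already has the secrets stops at the first `kubectl create secret` with an "already exists" error, and replacing a secret with fresh random values would no longer match the passwords Postgres stored at first init. A minimal sketch of a guard, assuming a hypothetical helper named `ensure_secret` (not part of this repo) wrapped around the same `kubectl create secret generic` calls:

```bash
# Hypothetical helper, not part of the repo: create a generic secret only if
# it does not exist yet, so a re-run never rotates a password by accident.
ensure_secret() {
  local ns="$1" name="$2"
  shift 2
  if kubectl -n "$ns" get secret "$name" >/dev/null 2>&1; then
    echo "secret/$name already exists in $ns, leaving it untouched"
    return 0
  fi
  # Remaining arguments are passed through unchanged (--from-literal=...).
  kubectl -n "$ns" create secret generic "$name" "$@"
}

# Example call (same key as step 5):
# ensure_secret dezky-data redis-secret \
#   --from-literal=REDIS_PASSWORD="$(openssl rand -hex 24)"
```

The helper only checks for existence; it does not compare values, so the Bitwarden copy still has to be read back with `k` as shown above.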

## Still TODO (next layers)

1. **Authentik** (`auth.dezky.eu`) — OIDC for the portal; uses the `authentik`
   Postgres DB + Redis.
2. **OCIS** (files) — uses the `ocis` Postgres DB + Hetzner Object Storage (S3).
3. **Apps** — `fleet/apps/` (portal · platform-api · booking) + their secrets.
4. **Stalwart** (host) — `host/stalwart/install.sh`; needs DNS + PTR.
5. **Backups** — Longhorn → Hetzner Object Storage (`fleet/longhorn/README.md`),
   plus host Restic for the mail store + etcd snapshots, plus pg_dump/mongodump
   CronJobs.
6. **DNS** — A records `api`/`app`/`booking`/`auth`/`mail`.dezky.eu → 46.4.78.187,
   and PTR for mail.

## Access cheatsheet
- SSH: `ssh dezky@46.4.78.187` (key only). Root SSH disabled.
- kubectl: works as `dezky` (kubeconfig at `~/.kube/config`).
- Out-of-band if locked out: Hetzner Robot KVM/LARA or Rescue System.
- The `level=warning … 50-rancher.yaml: permission denied` from kubectl is
  harmless noise (k3s kubectl probing a root-only config dir).
@@ -0,0 +1,29 @@
# fleet/cert-manager — TLS for the cluster

cert-manager + ACME ClusterIssuers. Installs via the **k3s built-in Helm
controller** (no Helm CLI needed), then defines `letsencrypt-staging` and
`letsencrypt-prod` (HTTP-01 through the bundled Traefik).

## Apply order (matters — issuers need the CRDs first)

```bash
# 1) Install cert-manager
kubectl apply -f cert-manager.yaml

# 2) Wait until it's up (CRDs + webhook ready)
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
kubectl -n cert-manager get pods

# 3) Create the issuers
kubectl apply -f cluster-issuer.yaml
kubectl get clusterissuer        # both should report READY=True
```

## Notes
- ACME email is `info@dezky.eu` — change in `cluster-issuer.yaml` if needed.
- **Test with `letsencrypt-staging` first** (set an Ingress annotation
  `cert-manager.io/cluster-issuer: letsencrypt-staging`) to avoid burning the
  strict prod rate limits, then switch the apps to `letsencrypt-prod`.
- HTTP-01 requires each hostname's DNS A record → `46.4.78.187` and port 80
  open (already true). A cert won't issue until DNS resolves.
- The app Ingresses (`fleet/apps/`) already reference `letsencrypt-prod`.
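
The staging-first advice in the notes can be exercised with a throwaway Ingress. The sketch below assumes a hypothetical helper `render_test_ingress` and hypothetical names (`tls-smoke-test`, a backing Service called `portal` on port 80); none of them exist in this repo. It only prints a manifest, so nothing changes in the cluster until the output is piped to `kubectl apply`.

```bash
# Hypothetical helper, not part of the repo: print a test Ingress that asks
# cert-manager for a certificate from the given ClusterIssuer
# (letsencrypt-staging unless a fourth argument says otherwise).
render_test_ingress() {
  local host="$1" service="$2" port="$3" issuer="${4:-letsencrypt-staging}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tls-smoke-test
  annotations:
    cert-manager.io/cluster-issuer: ${issuer}
spec:
  ingressClassName: traefik
  tls:
    - hosts: ["${host}"]
      secretName: tls-smoke-test-cert
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: ${port}
EOF
}

# Example (namespace and service are assumptions):
# render_test_ingress app.dezky.eu portal 80 | kubectl -n default apply -f -
```

Once `kubectl get certificate` reports the staging certificate as Ready, passing `letsencrypt-prod` as the fourth argument renders the same manifest against the production issuer.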
@@ -0,0 +1,37 @@
# cert-manager, installed via the k3s built-in Helm controller
# (helm.cattle.io/v1). k3s watches HelmChart resources in any namespace and
# runs a `helm install` Job for them — no Helm CLI needed on your laptop.
#
# The chart installs its own CRDs (crds.enabled=true). Apply this first and
# wait for the cert-manager pods to be Running/Ready before applying the
# ClusterIssuers (cluster-issuer.yaml) — the issuers need the CRDs + webhook.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  # Pin a version; bump to the latest stable when you upgrade.
  version: v1.16.2
  targetNamespace: cert-manager
  createNamespace: true
  valuesContent: |-
    crds:
      enabled: true
    # Single-node box — keep the footprint modest.
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
    webhook:
      resources:
        requests:
          cpu: 10m
          memory: 32Mi
    cainjector:
      resources:
        requests:
          cpu: 10m
          memory: 64Mi
@@ -0,0 +1,43 @@
# ACME ClusterIssuers (HTTP-01 via the k3s-bundled Traefik ingress).
#
# Apply ONLY after cert-manager is Running:
#   kubectl -n cert-manager rollout status deploy/cert-manager-webhook
#
# Two issuers:
#   - letsencrypt-staging : use while testing (high rate limits, UNTRUSTED
#     certs). Point an Ingress at this first to prove the HTTP-01 flow works.
#   - letsencrypt-prod    : the real one the app Ingresses reference. Switch to
#     it once staging issues cleanly, to avoid burning Let's Encrypt's strict
#     prod rate limits on misconfigurations.
#
# HTTP-01 needs the hostname to resolve to this box (DNS A record -> 46.4.78.187)
# and port 80 reachable — both are already true (firewall opens 80 to the world).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: info@dezky.eu
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@dezky.eu
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
@@ -0,0 +1,49 @@
# fleet/data — in-cluster data tier

PostgreSQL 16 (Authentik + OCIS), MongoDB 7 (portal/platform-api) and Redis 7
(cache/sessions) as single-node StatefulSets on **Longhorn** volumes
(`storageClassName: longhorn` — see `../longhorn/`), in the `dezky-data`
namespace. Mirrors the dev docker-compose stack. Self-hosted on the box — no
external/managed DBs (EU-sovereign).

> Prereq: Longhorn must be installed and its `longhorn` StorageClass present
> before applying these (the PVCs request it). See `../longhorn/README.md`.

Stable in-cluster DNS:
- `postgres.dezky-data.svc.cluster.local:5432`
- `mongo.dezky-data.svc.cluster.local:27017`
- `redis.dezky-data.svc.cluster.local:6379`

## Apply

```bash
# 1) Secrets first (out-of-band — NOT in git). Generate values with openssl.
cp secrets.example.yaml /tmp/data-secrets.yaml
$EDITOR /tmp/data-secrets.yaml          # fill every REPLACE_* (openssl rand -hex 24)
kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f /tmp/data-secrets.yaml && rm /tmp/data-secrets.yaml

# 2) The data tier
kubectl apply -k .

# 3) Watch them come up
kubectl -n dezky-data rollout status statefulset/postgres
kubectl -n dezky-data rollout status statefulset/mongo
kubectl -n dezky-data rollout status statefulset/redis
kubectl -n dezky-data get pods,pvc
```

## Notes
- **Postgres init runs once** (empty data dir): `postgres-init` ConfigMap
  creates the `authentik` + `ocis` databases/roles using
  `AUTHENTIK_DB_PASSWORD` / `OCIS_DB_PASSWORD` from the secret. If you change
  those passwords later, alter the roles in SQL — re-init won't re-run on an
  existing volume.
- Store all generated passwords in **Bitwarden**. `AUTHENTIK_DB_PASSWORD` /
  `OCIS_DB_PASSWORD` must match what you later give Authentik and OCIS.
- **Backups:** once the backup target is configured, Longhorn snapshots these
  volumes and backs them up to Hetzner Object Storage (S3) — see
  `../longhorn/README.md`. Block snapshots of a live DB are crash-consistent at
  best, so also run `pg_dump`/`mongodump` CronJobs (added next) into a Longhorn
  PVC; restore from those logical dumps, not the raw data dirs.
- Single replica each — fine for one node. HA/replicas are a later concern.
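
The first note says a later password change has to be made in SQL because the init script does not run again. A minimal sketch of that step, assuming a hypothetical helper `rotate_role_sql` (not part of this repo) and a new password produced with `openssl rand -hex 24`; the helper only builds the statement and refuses anything that is not lowercase hex, so the value can be placed inside the SQL string literal safely:

```bash
# Hypothetical helper, not part of the repo: print the ALTER ROLE statement
# for one of the two service roles created by postgres-init.
rotate_role_sql() {
  local role="$1" password="$2"
  case "$role" in
    authentik|ocis) ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  case "$password" in
    ''|*[!0-9a-f]*) echo "password must be non-empty lowercase hex" >&2; return 1 ;;
  esac
  printf "ALTER ROLE %s WITH PASSWORD '%s';\n" "$role" "$password"
}

# Example (assumes the pod is postgres-0, i.e. replica 0 of the StatefulSet):
# NEW_PW="$(openssl rand -hex 24)"
# rotate_role_sql authentik "$NEW_PW" |
#   kubectl -n dezky-data exec -i postgres-0 -- psql -v ON_ERROR_STOP=1 -U postgres
```

After the role is altered, the matching key in `postgres-secret`, the Bitwarden entry, and the consuming service's own DB config all need the same new value.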
@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: dezky-data

# Non-secret resources only. Real secrets (secrets.example.yaml) are applied
# out-of-band and deliberately NOT listed here — same pattern as apps/.
resources:
  - namespace.yaml
  - postgres-init.yaml
  - postgres.yaml
  - mongodb.yaml
  - redis.yaml
@@ -0,0 +1,78 @@
# MongoDB 7 — portal / platform-api application data (mirrors the dev stack).
# Single-node StatefulSet on a Longhorn volume. App DBs/collections are
# created by the apps on first use; root creds come from mongo-secret.
apiVersion: v1
kind: Service
metadata:
  name: mongo
  namespace: dezky-data
spec:
  clusterIP: None # headless: stable DNS mongo.dezky-data:27017
  selector:
    app: mongo
  ports:
    - name: mongo
      port: 27017
      targetPort: 27017
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
  namespace: dezky-data
spec:
  serviceName: mongo
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:7
          args: ["--bind_ip_all"]
          ports:
            - containerPort: 27017
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: root-username
            - name: MONGO_INITDB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: root-password
          volumeMounts:
            - name: data
              mountPath: /data/db
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 1Gi
          readinessProbe:
            exec:
              command: ["mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
            initialDelaySeconds: 15
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
            initialDelaySeconds: 30
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 20Gi
@@ -0,0 +1,6 @@
apiVersion: v1
kind: Namespace
metadata:
  name: dezky-data
  labels:
    app.kubernetes.io/part-of: dezky
@@ -0,0 +1,20 @@
# Runs once, on first Postgres init (empty data dir), via the official image's
# /docker-entrypoint-initdb.d hook. Creates the per-service databases + roles
# Authentik and OCIS need. Passwords come from the postgres-secret env (see
# secrets.example.yaml) — never hard-code them here.
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-init
  namespace: dezky-data
data:
  10-extra-databases.sh: |
    #!/bin/bash
    set -euo pipefail
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
      CREATE ROLE authentik LOGIN PASSWORD '${AUTHENTIK_DB_PASSWORD}';
      CREATE DATABASE authentik OWNER authentik;

      CREATE ROLE ocis LOGIN PASSWORD '${OCIS_DB_PASSWORD}';
      CREATE DATABASE ocis OWNER ocis;
    EOSQL
@@ -0,0 +1,82 @@
# PostgreSQL 16 — shared RDBMS for Authentik + OCIS (mirrors the dev stack).
# Single-node StatefulSet on a Longhorn volume. Logical dumps for backup are
# added later by a pg_dump CronJob (written to a Longhorn PVC; see README.md).
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: dezky-data
spec:
  clusterIP: None # headless: stable DNS postgres.dezky-data:5432
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: dezky-data
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      # No fsGroup needed: the postgres image entrypoint runs as root and
      # chowns PGDATA to the postgres user before stepping down.
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata # subdir avoids lost+found clash
          envFrom:
            - secretRef:
                name: postgres-secret # POSTGRES_PASSWORD, AUTHENTIK_DB_PASSWORD, OCIS_DB_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
            - name: init
              mountPath: /docker-entrypoint-initdb.d
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 1Gi
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 10
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 30
            periodSeconds: 20
      volumes:
        - name: init
          configMap:
            name: postgres-init
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
@@ -0,0 +1,78 @@
# Redis 7 — cache / session store (Authentik, and available to the apps).
# Password-protected (requirepass) even in-cluster; AOF persistence on a small
# Longhorn volume so sessions survive restarts.
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: dezky-data
spec:
  clusterIP: None # headless: stable DNS redis.dezky-data:6379
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: dezky-data
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          command: ["redis-server"]
          args:
            - "--requirepass"
            - "$(REDIS_PASSWORD)"
            - "--appendonly"
            - "yes"
          ports:
            - containerPort: 6379
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-secret
                  key: REDIS_PASSWORD
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              memory: 256Mi
          readinessProbe:
            exec:
              command: ["sh", "-c", 'redis-cli -a "$REDIS_PASSWORD" ping']
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["sh", "-c", 'redis-cli -a "$REDIS_PASSWORD" ping']
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 2Gi
@@ -0,0 +1,39 @@
# SECRET TEMPLATE for the data tier — copy, fill, apply OUT-OF-BAND.
# NEVER commit real values. Excluded from kustomization.yaml on purpose.
#
#   cp secrets.example.yaml /tmp/data-secrets.yaml
#   # fill every REPLACE_* (openssl rand -hex 24)
#   kubectl apply -f /tmp/data-secrets.yaml && rm /tmp/data-secrets.yaml
#
# Record these in Bitwarden — losing them locks you out of the DBs. The
# AUTHENTIK_DB_PASSWORD / OCIS_DB_PASSWORD must match what you give Authentik
# and OCIS in their own configs.
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: dezky-data
type: Opaque
stringData:
  POSTGRES_PASSWORD: REPLACE_superuser_pw # openssl rand -hex 24
  AUTHENTIK_DB_PASSWORD: REPLACE_authentik_pw # openssl rand -hex 24
  OCIS_DB_PASSWORD: REPLACE_ocis_pw # openssl rand -hex 24
---
apiVersion: v1
kind: Secret
metadata:
  name: mongo-secret
  namespace: dezky-data
type: Opaque
stringData:
  root-username: dezky
  root-password: REPLACE_mongo_root_pw # openssl rand -hex 24
---
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
  namespace: dezky-data
type: Opaque
stringData:
  REDIS_PASSWORD: REPLACE_redis_pw # openssl rand -hex 24
@@ -0,0 +1,68 @@
# fleet/longhorn — block storage for the data tier

Longhorn provides the `longhorn` StorageClass that the data tier (Postgres /
Mongo / Redis) and other stateful apps use. Single node for now (replica = 1):
durability is the same as local disk, but you gain **snapshots** and **off-box
backups to Hetzner Object Storage**, plus a clean path to multi-node later.

You install Longhorn; this dir holds the **config** (`values.yaml`) + the backup
credential template.

## 1. Host prerequisite (every node)
`open-iscsi` + a running `iscsid`, and `nfs-common`. Already baked into
`../../host/bootstrap.sh` — but the node is already bootstrapped, so install it
**now** on node1:
```bash
sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable --now iscsid
systemctl is-active iscsid        # -> active
```
(Optional but recommended) run Longhorn's environment check before installing:
```bash
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.12.0/scripts/environment_check.sh | bash
```

## 2. Install (your step) with this config
```bash
helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn \
  -n longhorn-system --create-namespace \
  --version 1.12.0 -f values.yaml
kubectl -n longhorn-system rollout status deploy/longhorn-driver-deployer
kubectl get storageclass          # 'longhorn' present + (default)
```

## 3. Make Longhorn the only default StorageClass
`values.yaml` sets Longhorn as default — now drop k3s's local-path default so
there aren't two:
```bash
kubectl patch storageclass local-path \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl get storageclass          # only 'longhorn' shows (default)
```

## 4. Backups → Hetzner Object Storage (S3)
1. In Hetzner: create a bucket (e.g. `dezky-longhorn`) + an S3 key pair; note the
   endpoint (`https://fsn1.your-objectstorage.com`).
2. Fill + apply `backup-secret.example.yaml` (creds → Bitwarden).
3. Set the backup target (UI: **Settings → General**, or uncomment in
   `values.yaml` + upgrade):
   - Backup Target: `s3://dezky-longhorn@fsn1/`
   - Backup Target Credential Secret: `longhorn-backup-secret`
4. Add a **RecurringJob** (UI → Recurring Job, or a `RecurringJob` CR): e.g. a
   nightly `backup` with retention 14, applied to the `default` volume group so
   every PV is backed up off-box.

## How this changes the backup story
Longhorn now owns volume-level snapshots + S3 backups, so the host `restic`
layer no longer needs to capture `/var/lib/rancher/k3s/storage` (local-path).
Keep restic for the **host** bits (Stalwart mail store, k3s etcd snapshots), and
still take **logical DB dumps** (`pg_dump`/`mongodump`) into a Longhorn PVC —
Longhorn backs that up to S3 and a logical dump is what you actually restore
from. (Crash-consistent block snapshots of a live DB are a last resort.)

## Notes
- Bump `defaultReplicaCount` to 2–3 in `values.yaml` (helm upgrade) once more
  nodes join; Longhorn rebalances.
- The UI Ingress is intentionally **off** — it's full storage admin. Gate it
  behind an IP allowlist or Authentik before exposing it.
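
Step 4 of the backup section mentions a `RecurringJob` CR for a nightly backup with retention 14 on the `default` group. A minimal sketch of what that CR could look like, assuming a hypothetical helper `render_backup_job`, a hypothetical job name and 02:00 schedule, and the `longhorn.io/v1beta2` API version; check the version your Longhorn release actually serves with `kubectl api-resources | grep recurringjobs` before applying.

```bash
# Hypothetical helper, not part of the repo: print a Longhorn RecurringJob
# that backs up every volume in the "default" group.
# Args: name (default nightly-backup), cron (default 02:00 daily), retain (default 14).
render_backup_job() {
  local name="${1:-nightly-backup}" cron="${2:-0 2 * * *}" retain="${3:-14}"
  cat <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: ${name}
  namespace: longhorn-system
spec:
  task: backup
  cron: "${cron}"
  groups:
    - default
  retain: ${retain}
  concurrency: 1
EOF
}

# Example: render_backup_job | kubectl apply -f -
```

The job only does something useful once the backup target and `longhorn-backup-secret` from steps 1 to 3 are in place.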
@@ -0,0 +1,28 @@
# Longhorn backup target credentials → Hetzner Object Storage (S3-compatible).
# Template — fill + apply OUT-OF-BAND, never commit real keys. Store the keys
# in Bitwarden.
#
# 1. Create a bucket (e.g. dezky-longhorn) + an S3 key pair in Hetzner Cloud
#    Console → Object Storage. Note the endpoint, e.g.:
#      Falkenstein  https://fsn1.your-objectstorage.com
#      Nuremberg    https://nbg1.your-objectstorage.com
#      Helsinki     https://hel1.your-objectstorage.com
# 2. Fill this and apply:
#      kubectl apply -f /tmp/longhorn-backup-secret.yaml
# 3. Set the backup target (UI: Settings → General, or in values.yaml):
#      Backup Target:            s3://dezky-longhorn@fsn1/
#      Backup Target Credential: longhorn-backup-secret
#    (The "@fsn1" region tag is just a label for non-AWS S3; the real endpoint
#    comes from AWS_ENDPOINTS below.)
apiVersion: v1
kind: Secret
metadata:
  name: longhorn-backup-secret
  namespace: longhorn-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: REPLACE_hetzner_s3_access_key
  AWS_SECRET_ACCESS_KEY: REPLACE_hetzner_s3_secret_key
  AWS_ENDPOINTS: https://fsn1.your-objectstorage.com
  # Hetzner Object Storage uses virtual-hosted-style addressing.
  VIRTUAL_HOSTED_STYLE: "true"
@@ -0,0 +1,42 @@
# Longhorn Helm values — single-node config for the dezky AX41 (node1).
# You install Longhorn; feed it these values, e.g.:
#
#   helm repo add longhorn https://charts.longhorn.io && helm repo update
#   helm install longhorn longhorn/longhorn \
#     -n longhorn-system --create-namespace \
#     --version 1.12.0 -f values.yaml
#
# (Or paste this into Rancher → Apps → Longhorn → Edit YAML.)
#
# Host prereq (added to bootstrap.sh): open-iscsi + a running iscsid + nfs-common
# on EVERY node. Verify: `systemctl is-active iscsid` → active.

defaultSettings:
  # Single node → 1 replica. No cross-node redundancy yet (durability is the
  # same as local disk, but you gain snapshots + off-box backups). Bump to 2–3
  # once you add nodes and Longhorn will rebalance.
  defaultReplicaCount: 1
  # Replica data lives here on the AX41 NVMe.
  defaultDataPath: /var/lib/longhorn
  # Don't pack the disk to 100%.
  storageMinimalAvailablePercentage: 15
  storageOverProvisioningPercentage: 100
  # Tidy up orphaned replicas automatically.
  orphanResourceAutoDeletion: "replica-data"
  # ── Backups → Hetzner Object Storage (set after creating the bucket+secret;
  # see README). Can also be set in the UI under Settings → General. ──
  # backupTarget: s3://dezky-longhorn@fsn1/
  # backupTargetCredentialSecret: longhorn-backup-secret

persistence:
  # Make Longhorn the DEFAULT StorageClass so PVCs land on it automatically.
  # ALSO unset local-path's default flag (one default only — see README).
  defaultClass: true
  defaultClassReplicaCount: 1
  # Databases: keep the volume if a PVC is deleted, until you reclaim it by hand.
  reclaimPolicy: Retain

# The Longhorn UI is full storage admin — keep its Ingress OFF until you decide
# how to protect it (IP allowlist at Traefik, or behind Authentik forward-auth).
ingress:
  enabled: false
@@ -63,8 +63,12 @@ apt-get upgrade -y -qq
 apt-get install -y -qq \
   nftables fail2ban unattended-upgrades apt-listchanges \
   curl ca-certificates gnupg htop tmux vim chrony \
+  open-iscsi nfs-common \
   >/dev/null
-ok "Base packages installed."
+# Longhorn requires a running iscsid on every node; nfs-common is needed for
+# RWX volumes / NFS backup targets.
+systemctl enable --now iscsid >/dev/null 2>&1 || true
+ok "Base packages installed (incl. Longhorn prereqs: open-iscsi, nfs-common)."
 
 # ── Step 2: hostname + timezone + time sync ────────────────────────────────
 info "Step 2: Hostname, timezone (UTC), time sync..."