dezky/infrastructure/production/RUNBOOK.md
Ronni Baslund 153d7053ca
feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier
Adds the production cluster foundation (authored + applied live on node1):
- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod
  ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass,
  Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init),
  MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden — never in git.
2026-06-08 18:39:31 +02:00

# Dezky production — node1 build runbook
The actual, reproducible order used to stand up **node1.dezky.eu** (Hetzner
AX41, `46.4.78.187`, Ubuntu 24.04). If the box is lost, follow this top to
bottom to rebuild it. Per-layer detail lives in `host/README.md`,
`fleet/cert-manager/`, `fleet/longhorn/`, `fleet/data/`.
> Secrets are **never** in git. They're generated with `openssl rand -hex 24`
> and stored in **Bitwarden**. See "Secrets" below for how to read the live
> values back out of the cluster.
## Current state (built 2026-06-08)
- **Host:** hardened via `host/bootstrap.sh`: `dezky` admin user, **key-only
SSH** (no root, no passwords), k3s-safe nftables firewall (SSH/6443 → mgmt
IPs `46.32.144.38`/`46.32.144.45`; 80/443+mail → world), fail2ban,
unattended-upgrades, `open-iscsi`+`iscsid` (Longhorn prereq).
`dezky` has **NOPASSWD sudo** (`/etc/sudoers.d/90-dezky`).
- **k3s** v1.33.11 — single node (control-plane/etcd/worker), registered in
Rancher (`91.99.122.153`).
- **Longhorn** — default StorageClass, `numberOfReplicas: 1` (single node).
- **cert-manager** + `letsencrypt-staging` / `letsencrypt-prod` (HTTP-01/Traefik).
- **Data tier** (`dezky-data` ns) — Postgres 16, Mongo 7, Redis 7 as
StatefulSets on Longhorn PVCs. Postgres holds the `authentik` + `ocis` DBs.
## Reproduce from scratch
### 1. Host layer
```bash
# from laptop
scp -r infrastructure/production/host root@<ip>:/opt/dezky-host
# copy/fill config.env on the box (gitignored — MGMT IPs, ADMIN_SSH_PUBKEY,
# RANCHER_* token/checksum, STALWART_*, RESTIC_*)
ssh root@<ip> 'cd /opt/dezky-host && ./bootstrap.sh'
# set a console/sudo password for the admin user, then (optional) NOPASSWD:
ssh root@<ip> 'passwd dezky'
ssh dezky@<ip> "echo 'dezky ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/90-dezky && sudo chmod 0440 /etc/sudoers.d/90-dezky"
```
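Since `bootstrap.sh` presumably depends on `config.env` being filled in, a pre-flight check for missing keys can catch gaps before the script runs. The following is a minimal sketch, assuming `config.env` uses plain `KEY=value` lines. The helper name `missing_vars` is hypothetical, and `ADMIN_SSH_PUBKEY` is the only exact variable name this runbook confirms; take the rest from `host/README.md`.

```bash
# Hypothetical pre-flight helper (not part of the repo): print every requested
# key that has no "KEY=<something>" line in the given env file.
# Assumes plain KEY=value syntax, optionally prefixed with "export ".
# Note: KEY="" still counts as set, because the quotes follow the "=".
missing_vars() {
  local file="$1" key
  shift
  for key in "$@"; do
    grep -Eq "^(export[[:space:]]+)?${key}=.+" "$file" || printf '%s\n' "$key"
  done
}
```

On the box this could be called as `missing_vars /opt/dezky-host/config.env ADMIN_SSH_PUBKEY` (plus whatever other keys the host scripts expect); empty output means every listed key has a value.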
### 2. k3s + kubectl access
```bash
ssh dezky@<ip>
sudo /opt/dezky-host/k3s/register.sh # joins the Rancher Custom (K3s) cluster
sudo kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes # -> Ready (k3s.yaml is root-only, hence sudo)
# give dezky a kubeconfig:
mkdir -p ~/.kube && sudo install -m 600 -o dezky -g dezky /etc/rancher/k3s/k3s.yaml ~/.kube/config
```
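The node may take a moment to report `Ready` after registration. Rather than re-running `get nodes` by hand, the wait can be scripted with `kubectl wait`. This is a sketch only: the `KUBECTL` override and the function name `wait_nodes_ready` are illustrative and not defined anywhere in the repo.

```bash
# Sketch: block until every node reports Ready, or fail after a timeout.
# KUBECTL is an illustrative override so the wrapper can be pointed at
# e.g. "sudo k3s kubectl" or a stub; the repo does not define it.
KUBECTL="${KUBECTL:-kubectl}"

wait_nodes_ready() {
  local timeout="${1:-300s}"
  # Left unquoted on purpose so a multi-word override is split into words.
  $KUBECTL wait --for=condition=Ready node --all --timeout="$timeout"
}
```

For example, `wait_nodes_ready 120s` returns non-zero if the node is still not Ready after two minutes.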
### 3. Longhorn (storage)
```bash
sudo apt-get install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid # (bootstrap.sh does this now)
helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace \
--version 1.12.0 -f fleet/longhorn/values.yaml # replica=1, default class
# one default SC only:
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl -n longhorn-system patch settings.longhorn.io default-replica-count --type=merge -p '{"value":"1"}'
kubectl get storageclass # only 'longhorn (default)'
```
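The "only one default StorageClass" check above is done by eye. It could also be asserted in a script, roughly as sketched below. The function names and the `KUBECTL` override are hypothetical; the annotation key is the same one patched in the step above.

```bash
# Sketch: list StorageClasses annotated as default. KUBECTL is an illustrative
# override (for example a stub in tests), not something the repo defines.
KUBECTL="${KUBECTL:-kubectl}"

default_storageclasses() {
  # Emit "name<TAB>is-default-class" per class, then keep names marked "true".
  $KUBECTL get storageclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
    awk -F'\t' '$2 == "true" { print $1 }'
}

# Succeeds only when "longhorn" is the one and only default class.
only_longhorn_default() {
  [ "$(default_storageclasses)" = "longhorn" ]
}
```

A rebuild script could then run `only_longhorn_default || echo "unexpected default StorageClass set"` before moving on to the data tier.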
### 4. cert-manager + issuers
```bash
kubectl apply -f fleet/cert-manager/cert-manager.yaml
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
kubectl apply -f fleet/cert-manager/cluster-issuer.yaml
kubectl get clusterissuer # both READY=True
```
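One possible snag in the step above: cert-manager is installed through the k3s HelmChart controller, so the `cert-manager-webhook` Deployment may not exist yet when `rollout status` runs right after `kubectl apply`. Whether this actually happens on node1 is an assumption. If it does, a small retry wrapper like this hypothetical `retry` helper can bridge the gap.

```bash
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts. Returns 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage would look like `retry 30 5 kubectl -n cert-manager get deploy cert-manager-webhook`, followed by the `rollout status` line from the step above.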
### 5. Data tier
```bash
kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
# secrets — generate fresh, store in Bitwarden:
kubectl -n dezky-data create secret generic postgres-secret \
--from-literal=POSTGRES_PASSWORD=$(openssl rand -hex 24) \
--from-literal=AUTHENTIK_DB_PASSWORD=$(openssl rand -hex 24) \
--from-literal=OCIS_DB_PASSWORD=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic mongo-secret \
--from-literal=root-username=dezky --from-literal=root-password=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic redis-secret \
--from-literal=REDIS_PASSWORD=$(openssl rand -hex 24)
kubectl apply -k fleet/data/
kubectl -n dezky-data get pods,pvc # all Running, PVCs Bound on longhorn
```
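The `create secret` commands above fail with `AlreadyExists` on a second run. Replacing an existing secret with fresh random values would also be risky if the databases were already initialised with the old passwords. A create-only-if-absent wrapper avoids both problems. This is a sketch: `ensure_secret`, `KUBECTL` and `DATA_NS` are illustrative names, not part of the repo.

```bash
# Sketch: create a generic secret only when it does not exist yet.
# KUBECTL and DATA_NS are illustrative overrides, not defined by the repo.
KUBECTL="${KUBECTL:-kubectl}"
DATA_NS="${DATA_NS:-dezky-data}"

ensure_secret() {
  local name="$1"
  shift
  if $KUBECTL -n "$DATA_NS" get secret "$name" >/dev/null 2>&1; then
    echo "secret/$name exists, leaving it untouched"
    return 0
  fi
  # Remaining arguments are passed through, e.g. --from-literal=KEY=value.
  $KUBECTL -n "$DATA_NS" create secret generic "$name" "$@"
}
```

For example: `ensure_secret redis-secret --from-literal=REDIS_PASSWORD="$(openssl rand -hex 24)"`. The random value is still generated when the secret already exists, but it is simply discarded.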
## Secrets — read live values for Bitwarden
```bash
k(){ kubectl -n dezky-data get secret "$1" -o jsonpath="{.data.$2}" | base64 -d; echo; }
k postgres-secret POSTGRES_PASSWORD
k postgres-secret AUTHENTIK_DB_PASSWORD # must match Authentik's DB config
k postgres-secret OCIS_DB_PASSWORD # must match OCIS's DB config
k mongo-secret root-password
k redis-secret REDIS_PASSWORD
```
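Before pasting a value into Bitwarden, it can help to confirm that it decoded cleanly. Every generated password in this runbook comes from `openssl rand -hex 24`, so it should be exactly 48 lowercase hex characters. The sketch below checks that shape; the helper name `is_hex48` is hypothetical.

```bash
# Hypothetical sanity check: succeed only if the argument looks like the output
# of `openssl rand -hex 24`, i.e. exactly 48 lowercase hex characters.
is_hex48() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{48}$'
}
```

Combined with the `k` helper above: `is_hex48 "$(k redis-secret REDIS_PASSWORD)" && echo ok`. It only applies to the generated passwords, not to fixed values such as `root-username`.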
## Still TODO (next layers)
1. **Authentik** (`auth.dezky.eu`) — OIDC for the portal; uses the `authentik`
Postgres DB + Redis.
2. **OCIS** (files) — uses the `ocis` Postgres DB + Hetzner Object Storage (S3).
3. **Apps** (`fleet/apps/`): portal · platform-api · booking, plus their secrets.
4. **Stalwart** (host) — `host/stalwart/install.sh`; needs DNS + PTR.
5. **Backups** — Longhorn → Hetzner Object Storage (`fleet/longhorn/README.md`),
plus host Restic for the mail store + etcd snapshots, plus pg_dump/mongodump
CronJobs.
6. **DNS** — A records `api`/`app`/`booking`/`auth`/`mail`.dezky.eu → 46.4.78.187,
and PTR for mail.
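Once the DNS records from item 6 exist, they can be checked from a laptop before requesting production certificates. This is a rough sketch, assuming `dig` is installed; `check_a_records` and the `RESOLVE` override are illustrative names.

```bash
# Sketch: report hostnames whose A record does not include the expected IP.
# RESOLVE is an illustrative override; by default it shells out to dig.
RESOLVE="${RESOLVE:-dig +short A}"

check_a_records() {
  local expected="$1" host rc=0
  shift
  for host in "$@"; do
    # -F fixed string, -x whole-line match: the IP must appear as its own line.
    if $RESOLVE "$host" | grep -Fxq "$expected"; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      rc=1
    fi
  done
  return "$rc"
}
```

For the names listed above: `check_a_records 46.4.78.187 api.dezky.eu app.dezky.eu booking.dezky.eu auth.dezky.eu mail.dezky.eu`. The PTR record for mail needs a separate reverse lookup.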
## Access cheatsheet
- SSH: `ssh dezky@46.4.78.187` (key only). Root SSH disabled.
- kubectl: works as `dezky` (kubeconfig at `~/.kube/config`).
- Out-of-band if locked out: Hetzner Robot KVM/LARA or Rescue System.
- The `level=warning … 50-rancher.yaml: permission denied` from kubectl is
harmless noise (k3s kubectl probing a root-only config dir).