feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier
ci / typecheck (map[dir:apps/website name:website]) (push) Failing after 10m58s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Failing after 11m56s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 14m0s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Adds the production cluster foundation (authored + applied live on node1):

- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass, Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden, never in git.
@@ -0,0 +1,114 @@
# Dezky production: node1 build runbook

The actual, reproducible order used to stand up **node1.dezky.eu** (Hetzner
AX41, `46.4.78.187`, Ubuntu 24.04). If the box is lost, follow this top to
bottom to rebuild it. Per-layer detail lives in `host/README.md`,
`fleet/cert-manager/`, `fleet/longhorn/`, `fleet/data/`.

> Secrets are **never** in git. They're generated with `openssl rand -hex 24`
> and stored in **Bitwarden**. See "Secrets" below for how to read the live
> values back out of the cluster.

## Current state (built 2026-06-08)

- **Host:** hardened via `host/bootstrap.sh`: `dezky` admin user, **key-only
  SSH** (no root, no passwords), k3s-safe nftables firewall (SSH/6443 → mgmt
  IPs `46.32.144.38`/`46.32.144.45`; 80/443+mail → world), fail2ban,
  unattended-upgrades, `open-iscsi`+`iscsid` (Longhorn prereq).
  `dezky` has **NOPASSWD sudo** (`/etc/sudoers.d/90-dezky`).
- **k3s** v1.33.11, single node (control-plane/etcd/worker), registered in
  Rancher (`91.99.122.153`).
- **Longhorn**: default StorageClass, `numberOfReplicas: 1` (single node).
- **cert-manager** + `letsencrypt-staging` / `letsencrypt-prod` (HTTP-01/Traefik).
- **Data tier** (`dezky-data` ns): Postgres 16, Mongo 7, Redis 7 as
  StatefulSets on Longhorn PVCs. Postgres holds the `authentik` + `ocis` DBs.

## Reproduce from scratch

### 1. Host layer
```bash
# from laptop
scp -r infrastructure/production/host root@<ip>:/opt/dezky-host
# copy/fill config.env on the box (gitignored: MGMT IPs, ADMIN_SSH_PUBKEY,
# RANCHER_* token/checksum, STALWART_*, RESTIC_*)
ssh root@<ip> 'cd /opt/dezky-host && ./bootstrap.sh'
# set a console/sudo password for the admin user, then (optional) NOPASSWD.
# -t allocates a TTY so passwd and sudo can prompt for the password:
ssh -t root@<ip> 'passwd dezky'
ssh -t dezky@<ip> "echo 'dezky ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/90-dezky && sudo chmod 0440 /etc/sudoers.d/90-dezky"
```

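Since `bootstrap.sh` reads its settings from `config.env`, a small pre-flight check can catch an unfilled value before the host is hardened. The sketch below is not part of the repository: `require_vars` is a hypothetical helper, and it assumes `config.env` is a plain shell file of `NAME=value` lines located at `/opt/dezky-host/config.env`.

```bash
# Sketch: fail early if required config.env variables are empty.
# require_vars is a hypothetical helper, not something bootstrap.sh provides.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that works in POSIX sh as well as bash.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "config.env: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the box (assumed path; extend the list to match your config.env):
#   set -a; . /opt/dezky-host/config.env; set +a
#   require_vars ADMIN_SSH_PUBKEY && ./bootstrap.sh
```

Only `ADMIN_SSH_PUBKEY` is named in full in the runbook, so it is the only variable shown; the remaining names should be taken from the real `config.env`.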
### 2. k3s + kubectl access
```bash
ssh dezky@<ip>
sudo /opt/dezky-host/k3s/register.sh        # joins the Rancher Custom (K3s) cluster
# k3s.yaml is root-only by default, so read it with sudo until dezky has a copy:
sudo kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes   # -> Ready
# give dezky a kubeconfig:
mkdir -p ~/.kube && sudo install -m 600 -o dezky -g dezky /etc/rancher/k3s/k3s.yaml ~/.kube/config
```

### 3. Longhorn (storage)
```bash
sudo apt-get install -y open-iscsi nfs-common && sudo systemctl enable --now iscsid   # (bootstrap.sh does this now)
helm repo add longhorn https://charts.longhorn.io && helm repo update
helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace \
  --version 1.12.0 -f fleet/longhorn/values.yaml     # replica=1, default class
# one default SC only:
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl -n longhorn-system patch settings.longhorn.io default-replica-count --type=merge -p '{"value":"1"}'
kubectl get storageclass   # only 'longhorn (default)'
```

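k3s ships `local-path` as a default StorageClass, so after installing Longhorn there can briefly be two defaults, which is why the patch above is needed. The eyeball check at the end of the block can be scripted. This is a minimal sketch, assuming the usual `kubectl get storageclass` output where the default class has ` (default)` appended to its name; `count_default_sc` is a hypothetical helper name.

```bash
# Sketch: count default StorageClasses from `kubectl get storageclass --no-headers`.
# count_default_sc is a hypothetical helper; it reads the listing on stdin.
count_default_sc() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep the exit status clean.
  grep -c '(default)' || true
}

# Usage against the live cluster:
#   n=$(kubectl get storageclass --no-headers | count_default_sc)
#   [ "$n" -eq 1 ] || echo "expected exactly one default StorageClass, found $n" >&2
```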
### 4. cert-manager + issuers
```bash
kubectl apply -f fleet/cert-manager/cert-manager.yaml
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
kubectl apply -f fleet/cert-manager/cluster-issuer.yaml
kubectl get clusterissuer    # both READY=True
```

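The "both READY=True" check can also be turned into an exit status, which is handy if these steps are ever scripted. The sketch below assumes the default `kubectl get clusterissuer` columns (`NAME READY AGE`); `issuers_ready` is a hypothetical helper name.

```bash
# Sketch: succeed only if every ClusterIssuer line on stdin reports READY=True.
# Expects the output of `kubectl get clusterissuer --no-headers` (NAME READY AGE).
issuers_ready() {
  # Fail on any non-True row, and also on empty input (no issuers at all).
  awk '$2 != "True" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage against the live cluster:
#   kubectl get clusterissuer --no-headers | issuers_ready \
#     || echo "a ClusterIssuer is not Ready yet" >&2
```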
### 5. Data tier
```bash
kubectl create namespace dezky-data --dry-run=client -o yaml | kubectl apply -f -
# secrets: generate fresh, store in Bitwarden:
kubectl -n dezky-data create secret generic postgres-secret \
  --from-literal=POSTGRES_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=AUTHENTIK_DB_PASSWORD=$(openssl rand -hex 24) \
  --from-literal=OCIS_DB_PASSWORD=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic mongo-secret \
  --from-literal=root-username=dezky --from-literal=root-password=$(openssl rand -hex 24)
kubectl -n dezky-data create secret generic redis-secret \
  --from-literal=REDIS_PASSWORD=$(openssl rand -hex 24)
kubectl apply -k fleet/data/
kubectl -n dezky-data get pods,pvc     # all Running, PVCs Bound on longhorn
```

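`openssl rand -hex 24` emits 24 random bytes as 48 lowercase hex characters, which keeps the values free of shell and URL metacharacters. The following sketch wraps that call and sanity-checks the result. `gen_secret` and `is_hex_secret` are hypothetical helper names, and the `/dev/urandom` fallback is an assumption added for machines without `openssl`, not something the runbook uses.

```bash
# Sketch: generate a secret the same way as the runbook and verify its shape.
gen_secret() {
  bytes=${1:-24}
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$bytes"
  else
    # Fallback (assumption, not from the runbook): hex-encode bytes from the kernel CSPRNG.
    head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# True if $1 is non-empty, lowercase hex only, and exactly $2 characters (default 48).
is_hex_secret() {
  case $1 in
    ''|*[!0-9a-f]*) return 1 ;;
  esac
  [ "${#1}" -eq "${2:-48}" ]
}

pw=$(gen_secret)
is_hex_secret "$pw" && echo "generated a ${#pw}-character secret"
```

A value produced this way can be passed to the commands above as `--from-literal=REDIS_PASSWORD="$pw"`.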
## Secrets: read live values for Bitwarden

```bash
k(){ kubectl -n dezky-data get secret "$1" -o jsonpath="{.data.$2}" | base64 -d; echo; }
k postgres-secret POSTGRES_PASSWORD
k postgres-secret AUTHENTIK_DB_PASSWORD   # must match Authentik's DB config
k postgres-secret OCIS_DB_PASSWORD        # must match OCIS's DB config
k mongo-secret   root-password
k redis-secret   REDIS_PASSWORD
```

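The `k` helper pipes through `base64 -d` because Kubernetes stores Secret values base64-encoded under `.data`. The round trip can be tried without a cluster; the value below is a made-up placeholder, not a real credential.

```bash
# Sketch: what the Secret's .data field holds, and what k() prints after decoding.
plain='example-password'                           # placeholder value for illustration

encoded=$(printf '%s' "$plain" | base64)           # roughly what sits in .data.<key>
decoded=$(printf '%s' "$encoded" | base64 -d)      # what k() recovers

echo "stored:  $encoded"
echo "decoded: $decoded"
```

Note that the `echo` at the end of `k` only adds a newline for display. When copying a value into Bitwarden, take the decoded string itself without a trailing newline.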
## Still TODO (next layers)

1. **Authentik** (`auth.dezky.eu`): OIDC for the portal; uses the `authentik`
   Postgres DB + Redis.
2. **OCIS** (files): uses the `ocis` Postgres DB + Hetzner Object Storage (S3).
3. **Apps**: `fleet/apps/` (portal · platform-api · booking) + their secrets.
4. **Stalwart** (host): `host/stalwart/install.sh`; needs DNS + PTR.
5. **Backups**: Longhorn → Hetzner Object Storage (`fleet/longhorn/README.md`),
   plus host Restic for the mail store + etcd snapshots, plus pg_dump/mongodump
   CronJobs.
6. **DNS**: A records `api`/`app`/`booking`/`auth`/`mail`.dezky.eu → 46.4.78.187,
   and PTR for mail.

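Once the DNS records from item 6 exist, they can be checked from a laptop before any HTTP-01 certificate is requested. This is a sketch only: `expected_hosts` and `check_dns` are hypothetical helpers, and `check_dns` assumes `dig` is installed on the machine running it.

```bash
# Sketch: verify that the A records listed in the TODO point at node1.
NODE_IP=46.4.78.187

# Print the hostnames named in the TODO list, one per line.
expected_hosts() {
  for h in api app booking auth mail; do
    echo "$h.dezky.eu"
  done
}

# Resolve each hostname and report any that do not point at NODE_IP.
check_dns() {
  rc=0
  for fqdn in $(expected_hosts); do
    got=$(dig +short "$fqdn" A | tail -n 1)
    if [ "$got" != "$NODE_IP" ]; then
      echo "$fqdn -> ${got:-<none>} (want $NODE_IP)" >&2
      rc=1
    fi
  done
  return "$rc"
}

# check_dns    # uncomment to run after the records have been created
```

The PTR record for mail is not covered here and would need a separate reverse lookup.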
## Access cheatsheet
- SSH: `ssh dezky@46.4.78.187` (key only). Root SSH disabled.
- kubectl: works as `dezky` (kubeconfig at `~/.kube/config`).
- Out-of-band if locked out: Hetzner Robot KVM/LARA or Rescue System.
- The `level=warning … 50-rancher.yaml: permission denied` from kubectl is
  harmless noise (k3s kubectl probing a root-only config dir).