feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier

Adds the production cluster foundation (authored + applied live on node1):
- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod
  ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass,
  Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init),
  MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden — never in git.
Ronni Baslund
2026-06-08 18:39:31 +02:00
parent 65a68ee126
commit 153d7053ca
17 changed files with 733 additions and 1 deletions
@@ -0,0 +1,29 @@
# fleet/cert-manager — TLS for the cluster
cert-manager + ACME ClusterIssuers. Installs via the **k3s built-in Helm
controller** (no Helm CLI needed), then defines `letsencrypt-staging` and
`letsencrypt-prod` (HTTP-01 through the bundled Traefik).
## Apply order (matters — issuers need the CRDs first)
```bash
# 1) Install cert-manager
kubectl apply -f cert-manager.yaml
# 2) Wait until it's up (CRDs + webhook ready)
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=180s
kubectl -n cert-manager get pods
# 3) Create the issuers
kubectl apply -f cluster-issuer.yaml
kubectl get clusterissuer # both should report READY=True
```
## Notes
- ACME email is `info@dezky.eu` — change in `cluster-issuer.yaml` if needed.
- **Test with `letsencrypt-staging` first** (set an Ingress annotation
`cert-manager.io/cluster-issuer: letsencrypt-staging`) to avoid burning the
strict prod rate limits, then switch the apps to `letsencrypt-prod`.
- HTTP-01 requires each hostname's DNS A record to point at `46.4.78.187` and
  port 80 to be reachable (port 80 is already open to the world). A certificate
  won't issue until the hostname resolves to that address.
- The app Ingresses (`fleet/apps/`) already reference `letsencrypt-prod`.
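
A minimal sketch of an Ingress that follows the staging-first note above. The resource name, hostname, Service, and Secret name (`example-app`, `app.example.com`, `example-app-tls`) are placeholders rather than anything from this repo, and `ingressClassName: traefik` assumes the IngressClass that the k3s-bundled Traefik normally registers:

```yaml
# Hypothetical Ingress: name, host, Service, and Secret are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  annotations:
    # Request a certificate from the staging issuer in cluster-issuer.yaml.
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - app.example.com
      # cert-manager stores the issued certificate in this Secret.
      secretName: example-app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
```

Once the staging certificate issues cleanly, switching this sketch to production should only need the annotation value changed to `letsencrypt-prod`.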
@@ -0,0 +1,37 @@
# cert-manager, installed via the k3s built-in Helm controller
# (helm.cattle.io/v1). k3s watches HelmChart resources in any namespace and
# runs a `helm install` Job for them — no Helm CLI needed on your laptop.
#
# The chart installs its own CRDs (crds.enabled=true). Apply this first and
# wait for the cert-manager pods to be Running/Ready before applying the
# ClusterIssuers (cluster-issuer.yaml) — the issuers need the CRDs + webhook.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  # Pin a version; bump to the latest stable when you upgrade.
  version: v1.16.2
  targetNamespace: cert-manager
  createNamespace: true
  valuesContent: |-
    crds:
      enabled: true
    # Single-node box — keep the footprint modest.
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
    webhook:
      resources:
        requests:
          cpu: 10m
          memory: 32Mi
    cainjector:
      resources:
        requests:
          cpu: 10m
          memory: 64Mi
@@ -0,0 +1,43 @@
# ACME ClusterIssuers (HTTP-01 via the k3s-bundled Traefik ingress).
#
# Apply ONLY after cert-manager is Running:
# kubectl -n cert-manager rollout status deploy/cert-manager-webhook
#
# Two issuers:
# - letsencrypt-staging : use while testing (much looser rate limits, but the
# certs are UNTRUSTED). Point an Ingress at this first to prove the HTTP-01
# flow works.
# - letsencrypt-prod : the real one the app Ingresses reference. Switch to
# it once staging issues cleanly, to avoid burning Let's Encrypt's strict
# prod rate limits on misconfigurations.
#
# HTTP-01 needs the hostname to resolve to this box (DNS A record -> 46.4.78.187)
# and port 80 reachable — both are already true (firewall opens 80 to the world).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: info@dezky.eu
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@dezky.eu
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik