# Dezky production: host layer

OS baseline + firewall for the bare-metal **Hetzner AX41** that runs the k3s node.
This layer is everything that lives on the *host* (outside Kubernetes): hardening,
the k3s-safe firewall, and (added next) k3s registration, Stalwart mail, and
Restic backups.

Managed by **Fleet/Rancher** once k3s is up; this host layer is the part Fleet
*can't* do, so it runs over SSH from reviewed scripts.

## Files

| File | Purpose |
|------|---------|
| `config.env.example` | Template for host-specific values |
| `config.env` | **Real values, gitignored.** Source of truth lives only on your machine/box |
| `bootstrap.sh` | One-shot OS hardening: user, SSH, sysctl, swap, fail2ban, auto-updates, firewall |
| `firewall/firewall.sh` | Renders + applies the k3s-safe nftables ruleset (idempotent) |
| `firewall/dezky-firewall.service` | systemd unit; reapplies our table on boot, never flushes globally |
| `k3s/register.sh` | Registers the node into Rancher (Custom k3s cluster); secrets from `config.env` |
| `stalwart/install.sh` | Installs Stalwart as a hardened host service (binary, units, secrets, bootstrap cert) |
| `stalwart/config.toml` | Production Stalwart config (mail ports on host, JMAP on internal 8080) |
| `stalwart/stalwart-mail.service` | systemd unit; non-root + `CAP_NET_BIND_SERVICE` for low ports |
| `stalwart/cert-sync.sh` + `*.service`/`*.timer` | Pulls the cert-manager mail cert into Stalwart, reloads on change |
| `restic/install.sh` | Sets up Restic, the backup SSH key/config, env, and the nightly timer |
| `restic/backup.sh` | Backup → primary Storage Box, retention, then `copy` → Helsinki DR |
| `restic/restore.sh` | List/restore snapshots (run drills!) |
| `restic/dezky-backup.service` + `.timer` | Nightly 03:20 UTC backup |

## The firewall model (read this)

k3s, kube-proxy and flannel manage their **own** nftables tables (`ip`/`ip6`: `filter`, `nat`, `mangle`).
The classic mistake is running `ufw`/`firewalld` or `nft flush ruleset`, which
wipes or fights those rules and breaks pod networking. So instead:

- We own a single dedicated table, **`inet dezky_fw`**, with only an INPUT chain
  (default `drop`). Separate tables coexist; a packet is dropped if *any* base
  chain drops it, so our default-drop INPUT gates host-bound traffic while k3s
  keeps owning FORWARD/NAT untouched.
- We explicitly **accept the pod (`10.42.0.0/16`) and service (`10.43.0.0/16`)
  CIDRs and the CNI interfaces** (`cni0`, `flannel.1`) so cluster↔host traffic
  (API server, kubelet, CoreDNS) is never dropped.
- We **never** `flush ruleset`. The systemd unit's `ExecStop` removes only our table.

### Access policy

| Surface | Ports | Who |
|---------|-------|-----|
| Web + ACME | 80, 443 | **World** (customers) |
| Mail | 25, 465, 587, 143, 993, 4190 | **World** |
| SSH | 22 | **`MGMT_ALLOW_V4/V6` only** |
| k3s API | 6443 | **`MGMT_ALLOW_V4/V6` only** |

Current management allowlist: **home `46.32.144.38`**, **office `46.32.144.45`**.

The Rancher plane (`91.99.122.153`) needs **no inbound rule**: the cluster agent
dials *out* to Rancher over 443, so replies ride the established/related fast-path.

## Apply order

> Prereqs: AX41 provisioned with **Debian 12 (bookworm)**, reachable as `root`.
> `config.env` filled in, in particular `ADMIN_SSH_PUBKEY` and
> `SERVER_PUBLIC_IPV4` (still TODO until the box exists).

```bash
# From your laptop:
scp -r infrastructure/production/host root@:/opt/dezky-host

# On the server:
ssh root@
cd /opt/dezky-host
# config.env is gitignored, so copy it up separately or recreate it here:
#   cp config.env.example config.env && nano config.env
./bootstrap.sh
```

`bootstrap.sh` creates your admin user and installs your key **before** it
disables root/password SSH, so the order is lockout-safe. It's idempotent, so
re-run anytime.
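
To make the firewall model and access policy above concrete, here is a minimal sketch of how a single-table ruleset could be rendered. It is not the contents of `firewall/firewall.sh`: the function name `render_ruleset`, the comma-separated format of `MGMT_ALLOW_V4`, the loopback rule, and the `priority 0` hook are all assumptions made for illustration, and IPv6 management rules are left out for brevity.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical renderer: prints an nftables ruleset that touches only our table.
# Assumes MGMT_ALLOW_V4 is a comma-separated list of IPv4 addresses or prefixes.
render_ruleset() {
  local mgmt_v4="${MGMT_ALLOW_V4:?set MGMT_ALLOW_V4}"
  cat <<EOF
# Declare, delete, then redefine: reloading replaces only this table.
table inet dezky_fw
delete table inet dezky_fw

table inet dezky_fw {
  chain input {
    type filter hook input priority 0; policy drop;

    ct state established,related accept
    iifname "lo" accept    # assumption: loopback is allowed

    # Cluster <-> host traffic: CNI interfaces plus pod and service CIDRs.
    iifname { "cni0", "flannel.1" } accept
    ip saddr { 10.42.0.0/16, 10.43.0.0/16 } accept

    # World-reachable surfaces: web + ACME, then mail.
    tcp dport { 80, 443 } accept
    tcp dport { 25, 465, 587, 143, 993, 4190 } accept

    # Management surfaces: SSH and the k3s API, allowlist only.
    ip saddr { ${mgmt_v4} } tcp dport { 22, 6443 } accept
  }
}
EOF
}
```

Under these assumptions, the output could be validated with `render_ruleset | sudo nft -c -f -` before loading it with `sudo nft -f -`. Because the file only ever names `inet dezky_fw`, the tables k3s, kube-proxy and flannel manage are left alone.
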
To touch only the firewall later:

```bash
sudo ./firewall/firewall.sh --dry-run   # preview the ruleset
sudo ./firewall/firewall.sh             # render, validate, apply, install unit
```

### Then register into Rancher

Once the host is hardened, register the node as a **Custom k3s cluster** (create
the cluster in Rancher first, choosing the **K3s** distribution, then paste its
token/checksum into `config.env`):

```bash
sudo ./k3s/register.sh                  # downloads agent installer, joins cluster
journalctl -u rancher-system-agent -f   # follow provisioning
```

Rancher is currently reached by IP, so the installer is fetched with
`--insecure`; the agent's ongoing link is still verified via `--ca-checksum`.
Give Rancher a real hostname + cert later to drop the insecure fetch.

### Then install Stalwart (mail)

```bash
sudo ./stalwart/install.sh              # binary + systemd + bootstrap cert
systemctl status stalwart-mail
```

Requires `STALWART_ADMIN_PASSWORD` + `STALWART_WEBHOOK_SECRET` in `config.env`
(`openssl rand -hex 24` / `-hex 32`). See the mail topology below.

## Mail (Stalwart) topology

Stalwart runs on the **host**, not in k3s: mail must keep flowing regardless of
cluster state, and SMTP/IMAP want the real public IP for reputation.
The single public IP forces a deliberate split with Traefik:

| Concern | Owner | Detail |
|---------|-------|--------|
| Mail protocol ports (25/465/587/143/993/4190) | **Stalwart (host)** | Bound on the public IP; opened to the world by the firewall |
| Web/JMAP for `mail.dezky.eu:443` | **Traefik (k3s)** | Terminates TLS, reverse-proxies to Stalwart's internal `:8080` |
| ACME / TLS issuance | **cert-manager (k3s)** | Issues `mail.dezky.eu` via HTTP-01; Stalwart runs no ACME (80/443 are Traefik's) |
| Cert delivery to mail ports | **`cert-sync.sh` (host)** | Reads the cluster TLS secret via local kubeconfig, reloads Stalwart on change |
| Storage | **RocksDB on host disk** | Intentionally independent of the in-cluster Postgres |
| Domain/DKIM provisioning | **platform-api (k3s)** | JMAP management API at `http://:8080/jmap`, Basic auth |
| Audit webhook | **Stalwart → platform-api** | POSTs to `https://api.dezky.eu/ingest/...`, HMAC-signed |

**platform-api Fleet env** (must match the host's `config.env`):

```
STALWART_API_URL=http://:8080
STALWART_ADMIN_USER=admin
STALWART_ADMIN_PASSWORD=
STALWART_WEBHOOK_SECRET=
STALWART_PROVISIONING_ENABLED=true
```

The firewall already lets the k3s pod CIDR reach host `:8080` while blocking the
world, so no extra rule is needed.

> **Forward dependency:** `cert-sync.sh` needs the fleet layer to create the
> `mail/mail-tls` cert secret. Until then Stalwart serves the self-signed
> bootstrap cert `install.sh` generated; the timer swaps in the real cert
> automatically once it exists.

### Finally, backups

```bash
sudo ./restic/install.sh                # restic + key + nightly timer
# upload the printed public key to BOTH Storage Boxes (port 23), then:
sudo ./restic/install.sh                # re-run to init the repos
sudo /opt/dezky-backup/backup.sh        # first backup (or wait for 03:20 UTC)
```

Needs `RESTIC_PASSWORD` + `BACKUP_PRIMARY_REPO` (+ `BACKUP_DR_REPO`) in
`config.env`. See backups below.
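
Each install step above depends on specific values being present in `config.env`. A preflight check along the following lines could fail fast with a readable message instead of part-way through an install. This is only a sketch: `require_vars` is a hypothetical helper name, and whether the real install scripts validate their inputs this way is not stated here.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: report every required variable that is unset or empty.
# Returns non-zero if any are missing, so a calling script can stop early.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ]; then
      echo "config.env: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, after loading config.env into the shell:
#   require_vars RESTIC_PASSWORD BACKUP_PRIMARY_REPO || exit 1
#   require_vars STALWART_ADMIN_PASSWORD STALWART_WEBHOOK_SECRET || exit 1
```

Reporting all missing names in one pass, rather than stopping at the first, saves a round trip when several values still need to be filled in.
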
## Backups (Restic)

Nightly at **03:20 UTC**: back up to the **primary Storage Box**, apply
retention, `restic check`, then a dedup-aware **`copy` to the Helsinki DR box**.

| What | Why |
|------|-----|
| `/opt/stalwart/data` + `/etc` | Mail store (RocksDB) + config: the crown jewels |
| `/var/lib/rancher/k3s/server/db/snapshots` | k3s **etcd snapshots** (cluster state) |
| `/var/lib/rancher/k3s/storage` | local-path PVCs, incl. where fleet `pg_dump`/`mongodump` CronJobs land |

- **Retention:** 7 daily · 4 weekly · 6 monthly (tunable via `BACKUP_RETENTION`).
- **Storage Box quirk:** SSH/SFTP on **port 23**, key auth. A single ssh-config
  wildcard covers both boxes, so one key + `restic copy` mirrors primary → DR.
- **Encryption:** repos are Restic-encrypted with `RESTIC_PASSWORD`. **Store it
  offline**: losing it makes every backup unrecoverable.
- **Alerting:** set `BACKUP_HEALTHCHECK_URL` (e.g. healthchecks.io) for a
  dead-man's switch: get paged when a nightly run is missed, not when you need
  to restore.

> **Database consistency:** live DB files in PVCs are crash-consistent at best.
> The reliable path is logical dumps: the **fleet layer** adds `pg_dump` /
> `mongodump` CronJobs that write into a backup PVC under
> `/var/lib/rancher/k3s/storage`, which Restic then captures. Restore those
> dumps, not the raw data dirs.

**Run restore drills.** A backup you've never restored isn't a backup:

```bash
sudo /opt/dezky-backup/restore.sh snapshots
sudo /opt/dezky-backup/restore.sh restore latest /tmp/restore-test
```

## ⚠️ Lockout safety

- **Always** open a second SSH session and confirm access **before** closing the
  one you ran bootstrap in.
- Management is pinned to home + office IPs. **Residential IPs can change**. If
  yours does, you'll be locked out of SSH/6443 (public services stay up).
- **Break-glass:** Hetzner's **KVM/LARA** console (Robot panel) is out-of-band
  and bypasses the firewall entirely.
  From there you can edit `/etc/nftables.d/dezky-fw.nft` or update `config.env`
  + re-run `firewall.sh`.
- If your IP changes often, widen `MGMT_ALLOW_V4` to a small prefix, or we add a
  WireGuard bastion later.

## Verifying after apply

```bash
sudo nft list table inet dezky_fw        # our rules
sudo nft list ruleset | grep -c KUBE     # k3s rules still present (>0 once k3s runs)
sudo systemctl status dezky-firewall     # enabled + active (exited)
sudo fail2ban-client status sshd         # jail active
# From a NON-allowlisted network, `ssh` should hang/timeout; 443 should work.
```

## Host layer status

**Complete:** hardening ✅ · firewall ✅ · k3s registration ✅ · Stalwart ✅ · backups ✅.

Next is the **Fleet/GitOps layer** (`infrastructure/production/fleet/`):
cert-manager + `ClusterIssuer`, ingress, the data tier (Postgres/Mongo/Redis),
Authentik, OCIS + Collabora, and portal + platform-api, plus the `mail/mail-tls`
cert and the DB-dump CronJobs this layer's `cert-sync` and backups depend on.