From 3831c852855bee9dc535e7be00f404f04ea2dcee Mon Sep 17 00:00:00 2001 From: Ronni Baslund Date: Sun, 7 Jun 2026 00:19:48 +0200 Subject: [PATCH] feat(infra): production host bootstrap and bare-metal Stalwart scaffolding Host provisioning for the single-server production target: SSH + firewall hardening (nftables allowlist), k3s node registration, bare-metal Stalwart install with systemd units and TLS cert-sync from the cluster secret, and Restic encrypted backup/restore (primary + DR) with timer units. Host-specific secrets live in config.env (gitignored); config.env.example is the template. Also gitignores MemPalace per-project files. --- .gitignore | 7 + infrastructure/production/host/README.md | 227 ++++++++++++++++++ infrastructure/production/host/bootstrap.sh | 192 +++++++++++++++ .../production/host/config.env.example | 59 +++++ .../host/firewall/dezky-firewall.service | 27 +++ .../production/host/firewall/firewall.sh | 160 ++++++++++++ .../production/host/k3s/register.sh | 93 +++++++ .../production/host/restic/backup.sh | 86 +++++++ .../host/restic/dezky-backup.service | 13 + .../production/host/restic/dezky-backup.timer | 12 + .../production/host/restic/install.sh | 115 +++++++++ .../production/host/restic/restore.sh | 57 +++++ .../production/host/stalwart/cert-sync.sh | 77 ++++++ .../production/host/stalwart/config.toml | 102 ++++++++ .../production/host/stalwart/install.sh | 144 +++++++++++ .../host/stalwart/stalwart-cert-sync.service | 10 + .../host/stalwart/stalwart-cert-sync.timer | 12 + .../host/stalwart/stalwart-mail.service | 39 +++ 18 files changed, 1432 insertions(+) create mode 100644 infrastructure/production/host/README.md create mode 100755 infrastructure/production/host/bootstrap.sh create mode 100644 infrastructure/production/host/config.env.example create mode 100644 infrastructure/production/host/firewall/dezky-firewall.service create mode 100755 infrastructure/production/host/firewall/firewall.sh create mode 100755 
infrastructure/production/host/k3s/register.sh create mode 100755 infrastructure/production/host/restic/backup.sh create mode 100644 infrastructure/production/host/restic/dezky-backup.service create mode 100644 infrastructure/production/host/restic/dezky-backup.timer create mode 100755 infrastructure/production/host/restic/install.sh create mode 100755 infrastructure/production/host/restic/restore.sh create mode 100755 infrastructure/production/host/stalwart/cert-sync.sh create mode 100644 infrastructure/production/host/stalwart/config.toml create mode 100755 infrastructure/production/host/stalwart/install.sh create mode 100644 infrastructure/production/host/stalwart/stalwart-cert-sync.service create mode 100644 infrastructure/production/host/stalwart/stalwart-cert-sync.timer create mode 100644 infrastructure/production/host/stalwart/stalwart-mail.service diff --git a/.gitignore b/.gitignore index 61cd4ad..16a7577 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,9 @@ .env.local .env.*.local +# Production host config (real IPs / SSH key — keep out of git) +infrastructure/production/host/config.env + # TLS certificates (mkcert generated) infrastructure/docker-compose/certs/*.pem @@ -41,3 +44,7 @@ coverage/ # Temporary tmp/ .tmp/ + +# MemPalace per-project files (issue #185) +mempalace.yaml +entities.json diff --git a/infrastructure/production/host/README.md b/infrastructure/production/host/README.md new file mode 100644 index 0000000..86e82cf --- /dev/null +++ b/infrastructure/production/host/README.md @@ -0,0 +1,227 @@ +# Dezky production — host layer + +OS baseline + firewall for the bare-metal **Hetzner AX41** that runs the k3s +node. This layer is everything that lives on the *host* (outside Kubernetes): +hardening, the k3s-safe firewall, and — added next — k3s registration, Stalwart +mail, and Restic backups. + +Managed by **Fleet/Rancher** once k3s is up; this host layer is the part Fleet +*can't* do, so it runs over SSH from reviewed scripts. 
+ +## Files + +| File | Purpose | +|------|---------| +| `config.env.example` | Template for host-specific values | +| `config.env` | **Real values — gitignored.** Source of truth lives only on your machine/box | +| `bootstrap.sh` | One-shot OS hardening: user, SSH, sysctl, swap, fail2ban, auto-updates, firewall | +| `firewall/firewall.sh` | Renders + applies the k3s-safe nftables ruleset (idempotent) | +| `firewall/dezky-firewall.service` | systemd unit; reapplies our table on boot, never flushes globally | +| `k3s/register.sh` | Registers the node into Rancher (Custom k3s cluster); secrets from `config.env` | +| `stalwart/install.sh` | Installs Stalwart as a hardened host service (binary, units, secrets, bootstrap cert) | +| `stalwart/config.toml` | Production Stalwart config (mail ports on host, JMAP on internal 8080) | +| `stalwart/stalwart-mail.service` | systemd unit; non-root + `CAP_NET_BIND_SERVICE` for low ports | +| `stalwart/cert-sync.sh` + `*.service`/`*.timer` | Pulls the cert-manager mail cert into Stalwart, reloads on change | +| `restic/install.sh` | Sets up Restic, the backup SSH key/config, env, and the nightly timer | +| `restic/backup.sh` | Backup → primary Storage Box, retention, then `copy` → Helsinki DR | +| `restic/restore.sh` | List/restore snapshots (run drills!) | +| `restic/dezky-backup.service` + `.timer` | Nightly 03:20 UTC backup | + +## The firewall model (read this) + +k3s, kube-proxy and flannel manage their **own** nftables tables (`ip`/`ip6`: +`filter`, `nat`, `mangle`). The classic mistake is running `ufw`/`firewalld` or +`nft flush ruleset`, which wipes or fights those rules and breaks pod networking. + +So instead: + +- We own a single dedicated table — **`inet dezky_fw`** — with only an INPUT + chain (default `drop`). Separate tables coexist; a packet is dropped if *any* + base chain drops it, so our default-drop INPUT gates host-bound traffic while + k3s keeps owning FORWARD/NAT untouched. 
+- We explicitly **accept the pod (`10.42.0.0/16`) and service (`10.43.0.0/16`) + CIDRs and the CNI interfaces** (`cni0`, `flannel.1`) so cluster↔host traffic + (API server, kubelet, CoreDNS) is never dropped. +- We **never** `flush ruleset`. The systemd unit's `ExecStop` removes only our + table. + +### Access policy + +| Surface | Ports | Who | +|---------|-------|-----| +| Web + ACME | 80, 443 | **World** (customers) | +| Mail | 25, 465, 587, 143, 993, 4190 | **World** | +| SSH | 22 | **`MGMT_ALLOW_V4/V6` only** | +| k3s API | 6443 | **`MGMT_ALLOW_V4/V6` only** | + +Current management allowlist: **home `46.32.144.38`**, **office `46.32.144.45`**. + +The Rancher plane (`91.99.122.153`) needs **no inbound rule** — the cluster +agent dials *out* to Rancher over 443, so replies ride the established/related +fast-path. + +## Apply order + +> Prereqs: AX41 provisioned with **Debian 12 (bookworm)**, reachable as `root`. +> `config.env` filled in — in particular `ADMIN_SSH_PUBKEY` and +> `SERVER_PUBLIC_IPV4` (still TODO until the box exists). + +```bash +# From your laptop: +scp -r infrastructure/production/host root@:/opt/dezky-host + +# On the server: +ssh root@ +cd /opt/dezky-host +# config.env is gitignored, so copy it up separately or recreate it here: +# cp config.env.example config.env && nano config.env +./bootstrap.sh +``` + +`bootstrap.sh` creates your admin user and installs your key **before** it +disables root/password SSH, so the order is lockout-safe. It's idempotent — +re-run anytime. 
+ +To touch only the firewall later: + +```bash +sudo ./firewall/firewall.sh --dry-run # preview the ruleset +sudo ./firewall/firewall.sh # render, validate, apply, install unit +``` + +### Then register into Rancher + +Once the host is hardened, register the node as a **Custom k3s cluster** +(create the cluster in Rancher first, choosing the **K3s** distribution, then +paste its token/checksum into `config.env`): + +```bash +sudo ./k3s/register.sh # downloads agent installer, joins cluster +journalctl -u rancher-system-agent -f # follow provisioning +``` + +Rancher is currently reached by IP, so the installer is fetched with +`--insecure`; the agent's ongoing link is still verified via `--ca-checksum`. +Give Rancher a real hostname + cert later to drop the insecure fetch. + +### Then install Stalwart (mail) + +```bash +sudo ./stalwart/install.sh # binary + systemd + bootstrap cert +systemctl status stalwart-mail +``` + +Requires `STALWART_ADMIN_PASSWORD` + `STALWART_WEBHOOK_SECRET` in `config.env` +(`openssl rand -hex 24` / `-hex 32`). See the mail topology below. + +## Mail (Stalwart) topology + +Stalwart runs on the **host**, not in k3s — mail must keep flowing regardless of +cluster state, and SMTP/IMAP want the real public IP for reputation. 
The single +public IP forces a deliberate split with Traefik: + +| Concern | Owner | Detail | +|---------|-------|--------| +| Mail protocol ports (25/465/587/143/993/4190) | **Stalwart (host)** | Bound on the public IP; opened to the world by the firewall | +| Web/JMAP for `mail.dezky.eu:443` | **Traefik (k3s)** | Terminates TLS, reverse-proxies to Stalwart's internal `:8080` | +| ACME / TLS issuance | **cert-manager (k3s)** | Issues `mail.dezky.eu` via HTTP-01; Stalwart runs no ACME (80/443 are Traefik's) | +| Cert delivery to mail ports | **`cert-sync.sh` (host)** | Reads the cluster TLS secret via local kubeconfig, reloads Stalwart on change | +| Storage | **RocksDB on host disk** | Intentionally independent of the in-cluster Postgres | +| Domain/DKIM provisioning | **platform-api (k3s)** | JMAP management API at `http://:8080/jmap`, Basic auth | +| Audit webhook | **Stalwart → platform-api** | POSTs to `https://api.dezky.eu/ingest/...`, HMAC-signed | + +**platform-api Fleet env** (must match the host's `config.env`): + +``` +STALWART_API_URL=http://:8080 +STALWART_ADMIN_USER=admin +STALWART_ADMIN_PASSWORD= +STALWART_WEBHOOK_SECRET= +STALWART_PROVISIONING_ENABLED=true +``` + +The firewall already lets the k3s pod CIDR reach host `:8080` while blocking the +world, so no extra rule is needed. + +> **Forward dependency:** `cert-sync.sh` needs the fleet layer to create the +> `mail/mail-tls` cert secret. Until then Stalwart serves the self-signed +> bootstrap cert `install.sh` generated; the timer swaps in the real cert +> automatically once it exists. + +### Finally, backups + +```bash +sudo ./restic/install.sh # restic + key + nightly timer +# upload the printed public key to BOTH Storage Boxes (port 23), then: +sudo ./restic/install.sh # re-run to init the repos +sudo /opt/dezky-backup/backup.sh # first backup (or wait for 03:20 UTC) +``` + +Needs `RESTIC_PASSWORD` + `BACKUP_PRIMARY_REPO` (+ `BACKUP_DR_REPO`) in +`config.env`. See backups below. 
+ +## Backups (Restic) + +Nightly at **03:20 UTC**: back up to the **primary Storage Box**, apply +retention, `restic check`, then a dedup-aware **`copy` to the Helsinki DR box**. + +| What | Why | +|------|-----| +| `/opt/stalwart/data` + `/opt/stalwart/etc` | Mail store (RocksDB) + config — the crown jewels | +| `/var/lib/rancher/k3s/server/db/snapshots` | k3s **etcd snapshots** (cluster state) | +| `/var/lib/rancher/k3s/storage` | local-path PVCs — incl. where fleet `pg_dump`/`mongodump` CronJobs land | + +- **Retention:** 7 daily · 4 weekly · 6 monthly (tunable via `BACKUP_RETENTION`). +- **Storage Box quirk:** SSH/SFTP on **port 23**, key auth. A single ssh-config + wildcard covers both boxes, so one key + `restic copy` mirrors primary → DR. +- **Encryption:** repos are Restic-encrypted with `RESTIC_PASSWORD`. **Store it + offline** — losing it makes every backup unrecoverable. +- **Alerting:** set `BACKUP_HEALTHCHECK_URL` (e.g. healthchecks.io) for a + dead-man's switch — get paged when a nightly run is missed, not when you need + to restore. + +> **Database consistency:** live DB files in PVCs are crash-consistent at best. +> The reliable path is logical dumps — the **fleet layer** adds `pg_dump` / +> `mongodump` CronJobs that write into a backup PVC under +> `/var/lib/rancher/k3s/storage`, which Restic then captures. Restore those +> dumps, not the raw data dirs. + +**Run restore drills.** A backup you've never restored isn't a backup: + +```bash +sudo /opt/dezky-backup/restore.sh snapshots +sudo /opt/dezky-backup/restore.sh restore latest /tmp/restore-test +``` + +## ⚠️ Lockout safety + +- **Always** open a second SSH session and confirm access **before** closing the + one you ran bootstrap in. +- Management is pinned to home + office IPs. **Residential IPs can change** — if + yours does, you'll be locked out of SSH/6443 (public services stay up). +- **Break-glass:** Hetzner's **KVM/LARA** console (Robot panel) is out-of-band + and bypasses the firewall entirely. 
From there you can edit + `/etc/nftables.d/dezky-fw.nft` or update `config.env` + re-run `firewall.sh`. +- If your IP changes often, widen `MGMT_ALLOW_V4` to a small prefix, or we add a + WireGuard bastion later. + +## Verifying after apply + +```bash +sudo nft list table inet dezky_fw # our rules +sudo nft list ruleset | grep -c KUBE # k3s rules still present (>0 once k3s runs) +sudo systemctl status dezky-firewall # enabled + active (exited) +sudo fail2ban-client status sshd # jail active +# From a NON-allowlisted network, `ssh` should hang/timeout; 443 should work. +``` + +## Host layer status + +**Complete:** hardening ✅ · firewall ✅ · k3s registration ✅ · Stalwart ✅ · +backups ✅. + +Next is the **Fleet/GitOps layer** (`infrastructure/production/fleet/`): +cert-manager + `ClusterIssuer`, ingress, the data tier (Postgres/Mongo/Redis), +Authentik, OCIS + Collabora, and portal + platform-api — plus the +`mail/mail-tls` cert and the DB-dump CronJobs this layer's `cert-sync` and +backups depend on. diff --git a/infrastructure/production/host/bootstrap.sh b/infrastructure/production/host/bootstrap.sh new file mode 100755 index 0000000..32402f9 --- /dev/null +++ b/infrastructure/production/host/bootstrap.sh @@ -0,0 +1,192 @@ +#!/usr/bin/env bash +# +# Dezky production host bootstrap — OS hardening for the AX41 k3s node. +# +# Run ONCE on a fresh Debian 12 (bookworm) install, as root, e.g.: +# scp -r infrastructure/production/host root@:/opt/dezky-host +# ssh root@ 'cd /opt/dezky-host && cp config.env.example config.env && nano config.env' +# ssh root@ 'cd /opt/dezky-host && ./bootstrap.sh' +# +# Order matters: we create your admin user + install your SSH key BEFORE +# disabling root/password login, so you can't lock yourself out. The script +# is idempotent — safe to re-run. +# +# What it does NOT do: install k3s, Stalwart, or backups. Those are separate +# steps in this host/ layer (added next). This is OS baseline + firewall only. 
+ +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +CONFIG_FILE="$SCRIPT_DIR/config.env" + +echo "" +echo "╔══════════════════════════════════════════════════════════════╗" +echo "║ Dezky Production Host Bootstrap (Debian 12) ║" +echo "╚══════════════════════════════════════════════════════════════╝" +echo "" + +# ── Preflight ────────────────────────────────────────────────────────────── +if [[ $EUID -ne 0 ]]; then + error "Run as root (you'll create the unprivileged admin user from here)." + exit 1 +fi + +if [[ ! -f "$CONFIG_FILE" ]]; then + error "Missing $CONFIG_FILE — copy config.env.example and fill it in." + exit 1 +fi +# shellcheck disable=SC1090 +source "$CONFIG_FILE" + +: "${ADMIN_USER:?ADMIN_USER required}" +: "${ADMIN_SSH_PUBKEY:?ADMIN_SSH_PUBKEY required — without it you would lock yourself out}" +: "${MGMT_ALLOW_V4:?MGMT_ALLOW_V4 required}" +: "${SERVER_HOSTNAME:?SERVER_HOSTNAME required}" +: "${SSH_PORT:=22}" + +if [[ "$ADMIN_SSH_PUBKEY" != ssh-* ]]; then + error "ADMIN_SSH_PUBKEY doesn't look like a public key (should start with 'ssh-')." + exit 1 +fi + +# ── Step 1: base packages + system upgrade ───────────────────────────────── +info "Step 1: Updating system and installing base packages..." +export DEBIAN_FRONTEND=noninteractive +apt-get update -qq +apt-get upgrade -y -qq +apt-get install -y -qq \ + nftables fail2ban unattended-upgrades apt-listchanges \ + curl ca-certificates gnupg htop tmux vim chrony \ + >/dev/null +ok "Base packages installed." + +# ── Step 2: hostname + timezone + time sync ──────────────────────────────── +info "Step 2: Hostname, timezone (UTC), time sync..." 
+hostnamectl set-hostname "$SERVER_HOSTNAME" +timedatectl set-timezone UTC +systemctl enable --now chrony >/dev/null 2>&1 || true +# Ensure the FQDN resolves locally +if ! grep -q "$SERVER_HOSTNAME" /etc/hosts; then + echo "127.0.1.1 ${SERVER_HOSTNAME} ${SERVER_HOSTNAME%%.*}" >> /etc/hosts +fi +ok "Hostname set to $SERVER_HOSTNAME (UTC)." + +# ── Step 3: admin user + SSH key (BEFORE locking SSH) ────────────────────── +info "Step 3: Admin user '$ADMIN_USER' + SSH key..." +if ! id -u "$ADMIN_USER" >/dev/null 2>&1; then + adduser --disabled-password --gecos "" "$ADMIN_USER" +fi +usermod -aG sudo "$ADMIN_USER" +install -d -m 0700 -o "$ADMIN_USER" -g "$ADMIN_USER" "/home/$ADMIN_USER/.ssh" +AUTH_KEYS="/home/$ADMIN_USER/.ssh/authorized_keys" +touch "$AUTH_KEYS" +grep -qxF "$ADMIN_SSH_PUBKEY" "$AUTH_KEYS" || echo "$ADMIN_SSH_PUBKEY" >> "$AUTH_KEYS" +chmod 0600 "$AUTH_KEYS" +chown "$ADMIN_USER:$ADMIN_USER" "$AUTH_KEYS" +# Passworded sudo (member of sudo group). Set a password manually later if you +# want interactive sudo: `passwd $ADMIN_USER`. Key-only login still works. +ok "Admin user ready with your SSH key." + +# ── Step 4: SSH hardening (drop-in) ──────────────────────────────────────── +info "Step 4: Hardening SSH..." +SSHD_DROPIN="/etc/ssh/sshd_config.d/99-dezky.conf" +cat > "$SSHD_DROPIN" </dev/null || systemctl reload sshd 2>/dev/null || true + ok "SSH hardened: key-only, no root, AllowUsers=${ADMIN_USER}, port ${SSH_PORT}." +else + error "sshd config test FAILED — removing drop-in, leaving SSH as-is." + rm -f "$SSHD_DROPIN" + exit 1 +fi + +# ── Step 5: kernel sysctl for k3s + sane limits ──────────────────────────── +info "Step 5: sysctl + kernel modules for k3s..." 
+modprobe br_netfilter 2>/dev/null || true +modprobe overlay 2>/dev/null || true +cat > /etc/modules-load.d/dezky-k3s.conf < /etc/sysctl.d/99-dezky-k3s.conf < raise inotify + file limits +fs.inotify.max_user_instances = 8192 +fs.inotify.max_user_watches = 524288 +fs.file-max = 2097152 +EOF +sysctl --system >/dev/null +ok "sysctl applied." + +# ── Step 6: disable swap (kubelet best practice) ─────────────────────────── +info "Step 6: Disabling swap (recommended for k3s nodes)..." +swapoff -a || true +# Comment any swap entries so it stays off across reboots +sed -i.bak -E 's@^([^#].*\sswap\s.*)$@# \1 # disabled by dezky bootstrap@' /etc/fstab || true +ok "Swap disabled." + +# ── Step 7: fail2ban (ssh) ───────────────────────────────────────────────── +info "Step 7: fail2ban for SSH..." +cat > /etc/fail2ban/jail.d/dezky-sshd.local </dev/null 2>&1 || true +systemctl restart fail2ban >/dev/null 2>&1 || true +ok "fail2ban active on SSH." + +# ── Step 8: unattended security upgrades ─────────────────────────────────── +info "Step 8: Enabling unattended security upgrades..." +cat > /etc/apt/apt.conf.d/20auto-upgrades </dev/null 2>&1 || true +CONFIG_FILE="$CONFIG_FILE" "$SCRIPT_DIR/firewall/firewall.sh" + +echo "" +echo "╔══════════════════════════════════════════════════════════════╗" +echo "║ Host bootstrap complete ║" +echo "╚══════════════════════════════════════════════════════════════╝" +warn "BEFORE you close this root session:" +warn " 1. Open a new terminal and run: ssh -p ${SSH_PORT} ${ADMIN_USER}@${SERVER_PUBLIC_IPV4:-}" +warn " 2. Confirm you get in with your key." +warn " 3. Only then close this session. KVM/LARA is your fallback if not." +echo "" +info "Next host-layer steps (separate scripts, added next): k3s registration," +info "Stalwart mail, Restic backups." 
diff --git a/infrastructure/production/host/config.env.example b/infrastructure/production/host/config.env.example new file mode 100644 index 0000000..a4aa495 --- /dev/null +++ b/infrastructure/production/host/config.env.example @@ -0,0 +1,59 @@ +# ───────────────────────────────────────────────────────────── +# Dezky production host configuration +# +# Copy to `config.env` and fill in real values. `config.env` is +# gitignored — it holds host-specific values, not the repo's source +# of truth. Both bootstrap.sh and firewall/firewall.sh source this. +# ───────────────────────────────────────────────────────────── + +# --- Management allowlist ------------------------------------------------- +# Source addresses allowed to reach SSH (22) and the k3s API (6443). +# Everything else on those ports is dropped. Accepts a comma-separated +# list of single IPs and/or CIDRs (e.g. home + office, or a /29 block, +# or a v6 /64 prefix) — the firewall treats these as nftables interval sets. +# +# NOTE: residential IPs can change. If yours is dynamic, prefer a small +# prefix here, and remember Hetzner's KVM/LARA console is always reachable +# out-of-band if you ever lock yourself out (see README). +MGMT_ALLOW_V4="203.0.113.10, 203.0.113.11" # REQUIRED — management IPv4(s)/CIDR(s) +MGMT_ALLOW_V6="" # optional — management IPv6(s)/prefix (empty to skip) + +# --- Server identity ------------------------------------------------------ +SERVER_HOSTNAME="node1.dezky.eu" # FQDN set on the box +SERVER_PUBLIC_IPV4="" # AX41 primary IPv4 (fill after provisioning) +SERVER_PUBLIC_IPV6="" # AX41 primary IPv6 (fill after provisioning) + +# --- Admin (non-root) user ------------------------------------------------ +ADMIN_USER="dezky" # created with sudo; root SSH login is then disabled +ADMIN_SSH_PUBKEY="" # REQUIRED — your SSH public key (the WHOLE line, e.g. "ssh-ed25519 AAAA... 
you@home") + +# --- SSH ------------------------------------------------------------------ +SSH_PORT="22" # keep 22 unless you have a reason; obscurity is not security + +# --- k3s networking (defaults; change ONLY if you customise k3s CIDRs) ---- +K3S_POD_CIDR="10.42.0.0/16" # flannel pod network — accepted to/from host +K3S_SERVICE_CIDR="10.43.0.0/16" # cluster service network — accepted to/from host + +# --- Rancher Custom-cluster registration (SECRET) ------------------------- +# From Rancher → Cluster Management → → Registration tab. Create the +# cluster with the **K3s** distribution first. Token + checksum are secrets. +RANCHER_SERVER_URL="https://rancher.example.com" +RANCHER_NODE_TOKEN="" # REQUIRED — node registration token +RANCHER_CA_CHECKSUM="" # REQUIRED — CA checksum from the same command +RANCHER_NODE_ROLES="--etcd --controlplane --worker" # single node = all three +RANCHER_INSECURE_FETCH="true" # true if Rancher is reached by IP / self-signed cert + +# --- Stalwart mail (host service) ----------------------------------------- +# SECRETS — platform-api (k3s) must use the SAME admin password + webhook secret. +STALWART_VERSION="latest" # pin to a release tag after first install +STALWART_ADMIN_PASSWORD="" # REQUIRED — openssl rand -hex 24 +STALWART_WEBHOOK_SECRET="" # REQUIRED — openssl rand -hex 32 + +# --- Restic backups (host) ------------------------------------------------ +# Storage Box is SSH/SFTP on PORT 23, key auth. STORE RESTIC_PASSWORD OFFLINE. +RESTIC_PASSWORD="" # REQUIRED — openssl rand -hex 32 (save offline!) 
+BACKUP_PRIMARY_REPO="" # sftp:@.your-storagebox.de:/dezky +BACKUP_DR_REPO="" # sftp:@.your-storagebox.de:/dezky (Helsinki box) +BACKUP_PATHS="/opt/stalwart/data /opt/stalwart/etc /var/lib/rancher/k3s/server/db/snapshots /var/lib/rancher/k3s/storage" +BACKUP_RETENTION="--keep-daily 7 --keep-weekly 4 --keep-monthly 6" +BACKUP_HEALTHCHECK_URL="" # optional dead-man's-switch base URL diff --git a/infrastructure/production/host/firewall/dezky-firewall.service b/infrastructure/production/host/firewall/dezky-firewall.service new file mode 100644 index 0000000..c76ee1a --- /dev/null +++ b/infrastructure/production/host/firewall/dezky-firewall.service @@ -0,0 +1,27 @@ +# Dezky host firewall — loads ONLY our table on boot. +# +# Deliberately does NOT use the distro 'nftables.service', whose default +# config starts with `flush ruleset` and would wipe k3s's tables. This unit +# applies /etc/nftables.d/dezky-fw.nft, which only (re)creates inet dezky_fw. +# +# Ordering: runs early (before k3s) so the box is never briefly exposed. +# k3s adds its own tables independently afterwards. + +[Unit] +Description=Dezky host firewall (nftables, k3s-safe) +Wants=network-pre.target +Before=network-pre.target k3s.service +DefaultDependencies=no +Conflicts=shutdown.target +Before=shutdown.target + +[Service] +Type=oneshot +RemainAfterExit=yes +ExecStart=/usr/sbin/nft -f /etc/nftables.d/dezky-fw.nft +ExecReload=/usr/sbin/nft -f /etc/nftables.d/dezky-fw.nft +# On stop, remove only our table — leave k3s networking intact. +ExecStop=/usr/sbin/nft destroy table inet dezky_fw + +[Install] +WantedBy=multi-user.target diff --git a/infrastructure/production/host/firewall/firewall.sh b/infrastructure/production/host/firewall/firewall.sh new file mode 100755 index 0000000..cf1ded3 --- /dev/null +++ b/infrastructure/production/host/firewall/firewall.sh @@ -0,0 +1,160 @@ +#!/usr/bin/env bash +# +# Dezky production host firewall (nftables) — k3s-safe. 
+# +# Why this design: +# - k3s/kube-proxy/flannel manage their OWN nftables tables (ip/ip6: filter, +# nat, mangle). We must never `flush ruleset` or use ufw/firewalld, or we +# wipe/clobber cluster networking. Instead we own a single dedicated table, +# `inet dezky_fw`, with only an INPUT chain. Separate tables coexist; a +# packet is dropped if ANY base chain drops it, so our default-drop INPUT +# is the gate for host-bound traffic while k3s keeps owning FORWARD/NAT. +# - We explicitly accept the pod/service CIDRs and CNI interfaces so +# cluster<->host traffic (API server, kubelet, CoreDNS) is never dropped. +# +# Idempotent: re-running replaces only our table (`destroy table` first). +# +# Usage (as root, on the server): +# ./firewall.sh # render from ../config.env, install unit, apply +# ./firewall.sh --dry-run # print the ruleset, apply nothing + +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}" +NFT_OUT="/etc/nftables.d/dezky-fw.nft" +UNIT_SRC="$SCRIPT_DIR/dezky-firewall.service" +UNIT_DST="/etc/systemd/system/dezky-firewall.service" + +DRY_RUN=0 +[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=1 + +# ── Load config ─────────────────────────────────────────────────────────── +if [[ ! -f "$CONFIG_FILE" ]]; then + error "Config not found: $CONFIG_FILE" + error "Copy config.env.example → config.env and fill it in." 
+ exit 1 +fi +# shellcheck disable=SC1090 +source "$CONFIG_FILE" + +: "${MGMT_ALLOW_V4:?MGMT_ALLOW_V4 is required in config.env}" +: "${SSH_PORT:=22}" +: "${K3S_POD_CIDR:=10.42.0.0/16}" +: "${K3S_SERVICE_CIDR:=10.43.0.0/16}" + +# ── Build the management v6 block only if a v6 address is configured ─────── +V6_SET="" +V6_RULE="" +if [[ -n "${MGMT_ALLOW_V6:-}" ]]; then + V6_SET=$(cat < host traffic ────────────── + iifname "cni0" accept + iifname "flannel.1" accept + ip saddr ${K3S_POD_CIDR} accept + ip saddr ${K3S_SERVICE_CIDR} accept + + # ── Public services (world-reachable) ────────────────────────────── + # Web + ACME HTTP-01 challenge + tcp dport { 80, 443 } accept + # Mail: smtp, submissions, submission, imap, imaps, managesieve + tcp dport { 25, 465, 587, 143, 993, 4190 } accept + + # ── Management surfaces: allowlisted sources only (MGMT_ALLOW_V4/V6) ─ + ip saddr @mgmt_v4 tcp dport { ${SSH_PORT}, 6443 } accept +${V6_RULE} + + # Rate-limited drop logging for debugging (then policy drop applies) + limit rate 5/minute burst 5 packets log prefix "dezky-fw drop: " level info + } +} +EOF +) + +if [[ $DRY_RUN -eq 1 ]]; then + echo "$RULESET" + info "Dry run — nothing applied." + exit 0 +fi + +if [[ $EUID -ne 0 ]]; then + error "Must run as root to apply the firewall." + exit 1 +fi + +# ── Write, validate, install, apply ──────────────────────────────────────── +mkdir -p /etc/nftables.d +echo "$RULESET" > "$NFT_OUT" +chmod 0644 "$NFT_OUT" +info "Wrote ruleset → $NFT_OUT" + +# Validate syntax before touching the live ruleset +if ! nft -c -f "$NFT_OUT"; then + error "nft syntax check FAILED — not applying. Live firewall unchanged." + exit 1 +fi +ok "Ruleset syntax valid." 
+ +# Install the systemd unit so the rules survive reboot (and never flush global) +if [[ -f "$UNIT_SRC" ]]; then + install -m 0644 "$UNIT_SRC" "$UNIT_DST" + systemctl daemon-reload + systemctl enable dezky-firewall.service >/dev/null 2>&1 || true + ok "Installed + enabled dezky-firewall.service" +fi + +# Apply now +nft -f "$NFT_OUT" +ok "Firewall applied. Management restricted to: ${MGMT_ALLOW_V4} ${MGMT_ALLOW_V6:-}" +warn "Open a SECOND SSH session NOW and confirm you still have access before" +warn "closing this one. Hetzner KVM/LARA is your out-of-band fallback." diff --git a/infrastructure/production/host/k3s/register.sh b/infrastructure/production/host/k3s/register.sh new file mode 100755 index 0000000..8aca152 --- /dev/null +++ b/infrastructure/production/host/k3s/register.sh @@ -0,0 +1,93 @@ +#!/usr/bin/env bash +# +# Register the AX41 as a single-node k3s cluster in Rancher (Custom cluster, +# provisioning v2). Run AFTER bootstrap.sh — the firewall already allows the +# outbound 443 the cluster-agent needs (no inbound rule required). +# +# This downloads Rancher's system-agent installer and runs it. The agent then +# pulls the cluster spec from Rancher and stands up k3s with the configured +# roles. The Rancher Custom cluster MUST be created with the K3s distribution. +# +# Security note: Rancher here is addressed by IP, whose TLS cert won't match, +# so we fetch the installer with --insecure. That's acceptable because the +# agent verifies Rancher's CA via --ca-checksum for its ongoing connection. +# Move Rancher behind rancher.dezky.eu + a valid cert to drop the insecure fetch. 
+# +# Usage (on the server): +# sudo ./register.sh # register this node +# sudo ./register.sh --force # re-run even if an agent is already present + +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}" + +FORCE=0 +[[ "${1:-}" == "--force" ]] && FORCE=1 + +if [[ $EUID -ne 0 ]]; then + error "Run with sudo/root (the agent installer needs root)." + exit 1 +fi + +if [[ ! -f "$CONFIG_FILE" ]]; then + error "Missing $CONFIG_FILE — fill in the RANCHER_* values first." + exit 1 +fi +# shellcheck disable=SC1090 +source "$CONFIG_FILE" + +: "${RANCHER_SERVER_URL:?RANCHER_SERVER_URL required}" +: "${RANCHER_NODE_TOKEN:?RANCHER_NODE_TOKEN required}" +: "${RANCHER_CA_CHECKSUM:?RANCHER_CA_CHECKSUM required}" +: "${RANCHER_NODE_ROLES:=--etcd --controlplane --worker}" +: "${RANCHER_INSECURE_FETCH:=true}" + +# ── Idempotency guard ────────────────────────────────────────────────────── +if systemctl list-unit-files 2>/dev/null | grep -q '^rancher-system-agent'; then + if [[ $FORCE -eq 0 ]]; then + warn "rancher-system-agent already installed — node looks registered." + warn "Re-run with --force to register again. Skipping." + exit 0 + fi + warn "rancher-system-agent present, but --force given — proceeding." +fi + +# ── Fetch installer ──────────────────────────────────────────────────────── +INSECURE_FLAG="" +if [[ "$RANCHER_INSECURE_FETCH" == "true" ]]; then + INSECURE_FLAG="--insecure" + warn "Fetching installer insecurely (Rancher reached by IP). CA checksum still pins the agent connection." 
+fi + +TMP_INSTALLER="$(mktemp /tmp/rancher-system-agent-install.XXXXXX.sh)" +trap 'rm -f "$TMP_INSTALLER"' EXIT + +info "Downloading system-agent installer from ${RANCHER_SERVER_URL} ..." +# shellcheck disable=SC2086 +curl -fsSL $INSECURE_FLAG "${RANCHER_SERVER_URL}/system-agent-install.sh" -o "$TMP_INSTALLER" +ok "Installer downloaded ($(wc -c < "$TMP_INSTALLER") bytes)." + +# ── Register ─────────────────────────────────────────────────────────────── +info "Registering node with roles: ${RANCHER_NODE_ROLES}" +info "(token masked: ${RANCHER_NODE_TOKEN:0:6}…)" +# shellcheck disable=SC2086 +sh "$TMP_INSTALLER" \ + --server "${RANCHER_SERVER_URL}" \ + --label 'cattle.io/os=linux' \ + --token "${RANCHER_NODE_TOKEN}" \ + --ca-checksum "${RANCHER_CA_CHECKSUM}" \ + ${RANCHER_NODE_ROLES} + +echo "" +ok "Registration submitted. Watch progress in Rancher (cluster goes Active in a few minutes)." +info "On the node you can follow along with:" +info " journalctl -u rancher-system-agent -f" +info " k3s kubectl get nodes # once k3s is up" diff --git a/infrastructure/production/host/restic/backup.sh b/infrastructure/production/host/restic/backup.sh new file mode 100755 index 0000000..9dec867 --- /dev/null +++ b/infrastructure/production/host/restic/backup.sh @@ -0,0 +1,86 @@ +#!/usr/bin/env bash +# +# Dezky host backup — Restic to a Hetzner Storage Box (primary), then a +# dedup-aware `restic copy` to a second Storage Box in Helsinki (DR). +# +# Runs as root (must read stalwart- and root-owned data). HOME is pointed at +# /opt/dezky-backup so ssh uses the dedicated backup key + config (Storage Box +# is SSH/SFTP on port 23). Triggered daily by dezky-backup.timer. +# +# Requires restic >= 0.14 (for `copy --from-repo`). 
+ +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +BACKUP_HOME="/opt/dezky-backup" +ENV_FILE="${ENV_FILE:-$BACKUP_HOME/restic.env}" + +if [[ ! -f "$ENV_FILE" ]]; then + error "Missing $ENV_FILE — run restic/install.sh first." + exit 1 +fi +# shellcheck disable=SC1090 +source "$ENV_FILE" + +: "${RESTIC_PASSWORD:?RESTIC_PASSWORD required}" +: "${BACKUP_PRIMARY_REPO:?BACKUP_PRIMARY_REPO required}" +: "${BACKUP_PATHS:?BACKUP_PATHS required}" +: "${BACKUP_RETENTION:=--keep-daily 7 --keep-weekly 4 --keep-monthly 6}" + +# ssh (spawned by restic) reads $HOME/.ssh/config — wildcard for *.your-storagebox.de +export HOME="$BACKUP_HOME" +export RESTIC_PASSWORD +# For `copy`: both repos share the same password. +export RESTIC_FROM_PASSWORD="$RESTIC_PASSWORD" + +# Optional dead-man's-switch (e.g. healthchecks.io). Pinged /start, success, /fail. +HC="${BACKUP_HEALTHCHECK_URL:-}" +ping_hc() { [[ -n "$HC" ]] && curl -fsS -m 10 --retry 3 "${HC}${1:-}" >/dev/null 2>&1 || true; } +fail() { error "$1"; ping_hc "/fail"; exit 1; } + +ping_hc "/start" + +# Exclude obvious churn/noise from the PVC tree +EXCLUDES=(--exclude-caches + --exclude '*/lost+found' + --exclude '*.tmp') + +# ── 1) Back up to the primary Storage Box ────────────────────────────────── +info "Backing up to primary: $BACKUP_PRIMARY_REPO" +# shellcheck disable=SC2086 +restic -r "$BACKUP_PRIMARY_REPO" backup $BACKUP_PATHS \ + "${EXCLUDES[@]}" \ + --tag dezky --tag host \ + --host dezky-node1 \ + || fail "Primary backup failed." +ok "Primary backup done." + +# ── 2) Retention on primary ──────────────────────────────────────────────── +info "Applying retention on primary..." 
+# shellcheck disable=SC2086 +restic -r "$BACKUP_PRIMARY_REPO" forget $BACKUP_RETENTION --prune \ + || warn "Primary forget/prune reported an issue (backup itself is safe)." + +# ── 3) Light integrity check on primary ──────────────────────────────────── +restic -r "$BACKUP_PRIMARY_REPO" check || warn "restic check flagged the primary repo — investigate." + +# ── 4) Mirror to the Helsinki DR box (dedup-aware copy) ───────────────────── +if [[ -n "${BACKUP_DR_REPO:-}" ]]; then + info "Copying snapshots to DR: $BACKUP_DR_REPO" + restic -r "$BACKUP_DR_REPO" copy --from-repo "$BACKUP_PRIMARY_REPO" \ + || fail "DR copy failed." + # shellcheck disable=SC2086 + restic -r "$BACKUP_DR_REPO" forget $BACKUP_RETENTION --prune \ + || warn "DR forget/prune reported an issue." + ok "DR mirror done." +else + warn "BACKUP_DR_REPO not set — skipping off-site mirror (set it for real DR)." +fi + +ok "Backup cycle complete." +ping_hc "" # success ping (bare URL) diff --git a/infrastructure/production/host/restic/dezky-backup.service b/infrastructure/production/host/restic/dezky-backup.service new file mode 100644 index 0000000..491eb0c --- /dev/null +++ b/infrastructure/production/host/restic/dezky-backup.service @@ -0,0 +1,13 @@ +# Dezky nightly backup (Restic → Storage Box primary + Helsinki DR). +[Unit] +Description=Dezky host backup (Restic) +After=network-online.target +Wants=network-online.target + +[Service] +Type=oneshot +ExecStart=/opt/dezky-backup/backup.sh +# Backups are I/O heavy but should never starve mail/k3s +Nice=10 +IOSchedulingClass=best-effort +IOSchedulingPriority=6 diff --git a/infrastructure/production/host/restic/dezky-backup.timer b/infrastructure/production/host/restic/dezky-backup.timer new file mode 100644 index 0000000..d98daba --- /dev/null +++ b/infrastructure/production/host/restic/dezky-backup.timer @@ -0,0 +1,12 @@ +# Nightly at 03:20 UTC, with a randomized delay so it doesn't hammer the +# Storage Box at the same second every night. 
Catches up if the box was off. +[Unit] +Description=Run the Dezky host backup nightly + +[Timer] +OnCalendar=*-*-* 03:20:00 UTC +RandomizedDelaySec=20min +Persistent=true + +[Install] +WantedBy=timers.target diff --git a/infrastructure/production/host/restic/install.sh b/infrastructure/production/host/restic/install.sh new file mode 100755 index 0000000..07099c2 --- /dev/null +++ b/infrastructure/production/host/restic/install.sh @@ -0,0 +1,115 @@ +#!/usr/bin/env bash +# +# Install Dezky host backups: Restic + a dedicated backup SSH key/config for the +# Hetzner Storage Box(es), the env file, the backup/restore scripts, and the +# nightly systemd timer. Idempotent. +# +# sudo ./install.sh +# +# Storage Box uses SSH/SFTP on PORT 23 with key auth. After this runs, you must +# upload the printed public key to BOTH Storage Boxes, then re-run to init the +# repos (the box must trust the key before `restic init` can connect). + +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}" +BACKUP_HOME="/opt/dezky-backup" +SSH_DIR="$BACKUP_HOME/.ssh" +KEY="$SSH_DIR/id_ed25519" + +if [[ $EUID -ne 0 ]]; then error "Run as root."; exit 1; fi +if [[ !
-f "$CONFIG_FILE" ]]; then error "Missing $CONFIG_FILE"; exit 1; fi +# shellcheck disable=SC1090 +source "$CONFIG_FILE" + +: "${RESTIC_PASSWORD:?RESTIC_PASSWORD required (and STORE IT OFFLINE — losing it loses the backups)}" +: "${BACKUP_PRIMARY_REPO:?BACKUP_PRIMARY_REPO required}" +: "${BACKUP_PATHS:?BACKUP_PATHS required}" +: "${BACKUP_RETENTION:=--keep-daily 7 --keep-weekly 4 --keep-monthly 6}" + +# ── 1) Packages ──────────────────────────────────────────────────────────── +info "Installing restic + openssh client..." +export DEBIAN_FRONTEND=noninteractive +apt-get update -qq +apt-get install -y -qq restic curl openssh-client >/dev/null +ok "restic $(restic version | awk '{print $2}') installed." + +# ── 2) Backup home + SSH key/config ──────────────────────────────────────── +info "Setting up $BACKUP_HOME ..." +install -d -m 0700 "$BACKUP_HOME" "$SSH_DIR" +if [[ ! -f "$KEY" ]]; then + ssh-keygen -t ed25519 -N "" -C "dezky-backup@node1" -f "$KEY" >/dev/null + ok "Generated backup SSH key." +fi +# Single wildcard config covers BOTH Storage Boxes (same domain, port 23, key). +cat > "$SSH_DIR/config" < "$BACKUP_HOME/restic.env" </dev/null 2>&1; then + ok "$label repo already initialized." + elif restic -r "$repo" init >/dev/null 2>&1; then + ok "$label repo initialized." + else + warn "$label repo not reachable/authorized yet — upload the key, then re-run." 
+ fi +} +init_repo "$BACKUP_PRIMARY_REPO" "Primary" +init_repo "${BACKUP_DR_REPO:-}" "DR" + +echo "" +echo "╔══════════════════════════════════════════════════════════════╗" +echo "║ Backup install complete ║" +echo "╚══════════════════════════════════════════════════════════════╝" +warn "Upload this PUBLIC key to BOTH Storage Boxes, then re-run install.sh:" +echo "" +cat "$KEY.pub" +echo "" +info " ssh-copy-id -p 23 -i $KEY.pub @.your-storagebox.de" +info " ssh-copy-id -p 23 -i $KEY.pub @.your-storagebox.de" +info "Then test: sudo $BACKUP_HOME/backup.sh (or wait for 03:20 UTC)" +info "Drill restore: sudo $BACKUP_HOME/restore.sh restore latest /tmp/restore-test" +warn "STORE RESTIC_PASSWORD OFFLINE. Without it, the encrypted backups are unrecoverable." diff --git a/infrastructure/production/host/restic/restore.sh b/infrastructure/production/host/restic/restore.sh new file mode 100755 index 0000000..35b16fe --- /dev/null +++ b/infrastructure/production/host/restic/restore.sh @@ -0,0 +1,57 @@ +#!/usr/bin/env bash +# +# Dezky restore helper. A backup you've never restored is a backup you don't +# have — run a drill periodically. This wraps the common restic restore flows. +# +# sudo ./restore.sh snapshots # list snapshots (primary) +# sudo ./restore.sh snapshots --dr # list from the DR box +# sudo ./restore.sh restore [--dr] +# sudo ./restore.sh restore latest /tmp/restore-test # safe drill target +# +# Restores go to an arbitrary target dir (NOT in place) so you can inspect first. +# For Stalwart, stop the service, swap /opt/stalwart/data, then start it. 
+ +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +BACKUP_HOME="/opt/dezky-backup" +ENV_FILE="${ENV_FILE:-$BACKUP_HOME/restic.env}" +[[ -f "$ENV_FILE" ]] || { error "Missing $ENV_FILE"; exit 1; } +# shellcheck disable=SC1090 +source "$ENV_FILE" +export HOME="$BACKUP_HOME" +export RESTIC_PASSWORD + +pick_repo() { + if [[ "${*: -1}" == "--dr" ]]; then + [[ -n "${BACKUP_DR_REPO:-}" ]] || { error "BACKUP_DR_REPO not set"; exit 1; } + echo "$BACKUP_DR_REPO" + else + echo "$BACKUP_PRIMARY_REPO" + fi +} + +cmd="${1:-}"; shift || true +case "$cmd" in + snapshots) + repo="$(pick_repo "$@")" + info "Snapshots in $repo:" + restic -r "$repo" snapshots --tag dezky + ;; + restore) + snap="${1:?snapshot id (or 'latest')}"; target="${2:?target dir}" + repo="$(pick_repo "$@")" + mkdir -p "$target" + info "Restoring $snap from $repo → $target" + restic -r "$repo" restore "$snap" --target "$target" + ok "Restored. Inspect $target before putting anything back in place." + ;; + *) + error "Usage: $0 {snapshots|restore} ... (see header)" + exit 1 + ;; +esac diff --git a/infrastructure/production/host/stalwart/cert-sync.sh b/infrastructure/production/host/stalwart/cert-sync.sh new file mode 100755 index 0000000..8290a2a --- /dev/null +++ b/infrastructure/production/host/stalwart/cert-sync.sh @@ -0,0 +1,77 @@ +#!/usr/bin/env bash +# +# Sync the mail.dezky.eu TLS cert from the cluster (issued by cert-manager) to +# Stalwart on the host. The host IS the k3s node, so we read the secret via the +# local kubeconfig — no external machinery. Reloads Stalwart only when the cert +# actually changed (cert-manager renews ~30 days before expiry). +# +# Run by stalwart-cert-sync.timer (every 12h + on boot). Safe to run by hand. 
+# +# Forward dependency: needs the fleet layer to have created the TLS secret +# (default: namespace 'mail', secret 'mail-tls'). Until then this is a no-op and +# Stalwart keeps using the self-signed bootstrap cert from install.sh. + +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +TLS_NAMESPACE="${TLS_NAMESPACE:-mail}" +TLS_SECRET="${TLS_SECRET:-mail-tls}" +TLS_DIR="/opt/stalwart/etc/tls" +KUBECONFIG_PATH="${KUBECONFIG:-/etc/rancher/k3s/k3s.yaml}" + +# kubectl: prefer standalone, fall back to the k3s-bundled one +if command -v kubectl >/dev/null 2>&1; then + KUBECTL=(kubectl) +elif command -v k3s >/dev/null 2>&1; then + KUBECTL=(k3s kubectl) +else + error "Neither kubectl nor k3s found — is the node provisioned yet?" + exit 1 +fi +export KUBECONFIG="$KUBECONFIG_PATH" + +# Pull the secret (no-op if it doesn't exist yet) +if ! "${KUBECTL[@]}" -n "$TLS_NAMESPACE" get secret "$TLS_SECRET" >/dev/null 2>&1; then + warn "Secret ${TLS_NAMESPACE}/${TLS_SECRET} not present yet — cert-manager hasn't issued it. Skipping." + exit 0 +fi + +TMP_CRT="$(mktemp)"; TMP_KEY="$(mktemp)" +trap 'rm -f "$TMP_CRT" "$TMP_KEY"' EXIT + +"${KUBECTL[@]}" -n "$TLS_NAMESPACE" get secret "$TLS_SECRET" \ + -o jsonpath='{.data.tls\.crt}' | base64 -d > "$TMP_CRT" +"${KUBECTL[@]}" -n "$TLS_NAMESPACE" get secret "$TLS_SECRET" \ + -o jsonpath='{.data.tls\.key}' | base64 -d > "$TMP_KEY" + +if [[ ! -s "$TMP_CRT" || ! -s "$TMP_KEY" ]]; then + error "Fetched cert or key is empty — leaving current cert in place." + exit 1 +fi + +# Only reload if something changed (byte-for-byte compare with cmp) +changed=0 +mkdir -p "$TLS_DIR" +if ! cmp -s "$TMP_CRT" "$TLS_DIR/cert.pem" 2>/dev/null; then changed=1; fi +if !
cmp -s "$TMP_KEY" "$TLS_DIR/key.pem" 2>/dev/null; then changed=1; fi + +if [[ $changed -eq 0 ]]; then + info "Cert unchanged — nothing to do." + exit 0 +fi + +install -o stalwart -g stalwart -m 0644 "$TMP_CRT" "$TLS_DIR/cert.pem" +install -o stalwart -g stalwart -m 0640 "$TMP_KEY" "$TLS_DIR/key.pem" +ok "Updated mail TLS cert from ${TLS_NAMESPACE}/${TLS_SECRET}." + +# SIGHUP Stalwart to reload certs without dropping connections +if systemctl is-active --quiet stalwart-mail; then + systemctl reload stalwart-mail && ok "Reloaded stalwart-mail (SIGHUP)." +else + warn "stalwart-mail not active — cert staged, will be used on next start." +fi diff --git a/infrastructure/production/host/stalwart/config.toml b/infrastructure/production/host/stalwart/config.toml new file mode 100644 index 0000000..1cc28c3 --- /dev/null +++ b/infrastructure/production/host/stalwart/config.toml @@ -0,0 +1,102 @@ +# Stalwart Mail Server — Dezky PRODUCTION (bare-metal host, outside k3s) +# +# Topology (see host/README.md): +# - Mail protocol ports bind directly on the host's public IP. +# - Web/JMAP is served in plaintext on internal-only :8080 and fronted by +# Traefik (k3s) for mail.dezky.eu:443. Stalwart does NOT bind 80/443 — +# those belong to Traefik. +# - TLS for the mail-protocol ports uses a cert ISSUED BY cert-manager +# (mail.dezky.eu) and copied here by stalwart/cert-sync.sh. Stalwart runs +# no ACME of its own (80/443 are Traefik's). +# - Storage is RocksDB on local disk — intentionally independent of the +# in-cluster Postgres so mail keeps flowing regardless of cluster state. +# +# Reference: https://stalw.art/docs + +[server] +hostname = "mail.dezky.eu" # MUST match the IP's PTR/rDNS record + +# ── Listeners ────────────────────────────────────────────────────────────── +# Mail protocols on the public IP; management/JMAP on internal 8080 only +# (firewall blocks 8080 from the world, allows the k3s pod CIDR + Traefik).
+[server.listener] +"smtp" = { bind = "[::]:25", protocol = "smtp" } +"submission" = { bind = "[::]:587", protocol = "smtp", tls.implicit = false } +"submissions" = { bind = "[::]:465", protocol = "smtp", tls.implicit = true } +"imap" = { bind = "[::]:143", protocol = "imap", tls.implicit = false } +"imaps" = { bind = "[::]:993", protocol = "imap", tls.implicit = true } +"sieve" = { bind = "[::]:4190", protocol = "managesieve" } +# Internal HTTP: JMAP + WebAdmin + management API. Traefik terminates TLS for +# the public hostname and proxies here; platform-api (pod) calls it directly. +"http" = { bind = "0.0.0.0:8080", protocol = "http" } + +# ── Storage — RocksDB on local disk (host-isolated from the cluster) ──────── +[store."rocksdb"] +type = "rocksdb" +path = "/opt/stalwart/data" +compression = "lz4" + +[storage] +data = "rocksdb" +fts = "rocksdb" +blob = "rocksdb" +lookup = "rocksdb" +directory = "internal" + +[directory."internal"] +type = "internal" +store = "rocksdb" + +# ── TLS — cert issued by cert-manager, synced here by cert-sync.sh ────────── +# Until the first sync runs, install.sh drops a self-signed bootstrap cert so +# the TLS listeners can start. cert-sync replaces it with the real LE cert. +[certificate."default"] +cert = "%{file:/opt/stalwart/etc/tls/cert.pem}%" +private-key = "%{file:/opt/stalwart/etc/tls/key.pem}%" +default = true + +# ── Authentication ───────────────────────────────────────────────────────── +# Fallback admin is what platform-api uses for Basic auth on the JMAP +# management API (STALWART_ADMIN_USER/PASSWORD on the platform-api side). +[authentication] +fallback-admin.user = "admin" +fallback-admin.secret = "$env{STALWART_ADMIN_PASSWORD}" + +# ── Resolver ─────────────────────────────────────────────────────────────── +# DNSSEC-aware system resolver. Mail deliverability depends on clean DNS. 
+[resolver] +type = "system" +preserve-intermediates = true +concurrency = 4 + +# ── Spam filtering — built-in filter ON in production ────────────────────── +[spam-filter] +enable = true + +# ── Logging — journald captures stdout ───────────────────────────────────── +[tracer."stdout"] +type = "stdout" +level = "info" +ansi = false +enable = true + +# ── Audit webhook → platform-api (via the public api ingress) ────────────── +# Stalwart on the host reaches platform-api through Traefik on the public +# hostname; HMAC-signed so a public endpoint is safe. +[webhook."audit-ingest"] +url = "https://api.dezky.eu/ingest/stalwart/webhook" +signature-key = "$env{STALWART_WEBHOOK_SECRET}" +events = [ + "auth.success", + "auth.failure", + "auth.banned", + "account.created", + "account.deleted", + "account.password-changed", + "message.rejected", + "policy.rejection", + "dkim.failure", + "dmarc.failure", + "spam.detected", +] +throttle = "1s" diff --git a/infrastructure/production/host/stalwart/install.sh b/infrastructure/production/host/stalwart/install.sh new file mode 100755 index 0000000..1a09db3 --- /dev/null +++ b/infrastructure/production/host/stalwart/install.sh @@ -0,0 +1,144 @@ +#!/usr/bin/env bash +# +# Install Stalwart mail server as a hardened host systemd service on the AX41. +# Run AFTER bootstrap.sh (and ideally after k3s registration, so cert-sync can +# immediately pull the real cert). Idempotent — safe to re-run to upgrade. +# +# sudo ./install.sh +# +# What it does: creates the stalwart user + /opt/stalwart layout, downloads a +# pinned Stalwart binary, installs config.toml + the secrets EnvironmentFile, +# drops a self-signed bootstrap cert (replaced later by cert-sync), and installs +# the systemd units (mail service + cert-sync service/timer). 
+ +set -euo pipefail + +RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m' +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*" >&2; } + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}" + +PREFIX="/opt/stalwart" +STALWART_REPO="${STALWART_REPO:-stalwartlabs/mail-server}" + +if [[ $EUID -ne 0 ]]; then + error "Run as root." + exit 1 +fi +if [[ ! -f "$CONFIG_FILE" ]]; then + error "Missing $CONFIG_FILE — fill in the STALWART_* values first." + exit 1 +fi +# shellcheck disable=SC1090 +source "$CONFIG_FILE" + +: "${STALWART_ADMIN_PASSWORD:?STALWART_ADMIN_PASSWORD required (openssl rand -hex 24)}" +: "${STALWART_WEBHOOK_SECRET:?STALWART_WEBHOOK_SECRET required (openssl rand -hex 32)}" +: "${STALWART_VERSION:=latest}" + +# ── Step 1: user + directory layout ──────────────────────────────────────── +info "Step 1: stalwart user + ${PREFIX} layout..." +if ! id -u stalwart >/dev/null 2>&1; then + useradd --system --home-dir "$PREFIX" --shell /usr/sbin/nologin stalwart +fi +install -d -o stalwart -g stalwart -m 0750 "$PREFIX" "$PREFIX/bin" "$PREFIX/data" "$PREFIX/logs" +install -d -o stalwart -g stalwart -m 0750 "$PREFIX/etc" "$PREFIX/etc/tls" +ok "Layout ready." + +# ── Step 2: download the Stalwart binary ─────────────────────────────────── +info "Step 2: fetching Stalwart binary (${STALWART_REPO}@${STALWART_VERSION})..." 
+arch="$(uname -m)" +case "$arch" in + x86_64) target="x86_64-unknown-linux-gnu" ;; + aarch64) target="aarch64-unknown-linux-gnu" ;; + *) error "Unsupported arch: $arch"; exit 1 ;; +esac + +if [[ "$STALWART_VERSION" == "latest" ]]; then + api="https://api.github.com/repos/${STALWART_REPO}/releases/latest" + warn "Using 'latest' — pin STALWART_VERSION to a tag in config.env after this install." +else + api="https://api.github.com/repos/${STALWART_REPO}/releases/tags/${STALWART_VERSION}" +fi + +asset_url="$(curl -fsSL "$api" \ + | grep -oE "https://[^\"]+${target}[^\"]+\.tar\.gz" \ + | head -n1)" +if [[ -z "$asset_url" ]]; then + error "Could not find a ${target} .tar.gz asset in ${STALWART_REPO}@${STALWART_VERSION}." + error "Check the release assets or set STALWART_REPO/STALWART_VERSION." + exit 1 +fi + +tmp="$(mktemp -d)"; trap 'rm -rf "$tmp"' EXIT +info "Downloading $asset_url" +curl -fsSL "$asset_url" -o "$tmp/stalwart.tar.gz" +tar -xzf "$tmp/stalwart.tar.gz" -C "$tmp" +bin="$(find "$tmp" -type f \( -name stalwart -o -name stalwart-mail \) | head -n1)" +if [[ -z "$bin" ]]; then + error "No 'stalwart'/'stalwart-mail' binary found in the archive." + exit 1 +fi +systemctl stop stalwart-mail 2>/dev/null || true +install -o stalwart -g stalwart -m 0755 "$bin" "$PREFIX/bin/stalwart" +ok "Installed $("$PREFIX/bin/stalwart" --version 2>/dev/null || echo 'stalwart binary')." + +# ── Step 3: config + secrets EnvironmentFile ─────────────────────────────── +info "Step 3: config.toml + secrets env..." +install -o stalwart -g stalwart -m 0640 "$SCRIPT_DIR/config.toml" "$PREFIX/etc/config.toml" +umask 077 +cat > "$PREFIX/etc/stalwart.env" </dev/null 2>&1 + chown stalwart:stalwart "$PREFIX/etc/tls/"*.pem + chmod 0644 "$PREFIX/etc/tls/cert.pem"; chmod 0640 "$PREFIX/etc/tls/key.pem" + ok "Bootstrap cert in place." +else + ok "Step 4: TLS cert already present — keeping it." 
+fi + +# ── Step 5: cert-sync + systemd units ────────────────────────────────────── +info "Step 5: installing cert-sync + systemd units..." +install -o root -g root -m 0755 "$SCRIPT_DIR/cert-sync.sh" "$PREFIX/cert-sync.sh" +install -m 0644 "$SCRIPT_DIR/stalwart-mail.service" /etc/systemd/system/stalwart-mail.service +install -m 0644 "$SCRIPT_DIR/stalwart-cert-sync.service" /etc/systemd/system/stalwart-cert-sync.service +install -m 0644 "$SCRIPT_DIR/stalwart-cert-sync.timer" /etc/systemd/system/stalwart-cert-sync.timer +systemctl daemon-reload +systemctl enable --now stalwart-mail.service +systemctl enable --now stalwart-cert-sync.timer +ok "Services enabled." + +# Try an immediate cert sync (no-op until cert-manager has issued the secret) +"$PREFIX/cert-sync.sh" || true + +echo "" +echo "╔══════════════════════════════════════════════════════════════╗" +echo "║ Stalwart installed & running ║" +echo "╚══════════════════════════════════════════════════════════════╝" +systemctl --no-pager --lines=0 status stalwart-mail || true +echo "" +warn "Follow-ups:" +warn " • PTR/rDNS for the server IP MUST be 'mail.dezky.eu' (Hetzner Robot)." +warn " • Publish DNS at simply.com: MX → mail.dezky.eu, SPF, DMARC; per-domain" +warn " DKIM records come from Stalwart's dnsZoneFile via platform-api." +warn " • platform-api (k3s) env: STALWART_API_URL=http://:8080" +warn " STALWART_ADMIN_USER=admin STALWART_ADMIN_PASSWORD=" +warn " STALWART_WEBHOOK_SECRET= STALWART_PROVISIONING_ENABLED=true" diff --git a/infrastructure/production/host/stalwart/stalwart-cert-sync.service b/infrastructure/production/host/stalwart/stalwart-cert-sync.service new file mode 100644 index 0000000..67f525f --- /dev/null +++ b/infrastructure/production/host/stalwart/stalwart-cert-sync.service @@ -0,0 +1,10 @@ +# Oneshot: sync the mail TLS cert from the cluster to Stalwart. +# Triggered by stalwart-cert-sync.timer. 
+[Unit] +Description=Sync mail.dezky.eu TLS cert from cluster to Stalwart +After=network-online.target k3s.service +Wants=network-online.target + +[Service] +Type=oneshot +ExecStart=/opt/stalwart/cert-sync.sh diff --git a/infrastructure/production/host/stalwart/stalwart-cert-sync.timer b/infrastructure/production/host/stalwart/stalwart-cert-sync.timer new file mode 100644 index 0000000..43082aa --- /dev/null +++ b/infrastructure/production/host/stalwart/stalwart-cert-sync.timer @@ -0,0 +1,12 @@ +# Run cert-sync shortly after boot and every 12h thereafter. cert-manager +# renews well before expiry, so twice-daily comfortably picks up new certs. +[Unit] +Description=Periodic mail TLS cert sync for Stalwart + +[Timer] +OnBootSec=3min +OnUnitActiveSec=12h +Persistent=true + +[Install] +WantedBy=timers.target diff --git a/infrastructure/production/host/stalwart/stalwart-mail.service b/infrastructure/production/host/stalwart/stalwart-mail.service new file mode 100644 index 0000000..e219114 --- /dev/null +++ b/infrastructure/production/host/stalwart/stalwart-mail.service @@ -0,0 +1,39 @@ +# Dezky — Stalwart mail server (bare-metal host service). +# +# Secrets (admin password, webhook secret) come from the EnvironmentFile, which +# install.sh generates from config.env. The binary needs CAP_NET_BIND_SERVICE +# to bind the privileged mail ports (25/143/...) while running as a non-root user. + +[Unit] +Description=Stalwart Mail Server (Dezky) +After=network-online.target +Wants=network-online.target + +[Service] +Type=simple +User=stalwart +Group=stalwart +EnvironmentFile=/opt/stalwart/etc/stalwart.env +ExecStart=/opt/stalwart/bin/stalwart --config /opt/stalwart/etc/config.toml +# Stalwart reloads its TLS certs / config on SIGHUP — used by cert-sync. 
+ExecReload=/bin/kill -HUP $MAINPID +Restart=on-failure +RestartSec=5 +LimitNOFILE=65536 + +# Bind privileged ports without full root +AmbientCapabilities=CAP_NET_BIND_SERVICE +CapabilityBoundingSet=CAP_NET_BIND_SERVICE + +# Hardening — Stalwart only needs to write under /opt/stalwart +NoNewPrivileges=true +ProtectSystem=strict +ProtectHome=true +PrivateTmp=true +ReadWritePaths=/opt/stalwart/data /opt/stalwart/logs /opt/stalwart/etc/tls +ProtectKernelTunables=true +ProtectControlGroups=true +RestrictSUIDSGID=true + +[Install] +WantedBy=multi-user.target