feat(infra): production host bootstrap and bare-metal Stalwart scaffolding

Host provisioning for the single-server production target: SSH + firewall
hardening (nftables allowlist), k3s node registration, bare-metal Stalwart
install with systemd units and TLS cert-sync from the cluster secret, and
Restic encrypted backup/restore (primary + DR) with timer units. Host-specific
secrets live in config.env (gitignored); config.env.example is the template.
Also gitignores MemPalace per-project files.
Ronni Baslund
2026-06-07 00:19:48 +02:00
parent 5ed3d2bc5f
commit 3831c85285
18 changed files with 1432 additions and 0 deletions
@@ -3,6 +3,9 @@
.env.local
.env.*.local
# Production host config (real IPs / SSH key — keep out of git)
infrastructure/production/host/config.env
# TLS certificates (mkcert generated)
infrastructure/docker-compose/certs/*.pem
@@ -41,3 +44,7 @@ coverage/
# Temporary
tmp/
.tmp/
# MemPalace per-project files (issue #185)
mempalace.yaml
entities.json
@@ -0,0 +1,227 @@
# Dezky production — host layer
OS baseline + firewall for the bare-metal **Hetzner AX41** that runs the k3s
node. This layer is everything that lives on the *host* (outside Kubernetes):
hardening, the k3s-safe firewall, k3s registration, Stalwart mail, and Restic
backups.
The cluster is managed by **Fleet/Rancher** once k3s is up; this host layer is
the part Fleet *can't* do, so it runs over SSH from reviewed scripts.
## Files
| File | Purpose |
|------|---------|
| `config.env.example` | Template for host-specific values |
| `config.env` | **Real values — gitignored.** Source of truth lives only on your machine/box |
| `bootstrap.sh` | One-shot OS hardening: user, SSH, sysctl, swap, fail2ban, auto-updates, firewall |
| `firewall/firewall.sh` | Renders + applies the k3s-safe nftables ruleset (idempotent) |
| `firewall/dezky-firewall.service` | systemd unit; reapplies our table on boot, never flushes globally |
| `k3s/register.sh` | Registers the node into Rancher (Custom k3s cluster); secrets from `config.env` |
| `stalwart/install.sh` | Installs Stalwart as a hardened host service (binary, units, secrets, bootstrap cert) |
| `stalwart/config.toml` | Production Stalwart config (mail ports on host, JMAP on internal 8080) |
| `stalwart/stalwart-mail.service` | systemd unit; non-root + `CAP_NET_BIND_SERVICE` for low ports |
| `stalwart/cert-sync.sh` + `*.service`/`*.timer` | Pulls the cert-manager mail cert into Stalwart, reloads on change |
| `restic/install.sh` | Sets up Restic, the backup SSH key/config, env, and the nightly timer |
| `restic/backup.sh` | Backup → primary Storage Box, retention, then `copy` → Helsinki DR |
| `restic/restore.sh` | List/restore snapshots (run drills!) |
| `restic/dezky-backup.service` + `.timer` | Nightly 03:20 UTC backup |
## The firewall model (read this)
k3s, kube-proxy and flannel manage their **own** nftables tables (`ip`/`ip6`:
`filter`, `nat`, `mangle`). The classic mistake is running `ufw`/`firewalld` or
`nft flush ruleset`, which wipes or fights those rules and breaks pod networking.
So instead:
- We own a single dedicated table — **`inet dezky_fw`** — with only an INPUT
chain (default `drop`). Separate tables coexist; a packet is dropped if *any*
base chain drops it, so our default-drop INPUT gates host-bound traffic while
k3s keeps owning FORWARD/NAT untouched.
- We explicitly **accept the pod (`10.42.0.0/16`) and service (`10.43.0.0/16`)
CIDRs and the CNI interfaces** (`cni0`, `flannel.1`) so cluster↔host traffic
(API server, kubelet, CoreDNS) is never dropped.
- We **never** `flush ruleset`. The systemd unit's `ExecStop` removes only our
table.
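The "own one table, never flush" rule above can be checked mechanically before a ruleset is applied. The sketch below is not part of the repo: `lint_ruleset` is a hypothetical helper that reads an nftables script on stdin and fails if it contains `flush ruleset` or names any table other than `inet dezky_fw`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): lint an nftables script on stdin.
# Fails if the script would touch anything beyond our own table.
lint_ruleset() {
  local line rc=0
  local re_flush='flush[[:space:]]+ruleset'
  local re_table='(^|[[:space:]])table[[:space:]]+([a-z0-9]+)[[:space:]]+([A-Za-z0-9_]+)'
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%%#*}"   # ignore comments
    if [[ "$line" =~ $re_flush ]]; then
      echo "unsafe: 'flush ruleset' found" >&2
      rc=1
    fi
    if [[ "$line" =~ $re_table ]]; then
      # Group 2 is the address family, group 3 the table name.
      if [[ "${BASH_REMATCH[2]}" != "inet" || "${BASH_REMATCH[3]}" != "dezky_fw" ]]; then
        echo "unsafe: foreign table '${BASH_REMATCH[2]} ${BASH_REMATCH[3]}'" >&2
        rc=1
      fi
    fi
  done
  return "$rc"
}
```

A possible use is `./firewall/firewall.sh --dry-run | lint_ruleset`, which complements the syntax check `nft -c` already performs: one catches invalid rules, the other catches valid rules aimed at the wrong table.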
### Access policy
| Surface | Ports | Who |
|---------|-------|-----|
| Web + ACME | 80, 443 | **World** (customers) |
| Mail | 25, 465, 587, 143, 993, 4190 | **World** |
| SSH | 22 | **`MGMT_ALLOW_V4/V6` only** |
| k3s API | 6443 | **`MGMT_ALLOW_V4/V6` only** |
Current management allowlist: **home `46.32.144.38`**, **office `46.32.144.45`**.
The Rancher plane (`91.99.122.153`) needs **no inbound rule** — the cluster
agent dials *out* to Rancher over 443, so replies ride the established/related
fast-path.
## Apply order
> Prereqs: AX41 provisioned with **Debian 12 (bookworm)**, reachable as `root`.
> `config.env` filled in — in particular `ADMIN_SSH_PUBKEY` and
> `SERVER_PUBLIC_IPV4` (still TODO until the box exists).
```bash
# From your laptop:
scp -r infrastructure/production/host root@<server-ip>:/opt/dezky-host
# On the server:
ssh root@<server-ip>
cd /opt/dezky-host
# config.env is gitignored, so copy it up separately or recreate it here:
# cp config.env.example config.env && nano config.env
./bootstrap.sh
```
`bootstrap.sh` creates your admin user and installs your key **before** it
disables root/password SSH, so the order is lockout-safe. It's idempotent —
re-run anytime.
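Because sshd uses the first value it reads for each keyword, an earlier drop-in in `/etc/ssh/sshd_config.d/` could in principle win over `99-dezky.conf`. A minimal sketch for confirming the effective settings, assuming the lowercase `keyword value` output format of `sshd -T`; `check_sshd_hardening` is a hypothetical helper, not part of `bootstrap.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `sshd -T` style output on stdin and fail if root or
# password login is still effectively enabled.
check_sshd_hardening() {
  local key value rc=0
  local -A want=( [permitrootlogin]=no [passwordauthentication]=no )
  local -A seen=()
  while read -r key value; do
    [[ -n "$key" ]] || continue              # skip blank lines
    [[ -n "${want[$key]:-}" ]] || continue   # only inspect keywords we care about
    seen[$key]=1
    if [[ "$value" != "${want[$key]}" ]]; then
      echo "MISMATCH: $key is '$value', expected '${want[$key]}'" >&2
      rc=1
    fi
  done
  for key in "${!want[@]}"; do
    [[ -n "${seen[$key]:-}" ]] || { echo "MISSING: $key" >&2; rc=1; }
  done
  return "$rc"
}
```

On the server this would be run as `sudo sshd -T | check_sshd_hardening`, ideally from the second SSH session opened to confirm access.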
To touch only the firewall later:
```bash
sudo ./firewall/firewall.sh --dry-run # preview the ruleset
sudo ./firewall/firewall.sh # render, validate, apply, install unit
```
### Then register into Rancher
Once the host is hardened, register the node as a **Custom k3s cluster**
(create the cluster in Rancher first, choosing the **K3s** distribution, then
paste its token/checksum into `config.env`):
```bash
sudo ./k3s/register.sh # downloads agent installer, joins cluster
journalctl -u rancher-system-agent -f # follow provisioning
```
Rancher is currently reached by IP, so the installer is fetched with
`--insecure`; the agent's ongoing link is still verified via `--ca-checksum`.
Give Rancher a real hostname + cert later to drop the insecure fetch.
### Then install Stalwart (mail)
```bash
sudo ./stalwart/install.sh # binary + systemd + bootstrap cert
systemctl status stalwart-mail
```
Requires `STALWART_ADMIN_PASSWORD` + `STALWART_WEBHOOK_SECRET` in `config.env`
(`openssl rand -hex 24` / `-hex 32`). See the mail topology below.
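A preflight check for those two secrets could look like the sketch below. It assumes the values were generated with `openssl rand -hex N` (lowercase hex, 2 characters per byte, so 48 and 64 characters); `require_hex_secret` is a hypothetical name, and `install.sh` may validate differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if the named variable is unset, too short, or not
# lowercase hex (the format `openssl rand -hex N` prints).
require_hex_secret() {
  local name="$1" min="$2" value
  value="${!name:-}"   # indirect expansion: read the variable called $name
  if [[ ! "$value" =~ ^[0-9a-f]+$ ]] || (( ${#value} < min )); then
    echo "$name must be at least $min hex characters" >&2
    return 1
  fi
}

# Example (after sourcing config.env):
#   require_hex_secret STALWART_ADMIN_PASSWORD 48
#   require_hex_secret STALWART_WEBHOOK_SECRET 64
```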
## Mail (Stalwart) topology
Stalwart runs on the **host**, not in k3s — mail must keep flowing regardless of
cluster state, and SMTP/IMAP want the real public IP for reputation. The single
public IP forces a deliberate split with Traefik:
| Concern | Owner | Detail |
|---------|-------|--------|
| Mail protocol ports (25/465/587/143/993/4190) | **Stalwart (host)** | Bound on the public IP; opened to the world by the firewall |
| Web/JMAP for `mail.dezky.eu:443` | **Traefik (k3s)** | Terminates TLS, reverse-proxies to Stalwart's internal `:8080` |
| ACME / TLS issuance | **cert-manager (k3s)** | Issues `mail.dezky.eu` via HTTP-01; Stalwart runs no ACME (80/443 are Traefik's) |
| Cert delivery to mail ports | **`cert-sync.sh` (host)** | Reads the cluster TLS secret via local kubeconfig, reloads Stalwart on change |
| Storage | **RocksDB on host disk** | Intentionally independent of the in-cluster Postgres |
| Domain/DKIM provisioning | **platform-api (k3s)** | JMAP management API at `http://<node>:8080/jmap`, Basic auth |
| Audit webhook | **Stalwart → platform-api** | POSTs to `https://api.dezky.eu/ingest/...`, HMAC-signed |
**platform-api Fleet env** (must match the host's `config.env`):
```
STALWART_API_URL=http://<node-internal-ip>:8080
STALWART_ADMIN_USER=admin
STALWART_ADMIN_PASSWORD=<same as host STALWART_ADMIN_PASSWORD>
STALWART_WEBHOOK_SECRET=<same as host STALWART_WEBHOOK_SECRET>
STALWART_PROVISIONING_ENABLED=true
```
The firewall already lets the k3s pod CIDR reach host `:8080` while blocking the
world, so no extra rule is needed.
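The audit webhook in the table is described only as "HMAC-signed". As an illustration of what verification on the receiving side involves, the sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded signature; the real algorithm, encoding, and header name are whatever Stalwart and platform-api agree on, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: HMAC-SHA256, hex-encoded, over the raw body.

# hmac_sha256_hex SECRET < body  -> prints the hex digest
hmac_sha256_hex() {
  # openssl prints "...(stdin)= <hex>"; the digest is the last field.
  openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# verify_signature SECRET EXPECTED_HEX < body  -> exit 0 when they match
verify_signature() {
  local secret="$1" expected="$2" actual
  actual="$(hmac_sha256_hex "$secret")"
  [[ "${actual,,}" == "${expected,,}" ]]   # case-insensitive hex comparison
}
```

This is handy for debugging a signature mismatch by hand. The string comparison is not constant-time and the secret appears in the process arguments, so it is a diagnostic aid rather than something to run in the request path.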
> **Forward dependency:** `cert-sync.sh` needs the fleet layer to create the
> `mail/mail-tls` cert secret. Until then Stalwart serves the self-signed
> bootstrap cert `install.sh` generated; the timer swaps in the real cert
> automatically once it exists.
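The "reload on change" behaviour comes down to comparing the freshly fetched cert with the installed one and only acting on a difference. `cert-sync.sh` itself is not shown in this section, so the following is a sketch of that idea under assumed names, not the script's actual logic:

```bash
#!/usr/bin/env bash
# Hypothetical helper: install SRC over DST only when the content differs.
# Returns 0 when DST was updated (caller should reload), 1 when unchanged.
sync_if_changed() {
  local src="$1" dst="$2"
  if [[ -f "$dst" ]] && cmp -s "$src" "$dst"; then
    return 1   # identical: nothing to do, no reload needed
  fi
  # Write next to the target, then rename, so readers never see a partial file.
  install -m 0640 "$src" "$dst.new" && mv -f "$dst.new" "$dst"
}
```

A timer-driven caller would reload Stalwart only when this returns 0, which keeps the periodic sync cheap and avoids needless reloads.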
### Finally, backups
```bash
sudo ./restic/install.sh # restic + key + nightly timer
# upload the printed public key to BOTH Storage Boxes (port 23), then:
sudo ./restic/install.sh # re-run to init the repos
sudo /opt/dezky-backup/backup.sh # first backup (or wait for 03:20 UTC)
```
Needs `RESTIC_PASSWORD` + `BACKUP_PRIMARY_REPO` (+ `BACKUP_DR_REPO`) in
`config.env`. See backups below.
## Backups (Restic)
Nightly at **03:20 UTC**: back up to the **primary Storage Box**, apply
retention, `restic check`, then a dedup-aware **`copy` to the Helsinki DR box**.
| What | Why |
|------|-----|
| `/opt/stalwart/data` + `/opt/stalwart/etc` | Mail store (RocksDB) + config — the crown jewels |
| `/var/lib/rancher/k3s/server/db/snapshots` | k3s **etcd snapshots** (cluster state) |
| `/var/lib/rancher/k3s/storage` | local-path PVCs — incl. where fleet `pg_dump`/`mongodump` CronJobs land |
- **Retention:** 7 daily · 4 weekly · 6 monthly (tunable via `BACKUP_RETENTION`).
- **Storage Box quirk:** SSH/SFTP on **port 23**, key auth. A single ssh-config
wildcard covers both boxes, so one key + `restic copy` mirrors primary → DR.
- **Encryption:** repos are Restic-encrypted with `RESTIC_PASSWORD`. **Store it
offline** — losing it makes every backup unrecoverable.
- **Alerting:** set `BACKUP_HEALTHCHECK_URL` (e.g. healthchecks.io) for a
dead-man's switch — get paged when a nightly run is missed, not when you need
to restore.
> **Database consistency:** live DB files in PVCs are crash-consistent at best.
> The reliable path is logical dumps — the **fleet layer** adds `pg_dump` /
> `mongodump` CronJobs that write into a backup PVC under
> `/var/lib/rancher/k3s/storage`, which Restic then captures. Restore those
> dumps, not the raw data dirs.
**Run restore drills.** A backup you've never restored isn't a backup:
```bash
sudo /opt/dezky-backup/restore.sh snapshots
sudo /opt/dezky-backup/restore.sh restore latest /tmp/restore-test
```
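A drill is more convincing when the restored bytes are compared with a known file. The sketch below assumes `restore.sh` passes the target through to `restic restore`, which recreates each file's absolute path under the target directory; `restored_matches` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a live file with its restored copy.
# restored_matches LIVE_FILE RESTORE_ROOT -> exit 0 when the SHA-256 digests match.
restored_matches() {
  local live="$1" root="$2" restored
  restored="${root%/}${live}"   # e.g. /tmp/restore-test + /opt/stalwart/etc/...
  [[ -f "$live" && -f "$restored" ]] || return 1
  [[ "$(sha256sum < "$live")" == "$(sha256sum < "$restored")" ]]
}
```

Pick a file that rarely changes (a config file rather than a database file), since anything modified after the snapshot will legitimately differ.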
## ⚠️ Lockout safety
- **Always** open a second SSH session and confirm access **before** closing the
one you ran bootstrap in.
- Management is pinned to home + office IPs. **Residential IPs can change** — if
yours does, you'll be locked out of SSH/6443 (public services stay up).
- **Break-glass:** Hetzner's **KVM/LARA** console (Robot panel) is out-of-band
and bypasses the firewall entirely. From there you can edit
`/etc/nftables.d/dezky-fw.nft` or update `config.env` + re-run `firewall.sh`.
- If your IP changes often, widen `MGMT_ALLOW_V4` to a small prefix, or we add a
WireGuard bastion later.
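One way to reduce the lockout risk above is to check, before applying the firewall, that the address you are connecting from is covered by `MGMT_ALLOW_V4`. A minimal IPv4-only sketch with hypothetical function names; it assumes well-formed entries and does no input validation:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: test whether an IPv4 address is covered by a
# comma-separated list of IPs/CIDRs such as MGMT_ALLOW_V4.

ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# ip_in_cidr IP ENTRY   (ENTRY is "a.b.c.d" or "a.b.c.d/prefix")
ip_in_cidr() {
  local ip="$1" net="${2%%/*}" prefix=32 mask
  [[ "$2" == */* ]] && prefix="${2#*/}"
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# ip_in_allowlist IP "entry, entry, ..."
ip_in_allowlist() {
  local ip="$1" entry
  local -a entries
  IFS=',' read -ra entries <<< "$2"
  for entry in "${entries[@]}"; do
    entry="${entry//[[:space:]]/}"   # strip spaces around the commas
    [[ -n "$entry" ]] && ip_in_cidr "$ip" "$entry" && return 0
  done
  return 1
}
```

Inside an SSH session the client address is the first field of `$SSH_CONNECTION`, so a guard could read `ip_in_allowlist "${SSH_CONNECTION%% *}" "$MGMT_ALLOW_V4" || echo "your IP is not allowlisted"`.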
## Verifying after apply
```bash
sudo nft list table inet dezky_fw # our rules
sudo nft list ruleset | grep -c KUBE # k3s rules still present (>0 once k3s runs)
sudo systemctl status dezky-firewall # enabled + active (exited)
sudo fail2ban-client status sshd # jail active
# From a NON-allowlisted network, `ssh` should hang/timeout; 443 should work.
```
## Host layer status
**Complete:** hardening ✅ · firewall ✅ · k3s registration ✅ · Stalwart ✅ ·
backups ✅.
Next is the **Fleet/GitOps layer** (`infrastructure/production/fleet/`):
cert-manager + `ClusterIssuer`, ingress, the data tier (Postgres/Mongo/Redis),
Authentik, OCIS + Collabora, and portal + platform-api — plus the
`mail/mail-tls` cert and the DB-dump CronJobs this layer's `cert-sync` and
backups depend on.
@@ -0,0 +1,192 @@
#!/usr/bin/env bash
#
# Dezky production host bootstrap — OS hardening for the AX41 k3s node.
#
# Run ONCE on a fresh Debian 12 (bookworm) install, as root, e.g.:
# scp -r infrastructure/production/host root@<server>:/opt/dezky-host
# ssh root@<server> 'cd /opt/dezky-host && cp config.env.example config.env && nano config.env'
# ssh root@<server> 'cd /opt/dezky-host && ./bootstrap.sh'
#
# Order matters: we create your admin user + install your SSH key BEFORE
# disabling root/password login, so you can't lock yourself out. The script
# is idempotent — safe to re-run.
#
# What it does NOT do: install k3s, Stalwart, or backups. Those are separate
# scripts in this host/ layer (k3s/, stalwart/, restic/). This is OS baseline + firewall only.
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CONFIG_FILE="$SCRIPT_DIR/config.env"
echo ""
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║ Dezky Production Host Bootstrap (Debian 12) ║"
echo "╚══════════════════════════════════════════════════════════════╝"
echo ""
# ── Preflight ──────────────────────────────────────────────────────────────
if [[ $EUID -ne 0 ]]; then
error "Run as root (you'll create the unprivileged admin user from here)."
exit 1
fi
if [[ ! -f "$CONFIG_FILE" ]]; then
error "Missing $CONFIG_FILE — copy config.env.example and fill it in."
exit 1
fi
# shellcheck disable=SC1090
source "$CONFIG_FILE"
: "${ADMIN_USER:?ADMIN_USER required}"
: "${ADMIN_SSH_PUBKEY:?ADMIN_SSH_PUBKEY required — without it you would lock yourself out}"
: "${MGMT_ALLOW_V4:?MGMT_ALLOW_V4 required}"
: "${SERVER_HOSTNAME:?SERVER_HOSTNAME required}"
: "${SSH_PORT:=22}"
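# Accept the common OpenSSH key types: ssh-ed25519/ssh-rsa, ecdsa-sha2-*, and sk-* (FIDO).
if [[ ! "$ADMIN_SSH_PUBKEY" =~ ^(ssh-|ecdsa-sha2-|sk-) ]]; then
error "ADMIN_SSH_PUBKEY doesn't look like a public key (expected an 'ssh-', 'ecdsa-sha2-' or 'sk-' prefix)."
exit 1
fi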
# ── Step 1: base packages + system upgrade ─────────────────────────────────
info "Step 1: Updating system and installing base packages..."
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get upgrade -y -qq
apt-get install -y -qq \
nftables fail2ban unattended-upgrades apt-listchanges \
curl ca-certificates gnupg htop tmux vim chrony \
>/dev/null
ok "Base packages installed."
# ── Step 2: hostname + timezone + time sync ────────────────────────────────
info "Step 2: Hostname, timezone (UTC), time sync..."
hostnamectl set-hostname "$SERVER_HOSTNAME"
timedatectl set-timezone UTC
systemctl enable --now chrony >/dev/null 2>&1 || true
# Ensure the FQDN resolves locally (-F matches the hostname literally, dots included)
if ! grep -qF "$SERVER_HOSTNAME" /etc/hosts; then
echo "127.0.1.1 ${SERVER_HOSTNAME} ${SERVER_HOSTNAME%%.*}" >> /etc/hosts
fi
ok "Hostname set to $SERVER_HOSTNAME (UTC)."
# ── Step 3: admin user + SSH key (BEFORE locking SSH) ──────────────────────
info "Step 3: Admin user '$ADMIN_USER' + SSH key..."
if ! id -u "$ADMIN_USER" >/dev/null 2>&1; then
adduser --disabled-password --gecos "" "$ADMIN_USER"
fi
usermod -aG sudo "$ADMIN_USER"
install -d -m 0700 -o "$ADMIN_USER" -g "$ADMIN_USER" "/home/$ADMIN_USER/.ssh"
AUTH_KEYS="/home/$ADMIN_USER/.ssh/authorized_keys"
touch "$AUTH_KEYS"
grep -qxF "$ADMIN_SSH_PUBKEY" "$AUTH_KEYS" || echo "$ADMIN_SSH_PUBKEY" >> "$AUTH_KEYS"
chmod 0600 "$AUTH_KEYS"
chown "$ADMIN_USER:$ADMIN_USER" "$AUTH_KEYS"
# sudo asks for a password, and the user was created with --disabled-password,
# so set one while you still have this root session: `passwd $ADMIN_USER`.
# Key-only SSH login works either way.
ok "Admin user ready with your SSH key."
# ── Step 4: SSH hardening (drop-in) ────────────────────────────────────────
info "Step 4: Hardening SSH..."
SSHD_DROPIN="/etc/ssh/sshd_config.d/99-dezky.conf"
cat > "$SSHD_DROPIN" <<EOF
# Managed by Dezky bootstrap.sh
Port ${SSH_PORT}
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
ChallengeResponseAuthentication no
PubkeyAuthentication yes
PermitEmptyPasswords no
X11Forwarding no
MaxAuthTries 3
LoginGraceTime 30
AllowUsers ${ADMIN_USER}
EOF
if sshd -t; then
systemctl reload ssh 2>/dev/null || systemctl reload sshd 2>/dev/null || true
ok "SSH hardened: key-only, no root, AllowUsers=${ADMIN_USER}, port ${SSH_PORT}."
else
error "sshd config test FAILED — removing drop-in, leaving SSH as-is."
rm -f "$SSHD_DROPIN"
exit 1
fi
# ── Step 5: kernel sysctl for k3s + sane limits ────────────────────────────
info "Step 5: sysctl + kernel modules for k3s..."
modprobe br_netfilter 2>/dev/null || true
modprobe overlay 2>/dev/null || true
cat > /etc/modules-load.d/dezky-k3s.conf <<EOF
br_netfilter
overlay
EOF
cat > /etc/sysctl.d/99-dezky-k3s.conf <<EOF
# Routing/bridging required by k3s/flannel
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# Many containers => raise inotify + file limits
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
fs.file-max = 2097152
EOF
sysctl --system >/dev/null
ok "sysctl applied."
# ── Step 6: disable swap (kubelet best practice) ───────────────────────────
info "Step 6: Disabling swap (recommended for k3s nodes)..."
swapoff -a || true
# Comment any swap entries so it stays off across reboots
sed -i.bak -E 's@^([^#].*\sswap\s.*)$@# \1 # disabled by dezky bootstrap@' /etc/fstab || true
ok "Swap disabled."
# ── Step 7: fail2ban (ssh) ─────────────────────────────────────────────────
info "Step 7: fail2ban for SSH..."
cat > /etc/fail2ban/jail.d/dezky-sshd.local <<EOF
[sshd]
enabled = true
port = ${SSH_PORT}
backend = systemd
maxretry = 4
findtime = 10m
bantime = 1h
EOF
systemctl enable --now fail2ban >/dev/null 2>&1 || true
systemctl restart fail2ban >/dev/null 2>&1 || true
ok "fail2ban active on SSH."
# ── Step 8: unattended security upgrades ───────────────────────────────────
info "Step 8: Enabling unattended security upgrades..."
cat > /etc/apt/apt.conf.d/20auto-upgrades <<EOF
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
# Keep defaults for which origins (security). Auto-reboot OFF — you decide when.
ok "Unattended security upgrades enabled (auto-reboot left off)."
# ── Step 9: firewall (k3s-safe nftables) ───────────────────────────────────
info "Step 9: Applying k3s-safe nftables firewall..."
# Ensure distro nftables.service won't fight us: we run our own unit and never
# flush the global ruleset. Disable the stock service's auto-load of its conf.
systemctl disable --now nftables.service >/dev/null 2>&1 || true
CONFIG_FILE="$CONFIG_FILE" "$SCRIPT_DIR/firewall/firewall.sh"
echo ""
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║ Host bootstrap complete ║"
echo "╚══════════════════════════════════════════════════════════════╝"
warn "BEFORE you close this root session:"
warn " 1. Open a new terminal and run: ssh -p ${SSH_PORT} ${ADMIN_USER}@${SERVER_PUBLIC_IPV4:-<server-ip>}"
warn " 2. Confirm you get in with your key."
warn " 3. Only then close this session. KVM/LARA is your fallback if not."
echo ""
info "Next host-layer steps (separate scripts): k3s/register.sh,"
info "stalwart/install.sh, restic/install.sh."
@@ -0,0 +1,59 @@
# ─────────────────────────────────────────────────────────────
# Dezky production host configuration
#
# Copy to `config.env` and fill in real values. `config.env` is
# gitignored — it holds host-specific values, not the repo's source
# of truth. bootstrap.sh, firewall/firewall.sh and k3s/register.sh all source this.
# ─────────────────────────────────────────────────────────────
# --- Management allowlist -------------------------------------------------
# Source addresses allowed to reach SSH (22) and the k3s API (6443).
# Everything else on those ports is dropped. Accepts a comma-separated
# list of single IPs and/or CIDRs (e.g. home + office, or a /29 block,
# or a v6 /64 prefix) — the firewall treats these as nftables interval sets.
#
# NOTE: residential IPs can change. If yours is dynamic, prefer a small
# prefix here, and remember Hetzner's KVM/LARA console is always reachable
# out-of-band if you ever lock yourself out (see README).
MGMT_ALLOW_V4="203.0.113.10, 203.0.113.11" # REQUIRED — management IPv4(s)/CIDR(s)
MGMT_ALLOW_V6="" # optional — management IPv6(s)/prefix (empty to skip)
# --- Server identity ------------------------------------------------------
SERVER_HOSTNAME="node1.dezky.eu" # FQDN set on the box
SERVER_PUBLIC_IPV4="" # AX41 primary IPv4 (fill after provisioning)
SERVER_PUBLIC_IPV6="" # AX41 primary IPv6 (fill after provisioning)
# --- Admin (non-root) user ------------------------------------------------
ADMIN_USER="dezky" # created with sudo; root SSH login is then disabled
ADMIN_SSH_PUBKEY="" # REQUIRED — your SSH public key (the WHOLE line, e.g. "ssh-ed25519 AAAA... you@home")
# --- SSH ------------------------------------------------------------------
SSH_PORT="22" # keep 22 unless you have a reason; obscurity is not security
# --- k3s networking (defaults; change ONLY if you customise k3s CIDRs) ----
K3S_POD_CIDR="10.42.0.0/16" # flannel pod network — accepted to/from host
K3S_SERVICE_CIDR="10.43.0.0/16" # cluster service network — accepted to/from host
# --- Rancher Custom-cluster registration (SECRET) -------------------------
# From Rancher → Cluster Management → <cluster> → Registration tab. Create the
# cluster with the **K3s** distribution first. Token + checksum are secrets.
RANCHER_SERVER_URL="https://rancher.example.com"
RANCHER_NODE_TOKEN="" # REQUIRED — node registration token
RANCHER_CA_CHECKSUM="" # REQUIRED — CA checksum from the same command
RANCHER_NODE_ROLES="--etcd --controlplane --worker" # single node = all three
RANCHER_INSECURE_FETCH="true" # true if Rancher is reached by IP / self-signed cert
# --- Stalwart mail (host service) -----------------------------------------
# SECRETS — platform-api (k3s) must use the SAME admin password + webhook secret.
STALWART_VERSION="latest" # pin to a release tag after first install
STALWART_ADMIN_PASSWORD="" # REQUIRED — openssl rand -hex 24
STALWART_WEBHOOK_SECRET="" # REQUIRED — openssl rand -hex 32
# --- Restic backups (host) ------------------------------------------------
# Storage Box is SSH/SFTP on PORT 23, key auth. STORE RESTIC_PASSWORD OFFLINE.
RESTIC_PASSWORD="" # REQUIRED — openssl rand -hex 32 (save offline!)
BACKUP_PRIMARY_REPO="" # sftp:<user>@<user>.your-storagebox.de:/dezky
BACKUP_DR_REPO="" # sftp:<user>@<user>.your-storagebox.de:/dezky (Helsinki box)
BACKUP_PATHS="/opt/stalwart/data /opt/stalwart/etc /var/lib/rancher/k3s/server/db/snapshots /var/lib/rancher/k3s/storage"
BACKUP_RETENTION="--keep-daily 7 --keep-weekly 4 --keep-monthly 6"
BACKUP_HEALTHCHECK_URL="" # optional dead-man's-switch base URL
@@ -0,0 +1,27 @@
# Dezky host firewall — loads ONLY our table on boot.
#
# Deliberately does NOT use the distro 'nftables.service', whose default
# config starts with `flush ruleset` and would wipe k3s's tables. This unit
# applies /etc/nftables.d/dezky-fw.nft, which only (re)creates inet dezky_fw.
#
# Ordering: runs early (before k3s) so the box is never briefly exposed.
# k3s adds its own tables independently afterwards.
[Unit]
Description=Dezky host firewall (nftables, k3s-safe)
Wants=network-pre.target
Before=network-pre.target k3s.service
DefaultDependencies=no
Conflicts=shutdown.target
Before=shutdown.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/nft -f /etc/nftables.d/dezky-fw.nft
ExecReload=/usr/sbin/nft -f /etc/nftables.d/dezky-fw.nft
# On stop, remove only our table — leave k3s networking intact.
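# `delete` rather than `destroy`: Debian 12 ships nftables 1.0.6, which predates
# the `destroy` command (nftables 1.0.8 / kernel 6.3). The leading '-' tolerates
# a table that is already gone.
ExecStop=-/usr/sbin/nft delete table inet dezky_fw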
[Install]
WantedBy=multi-user.target
@@ -0,0 +1,160 @@
#!/usr/bin/env bash
#
# Dezky production host firewall (nftables) — k3s-safe.
#
# Why this design:
# - k3s/kube-proxy/flannel manage their OWN nftables tables (ip/ip6: filter,
# nat, mangle). We must never `flush ruleset` or use ufw/firewalld, or we
# wipe/clobber cluster networking. Instead we own a single dedicated table,
# `inet dezky_fw`, with only an INPUT chain. Separate tables coexist; a
# packet is dropped if ANY base chain drops it, so our default-drop INPUT
# is the gate for host-bound traffic while k3s keeps owning FORWARD/NAT.
# - We explicitly accept the pod/service CIDRs and CNI interfaces so
# cluster<->host traffic (API server, kubelet, CoreDNS) is never dropped.
#
# Idempotent: re-running replaces only our table (it is removed and recreated
# in the same atomic `nft -f` transaction).
#
# Usage (as root, on the server):
# ./firewall.sh # render from ../config.env, install unit, apply
# ./firewall.sh --dry-run # print the ruleset, apply nothing
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}"
NFT_OUT="/etc/nftables.d/dezky-fw.nft"
UNIT_SRC="$SCRIPT_DIR/dezky-firewall.service"
UNIT_DST="/etc/systemd/system/dezky-firewall.service"
DRY_RUN=0
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=1
# ── Load config ───────────────────────────────────────────────────────────
if [[ ! -f "$CONFIG_FILE" ]]; then
error "Config not found: $CONFIG_FILE"
error "Copy config.env.example → config.env and fill it in."
exit 1
fi
# shellcheck disable=SC1090
source "$CONFIG_FILE"
: "${MGMT_ALLOW_V4:?MGMT_ALLOW_V4 is required in config.env}"
: "${SSH_PORT:=22}"
: "${K3S_POD_CIDR:=10.42.0.0/16}"
: "${K3S_SERVICE_CIDR:=10.43.0.0/16}"
# ── Build the management v6 block only if a v6 address is configured ───────
V6_SET=""
V6_RULE=""
if [[ -n "${MGMT_ALLOW_V6:-}" ]]; then
V6_SET=$(cat <<EOF
set mgmt_v6 {
type ipv6_addr
flags interval
elements = { ${MGMT_ALLOW_V6} }
}
EOF
)
V6_RULE=" ip6 saddr @mgmt_v6 tcp dport { ${SSH_PORT}, 6443 } accept"
fi
# ── Render the ruleset ─────────────────────────────────────────────────────
RULESET=$(cat <<EOF
#!/usr/sbin/nft -f
#
# Managed by Dezky firewall.sh — DO NOT edit by hand.
# Owns only 'inet dezky_fw'. k3s manages its own ip/ip6 tables separately.
# NEVER add 'flush ruleset' here: it would wipe k3s networking.
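# Create-then-delete keeps the reset idempotent on Debian 12, whose nftables
# 1.0.6 has no 'destroy' command (added in nftables 1.0.8 / kernel 6.3).
table inet dezky_fw
delete table inet dezky_fw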
table inet dezky_fw {
# Management source allowlist (SSH + k3s API). Intervals allow CIDRs.
set mgmt_v4 {
type ipv4_addr
flags interval
elements = { ${MGMT_ALLOW_V4} }
}
${V6_SET}
chain input {
type filter hook input priority filter; policy drop;
# Stateful fast-path
ct state established,related accept
ct state invalid drop
# Loopback
iif "lo" accept
# ICMP: keep ping working and (critically) IPv6 NDP/RA + PMTUD.
# meta l4proto also matches ICMPv6 behind extension headers, unlike ip6 nexthdr.
meta l4proto icmp accept
meta l4proto ipv6-icmp accept
# ── k3s internal: never block cluster <-> host traffic ──────────────
iifname "cni0" accept
iifname "flannel.1" accept
ip saddr ${K3S_POD_CIDR} accept
ip saddr ${K3S_SERVICE_CIDR} accept
# ── Public services (world-reachable) ──────────────────────────────
# Web + ACME HTTP-01 challenge
tcp dport { 80, 443 } accept
# Mail: smtp, submissions, submission, imap, imaps, managesieve
tcp dport { 25, 465, 587, 143, 993, 4190 } accept
# ── Management surfaces: MGMT_ALLOW_V4/V6 allowlist only ───────────
ip saddr @mgmt_v4 tcp dport { ${SSH_PORT}, 6443 } accept
${V6_RULE}
# Rate-limited drop logging for debugging (then policy drop applies)
limit rate 5/minute burst 5 packets log prefix "dezky-fw drop: " level info
}
}
EOF
)
if [[ $DRY_RUN -eq 1 ]]; then
echo "$RULESET"
info "Dry run — nothing applied."
exit 0
fi
if [[ $EUID -ne 0 ]]; then
error "Must run as root to apply the firewall."
exit 1
fi
# ── Write, validate, install, apply ────────────────────────────────────────
mkdir -p /etc/nftables.d
echo "$RULESET" > "$NFT_OUT"
chmod 0644 "$NFT_OUT"
info "Wrote ruleset → $NFT_OUT"
# Validate syntax before touching the live ruleset
if ! nft -c -f "$NFT_OUT"; then
error "nft syntax check FAILED — not applying. Live firewall unchanged."
exit 1
fi
ok "Ruleset syntax valid."
# Install the systemd unit so the rules survive reboot (and never flush global)
if [[ -f "$UNIT_SRC" ]]; then
install -m 0644 "$UNIT_SRC" "$UNIT_DST"
systemctl daemon-reload
systemctl enable dezky-firewall.service >/dev/null 2>&1 || true
ok "Installed + enabled dezky-firewall.service"
fi
# Apply now
nft -f "$NFT_OUT"
ok "Firewall applied. Management restricted to: ${MGMT_ALLOW_V4} ${MGMT_ALLOW_V6:-}"
warn "Open a SECOND SSH session NOW and confirm you still have access before"
warn "closing this one. Hetzner KVM/LARA is your out-of-band fallback."
@@ -0,0 +1,93 @@
#!/usr/bin/env bash
#
# Register the AX41 as a single-node k3s cluster in Rancher (Custom cluster,
# provisioning v2). Run AFTER bootstrap.sh — the firewall already allows the
# outbound 443 the cluster-agent needs (no inbound rule required).
#
# This downloads Rancher's system-agent installer and runs it. The agent then
# pulls the cluster spec from Rancher and stands up k3s with the configured
# roles. The Rancher Custom cluster MUST be created with the K3s distribution.
#
# Security note: Rancher here is addressed by IP, whose TLS cert won't match,
# so we fetch the installer with --insecure. That's acceptable because the
# agent verifies Rancher's CA via --ca-checksum for its ongoing connection.
# Move Rancher behind rancher.dezky.eu + a valid cert to drop the insecure fetch.
#
# Usage (on the server):
# sudo ./register.sh # register this node
# sudo ./register.sh --force # re-run even if an agent is already present
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}"
FORCE=0
[[ "${1:-}" == "--force" ]] && FORCE=1
if [[ $EUID -ne 0 ]]; then
error "Run with sudo/root (the agent installer needs root)."
exit 1
fi
if [[ ! -f "$CONFIG_FILE" ]]; then
error "Missing $CONFIG_FILE — fill in the RANCHER_* values first."
exit 1
fi
# shellcheck disable=SC1090
source "$CONFIG_FILE"
: "${RANCHER_SERVER_URL:?RANCHER_SERVER_URL required}"
: "${RANCHER_NODE_TOKEN:?RANCHER_NODE_TOKEN required}"
: "${RANCHER_CA_CHECKSUM:?RANCHER_CA_CHECKSUM required}"
: "${RANCHER_NODE_ROLES:=--etcd --controlplane --worker}"
: "${RANCHER_INSECURE_FETCH:=true}"
# ── Idempotency guard ──────────────────────────────────────────────────────
if systemctl list-unit-files 2>/dev/null | grep -q '^rancher-system-agent'; then
if [[ $FORCE -eq 0 ]]; then
warn "rancher-system-agent already installed — node looks registered."
warn "Re-run with --force to register again. Skipping."
exit 0
fi
warn "rancher-system-agent present, but --force given — proceeding."
fi
# ── Fetch installer ────────────────────────────────────────────────────────
INSECURE_FLAG=""
if [[ "$RANCHER_INSECURE_FETCH" == "true" ]]; then
INSECURE_FLAG="--insecure"
warn "Fetching installer insecurely (Rancher reached by IP). CA checksum still pins the agent connection."
fi
TMP_INSTALLER="$(mktemp /tmp/rancher-system-agent-install.XXXXXX.sh)"
trap 'rm -f "$TMP_INSTALLER"' EXIT
info "Downloading system-agent installer from ${RANCHER_SERVER_URL} ..."
# shellcheck disable=SC2086
curl -fsSL $INSECURE_FLAG "${RANCHER_SERVER_URL}/system-agent-install.sh" -o "$TMP_INSTALLER"
ok "Installer downloaded ($(wc -c < "$TMP_INSTALLER") bytes)."
# ── Register ───────────────────────────────────────────────────────────────
info "Registering node with roles: ${RANCHER_NODE_ROLES}"
info "(token masked: ${RANCHER_NODE_TOKEN:0:6}…)"
# shellcheck disable=SC2086
sh "$TMP_INSTALLER" \
--server "${RANCHER_SERVER_URL}" \
--label 'cattle.io/os=linux' \
--token "${RANCHER_NODE_TOKEN}" \
--ca-checksum "${RANCHER_CA_CHECKSUM}" \
${RANCHER_NODE_ROLES}
echo ""
ok "Registration submitted. Watch progress in Rancher (cluster goes Active in a few minutes)."
info "On the node you can follow along with:"
info " journalctl -u rancher-system-agent -f"
info " k3s kubectl get nodes # once k3s is up"
@@ -0,0 +1,86 @@
#!/usr/bin/env bash
#
# Dezky host backup — Restic to a Hetzner Storage Box (primary), then a
# dedup-aware `restic copy` to a second Storage Box in Helsinki (DR).
#
# Runs as root (must read stalwart- and root-owned data). HOME is pointed at
# /opt/dezky-backup so ssh uses the dedicated backup key + config (Storage Box
# is SSH/SFTP on port 23). Triggered daily by dezky-backup.timer.
#
# Requires restic >= 0.14 (for `copy --from-repo`).
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
BACKUP_HOME="/opt/dezky-backup"
ENV_FILE="${ENV_FILE:-$BACKUP_HOME/restic.env}"
if [[ ! -f "$ENV_FILE" ]]; then
error "Missing $ENV_FILE — run restic/install.sh first."
exit 1
fi
# shellcheck disable=SC1090
source "$ENV_FILE"
: "${RESTIC_PASSWORD:?RESTIC_PASSWORD required}"
: "${BACKUP_PRIMARY_REPO:?BACKUP_PRIMARY_REPO required}"
: "${BACKUP_PATHS:?BACKUP_PATHS required}"
: "${BACKUP_RETENTION:=--keep-daily 7 --keep-weekly 4 --keep-monthly 6}"
# ssh (spawned by restic) reads $HOME/.ssh/config — wildcard for *.your-storagebox.de
export HOME="$BACKUP_HOME"
export RESTIC_PASSWORD
# For `copy`: both repos share the same password.
export RESTIC_FROM_PASSWORD="$RESTIC_PASSWORD"
# Optional dead-man's-switch (e.g. healthchecks.io). Pinged /start, success, /fail.
HC="${BACKUP_HEALTHCHECK_URL:-}"
ping_hc() { [[ -n "$HC" ]] && curl -fsS -m 10 --retry 3 "${HC}${1:-}" >/dev/null 2>&1 || true; }
fail() { error "$1"; ping_hc "/fail"; exit 1; }
ping_hc "/start"
# Exclude obvious churn/noise from the PVC tree
EXCLUDES=(--exclude-caches
--exclude '*/lost+found'
--exclude '*.tmp')
# ── 1) Back up to the primary Storage Box ──────────────────────────────────
info "Backing up to primary: $BACKUP_PRIMARY_REPO"
# shellcheck disable=SC2086
restic -r "$BACKUP_PRIMARY_REPO" backup $BACKUP_PATHS \
"${EXCLUDES[@]}" \
--tag dezky --tag host \
--host dezky-node1 \
|| fail "Primary backup failed."
ok "Primary backup done."
# ── 2) Retention on primary ────────────────────────────────────────────────
info "Applying retention on primary..."
# shellcheck disable=SC2086
restic -r "$BACKUP_PRIMARY_REPO" forget $BACKUP_RETENTION --prune \
|| warn "Primary forget/prune reported an issue (backup itself is safe)."
# ── 3) Light integrity check on primary ────────────────────────────────────
restic -r "$BACKUP_PRIMARY_REPO" check || warn "restic check flagged the primary repo — investigate."
# ── 4) Mirror to the Helsinki DR box (dedup-aware copy) ─────────────────────
if [[ -n "${BACKUP_DR_REPO:-}" ]]; then
info "Copying snapshots to DR: $BACKUP_DR_REPO"
restic -r "$BACKUP_DR_REPO" copy --from-repo "$BACKUP_PRIMARY_REPO" \
|| fail "DR copy failed."
# shellcheck disable=SC2086
restic -r "$BACKUP_DR_REPO" forget $BACKUP_RETENTION --prune \
|| warn "DR forget/prune reported an issue."
ok "DR mirror done."
else
warn "BACKUP_DR_REPO not set — skipping off-site mirror (set it for real DR)."
fi
ok "Backup cycle complete."
ping_hc "" # success ping (bare URL)
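backup.sh expands `$BACKUP_PATHS` and `$BACKUP_RETENTION` unquoted on purpose (hence the SC2086 suppressions) so that each space-separated word reaches restic as its own argument. The sketch below illustrates the difference; `count_args` is a hypothetical helper for illustration only:

```shell
# Hypothetical helper: report how many arguments it received.
count_args() {
  echo "$#"
}

retention='--keep-daily 7 --keep-weekly 4 --keep-monthly 6'

# Unquoted: the shell splits on whitespace, yielding one argument per word.
# shellcheck disable=SC2086
unquoted=$(count_args $retention)

# Quoted: the whole string arrives as a single argument, which restic would
# not parse as separate flags.
quoted=$(count_args "$retention")

echo "unquoted=$unquoted quoted=$quoted"
```

The trade-off is that a path containing spaces cannot be expressed in BACKUP_PATHS this way.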
@@ -0,0 +1,13 @@
# Dezky nightly backup (Restic → Storage Box primary + Helsinki DR).
[Unit]
Description=Dezky host backup (Restic)
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/opt/dezky-backup/backup.sh
# Backups are I/O heavy but should never starve mail/k3s
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=6
@@ -0,0 +1,12 @@
# Nightly at 03:20 UTC, with a randomized delay so it doesn't hammer the
# Storage Box at the same second every night. Catches up if the box was off.
[Unit]
Description=Run the Dezky host backup nightly
[Timer]
OnCalendar=*-*-* 03:20:00 UTC
RandomizedDelaySec=20min
Persistent=true
[Install]
WantedBy=timers.target
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
#
# Install Dezky host backups: Restic + a dedicated backup SSH key/config for the
# Hetzner Storage Box(es), the env file, the backup/restore scripts, and the
# nightly systemd timer. Idempotent.
#
# sudo ./install.sh
#
# Storage Box uses SSH/SFTP on PORT 23 with key auth. After this runs, you must
# upload the printed public key to BOTH Storage Boxes, then re-run to init the
# repos (the box must trust the key before `restic init` can connect).
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}"
BACKUP_HOME="/opt/dezky-backup"
SSH_DIR="$BACKUP_HOME/.ssh"
KEY="$SSH_DIR/id_ed25519"
if [[ $EUID -ne 0 ]]; then error "Run as root."; exit 1; fi
if [[ ! -f "$CONFIG_FILE" ]]; then error "Missing $CONFIG_FILE"; exit 1; fi
# shellcheck disable=SC1090
source "$CONFIG_FILE"
: "${RESTIC_PASSWORD:?RESTIC_PASSWORD required (and STORE IT OFFLINE — losing it loses the backups)}"
: "${BACKUP_PRIMARY_REPO:?BACKUP_PRIMARY_REPO required}"
: "${BACKUP_PATHS:?BACKUP_PATHS required}"
: "${BACKUP_RETENTION:=--keep-daily 7 --keep-weekly 4 --keep-monthly 6}"
# ── 1) Packages ────────────────────────────────────────────────────────────
info "Installing restic + openssh client..."
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y -qq restic curl openssh-client >/dev/null
ok "restic $(restic version | awk '{print $2}') installed."
# ── 2) Backup home + SSH key/config ────────────────────────────────────────
info "Setting up $BACKUP_HOME ..."
install -d -m 0700 "$BACKUP_HOME" "$SSH_DIR"
if [[ ! -f "$KEY" ]]; then
ssh-keygen -t ed25519 -N "" -C "dezky-backup@node1" -f "$KEY" >/dev/null
ok "Generated backup SSH key."
fi
# Single wildcard config covers BOTH Storage Boxes (same domain, port 23, key).
cat > "$SSH_DIR/config" <<EOF
Host *.your-storagebox.de
Port 23
IdentityFile $KEY
IdentitiesOnly yes
StrictHostKeyChecking accept-new
UserKnownHostsFile $SSH_DIR/known_hosts
EOF
chmod 0600 "$SSH_DIR/config" "$KEY"
chmod 0644 "$KEY.pub"
# ── 3) restic.env (secrets; generated, not in git) ─────────────────────────
umask 077
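# Values are shell-quoted (%q): backup.sh sources this file, and BACKUP_PATHS /
# BACKUP_RETENTION contain spaces that would otherwise break the assignments.
{
echo "# Generated by restic/install.sh from config.env. DO NOT commit."
printf 'RESTIC_PASSWORD=%q\n' "$RESTIC_PASSWORD"
printf 'BACKUP_PRIMARY_REPO=%q\n' "$BACKUP_PRIMARY_REPO"
printf 'BACKUP_DR_REPO=%q\n' "${BACKUP_DR_REPO:-}"
printf 'BACKUP_PATHS=%q\n' "$BACKUP_PATHS"
printf 'BACKUP_RETENTION=%q\n' "$BACKUP_RETENTION"
printf 'BACKUP_HEALTHCHECK_URL=%q\n' "${BACKUP_HEALTHCHECK_URL:-}"
} > "$BACKUP_HOME/restic.env"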
chmod 0600 "$BACKUP_HOME/restic.env"
ok "Wrote restic.env."
# ── 4) Scripts + systemd units ─────────────────────────────────────────────
install -m 0750 "$SCRIPT_DIR/backup.sh" "$BACKUP_HOME/backup.sh"
install -m 0750 "$SCRIPT_DIR/restore.sh" "$BACKUP_HOME/restore.sh"
install -m 0644 "$SCRIPT_DIR/dezky-backup.service" /etc/systemd/system/dezky-backup.service
install -m 0644 "$SCRIPT_DIR/dezky-backup.timer" /etc/systemd/system/dezky-backup.timer
systemctl daemon-reload
systemctl enable --now dezky-backup.timer
ok "Nightly timer enabled."
# ── 5) Try to init the repos (only works once the key is on the box) ───────
export HOME="$BACKUP_HOME" RESTIC_PASSWORD
init_repo() {
local repo="$1" label="$2"
[[ -z "$repo" ]] && return 0
if restic -r "$repo" cat config >/dev/null 2>&1; then
ok "$label repo already initialized."
elif restic -r "$repo" init >/dev/null 2>&1; then
ok "$label repo initialized."
else
warn "$label repo not reachable/authorized yet — upload the key, then re-run."
fi
}
init_repo "$BACKUP_PRIMARY_REPO" "Primary"
init_repo "${BACKUP_DR_REPO:-}" "DR"
echo ""
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║ Backup install complete ║"
echo "╚══════════════════════════════════════════════════════════════╝"
warn "Upload this PUBLIC key to BOTH Storage Boxes, then re-run install.sh:"
echo ""
cat "$KEY.pub"
echo ""
info " ssh-copy-id -p 23 -i $KEY.pub <primary-user>@<primary-host>.your-storagebox.de"
info " ssh-copy-id -p 23 -i $KEY.pub <dr-user>@<dr-host>.your-storagebox.de"
info "Then test: sudo $BACKUP_HOME/backup.sh (or wait for 03:20 UTC)"
info "Drill restore: sudo $BACKUP_HOME/restore.sh restore latest /tmp/restore-test"
warn "STORE RESTIC_PASSWORD OFFLINE. Without it, the encrypted backups are unrecoverable."
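restic.env is read back with `source`, so any value containing spaces or shell metacharacters has to be quoted when it is written. A minimal, portable sketch of one way to do that; `shell_quote` is a hypothetical helper and is not part of install.sh:

```shell
# Hypothetical helper: wrap a value in single quotes, escaping embedded single
# quotes, so that KEY=<quoted> survives being sourced by a shell.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Example: build an assignment line that is safe to source or eval.
line="BACKUP_RETENTION=$(shell_quote '--keep-daily 7 --keep-weekly 4')"
eval "$line"
echo "$BACKUP_RETENTION"
```

In bash, `printf '%q'` achieves the same result without the helper; the version above only relies on POSIX `printf` and `sed`.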
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
#
# Dezky restore helper. A backup you've never restored is a backup you don't
# have — run a drill periodically. This wraps the common restic restore flows.
#
# sudo ./restore.sh snapshots # list snapshots (primary)
# sudo ./restore.sh snapshots --dr # list from the DR box
# sudo ./restore.sh restore <snapshot-id> <target-dir> [--dr]
# sudo ./restore.sh restore latest /tmp/restore-test # safe drill target
#
# Restores go to an arbitrary target dir (NOT in place) so you can inspect first.
# For Stalwart, stop the service, swap /opt/stalwart/data, then start it.
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
BACKUP_HOME="/opt/dezky-backup"
ENV_FILE="${ENV_FILE:-$BACKUP_HOME/restic.env}"
[[ -f "$ENV_FILE" ]] || { error "Missing $ENV_FILE"; exit 1; }
# shellcheck disable=SC1090
source "$ENV_FILE"
export HOME="$BACKUP_HOME"
export RESTIC_PASSWORD
pick_repo() {
if [[ "${*: -1}" == "--dr" ]]; then
[[ -n "${BACKUP_DR_REPO:-}" ]] || { error "BACKUP_DR_REPO not set"; exit 1; }
echo "$BACKUP_DR_REPO"
else
echo "$BACKUP_PRIMARY_REPO"
fi
}
cmd="${1:-}"; shift || true
case "$cmd" in
snapshots)
repo="$(pick_repo "$@")"
info "Snapshots in $repo:"
restic -r "$repo" snapshots --tag dezky
;;
restore)
snap="${1:?snapshot id (or 'latest')}"; target="${2:?target dir}"
repo="$(pick_repo "$@")"
mkdir -p "$target"
info "Restoring $snap from $repo into $target"
restic -r "$repo" restore "$snap" --target "$target"
ok "Restored. Inspect $target before putting anything back in place."
;;
*)
error "Usage: $0 {snapshots|restore} ... (see header)"
exit 1
;;
esac
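The header recommends periodic restore drills. A drill is easier to automate with a basic sanity check on the restored tree. A sketch, under the assumption that a successful restore leaves at least one regular file beneath the target; `drill_ok` is a hypothetical helper:

```shell
# Hypothetical helper: succeed only if the directory exists and contains at
# least one regular file somewhere beneath it.
drill_ok() {
  [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}

# Usage after a drill (not run here):
#   sudo /opt/dezky-backup/restore.sh restore latest /tmp/restore-test
#   drill_ok /tmp/restore-test && echo "drill produced files"
```

This only shows that the restore produced data, not that Stalwart can start from it; a full drill would also boot a scratch instance against the restored directory.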
@@ -0,0 +1,77 @@
#!/usr/bin/env bash
#
# Sync the mail.dezky.eu TLS cert from the cluster (issued by cert-manager) to
# Stalwart on the host. The host IS the k3s node, so we read the secret via the
# local kubeconfig — no external machinery. Reloads Stalwart only when the cert
# actually changed (cert-manager renews ~30 days before expiry).
#
# Run by stalwart-cert-sync.timer (every 12h + on boot). Safe to run by hand.
#
# Forward dependency: needs the fleet layer to have created the TLS secret
# (default: namespace 'mail', secret 'mail-tls'). Until then this is a no-op and
# Stalwart keeps using the self-signed bootstrap cert from install.sh.
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
TLS_NAMESPACE="${TLS_NAMESPACE:-mail}"
TLS_SECRET="${TLS_SECRET:-mail-tls}"
TLS_DIR="/opt/stalwart/etc/tls"
KUBECONFIG_PATH="${KUBECONFIG:-/etc/rancher/k3s/k3s.yaml}"
# kubectl: prefer standalone, fall back to the k3s-bundled one
if command -v kubectl >/dev/null 2>&1; then
KUBECTL=(kubectl)
elif command -v k3s >/dev/null 2>&1; then
KUBECTL=(k3s kubectl)
else
error "Neither kubectl nor k3s found — is the node provisioned yet?"
exit 1
fi
export KUBECONFIG="$KUBECONFIG_PATH"
# Pull the secret (no-op if it doesn't exist yet)
if ! "${KUBECTL[@]}" -n "$TLS_NAMESPACE" get secret "$TLS_SECRET" >/dev/null 2>&1; then
warn "Secret ${TLS_NAMESPACE}/${TLS_SECRET} not present yet — cert-manager hasn't issued it. Skipping."
exit 0
fi
TMP_CRT="$(mktemp)"; TMP_KEY="$(mktemp)"
trap 'rm -f "$TMP_CRT" "$TMP_KEY"' EXIT
"${KUBECTL[@]}" -n "$TLS_NAMESPACE" get secret "$TLS_SECRET" \
-o jsonpath='{.data.tls\.crt}' | base64 -d > "$TMP_CRT"
"${KUBECTL[@]}" -n "$TLS_NAMESPACE" get secret "$TLS_SECRET" \
-o jsonpath='{.data.tls\.key}' | base64 -d > "$TMP_KEY"
if [[ ! -s "$TMP_CRT" || ! -s "$TMP_KEY" ]]; then
error "Fetched cert or key is empty — leaving current cert in place."
exit 1
fi
# Only reload if something changed (byte-for-byte compare against the installed files)
changed=0
mkdir -p "$TLS_DIR"
if ! cmp -s "$TMP_CRT" "$TLS_DIR/cert.pem" 2>/dev/null; then changed=1; fi
if ! cmp -s "$TMP_KEY" "$TLS_DIR/key.pem" 2>/dev/null; then changed=1; fi
if [[ $changed -eq 0 ]]; then
info "Cert unchanged — nothing to do."
exit 0
fi
install -o stalwart -g stalwart -m 0644 "$TMP_CRT" "$TLS_DIR/cert.pem"
install -o stalwart -g stalwart -m 0640 "$TMP_KEY" "$TLS_DIR/key.pem"
ok "Updated mail TLS cert from ${TLS_NAMESPACE}/${TLS_SECRET}."
# SIGHUP Stalwart to reload certs without dropping connections
if systemctl is-active --quiet stalwart-mail; then
systemctl reload stalwart-mail && ok "Reloaded stalwart-mail (SIGHUP)."
else
warn "stalwart-mail not active — cert staged, will be used on next start."
fi
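cert-sync.sh checks that the fetched files are non-empty, but not that they belong together. One optional extra guard, sketched below, compares the public key embedded in the certificate with the one derived from the private key. `cert_matches_key` is a hypothetical helper and is not part of the script:

```shell
# Hypothetical helper: succeed if the certificate and the private key share
# the same public key. For a chain file, openssl x509 reads the first (leaf)
# certificate.
cert_matches_key() {
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)" || return 1
  key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Usage inside cert-sync.sh, before the install step (illustrative):
#   cert_matches_key "$TMP_CRT" "$TMP_KEY" || { error "cert/key mismatch"; exit 1; }
```

Failing early here would leave the current certificate in place instead of staging a pair that Stalwart cannot load.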
@@ -0,0 +1,102 @@
# Stalwart Mail Server — Dezky PRODUCTION (bare-metal host, outside k3s)
#
# Topology (see host/README.md):
# - Mail protocol ports bind directly on the host's public IP.
# - Web/JMAP is served plaintext on internal-only :8080 (firewalled) and fronted by
# Traefik (k3s) for mail.dezky.eu:443. Stalwart does NOT bind 80/443 —
# those belong to Traefik.
# - TLS for the mail-protocol ports uses a cert ISSUED BY cert-manager
# (mail.dezky.eu) and copied here by stalwart/cert-sync.sh. Stalwart runs
# no ACME of its own (80/443 are Traefik's).
# - Storage is RocksDB on local disk — intentionally independent of the
# in-cluster Postgres so mail keeps flowing regardless of cluster state.
#
# Reference: https://stalw.art/docs
[server]
hostname = "mail.dezky.eu" # MUST match the IP's PTR/rDNS record
# ── Listeners ──────────────────────────────────────────────────────────────
# Mail protocols on the public IP; management/JMAP on internal 8080 only
# (firewall blocks 8080 from the world, allows the k3s pod CIDR + Traefik).
[server.listener]
"smtp" = { bind = "[::]:25", protocol = "smtp" }
"submission" = { bind = "[::]:587", protocol = "smtp", tls.implicit = false }
"submissions" = { bind = "[::]:465", protocol = "smtp", tls.implicit = true }
"imap" = { bind = "[::]:143", protocol = "imap", tls.implicit = false }
"imaps" = { bind = "[::]:993", protocol = "imap", tls.implicit = true }
"sieve" = { bind = "[::]:4190", protocol = "managesieve" }
# Internal HTTP: JMAP + WebAdmin + management API. Traefik terminates TLS for
# the public hostname and proxies here; platform-api (pod) calls it directly.
"http" = { bind = "0.0.0.0:8080", protocol = "http" }
# ── Storage — RocksDB on local disk (host-isolated from the cluster) ────────
[store."rocksdb"]
type = "rocksdb"
path = "/opt/stalwart/data"
compression = "lz4"
[storage]
data = "rocksdb"
fts = "rocksdb"
blob = "rocksdb"
lookup = "rocksdb"
directory = "internal"
[directory."internal"]
type = "internal"
store = "rocksdb"
# ── TLS — cert issued by cert-manager, synced here by cert-sync.sh ──────────
# Until the first sync runs, install.sh drops a self-signed bootstrap cert so
# the TLS listeners can start. cert-sync replaces it with the real LE cert.
[certificate."default"]
cert = "%{file:/opt/stalwart/etc/tls/cert.pem}%"
private-key = "%{file:/opt/stalwart/etc/tls/key.pem}%"
default = true
# ── Authentication ─────────────────────────────────────────────────────────
# Fallback admin is what platform-api uses for Basic auth on the JMAP
# management API (STALWART_ADMIN_USER/PASSWORD on the platform-api side).
[authentication]
fallback-admin.user = "admin"
fallback-admin.secret = "%{env:STALWART_ADMIN_PASSWORD}%"
# ── Resolver ───────────────────────────────────────────────────────────────
# DNSSEC-aware system resolver. Mail deliverability depends on clean DNS.
[resolver]
type = "system"
preserve-intermediates = true
concurrency = 4
# ── Spam filtering — built-in filter ON in production ──────────────────────
[spam-filter]
enable = true
# ── Logging — journald captures stdout ─────────────────────────────────────
[tracer."stdout"]
type = "stdout"
level = "info"
ansi = false
enable = true
# ── Audit webhook → platform-api (via the public api ingress) ──────────────
# Stalwart on the host reaches platform-api through Traefik on the public
# hostname; HMAC-signed so a public endpoint is safe.
[webhook."audit-ingest"]
url = "https://api.dezky.eu/ingest/stalwart/webhook"
signature-key = "%{env:STALWART_WEBHOOK_SECRET}%"
events = [
"auth.success",
"auth.failure",
"auth.banned",
"account.created",
"account.deleted",
"account.password-changed",
"message.rejected",
"policy.rejection",
"dkim.failure",
"dmarc.failure",
"spam.detected",
]
throttle = "1s"
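Because the webhook endpoint is public, the receiver has to recompute the HMAC over the request body and compare it with the signature it was sent. A minimal sketch of that check using HMAC-SHA256 in hex; the digest algorithm, the encoding, and the header that carries the signature are all assumptions to verify against the Stalwart webhook documentation, and both function names are hypothetical:

```shell
# Hypothetical helper: HMAC-SHA256 of stdin under the given key, lowercase hex.
hmac_sha256_hex() {
  openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Hypothetical receiver-side check: recompute the HMAC and compare.
# $1 = shared secret, $2 = request body, $3 = signature received (hex)
signature_ok() {
  [ "$(printf '%s' "$2" | hmac_sha256_hex "$1")" = "$3" ]
}
```

A real receiver should use a constant-time comparison; the plain string test here is for illustration only.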
@@ -0,0 +1,144 @@
#!/usr/bin/env bash
#
# Install Stalwart mail server as a hardened host systemd service on the AX41.
# Run AFTER bootstrap.sh (and ideally after k3s registration, so cert-sync can
# immediately pull the real cert). Idempotent — safe to re-run to upgrade.
#
# sudo ./install.sh
#
# What it does: creates the stalwart user + /opt/stalwart layout, downloads the
# Stalwart binary (STALWART_VERSION; defaults to 'latest', so pin it), installs
# config.toml + the secrets EnvironmentFile, drops a self-signed bootstrap cert
# (replaced later by cert-sync), and installs the systemd units (mail service +
# cert-sync service/timer).
set -euo pipefail
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; BLUE='\033[0;34m'; NC='\033[0m'
info() { echo -e "${BLUE}[INFO]${NC} $*"; }
ok() { echo -e "${GREEN}[OK]${NC} $*"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $*"; }
error() { echo -e "${RED}[ERROR]${NC} $*" >&2; }
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
HOST_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
CONFIG_FILE="${CONFIG_FILE:-$HOST_DIR/config.env}"
PREFIX="/opt/stalwart"
STALWART_REPO="${STALWART_REPO:-stalwartlabs/mail-server}"
if [[ $EUID -ne 0 ]]; then
error "Run as root."
exit 1
fi
if [[ ! -f "$CONFIG_FILE" ]]; then
error "Missing $CONFIG_FILE — fill in the STALWART_* values first."
exit 1
fi
# shellcheck disable=SC1090
source "$CONFIG_FILE"
: "${STALWART_ADMIN_PASSWORD:?STALWART_ADMIN_PASSWORD required (openssl rand -hex 24)}"
: "${STALWART_WEBHOOK_SECRET:?STALWART_WEBHOOK_SECRET required (openssl rand -hex 32)}"
: "${STALWART_VERSION:=latest}"
# ── Step 1: user + directory layout ────────────────────────────────────────
info "Step 1: stalwart user + ${PREFIX} layout..."
if ! id -u stalwart >/dev/null 2>&1; then
useradd --system --home-dir "$PREFIX" --shell /usr/sbin/nologin stalwart
fi
install -d -o stalwart -g stalwart -m 0750 "$PREFIX" "$PREFIX/bin" "$PREFIX/data" "$PREFIX/logs"
install -d -o stalwart -g stalwart -m 0750 "$PREFIX/etc" "$PREFIX/etc/tls"
ok "Layout ready."
# ── Step 2: download the Stalwart binary ───────────────────────────────────
info "Step 2: fetching Stalwart binary (${STALWART_REPO}@${STALWART_VERSION})..."
arch="$(uname -m)"
case "$arch" in
x86_64) target="x86_64-unknown-linux-gnu" ;;
aarch64) target="aarch64-unknown-linux-gnu" ;;
*) error "Unsupported arch: $arch"; exit 1 ;;
esac
if [[ "$STALWART_VERSION" == "latest" ]]; then
api="https://api.github.com/repos/${STALWART_REPO}/releases/latest"
warn "Using 'latest' — pin STALWART_VERSION to a tag in config.env after this install."
else
api="https://api.github.com/repos/${STALWART_REPO}/releases/tags/${STALWART_VERSION}"
fi
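# [^\"]* (not +) after the target: the triple may be followed directly by
# .tar.gz. `|| true`: with pipefail, a non-matching grep would otherwise abort
# the script before the empty-check below can print a useful error.
asset_url="$(curl -fsSL "$api" \
  | grep -oE "https://[^\"]+${target}[^\"]*\.tar\.gz" \
  | head -n1 || true)"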
if [[ -z "$asset_url" ]]; then
error "Could not find a ${target} .tar.gz asset in ${STALWART_REPO}@${STALWART_VERSION}."
error "Check the release assets or set STALWART_REPO/STALWART_VERSION."
exit 1
fi
tmp="$(mktemp -d)"; trap 'rm -rf "$tmp"' EXIT
info "Downloading $asset_url"
curl -fsSL "$asset_url" -o "$tmp/stalwart.tar.gz"
tar -xzf "$tmp/stalwart.tar.gz" -C "$tmp"
bin="$(find "$tmp" -type f \( -name stalwart -o -name stalwart-mail \) | head -n1)"
if [[ -z "$bin" ]]; then
error "No 'stalwart'/'stalwart-mail' binary found in the archive."
exit 1
fi
systemctl stop stalwart-mail 2>/dev/null || true
install -o stalwart -g stalwart -m 0755 "$bin" "$PREFIX/bin/stalwart"
ok "Installed $("$PREFIX/bin/stalwart" --version 2>/dev/null || echo 'stalwart binary')."
# ── Step 3: config + secrets EnvironmentFile ───────────────────────────────
info "Step 3: config.toml + secrets env..."
install -o stalwart -g stalwart -m 0640 "$SCRIPT_DIR/config.toml" "$PREFIX/etc/config.toml"
umask 077
cat > "$PREFIX/etc/stalwart.env" <<EOF
# Generated by install.sh from config.env — DO NOT commit.
STALWART_ADMIN_PASSWORD=${STALWART_ADMIN_PASSWORD}
STALWART_WEBHOOK_SECRET=${STALWART_WEBHOOK_SECRET}
EOF
chown root:stalwart "$PREFIX/etc/stalwart.env"
chmod 0640 "$PREFIX/etc/stalwart.env"
ok "Config + secrets installed."
# ── Step 4: self-signed bootstrap cert (only if none yet) ──────────────────
if [[ ! -s "$PREFIX/etc/tls/cert.pem" ]]; then
info "Step 4: generating self-signed bootstrap cert (cert-sync replaces it)..."
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
-keyout "$PREFIX/etc/tls/key.pem" \
-out "$PREFIX/etc/tls/cert.pem" \
-subj "/CN=mail.dezky.eu" >/dev/null 2>&1
chown stalwart:stalwart "$PREFIX/etc/tls/"*.pem
chmod 0644 "$PREFIX/etc/tls/cert.pem"; chmod 0640 "$PREFIX/etc/tls/key.pem"
ok "Bootstrap cert in place."
else
ok "Step 4: TLS cert already present — keeping it."
fi
# ── Step 5: cert-sync + systemd units ──────────────────────────────────────
info "Step 5: installing cert-sync + systemd units..."
install -o root -g root -m 0755 "$SCRIPT_DIR/cert-sync.sh" "$PREFIX/cert-sync.sh"
install -m 0644 "$SCRIPT_DIR/stalwart-mail.service" /etc/systemd/system/stalwart-mail.service
install -m 0644 "$SCRIPT_DIR/stalwart-cert-sync.service" /etc/systemd/system/stalwart-cert-sync.service
install -m 0644 "$SCRIPT_DIR/stalwart-cert-sync.timer" /etc/systemd/system/stalwart-cert-sync.timer
systemctl daemon-reload
systemctl enable --now stalwart-mail.service
systemctl enable --now stalwart-cert-sync.timer
ok "Services enabled."
# Try an immediate cert sync (no-op until cert-manager has issued the secret)
"$PREFIX/cert-sync.sh" || true
echo ""
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║ Stalwart installed & running ║"
echo "╚══════════════════════════════════════════════════════════════╝"
systemctl --no-pager --lines=0 status stalwart-mail || true
echo ""
warn "Follow-ups:"
warn " • PTR/rDNS for the server IP MUST be 'mail.dezky.eu' (Hetzner Robot)."
warn " • Publish DNS at simply.com: MX → mail.dezky.eu, SPF, DMARC; per-domain"
warn " DKIM records come from Stalwart's dnsZoneFile via platform-api."
warn " • platform-api (k3s) env: STALWART_API_URL=http://<node-ip>:8080"
warn " STALWART_ADMIN_USER=admin STALWART_ADMIN_PASSWORD=<same as here>"
warn " STALWART_WEBHOOK_SECRET=<same as here> STALWART_PROVISIONING_ENABLED=true"
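The asset lookup in Step 2 can be exercised offline by feeding it a canned API response. A sketch, assuming release assets are named with the target triple followed directly by `.tar.gz`; `pick_asset` and the sample URLs are hypothetical:

```shell
# Hypothetical helper: read a GitHub release JSON document on stdin and print
# the first download URL whose file name contains the given target triple.
pick_asset() {
  grep -oE "https://[^\"]+$1[^\"]*\.tar\.gz" | head -n 1
}

# Made-up sample response, trimmed to the relevant field.
sample='{"assets":[
  {"browser_download_url":"https://example.invalid/dl/app-aarch64-unknown-linux-gnu.tar.gz"},
  {"browser_download_url":"https://example.invalid/dl/app-x86_64-unknown-linux-gnu.tar.gz"}
]}'

printf '%s\n' "$sample" | pick_asset "x86_64-unknown-linux-gnu"
```

If a release ships several archives for the same target, the first match wins, so a stricter pattern (or a JSON parser such as `jq`) may be the safer choice.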
@@ -0,0 +1,10 @@
# Oneshot: sync the mail TLS cert from the cluster to Stalwart.
# Triggered by stalwart-cert-sync.timer.
[Unit]
Description=Sync mail.dezky.eu TLS cert from cluster to Stalwart
After=network-online.target k3s.service
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/opt/stalwart/cert-sync.sh
@@ -0,0 +1,12 @@
# Run cert-sync shortly after boot and every 12h thereafter. cert-manager
# renews well before expiry, so twice-daily comfortably picks up new certs.
[Unit]
Description=Periodic mail TLS cert sync for Stalwart
[Timer]
OnBootSec=3min
OnUnitActiveSec=12h
Persistent=true
[Install]
WantedBy=timers.target
@@ -0,0 +1,39 @@
# Dezky — Stalwart mail server (bare-metal host service).
#
# Secrets (admin password, webhook secret) come from the EnvironmentFile, which
# install.sh generates from config.env. The binary needs CAP_NET_BIND_SERVICE
# to bind the privileged mail ports (25/143/...) while running as a non-root user.
[Unit]
Description=Stalwart Mail Server (Dezky)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=stalwart
Group=stalwart
EnvironmentFile=/opt/stalwart/etc/stalwart.env
ExecStart=/opt/stalwart/bin/stalwart --config /opt/stalwart/etc/config.toml
# Stalwart reloads its TLS certs / config on SIGHUP — used by cert-sync.
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
# Bind privileged ports without full root
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# Hardening — Stalwart only needs to write under /opt/stalwart
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/opt/stalwart/data /opt/stalwart/logs /opt/stalwart/etc/tls
ProtectKernelTunables=true
ProtectControlGroups=true
RestrictSUIDSGID=true
[Install]
WantedBy=multi-user.target