Dezky production — host layer
OS baseline + firewall for the bare-metal Hetzner AX41 that runs the k3s node. This layer is everything that lives on the host (outside Kubernetes): hardening, the k3s-safe firewall, k3s registration, Stalwart mail, and Restic backups.
The cluster itself is managed by Fleet/Rancher once k3s is up; this host layer is the part Fleet can't do, so it runs over SSH from reviewed scripts.
Files
| File | Purpose |
|---|---|
| `config.env.example` | Template for host-specific values |
| `config.env` | Real values, gitignored. Source of truth lives only on your machine/box |
| `bootstrap.sh` | One-shot OS hardening: user, SSH, sysctl, swap, fail2ban, auto-updates, firewall |
| `firewall/firewall.sh` | Renders + applies the k3s-safe nftables ruleset (idempotent) |
| `firewall/dezky-firewall.service` | systemd unit; reapplies our table on boot, never flushes globally |
| `k3s/register.sh` | Registers the node into Rancher (Custom k3s cluster); secrets from `config.env` |
| `stalwart/install.sh` | Installs Stalwart as a hardened host service (binary, units, secrets, bootstrap cert) |
| `stalwart/config.json` | Minimal v0.16 boot config; describes only the datastore (RocksDB at `/opt/stalwart/data`) |
| `stalwart/config.toml` | Pre-v0.16 production config (mail ports on host, JMAP on internal 8080); kept as a reference for the DB-side settings, not loaded by v0.16 |
| `stalwart/stalwart-mail.service` | systemd unit; non-root + `CAP_NET_BIND_SERVICE` for low ports |
| `stalwart/cert-sync.sh` + `*.service`/`*.timer` | Pulls the cert-manager mail cert into Stalwart, reloads on change |
| `restic/install.sh` | Sets up Restic, the backup SSH key/config, env, and the nightly timer |
| `restic/backup.sh` | Backup → primary Storage Box, retention, then copy → Helsinki DR |
| `restic/restore.sh` | List/restore snapshots (run drills!) |
| `restic/dezky-backup.service` + `.timer` | Nightly 03:20 UTC backup |
The firewall model (read this)
k3s, kube-proxy and flannel manage their own nftables tables (ip/ip6:
filter, nat, mangle). The classic mistake is running ufw/firewalld or
nft flush ruleset, which wipes or fights those rules and breaks pod networking.
So instead:
- We own a single dedicated table, `inet dezky_fw`, with only an INPUT chain (default `drop`). Separate tables coexist; a packet is dropped if any base chain drops it, so our default-drop INPUT gates host-bound traffic while k3s keeps owning FORWARD/NAT untouched.
- We explicitly accept the pod (`10.42.0.0/16`) and service (`10.43.0.0/16`) CIDRs and the CNI interfaces (`cni0`, `flannel.1`) so cluster↔host traffic (API server, kubelet, CoreDNS) is never dropped.
- We never `flush ruleset`. The systemd unit's `ExecStop` removes only our table.
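As a rough illustration of the model above, the sketch below renders a ruleset of this shape. It is not the contents of `firewall/firewall.sh`: the function name `render_ruleset`, the comma-separated allowlist argument, the ICMP rule and the rule order are all assumptions. The table name, CIDRs, interfaces and ports come from this README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real firewall/firewall.sh.
# Renders one dedicated inet table with a default-drop INPUT chain.
# Assumption: the management allowlist arrives as a comma-separated IPv4 list.
render_ruleset() {
  local mgmt_v4="$1"
  cat <<EOF
# Create-then-delete makes re-application idempotent and touches only our table.
table inet dezky_fw
delete table inet dezky_fw

table inet dezky_fw {
  chain input {
    type filter hook input priority filter; policy drop;

    ct state established,related accept
    iifname "lo" accept
    # Assumption: ICMP/ICMPv6 allowed (IPv6 neighbor discovery needs it).
    meta l4proto { icmp, ipv6-icmp } accept

    # Cluster and host traffic: CNI interfaces plus pod and service CIDRs
    iifname { "cni0", "flannel.1" } accept
    ip saddr { 10.42.0.0/16, 10.43.0.0/16 } accept

    # Public surfaces: web + ACME, then mail
    tcp dport { 80, 443 } accept
    tcp dport { 25, 465, 587, 143, 993, 4190 } accept

    # Management surfaces: SSH and the k3s API, allowlisted sources only
    ip saddr { ${mgmt_v4} } tcp dport { 22, 6443 } accept
  }
}
EOF
}
```

The rendered text could be checked with `nft -c -f <file>` before loading it with `nft -f <file>`, and removed again with `nft delete table inet dezky_fw`, none of which touches the tables k3s manages.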
Access policy
| Surface | Ports | Who |
|---|---|---|
| Web + ACME | 80, 443 | World (customers) |
| Mail | 25, 465, 587, 143, 993, 4190 | World |
| SSH | 22 | MGMT_ALLOW_V4/V6 only |
| k3s API | 6443 | MGMT_ALLOW_V4/V6 only |
Current management allowlist: home 46.32.144.38, office 46.32.144.45.
The Rancher plane (91.99.122.153) needs no inbound rule — the cluster
agent dials out to Rancher over 443, so replies ride the established/related
fast-path.
Apply order
Prereqs: AX41 provisioned with Debian 12 (bookworm), reachable as root. `config.env` filled in, in particular `ADMIN_SSH_PUBKEY` and `SERVER_PUBLIC_IPV4` (still TODO until the box exists).
# From your laptop:
scp -r infrastructure/production/host root@<server-ip>:/opt/dezky-host
# On the server:
ssh root@<server-ip>
cd /opt/dezky-host
# config.env is gitignored, so copy it up separately or recreate it here:
# cp config.env.example config.env && nano config.env
./bootstrap.sh
bootstrap.sh creates your admin user and installs your key before it
disables root/password SSH, so the order is lockout-safe. It's idempotent —
re-run anytime.
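The lockout-safe ordering and the idempotency can be pictured with a small sketch. This is not the real `bootstrap.sh`; the helper names `install_pubkey` and `safe_to_disable_root_ssh` are hypothetical, and the sketch only shows the general pattern of "append the key if absent, then refuse to harden sshd unless it is present".

```bash
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real bootstrap.sh.

# Installs a public key only if it is not already present, so re-runs are no-ops.
install_pubkey() {
  local auth_file="$1" pubkey="$2"
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"
  # -x: whole-line match, -F: fixed string (keys contain regex metacharacters)
  grep -qxF -- "$pubkey" "$auth_file" || printf '%s\n' "$pubkey" >> "$auth_file"
}

# Guard for the ordering: succeeds only once the key is actually in place.
safe_to_disable_root_ssh() {
  local auth_file="$1" pubkey="$2"
  [ -s "$auth_file" ] && grep -qxF -- "$pubkey" "$auth_file"
}
```

A hardening step would then be gated on the guard, for example `safe_to_disable_root_ssh "$file" "$ADMIN_SSH_PUBKEY" || exit 1` before any sshd change.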
To touch only the firewall later:
sudo ./firewall/firewall.sh --dry-run # preview the ruleset
sudo ./firewall/firewall.sh # render, validate, apply, install unit
Then register into Rancher
Once the host is hardened, register the node as a Custom k3s cluster
(create the cluster in Rancher first, choosing the K3s distribution, then
paste its token/checksum into config.env):
sudo ./k3s/register.sh # downloads agent installer, joins cluster
journalctl -u rancher-system-agent -f # follow provisioning
Rancher is currently reached by IP, so the installer is fetched with
--insecure; the agent's ongoing link is still verified via --ca-checksum.
Give Rancher a real hostname + cert later to drop the insecure fetch.
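Because the installer fetch is unverified, it can be worth confirming by hand that a CA certificate you downloaded matches the checksum Rancher shows for the cluster. The sketch below assumes the checksum is the SHA-256 of the CA PEM file; the helper names and the file path in the usage line are hypothetical, and how `register.sh` itself handles this is not shown in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: NOT part of k3s/register.sh.
# Assumption: the --ca-checksum value is the SHA-256 hex digest of the CA PEM file.

# Prints the SHA-256 hex digest of a file.
ca_checksum() {
  sha256sum "$1" | awk '{print $1}'
}

# Succeeds only if the file's digest equals the expected value.
verify_ca_checksum() {
  local pem="$1" expected="$2"
  [ "$(ca_checksum "$pem")" = "$expected" ]
}
```

Usage would look like `verify_ca_checksum /tmp/rancher-ca.pem "$EXPECTED" || echo "CA mismatch, do not register"`, with the expected value copied from the Rancher UI over a channel you trust.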
Then install Stalwart (mail)
sudo ./stalwart/install.sh # binary + systemd + bootstrap cert
systemctl status stalwart-mail
Requires STALWART_ADMIN_PASSWORD + STALWART_WEBHOOK_SECRET in config.env
(openssl rand -hex 24 / -hex 32). See the mail topology below.
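One way to generate the two secrets without accidentally rotating them on a re-run is sketched below. It assumes `config.env` is a plain file of `KEY=value` lines; the helper names `random_hex` and `ensure_secret` are hypothetical and not part of `install.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes config.env holds plain KEY=value lines.

# Prints N random bytes as lowercase hex (2*N characters).
random_hex() {
  local nbytes="$1"
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$nbytes"
  else
    # Fallback if openssl is absent: read raw bytes and hex-encode them.
    head -c "$nbytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Appends KEY=<random hex> unless KEY already has a non-empty value.
ensure_secret() {
  local file="$1" key="$2" nbytes="$3"
  touch "$file"
  chmod 600 "$file"
  if grep -q "^${key}=." "$file"; then
    return 0   # already set: never rotate silently
  fi
  sed -i "/^${key}=\$/d" "$file"   # drop an empty placeholder line, if any
  printf '%s=%s\n' "$key" "$(random_hex "$nbytes")" >> "$file"
}
```

For example, `ensure_secret config.env STALWART_ADMIN_PASSWORD 24` followed by `ensure_secret config.env STALWART_WEBHOOK_SECRET 32` mirrors the `-hex 24` / `-hex 32` sizes above.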
Mail (Stalwart) topology
Stalwart runs on the host, not in k3s — mail must keep flowing regardless of cluster state, and SMTP/IMAP want the real public IP for reputation. The single public IP forces a deliberate split with Traefik:
| Concern | Owner | Detail |
|---|---|---|
| Mail protocol ports (25/465/587/143/993/4190) | Stalwart (host) | Bound on the public IP; opened to the world by the firewall |
| Web/JMAP for `mail.dezky.eu:443` | Traefik (k3s) | Terminates TLS, reverse-proxies to Stalwart's internal :8080 |
| ACME / TLS issuance | cert-manager (k3s) | Issues mail.dezky.eu via HTTP-01; Stalwart runs no ACME (80/443 are Traefik's) |
| Cert delivery to mail ports | `cert-sync.sh` (host) | Reads the cluster TLS secret via local kubeconfig, reloads Stalwart on change |
| Storage | RocksDB on host disk | Intentionally independent of the in-cluster Postgres |
| Domain/DKIM provisioning | platform-api (k3s) | JMAP management API at http://<node>:8080/jmap, Basic auth |
| Audit webhook | Stalwart → platform-api | POSTs to https://api.dezky.eu/ingest/..., HMAC-signed |
platform-api Fleet env (must match the host's config.env):
STALWART_API_URL=http://<node-internal-ip>:8080
STALWART_ADMIN_USER=admin
STALWART_ADMIN_PASSWORD=<same as host STALWART_ADMIN_PASSWORD>
STALWART_WEBHOOK_SECRET=<same as host STALWART_WEBHOOK_SECRET>
STALWART_PROVISIONING_ENABLED=true
The firewall already lets the k3s pod CIDR reach host :8080 while blocking the
world, so no extra rule is needed.
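Since the two secrets have to be identical on both sides, a quick comparison can catch drift before provisioning fails. The sketch assumes both sets of values are available as plain `KEY=value` files (for the Fleet side that means an exported copy, which is an assumption); `env_value` and `secrets_match` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical check; assumes both inputs are plain KEY=value files.

# Prints the value of KEY from an env file (last occurrence wins; no quote handling).
env_value() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Succeeds only if both Stalwart secrets are identical in the two files.
secrets_match() {
  local host_env="$1" fleet_env="$2" key rc=0
  for key in STALWART_ADMIN_PASSWORD STALWART_WEBHOOK_SECRET; do
    if [ "$(env_value "$host_env" "$key")" != "$(env_value "$fleet_env" "$key")" ]; then
      echo "mismatch: $key" >&2   # report the key name only, never the values
      rc=1
    fi
  done
  return "$rc"
}
```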
Forward dependency: `cert-sync.sh` needs the fleet layer to create the `mail/mail-tls` cert secret. Until then Stalwart serves the self-signed bootstrap cert that `install.sh` generated; the timer swaps in the real cert automatically once it exists.
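The "reload only on change" behaviour can be sketched as a compare-then-swap step. This is not the real `stalwart/cert-sync.sh`; the function name `sync_if_changed` and every path in the usage comment are hypothetical, and reading `mail/mail-tls` as namespace `mail`, secret `mail-tls` is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real stalwart/cert-sync.sh.

# Replaces dst with src only when their contents differ.
# Returns 0 if dst was updated (caller should reload), 1 if nothing changed.
sync_if_changed() {
  local src="$1" dst="$2" tmp
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    return 1
  fi
  # Write next to dst, then rename, so readers never see a partial file.
  tmp="$(mktemp "${dst}.XXXXXX")"
  cat "$src" > "$tmp"
  chmod 640 "$tmp"
  mv -f "$tmp" "$dst"
}

# Possible usage (assumed namespace, secret name and paths):
#   kubectl -n mail get secret mail-tls -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/tls.crt
#   sync_if_changed /tmp/tls.crt /path/to/stalwart/tls.crt && echo "changed: reload Stalwart"
```

Returning a distinct status for "unchanged" keeps the timer cheap: most runs exit without touching Stalwart at all.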
Finally, backups
sudo ./restic/install.sh # restic + key + nightly timer
# upload the printed public key to BOTH Storage Boxes (port 23), then:
sudo ./restic/install.sh # re-run to init the repos
sudo /opt/dezky-backup/backup.sh # first backup (or wait for 03:20 UTC)
Needs RESTIC_PASSWORD + BACKUP_PRIMARY_REPO (+ BACKUP_DR_REPO) in
config.env. See backups below.
Backups (Restic)
Nightly at 03:20 UTC: back up to the primary Storage Box, apply
retention, restic check, then a dedup-aware copy to the Helsinki DR box.
| What | Why |
|---|---|
| `/opt/stalwart/data` + `/etc` | Mail store (RocksDB) + config: the crown jewels |
| `/var/lib/rancher/k3s/server/db/snapshots` | k3s etcd snapshots (cluster state) |
| `/var/lib/rancher/k3s/storage` | local-path PVCs, incl. where fleet pg_dump/mongodump CronJobs land |
- Retention: 7 daily · 4 weekly · 6 monthly (tunable via `BACKUP_RETENTION`).
- Storage Box quirk: SSH/SFTP on port 23, key auth. A single ssh-config wildcard covers both boxes, so one key + `restic copy` mirrors primary → DR.
- Encryption: repos are Restic-encrypted with `RESTIC_PASSWORD`. Store it offline; losing it makes every backup unrecoverable.
- Alerting: set `BACKUP_HEALTHCHECK_URL` (e.g. healthchecks.io) for a dead-man's switch, so you get paged when a nightly run is missed, not when you need to restore.
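To make the retention policy concrete, the sketch below turns a retention spec into `restic forget` flags. The `daily:weekly:monthly` format for `BACKUP_RETENTION` is an assumption (the real format used by `restic/backup.sh` is not shown here), and `retention_args` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real restic/backup.sh.
# Assumption: the retention spec looks like "daily:weekly:monthly", e.g. "7:4:6".

# Prints the restic forget flags for a retention spec (default 7:4:6).
retention_args() {
  local spec="${1:-7:4:6}" daily rest weekly monthly
  daily="${spec%%:*}"     # text before the first colon
  rest="${spec#*:}"       # text after the first colon
  weekly="${rest%%:*}"
  monthly="${rest#*:}"
  printf -- '--keep-daily %s --keep-weekly %s --keep-monthly %s\n' \
    "$daily" "$weekly" "$monthly"
}
```

With the default, this expands to `restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune`, which matches the policy listed above.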
Database consistency: live DB files in PVCs are crash-consistent at best. The reliable path is logical dumps: the fleet layer adds `pg_dump`/`mongodump` CronJobs that write into a backup PVC under `/var/lib/rancher/k3s/storage`, which Restic then captures. Restore those dumps, not the raw data dirs.
Run restore drills. A backup you've never restored isn't a backup:
sudo /opt/dezky-backup/restore.sh snapshots
sudo /opt/dezky-backup/restore.sh restore latest /tmp/restore-test
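A drill is more convincing when something checks the result. The sketch below is a hypothetical helper (not part of `restore.sh`) that fails if an expected path is missing or contains no files after a restore. It assumes the restored tree keeps the original absolute paths under the target directory, which is how `restic restore --target` lays files out.

```bash
#!/usr/bin/env bash
# Hypothetical drill check: NOT part of restic/restore.sh.

# verify_restore ROOT REL_PATH...
# Fails if any REL_PATH under ROOT is missing or holds no regular files.
verify_restore() {
  local root="$1"; shift
  local rel rc=0
  for rel in "$@"; do
    # -print -quit stops at the first file found, so large trees stay cheap.
    if [ -z "$(find "$root/$rel" -type f -print -quit 2>/dev/null)" ]; then
      echo "missing or empty: $rel" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

After the restore command above, `verify_restore /tmp/restore-test opt/stalwart/data etc` would be one way to confirm that the mail store and config actually came back.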
⚠️ Lockout safety
- Always open a second SSH session and confirm access before closing the one you ran bootstrap in.
- Management is pinned to home + office IPs. Residential IPs can change — if yours does, you'll be locked out of SSH/6443 (public services stay up).
- Break-glass: Hetzner's KVM/LARA console (Robot panel) is out-of-band and bypasses the firewall entirely. From there you can edit `/etc/nftables.d/dezky-fw.nft` or update `config.env` + re-run `firewall.sh`.
- If your IP changes often, widen `MGMT_ALLOW_V4` to a small prefix, or we add a WireGuard bastion later.
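Before re-applying the firewall, it can help to confirm that the address you are connecting from is still on the allowlist. The helper below is a hypothetical sketch that assumes `MGMT_ALLOW_V4` is a comma- or space-separated list of plain addresses; it does exact matching only and does not evaluate CIDR prefixes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; assumes a comma/space-separated list of plain IPv4 addresses.
# Limitation: exact-match only; CIDR prefixes in the list are NOT evaluated.

# ip_in_allowlist IP LIST -> succeeds if IP appears verbatim in LIST.
ip_in_allowlist() {
  local ip="$1" list="$2" entry
  for entry in $(printf '%s' "$list" | tr ',' ' '); do
    [ "$entry" = "$ip" ] && return 0
  done
  return 1
}
```

For an SSH session, the first field of `$SSH_CLIENT` is the client address the server sees, so `ip_in_allowlist "${SSH_CLIENT%% *}" "$MGMT_ALLOW_V4" || echo "you would lock yourself out"` is one possible guard.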
Verifying after apply
sudo nft list table inet dezky_fw # our rules
sudo nft list ruleset | grep -c KUBE # k3s rules still present (>0 once k3s runs)
sudo systemctl status dezky-firewall # enabled + active (exited)
sudo fail2ban-client status sshd # jail active
# From a NON-allowlisted network, `ssh` should hang/timeout; 443 should work.
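The external check in the last comment can be scripted with a small TCP probe. `probe_port` is a hypothetical helper; it relies on bash's `/dev/tcp` pseudo-device and coreutils `timeout`, and it reports "closed" both for refused connections and for silently dropped ones (timeouts), which is what a default-drop firewall produces.

```bash
#!/usr/bin/env bash
# Hypothetical probe; needs bash (for /dev/tcp) and coreutils timeout.

# probe_port HOST PORT [SECONDS] -> prints "open" or "closed".
# "closed" covers both a refused connection and a timeout (dropped packets).
probe_port() {
  local host="$1" port="$2" secs="${3:-3}"
  # In the inner shell, $0 is the host and $1 is the port.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}
```

From a non-allowlisted network, `probe_port <server-ip> 22` and `probe_port <server-ip> 6443` should report closed, while `probe_port <server-ip> 443` should report open once the ingress is serving.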
Host layer status
Complete: hardening ✅ · firewall ✅ · k3s registration ✅ · Stalwart ✅ · backups ✅.
Next is the Fleet/GitOps layer (infrastructure/production/fleet/):
cert-manager + ClusterIssuer, ingress, the data tier (Postgres/Mongo/Redis),
Authentik, OCIS + Collabora, and portal + platform-api — plus the
mail/mail-tls cert and the DB-dump CronJobs this layer's cert-sync and
backups depend on.
Stalwart v0.16 — config model change (IMPORTANT)
v0.16 removed TOML configuration. The host service now boots from
stalwart/config.json — a tiny file describing ONLY the datastore (RocksDB at
/opt/stalwart/data). Every other setting (listeners, authentication, TLS,
domains, DKIM, spam, webhooks) is stored in the DB and managed via the web admin
UI, stalwart-cli, or platform-api over JMAP. stalwart/config.toml is kept as
a reference for the settings to recreate in the DB; it is NOT loaded by v0.16.
Status (node1): Stalwart 0.16.8 installed + running with default listeners
(25/465/587/143/993/4190 + management on :8080). Still to configure (DB-side):
- Fallback admin password (so platform-api can authenticate) + the audit webhook.
- TLS for `mail.dezky.eu`: Stalwart's own ACME, or rework `cert-sync.sh` to feed the cert-manager cert into the v0.16 DB cert model.
- Domains / DKIM: provisioned by platform-api over JMAP.
Then publish DNS (MX, SPF, DKIM, DMARC) and set the PTR/rDNS → mail.dezky.eu.
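Once the records are published, the PTR is the one most easily left stale, so a small comparison helper can be useful. The functions below are hypothetical; they only normalise and compare names, and the `dig` lookup in the usage line (plus the use of `SERVER_PUBLIC_IPV4` as the address to check) is an assumption about how you would feed them.

```bash
#!/usr/bin/env bash
# Hypothetical DNS sanity helpers.

# Lowercases a hostname and strips one trailing dot,
# so "Mail.Dezky.eu." and "mail.dezky.eu" compare equal.
normalize_fqdn() {
  printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/\.$//'
}

# ptr_matches EXPECTED ACTUAL -> succeeds if both names are the same host.
ptr_matches() {
  local expected="$1" actual="$2"
  [ "$(normalize_fqdn "$expected")" = "$(normalize_fqdn "$actual")" ]
}
```

A possible check is `ptr_matches mail.dezky.eu "$(dig +short -x "$SERVER_PUBLIC_IPV4")" || echo "PTR not set yet"`; the MX, SPF, DKIM and DMARC records can be inspected the same way with `dig +short MX` and `dig +short TXT` against the mail domain.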