From c6b6f8faeca759c2ecf7fc0018089295600a7440 Mon Sep 17 00:00:00 2001
From: Ronni Baslund
Date: Thu, 11 Jun 2026 11:23:23 +0200
Subject: [PATCH] docs(runbook): HetrixTools monitoring (uptime monitors + node1 agent)

---
 infrastructure/production/RUNBOOK.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/infrastructure/production/RUNBOOK.md b/infrastructure/production/RUNBOOK.md
index 343ba79..094d542 100644
--- a/infrastructure/production/RUNBOOK.md
+++ b/infrastructure/production/RUNBOOK.md
@@ -38,6 +38,14 @@ bottom to rebuild it. Per-layer detail lives in `host/README.md`,
 - **Registry hygiene** — Gitea package cleanup rule (user-level, Container
   type): keep newest 5 versions per image + `latest`, remove older than 7 days.
   Applied by Gitea's daily cleanup cron.
+- **Monitoring** — HetrixTools (Ronni's account): 11 uptime monitors via API
+  (HTTPS on the five apps + Gitea w/ SSL verify, ping, IMAPS/SMTPS TCP, SMTP
+  protocol on :25; 1-min checks from ams/fra/lon, alert after 2 fails) + the
+  Linux server agent on node1 (root mode, per-minute cron in
+  /etc/hetrixtools/; watches stalwart/k3s/dockerd processes, mdadm RAID,
+  NVMe SMART via smartmontools). Re-create monitors via their v2 API
+  (uptime/add, Type 9 = server agent — hidden in the new UI); agent install:
+  hetrixtools_install.sh 1 "stalwart,k3s,dockerd" 1 1.
 
 ## Deploy flow (day-to-day)