CI builds the Nuxt images with no env, so nuxt.config bakes empty OIDC
client creds and .local Authentik URLs into runtimeConfig — sign-in
dead-ended on the app's own /auth/login. Nitro env overrides only apply
when the var name matches the runtimeConfig path
(oidc.providers.oidc.* -> NUXT_OIDC_PROVIDERS_OIDC_*), so production
secrets need that second set of names; the plain NUXT_OIDC_* ones only
work in dev. Also pin NUXT_OIDC_TOKEN_KEY/AUTH_SESSION_SECRET so sessions
survive pod restarts. Live secrets patched on the cluster accordingly.
Bring the runbook up to the 2026-06-10 state: app tier + CI/CD in current
state, a Deploy flow section (push to main = release, rollback, break-glass,
required Gitea secrets), reproduce steps 8-9 (app tier secrets+apply, CI
runner + ci-deployer with the runner gotchas), per-router ACME-safe redirect
instead of the old global one, platform-api key read-back for Bitwarden, and
a pruned TODO list.
gitea/runner can only bind-mount a UNIX-socket docker host into job
containers — the old tcp://localhost:2376 + TLS daemon address cannot be
mounted, so build jobs still had no docker API. Share dind's
/var/run/docker.sock with the runner via a /var/run emptyDir and drop the
DOCKER_HOST/TLS env; the runner auto-finds the socket and the bind path
resolves inside dind where the socket lives.
gitea/runner 1.x no longer auto-mounts the docker daemon into job
containers (act_runner 0.2.x did), so 'docker build' in the build jobs
failed with 'cannot connect to /var/run/docker.sock'. container.docker_host
"" restores find-and-mount.
Gitea 1.26 never marked finished jobs complete with the deprecated
act_runner 0.2.11: the runner ran the job, logged 'Job succeeded' and freed
its slot, but Gitea kept the job 'Running' forever, so dependent jobs
(build -> deploy) were never dispatched. gitea/runner is the successor
project; config, env vars and the .runner registration file are unchanged.
Push to main = release: after build, a deploy job pins each app image to the
commit SHA (kustomize edit set image), kubectl-applies fleet/apps and waits
for the rollouts. The runner already runs in-cluster, so it reaches the API
server on the in-cluster service IP with a kubeconfig for the new ci-deployer
ServiceAccount (namespace-scoped admin, KUBECONFIG_B64 repo secret).
The drafted Flux sync/image-automation layer is removed — a GitOps controller
plus bot tag-bump commits is more machinery than a single-node cluster needs.
Sortable image tags and $imagepolicy markers go with it.
Also: per-router ACME-safe HTTP->HTTPS redirects for the app ingresses,
platform-api prod config completed (Authentik JWT/JWKS + admin API, Stalwart
via the cni0 gateway IP, OCIS/cold-storage placeholders until those tiers
exist) and the secrets template/README updated to match.
- Dockerfile for the operator app (same pattern as portal/booking).
- Env-driven auth/app base URLs in nuxt.config so one build serves
dev (.local) and production (.eu).
- Deployment + Service + Ingress on operator.dezky.eu.
- Add operator to the typecheck matrix.
- Pin the helm-controller chart version (unset = silent latest upgrades) and
move the image tag under global.image per the 2026.5 chart layout.
- Authentik 2026.5 enforces a per-provider grant_types allowlist; empty list
rejected every authorize request. Allow authorization_code + refresh_token
for portal and operator providers.
- Fix the portal redirect URI to the nuxt-oidc-auth callback path.
- Serve the auth ingress on :80 with a per-router HTTPS redirect so the
cert-manager HTTP-01 solver keeps working.
Add an act_runner config.yaml (ConfigMap, CONFIG_FILE env): capacity 4 so the
typecheck matrix + image builds run in parallel instead of one-at-a-time, and
cache.enabled: false (we removed the setup-node cache; the cache server isn't
reachable from the DinD job containers anyway).
Self-registering act_runner on node1 with a privileged docker:dind sidecar so
workflow jobs can build + push app images (k3s has containerd only, no Docker
daemon). Labels ubuntu-latest + docker; state persisted on a Longhorn PVC. The
registration token is applied out-of-band as the gitea-runner-token Secret
(not in git). Verified: runner declared successfully, dind API up.
pg_dumpall (all Postgres DBs + roles) and mongodump (all Mongo DBs) write
gzipped dumps to the hostPath /opt/dezky-backup/dumps at 02:50/02:52 UTC, which
the host Restic job (03:20) ships to the Storage Box. Each keeps the last 7
local dumps; Restic holds the real off-box retention.
- pods run as root (hostPath dir is root-owned, as is the host Restic reader)
- mongo job uses bash (mongo:7 /bin/sh is dash → no pipefail)
- creds from postgres-secret / mongo-secret via secretKeyRef
Verified: both jobs Complete, dumps present on the host
(postgres-all ~2.2MB w/ Authentik data, mongo archive).
Three fixes found bringing up backups on node1:
- restic.env wrote BACKUP_PATHS/RETENTION unquoted → sourcing ran a path as a
command ("Is a directory"); now quoted.
- ssh config was written to $BACKUP_HOME/.ssh/config, but restic runs as root
and its ssh resolves ~ from the passwd db (not $HOME), so it reads
/root/.ssh/config — write the Storage Box block there. Also
StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null (safe: restic encrypts
before upload; fixes flaky Storage Box host-key verification).
- Storage Box SFTP lands in /home, so the repo path needs the /home prefix
(absolute /dezky hit the root-owned chroot parent → SSH_FX_FAILURE).
Verified: repo initialized, nightly snapshot of mail store + Stalwart config +
etcd snapshots + dumps dir, `restic check` clean, retention applied.
v0.16 dropped TOML config. The host service now boots from a tiny config.json
that describes only the datastore (RocksDB); all other settings live in the DB
(web UI / stalwart-cli / platform-api JMAP).
- add stalwart/config.json (RocksDb datastore at /opt/stalwart/data)
- install.sh: install config.json instead of config.toml
- stalwart-mail.service: --config points at config.json
- README: document the v0.16 model + remaining DB-side config + DNS/PTR
Verified: Stalwart 0.16.8 runs on node1 with default mail listeners + the :8080
management server. config.toml retained as a reference for the DB settings.
- install.sh: default repo stalwartlabs/mail-server -> stalwartlabs/stalwart
(renamed), and select the exact /stalwart-<target>.tar.gz asset excluding the
foundationdb build (head -n1 could grab the wrong one).
- config.toml: $env{...} -> %{env:...}% (correct Stalwart macro syntax).
KNOWN ISSUE: Stalwart v0.16 removed TOML config (single config.json datastore +
everything else in the DB via CLI/UI), so this config.toml does not load on
0.16.8 ("Failed to parse data store settings"). Needs either a pinned pre-0.16
version or a migration to the v0.16 config model. Binary is installed; the
service is stopped pending that decision.
Brand CSS only reaches the flow shadow DOM via CSS vars (colors), not the
logo/favicon (deeper shadow root) or the "Powered by authentik" footer (light
DOM). So, dev-style: serve real dezky assets + sed the bundle.
- web-assets/: dezky-logo.svg, dezky-favicon.svg, dezky-bg.svg (carbon).
- server-rebrand.py: patches the authentik-server Deployment with an
initContainer that copies /web/dist to an emptyDir, drops the svgs into
assets/icons, and seds "Powered by authentik" -> "Powered by Dezky".
- brand.yaml: branding_logo / branding_favicon / branding_default_flow_background
point at the served svgs; auth-flow title "Welcome to Dezky"; signal-green CSS.
Verified live: login now matches dev (logo, title, carbon bg, green button,
favicon, Powered by Dezky). Durability caveat documented (reverts on helm
upgrade).
branding_logo / branding_default_flow_background are file-path fields (reject
data URIs), so the dezky logo + carbon background are injected via the brand's
custom CSS (data URIs allowed there): logo replaces the authentik wordmark,
background overrides the forest. Auth-flow title -> "Welcome to Dezky".
Signal-green primary button retained.
Mirror the dev Authentik config in prod via blueprints, applied & successful on
node1:
- brand.yaml: dezky branding on the default brand (title + signal-green custom
CSS) — login page now in dezky colors.
- portal-application.yaml / operator-application.yaml: dezky-portal &
dezky-operator OIDC apps/providers (prod redirect URLs) + the
dezky-platform-admins group & operator access policy.
Two 2026.5 gotchas handled + documented in README:
- invalidation_flow is now REQUIRED on OAuth2 providers (added via !Find).
- ConfigMap mounts are symlinks (discovery can't read them) → worker uses an
initContainer that copies them to an emptyDir as real files. (chart
worker.volumes didn't apply on this version; patch reverts on helm upgrade —
noted as a durability TODO.)
Client secrets (PORTAL/OPERATOR_OIDC_CLIENT_SECRET) live in authentik-secret;
the apps must reuse them.
Adds the production cluster foundation (authored + applied live on node1):
- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod
ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass,
Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init),
MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.
Real secrets are generated on-box and kept in Bitwarden — never in git.
Host provisioning for the single-server production target: SSH + firewall
hardening (nftables allowlist), k3s node registration, bare-metal Stalwart
install with systemd units and TLS cert-sync from the cluster secret, and
Restic encrypted backup/restore (primary + DR) with timer units. Host-specific
secrets live in config.env (gitignored); config.env.example is the template.
Also gitignores MemPalace per-project files.