dezky

Author	SHA1	Message	Date
Ronni Baslund	b155e34fe6	fix(infra): runtime OIDC overrides for prod portal/operator login CI builds the Nuxt images with no env, so nuxt.config bakes empty OIDC client creds and .local Authentik URLs into runtimeConfig — sign-in dead-ended on the app's own /auth/login. Nitro env overrides only apply when the var name matches the runtimeConfig path (oidc.providers.oidc.* -> NUXT_OIDC_PROVIDERS_OIDC_), so production secrets need that second set of names; the plain NUXT_OIDC_ ones only work in dev. Also pin NUXT_OIDC_TOKEN_KEY/AUTH_SESSION_SECRET so sessions survive pod restarts. Live secrets patched on the cluster accordingly.	2026-06-10 13:24:29 +02:00
Ronni Baslund	3b9b06a99b	docs(runbook): app tier + push-to-deploy CI/CD flow Bring the runbook up to the 2026-06-10 state: app tier + CI/CD in current state, a Deploy flow section (push to main = release, rollback, break-glass, required Gitea secrets), reproduce steps 8-9 (app tier secrets+apply, CI runner + ci-deployer with the runner gotchas), per-router ACME-safe redirect instead of the old global one, platform-api key read-back for Bitwarden, and a pruned TODO list.	2026-06-10 12:19:47 +02:00
Ronni Baslund	9a58e486e3	docs(fleet): note verified push-to-deploy pipeline	2026-06-10 09:20:18 +02:00
Ronni Baslund	323c46fba1	fix(ci): share dind's unix socket with the runner (jobs need a mountable docker host) gitea/runner can only bind-mount a UNIX-socket docker host into job containers — the old tcp://localhost:2376 + TLS daemon address cannot be mounted, so build jobs still had no docker API. Share dind's /var/run/docker.sock with the runner via a /var/run emptyDir and drop the DOCKER_HOST/TLS env; the runner auto-finds the socket and the bind path resolves inside dind where the socket lives.	2026-06-10 08:51:44 +02:00
Ronni Baslund	1114be6c93	fix(ci): expose the dind docker host to job containers gitea/runner 1.x no longer auto-mounts the docker daemon into job containers (act_runner 0.2.x did), so 'docker build' in the build jobs failed with 'cannot connect to /var/run/docker.sock'. container.docker_host "" restores find-and-mount.	2026-06-10 08:34:54 +02:00
Ronni Baslund	ec707643d6	fix(ci): act_runner 0.2.11 -> gitea/runner 1.0.8 Gitea 1.26 never marked finished jobs complete with the deprecated act_runner 0.2.11: the runner ran the job, logged 'Job succeeded' and freed its slot, but Gitea kept the job 'Running' forever, so dependent jobs (build -> deploy) were never dispatched. gitea/runner is the successor project; config, env vars and the .runner registration file are unchanged.	2026-06-10 08:02:40 +02:00
Ronni Baslund	c60937c5cb	feat(ci): deploy to k3s straight from the pipeline (drop Flux plan) Push to main = release: after build, a deploy job pins each app image to the commit SHA (kustomize edit set image), kubectl-applies fleet/apps and waits for the rollouts. The runner already runs in-cluster, so it reaches the API server on the in-cluster service IP with a kubeconfig for the new ci-deployer ServiceAccount (namespace-scoped admin, KUBECONFIG_B64 repo secret). The drafted Flux sync/image-automation layer is removed — a GitOps controller plus bot tag-bump commits is more machinery than a single-node cluster needs. Sortable image tags and $imagepolicy markers go with it. Also: per-router ACME-safe HTTP->HTTPS redirects for the app ingresses, platform-api prod config completed (Authentik JWT/JWKS + admin API, Stalwart via the cni0 gateway IP, OCIS/cold-storage placeholders until those tiers exist) and the secrets template/README updated to match.	2026-06-10 07:53:55 +02:00
Ronni Baslund	52e0f5e375	feat(operator): production build + k3s deployment - Dockerfile for the operator app (same pattern as portal/booking). - Env-driven auth/app base URLs in nuxt.config so one build serves dev (.local) and production (.eu). - Deployment + Service + Ingress on operator.dezky.eu. - Add operator to the typecheck matrix.	2026-06-10 07:53:55 +02:00
Ronni Baslund	d02eb5ec50	fix(authentik): pin chart 2026.5.2, grant_types allowlist, portal redirect URI - Pin the helm-controller chart version (unset = silent latest upgrades) and move the image tag under global.image per the 2026.5 chart layout. - Authentik 2026.5 enforces a per-provider grant_types allowlist; empty list rejected every authorize request. Allow authorization_code + refresh_token for portal and operator providers. - Fix the portal redirect URI to the nuxt-oidc-auth callback path. - Serve the auth ingress on :80 with a per-router HTTPS redirect so the cert-manager HTTP-01 solver keeps working.	2026-06-10 07:53:49 +02:00
Ronni Baslund	4c5fdde787	fix(infra): docker:24-dind + capacity 2 (fix moby cgroup-v2 teardown deadlock that hung 'Complete job')	2026-06-08 22:56:21 +02:00
Ronni Baslund	aef0f44915	chore(infra): act_runner capacity 4 + disable cache server Add an act_runner config.yaml (ConfigMap, CONFIG_FILE env): capacity 4 so the typecheck matrix + image builds run in parallel instead of one-at-a-time, and cache.enabled: false (we removed the setup-node cache; the cache server isn't reachable from the DinD job containers anyway).	2026-06-08 22:46:43 +02:00
Ronni Baslund	f331e3c1e6	feat(infra): in-cluster Gitea Actions runner (act_runner + dind) Self-registering act_runner on node1 with a privileged docker:dind sidecar so workflow jobs can build + push app images (k3s has containerd only, no Docker daemon). Labels ubuntu-latest + docker; state persisted on a Longhorn PVC. The registration token is applied out-of-band as the gitea-runner-token Secret (not in git). Verified: runner declared successfully, dind API up.	2026-06-08 22:13:38 +02:00
Ronni Baslund	a27c238c76	feat(infra): nightly DB-dump CronJobs feeding the Restic backup pg_dumpall (all Postgres DBs + roles) and mongodump (all Mongo DBs) write gzipped dumps to the hostPath /opt/dezky-backup/dumps at 02:50/02:52 UTC, which the host Restic job (03:20) ships to the Storage Box. Each keeps the last 7 local dumps; Restic holds the real off-box retention. - pods run as root (hostPath dir is root-owned, as is the host Restic reader) - mongo job uses bash (mongo:7 /bin/sh is dash → no pipefail) - creds from postgres-secret / mongo-secret via secretKeyRef Verified: both jobs Complete, dumps present on the host (postgres-all ~2.2MB w/ Authentik data, mongo archive).	2026-06-08 21:55:14 +02:00
Ronni Baslund	861212831d	fix(infra): restic→Storage Box backups working end-to-end Three fixes found bringing up backups on node1: - restic.env wrote BACKUP_PATHS/RETENTION unquoted → sourcing ran a path as a command ("Is a directory"); now quoted. - ssh config was written to $BACKUP_HOME/.ssh/config, but restic runs as root and its ssh resolves ~ from the passwd db (not $HOME), so it reads /root/.ssh/config — write the Storage Box block there. Also StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null (safe: restic encrypts before upload; fixes flaky Storage Box host-key verification). - Storage Box SFTP lands in /home, so the repo path needs the /home prefix (absolute /dezky hit the root-owned chroot parent → SSH_FX_FAILURE). Verified: repo initialized, nightly snapshot of mail store + Stalwart config + etcd snapshots + dumps dir, `restic check` clean, retention applied.	2026-06-08 21:46:49 +02:00
Ronni Baslund	9d075343c5	feat(infra): migrate Stalwart to the v0.16 config model (config.json) v0.16 dropped TOML config. The host service now boots from a tiny config.json that describes only the datastore (RocksDB); all other settings live in the DB (web UI / stalwart-cli / platform-api JMAP). - add stalwart/config.json (RocksDb datastore at /opt/stalwart/data) - install.sh: install config.json instead of config.toml - stalwart-mail.service: --config points at config.json - README: document the v0.16 model + remaining DB-side config + DNS/PTR Verified: Stalwart 0.16.8 runs on node1 with default mail listeners + the :8080 management server. config.toml retained as a reference for the DB settings.	2026-06-08 21:02:17 +02:00
Ronni Baslund	149eb0b020	fix(infra): Stalwart installer — repo rename + exact asset; flag 0.16 config break - install.sh: default repo stalwartlabs/mail-server -> stalwartlabs/stalwart (renamed), and select the exact /stalwart-<target>.tar.gz asset excluding the foundationdb build (head -n1 could grab the wrong one). - config.toml: $env{...} -> %{env:...}% (correct Stalwart macro syntax). KNOWN ISSUE: Stalwart v0.16 removed TOML config (single config.json datastore + everything else in the DB via CLI/UI), so this config.toml does not load on 0.16.8 ("Failed to parse data store settings"). Needs either a pinned pre-0.16 version or a migration to the v0.16 config model. Binary is installed; the service is stopped pending that decision.	2026-06-08 20:51:56 +02:00
Ronni Baslund	326b626fc6	feat(infra): full dezky rebrand of Authentik login (logo, favicon, bg, footer) Brand CSS only reaches the flow shadow DOM via CSS vars (colors), not the logo/favicon (deeper shadow root) or the "Powered by authentik" footer (light DOM). So, dev-style: serve real dezky assets + sed the bundle. - web-assets/: dezky-logo.svg, dezky-favicon.svg, dezky-bg.svg (carbon). - server-rebrand.py: patches the authentik-server Deployment with an initContainer that copies /web/dist to an emptyDir, drops the svgs into assets/icons, and seds "Powered by authentik" -> "Powered by Dezky". - brand.yaml: branding_logo / branding_favicon / branding_default_flow_background point at the served svgs; auth-flow title "Welcome to Dezky"; signal-green CSS. Verified live: login now matches dev (logo, title, carbon bg, green button, favicon, Powered by Dezky). Durability caveat documented (reverts on helm upgrade).	2026-06-08 20:36:01 +02:00
Ronni Baslund	99cd86cd3a	feat(infra): full dezky branding on Authentik (logo, carbon bg, flow title) branding_logo / branding_default_flow_background are file-path fields (reject data URIs), so the dezky logo + carbon background are injected via the brand's custom CSS (data URIs allowed there): logo replaces the authentik wordmark, background overrides the forest. Auth-flow title -> "Welcome to Dezky". Signal-green primary button retained.	2026-06-08 19:54:44 +02:00
Ronni Baslund	db1354a151	feat(infra): Authentik blueprints (portal+operator OIDC, dezky brand) Mirror the dev Authentik config in prod via blueprints, applied & successful on node1: - brand.yaml: dezky branding on the default brand (title + signal-green custom CSS) — login page now in dezky colors. - portal-application.yaml / operator-application.yaml: dezky-portal & dezky-operator OIDC apps/providers (prod redirect URLs) + the dezky-platform-admins group & operator access policy. Two 2026.5 gotchas handled + documented in README: - invalidation_flow is now REQUIRED on OAuth2 providers (added via !Find). - ConfigMap mounts are symlinks (discovery can't read them) → worker uses an initContainer that copies them to an emptyDir as real files. (chart worker.volumes didn't apply on this version; patch reverts on helm upgrade — noted as a durability TODO.) Client secrets (PORTAL/OPERATOR_OIDC_CLIENT_SECRET) live in authentik-secret; the apps must reuse them.	2026-06-08 19:46:48 +02:00
Ronni Baslund	406e2ca78b	feat(infra): deploy Authentik (auth.dezky.eu) + global HTTP→HTTPS redirect - Authentik on the in-cluster Postgres/Redis (mirrors the dev compose config: external DB/Redis, error-reporting off, update-check off, bootstrap admin), via the k3s Helm controller; Ingress + cert-manager letsencrypt-prod. Live at https://auth.dezky.eu (image 2026.5.2). Secrets generated on-box (Bitwarden). - Traefik HelmChartConfig: global :80 -> :443 (308) redirect via additionalArguments (to=:443, HTTP-01-safe). - RUNBOOK updated. Deferred (mirror remaining dev bits): OIDC app blueprints (portal/operator with prod URLs) + the cosmetic "Powered by Dezky" rebrand.	2026-06-08 19:00:07 +02:00
Ronni Baslund	153d7053ca	feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier Adds the production cluster foundation (authored + applied live on node1): - cert-manager via the k3s HelmChart controller + letsencrypt staging/prod ClusterIssuers (HTTP-01 / Traefik). - Longhorn config for single-node (values: replica=1, default StorageClass, Retain) + backup-to-Hetzner-Object-Storage credential template. - In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template. - bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq). - RUNBOOK.md: full reproducible node1 build order. Real secrets are generated on-box and kept in Bitwarden — never in git.	2026-06-08 18:39:31 +02:00
Ronni Baslund	35bc7b6c31	chore(infra): production manifests + CI for scheduling apps	2026-06-07 09:27:44 +02:00
Ronni Baslund	3831c85285	feat(infra): production host bootstrap and bare-metal Stalwart scaffolding Host provisioning for the single-server production target: SSH + firewall hardening (nftables allowlist), k3s node registration, bare-metal Stalwart install with systemd units and TLS cert-sync from the cluster secret, and Restic encrypted backup/restore (primary + DR) with timer units. Host-specific secrets live in config.env (gitignored); config.env.example is the template. Also gitignores MemPalace per-project files.	2026-06-07 00:19:48 +02:00

23 Commits
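
Note on b155e34fe6: the naming rule that commit relies on (a runtimeConfig path is only overridden by an env var whose name mirrors that path) can be sketched as below. This is a minimal illustration, not Nuxt's actual implementation. The helper name nuxt_env_name and the leaf keys clientId / clientSecret are assumptions made for the example; only the oidc.providers.oidc.* -> NUXT_OIDC_PROVIDERS_OIDC_ mapping comes from the commit message.

```python
import re


def nuxt_env_name(config_path: str, prefix: str = "NUXT_") -> str:
    """Map a dotted runtimeConfig path to the env var name that would override it.

    Hypothetical helper for illustration: each path segment is converted from
    camelCase to UPPER_SNAKE_CASE, and the segments are joined with "_".
    """
    segments = config_path.split(".")
    # Insert "_" at every lower-case/digit -> upper-case boundary, then upper-case.
    snake = [re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", seg).upper() for seg in segments]
    return prefix + "_".join(snake)


if __name__ == "__main__":
    # The leaf keys below are assumed for the example, not taken from the repo.
    for path in ("oidc.providers.oidc.clientId", "oidc.providers.oidc.clientSecret"):
        print(f"{path} -> {nuxt_env_name(path)}")
```

Under this rule a shorter name such as NUXT_OIDC_CLIENT_ID corresponds to no runtimeConfig path, which fits the commit's observation that the plain NUXT_OIDC_ names only work in dev.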