dezky

Author	SHA1	Message	Date
Ronni Baslund	716d854b3d	fix(ci): grant ci-deployer Endpoints write (admin role excludes it) The deploy failed creating the selectorless stalwart-http Service's Endpoints: since the CVE-2021-25740 hardening the namespaced 'admin' role no longer grants write on legacy Endpoints. Explicit endpoints + endpointslices rules on the ci-deployer role (already applied live); manifest comment touch retriggers the infra apply.	2026-06-11 08:08:42 +02:00
Ronni Baslund	88ac5e620c	feat(mail): Outlook/Thunderbird autodiscovery over HTTPS Outlook autodiscovers via POST https://autodiscover.<domain>/autodiscover/autodiscover.xml and Thunderbird via autoconfig.<domain>/mail/config-v1.1.xml; Stalwart serves both (verified, answers carry mail.dezky.eu:993/465) but its HTTP listener wasn't reachable from outside (the node's :443 is Traefik's). New exact-path-only Ingress routes JUST those discovery endpoints to host-Stalwart via a selectorless Service + Endpoints on the cni0 gateway; the admin/management surface stays internal, and there's no HTTPS-redirect middleware because Thunderbird probes plain HTTP and Outlook POSTs. Domains page now also lists the autoconfig/autodiscover CNAMEs under the autodiscovery slot (CNAME verified against the mail host; a bare A record warns instead of failing). Customer-domain autodiscovery (per-domain certs + automated Ingress) is a follow-up.	2026-06-11 08:04:55 +02:00
Ronni Baslund	e77a963390	feat(infra): real TLS for mail.dezky.eu The cert-sync timer waited forever for a mail/mail-tls secret no Certificate resource ever requested; Stalwart served self-signed certs since install, so mail clients refused the IMAP handshake ('cannot verify account name or password' in Apple Mail). Adds the cert-manager Certificate (HTTP-01 via Traefik on :80) and documents the v0.16 wrinkle: TLS files aren't read from config anymore; a one-time file-backed x:Certificate object (created via management JMAP) points at the synced paths, after which cert-sync renewals keep working unchanged. Verified: :993 now serves the Let's Encrypt cert, verify rc=0.	2026-06-10 21:58:35 +02:00
Ronni Baslund	83214eb379	feat(tenants): isPlatformTenant flag replaces PLATFORM_TENANT_SLUG Identifying the company tenant by slug in env was fragile: every purge/recreate changed the slug (or id) and the apex guard chased reality through three config flips in one day. The identity now lives ON the tenant document: isPlatformTenant, operator-set from the tenant page (single holder: setting it clears the flag everywhere else), guarded so tenant admins can't set it on themselves through the shared PATCH route. The dezky.eu apex guard reads the flag; PLATFORM_TENANT_SLUG is gone. Dev seed flags its seeded tenant. config-rev 5 rolls platform-api.	2026-06-10 21:47:27 +02:00
Ronni Baslund	eefe1b3ec3	fix(infra): platform tenant is dezky-aps; disable prod seeding The recreated company tenant got slug dezky-aps (wizard auto-derives from the display name 'Dezky ApS'), so the dezky.eu apex guard 409'd it while the config still said 'dezky'. Also SEED_ENABLED=false in prod: the seeder resurrected a ghost 'dezky' tenant on every platform-api boot, which is how the slug landscape kept shifting. config-rev 4 rolls the pods.	2026-06-10 21:35:59 +02:00
Ronni Baslund	2bc302c082	feat(operator): partner-style tenant provisioning wizard + admin invite The minimal create modal silently dropped adminName/adminEmail; the invite only existed in the partner wizard's server path. Operator now gets the same 5-step wizard UX (organization, domain, first admin, plan with live price catalog, review) composed client-side: POST /tenants creates + provisions, then POST /users/invite-tenant-admin (new, operator-only; lives in UsersModule because UsersModule already imports TenantsModule and the reverse would be circular) runs the same inviteTenantAdmin flow the partner gets, and the result view hands over the single-use recovery link or temp password. Tenant detail page gains an Invite admin action for retries/successors. PLATFORM_TENANT_SLUG back to 'dezky' (the recreated company tenant) + config-rev bump to roll platform-api.	2026-06-10 21:22:14 +02:00
Ronni Baslund	25d932d3c1	fix(domains): platform tenant slug is configurable (prod: dezky-aps) The company tenant ended up as slug dezky-aps (the seeded 'dezky' tenant was deleted), so the hardcoded apex allowance for slug 'dezky' would have rejected adding dezky.eu to the real tenant. PLATFORM_TENANT_SLUG env (default 'dezky') now names the only tenant allowed to claim the PLATFORM_TENANT_DOMAIN apex.	2026-06-10 20:57:31 +02:00
Ronni Baslund	f66a343472	fix(infra): Stalwart v0.16 management admin is a real account (admin@dezky.eu) The v0.16 config migration silently dropped the fallback admin: the live server had ZERO accounts, so every platform-api JMAP call 401'd and tenant mail provisioning was dead. Bootstrapped via recovery mode on node1 (STALWART_RECOVERY_ADMIN): created the dezky.eu domain + an admin account with the Admin role and the existing STALWART_ADMIN_PASSWORD. v0.16 logins use the full address, so STALWART_ADMIN_USER becomes admin@dezky.eu; config-rev annotation bump rolls platform-api so it picks up the new env. install.sh follow-ups now document the recovery-mode bootstrap for rebuilds instead of the defunct fallback-admin promise.	2026-06-10 20:50:25 +02:00
Ronni Baslund	a43a172449	feat(domains): reserve the platform namespace + one workspace per domain dezky.eu doubles as the platform's infrastructure domain AND the company's own employee mail domain (added to the dezky tenant via the normal Domains flow). Guard rails in DomainsService.add: - a domain already used by ANY other workspace is rejected: Stalwart's idempotent ensureDomain would otherwise silently share one mail domain (and its mailboxes) between tenants - the PLATFORM_TENANT_DOMAIN apex is claimable only by the dezky tenant; everything under it (per-tenant service domains, auth/api/mail/* infra hosts) is reserved outright Set PLATFORM_TENANT_DOMAIN=dezky.eu in the prod ConfigMap (was unset, so prod service domains would have been {slug}.dezky.local) and align the seeded dezky tenant's display domain with the environment.	2026-06-10 20:15:46 +02:00
Ronni Baslund	94270c1f22	fix(health): env-driven infrastructure probe targets The operator infrastructure page probed docker-compose hostnames (stalwart/postgres/redis/traefik…) which don't resolve in k3s, so 7 of 9 services showed down. Probe targets now come from HEALTH_* env vars with the compose names as dev defaults; platform-api-config.yaml sets the in-cluster/host addresses. 'disabled' omits a service from the report; used for OCIS/Collabora until the files tier is deployed.	2026-06-10 19:51:25 +02:00
Ronni Baslund	0840efb759	fix(operator,portal): env-driven sign-out URLs + host labels (no more .local in prod) Operator sign-out hardcoded the dev Authentik end-session URL, so prod logout landed on auth.dezky.local. Mirror the portal's env-driven pattern (NUXT_PUBLIC_AUTH_URL/NUXT_PUBLIC_OPERATOR_URL with .local fallbacks). Expose authUrl/operatorUrl via public runtimeConfig and use them for the Authentik admin links and the cosmetic host labels (sidebar, eyebrows, auth-page hints). Portal: signed-out + webmail copy now derive their hosts from runtime config (new public.mailUrl, NUXT_PUBLIC_MAIL_URL in prod).	2026-06-10 19:51:25 +02:00
Ronni Baslund	91134c94f5	feat(auth): Redis-backed OIDC sessions for portal + operator nuxt-oidc-auth persists sessions via useStorage('oidc'), whose default mount is per-pod memory: broken at >1 replica (random 401s) and every deploy logged all users out. A nitro plugin now mounts 'oidc' on the dezky-data Redis (db 1, app-prefixed keys, 14d TTL) when SESSION_REDIS_URL is set; dev keeps the memory driver with no Redis required. Replicas back to 2 for both apps.	2026-06-10 18:48:16 +02:00
Ronni Baslund	fd0c5d011b	fix(infra): single replica for portal/operator (per-pod OIDC sessions) nuxt-oidc-auth stores sessions in per-pod memory. With 2 replicas, any request balanced to the pod that didn't handle the login 401s; in practice roughly half of all operator API calls failed after sign-in. One replica until sessions move to shared storage (nitro storage on the dezky-data Redis), then scale back up. Already scaled live; this pins the manifests so the next deploy doesn't undo it.	2026-06-10 18:41:59 +02:00
Ronni Baslund	b155e34fe6	fix(infra): runtime OIDC overrides for prod portal/operator login CI builds the Nuxt images with no env, so nuxt.config bakes empty OIDC client creds and .local Authentik URLs into runtimeConfig; sign-in dead-ended on the app's own /auth/login. Nitro env overrides only apply when the var name matches the runtimeConfig path (oidc.providers.oidc.* -> NUXT_OIDC_PROVIDERS_OIDC_), so production secrets need that second set of names; the plain NUXT_OIDC_ ones only work in dev. Also pin NUXT_OIDC_TOKEN_KEY/AUTH_SESSION_SECRET so sessions survive pod restarts. Live secrets patched on the cluster accordingly.	2026-06-10 13:24:29 +02:00
Ronni Baslund	3b9b06a99b	docs(runbook): app tier + push-to-deploy CI/CD flow Bring the runbook up to the 2026-06-10 state: app tier + CI/CD in current state, a Deploy flow section (push to main = release, rollback, break-glass, required Gitea secrets), reproduce steps 8-9 (app tier secrets+apply, CI runner + ci-deployer with the runner gotchas), per-router ACME-safe redirect instead of the old global one, platform-api key read-back for Bitwarden, and a pruned TODO list.	2026-06-10 12:19:47 +02:00
Ronni Baslund	9a58e486e3	docs(fleet): note verified push-to-deploy pipeline	2026-06-10 09:20:18 +02:00
Ronni Baslund	323c46fba1	fix(ci): share dind's unix socket with the runner (jobs need a mountable docker host) gitea/runner can only bind-mount a UNIX-socket docker host into job containers; the old tcp://localhost:2376 + TLS daemon address cannot be mounted, so build jobs still had no docker API. Share dind's /var/run/docker.sock with the runner via a /var/run emptyDir and drop the DOCKER_HOST/TLS env; the runner auto-finds the socket and the bind path resolves inside dind where the socket lives.	2026-06-10 08:51:44 +02:00
Ronni Baslund	1114be6c93	fix(ci): expose the dind docker host to job containers gitea/runner 1.x no longer auto-mounts the docker daemon into job containers (act_runner 0.2.x did), so 'docker build' in the build jobs failed with 'cannot connect to /var/run/docker.sock'. container.docker_host "" restores find-and-mount.	2026-06-10 08:34:54 +02:00
Ronni Baslund	ec707643d6	fix(ci): act_runner 0.2.11 -> gitea/runner 1.0.8 Gitea 1.26 never marked finished jobs complete with the deprecated act_runner 0.2.11: the runner ran the job, logged 'Job succeeded' and freed its slot, but Gitea kept the job 'Running' forever, so dependent jobs (build -> deploy) were never dispatched. gitea/runner is the successor project; config, env vars and the .runner registration file are unchanged.	2026-06-10 08:02:40 +02:00
Ronni Baslund	c60937c5cb	feat(ci): deploy to k3s straight from the pipeline (drop Flux plan) Push to main = release: after build, a deploy job pins each app image to the commit SHA (kustomize edit set image), kubectl-applies fleet/apps and waits for the rollouts. The runner already runs in-cluster, so it reaches the API server on the in-cluster service IP with a kubeconfig for the new ci-deployer ServiceAccount (namespace-scoped admin, KUBECONFIG_B64 repo secret). The drafted Flux sync/image-automation layer is removed: a GitOps controller plus bot tag-bump commits is more machinery than a single-node cluster needs. Sortable image tags and $imagepolicy markers go with it. Also: per-router ACME-safe HTTP->HTTPS redirects for the app ingresses, platform-api prod config completed (Authentik JWT/JWKS + admin API, Stalwart via the cni0 gateway IP, OCIS/cold-storage placeholders until those tiers exist) and the secrets template/README updated to match.	2026-06-10 07:53:55 +02:00
Ronni Baslund	52e0f5e375	feat(operator): production build + k3s deployment - Dockerfile for the operator app (same pattern as portal/booking). - Env-driven auth/app base URLs in nuxt.config so one build serves dev (.local) and production (.eu). - Deployment + Service + Ingress on operator.dezky.eu. - Add operator to the typecheck matrix.	2026-06-10 07:53:55 +02:00
Ronni Baslund	d02eb5ec50	fix(authentik): pin chart 2026.5.2, grant_types allowlist, portal redirect URI - Pin the helm-controller chart version (unset = silent latest upgrades) and move the image tag under global.image per the 2026.5 chart layout. - Authentik 2026.5 enforces a per-provider grant_types allowlist; empty list rejected every authorize request. Allow authorization_code + refresh_token for portal and operator providers. - Fix the portal redirect URI to the nuxt-oidc-auth callback path. - Serve the auth ingress on :80 with a per-router HTTPS redirect so the cert-manager HTTP-01 solver keeps working.	2026-06-10 07:53:49 +02:00
Ronni Baslund	4c5fdde787	fix(infra): docker:24-dind + capacity 2 (fix moby cgroup-v2 teardown deadlock that hung 'Complete job')	2026-06-08 22:56:21 +02:00
Ronni Baslund	aef0f44915	chore(infra): act_runner capacity 4 + disable cache server Add an act_runner config.yaml (ConfigMap, CONFIG_FILE env): capacity 4 so the typecheck matrix + image builds run in parallel instead of one-at-a-time, and cache.enabled: false (we removed the setup-node cache; the cache server isn't reachable from the DinD job containers anyway).	2026-06-08 22:46:43 +02:00
Ronni Baslund	f331e3c1e6	feat(infra): in-cluster Gitea Actions runner (act_runner + dind) Self-registering act_runner on node1 with a privileged docker:dind sidecar so workflow jobs can build + push app images (k3s has containerd only, no Docker daemon). Labels ubuntu-latest + docker; state persisted on a Longhorn PVC. The registration token is applied out-of-band as the gitea-runner-token Secret (not in git). Verified: runner declared successfully, dind API up.	2026-06-08 22:13:38 +02:00
Ronni Baslund	a27c238c76	feat(infra): nightly DB-dump CronJobs feeding the Restic backup pg_dumpall (all Postgres DBs + roles) and mongodump (all Mongo DBs) write gzipped dumps to the hostPath /opt/dezky-backup/dumps at 02:50/02:52 UTC, which the host Restic job (03:20) ships to the Storage Box. Each keeps the last 7 local dumps; Restic holds the real off-box retention. - pods run as root (hostPath dir is root-owned, as is the host Restic reader) - mongo job uses bash (mongo:7 /bin/sh is dash → no pipefail) - creds from postgres-secret / mongo-secret via secretKeyRef Verified: both jobs Complete, dumps present on the host (postgres-all ~2.2MB w/ Authentik data, mongo archive).	2026-06-08 21:55:14 +02:00
Ronni Baslund	861212831d	fix(infra): restic→Storage Box backups working end-to-end Three fixes found bringing up backups on node1: - restic.env wrote BACKUP_PATHS/RETENTION unquoted → sourcing ran a path as a command ("Is a directory"); now quoted. - ssh config was written to $BACKUP_HOME/.ssh/config, but restic runs as root and its ssh resolves ~ from the passwd db (not $HOME), so it reads /root/.ssh/config — write the Storage Box block there. Also StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null (safe: restic encrypts before upload; fixes flaky Storage Box host-key verification). - Storage Box SFTP lands in /home, so the repo path needs the /home prefix (absolute /dezky hit the root-owned chroot parent → SSH_FX_FAILURE). Verified: repo initialized, nightly snapshot of mail store + Stalwart config + etcd snapshots + dumps dir, `restic check` clean, retention applied.	2026-06-08 21:46:49 +02:00
Ronni Baslund	9d075343c5	feat(infra): migrate Stalwart to the v0.16 config model (config.json) v0.16 dropped TOML config. The host service now boots from a tiny config.json that describes only the datastore (RocksDB); all other settings live in the DB (web UI / stalwart-cli / platform-api JMAP). - add stalwart/config.json (RocksDb datastore at /opt/stalwart/data) - install.sh: install config.json instead of config.toml - stalwart-mail.service: --config points at config.json - README: document the v0.16 model + remaining DB-side config + DNS/PTR Verified: Stalwart 0.16.8 runs on node1 with default mail listeners + the :8080 management server. config.toml retained as a reference for the DB settings.	2026-06-08 21:02:17 +02:00
Ronni Baslund	149eb0b020	fix(infra): Stalwart installer: repo rename + exact asset; flag 0.16 config break - install.sh: default repo stalwartlabs/mail-server -> stalwartlabs/stalwart (renamed), and select the exact /stalwart-<target>.tar.gz asset excluding the foundationdb build (head -n1 could grab the wrong one). - config.toml: $env{...} -> %{env:...}% (correct Stalwart macro syntax). KNOWN ISSUE: Stalwart v0.16 removed TOML config (single config.json datastore + everything else in the DB via CLI/UI), so this config.toml does not load on 0.16.8 ("Failed to parse data store settings"). Needs either a pinned pre-0.16 version or a migration to the v0.16 config model. Binary is installed; the service is stopped pending that decision.	2026-06-08 20:51:56 +02:00
Ronni Baslund	326b626fc6	feat(infra): full dezky rebrand of Authentik login (logo, favicon, bg, footer) Brand CSS only reaches the flow shadow DOM via CSS vars (colors), not the logo/favicon (deeper shadow root) or the "Powered by authentik" footer (light DOM). So, dev-style: serve real dezky assets + sed the bundle. - web-assets/: dezky-logo.svg, dezky-favicon.svg, dezky-bg.svg (carbon). - server-rebrand.py: patches the authentik-server Deployment with an initContainer that copies /web/dist to an emptyDir, drops the svgs into assets/icons, and seds "Powered by authentik" -> "Powered by Dezky". - brand.yaml: branding_logo / branding_favicon / branding_default_flow_background point at the served svgs; auth-flow title "Welcome to Dezky"; signal-green CSS. Verified live: login now matches dev (logo, title, carbon bg, green button, favicon, Powered by Dezky). Durability caveat documented (reverts on helm upgrade).	2026-06-08 20:36:01 +02:00
Ronni Baslund	99cd86cd3a	feat(infra): full dezky branding on Authentik (logo, carbon bg, flow title) branding_logo / branding_default_flow_background are file-path fields (reject data URIs), so the dezky logo + carbon background are injected via the brand's custom CSS (data URIs allowed there): logo replaces the authentik wordmark, background overrides the forest. Auth-flow title -> "Welcome to Dezky". Signal-green primary button retained.	2026-06-08 19:54:44 +02:00
Ronni Baslund	db1354a151	feat(infra): Authentik blueprints (portal+operator OIDC, dezky brand) Mirror the dev Authentik config in prod via blueprints, applied & successful on node1: - brand.yaml: dezky branding on the default brand (title + signal-green custom CSS); login page now in dezky colors. - portal-application.yaml / operator-application.yaml: dezky-portal & dezky-operator OIDC apps/providers (prod redirect URLs) + the dezky-platform-admins group & operator access policy. Two 2026.5 gotchas handled + documented in README: - invalidation_flow is now REQUIRED on OAuth2 providers (added via !Find). - ConfigMap mounts are symlinks (discovery can't read them) → worker uses an initContainer that copies them to an emptyDir as real files. (chart worker.volumes didn't apply on this version; patch reverts on helm upgrade, noted as a durability TODO.) Client secrets (PORTAL/OPERATOR_OIDC_CLIENT_SECRET) live in authentik-secret; the apps must reuse them.	2026-06-08 19:46:48 +02:00
Ronni Baslund	406e2ca78b	feat(infra): deploy Authentik (auth.dezky.eu) + global HTTP→HTTPS redirect - Authentik on the in-cluster Postgres/Redis (mirrors the dev compose config: external DB/Redis, error-reporting off, update-check off, bootstrap admin), via the k3s Helm controller; Ingress + cert-manager letsencrypt-prod. Live at https://auth.dezky.eu (image 2026.5.2). Secrets generated on-box (Bitwarden). - Traefik HelmChartConfig: global :80 -> :443 (308) redirect via additionalArguments (to=:443, HTTP-01-safe). - RUNBOOK updated. Deferred (mirror remaining dev bits): OIDC app blueprints (portal/operator with prod URLs) + the cosmetic "Powered by Dezky" rebrand.	2026-06-08 19:00:07 +02:00
Ronni Baslund	153d7053ca	feat(infra): k3s foundation: cert-manager, Longhorn config, in-cluster data tier Adds the production cluster foundation (authored + applied live on node1): - cert-manager via the k3s HelmChart controller + letsencrypt staging/prod ClusterIssuers (HTTP-01 / Traefik). - Longhorn config for single-node (values: replica=1, default StorageClass, Retain) + backup-to-Hetzner-Object-Storage credential template. - In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init), MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template. - bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq). - RUNBOOK.md: full reproducible node1 build order. Real secrets are generated on-box and kept in Bitwarden, never in git.	2026-06-08 18:39:31 +02:00
Ronni Baslund	35bc7b6c31	chore(infra): production manifests + CI for scheduling apps	2026-06-07 09:27:44 +02:00
Ronni Baslund	3831c85285	feat(infra): production host bootstrap and bare-metal Stalwart scaffolding Host provisioning for the single-server production target: SSH + firewall hardening (nftables allowlist), k3s node registration, bare-metal Stalwart install with systemd units and TLS cert-sync from the cluster secret, and Restic encrypted backup/restore (primary + DR) with timer units. Host-specific secrets live in config.env (gitignored); config.env.example is the template. Also gitignores MemPalace per-project files.	2026-06-07 00:19:48 +02:00

36 Commits
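
Commit 861212831d traces the first backup failure to unquoted values in restic.env: when a shell sources a line such as `BACKUP_PATHS=/a /b`, it treats `/b` as a command to run with `BACKUP_PATHS=/a` in its environment, which matches the "Is a directory" error. A minimal Python sketch of writing such a file with shell-safe quoting follows. `render_env` and the example values are assumptions for illustration; only the variable names BACKUP_PATHS and RETENTION come from the commit.

```python
import shlex


def render_env(settings: dict[str, str]) -> str:
    """Render KEY=value lines that a POSIX shell can source safely."""
    lines = []
    for key, value in settings.items():
        # shlex.quote single-quotes any value containing spaces or shell
        # metacharacters, so the line stays one assignment instead of
        # "assignment followed by a command".
        lines.append(f"{key}={shlex.quote(value)}")
    return "\n".join(lines) + "\n"


# Assumed example values; the real paths and retention flags are not in the commit.
example = render_env({
    "BACKUP_PATHS": "/srv/mail /srv/dumps",
    "RETENTION": "--keep-daily 7 --keep-weekly 4",
})
print(example, end="")
```

Because each value is now one quoted string, a script that wants one argument per path would have to expand the variable unquoted (or read it into an array) after sourcing.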
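
Commit 9d075343c5 repoints the stalwart-mail.service `--config` argument from config.toml to config.json. One way to guard against a stale unit after such a migration is to read the argument back out of the unit text. This is only a sketch: `config_path_from_unit` is a hypothetical helper, and the unit text, binary path, and config path below are made up rather than taken from the repository.

```python
import shlex
from typing import Optional


def config_path_from_unit(unit_text: str) -> Optional[str]:
    """Return the --config value found on an ExecStart= line, or None."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("ExecStart="):
            continue
        # shlex is close enough to systemd's quoting for simple command lines.
        args = shlex.split(line[len("ExecStart="):])
        for i, arg in enumerate(args):
            if arg.startswith("--config="):              # --config=/path form
                return arg.split("=", 1)[1]
            if arg == "--config" and i + 1 < len(args):  # --config /path form
                return args[i + 1]
    return None


# Made-up unit text; the binary and config locations are placeholders.
UNIT = """\
[Unit]
Description=Stalwart mail server

[Service]
ExecStart=/opt/stalwart/bin/stalwart --config=/opt/stalwart/etc/config.json
Restart=on-failure
"""

path = config_path_from_unit(UNIT)
print(path)
```

A deployment check could then fail if the returned path is missing or does not end in `.json`.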
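
Commit 149eb0b020 replaces a `head -n1` pick with an exact match on the release asset name, so the foundationdb build can no longer be selected by accident. The same idea in Python, as a sketch under assumptions: `pick_asset` is a hypothetical name, and the URLs and target triples are illustrative placeholders, not the real release listing.

```python
def pick_asset(urls: list[str], target: str) -> str:
    """Return the single URL whose file name is exactly stalwart-TARGET.tar.gz."""
    suffix = f"/stalwart-{target}.tar.gz"
    matches = [u for u in urls if u.endswith(suffix)]
    if len(matches) != 1:
        # Fail loudly instead of silently taking whichever entry came first.
        raise ValueError(f"expected exactly one asset for {target}, got {len(matches)}")
    return matches[0]


# Placeholder URLs (the .invalid TLD is reserved and never resolves).
BASE = "https://example.invalid/releases/download/v0.16.8"
URLS = [
    f"{BASE}/stalwart-foundationdb-x86_64-unknown-linux-gnu.tar.gz",
    f"{BASE}/stalwart-x86_64-unknown-linux-gnu.tar.gz",
    f"{BASE}/stalwart-aarch64-unknown-linux-gnu.tar.gz",
]

chosen = pick_asset(URLS, "x86_64-unknown-linux-gnu")
print(chosen)
```

In this list a "first line containing the target" filter would return the foundationdb archive, while the exact suffix match skips it.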
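
Commit 326b626fc6 describes an initContainer that copies /web/dist to an emptyDir and seds the footer string. The substitution step alone could look roughly like this in Python. It is not the repository's server-rebrand.py (which, per the commit, patches the Deployment); `rebrand_bundle` is a hypothetical helper and the demo file names are invented.

```python
import tempfile
from pathlib import Path

OLD = b"Powered by authentik"
NEW = b"Powered by Dezky"


def rebrand_bundle(root: Path) -> int:
    """Replace OLD with NEW in every file under root; return the files changed."""
    changed = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if OLD in data:
            # Work on bytes so minified bundles are never re-encoded.
            path.write_bytes(data.replace(OLD, NEW))
            changed += 1
    return changed


# Demo on a throwaway directory standing in for the copied /web/dist.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "chunk.js").write_text('footer("Powered by authentik")')
    (dist / "other.js").write_text("console.log(1)")
    count = rebrand_bundle(dist)
    result = (dist / "chunk.js").read_text()

print(count, result)
```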
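
Commit 99cd86cd3a works around the file-path-only branding fields by embedding the images in the brand's custom CSS as data URIs. A sketch of producing such a rule in Python is given here. The selector `.branding-logo` and the tiny SVG are placeholders; the selectors the Authentik flow page would actually need are not given in the commit.

```python
import base64


def svg_data_uri(svg: str) -> str:
    """Encode SVG markup as a base64 data: URI usable inside CSS url(...)."""
    payload = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    return f"data:image/svg+xml;base64,{payload}"


def background_rule(selector: str, svg: str) -> str:
    """Build one CSS rule that paints the SVG as a background image."""
    return f'{selector} {{ background-image: url("{svg_data_uri(svg)}"); }}'


# Placeholder artwork and selector, for illustration only.
LOGO = '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>'
css = background_rule(".branding-logo", LOGO)
print(css)
```

Base64 keeps the payload free of quotes and angle brackets, at the cost of roughly a third more bytes than the raw SVG.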
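
Commit db1354a151 notes that files in a ConfigMap mount are symlinks and that an initContainer copies them into an emptyDir as regular files. The copy step can be sketched in Python as follows. The temporary directories only stand in for the mount and the emptyDir, `materialize` is a hypothetical helper, and the actual initContainer command is not shown in the commit.

```python
import os
import shutil
import tempfile
from pathlib import Path


def materialize(src_dir: Path, dst_dir: Path) -> list[str]:
    """Copy every *.yaml in src_dir into dst_dir as a regular file."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src_dir.glob("*.yaml")):
        # shutil.copy follows symlinks by default, so the destination
        # receives the file contents rather than another link.
        shutil.copy(entry, dst_dir / entry.name)
        copied.append(entry.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data = base / "data"        # holds the real file the link points at
    mount = base / "mount"      # stands in for the ConfigMap mount
    target = base / "emptydir"  # stands in for the emptyDir
    data.mkdir()
    mount.mkdir()
    (data / "brand.yaml").write_text("version: 1\n")
    os.symlink(data / "brand.yaml", mount / "brand.yaml")

    names = materialize(mount, target)
    src_is_link = (mount / "brand.yaml").is_symlink()
    dst_is_link = (target / "brand.yaml").is_symlink()
    content = (target / "brand.yaml").read_text()

print(names, src_is_link, dst_is_link)
```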
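
Commit 406e2ca78b adds a global :80 to :443 redirect that answers with status 308. A 308 tells the client to repeat the request with the same method and body, whereas clients are allowed to turn a POST into a GET on a 301 or 302. The sketch below only models the response such a redirect produces; it is not Traefik configuration, and `strip_port` and `https_redirect` are hypothetical functions.

```python
def strip_port(host_header: str) -> str:
    """Drop an explicit port from a Host header value."""
    if host_header.startswith("["):  # bracketed IPv6 literal, e.g. [::1]:80
        return host_header[: host_header.index("]") + 1]
    return host_header.split(":", 1)[0]


def https_redirect(host_header: str, path: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a permanent HTTP-to-HTTPS redirect."""
    # 443 is the default https port, so the Location URL carries no port.
    location = f"https://{strip_port(host_header)}{path}"
    return 308, {"Location": location}


# The path is an invented example; the host name appears in the commit.
status, headers = https_redirect("auth.dezky.eu:80", "/login?next=%2F")
print(status, headers["Location"])
```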
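
Commit 153d7053ca adds letsencrypt staging and prod ClusterIssuers that solve HTTP-01 challenges through Traefik. The sketch below builds two such manifests as Python dicts and prints them as JSON, which kubectl accepts alongside YAML. The staging issuer name, the account-key secret names, the ingress class value, and the contact address are assumptions; the ACME directory URLs are Let's Encrypt's public endpoints.

```python
import json

ACME_SERVERS = {
    "letsencrypt-staging": "https://acme-staging-v02.api.letsencrypt.org/directory",
    "letsencrypt-prod": "https://acme-v02.api.letsencrypt.org/directory",
}


def cluster_issuer(name: str, email: str, ingress_class: str = "traefik") -> dict:
    """Build a cert-manager ClusterIssuer that uses the HTTP-01 solver."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": ACME_SERVERS[name],
                "email": email,
                # Secret that stores the ACME account key (name is an assumption).
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [{"http01": {"ingress": {"class": ingress_class}}}],
            }
        },
    }


# Placeholder contact address; it would need replacing before any real use.
issuers = [cluster_issuer(name, "ops@example.invalid") for name in ACME_SERVERS]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": issuers}, indent=2))
```

Keeping both issuers side by side lets a certificate be tried against staging first, which avoids burning production rate limits while the Ingress is still being debugged.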
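
Commit 3831c85285 mentions a TLS cert-sync from the cluster secret to the bare-metal Stalwart host. Kubernetes TLS secrets carry the PEM data base64-encoded under the keys `tls.crt` and `tls.key`. Below is a sketch of the decode-and-write step, assuming the secret has already been fetched as JSON (for example with `kubectl get secret NAME -o json`). `sync_tls`, the output file names, and the dummy payloads are hypothetical; the repository's own sync script is not shown in the log.

```python
import base64
import json
import os
import tempfile
from pathlib import Path

# Secret key -> (output file name, file mode). File names are assumptions.
FILES = {"tls.crt": ("fullchain.pem", 0o644), "tls.key": ("privkey.pem", 0o600)}


def sync_tls(secret_json: str, dest: Path) -> bool:
    """Write the PEM files from a TLS Secret; return True if anything changed."""
    data = json.loads(secret_json)["data"]
    changed = False
    for key, (filename, mode) in FILES.items():
        pem = base64.b64decode(data[key])  # Secret values are base64-encoded
        target = dest / filename
        if target.exists() and target.read_bytes() == pem:
            continue  # already current, nothing to rewrite
        fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
        with os.fdopen(fd, "wb") as fh:
            fh.write(pem)
        os.chmod(target, mode)  # enforce the mode even if the file pre-existed
        changed = True
    return changed


# Fake secret with dummy payloads, shaped like a kubernetes.io/tls Secret.
secret = json.dumps({"type": "kubernetes.io/tls", "data": {
    "tls.crt": base64.b64encode(b"dummy cert").decode(),
    "tls.key": base64.b64encode(b"dummy key").decode(),
}})

with tempfile.TemporaryDirectory() as tmp:
    first = sync_tls(secret, Path(tmp))
    second = sync_tls(secret, Path(tmp))
    key_bytes = (Path(tmp) / "privkey.pem").read_bytes()

print(first, second)
```

A caller could reload the mail service only when `sync_tls` returns True, so an unchanged certificate causes no restart.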