Commit Graph

143 Commits

Author SHA1 Message Date
Ronni Baslund fd0c5d011b fix(infra): single replica for portal/operator (per-pod OIDC sessions)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 22s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 24s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 21s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 26s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 21s
ci / test (push) Successful in 30s
ci / build (map[dir:apps/booking name:booking]) (push) Successful in 10s
ci / build (map[dir:apps/operator name:operator]) (push) Successful in 9s
ci / build (map[dir:apps/portal name:portal]) (push) Successful in 6s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Successful in 6s
ci / deploy (push) Successful in 41s
nuxt-oidc-auth stores sessions in per-pod memory. With 2 replicas, any
request balanced to the pod that didn't handle the login gets a 401; in practice
roughly half of all operator API calls failed after sign-in. One replica
until sessions move to shared storage (nitro storage on the dezky-data
Redis), then scale back up. Already scaled live; this pins the manifests so
the next deploy doesn't undo it.
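
The "roughly half" figure can be illustrated with a toy model. This is only a sketch: it assumes strict round-robin balancing (the commit does not say which policy is in use), and the function name and session id are hypothetical, not nuxt-oidc-auth code.

```typescript
// Toy model of per-pod in-memory sessions; not nuxt-oidc-auth's actual code.
// Assumption: the load balancer is strict round-robin.
function countUnauthorized(replicas: number, requests: number): number {
  // Each pod keeps its own private session store.
  const pods: Set<string>[] = Array.from(
    { length: replicas },
    () => new Set<string>(),
  );

  // The login lands on pod 0, so only pod 0 knows the session.
  const sessionId = "session-1";
  pods[0]!.add(sessionId);

  let unauthorized = 0;
  for (let i = 1; i <= requests; i++) {
    const pod = pods[i % replicas]!;
    // A pod that never saw the login has no session and answers 401.
    if (!pod.has(sessionId)) unauthorized++;
  }
  return unauthorized;
}

console.log(countUnauthorized(2, 10)); // 5
console.log(countUnauthorized(1, 10)); // 0
```

With one replica every request reaches the pod that holds the session, which is why pinning the manifests to a single replica works until sessions move to shared storage.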
2026-06-10 18:41:59 +02:00
Ronni Baslund 83212d7c23 feat(operator): create direct tenants from the operator portal
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 19s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 21s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 18s
ci / build (map[dir:apps/booking name:booking]) (push) Successful in 9s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 27s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 21s
ci / test (push) Successful in 29s
ci / build (map[dir:apps/portal name:portal]) (push) Successful in 5s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Successful in 5s
ci / build (map[dir:apps/operator name:operator]) (push) Successful in 29s
ci / deploy (push) Successful in 40s
The operator could list and inspect tenants but had no create flow — tenant
creation only existed as the partner-portal wizard, which always attaches a
partnerId. Platform-api's POST /tenants (platform-admin only, no partner
field) was already built for this; add the missing UI: a New tenant modal on
the tenants page (slug, name, plan/cycle/currency/seats, optional primary
mail domain + first-admin invite) and the server proxy route. Operator-created
tenants are direct customers; attach a partner later if needed.
2026-06-10 13:53:41 +02:00
Ronni Baslund b155e34fe6 fix(infra): runtime OIDC overrides for prod portal/operator login
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 20s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 24s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 26s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 23s
ci / build (map[dir:apps/booking name:booking]) (push) Successful in 9s
ci / build (map[dir:apps/operator name:operator]) (push) Successful in 9s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 18s
ci / test (push) Successful in 34s
ci / build (map[dir:apps/portal name:portal]) (push) Successful in 6s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Successful in 6s
ci / deploy (push) Successful in 41s
CI builds the Nuxt images with no env, so nuxt.config bakes empty OIDC
client creds and .local Authentik URLs into runtimeConfig — sign-in
dead-ended on the app's own /auth/login. Nitro env overrides only apply
when the var name matches the runtimeConfig path
(oidc.providers.oidc.* -> NUXT_OIDC_PROVIDERS_OIDC_*), so production
secrets need that second set of names; the plain NUXT_OIDC_* ones only
work in dev. Also pin NUXT_OIDC_TOKEN_KEY/AUTH_SESSION_SECRET so sessions
survive pod restarts. Live secrets patched on the cluster accordingly.
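
As a rough illustration of the naming rule, the sketch below derives an env var name from a runtimeConfig path. It approximates the conversion for plain camelCase keys only; `runtimeConfigEnvName` is a hypothetical helper, and `clientId` is an assumed key name under oidc.providers.oidc (the commit only gives the wildcard).

```typescript
// Approximation of the runtimeConfig-path -> env-var mapping described above.
// Assumption: keys are plain camelCase; the real conversion may differ on edge cases.
function runtimeConfigEnvName(path: string[], prefix = "NUXT_"): string {
  const parts = path.map((key) =>
    // Insert "_" at each lower/digit -> upper boundary, then upper-case.
    key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toUpperCase(),
  );
  return prefix + parts.join("_");
}

// "clientId" is a hypothetical key under oidc.providers.oidc.
console.log(runtimeConfigEnvName(["oidc", "providers", "oidc", "clientId"]));
// NUXT_OIDC_PROVIDERS_OIDC_CLIENT_ID
```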
2026-06-10 13:24:29 +02:00
Ronni Baslund 3b9b06a99b docs(runbook): app tier + push-to-deploy CI/CD flow
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 20s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 23s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 20s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 26s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 22s
ci / test (push) Successful in 32s
ci / build (map[dir:apps/booking name:booking]) (push) Successful in 9s
ci / build (map[dir:apps/operator name:operator]) (push) Successful in 9s
ci / build (map[dir:apps/portal name:portal]) (push) Successful in 6s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Successful in 5s
ci / deploy (push) Successful in 41s
Bring the runbook up to the 2026-06-10 state: app tier + CI/CD in current
state, a Deploy flow section (push to main = release, rollback, break-glass,
required Gitea secrets), reproduce steps 8-9 (app tier secrets+apply, CI
runner + ci-deployer with the runner gotchas), per-router ACME-safe redirect
instead of the old global one, platform-api key read-back for Bitwarden, and
a pruned TODO list.
2026-06-10 12:19:47 +02:00
Ronni Baslund 9a58e486e3 docs(fleet): note verified push-to-deploy pipeline
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 21s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 23s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 21s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 26s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 23s
ci / test (push) Successful in 31s
ci / build (map[dir:apps/operator name:operator]) (push) Successful in 9s
ci / build (map[dir:apps/booking name:booking]) (push) Successful in 9s
ci / build (map[dir:apps/portal name:portal]) (push) Successful in 6s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Successful in 5s
ci / deploy (push) Successful in 41s
2026-06-10 09:20:18 +02:00
Ronni Baslund 323c46fba1 fix(ci): share dind's unix socket with the runner (jobs need a mountable docker host)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 42s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 45s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 21s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 26s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 20s
ci / test (push) Successful in 32s
ci / build (map[dir:apps/booking name:booking]) (push) Successful in 34s
ci / build (map[dir:apps/operator name:operator]) (push) Successful in 46s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Successful in 35s
ci / build (map[dir:apps/portal name:portal]) (push) Successful in 49s
ci / deploy (push) Successful in 45s
gitea/runner can only bind-mount a UNIX-socket docker host into job
containers — the old tcp://localhost:2376 + TLS daemon address cannot be
mounted, so build jobs still had no docker API. Share dind's
/var/run/docker.sock with the runner via a /var/run emptyDir and drop the
DOCKER_HOST/TLS env; the runner auto-finds the socket and the bind path
resolves inside dind where the socket lives.
2026-06-10 08:51:44 +02:00
Ronni Baslund 1114be6c93 fix(ci): expose the dind docker host to job containers
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 45s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 50s
ci / build (map[dir:apps/operator name:operator]) (push) Failing after 5s
ci / deploy (push) Has been skipped
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 27s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 23s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 24s
ci / test (push) Successful in 35s
ci / build (map[dir:apps/booking name:booking]) (push) Failing after 7s
ci / build (map[dir:apps/portal name:portal]) (push) Failing after 5s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Failing after 6s
gitea/runner 1.x no longer auto-mounts the docker daemon into job
containers (act_runner 0.2.x did), so 'docker build' in the build jobs
failed with 'cannot connect to /var/run/docker.sock'. Setting
container.docker_host to "" restores find-and-mount.
2026-06-10 08:34:54 +02:00
Ronni Baslund 3590c356a4 fix(ci): registry login via REGISTRY_TOKEN PAT
ci / build (map[dir:apps/booking name:booking]) (push) Failing after 6s
ci / deploy (push) Has been skipped
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 24s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 24s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 23s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 28s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 23s
ci / test (push) Successful in 31s
ci / build (map[dir:apps/operator name:operator]) (push) Failing after 6s
ci / build (map[dir:apps/portal name:portal]) (push) Failing after 6s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Failing after 6s
The per-job GITHUB_TOKEN is no longer accepted by the container registry's
/v2/ basic-auth endpoint since the act_runner -> gitea/runner switch (login
fails 'unauthorized' before push). Use a personal access token with package
read+write scope, provided as the REGISTRY_TOKEN repo secret.
2026-06-10 08:18:32 +02:00
Ronni Baslund ec707643d6 fix(ci): act_runner 0.2.11 -> gitea/runner 1.0.8
ci / typecheck (map[dir:apps/booking name:booking]) (push) Successful in 45s
ci / typecheck (map[dir:apps/operator name:operator]) (push) Successful in 48s
ci / typecheck (map[dir:apps/website name:website]) (push) Successful in 23s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Successful in 28s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 20s
ci / test (push) Successful in 33s
ci / build (map[dir:apps/booking name:booking]) (push) Failing after 5s
ci / build (map[dir:apps/operator name:operator]) (push) Failing after 6s
ci / build (map[dir:apps/portal name:portal]) (push) Failing after 5s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Failing after 5s
ci / deploy (push) Has been skipped
Gitea 1.26 never marked finished jobs complete with the deprecated
act_runner 0.2.11: the runner ran the job, logged 'Job succeeded' and freed
its slot, but Gitea kept the job 'Running' forever, so dependent jobs
(build -> deploy) were never dispatched. gitea/runner is the successor
project; config, env vars and the .runner registration file are unchanged.
2026-06-10 08:02:40 +02:00
Ronni Baslund c60937c5cb feat(ci): deploy to k3s straight from the pipeline (drop Flux plan)
ci / build (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / build (map[dir:apps/operator name:operator]) (push) Has been cancelled
ci / build (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / build (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / deploy (push) Has been cancelled
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/operator name:operator]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Push to main = release: after build, a deploy job pins each app image to the
commit SHA (kustomize edit set image), kubectl-applies fleet/apps and waits
for the rollouts. The runner already runs in-cluster, so it reaches the API
server on the in-cluster service IP with a kubeconfig for the new ci-deployer
ServiceAccount (namespace-scoped admin, KUBECONFIG_B64 repo secret).
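
The deploy steps above can be sketched as a command plan. This is an illustration under assumptions, not the actual workflow: `deployCommands`, the `DeployPlan` fields, and the registry path, app list and namespace passed to it are all hypothetical placeholders.

```typescript
// Builds the shell commands a deploy job like the one described might run.
// Hypothetical: registry path, app list and namespace are placeholders.
interface DeployPlan {
  registry: string; // image repository prefix
  apps: string[]; // image / Deployment names
  sha: string; // commit SHA used as the image tag
  namespace: string;
  kustomizeDir: string;
}

function deployCommands(p: DeployPlan): string[] {
  // Pin every app image to the commit SHA.
  const pin = p.apps.map(
    (app) => `kustomize edit set image ${app}=${p.registry}/${app}:${p.sha}`,
  );
  // Block until each rollout has finished.
  const wait = p.apps.map(
    (app) => `kubectl rollout status deployment/${app} -n ${p.namespace}`,
  );
  return [`cd ${p.kustomizeDir}`, ...pin, "kubectl apply -k .", ...wait];
}

const plan = deployCommands({
  registry: "registry.example/org",
  apps: ["portal", "operator"],
  sha: "abc1234",
  namespace: "apps",
  kustomizeDir: "fleet/apps",
});
console.log(plan.join("\n"));
```

Tagging by commit SHA rather than a moving tag means the applied manifests always name one exact image, so a rollback amounts to re-running the same steps with an older SHA.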

The drafted Flux sync/image-automation layer is removed — a GitOps controller
plus bot tag-bump commits is more machinery than a single-node cluster needs.
Sortable image tags and $imagepolicy markers go with it.

Also: per-router ACME-safe HTTP->HTTPS redirects for the app ingresses,
platform-api prod config completed (Authentik JWT/JWKS + admin API, Stalwart
via the cni0 gateway IP, OCIS/cold-storage placeholders until those tiers
exist) and the secrets template/README updated to match.
2026-06-10 07:53:55 +02:00
Ronni Baslund 52e0f5e375 feat(operator): production build + k3s deployment
- Dockerfile for the operator app (same pattern as portal/booking).
- Env-driven auth/app base URLs in nuxt.config so one build serves
  dev (.local) and production (.eu).
- Deployment + Service + Ingress on operator.dezky.eu.
- Add operator to the typecheck matrix.
2026-06-10 07:53:55 +02:00
Ronni Baslund d02eb5ec50 fix(authentik): pin chart 2026.5.2, grant_types allowlist, portal redirect URI
- Pin the helm-controller chart version (unset = silent latest upgrades) and
  move the image tag under global.image per the 2026.5 chart layout.
- Authentik 2026.5 enforces a per-provider grant_types allowlist; empty list
  rejected every authorize request. Allow authorization_code + refresh_token
  for portal and operator providers.
- Fix the portal redirect URI to the nuxt-oidc-auth callback path.
- Serve the auth ingress on :80 with a per-router HTTPS redirect so the
  cert-manager HTTP-01 solver keeps working.
2026-06-10 07:53:49 +02:00
Ronni Baslund c814bfdf3b feat(ci): build + push app images to the Gitea registry
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 20s
ci / test (push) Failing after 12m29s
ci / typecheck (map[dir:apps/website name:website]) (push) Failing after 12m55s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Failing after 14m6s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 14m8s
ci / build (map[dir:services/platform-api name:platform-api]) (push) Failing after 14m4s
ci / build (map[dir:apps/portal name:portal]) (push) Failing after 14m54s
ci / build (map[dir:apps/booking name:booking]) (push) Failing after 14m56s
After typecheck + test pass on main, build portal/booking/platform-api images
(matrix) via the dind sidecar and push to git.lastcloud.io tagged latest + SHA.
Auth uses the runner's job token against the same Gitea instance.
2026-06-09 09:02:36 +02:00
Ronni Baslund e3ce011674 fix(ci): drop actions/setup-node — use runner image's node (fixes ETXTBSY)
ci / typecheck (map[dir:apps/portal name:portal]) (push) Failing after 10m29s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 10m50s
ci / test (push) Failing after 13m22s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Failing after 14m11s
ci / typecheck (map[dir:apps/website name:website]) (push) Failing after 14m36s
actions/setup-node writes node into a tool-cache shared across concurrent jobs;
with capacity>1 one job execs node while another writes it → "/usr/bin/env:
'node': Text file busy". The catthehacker runner image already ships node 24,
and corepack (bundled) reads each app's packageManager — so setup-node is
unneeded. Removing it eliminates the shared-cache race.
2026-06-08 23:00:58 +02:00
Ronni Baslund 72a0559b77 ci: verify run after dind fix
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
2026-06-08 22:56:21 +02:00
Ronni Baslund 4c5fdde787 fix(infra): docker:24-dind + capacity 2 (fix moby cgroup-v2 teardown deadlock that hung 'Complete job')
2026-06-08 22:56:21 +02:00
Ronni Baslund aef0f44915 chore(infra): act_runner capacity 4 + disable cache server
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Successful in 31s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / test (push) Has been cancelled
Add an act_runner config.yaml (ConfigMap, CONFIG_FILE env): capacity 4 so the
typecheck matrix + image builds run in parallel instead of one-at-a-time, and
cache.enabled: false (we removed the setup-node cache; the cache server isn't
reachable from the DinD job containers anyway).
2026-06-08 22:46:43 +02:00
Ronni Baslund 46970b7e99 ci: trigger fresh run to verify green (corepack + portal TS fixes applied)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 10m31s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / test (push) Has been cancelled
2026-06-08 22:41:17 +02:00
Ronni Baslund b2cda6937c fix(portal): typecheck error in scheduling (TS18048)
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
timeToMin destructured [h, m] from t.split(':').map(Number); under
noUncheckedIndexedAccess those are number|undefined, so `h * 60` errored. Use
default-value destructuring ([h = 0, m = 0]). Surfaced now that the Gitea runner
actually runs the typecheck job (it never ran before).
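
A minimal sketch of the fix, reconstructed from the description above; the real function body in the portal may differ, and only the name `timeToMin` and the `[h = 0, m = 0]` pattern come from the commit.

```typescript
// Hypothetical reconstruction of the helper described in the commit message.
// Under noUncheckedIndexedAccess, destructured array elements are
// number | undefined, so `h * 60` fails with TS18048 unless defaults are given.
function timeToMin(t: string): number {
  const [h = 0, m = 0] = t.split(":").map(Number);
  return h * 60 + m;
}

console.log(timeToMin("09:30")); // 570
console.log(timeToMin("7")); // 420 (missing minutes default to 0)
```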
2026-06-08 22:38:41 +02:00
Ronni Baslund b953be5fa2 fix(ci): use corepack instead of pnpm/action-setup
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
pnpm/action-setup@v4 ran at the repo root (uses: steps ignore
defaults.run.working-directory) where there is no package.json, so it couldn't
read the pnpm version → "No pnpm version specified". Use corepack (bundled with
node) in the install step, which reads each app's own packageManager — matching
the Dockerfiles. Verified in the runner's container: corepack enable + frozen
install succeeds for every app.
2026-06-08 22:36:57 +02:00
Ronni Baslund 7177fa6b9a fix(ci): pin pnpm version in Actions (no root package.json to read)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
pnpm/action-setup ran with no version: `uses:` steps ignore
defaults.run.working-directory, so it executed at the repo root, which has no
package.json (per-app monorepo) → "No pnpm version specified". Pin version: 9
explicitly. Also drop setup-node's `cache: pnpm` — the act_runner cache server
isn't reachable from the DinD job containers, and the install is fast anyway.
2026-06-08 22:29:32 +02:00
Ronni Baslund 955357a91a feat(apps): make environment URLs prod-ready (env-driven, not hardcoded .local)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
The apps were wired for the dev (.local) environment. Drive the base URLs from
env so one build serves dev and prod (.eu):

- portal nuxt.config: OIDC authorization/token/userinfo/discovery URLs +
  redirectUri now derive from NUXT_PUBLIC_AUTH_URL / NUXT_PUBLIC_PORTAL_URL
  (+ PORTAL_OIDC_APP_SLUG); .local defaults keep dev working with no env.
- portal sign-out handler: end-session + post-logout URLs env-driven.
- portal scheduling page: booking base/host from runtimeConfig.public.bookingUrl
  (NUXT_PUBLIC_BOOKING_URL).
- platform-api: tenant mail domain suffix from PLATFORM_TENANT_DOMAIN (dezky.eu
  in prod), defaulting to dezky.local.

(booking needs no change — its only .local ref is the dev-server allowedHosts.)
2026-06-08 22:18:51 +02:00
Ronni Baslund f331e3c1e6 feat(infra): in-cluster Gitea Actions runner (act_runner + dind)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Self-registering act_runner on node1 with a privileged docker:dind sidecar so
workflow jobs can build + push app images (k3s has containerd only, no Docker
daemon). Labels ubuntu-latest + docker; state persisted on a Longhorn PVC. The
registration token is applied out-of-band as the gitea-runner-token Secret
(not in git). Verified: runner declared successfully, dind API up.
2026-06-08 22:13:38 +02:00
Ronni Baslund a27c238c76 feat(infra): nightly DB-dump CronJobs feeding the Restic backup
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
pg_dumpall (all Postgres DBs + roles) and mongodump (all Mongo DBs) write
gzipped dumps to the hostPath /opt/dezky-backup/dumps at 02:50/02:52 UTC, which
the host Restic job (03:20) ships to the Storage Box. Each keeps the last 7
local dumps; Restic holds the real off-box retention.
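
The "keep the last 7 local dumps" rule could look like the sketch below. It assumes dump file names carry an ISO date so that lexical order equals chronological order; the naming scheme and `dumpsToDelete` are hypothetical, and the actual CronJobs may prune differently.

```typescript
// Returns the dump files that fall outside the retention window.
// Assumption: names sort chronologically (e.g. an ISO date in the name).
function dumpsToDelete(files: string[], keep = 7): string[] {
  const sorted = [...files].sort(); // oldest first
  return sorted.slice(0, Math.max(0, sorted.length - keep));
}

// Hypothetical file names for nine nightly dumps.
const dumps = Array.from(
  { length: 9 },
  (_, i) => `postgres-all-2026-06-0${i + 1}.sql.gz`,
);
console.log(dumpsToDelete(dumps));
// the two oldest: ...-2026-06-01.sql.gz and ...-2026-06-02.sql.gz
```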

- pods run as root (hostPath dir is root-owned, as is the host Restic reader)
- mongo job uses bash (mongo:7 /bin/sh is dash → no pipefail)
- creds from postgres-secret / mongo-secret via secretKeyRef

Verified: both jobs Complete, dumps present on the host
(postgres-all ~2.2MB w/ Authentik data, mongo archive).
2026-06-08 21:55:14 +02:00
Ronni Baslund 861212831d fix(infra): restic→Storage Box backups working end-to-end
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Three fixes found bringing up backups on node1:
- restic.env wrote BACKUP_PATHS/RETENTION unquoted → sourcing ran a path as a
  command ("Is a directory"); now quoted.
- ssh config was written to $BACKUP_HOME/.ssh/config, but restic runs as root
  and its ssh resolves ~ from the passwd db (not $HOME), so it reads
  /root/.ssh/config — write the Storage Box block there. Also
  StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null (safe: restic encrypts
  before upload; fixes flaky Storage Box host-key verification).
- Storage Box SFTP lands in /home, so the repo path needs the /home prefix
  (absolute /dezky hit the root-owned chroot parent → SSH_FX_FAILURE).
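
To illustrate the first fix: an unquoted `KEY=/a /b` line, when sourced, makes the shell treat `/b` as a command to run, hence "Is a directory". A small sketch of rendering a safely quoted line follows; `envLine` and the example paths are hypothetical and not taken from the installer.

```typescript
// Renders KEY='value' so a POSIX shell can source it safely.
// Embedded single quotes are escaped with the usual '\'' sequence.
function envLine(key: string, value: string): string {
  const quoted = "'" + value.replace(/'/g, "'\\''") + "'";
  return `${key}=${quoted}`;
}

// Hypothetical paths; the real BACKUP_PATHS value is not shown in the commit.
console.log(envLine("BACKUP_PATHS", "/opt/a /opt/b"));
// BACKUP_PATHS='/opt/a /opt/b'
```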

Verified: repo initialized, nightly snapshot of mail store + Stalwart config +
etcd snapshots + dumps dir, `restic check` clean, retention applied.
2026-06-08 21:46:49 +02:00
Ronni Baslund 9d075343c5 feat(infra): migrate Stalwart to the v0.16 config model (config.json)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
v0.16 dropped TOML config. The host service now boots from a tiny config.json
that describes only the datastore (RocksDB); all other settings live in the DB
(web UI / stalwart-cli / platform-api JMAP).

- add stalwart/config.json (RocksDb datastore at /opt/stalwart/data)
- install.sh: install config.json instead of config.toml
- stalwart-mail.service: --config points at config.json
- README: document the v0.16 model + remaining DB-side config + DNS/PTR

Verified: Stalwart 0.16.8 runs on node1 with default mail listeners + the :8080
management server. config.toml retained as a reference for the DB settings.
2026-06-08 21:02:17 +02:00
Ronni Baslund 149eb0b020 fix(infra): Stalwart installer — repo rename + exact asset; flag 0.16 config break
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
- install.sh: default repo stalwartlabs/mail-server -> stalwartlabs/stalwart
  (renamed), and select the exact /stalwart-<target>.tar.gz asset excluding the
  foundationdb build (head -n1 could grab the wrong one).
- config.toml: $env{...} -> %{env:...}% (correct Stalwart macro syntax).

KNOWN ISSUE: Stalwart v0.16 removed TOML config (single config.json datastore +
everything else in the DB via CLI/UI), so this config.toml does not load on
0.16.8 ("Failed to parse data store settings"). Needs either a pinned pre-0.16
version or a migration to the v0.16 config model. Binary is installed; the
service is stopped pending that decision.
2026-06-08 20:51:56 +02:00
Ronni Baslund 326b626fc6 feat(infra): full dezky rebrand of Authentik login (logo, favicon, bg, footer)
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
Brand CSS only reaches the flow shadow DOM via CSS vars (colors), not the
logo/favicon (deeper shadow root) or the "Powered by authentik" footer (light
DOM). So, dev-style: serve real dezky assets + sed the bundle.

- web-assets/: dezky-logo.svg, dezky-favicon.svg, dezky-bg.svg (carbon).
- server-rebrand.py: patches the authentik-server Deployment with an
  initContainer that copies /web/dist to an emptyDir, drops the svgs into
  assets/icons, and seds "Powered by authentik" -> "Powered by Dezky".
- brand.yaml: branding_logo / branding_favicon / branding_default_flow_background
  point at the served svgs; auth-flow title "Welcome to Dezky"; signal-green CSS.

Verified live: login now matches dev (logo, title, carbon bg, green button,
favicon, Powered by Dezky). Durability caveat documented (reverts on helm
upgrade).
2026-06-08 20:36:01 +02:00
Ronni Baslund 99cd86cd3a feat(infra): full dezky branding on Authentik (logo, carbon bg, flow title)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 7s
ci / test (push) Failing after 7s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Failing after 6s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
branding_logo / branding_default_flow_background are file-path fields (reject
data URIs), so the dezky logo + carbon background are injected via the brand's
custom CSS (data URIs allowed there): logo replaces the authentik wordmark,
background overrides the forest. Auth-flow title -> "Welcome to Dezky".
Signal-green primary button retained.
2026-06-08 19:54:44 +02:00
Ronni Baslund db1354a151 feat(infra): Authentik blueprints (portal+operator OIDC, dezky brand)
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 6s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / test (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
Mirror the dev Authentik config in prod via blueprints, applied & successful on
node1:
- brand.yaml: dezky branding on the default brand (title + signal-green custom
  CSS) — login page now in dezky colors.
- portal-application.yaml / operator-application.yaml: dezky-portal &
  dezky-operator OIDC apps/providers (prod redirect URLs) + the
  dezky-platform-admins group & operator access policy.

Two 2026.5 gotchas handled + documented in README:
- invalidation_flow is now REQUIRED on OAuth2 providers (added via !Find).
- ConfigMap mounts are symlinks (discovery can't read them) → worker uses an
  initContainer that copies them to an emptyDir as real files. (chart
  worker.volumes didn't apply on this version; patch reverts on helm upgrade —
  noted as a durability TODO.)

Client secrets (PORTAL/OPERATOR_OIDC_CLIENT_SECRET) live in authentik-secret;
the apps must reuse them.
2026-06-08 19:46:48 +02:00
Ronni Baslund 406e2ca78b feat(infra): deploy Authentik (auth.dezky.eu) + global HTTP→HTTPS redirect
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
- Authentik on the in-cluster Postgres/Redis (mirrors the dev compose config:
  external DB/Redis, error-reporting off, update-check off, bootstrap admin),
  via the k3s Helm controller; Ingress + cert-manager letsencrypt-prod. Live at
  https://auth.dezky.eu (image 2026.5.2). Secrets generated on-box (Bitwarden).
- Traefik HelmChartConfig: global :80 -> :443 (308) redirect via
  additionalArguments (to=:443, HTTP-01-safe).
- RUNBOOK updated.

Deferred (mirror remaining dev bits): OIDC app blueprints (portal/operator with
prod URLs) + the cosmetic "Powered by Dezky" rebrand.
2026-06-08 19:00:07 +02:00
Ronni Baslund 153d7053ca feat(infra): k3s foundation — cert-manager, Longhorn config, in-cluster data tier
ci / typecheck (map[dir:apps/website name:website]) (push) Failing after 10m58s
ci / typecheck (map[dir:apps/portal name:portal]) (push) Failing after 11m56s
ci / typecheck (map[dir:apps/booking name:booking]) (push) Failing after 14m0s
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Adds the production cluster foundation (authored + applied live on node1):
- cert-manager via the k3s HelmChart controller + letsencrypt staging/prod
  ClusterIssuers (HTTP-01 / Traefik).
- Longhorn config for single-node (values: replica=1, default StorageClass,
  Retain) + backup-to-Hetzner-Object-Storage credential template.
- In-cluster data tier (dezky-data): Postgres 16 (with Authentik+OCIS DB init),
  MongoDB 7, Redis 7 as StatefulSets on Longhorn, + secret template.
- bootstrap.sh: install open-iscsi/nfs-common + enable iscsid (Longhorn prereq).
- RUNBOOK.md: full reproducible node1 build order.

Real secrets are generated on-box and kept in Bitwarden — never in git.
2026-06-08 18:39:31 +02:00
Ronni Baslund 65a68ee126 feat(ocis): persistent sessions + flat primary surfaces
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
- Request offline_access for the ocis-web client (WEB_OIDC_SCOPE) so the web
  SPA gets a refresh token and renews silently instead of dropping the session
  (no surprise logouts; the "no permission to upload" symptom was the
  expired-token state). The ocis-provider already has the offline_access scope
  mapping; its access-token validity is bumped 5m → 1h (refresh 30d).
- Flatten the remaining brand gradients in index.html: the active sidebar
  highlight (.oc-background-primary-gradient) and primary buttons
  (.oc-button-primary-filled) are now solid carbon (text stays light/readable).
- Document the offline_access + token-validity provider settings in
  AUTHENTIK-SETUP.md (the provider lives in Authentik's DB, not git).
2026-06-07 12:34:26 +02:00
Ronni Baslund 8a9fd36f33 feat(ocis): dezky whitelabel theme for the files web UI
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Skin OCIS web in the dezky brand so users don't see ownCloud/Infinite Scale.

- Custom theme.json (WEB_UI_THEME_PATH + WEB_ASSET_THEMES_PATH): dezky name,
  slogan, logos (light wordmark for the dark top bar, dark wordmark for the
  light login, favicon), and the full dezky palette — carbon chrome, signal
  yellow as a sparing accent, paper/bone surfaces, dezky semantic colours
- Pin the light theme as default (single variant) so OS-dark / auto-system
  always resolves to it
- Override only index.html via WEB_ASSET_CORE_PATH (OCIS falls back to the
  embedded core per-file): hide the ".versions" footer ("Infinite Scale … /
  ownCloud Web UI …") and set the pre-hydration <title>/theme-color to dezky

Apache-2.0 lets us drop the ownCloud marks without trademark fees. NOTE:
index.html pins the built bundle hashes — refresh it after an OCIS image bump.
2026-06-07 12:14:04 +02:00
Ronni Baslund b7f10eb092 fix(portal): app launcher opens real per-service hosts
The "Jump to" launcher only navigated for the internal tiles (Personal /
Admin / Partner); every external app (Mail, Drev, Møder, …) just fired a
toast and never opened. Hosts were also hardcoded to *.dezky.com, with Drev
pointing at a vanity "drev." subdomain instead of the real OCIS host.

- Open external apps in a new tab at https://<host>.<baseDomain>
- Derive the base domain from the portal's own hostname so links resolve in
  every environment (app.dezky.local → dezky.local, app.dezky.com → dezky.com)
- Map Drev → files (OCIS); mail/meet/chat/cal/contacts/docs use their service
  subdomain
2026-06-07 12:13:59 +02:00
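The base-domain derivation described in b7f10eb092 can be sketched roughly as
follows. This is a minimal illustration, not the portal's actual code: the
names APP_HOSTS, baseDomainFromHostname and launcherUrl are hypothetical, and
the exact subdomain labels are assumed from the commit message.

```typescript
// Hypothetical tile-id -> subdomain map; labels assumed from the commit text.
const APP_HOSTS: Record<string, string> = {
  mail: "mail",
  drev: "files", // Drev is served by OCIS on the "files" host
  meet: "meet",
  chat: "chat",
  cal: "cal",
  contacts: "contacts",
  docs: "docs",
};

// Drop the portal's own leading label: app.dezky.local -> dezky.local.
// Hostnames with two labels or fewer are returned unchanged.
function baseDomainFromHostname(hostname: string): string {
  const labels = hostname.split(".");
  return labels.length > 2 ? labels.slice(1).join(".") : hostname;
}

// Build https://<host>.<baseDomain>; unknown tiles yield undefined.
function launcherUrl(app: string, portalHostname: string): string | undefined {
  const host = APP_HOSTS[app];
  if (host === undefined) return undefined;
  return `https://${host}.${baseDomainFromHostname(portalHostname)}`;
}
```

In a browser the result would typically be passed to
window.open(url, "_blank", "noopener") to open the app in a new tab.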
Ronni Baslund 98e49bfe34 feat(admin/users): editable member drawer + mailbox & ownership management
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Rebuild the /admin/users detail drawer from a read-only profile into an
editable, Office 365-style panel with four sections:

- Username & mail: read-only primary for mailbox users; editable sign-in
  (Authentik-only) for mailbox-less identities; "Create mailbox" provisions
  a Stalwart inbox for an external-login admin
- Aliases: list/add/remove mailbox aliases (Stalwart), domain-scoped
- Role: member/admin toggle with a primary-account lock (owner, mailbox-less
  bootstrap admin, self) and a last-admin guard
- Contact information: display name, first/last name, phone, alternative
  email — mirrored best-effort to Authentik attributes + mailbox name

Ownership transfer: "Make owner" (row menu + drawer) plus an owner-side
"Transfer ownership" picker, gated to tenant admins / platform admins so a
departed owner can be replaced; promotes the target and demotes the prior
owner to admin.

Backend (platform-api): contact fields on User; AuthentikClient.updateUser;
StalwartClient.setMailboxName; UsersService updateTenantMember,
changeMemberPrimaryEmail, list/add/removeMemberAlias, createMailboxForMember,
transferOwnership; new DTOs and tenant-member routes. All mutations audited.

Portal: Nuxt proxies for the new endpoints + extended TenantUserDoc.
2026-06-07 10:34:53 +02:00
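The role rules in 98e49bfe34 (primary-account lock plus last-admin guard) might
look something like the check below. It is a sketch under assumptions: the
Member shape, the reason strings and canChangeRole are all hypothetical and do
not come from UsersService.

```typescript
type Role = "member" | "admin";

// Hypothetical, simplified view of a tenant member.
interface Member {
  id: string;
  role: Role;
  isOwner: boolean;
  hasMailbox: boolean;
  isBootstrapAdmin: boolean;
}

type RoleChangeResult = { ok: true } | { ok: false; reason: string };

function canChangeRole(
  members: Member[],
  targetId: string,
  actorId: string,
  next: Role,
): RoleChangeResult {
  const target = members.find((m) => m.id === targetId);
  if (!target) return { ok: false, reason: "not_found" };

  // Primary-account lock: owner, mailbox-less bootstrap admin, or self.
  if (target.isOwner) return { ok: false, reason: "owner_locked" };
  if (target.isBootstrapAdmin && !target.hasMailbox) {
    return { ok: false, reason: "bootstrap_admin_locked" };
  }
  if (target.id === actorId) return { ok: false, reason: "self_locked" };

  // Last-admin guard: never demote the only remaining admin.
  if (target.role === "admin" && next === "member") {
    const adminCount = members.filter((m) => m.role === "admin").length;
    if (adminCount <= 1) return { ok: false, reason: "last_admin" };
  }
  return { ok: true };
}
```

Keeping the check a pure function over the member list makes the lock and
guard easy to unit-test separately from the Authentik and Stalwart calls.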
Ronni Baslund 90e8a22de4 feat(scheduling): calendar_failed badge + admin "retry now" action
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
Surface pending/calendar_failed booking states in the admin bookings list with
proper status badges (failed shows the last calendar error as a tooltip), and
add an operator "Retry now" action. The retry re-drives the same Stalwart
calendar write (confirm + attendee email on success); for a terminal
calendar_failed booking it re-claims the slot lock atomically first and refuses
if the time was taken in the meantime, so a manual retry can never double-book.
2026-06-07 09:39:42 +02:00
Ronni Baslund 35bc7b6c31 chore(infra): production manifests + CI for scheduling apps
ci / typecheck (map[dir:apps/booking name:booking]) (push) Has been cancelled
ci / typecheck (map[dir:apps/portal name:portal]) (push) Has been cancelled
ci / typecheck (map[dir:apps/website name:website]) (push) Has been cancelled
ci / typecheck (map[dir:services/platform-api name:platform-api]) (push) Has been cancelled
ci / test (push) Has been cancelled
2026-06-07 09:27:44 +02:00
Ronni Baslund b2c2650af9 test(scheduling): property-based slot tests + guarded Stalwart integration test 2026-06-07 09:23:16 +02:00
Ronni Baslund 8bbb7881a4 feat(scheduling): tenant scheduling overview/analytics 2026-06-07 09:17:01 +02:00
Ronni Baslund 95cbdc4e3d feat(scheduling): round-robin team event types 2026-06-07 09:14:08 +02:00
Ronni Baslund b9b4d56a2d feat(scheduling): tenant webhooks for booking lifecycle 2026-06-07 09:08:45 +02:00
Ronni Baslund e33b7f18a3 feat(scheduling): pluggable captcha (Turnstile) on public booking 2026-06-07 09:02:35 +02:00
Ronni Baslund e1a77b085f feat(scheduling): optional JWT-authed dezky Meet rooms 2026-06-07 08:58:00 +02:00
Ronni Baslund 851018f481 feat(scheduling): date-overrides UI for availability 2026-06-07 08:55:52 +02:00
Ronni Baslund f41475ac3b feat(scheduling): ignoreAllDayEvents option 2026-06-07 08:53:31 +02:00
Ronni Baslund 2cb13a1a14 feat(scheduling): retry calendar writes for pending bookings
A failed Stalwart calendar write during confirmation no longer deletes the
booking + SlotLock. The booking stays 'pending' with its lock retained, and a
new @Cron worker (every 2 min, max 5 attempts by default) re-drives the write:
on success it promotes to 'confirmed' and sends the confirmation email; after
the cap it moves to the terminal 'calendar_failed' state and releases the lock.

Tracks calendarWriteAttempts + lastCalendarError on the Booking. The public
confirm endpoint still throws 503 on a failed first write (preserving the
definition of done: never surface a confirmed booking without a calendar
event); the pending row is left for the background retry to finish.
2026-06-07 08:49:53 +02:00
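The state transitions described in 2cb13a1a14 can be pictured as a small pure
function that the cron worker would call after each write attempt. This is only
a sketch: applyRetryResult, WriteResult and the lockHeld flag are hypothetical,
and whether the initial confirm attempt counts toward the cap is not stated in
the commit, so the sketch simply counts every result passed to it.

```typescript
type BookingStatus = "pending" | "confirmed" | "calendar_failed";

interface Booking {
  status: BookingStatus;
  calendarWriteAttempts: number;
  lastCalendarError: string | null;
  lockHeld: boolean; // hypothetical stand-in for the retained SlotLock
}

type WriteResult = { ok: true } | { ok: false; error: string };

const DEFAULT_MAX_ATTEMPTS = 5; // "max 5 attempts by default"

function applyRetryResult(
  booking: Booking,
  result: WriteResult,
  maxAttempts: number = DEFAULT_MAX_ATTEMPTS,
): Booking {
  // Only pending bookings are re-driven by the worker.
  if (booking.status !== "pending") return booking;

  const attempts = booking.calendarWriteAttempts + 1;
  if (result.ok) {
    // Success: promote to confirmed (the confirmation email is sent elsewhere).
    return { ...booking, status: "confirmed", calendarWriteAttempts: attempts, lastCalendarError: null };
  }
  if (attempts >= maxAttempts) {
    // Cap reached: terminal state, and the slot lock is released.
    return {
      ...booking,
      status: "calendar_failed",
      calendarWriteAttempts: attempts,
      lastCalendarError: result.error,
      lockHeld: false,
    };
  }
  // Still under the cap: stay pending and keep the lock for the next run.
  return { ...booking, calendarWriteAttempts: attempts, lastCalendarError: result.error };
}
```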
Ronni Baslund 9e1defa946 feat(scheduling): booking reminder emails 2026-06-07 00:31:33 +02:00
Ronni Baslund 3831c85285 feat(infra): production host bootstrap and bare-metal Stalwart scaffolding
Host provisioning for the single-server production target: SSH + firewall
hardening (nftables allowlist), k3s node registration, bare-metal Stalwart
install with systemd units and TLS cert-sync from the cluster secret, and
Restic encrypted backup/restore (primary + DR) with timer units. Host-specific
secrets live in config.env (gitignored); config.env.example is the template.
Also gitignores MemPalace per-project files.
2026-06-07 00:19:48 +02:00
Ronni Baslund 5ed3d2bc5f feat(scheduling): dezky Scheduling — Calendly-style booking on Stalwart calendars
First-party booking system on top of Stalwart calendars (no third-party
scheduling dependency). Hosts expose public booking pages; visitors pick a
slot computed from the host's live Stalwart free/busy, and confirming writes
the event to the host's calendar and sends a dezky-branded confirmation with
an .ics.

platform-api (services/platform-api/src/scheduling):
- Schemas: Host, StalwartCredential (AES-256-GCM at rest), AvailabilitySchedule,
  EventType, Booking, SlotLock (unique (hostId,startUtc) + TTL).
- StalwartCalendarModule: JMAP gateway (free/busy via Principal/getAvailability,
  event create/delete, scheduleAgent=client) + on-behalf app-password
  provisioning. CredentialCipher for at-rest encryption.
- DST-correct slot engine (Luxon) with unit tests; two-layer double-booking
  guard (atomic SlotLock + live free/busy re-check).
- Booking confirm/cancel/reschedule, branded email + .ics via JMAP submission,
  self-service manage tokens. /api/v1 public + tenant-gated admin routes,
  per-IP rate limiting.

apps/booking: standalone public, whitelabel booking app (booking.dezky.eu) —
path-based tenant resolution, per-tenant brand colour, booking + manage flows.

apps/portal: admin scheduling page (hosts, event types, availability, bookings
with edit/delete + admin cancel/reschedule) and proxy routes.

infra: booking dev service in docker-compose; scheduling env vars.
2026-06-07 00:17:36 +02:00
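The two-layer double-booking guard from 5ed3d2bc5f (atomic SlotLock, then a
live free/busy re-check) could be sketched as below. An in-memory Map stands in
for the unique (hostId,startUtc) index with TTL, and the busy list stands in
for the live Stalwart free/busy lookup; SlotLocks, tryBook and Interval are
hypothetical names, not the platform-api implementation.

```typescript
// Layer 1: one lock per (hostId, startUtc), expiring after a TTL.
class SlotLocks {
  private locks = new Map<string, number>(); // key -> expiry (epoch ms)

  claim(hostId: string, startUtc: string, ttlMs: number, nowMs: number): boolean {
    const key = `${hostId}|${startUtc}`;
    const expiry = this.locks.get(key);
    if (expiry !== undefined && expiry > nowMs) return false; // already held
    this.locks.set(key, nowMs + ttlMs);
    return true;
  }

  release(hostId: string, startUtc: string): void {
    this.locks.delete(`${hostId}|${startUtc}`);
  }
}

interface Interval {
  startMs: number;
  endMs: number;
}

// Half-open interval overlap: back-to-back meetings do not collide.
function overlaps(a: Interval, b: Interval): boolean {
  return a.startMs < b.endMs && b.startMs < a.endMs;
}

type BookOutcome = "booked" | "slot_locked" | "busy";

function tryBook(
  locks: SlotLocks,
  hostId: string,
  slot: Interval,
  busy: Interval[], // assumed result of a fresh free/busy query
  nowMs: number,
  ttlMs: number,
): BookOutcome {
  const startUtc = new Date(slot.startMs).toISOString();
  if (!locks.claim(hostId, startUtc, ttlMs, nowMs)) return "slot_locked";

  // Layer 2: re-check the calendar after the lock is held.
  if (busy.some((b) => overlaps(b, slot))) {
    locks.release(hostId, startUtc);
    return "busy";
  }
  return "booked";
}
```

The lock closes the race between two visitors confirming the same slot, while
the re-check catches events that landed on the host's calendar outside the
booking system after the slot list was computed.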