Real backend for the flags page (was pure mock). Built so it's ready for
the first risky rollout (likely the Stalwart JMAP client or the Stripe
billing engine).
services/platform-api:
- Flag schema (key, description, state, pct, scope.{plans, tenantSlugs,
partnerSlugs, environments}, embedded history capped at 20)
- FlagsService with CRUD + evaluateAll(tenantSlug) → { key: bool }
Eval algorithm:
off → false; on → true
targeted → require non-empty scope (empty allowlist means "nobody"),
then match every non-empty axis
rollout → match scope, then sha256(`${tenantId}:${key}`) % 100 < pct
Hash-based rollout is deterministic: bumping pct only flips the new
slice. Pure helpers (matchesScope, hasAnyScope, inRolloutBucket) are
exported for future unit tests.
- FlagsController exposes GET /flags, GET /flags/:key, POST /flags/evaluate
(JwtAuthGuard); POST/PATCH/DELETE require OperatorGuard. History entries
capture the actor's email.
- SeedService idempotently creates 10 flag keys mapping to real Dezky
concerns (jmap_native_v2, gdpr_export_v2, new_billing_engine, etc.).
$setOnInsert so operator edits survive restarts.
apps/operator:
- 6 proxies: /api/flags index get/post, [key] get/patch/delete, evaluate post
- types/flag.ts with the shape that mirrors the backend
- pages/flags.vue: useFetch real list, row click opens FlagDetail,
"New flag" opens NewFlagModal, scope summary column shows targeting
at a glance
- FlagDetail.vue: side panel with segmented state, rollout slider with
live "~N of M tenants" preview from /api/tenants, plan/tenant/env chip
pickers, dirty-tracked Save, instant Kill-switch (PATCH state=off+pct=0),
embedded change history
- NewFlagModal.vue: minimal create form (key + description). Everything
else is configured in the detail panel afterward.
- CommandPalette: feature-flag rows now come from /api/flags instead of
the dropped fixture, so newly-created flags are searchable immediately
- data/fixtures.ts: drop FLAGS / FeatureFlag exports (replaced by the
real backend)
Smoke-tested end-to-end: list renders 10 seed flags, opening gdpr_export_v2
and flipping to rollout 25% then saving persists + adds a history entry,
kill-switch sets state=off in one click, /api/flags/evaluate returns the
correct booleans for the seeded tenant, same tenant gets the same answer
on consecutive evals (determinism), and creating + deleting a flag through
the UI roundtrips correctly.
The Infrastructure page used to read from a mock fixture that lied two ways:
it listed services that aren't deployed (Jitsi, Zulip, Cloudflare, Object
Storage, Postmark) and showed hardcoded uptime/latency for the ones that
are. Now it shows truth from real probes plus a clearly-labelled "planned"
section for the rest.
Backend (services/platform-api):
- New src/health/ module — HealthService runs 9 probes in parallel with a
1.5s timeout each:
Stalwart → TCP stalwart:8080
OCIS → HTTP GET ocis:9200/health
Collabora → HTTP GET collabora:9980/hosting/discovery
Authentik → HTTP GET authentik-server:9000/-/health/ready/
Postgres → TCP postgres:5432
Mongo → existing Mongoose connection.db.admin().ping()
Redis → TCP redis:6379
Traefik → TCP traefik:80
Platform API → trivially ok (this code is running)
Status thresholds: ok ≤500ms, warn 500–1500ms, bad on timeout/refuse.
- HealthController exposes GET /health/platform behind JwtAuthGuard, plus
keeps the existing public GET /health for infra liveness checks.
- Moved the old src/health.controller.ts into the new module.
Frontend (apps/operator):
- /api/health/platform proxy forwards the operator's access token.
- Infrastructure page swaps SERVICES fixture for useFetch with 30s auto-
refresh + a manual Refresh button. Cards show real status badge + real
latency; uptime/error stay as em-dash with a "no probe history yet"
tooltip until a Prometheus/event-log backend lands.
- Below the live grid, a "Planned · not deployed" section renders 5 dimmed
cards (Jitsi, Zulip, simpledns.plus, Hetzner Object Storage, Postmark).
simpledns.plus replaces the misnamed Cloudflare entry — we use
simpledns.plus, not Cloudflare.
- Subtitle is now truthful: "8 / 9 services live · checked 2s ago".
Verified: stopped redis → card flipped to "down · getaddrinfo ENOTFOUND
redis", subtitle reflected 8/9, incident banner appeared. Restarted →
back to 9/9, banner gone.
SERVICES fixture stays in place for Overview's incident banner — replacing
that is a separate follow-up tied to the incident-management backend.
- Add _verify-token.get.ts to both operator and portal — decodes the
access token stored in the nuxt-oidc-auth session and echoes iss/aud/
sub/groups. Used to confirm operator tokens carry aud=dezky-operator
and portal tokens carry aud=dezky-portal. Listed in NEXT-STEPS.md as
throwaway, to be removed when proper verification surfaces exist.
- OPERATOR-PLAN.md O.9 marked done with the actual claims captured + the
Mongo-side verification of attach + suspend flows.
- NEXT-STEPS.md: replaced the "Operator portal — out-of-band track"
section with a "shipped + follow-ups" version. The 9-item follow-up
list (impersonation, audit, flags, incidents, support, partner
portal, env switcher, on-call, workspace impersonation) is now the
authoritative roadmap, not buried inside OPERATOR-PLAN.md.
- Overview (pages/index.vue): KPIs from real /tenants /partners /users,
status meter, recent + needs-follow-up tables. Mock activity stream and
incident banner overlay come from data/fixtures.ts.
- Operator team: real GET /users filtered to platformAdmin === true,
with last-seen + tenant counts.
- Users (global): real read with All/Admins/Inactive views and search.
- Infrastructure / Feature flags / Audit: mock fixtures only — wiring to
real backends (Prometheus, OpenFeature, append-only audit) is tracked
as follow-ups in OPERATOR-PLAN.md.
- Placeholder pages (support/billing/reports/settings) via OpPlaceholder.
- Shared: Stat, MetricCell, OpPlaceholder components, /api/users proxy,
PlatformUser type.
- .gitignore: scope the docker volumes data/ rule so apps/*/data/ is
tracked again (operator carries mock fixtures there).
- Partners list with name/domain/status/customers/margin + Create modal
- Partner detail: contract card, contact card, customers table, attach modal,
terminate (soft-delete) danger card
- Operator proxies for /partners + /partners/:slug/tenants
- platform-api: add partnerId Prop to Tenant schema. The field was being
silently dropped by Mongoose because the schema didn't declare it.
- tenants.service: rewrite update() to build $set/$unset explicitly and cast
partnerId via new Types.ObjectId(). Handles detach via $unset so the field
vanishes from the doc cleanly.
Operator can now manage tenants end-to-end from the UI:
- pages/tenants/index.vue — list with status/plan/domains/created/
provisioning-state columns, search by slug or name, status chips
with live counts (all/active/pending/suspended), click-through
to detail
- pages/tenants/[slug].vue — 7-tab detail (Overview, Users, Resources,
Billing, Audit, Support, Danger zone)
- 3 tabs hit real backends: Overview (identity + billing fields),
Users (lazy-loaded via new GET /tenants/:slug/users endpoint),
Resources (live provisioning state per integration + Reconcile button)
- 3 tabs render mock fixtures with warn-tone "mock" badges: Billing
(Stripe placeholder), Audit (sample log lines), Support (placeholder
pending the ticket queue work)
- Danger zone: 3 real-backend cards (Suspend / Resume / Soft-delete),
each gated by a ConfirmDialog modal. Verified live — clicked
Suspend on acme, status flipped to 'suspended' in Mongo, then
Resumed back to 'active'
platform-api additions:
- GET /tenants/:slug/users returns users with this tenant in their
tenantIds, sorted by last login. Same authorization rule as the
existing /tenants/:slug — platform admins always pass,
non-admins must be a member of the tenant
- tenants.module imports User schema for the new lookup
New components (apps/operator/components/):
- Tabs.vue — horizontal strip with optional per-tab counts, v-model
- ConfirmDialog.vue — Teleport-to-body modal, Escape/backdrop close,
danger/primary tone for the confirm button
Server proxy infrastructure (apps/operator/server/):
- utils/platform-api.ts — single helper encapsulating
access-token-from-session + bearer-forward + error normalization.
Every operator proxy route is now a one-liner against this helper
- api/tenants/index.get.ts, [slug]/{index.get,index.patch,index.delete,
users.get,suspend.post,resume.post,reconcile.post}.ts
Two real bugs found and fixed during the smoke test:
- Mongoose subdocument `_id` leaks into JSON when iterating
tenant.provisioningStatus. Switched to an explicit
`['authentik', 'stalwart', 'ocis']` whitelist in both v-fors
- Documents created before provisioningErrors was added (like the
acme tenant) don't have the field at all in JSON. Use optional
chaining (`tenant.provisioningErrors?.[k]`) instead of bracket
access. Without it: 'Cannot read properties of undefined (reading
"authentik")' during the Resources tab render
New Nuxt 3 app at apps/operator/ — internal admin portal on its own domain
(operator.dezky.local), own OAuth client (dezky-operator), own session
secrets, own cookies. Customer and operator surfaces can't decrypt each
other's session state.
OAuth flow verified end-to-end:
- GET / → middleware redirect to /auth/login
- User clicks Sign in → /auth/oidc/login → bounces to Authentik with
client_id=dezky-operator, scope includes 'groups'
- Authentik checks dezky-platform-admins group binding (added in O.1),
silent-reauths via the existing auth.dezky.local session
- Returns to /auth/oidc/callback with code, exchanges for token,
creates session cookie on operator.dezky.local
- Lands on pages/index.vue placeholder dashboard
Smoke test 'Create partner "test-partner"' button on the placeholder home
exercises the full operator-only authorization chain:
- 1st call: 200, partner created in Mongo
- 2nd call: 409 'already exists' (idempotency holds, token still valid)
- Same call from the customer portal: 403 'requires operator-scoped
token' (audience guard rejects dezky-portal aud)
JwtAuthGuard now multi-issuer in addition to multi-audience. Each
Authentik OAuth provider mints tokens with its own per-app iss URL
(.../application/o/<slug>/), so the guard accepts a comma-separated
AUTHENTIK_ISSUER. The audience-only fix from O.2 wasn't sufficient —
issuer is validated separately by jose.jwtVerify and was still pinned
to dezky-portal alone, yielding 'unexpected iss claim value' rejections.
Compose changes: new 'operator' service (Node 20 alpine, pnpm install +
nuxt dev, mkcert CA mount, traefik labels for operator.dezky.local +
TLS); new operator_node_modules volume; operator.dezky.local added to
traefik's Docker network aliases. Distinct OPERATOR_NUXT_OIDC_* session
secrets pulled from .env (gitignored, generated via openssl).
Real operator screens (sidebar, topbar, tenants, partners, etc.) come
in O.4. This commit is pure scaffolding + the security boundary proof.