dezky/docs/OPERATOR-PLAN.md
Ronni Baslund 8e6f73a921 feat(operator): design system port + persistent shell (O.4)
Operator portal now wears its real chrome instead of placeholder spans.
Sidebar + topbar + page header all rendered against the carbon palette
from tokens.css.

Components ported from the source design (operator-app.jsx,
platform-ui.jsx, operator-screens.jsx) as Vue 3 SFCs in
apps/operator/components/:

  Foundation: NodeMark (copied from portal), UiIcon (expanded to 31 icons
  covering sidebar/topbar/sort/arrows)

  Primitives: Card (3 surface variants), UiButton (primary / secondary /
  ghost / dark / danger × sm / md / lg), DataTable (header + rows),
  Badge (7 tones), Avatar (deterministic palette by name hash), Mono,
  Eyebrow, StatusDot, PageHeader (with actions slot)

  Shell: OpSidebar (collapsible 232<->56px, 12 nav items in 4 sections,
  active-row highlight from route, badge slot, brand + user footer);
  OpTopbar (env badge with prod/staging/dev variants, palette trigger
  stub for the ⌘K work in O.8, on-call pill, bell, avatar)

Layouts: layouts/default.vue wires sidebar + topbar + slot; layouts/blank.vue
is used by the login page (definePageMeta layout:'blank'). app.vue now
wraps NuxtPage in NuxtLayout (the missing piece — without it Nuxt warns
"Your project has layouts but the <NuxtLayout /> component has not been
used" and renders nothing chrome-wise).

Composable composables/useSidebar.ts holds the collapsed state shared
between OpSidebar's toggle button and layouts/default.vue's ⌘[ keyboard
shortcut.

Verified in the browser:
  - Sidebar renders all 12 nav links with section dividers, env badge shows
    PROD, PageHeader resolves to the user's display name from
    useOidcAuth().user
  - Collapse toggle flips sidebar width 232↔56; nav rows become icon-only
  - Smoke test on the placeholder home still returns 409 for the seeded
    test-partner (token forwarding survives the layout refactor)

Gotcha documented in the plan: Vite 7.3 added a strict
server.allowedHosts check that returns plaintext 403 for any host header
that isn't the dev origin. The customer portal pre-dates this Vite
version; operator needs allowedHosts: ['operator.dezky.local'] in
nuxt.config.ts under vite.server.

Pages/index.vue replaces the bare HTML placeholder from O.3 with the
new PageHeader + Card primitives — same smoke-test functionality, much
better visual fidelity.

Real screen content (Tenants, Partners, Infrastructure, etc.) lands in
O.5+. This commit is the chrome, the smoke test, and the verification
that the design system primitives compose correctly.
2026-05-24 07:32:08 +02:00


Operator Portal — Plan

operator.dezky.local (dev) → operator.dezky.com (prod). Internal admin portal for Dezky staff: managing tenants, partners, operating the platform.

Distinct from the customer portal at app.dezky.local. Different OAuth client, different cookie domain, different surface — though they share Authentik as the IdP and (eventually) platform-api as the backend.

This file is the running record of decisions made during the design grilling session. Updated inline as questions resolve.


Scope — C-visual with real management for Tenants + Partners

Decision: build every screen from the source design visually, but back two domains with real CRUD from day one — Tenants and Partners. Everything else renders against mock-data fixtures until its backend is built.

| Surface | Day-1 state |
|---|---|
| Overview / dashboard | Visual — aggregates from real Tenant+Partner data where available, mock for the rest |
| Tenants (list + detail with 7 tabs) | Real backend, full CRUD, suspend/resume/delete |
| Partners (list + detail) | Real backend, new schema, full CRUD |
| Users (global) | Real read across tenants (already in DB) |
| Support queue | Mock |
| Platform billing | Mock |
| Reports | Mock |
| Infrastructure | Visual; could derive from Docker health checks but probably mock initially |
| Feature flags | Mock |
| Audit log | Mock (real backfill is a follow-up) |
| Operator team | Real (Users with platformAdmin: true) |
| Platform settings | Mock |
| Command palette ⌘K | Visual — opens, navigates, but "execute action" just toasts |
| Impersonation modal + banner | Visual — confirms the action but doesn't actually mint a token |
| Incident modal | Mock |
| Env switcher (prod/staging/dev) | Cosmetic — picks a label, no real env switch |
| On-call indicator | Mock |

Real-backend surface this adds

Two genuinely new things on the backend:

  1. Partner schema and CRUD in services/platform-api — id, name, domain, status, customers count (computed), MRR (computed), margin, sinceDate. Tenants gain an optional partnerId field. The existing dezky seed gets no partner.
  2. Tenant lifecycle actions beyond create — suspend, resume, change plan, change seat cap, soft-delete with grace period. Existing schema covers most of this; controllers need new methods.

Everything else (incidents, flags, support tickets, audit log collection, impersonation tokens) stays mock until explicitly promoted.


Lives at apps/operator/ — separate Nuxt app

Decision: new Nuxt 3 app, separate package.json, separate Traefik route at operator.dezky.local. Reuses design tokens / NodeMark / UiIcon by copy for now; a packages/ui workspace is a likely follow-up once we have a third consumer.

Why separate, not a route group in apps/portal/: security boundary. The moment any operator-only feature mutates customer state (impersonation, suspend tenant), a routing or middleware bug on a shared app is catastrophic. Separate apps make that nearly impossible. Different cookies, different OIDC client, different domain.

Cost: one more docker-compose service, ~10 lines of Traefik labels, one more volume for node_modules. Some duplicated dev tooling (eslint, tsconfig).


Auth — new dezky-operator Authentik OAuth provider

Decision: a dedicated OAuth client in Authentik, distinct from dezky-portal.

  • New provider dezky-operator (confidential, PKCE on)
  • Redirect URIs: https://operator.dezky.local/auth/oidc/callback
  • Group binding: dezky-platform-admins required at the provider's authorization flow (Authentik policy), so non-admins can't even consent
  • Stricter policies attached only to this provider: MFA required, future IP allowlist for the office network/VPN
  • Token audience claim: dezky-operator
  • Provisioning's JwtAuthGuard widens its audience check to a list: ['dezky-portal', 'dezky-operator']
  • Per-endpoint guard for operator-only mutations: require aud === 'dezky-operator' AND actor.platformAdmin === true. The audience check makes "is this a privileged session" provable from the token alone, independent of the DB lookup
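The per-endpoint rule in the last bullet can be sketched as a pure function. This is a minimal illustration, not the actual guard: the `TokenClaims` and `Actor` shapes and the function names are assumptions, and only the `aud` value and the `platformAdmin` flag come from the plan.

```typescript
// Hypothetical shapes for illustration; only `aud` and `platformAdmin` are named in the plan.
interface TokenClaims {
  sub: string;
  aud: string | string[]; // a JWT `aud` claim may be a single string or an array
}

interface Actor {
  platformAdmin: boolean;
}

const OPERATOR_AUDIENCE = "dezky-operator";

// True when the token carries the given audience, whichever form `aud` takes.
function hasAudience(claims: TokenClaims, audience: string): boolean {
  const auds = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  return auds.some((a) => a === audience);
}

// Both conditions must hold: the token proves a privileged session on its own,
// and the DB-backed actor record confirms the platformAdmin flag.
function isOperatorRequest(claims: TokenClaims, actor: Actor | null): boolean {
  return hasAudience(claims, OPERATOR_AUDIENCE) && actor?.platformAdmin === true;
}
```

Keeping the two checks in one predicate means a platform admin holding a customer-portal token is still rejected, which is the point of the audience requirement.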

UX trade-off accepted: if Ronni (or any operator who is also a customer) wants to be in both apps, they log into Authentik twice — once per audience. Correct security-wise, fine ergonomically.


Backend stays as one service — rename to services/platform-api

Decision: route all operator mutations and reads through the existing NestJS service (no second backend, no Nitro-direct-to-Mongo). Rename services/provisioning → services/platform-api because the service now owns more than just provisioning: it is the platform's data + control plane.

What changes during the rename:

  • Directory: services/provisioning/ → services/platform-api/
  • Package: @dezky/provisioning → @dezky/platform-api
  • Docker container name: dezky-provisioning → dezky-platform-api
  • Compose service key, network alias, volume names
  • Portal env var: PROVISIONING_INTERNAL_URL → PLATFORM_API_INTERNAL_URL
  • Portal proxy routes: http://provisioning:3001 → http://platform-api:3001
  • Internal module names referencing "provisioning" stay (e.g. ProvisioningService is now one orchestration concern inside platform-api; not the whole service's purpose)
  • Public URL stays api.dezky.local (Traefik routes by Host header, unaffected)

New endpoints platform-api gains in this phase:

  • POST /tenants/:slug/suspend, POST /tenants/:slug/resume
  • PATCH /tenants/:slug already exists; ensure it can change plan / seat cap
  • GET /partners, POST /partners, GET /partners/:slug, PATCH /partners/:slug
  • Tenant.partnerId foreign key + filter on tenant queries
  • JwtAuthGuard accepts both dezky-portal and dezky-operator audiences; per-endpoint requirement of dezky-operator aud for operator-only mutations
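The suspend/resume endpoints and the soft-delete grace period imply a small state machine. The sketch below is one way to express it, under stated assumptions: the `active` and `deleted` status names, the `deletedAt` field, and the 30-day grace period are hypothetical (only `suspended` appears in the plan).

```typescript
// 'suspended' is named in the plan; 'active' and 'deleted' are assumed labels.
type TenantStatus = "active" | "suspended" | "deleted";

interface TenantState {
  status: TenantStatus;
  deletedAt?: Date; // hypothetical field marking the start of the grace period
}

type LifecycleAction = "suspend" | "resume" | "softDelete";

// Assumed value; the plan mentions a grace period but not its length.
const GRACE_PERIOD_DAYS = 30;

// Returns the next state, or throws when the transition is not allowed.
function applyLifecycle(t: TenantState, action: LifecycleAction, now: Date): TenantState {
  switch (action) {
    case "suspend":
      if (t.status !== "active") throw new Error(`cannot suspend a ${t.status} tenant`);
      return { status: "suspended" };
    case "resume":
      if (t.status !== "suspended") throw new Error(`cannot resume a ${t.status} tenant`);
      return { status: "active" };
    case "softDelete":
      if (t.status === "deleted") throw new Error("tenant is already deleted");
      return { status: "deleted", deletedAt: now };
  }
}

// A soft-deleted tenant only becomes eligible for hard deletion after the grace period.
function isPurgeable(t: TenantState, now: Date): boolean {
  if (t.status !== "deleted" || !t.deletedAt) return false;
  const elapsedMs = now.getTime() - t.deletedAt.getTime();
  return elapsedMs >= GRACE_PERIOD_DAYS * 24 * 60 * 60 * 1000;
}
```

Rejecting invalid transitions in one place would let the controller map the thrown error to a 4xx response instead of silently re-applying a state.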

Strategy: rename in a separate prep commit before the operator work starts, so the rename diff is mechanical and reviewable on its own.


Partner schema

@Schema({ collection: 'partners', timestamps: true })
class Partner {
  slug: string                     // 'nordicmsp', URL-safe, unique
  name: string                     // 'NordicMSP'
  domain: string                   // 'nordicmsp.dk' — partner's own org domain
  status: 'active' | 'in-negotiation' | 'paused' | 'terminated'  // default 'in-negotiation'
  marginPct: number                // 20 = partner keeps 20% of customer MRR; one number per partner
  partnershipStartedAt?: Date
  contactInfo: { primaryName?: string; primaryEmail?: string; billingEmail?: string }
  billingInfo: { /* same shape as Tenant.billingInfo */ }
}

Tenant side: add partnerId?: Types.ObjectId (ref Partner, indexed, optional). Direct customers have no partnerId; partner-owned customers reference one.

Computed at query time, not stored:

  • Partner.customers — count of tenants with partnerId === this._id
  • Partner.mrr — sum of those tenants' MRR

Storing denormalized would force write-time syncing on every tenant create/suspend/plan-change for ~zero benefit at our scale.
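As a rough illustration of the query-time computation, here is an in-memory sketch. In platform-api this would more likely be a Mongo aggregation; the `TenantRow` shape, the `monthlyAmount` field, and the `partnerShare` figure are assumptions made for the example, and ids are plain strings rather than ObjectIds.

```typescript
// Hypothetical row shape; `monthlyAmount` is an assumed per-tenant MRR field.
interface TenantRow {
  slug: string;
  partnerId?: string; // direct customers have no partnerId
  monthlyAmount?: number;
}

interface PartnerStats {
  customers: number; // count of tenants referencing this partner
  mrr: number; // sum of those tenants' monthly amounts
  partnerShare: number; // marginPct of that MRR, e.g. 20 means 20%
}

function partnerStats(partnerId: string, marginPct: number, tenants: TenantRow[]): PartnerStats {
  const owned = tenants.filter((t) => t.partnerId === partnerId);
  const mrr = owned.reduce((sum, t) => sum + (t.monthlyAmount ?? 0), 0);
  return { customers: owned.length, mrr, partnerShare: (mrr * marginPct) / 100 };
}

// Usage: two partner-owned tenants and one direct customer.
const exampleStats = partnerStats("p1", 20, [
  { slug: "acme", partnerId: "p1", monthlyAmount: 100 },
  { slug: "globex", partnerId: "p1", monthlyAmount: 50 },
  { slug: "direct-co", monthlyAmount: 200 },
]);
```

Because nothing is stored on the partner, a tenant create, suspend, or plan change needs no second write to keep these numbers correct.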

Operator-only. A self-serve partner portal at partner.dezky.local is a future surface; not in this phase. Partners are visible/manageable only from the operator app.


Impersonation — visual stub now, real flow later

Decision: build the UI exactly as designed (modal with reason field, top banner, exit button) but do not wire actual token exchange. The confirm action toasts "impersonation not implemented yet" and writes a mock audit entry.

Why now: validates the UX, lets future hires see the operator surface end-to-end, doesn't introduce a dangerous capability before there's an operational need.

Mitigations against confusion:

  • Modal carries a Demo only badge — same styling as other stub-data badges in the operator UI
  • Toast on confirm makes the no-op explicit
  • The banner does display in mock mode (so we can iterate on its design), but the underlying session state is local to the operator tab

Real flow design recorded for the follow-up: OAuth 2 Token Exchange (RFC 8693). Authentik supports it. Customer portal needs to accept tokens carrying an act claim alongside sub, and show its own impersonation banner when the two differ. ~2 days of careful work + security review.
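For the follow-up, the customer portal's side of this could look roughly like the sketch below. RFC 8693 defines `act` as a JSON object whose `sub` identifies the acting party; the interface and function names here are hypothetical.

```typescript
// Claim shapes per RFC 8693: `sub` is the user being acted on, `act.sub` the acting party.
interface ActClaim {
  sub: string;
}

interface AccessTokenClaims {
  sub: string;
  act?: ActClaim;
}

interface ImpersonationInfo {
  impersonating: boolean;
  subject: string; // the customer identity the session runs as
  actor?: string; // the operator behind the session, when impersonating
}

// The banner should show only when an actor is present and differs from the subject.
function impersonationInfo(claims: AccessTokenClaims): ImpersonationInfo {
  const actor = claims.act?.sub;
  if (actor !== undefined && actor !== claims.sub) {
    return { impersonating: true, subject: claims.sub, actor };
  }
  return { impersonating: false, subject: claims.sub };
}
```

Returning the actor alongside the flag gives the banner the originating operator identity and gives the audit trail both parties in one value.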


Decisions made without grilling (small, low-risk)

  • Theme: dark by default. Existing apps/portal/assets/styles/tokens.css already defines [data-theme='dark'] tokens; the operator app sets <html data-theme="dark"> at app root and reuses them
  • Mock data location: TypeScript files under apps/operator/data/ (tenants-mock.ts, partners-mock.ts, flags-mock.ts, etc.). Same shape as operator-data.jsx from the design bundle, just retyped
  • Design system reuse: copy NodeMark.vue, UiIcon.vue, and the auth components into apps/operator/components/ directly. A shared packages/ui workspace becomes worth doing once a third surface needs them (partner portal? landing site?)
  • OCIS / Stalwart admin shortcuts in operator UI: out of scope for this phase. Operator drills via the customer-facing service URLs

Follow-up tasks (post-MVP)

In rough priority order:

  1. Real impersonation flow — OAuth Token Exchange (RFC 8693), customer portal act-claim handling, audit on entry+exit, banner with origin operator identity
  2. Real audit log collection — replace mock fixtures with a platform_audit collection in Mongo that platform-api writes on every privileged action; stream from there in the operator UI
  3. Feature flag backend: Flag schema + per-tenant rollout state + a tiny flag-eval client every service imports
  4. Incident management backend: Incident schema + paging integration (PagerDuty / OpsGenie / custom). Until then, the incident modal renders from mock
  5. Support ticket queue: SupportTicket schema + email-in ingestion from a dedicated mailbox via Stalwart
  6. Self-serve Partner portal at partner.dezky.local — Phase 6+ work, own Nuxt app, own OAuth client, scoped to a partner's own customers
  7. Real environment switcher — currently cosmetic; would need separate API endpoints per env, separate Authentik tenants, etc.
  8. Real on-call indicator — integration with the paging system that gets installed in (4)
  9. Operator workspace impersonation in OCIS/Stalwart — operator tooling reaches into the customer's file storage and mail for support, with the same audit trail as portal impersonation

Out of scope for this entire effort

  • Multi-region operator UI
  • Read-only investor / board mode (a real persona but build it when there's a real investor — design has a placeholder "Read-only" role for Jonas Berg)
  • White-label of the operator portal (partners get their own portal eventually; Dezky operator never gets white-labeled — it's our internal tool)

Execution checklist

Tick boxes as work lands. Each phase is roughly one commit. Phases must be done in order — earlier ones unblock later ones.

O.0 · Prep — service rename ✓

  • Rename services/provisioning/ → services/platform-api/
  • Update package.json name → @dezky/platform-api
  • Update docker-compose.yml: container name, service key, volume name, env var PROVISIONING_INTERNAL_URL → PLATFORM_API_INTERNAL_URL, NUXT_API_BASE points at new hostname
  • Update portal proxy routes to read PLATFORM_API_INTERNAL_URL and default to http://platform-api:3001
  • Sweep docs (README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md, NEXT-STEPS.md, TROUBLESHOOTING.md) for stale references
  • Verify customer portal /api/me still works end-to-end after rename

O.1 · Authentik — operator OAuth client ✓

  • Create dezky-operator OAuth provider via Authentik API
  • Set redirect URIs to https://operator.dezky.local/auth/oidc/{callback,logout}
  • Confidential client; client_secret persisted to .env as OPERATOR_OIDC_CLIENT_SECRET
  • Dezky Operator application created and linked to the provider
  • Group binding on the application: dezky-platform-admins required to reach the consent screen. (Authentik 2025.10 supports group-direct policy bindings — no separate policy_group_membership object needed)
  • Deferred to follow-up: MFA-required policy on this provider. Authentik does this via a stage binding on the authentication flow, which is app-specific configuration we'll wire when there's an actual MFA enrollment to gate against. For dev with one akadmin, akadmin already has WebAuthn — the auth flow prompts for it automatically
  • Discovery doc verified at /application/o/dezky-operator/.well-known/openid-configuration — issuer correct, scopes include groups, all endpoints resolve

Gotchas worth noting

  • Authentik 2025.10 requires both authorization_flow AND invalidation_flow when creating OAuth2 providers. The default invalidation flow is at /api/v3/flows/instances/?designation=invalidation (slug default-provider-invalidation-flow)
  • The policies/group_membership/ endpoint mentioned in older Authentik docs is gone in 2025.10. Use policies/bindings/ with a direct group reference instead

O.2 · platform-api — multi-audience + Partner CRUD ✓

  • JwtAuthGuard: accepts comma-separated AUTHENTIK_AUDIENCE (dezky-portal,dezky-operator). Both audiences validate; per-endpoint guards further restrict
  • OperatorGuard (not a decorator — a regular CanActivate guard) enforcing aud includes 'dezky-operator' && actor.platformAdmin. Applied via @UseGuards(JwtAuthGuard, OperatorGuard)
  • schemas/partner.schema.ts — Partner model
  • partners/ module: controller + service + DTOs (create / read / update / soft-terminate / list tenants under partner)
  • partnerId?: Types.ObjectId added to Tenant schema (indexed, sparse). UpdateTenantDto accepts partnerId to attach/detach
  • Partner.customers aggregated at query time (count of Tenants by partnerId). MRR aggregation deferred — Tenant has no monthly amount yet and Subscription lacks a price column. Will land when Subscription gains pricing
  • Tenant lifecycle endpoints: POST /tenants/:slug/suspend, POST /tenants/:slug/resume (operator-only). PATCH already accepts plan/domains/partnerId changes
  • Smoke test: customer-portal token → POST /partners returns 403 "This endpoint requires an operator-scoped token" ✓. Positive test (operator token → 200) deferred until O.3 when the operator app exists to mint that token

O.3 · Scaffold apps/operator/ ✓

  • apps/operator/package.json (Nuxt 3, nuxt-oidc-auth 1.0.0-beta.11)
  • nuxt.config.ts wired against the dezky-operator Authentik provider: client_id=dezky-operator, audience claim becomes dezky-operator, scope includes groups, exposeAccessToken: true so the Nitro proxy can forward it
  • Docker compose service operator running on the dezky network, mkcert root CA mounted, Traefik route at operator.dezky.local
  • Network alias on Traefik: operator.dezky.local
  • operator.dezky.local added to /etc/hosts
  • Distinct session secrets in .env (OPERATOR_NUXT_OIDC_*) — the two apps can't decrypt each other's session cookies
  • Verified login: signing in lands on the placeholder index showing Operator portal · placeholder with the user's identity
  • Smoke test POST /partners: operator session returns 200 (partner created in Mongo), idempotent re-call returns 409 (already exists), customer-portal session returns 403 ("requires operator-scoped token")
  • JwtAuthGuard extended to accept multi-issuer as well as multi-audience (each Authentik OAuth provider has its own per-app iss URL); AUTHENTIK_ISSUER env is now comma-separated. The audience change in O.2 wasn't enough on its own — issuer matching is separate
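The multi-issuer and multi-audience matching described in the last bullet can be sketched as follows. This is an assumption-laden illustration rather than the real JwtAuthGuard: the helper names are invented, and the issuer URLs in the usage lines are placeholders, not the project's actual Authentik host.

```typescript
// Splits a comma-separated env value into trimmed, non-empty entries.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

interface IssuedClaims {
  iss: string;
  aud: string | string[];
}

// Issuer and audience are checked separately; a token must satisfy both lists.
function isAccepted(claims: IssuedClaims, allowedIssuers: string[], allowedAudiences: string[]): boolean {
  const auds = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  const issuerOk = allowedIssuers.indexOf(claims.iss) !== -1;
  const audienceOk = auds.some((a) => allowedAudiences.indexOf(a) !== -1);
  return issuerOk && audienceOk;
}

// Usage with placeholder issuer URLs (idp.example is not the real host).
const allowedIssuers = parseList(
  "https://idp.example/application/o/dezky-portal/, https://idp.example/application/o/dezky-operator/",
);
const allowedAudiences = parseList("dezky-portal,dezky-operator");
```

A stricter variant could pair each issuer with the single audience it is expected to mint, so that a portal-issued token can never pass with an operator audience.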

O.4 · Design system + app shell ✓

  • assets/styles/tokens.css carbon-default (done in O.3)
  • assets/styles/base.css (done in O.3)
  • NodeMark.vue (copied unchanged from portal), UiIcon.vue (expanded set: 31 icons covering sidebar/topbar/sort/arrows)
  • Shared primitives: Card, UiButton (5 variants × 3 sizes), DataTable, Badge (7 tones), Mono, Eyebrow, StatusDot, Avatar (deterministic palette), PageHeader
  • OpSidebar.vue — collapsible (232↔56px), 12 nav items in 4 sections, active-row highlight from route, badge slot per item, brand mark + user identity footer
  • OpTopbar.vue — env badge (prod/staging/dev), ⌘K palette trigger stub, on-call pill, bell, avatar
  • layouts/default.vue wires sidebar + topbar + <slot />; layouts/blank.vue for the login page; app.vue uses <NuxtLayout>
  • Keyboard shortcut: ⌘[ collapses/expands sidebar (verified — width flips 232↔56 in the browser via the toggle click). ⌘K palette lands in O.8
  • Verified in browser: shell renders with all 12 nav links, env badge shows PROD, PageHeader title resolves to the user's display name, smoke test re-confirmed 409 on the seeded test-partner (token forwarding still works after the layout refactor)
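The "deterministic palette" behaviour of Avatar can be illustrated with a small hash-to-index function. The palette colours, the hash choice, and the function names below are placeholders for the sake of the example; the ported component may pick colours differently.

```typescript
// Placeholder colours; the real palette would come from the design tokens.
const AVATAR_PALETTE = ["#4f46e5", "#0891b2", "#16a34a", "#ca8a04", "#dc2626", "#9333ea"];

// Simple polynomial string hash, kept within unsigned 32-bit range.
function hashName(name: string): number {
  let h = 0;
  for (let i = 0; i < name.length; i++) {
    h = (h * 31 + name.charCodeAt(i)) >>> 0;
  }
  return h;
}

// The same name always maps to the same palette entry, across reloads and across users.
function avatarColor(name: string) {
  const key = name.trim().toLowerCase();
  return AVATAR_PALETTE[hashName(key) % AVATAR_PALETTE.length];
}
```

Normalising case and whitespace before hashing keeps "Ronni Baslund" and " ronni baslund " on the same colour.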

Gotcha worth noting

  • Vite 7.3 added a strict server.allowedHosts check that blocks any Host header that isn't an exact match for the dev origin. The customer portal was scaffolded under an older Vite and pre-dates this. Operator needs allowedHosts: ['operator.dezky.local'] in nuxt.config.ts under vite.server or every request 403s with a plaintext error.
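For reference, the setting described above would sit roughly like this in the operator app's config. This is a fragment with every other option omitted, and it assumes the usual Nuxt pass-through of the `vite` key to Vite's own `server.allowedHosts` option.

```typescript
// apps/operator/nuxt.config.ts (fragment; all other options omitted)
export default defineNuxtConfig({
  vite: {
    server: {
      // Hosts the Vite dev server will answer for, in addition to its defaults.
      allowedHosts: ["operator.dezky.local"],
    },
  },
});
```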

O.5 · Tenant management (real backend)

  • pages/tenants/index.vue — list with status/plan/seats/MRR columns, filter by partner and status, search by slug/name
  • pages/tenants/[slug].vue — detail view with tabs
  • Tab: Overview — header card, key stats, partner link
  • Tab: Users — list users via GET /users?tenantSlug=…
  • Tab: Resources — provisioning status per integration (Authentik / Stalwart / OCIS), error messages, "Reconcile" button
  • Tab: Billing (mock fixtures)
  • Tab: Audit (mock fixtures)
  • Tab: Support (mock fixtures)
  • Tab: Danger — suspend, resume, change plan, soft-delete; real backend calls, confirmation modals
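The list-page filtering (by partner, by status, and search over slug/name) could be sketched as a pure function like the one below. Whether filtering runs client-side or as query parameters on platform-api is not decided in the plan; the `TenantListItem` and `TenantFilter` shapes are assumptions.

```typescript
// Hypothetical list row; ids are plain strings here rather than ObjectIds.
interface TenantListItem {
  slug: string;
  name: string;
  status: string;
  partnerId?: string;
}

interface TenantFilter {
  partnerId?: string;
  status?: string;
  search?: string; // matched case-insensitively against slug and name
}

// Every filter that is set must match; unset filters are ignored.
function filterTenants(tenants: TenantListItem[], f: TenantFilter): TenantListItem[] {
  const q = (f.search ?? "").trim().toLowerCase();
  return tenants.filter(
    (t) =>
      (f.partnerId === undefined || t.partnerId === f.partnerId) &&
      (f.status === undefined || t.status === f.status) &&
      (q === "" || t.slug.toLowerCase().indexOf(q) !== -1 || t.name.toLowerCase().indexOf(q) !== -1),
  );
}
```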

O.6 · Partner management (real backend)

  • pages/partners/index.vue — list with name/domain/status/customers/MRR
  • pages/partners/[slug].vue — detail panel with customers list, MRR breakdown, margin, contact info
  • "Create partner" modal — POST /partners
  • Attach / detach tenant to partner (PATCH on tenant.partnerId)

O.7 · Visual-only screens (mock fixtures)

  • data/*.ts — typed mock fixtures (tenants-extra, partners-extra, services, incident, flags, audit, team)
  • pages/index.vue — Overview dashboard
  • pages/operator-team.vue — real backend (Users where platformAdmin === true)
  • pages/users.vue — global users, real read
  • pages/infrastructure.vue — service health (mock for now; docker health check integration is a follow-up)
  • pages/flags.vue — feature flags (mock)
  • pages/audit.vue — global audit (mock)
  • pages/support.vue — placeholder
  • pages/billing.vue — placeholder
  • pages/reports.vue — placeholder
  • pages/settings.vue — placeholder

O.8 · Interactions

  • CommandPalette.vue — ⌘K opens, fuzzy search over tenants + partners + flags + nav items + actions
  • ImpersonationModal.vue — visual stub with reason field, Demo-only badge, no-op confirm + toast
  • ImpersonationBanner.vue — top banner shown when impersonating
  • IncidentModal.vue — mock incident render
  • TweaksPanel.vue — theme (light/dark), density (comfy/compact), env (prod/staging/dev cosmetic switch)
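The palette's fuzzy search over mixed item kinds might be approached with a simple in-order character match, as in this sketch. The scoring rule, the `PaletteItem` shape, and the function names are assumptions, and the greedy leftmost match is a simplification rather than an optimal fuzzy matcher.

```typescript
interface PaletteItem {
  kind: "tenant" | "partner" | "flag" | "nav" | "action";
  label: string;
}

// Returns null when the query characters do not appear in order in the label.
// Otherwise returns the number of skipped characters inside the matched span
// (0 means the query appears contiguously); lower is better.
function fuzzyScore(query: string, label: string): number | null {
  const q = query.toLowerCase();
  const l = label.toLowerCase();
  if (q.length === 0) return 0;
  let pos = 0;
  let first = -1;
  let last = -1;
  for (let i = 0; i < q.length; i++) {
    const found = l.indexOf(q.charAt(i), pos);
    if (found === -1) return null;
    if (first === -1) first = found;
    last = found;
    pos = found + 1;
  }
  return last - first + 1 - q.length;
}

// Keeps matching items, best score first, ties broken by label.
function searchPalette(items: PaletteItem[], query: string): PaletteItem[] {
  const scored: { item: PaletteItem; score: number }[] = [];
  for (const item of items) {
    const score = fuzzyScore(query, item.label);
    if (score !== null) scored.push({ item, score });
  }
  scored.sort(
    (a, b) => a.score - b.score || (a.item.label < b.item.label ? -1 : a.item.label > b.item.label ? 1 : 0),
  );
  return scored.map((s) => s.item);
}
```

Since the list mixes tenants, partners, flags, nav items, and actions, keeping `kind` on each item lets the palette group or badge results without a second lookup.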

O.9 · Verification

  • Sign in to operator.dezky.local as akadmin via the new OAuth client
  • Confirm JWT audience is dezky-operator (decode in DevTools, post response back)
  • Create a real Partner via the UI, see it in Mongo
  • Attach the acme tenant to that partner; verify count goes 0 → 1
  • Suspend a tenant from the Danger tab; confirm status: 'suspended' in Mongo
  • Sign in to app.dezky.local simultaneously in another browser profile, confirm the customer portal still works and that customer token's aud is dezky-portal
  • Record the relevant follow-up tasks in NEXT-STEPS.md as remaining work, and file separate issues for anything that was deferred