From 92c5056a1d60ded74aef8327a5da0fbf5bbd5317 Mon Sep 17 00:00:00 2001
From: Ronni Baslund
Date: Sun, 24 May 2026 00:26:21 +0200
Subject: [PATCH] docs: capture operator portal plan from grilling session

OPERATOR-PLAN.md records the decisions from the design review:
- Scope: C-visual (full UI fidelity, mock data for most screens) but real
  CRUD for tenants and partners from day one
- Lives at apps/operator/ as a separate Nuxt app, separate domain, separate
  Authentik OAuth client (dezky-operator), aud-claim distinguishes operator
  vs portal tokens
- Backend stays as a single NestJS service; rename services/provisioning ->
  services/platform-api as a prep commit
- Partner schema designed: slug/name/domain/status/marginPct/contactInfo;
  Tenant gains optional partnerId; counts and MRR are computed at query time
- Impersonation: visual stub now (modal + banner, no-op toast); real OAuth
  Token Exchange flow recorded as the first follow-up task

Also lists follow-up tasks (real audit log, feature flag backend, incident
management, partner portal) and out-of-scope items so the next grilling
session has a starting point. Pointer added in NEXT-STEPS.md under a new
'Operator portal' track.
---
 docs/NEXT-STEPS.md    |  12 +++
 docs/OPERATOR-PLAN.md | 241 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 253 insertions(+)
 create mode 100644 docs/OPERATOR-PLAN.md

diff --git a/docs/NEXT-STEPS.md b/docs/NEXT-STEPS.md
index c9c3bc2..d5a125f 100644
--- a/docs/NEXT-STEPS.md
+++ b/docs/NEXT-STEPS.md
@@ -141,6 +141,18 @@ await authentikClient.coreUsersCreate({
 })
 ```
 
+## Operator portal — out-of-band track
+
+`operator.dezky.local` (internal admin portal — separate Nuxt app, separate
+Authentik OAuth client, real CRUD for tenants + partners). Plan and decisions
+captured in [`OPERATOR-PLAN.md`](./OPERATOR-PLAN.md).
+
+Touches platform-api substantially:
+- Service rename `services/provisioning` → `services/platform-api` (prep)
+- New `Partner` schema + CRUD endpoints
+- Tenant lifecycle actions (suspend/resume/plan change)
+- Audience-aware JwtAuthGuard for operator-only mutations
+
 ## Phase 5: Custom webmail (week 3-4)
 
 Goal: Branded webmail client using Stalwart's JMAP API.
diff --git a/docs/OPERATOR-PLAN.md b/docs/OPERATOR-PLAN.md
new file mode 100644
index 0000000..197eea5
--- /dev/null
+++ b/docs/OPERATOR-PLAN.md
@@ -0,0 +1,241 @@
+# Operator Portal — Plan
+
+`operator.dezky.local` (dev) → `operator.dezky.com` (prod). Internal admin portal
+for Dezky staff: managing tenants, partners, operating the platform.
+
+Distinct from the customer portal at `app.dezky.local`. Different OAuth client,
+different cookie domain, different surface — though they share Authentik as the
+IdP and (eventually) the provisioning service as the backend.
+
+This file is the running record of decisions made during the design grilling
+session. Updated inline as questions resolve.
+
+---
+
+## Scope — C-visual with real management for Tenants + Partners
+
+Decision: build every screen from the source design visually, but back two
+domains with real CRUD from day one — Tenants and Partners. Everything else
+renders against mock-data fixtures until its backend is built.
+
+| Surface | Day-1 state |
+|---|---|
+| Overview / dashboard | Visual — aggregates from real Tenant+Partner data where available, mock for the rest |
+| Tenants (list + detail with 7 tabs) | **Real backend**, full CRUD, suspend/resume/delete |
+| Partners (list + detail) | **Real backend**, new schema, full CRUD |
+| Users (global) | Real read across tenants (already in DB) |
+| Support queue | Mock |
+| Platform billing | Mock |
+| Reports | Mock |
+| Infrastructure | Visual; could derive from Docker health checks but probably mock initially |
+| Feature flags | Mock |
+| Audit log | Mock (real backfill is a follow-up) |
+| Operator team | Real (Users with `platformAdmin: true`) |
+| Platform settings | Mock |
+| Command palette ⌘K | Visual — opens, navigates, but "execute action" just toasts |
+| Impersonation modal + banner | Visual — confirms the action but doesn't actually mint a token |
+| Incident modal | Mock |
+| Env switcher (prod/staging/dev) | Cosmetic — picks a label, no real env switch |
+| On-call indicator | Mock |
+
+### Real-backend surface this adds
+
+Two genuinely new things on the backend:
+
+1. **Partner schema and CRUD** in `services/provisioning` — id, name, domain,
+   status, customers count (computed), MRR (computed), margin, sinceDate. Tenants
+   gain an optional `partnerId` field. The existing `dezky` seed gets no partner.
+2. **Tenant lifecycle actions** beyond create — suspend, resume, change plan,
+   change seat cap, soft-delete with grace period. Existing schema covers most
+   of this; controllers need new methods.
+
+Everything else (incidents, flags, support tickets, audit log collection,
+impersonation tokens) stays mock until explicitly promoted.
+
+---
+
+## Lives at `apps/operator/` — separate Nuxt app
+
+Decision: new Nuxt 3 app, separate `package.json`, separate Traefik route at
+`operator.dezky.local`. Reuses design tokens / NodeMark / UiIcon by copy for
+now; a `packages/ui` workspace is a likely follow-up once we have a third
+consumer.
+
+**Why separate, not a route group in `apps/portal/`:** security boundary. The
+moment any operator-only feature mutates customer state (impersonation, suspend
+tenant), a routing or middleware bug on a shared app is catastrophic. Separate
+apps make that nearly impossible. Different cookies, different OIDC client,
+different domain.
+
+**Cost:** one more docker-compose service, ~10 lines of Traefik labels, one more
+volume for `node_modules`. Some duplicated dev tooling (eslint, tsconfig).
+
+---
+
+## Auth — new `dezky-operator` Authentik OAuth provider
+
+Decision: a dedicated OAuth client in Authentik, distinct from `dezky-portal`.
+
+- New provider `dezky-operator` (confidential, PKCE on)
+- Redirect URIs: `https://operator.dezky.local/auth/oidc/callback`
+- Group binding: `dezky-platform-admins` required at the provider's authorization
+  flow (Authentik policy), so non-admins can't even consent
+- Stricter policies attached only to this provider: MFA required, future IP
+  allowlist for the office network/VPN
+- Token audience claim: `dezky-operator`
+- Provisioning's `JwtAuthGuard` widens its audience check to a list:
+  `['dezky-portal', 'dezky-operator']`
+- Per-endpoint guard for operator-only mutations: require `aud === 'dezky-operator'`
+  AND `actor.platformAdmin === true`. The audience check makes "is this a privileged
+  session" provable from the token alone, independent of the DB lookup
+
+**UX trade-off accepted:** if Ronni (or any operator who is also a customer)
+wants to be in both apps, they log into Authentik twice — once per audience.
+Correct security-wise, fine ergonomically.
+
+---
+
+## Backend stays as one service — rename to `services/platform-api`
+
+Decision: route all operator mutations and reads through the existing NestJS
+service (no second backend, no Nitro-direct-to-Mongo). Rename
+`services/provisioning` → `services/platform-api` because the service now owns
+more than just provisioning — it's the platform's data + control plane.
+
+**What changes during the rename:**
+
+- Directory: `services/provisioning/` → `services/platform-api/`
+- Package: `@dezky/provisioning` → `@dezky/platform-api`
+- Docker container name: `dezky-provisioning` → `dezky-platform-api`
+- Compose service key, network alias, volume names
+- Portal env var: `PROVISIONING_INTERNAL_URL` → `PLATFORM_API_INTERNAL_URL`
+- Portal proxy routes: `http://provisioning:3001` → `http://platform-api:3001`
+- Internal module names referencing "provisioning" stay (e.g.
+  `ProvisioningService` is now one orchestration concern *inside*
+  `platform-api`; not the whole service's purpose)
+- Public URL stays `api.dezky.local` (Traefik routes by Host header, unaffected)
+
+**New endpoints platform-api gains in this phase:**
+
+- `POST /tenants/:slug/suspend`, `POST /tenants/:slug/resume`
+- `PATCH /tenants/:slug` already exists; ensure it can change plan / seat cap
+- `GET /partners`, `POST /partners`, `GET /partners/:slug`, `PATCH /partners/:slug`
+- `Tenant.partnerId` foreign key + filter on tenant queries
+- `JwtAuthGuard` accepts both `dezky-portal` and `dezky-operator` audiences;
+  per-endpoint requirement of `dezky-operator` aud for operator-only mutations
+
+**Strategy:** rename in a separate prep commit before the operator work starts,
+so the rename diff is mechanical and reviewable on its own.
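Review note: the audience rule from the Auth section and the endpoint list above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the actual `JwtAuthGuard`: the `TokenClaims` and `Actor` shapes and the helper names `audiencesOf`, `isAcceptedAudience`, and `canRunOperatorMutation` are hypothetical. Only the two audience values and the `platformAdmin` flag come from the plan.

```typescript
// Hypothetical shapes for illustration. A real guard would read the claims
// from an already verified JWT and the actor from the user record.
interface TokenClaims {
  sub: string;
  // Per RFC 7519, `aud` may be a single string or an array of strings.
  aud: string | string[];
}

interface Actor {
  platformAdmin: boolean;
}

const ACCEPTED_AUDIENCES: readonly string[] = ['dezky-portal', 'dezky-operator'];
const OPERATOR_AUDIENCE = 'dezky-operator';

// Normalize the claim so both string and array forms are handled the same way.
function audiencesOf(claims: TokenClaims): string[] {
  return Array.isArray(claims.aud) ? claims.aud : [claims.aud];
}

// Baseline check: the token was minted for one of the two known clients.
function isAcceptedAudience(claims: TokenClaims): boolean {
  return audiencesOf(claims).some((aud) => ACCEPTED_AUDIENCES.indexOf(aud) !== -1);
}

// Operator-only mutations require BOTH conditions from the plan:
// the audience is exactly the operator client AND the actor is a platform admin.
function canRunOperatorMutation(claims: TokenClaims, actor: Actor): boolean {
  const auds = audiencesOf(claims);
  const operatorOnly = auds.length === 1 && auds[0] === OPERATOR_AUDIENCE;
  return operatorOnly && actor.platformAdmin === true;
}
```

Treating a multi-audience token as not operator-only is the strict reading of `aud === 'dezky-operator'`; whether Authentik would ever issue such a token for this client is not something the plan states.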
+
+---
+
+## Partner schema
+
+```typescript
+@Schema({ collection: 'partners', timestamps: true })
+class Partner {
+  slug: string                 // 'nordicmsp', URL-safe, unique
+  name: string                 // 'NordicMSP'
+  domain: string               // 'nordicmsp.dk' — partner's own org domain
+  status: 'active' | 'in-negotiation' | 'paused' | 'terminated'  // default 'in-negotiation'
+  marginPct: number            // 20 = partner keeps 20% of customer MRR; one number per partner
+  partnershipStartedAt?: Date
+  contactInfo: { primaryName?: string; primaryEmail?: string; billingEmail?: string }
+  billingInfo: { /* same shape as Tenant.billingInfo */ }
+}
+```
+
+**Tenant side:** add `partnerId?: Types.ObjectId` (ref Partner, indexed,
+optional). Direct customers have no `partnerId`; partner-owned customers
+reference one.
+
+**Computed at query time, not stored:**
+- `Partner.customers` — count of tenants with `partnerId === this._id`
+- `Partner.mrr` — sum of those tenants' MRR
+
+Storing denormalized would force write-time syncing on every tenant
+create/suspend/plan-change for ~zero benefit at our scale.
+
+**Operator-only.** A self-serve partner portal at `partner.dezky.local` is a
+future surface; not in this phase. Partners are visible/manageable only from
+the operator app.
+
+---
+
+## Impersonation — visual stub now, real flow later
+
+Decision: build the UI exactly as designed (modal with reason field, top
+banner, exit button) but do not wire actual token exchange. The confirm action
+toasts "impersonation not implemented yet" and writes a mock audit entry.
+
+**Why now:** validates the UX, lets future hires see the operator surface
+end-to-end, doesn't introduce a dangerous capability before there's an
+operational need.
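Review note: the stubbed confirm action described in the Decision paragraph above might look roughly like this. It is a sketch under assumptions, not the planned component code: `confirmImpersonationStub`, `MockAuditEntry`, `StubResult`, and every field name are hypothetical, the in-memory array stands in for tab-local state, and treating the reason as required is an assumption. Only the toast text comes from the plan.

```typescript
// Hypothetical shape of a mock audit entry; nothing here is persisted.
interface MockAuditEntry {
  action: 'impersonation.stub';
  operatorId: string;
  tenantSlug: string;
  reason: string;
  at: string; // ISO 8601 timestamp
  mock: true; // marks the entry as fixture data
}

interface StubResult {
  toast: string;
  tokenMinted: false; // the stub never performs a token exchange
  entry: MockAuditEntry;
}

// In-memory stand-in for state that lives only in the operator tab.
const mockAuditLog: MockAuditEntry[] = [];

function confirmImpersonationStub(
  operatorId: string,
  tenantSlug: string,
  reason: string,
  now: Date = new Date(),
): StubResult {
  const trimmed = reason.trim();
  // Assumption: the modal's reason field must be filled in before confirming.
  if (trimmed.length === 0) {
    throw new Error('A reason is required before confirming');
  }
  const entry: MockAuditEntry = {
    action: 'impersonation.stub',
    operatorId: operatorId,
    tenantSlug: tenantSlug,
    reason: trimmed,
    at: now.toISOString(),
    mock: true,
  };
  mockAuditLog.push(entry);
  // No token is minted; the toast makes the no-op explicit.
  return { toast: 'impersonation not implemented yet', tokenMinted: false, entry: entry };
}
```

Keeping `tokenMinted` typed as the literal `false` means a later real implementation has to change the type, which makes the switch from stub to real flow visible in review.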
+
+**Mitigations against confusion:**
+- Modal carries a `Demo only` badge — same styling as other stub-data badges
+  in the operator UI
+- Toast on confirm makes the no-op explicit
+- The banner does display in mock mode (so we can iterate on its design), but
+  the underlying session state is local to the operator tab
+
+**Real flow design recorded for the follow-up:** OAuth 2.0 Token Exchange
+(RFC 8693). Authentik supports it. Customer portal needs to accept tokens
+carrying an `act` claim alongside `sub`, and show its own impersonation banner
+when the two differ. ~2 days of careful work + security review.
+
+---
+
+## Decisions made without grilling (small, low-risk)
+
+- **Theme:** dark by default. Existing `apps/portal/assets/styles/tokens.css`
+  already defines `[data-theme='dark']` tokens; the operator app sets
+  `data-theme="dark"` at app root and reuses them
+- **Mock data location:** TypeScript files under `apps/operator/data/`
+  (`tenants-mock.ts`, `partners-mock.ts`, `flags-mock.ts`, etc.). Same shape
+  as `operator-data.jsx` from the design bundle, just retyped
+- **Design system reuse:** copy `NodeMark.vue`, `UiIcon.vue`, and the auth
+  components into `apps/operator/components/` directly. A shared `packages/ui`
+  workspace becomes worth doing once a third surface needs them (partner
+  portal? landing site?)
+- **OCIS / Stalwart admin shortcuts in operator UI:** out of scope for this
+  phase. Operator drills via the customer-facing service URLs
+
+---
+
+## Follow-up tasks (post-MVP)
+
+In rough priority order:
+
+1. **Real impersonation flow** — OAuth Token Exchange (RFC 8693), customer
+   portal `act`-claim handling, audit on entry+exit, banner with origin
+   operator identity
+2. **Real audit log collection** — replace mock fixtures with a `platform_audit`
+   collection in Mongo that platform-api writes on every privileged action;
+   stream from there in the operator UI
+3. **Feature flag backend** — `Flag` schema + per-tenant rollout state + a
+   tiny flag-eval client every service imports
+4. **Incident management backend** — `Incident` schema + paging integration
+   (PagerDuty / OpsGenie / custom). Until then, the incident modal renders
+   from mock
+5. **Support ticket queue** — `SupportTicket` schema + email-in ingestion
+   from a dedicated mailbox via Stalwart
+6. **Self-serve Partner portal at `partner.dezky.local`** — Phase 6+ work,
+   own Nuxt app, own OAuth client, scoped to a partner's own customers
+7. **Real environment switcher** — currently cosmetic; would need separate
+   API endpoints per env, separate Authentik tenants, etc.
+8. **Real on-call indicator** — integration with the paging system that
+   gets installed in (4)
+9. **Operator workspace impersonation in OCIS/Stalwart** — operator tooling
+   reaches *into* the customer's file storage and mail for support, with the
+   same audit trail as portal impersonation
+
+---
+
+## Out of scope for this entire effort
+
+- Multi-region operator UI
+- Read-only investor / board mode (a real persona but build it when there's a
+  real investor — design has a placeholder "Read-only" role for Jonas Berg)
+- White-label of the operator portal (partners get their own portal eventually;
+  Dezky operator never gets white-labeled — it's our internal tool)
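
Review note on the Partner schema section: the query-time `customers` and `mrr` values, plus the partner's share implied by `marginPct`, could be derived along these lines. This is an in-memory sketch under assumptions: the `TenantRow` and `PartnerSummary` types and the `summarizePartner` function are hypothetical, `partnerId` is simplified to a string, and the numeric `mrr` field on a tenant is assumed because the plan does not say how tenant MRR is stored. A real implementation would presumably run the equivalent aggregation in Mongo.

```typescript
// Hypothetical tenant projection used only for this illustration.
interface TenantRow {
  slug: string;
  partnerId?: string; // absent for direct customers
  mrr: number; // assumed field, not defined in the plan
}

interface PartnerSummary {
  customers: number; // count of tenants owned by the partner
  mrr: number; // sum of those tenants' MRR
  partnerShare: number; // portion the partner keeps, per marginPct
}

function summarizePartner(
  partnerId: string,
  marginPct: number,
  tenants: TenantRow[],
): PartnerSummary {
  // Direct customers have no partnerId and therefore never match.
  const owned = tenants.filter((t) => t.partnerId === partnerId);
  const mrr = owned.reduce((sum, t) => sum + t.mrr, 0);
  // marginPct of 20 means the partner keeps 20% of customer MRR.
  return { customers: owned.length, mrr: mrr, partnerShare: (mrr * marginPct) / 100 };
}

const sampleTenants: TenantRow[] = [
  { slug: 'alpha', partnerId: 'p1', mrr: 1000 },
  { slug: 'beta', partnerId: 'p1', mrr: 500 },
  { slug: 'direct-co', mrr: 2000 },
  { slug: 'gamma', partnerId: 'p2', mrr: 300 },
];

const p1Summary = summarizePartner('p1', 20, sampleTenants);
```

With the sample rows above, `p1Summary` has 2 customers, an MRR of 1500, and a partner share of 300. Nothing is written back to the partner record, which is the point of computing at query time.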