dezky/docs/OPERATOR-PLAN.md
Ronni Baslund 92c5056a1d docs: capture operator portal plan from grilling session
OPERATOR-PLAN.md records the decisions from the design review:
- Scope: C-visual (full UI fidelity, mock data for most screens) but real
  CRUD for tenants and partners from day one
- Lives at apps/operator/ as a separate Nuxt app, separate domain, separate
  Authentik OAuth client (dezky-operator), aud-claim distinguishes operator
  vs portal tokens
- Backend stays as a single NestJS service; rename
  services/provisioning -> services/platform-api as a prep commit
- Partner schema designed: slug/name/domain/status/marginPct/contactInfo;
  Tenant gains optional partnerId; counts and MRR are computed at query time
- Impersonation: visual stub now (modal + banner, no-op toast); real OAuth
  Token Exchange flow recorded as the first follow-up task

Also lists follow-up tasks (real audit log, feature flag backend, incident
management, partner portal) and out-of-scope items so the next grilling
session has a starting point.

Pointer added in NEXT-STEPS.md under a new 'Operator portal' track.
2026-05-24 00:26:21 +02:00


# Operator Portal — Plan
`operator.dezky.local` (dev) → `operator.dezky.com` (prod). Internal admin portal
for Dezky staff: managing tenants and partners, and operating the platform.
Distinct from the customer portal at `app.dezky.local`. Different OAuth client,
different cookie domain, different surface — though they share Authentik as the
IdP and (eventually) the provisioning service as the backend.
This file is the running record of decisions made during the design grilling
session. Updated inline as questions resolve.
---
## Scope — C-visual with real management for Tenants + Partners
Decision: build every screen from the source design visually, but back two
domains with real CRUD from day one — Tenants and Partners. Everything else
renders against mock-data fixtures until its backend is built.
| Surface | Day-1 state |
|---|---|
| Overview / dashboard | Visual — aggregates from real Tenant+Partner data where available, mock for the rest |
| Tenants (list + detail with 7 tabs) | **Real backend**, full CRUD, suspend/resume/delete |
| Partners (list + detail) | **Real backend**, new schema, full CRUD |
| Users (global) | Real read across tenants (already in DB) |
| Support queue | Mock |
| Platform billing | Mock |
| Reports | Mock |
| Infrastructure | Visual; could derive from Docker health checks but probably mock initially |
| Feature flags | Mock |
| Audit log | Mock (real backfill is a follow-up) |
| Operator team | Real (Users with `platformAdmin: true`) |
| Platform settings | Mock |
| Command palette ⌘K | Visual — opens, navigates, but "execute action" just toasts |
| Impersonation modal + banner | Visual — confirms the action but doesn't actually mint a token |
| Incident modal | Mock |
| Env switcher (prod/staging/dev) | Cosmetic — picks a label, no real env switch |
| On-call indicator | Mock |
### Real-backend surface this adds
Two genuinely new things on the backend:
1. **Partner schema and CRUD** in `services/provisioning` — slug, name, domain,
status, customers count (computed), MRR (computed), margin (`marginPct`), start
date (`partnershipStartedAt`). Tenants gain an optional `partnerId` field. The
existing `dezky` seed gets no partner.
2. **Tenant lifecycle actions** beyond create — suspend, resume, change plan,
change seat cap, soft-delete with grace period. Existing schema covers most
of this; controllers need new methods.
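
The lifecycle actions in item 2 can be sketched as a small transition table. This is a minimal illustration, not the controller code: the status names, the `deleteAfter` field, the rule that `resume` cancels a pending delete, and the 30-day default grace period are all assumptions, since the plan does not fix them.

```typescript
// Hypothetical status values and field names; the real Tenant schema may differ.
type TenantStatus = 'active' | 'suspended' | 'pending-deletion'
type LifecycleAction = 'suspend' | 'resume' | 'soft-delete'

interface TenantLifecycle {
  status: TenantStatus
  deleteAfter?: Date // set only while a soft-delete grace period is running
}

// Which actions are legal from each status (assumed rules).
const allowedActions: Record<TenantStatus, LifecycleAction[]> = {
  active: ['suspend', 'soft-delete'],
  suspended: ['resume', 'soft-delete'],
  'pending-deletion': ['resume'], // assumption: resume cancels the pending delete
}

const DAY_MS = 24 * 60 * 60 * 1000

function applyAction(
  tenant: TenantLifecycle,
  action: LifecycleAction,
  now: Date,
  graceDays: number = 30, // assumed default, not taken from the plan
): TenantLifecycle {
  if (allowedActions[tenant.status].indexOf(action) === -1) {
    throw new Error(`cannot ${action} a tenant that is ${tenant.status}`)
  }
  switch (action) {
    case 'suspend':
      return { status: 'suspended' }
    case 'resume':
      return { status: 'active' }
    case 'soft-delete':
      // The tenant stays recoverable until deleteAfter has passed.
      return {
        status: 'pending-deletion',
        deleteAfter: new Date(now.getTime() + graceDays * DAY_MS),
      }
  }
}
```

Keeping the rules in one table means the suspend, resume, and delete endpoints can share a single validation path instead of each re-checking status on its own.
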
Everything else (incidents, flags, support tickets, audit log collection,
impersonation tokens) stays mock until explicitly promoted.
---
## Lives at `apps/operator/` — separate Nuxt app
Decision: new Nuxt 3 app, separate `package.json`, separate Traefik route at
`operator.dezky.local`. Reuses design tokens / NodeMark / UiIcon by copy for
now; a `packages/ui` workspace is a likely follow-up once we have a third
consumer.
**Why separate, not a route group in `apps/portal/`:** security boundary. The
moment any operator-only feature mutates customer state (impersonation, suspend
tenant), a routing or middleware bug on a shared app is catastrophic. Separate
apps make that nearly impossible. Different cookies, different OIDC client,
different domain.
**Cost:** one more docker-compose service, ~10 lines of Traefik labels, one more
volume for `node_modules`. Some duplicated dev tooling (eslint, tsconfig).
---
## Auth — new `dezky-operator` Authentik OAuth provider
Decision: a dedicated OAuth client in Authentik, distinct from `dezky-portal`.
- New provider `dezky-operator` (confidential, PKCE on)
- Redirect URIs: `https://operator.dezky.local/auth/oidc/callback`
- Group binding: `dezky-platform-admins` required at the provider's authorization
flow (Authentik policy), so non-admins can't even consent
- Stricter policies attached only to this provider: MFA required, future IP
allowlist for the office network/VPN
- Token audience claim: `dezky-operator`
- Provisioning's `JwtAuthGuard` widens its audience check to a list:
`['dezky-portal', 'dezky-operator']`
- Per-endpoint guard for operator-only mutations: require `aud === 'dezky-operator'`
AND `actor.platformAdmin === true`. The audience check makes "is this a privileged
session" provable from the token alone, independent of the DB lookup
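
The two-level check described in the last two bullets can be sketched as plain functions, independent of NestJS. The `TokenClaims` and `Actor` shapes and the function names are assumptions for illustration; only the audience strings and the `platformAdmin` flag come from the plan.

```typescript
// Claim and actor shapes are assumptions for illustration.
interface TokenClaims {
  sub: string
  aud: string | string[] // a JWT `aud` may be a single string or an array
}

interface Actor {
  platformAdmin: boolean // loaded from the DB for the token's subject
}

const ACCEPTED_AUDIENCES = ['dezky-portal', 'dezky-operator']

function audiencesOf(claims: TokenClaims): string[] {
  return Array.isArray(claims.aud) ? claims.aud : [claims.aud]
}

// Base check: the token was issued for one of the two front ends.
function hasAcceptedAudience(claims: TokenClaims): boolean {
  return audiencesOf(claims).some((a) => ACCEPTED_AUDIENCES.indexOf(a) !== -1)
}

// Operator-only mutations: the token audience AND the DB flag must both agree.
function canRunOperatorMutation(claims: TokenClaims, actor: Actor): boolean {
  const isOperatorToken = audiencesOf(claims).indexOf('dezky-operator') !== -1
  return isOperatorToken && actor.platformAdmin === true
}
```

With this shape, a platform admin holding only a `dezky-portal` token is rejected for operator mutations, which is the property the audience claim is meant to provide.
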
**UX trade-off accepted:** if Ronni (or any operator who is also a customer)
wants to be in both apps, they log into Authentik twice — once per audience.
Correct security-wise, fine ergonomically.
---
## Backend stays as one service — rename to `services/platform-api`
Decision: route all operator mutations and reads through the existing NestJS
service (no second backend, no Nitro-direct-to-Mongo). Rename
`services/provisioning` → `services/platform-api` because the service now owns
more than just provisioning — it's the platform's data + control plane.
**What changes during the rename:**
- Directory: `services/provisioning/` → `services/platform-api/`
- Package: `@dezky/provisioning` → `@dezky/platform-api`
- Docker container name: `dezky-provisioning` → `dezky-platform-api`
- Compose service key, network alias, volume names
- Portal env var: `PROVISIONING_INTERNAL_URL` → `PLATFORM_API_INTERNAL_URL`
- Portal proxy routes: `http://provisioning:3001` → `http://platform-api:3001`
- Internal module names referencing "provisioning" stay (e.g.
`ProvisioningService` is now one orchestration concern *inside*
`platform-api`; not the whole service's purpose)
- Public URL stays `api.dezky.local` (Traefik routes by Host header, unaffected)
**New endpoints platform-api gains in this phase:**
- `POST /tenants/:slug/suspend`, `POST /tenants/:slug/resume`
- `PATCH /tenants/:slug` already exists; ensure it can change plan / seat cap
- `GET /partners`, `POST /partners`, `GET /partners/:slug`, `PATCH /partners/:slug`
- `Tenant.partnerId` foreign key + filter on tenant queries
- `JwtAuthGuard` accepts both `dezky-portal` and `dezky-operator` audiences;
per-endpoint requirement of `dezky-operator` aud for operator-only mutations
**Strategy:** rename in a separate prep commit before the operator work starts,
so the rename diff is mechanical and reviewable on its own.
---
## Partner schema
```typescript
import { Schema } from '@nestjs/mongoose'

// Design sketch: field list only; @Prop() decorators are omitted for brevity.
@Schema({ collection: 'partners', timestamps: true })
class Partner {
  slug: string // 'nordicmsp', URL-safe, unique
  name: string // 'NordicMSP'
  domain: string // 'nordicmsp.dk', the partner's own org domain
  status: 'active' | 'in-negotiation' | 'paused' | 'terminated' // default 'in-negotiation'
  marginPct: number // 20 = partner keeps 20% of customer MRR; one number per partner
  partnershipStartedAt?: Date
  contactInfo: { primaryName?: string; primaryEmail?: string; billingEmail?: string }
  billingInfo: { /* same shape as Tenant.billingInfo */ }
}
```
**Tenant side:** add `partnerId?: Types.ObjectId` (ref Partner, indexed,
optional). Direct customers have no `partnerId`; partner-owned customers
reference one.
**Computed at query time, not stored:**
- `Partner.customers` — count of tenants with `partnerId === this._id`
- `Partner.mrr` — sum of those tenants' MRR
Storing these values denormalized would force write-time syncing on every tenant
create/suspend/plan-change for ~zero benefit at our scale.
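
A minimal in-memory sketch of the query-time computation follows. In the service this would more likely be a Mongo aggregation; the `TenantRow` shape, its `mrr` field, and the function names are assumptions made for illustration.

```typescript
interface TenantRow {
  slug: string
  partnerId?: string // an ObjectId in the real schema; a string here to stay dependency-free
  mrr: number // hypothetical field: the tenant's monthly recurring revenue
}

interface PartnerStats {
  customers: number // count of tenants referencing the partner
  mrr: number // sum of those tenants' MRR
}

function partnerStats(partnerId: string, tenants: TenantRow[]): PartnerStats {
  const owned = tenants.filter((t) => t.partnerId === partnerId)
  return {
    customers: owned.length,
    mrr: owned.reduce((sum, t) => sum + t.mrr, 0),
  }
}

// marginPct follows the schema comment: 20 means the partner keeps 20% of customer MRR.
function partnerShare(stats: PartnerStats, marginPct: number): number {
  return (stats.mrr * marginPct) / 100
}
```

Direct customers (no `partnerId`) drop out of every partner's totals, so nothing has to be updated when a tenant is created, suspended, or moved to another plan.
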
**Operator-only.** A self-serve partner portal at `partner.dezky.local` is a
future surface; not in this phase. Partners are visible/manageable only from
the operator app.
---
## Impersonation — visual stub now, real flow later
Decision: build the UI exactly as designed (modal with reason field, top
banner, exit button) but do not wire actual token exchange. The confirm action
toasts "impersonation not implemented yet" and writes a mock audit entry.
**Why now:** validates the UX, lets future hires see the operator surface
end-to-end, doesn't introduce a dangerous capability before there's an
operational need.
**Mitigations against confusion:**
- Modal carries a `Demo only` badge — same styling as other stub-data badges
in the operator UI
- Toast on confirm makes the no-op explicit
- The banner does display in mock mode (so we can iterate on its design), but
the underlying session state is local to the operator tab
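
The stub's confirm action could look roughly like the sketch below. The entry shape, the `StubDeps` interface, and the rule that a reason is required are assumptions; only the toast text and the "mock audit entry, no token" behaviour come from the decision above.

```typescript
// Hypothetical shape for the mock audit entry.
interface MockAuditEntry {
  action: 'impersonate.start'
  tenantSlug: string
  reason: string
  mock: true // marks the entry as stub data, never a real privileged action
  at: string // ISO timestamp
}

interface StubDeps {
  toast: (message: string) => void
  audit: MockAuditEntry[] // in-memory list, local to the operator tab
}

function confirmImpersonation(
  tenantSlug: string,
  reason: string,
  deps: StubDeps,
  now: Date = new Date(),
): MockAuditEntry {
  // Assumption: the modal's reason field is mandatory.
  if (reason.trim() === '') {
    throw new Error('a reason is required')
  }
  const entry: MockAuditEntry = {
    action: 'impersonate.start',
    tenantSlug,
    reason: reason.trim(),
    mock: true,
    at: now.toISOString(),
  }
  deps.audit.push(entry)
  // No token is minted; the toast makes the no-op explicit.
  deps.toast('impersonation not implemented yet')
  return entry
}
```
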
**Real flow design recorded for the follow-up:** OAuth 2.0 Token Exchange
(RFC 8693). Authentik supports it. The customer portal needs to accept tokens
carrying an `act` (actor) claim alongside `sub`, and show its own impersonation
banner when the actor in `act` differs from `sub`. ~2 days of careful work +
security review.
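
On the customer portal side, the banner decision reduces to comparing the token subject with the actor. The sketch below assumes the `act` claim is an object carrying the actor's own `sub`, as RFC 8693 describes; the interface and function names are hypothetical.

```typescript
interface ImpersonationAwareClaims {
  sub: string // the customer user the session acts as
  act?: { sub: string } // RFC 8693 actor claim: the party acting on the subject's behalf
}

// Returns the operator's subject when the session is impersonated, otherwise null.
function impersonatingActor(claims: ImpersonationAwareClaims): string | null {
  if (claims.act && claims.act.sub !== claims.sub) {
    return claims.act.sub
  }
  return null
}

// The portal would render its impersonation banner whenever this returns true.
function shouldShowImpersonationBanner(claims: ImpersonationAwareClaims): boolean {
  return impersonatingActor(claims) !== null
}
```

Returning the actor's subject rather than a boolean lets the banner name the originating operator, which follow-up task 1 asks for.
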
---
## Decisions made without grilling (small, low-risk)
- **Theme:** dark by default. Existing `apps/portal/assets/styles/tokens.css`
already defines `[data-theme='dark']` tokens; the operator app sets
`<html data-theme="dark">` at app root and reuses them
- **Mock data location:** TypeScript files under `apps/operator/data/`
(`tenants-mock.ts`, `partners-mock.ts`, `flags-mock.ts`, etc.). Same shape
as `operator-data.jsx` from the design bundle, just retyped
- **Design system reuse:** copy `NodeMark.vue`, `UiIcon.vue`, and the auth
components into `apps/operator/components/` directly. A shared `packages/ui`
workspace becomes worth doing once a third surface needs them (partner
portal? landing site?)
- **OCIS / Stalwart admin shortcuts in operator UI:** out of scope for this
phase. Operators drill in via the customer-facing service URLs
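
For the mock data location above, a fixture file might look like the sketch below. The field subset mirrors the Partner schema; the second entry, the `PartnerMock` type, and the lookup helper are illustrative placeholders, not content from the design bundle.

```typescript
// apps/operator/data/partners-mock.ts (illustrative content, not the real fixture)
type PartnerStatus = 'active' | 'in-negotiation' | 'paused' | 'terminated'

interface PartnerMock {
  slug: string
  name: string
  domain: string
  status: PartnerStatus
  marginPct: number
}

export const partnersMock: PartnerMock[] = [
  // Values taken from the schema comments in this plan.
  { slug: 'nordicmsp', name: 'NordicMSP', domain: 'nordicmsp.dk', status: 'active', marginPct: 20 },
  // Placeholder entry, invented for this example.
  { slug: 'example-partner', name: 'Example Partner', domain: 'example.com', status: 'in-negotiation', marginPct: 15 },
]

// Small lookup helper so pages can read one fixture by slug.
export function findPartnerMock(slug: string): PartnerMock | undefined {
  return partnersMock.filter((p) => p.slug === slug)[0]
}
```

Typing the fixtures against the same status union as the schema means a mock entry with an invalid status fails at compile time rather than in the UI.
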
---
## Follow-up tasks (post-MVP)
In rough priority order:
1. **Real impersonation flow** — OAuth Token Exchange (RFC 8693), customer
portal `act`-claim handling, audit on entry+exit, banner with origin
operator identity
2. **Real audit log collection** — replace mock fixtures with a `platform_audit`
collection in Mongo that platform-api writes on every privileged action;
stream from there in the operator UI
3. **Feature flag backend** — `Flag` schema + per-tenant rollout state + a
tiny flag-eval client every service imports
4. **Incident management backend** — `Incident` schema + paging integration
(PagerDuty / OpsGenie / custom). Until then, the incident modal renders
from mock
5. **Support ticket queue** — `SupportTicket` schema + email-in ingestion
from a dedicated mailbox via Stalwart
6. **Self-serve Partner portal at `partner.dezky.local`** — Phase 6+ work,
own Nuxt app, own OAuth client, scoped to a partner's own customers
7. **Real environment switcher** — currently cosmetic; would need separate
API endpoints per env, separate Authentik tenants, etc.
8. **Real on-call indicator** — integration with the paging system that
gets installed in (4)
9. **Operator workspace impersonation in OCIS/Stalwart** — operator tooling
reaches *into* the customer's file storage and mail for support, with the
same audit trail as portal impersonation
---
## Out of scope for this entire effort
- Multi-region operator UI
- Read-only investor / board mode (a real persona but build it when there's a
real investor — design has a placeholder "Read-only" role for Jonas Berg)
- White-label of the operator portal (partners get their own portal eventually;
Dezky operator never gets white-labeled — it's our internal tool)