dezky/docs/OPERATOR-PLAN.md
Ronni Baslund 2db41fec5e feat(platform-api): multi-audience JWT + Partner CRUD + tenant lifecycle (O.2)
JwtAuthGuard now accepts a comma-separated AUTHENTIK_AUDIENCE
('dezky-portal,dezky-operator'). jose.jwtVerify takes an array and succeeds
on any match — both customer-portal and operator-portal tokens validate
against this service. Per-endpoint guards restrict further.

New OperatorGuard enforces operator-only mutations:
  1. JWT audience claim includes 'dezky-operator' (proof from the token
     alone that this is a privileged session)
  2. ActorService-resolved User has platformAdmin=true (DB check so
     revocation works without waiting for the token to expire)
Both required; either alone is insufficient.

Partner module:
  - Partner schema: slug, name, domain, status, marginPct, contactInfo,
    billingInfo. marginPct is one number per partner (decided in grilling)
  - CRUD endpoints under @UseGuards(JwtAuthGuard, OperatorGuard) — every
    partner mutation requires operator scope
  - GET /partners returns each row with a computed customers count from
    aggregating Tenant.partnerId. MRR aggregation deferred until
    Subscription gains a price column
  - GET /partners/:slug/tenants for the partner detail view
  - DELETE soft-terminates (status='terminated') — never hard-delete
    because tenants may still reference the partner

Tenant changes:
  - partnerId?: Types.ObjectId (ref Partner, indexed sparse) added to
    Tenant schema
  - UpdateTenantDto accepts partnerId so PATCH can attach/detach
  - POST /tenants/:slug/suspend and /resume — operator-only via
    OperatorGuard. PATCH already covers plan/domains/partnerId changes

Smoke test: customer-portal session sends POST /api/partners through the
portal proxy → 403 "This endpoint requires an operator-scoped token". The
positive test (operator-token → 200) waits for O.3 when there's an
operator app to mint the right token.

apps/portal/server/api/partners/index.post.ts is a temporary verification
proxy — delete once the operator portal exists.
2026-05-24 07:08:59 +02:00


# Operator Portal — Plan
`operator.dezky.local` (dev) → `operator.dezky.com` (prod). Internal admin portal
for Dezky staff: managing tenants, partners, operating the platform.
Distinct from the customer portal at `app.dezky.local`. Different OAuth client,
different cookie domain, different surface — though they share Authentik as the
IdP and (eventually) platform-api as the backend.
This file is the running record of decisions made during the design grilling
session. Updated inline as questions resolve.
---
## Scope — C-visual with real management for Tenants + Partners
Decision: build every screen from the source design visually, but back two
domains with real CRUD from day one — Tenants and Partners. Everything else
renders against mock-data fixtures until its backend is built.
| Surface | Day-1 state |
|---|---|
| Overview / dashboard | Visual — aggregates from real Tenant+Partner data where available, mock for the rest |
| Tenants (list + detail with 7 tabs) | **Real backend**, full CRUD, suspend/resume/delete |
| Partners (list + detail) | **Real backend**, new schema, full CRUD |
| Users (global) | Real read across tenants (already in DB) |
| Support queue | Mock |
| Platform billing | Mock |
| Reports | Mock |
| Infrastructure | Visual; could derive from Docker health checks but probably mock initially |
| Feature flags | Mock |
| Audit log | Mock (real backfill is a follow-up) |
| Operator team | Real (Users with `platformAdmin: true`) |
| Platform settings | Mock |
| Command palette ⌘K | Visual — opens, navigates, but "execute action" just toasts |
| Impersonation modal + banner | Visual — confirms the action but doesn't actually mint a token |
| Incident modal | Mock |
| Env switcher (prod/staging/dev) | Cosmetic — picks a label, no real env switch |
| On-call indicator | Mock |
### Real-backend surface this adds
Two genuinely new things on the backend:
1. **Partner schema and CRUD** in `services/platform-api` — id, name, domain,
status, customers count (computed), MRR (computed), margin, sinceDate. Tenants
gain an optional `partnerId` field. The existing `dezky` seed gets no partner.
2. **Tenant lifecycle actions** beyond create — suspend, resume, change plan,
change seat cap, soft-delete with grace period. Existing schema covers most
of this; controllers need new methods.
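
The suspend and resume actions can be read as guarded status transitions. The sketch below is a minimal illustration, assuming a two-value status (`'active'` and `'suspended'`) and a hypothetical `applyLifecycleAction` helper; the real Tenant schema may define more states and the controllers may be structured differently.

```typescript
// Hypothetical status values; the real Tenant schema may define more.
type TenantStatus = 'active' | 'suspended'
type LifecycleAction = 'suspend' | 'resume'

// Allowed transitions: each action is valid from exactly one status.
const transitions: Record<LifecycleAction, { from: TenantStatus; to: TenantStatus }> = {
  suspend: { from: 'active', to: 'suspended' },
  resume: { from: 'suspended', to: 'active' },
}

// Returns the next status, or throws if the action does not apply.
function applyLifecycleAction(current: TenantStatus, action: LifecycleAction): TenantStatus {
  const rule = transitions[action]
  if (current !== rule.from) {
    throw new Error(`Cannot ${action} a tenant whose status is '${current}'`)
  }
  return rule.to
}
```

Rejecting a repeated suspend (rather than silently accepting it) is a design choice in this sketch; an idempotent variant would simply return the current status.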
Everything else (incidents, flags, support tickets, audit log collection,
impersonation tokens) stays mock until explicitly promoted.
---
## Lives at `apps/operator/` — separate Nuxt app
Decision: new Nuxt 3 app, separate `package.json`, separate Traefik route at
`operator.dezky.local`. Reuses design tokens / NodeMark / UiIcon by copy for
now; a `packages/ui` workspace is a likely follow-up once we have a third
consumer.
**Why separate, not a route group in `apps/portal/`:** security boundary. The
moment any operator-only feature mutates customer state (impersonation, suspend
tenant), a routing or middleware bug on a shared app is catastrophic. Separate
apps make that nearly impossible. Different cookies, different OIDC client,
different domain.
**Cost:** one more docker-compose service, ~10 lines of Traefik labels, one more
volume for `node_modules`. Some duplicated dev tooling (eslint, tsconfig).
---
## Auth — new `dezky-operator` Authentik OAuth provider
Decision: a dedicated OAuth client in Authentik, distinct from `dezky-portal`.
- New provider `dezky-operator` (confidential, PKCE on)
- Redirect URIs: `https://operator.dezky.local/auth/oidc/callback`
- Group binding: `dezky-platform-admins` required at the provider's authorization
flow (Authentik policy), so non-admins can't even consent
- Stricter policies attached only to this provider: MFA required, future IP
allowlist for the office network/VPN
- Token audience claim: `dezky-operator`
- The `JwtAuthGuard` in platform-api (formerly provisioning) widens its audience check to a list:
`['dezky-portal', 'dezky-operator']`
- Per-endpoint guard for operator-only mutations: require that `aud` includes `'dezky-operator'`
AND `actor.platformAdmin === true`. The audience check makes "is this a privileged
session" provable from the token alone, independent of the DB lookup
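
The two-part operator check described in the last bullet can be sketched as a pure function. This is an illustration only: `TokenClaims`, `Actor`, and `isOperatorRequest` are hypothetical names, and the real guard would sit behind NestJS's `CanActivate` interface rather than being called directly.

```typescript
// Shapes are assumptions for illustration, not the service's real types.
interface TokenClaims {
  aud: string | string[] // a JWT `aud` claim may be a single string or an array
}
interface Actor {
  platformAdmin: boolean
}

const OPERATOR_AUDIENCE = 'dezky-operator'

// True only when the token carries the operator audience AND the
// DB-resolved actor is a platform admin. Either alone is insufficient.
function isOperatorRequest(claims: TokenClaims, actor: Actor | null): boolean {
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud]
  const hasOperatorAudience = audiences.includes(OPERATOR_AUDIENCE)
  return hasOperatorAudience && actor?.platformAdmin === true
}
```

Keeping the audience test separate from the database flag means a revoked admin is rejected immediately, while a customer-portal token is rejected even if the user happens to be an admin.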
**UX trade-off accepted:** if Ronni (or any operator who is also a customer)
wants to be in both apps, they log into Authentik twice — once per audience.
Correct security-wise, fine ergonomically.
---
## Backend stays as one service — rename to `services/platform-api`
Decision: route all operator mutations and reads through the existing NestJS
service (no second backend, no Nitro-direct-to-Mongo). Rename
`services/provisioning` → `services/platform-api` because the service now owns
more than just provisioning — it's the platform's data + control plane.
**What changes during the rename:**
- Directory: `services/provisioning/` → `services/platform-api/`
- Package: `@dezky/provisioning` → `@dezky/platform-api`
- Docker container name: `dezky-provisioning` → `dezky-platform-api`
- Compose service key, network alias, volume names
- Portal env var: `PROVISIONING_INTERNAL_URL` → `PLATFORM_API_INTERNAL_URL`
- Portal proxy routes: `http://provisioning:3001` → `http://platform-api:3001`
- Internal module names referencing "provisioning" stay (e.g.
`ProvisioningService` is now one orchestration concern *inside*
`platform-api`; not the whole service's purpose)
- Public URL stays `api.dezky.local` (Traefik routes by Host header, unaffected)
**New endpoints platform-api gains in this phase:**
- `POST /tenants/:slug/suspend`, `POST /tenants/:slug/resume`
- `PATCH /tenants/:slug` already exists; ensure it can change plan / seat cap
- `GET /partners`, `POST /partners`, `GET /partners/:slug`, `PATCH /partners/:slug`
- `Tenant.partnerId` foreign key + filter on tenant queries
- `JwtAuthGuard` accepts both `dezky-portal` and `dezky-operator` audiences;
per-endpoint requirement of `dezky-operator` aud for operator-only mutations
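
The multi-audience acceptance can be illustrated without any JWT library. The sketch below assumes the allowed audiences arrive as a comma-separated setting (as with `AUTHENTIK_AUDIENCE`); `parseAudiences` and `audienceAllowed` are hypothetical helper names, and the matching only mirrors the "any match succeeds" behaviour rather than reproducing the guard itself.

```typescript
// Splits a comma-separated audience setting into a clean list,
// dropping surrounding whitespace and empty entries.
function parseAudiences(raw: string | undefined): string[] {
  return (raw ?? '')
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
}

// A token is accepted when any of its audiences is in the allowed list.
function audienceAllowed(tokenAud: string | string[], allowed: string[]): boolean {
  const tokenAudiences = Array.isArray(tokenAud) ? tokenAud : [tokenAud]
  return tokenAudiences.some((aud) => allowed.includes(aud))
}
```

For example, `parseAudiences('dezky-portal,dezky-operator')` yields both audiences, and a token with either one passes `audienceAllowed`; operator-only endpoints then narrow this further.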
**Strategy:** rename in a separate prep commit before the operator work starts,
so the rename diff is mechanical and reviewable on its own.
---
## Partner schema
```typescript
import { Schema } from '@nestjs/mongoose'

// Shape sketch only: per-field @Prop() decorators are omitted for brevity.
@Schema({ collection: 'partners', timestamps: true })
class Partner {
  slug: string // 'nordicmsp', URL-safe, unique
  name: string // 'NordicMSP'
  domain: string // 'nordicmsp.dk', the partner's own org domain
  status: 'active' | 'in-negotiation' | 'paused' | 'terminated' // default 'in-negotiation'
  marginPct: number // 20 = partner keeps 20% of customer MRR; one number per partner
  partnershipStartedAt?: Date
  contactInfo: { primaryName?: string; primaryEmail?: string; billingEmail?: string }
  billingInfo: { /* same shape as Tenant.billingInfo */ }
}
```
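
As a worked example of the `marginPct` semantics in the schema comment, the sketch below computes a partner's share of one customer's MRR. Representing amounts in minor currency units and the `partnerShare` name are assumptions made for illustration; the plan does not specify how amounts are stored.

```typescript
// Amounts are in minor currency units (an assumption) to avoid
// floating-point drift when multiplying by a percentage.
function partnerShare(customerMrrMinor: number, marginPct: number): number {
  if (marginPct < 0 || marginPct > 100) {
    throw new RangeError('marginPct must be between 0 and 100')
  }
  return Math.round((customerMrrMinor * marginPct) / 100)
}
```

With `marginPct` of 20, a customer paying 1,000.00 per month (100000 minor units) gives the partner 200.00 (20000 minor units).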
**Tenant side:** add `partnerId?: Types.ObjectId` (ref Partner, indexed,
optional). Direct customers have no `partnerId`; partner-owned customers
reference one.
**Computed at query time, not stored:**
- `Partner.customers` — count of tenants with `partnerId === this._id`
- `Partner.mrr` — sum of those tenants' MRR
Storing denormalized would force write-time syncing on every tenant
create/suspend/plan-change for ~zero benefit at our scale.
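
The query-time count can be pictured as a simple grouping over tenants. The sketch below does it in memory with string ids for readability (`TenantRef` and `countCustomersByPartner` are hypothetical names), then shows the equivalent MongoDB aggregation pipeline as a plain object; the pipeline is not executed here and the service's actual query may differ.

```typescript
// Minimal tenant shape for the sketch; ids are plain strings here
// instead of ObjectIds.
interface TenantRef {
  slug: string
  partnerId?: string
}

// Counts tenants per partner; direct customers (no partnerId) are skipped.
function countCustomersByPartner(tenants: TenantRef[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const tenant of tenants) {
    if (tenant.partnerId === undefined) continue
    counts.set(tenant.partnerId, (counts.get(tenant.partnerId) ?? 0) + 1)
  }
  return counts
}

// The same grouping expressed as a MongoDB aggregation pipeline
// (shape only, not executed in this sketch).
const customersPipeline = [
  { $match: { partnerId: { $ne: null } } },
  { $group: { _id: '$partnerId', customers: { $sum: 1 } } },
]
```

A partner with no tenants simply has no entry in the result, so the caller would default its count to 0.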
**Operator-only.** A self-serve partner portal at `partner.dezky.local` is a
future surface; not in this phase. Partners are visible/manageable only from
the operator app.
---
## Impersonation — visual stub now, real flow later
Decision: build the UI exactly as designed (modal with reason field, top
banner, exit button) but do not wire actual token exchange. The confirm action
toasts "impersonation not implemented yet" and writes a mock audit entry.
**Why now:** validates the UX, lets future hires see the operator surface
end-to-end, doesn't introduce a dangerous capability before there's an
operational need.
**Mitigations against confusion:**
- Modal carries a `Demo only` badge — same styling as other stub-data badges
in the operator UI
- Toast on confirm makes the no-op explicit
- The banner does display in mock mode (so we can iterate on its design), but
the underlying session state is local to the operator tab
**Real flow design recorded for the follow-up:** OAuth 2 Token Exchange
(RFC 8693). Authentik supports it. Customer portal needs to accept tokens
carrying an `act` claim alongside `sub`, and show its own impersonation banner
when the two differ. ~2 days of careful work + security review.
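
The banner condition for the future real flow can be sketched as a small check on the token claims. This assumes the `act` claim takes the RFC 8693 form of an object with its own `sub`; `ImpersonationAwareClaims` and `impersonatingActor` are hypothetical names, not existing portal code.

```typescript
// `act` identifies the acting party (the operator), while the top-level
// `sub` remains the customer being impersonated.
interface ImpersonationAwareClaims {
  sub: string
  act?: { sub: string }
}

// Returns the acting operator's subject when the session is impersonated,
// otherwise null (no banner needed).
function impersonatingActor(claims: ImpersonationAwareClaims): string | null {
  if (claims.act && claims.act.sub !== claims.sub) {
    return claims.act.sub
  }
  return null
}
```

The customer portal would show its banner whenever this returns a non-null value, and could display that value as the originating operator.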
---
## Decisions made without grilling (small, low-risk)
- **Theme:** dark by default. Existing `apps/portal/assets/styles/tokens.css`
already defines `[data-theme='dark']` tokens; the operator app sets
`<html data-theme="dark">` at app root and reuses them
- **Mock data location:** TypeScript files under `apps/operator/data/`
(`tenants-mock.ts`, `partners-mock.ts`, `flags-mock.ts`, etc.). Same shape
as `operator-data.jsx` from the design bundle, just retyped
- **Design system reuse:** copy `NodeMark.vue`, `UiIcon.vue`, and the auth
components into `apps/operator/components/` directly. A shared `packages/ui`
workspace becomes worth doing once a third surface needs them (partner
portal? landing site?)
- **OCIS / Stalwart admin shortcuts in operator UI:** out of scope for this
phase. Operator drills via the customer-facing service URLs
---
## Follow-up tasks (post-MVP)
In rough priority order:
1. **Real impersonation flow** — OAuth Token Exchange (RFC 8693), customer
portal `act`-claim handling, audit on entry+exit, banner with origin
operator identity
2. **Real audit log collection** — replace mock fixtures with a `platform_audit`
collection in Mongo that platform-api writes on every privileged action;
stream from there in the operator UI
3. **Feature flag backend**: `Flag` schema + per-tenant rollout state + a
tiny flag-eval client every service imports
4. **Incident management backend**: `Incident` schema + paging integration
(PagerDuty / OpsGenie / custom). Until then, the incident modal renders
from mock
5. **Support ticket queue**: `SupportTicket` schema + email-in ingestion
from a dedicated mailbox via Stalwart
6. **Self-serve Partner portal at `partner.dezky.local`** — Phase 6+ work,
own Nuxt app, own OAuth client, scoped to a partner's own customers
7. **Real environment switcher** — currently cosmetic; would need separate
API endpoints per env, separate Authentik tenants, etc.
8. **Real on-call indicator** — integration with the paging system that
gets installed in (4)
9. **Operator workspace impersonation in OCIS/Stalwart** — operator tooling
reaches *into* the customer's file storage and mail for support, with the
same audit trail as portal impersonation
---
## Out of scope for this entire effort
- Multi-region operator UI
- Read-only investor / board mode (a real persona but build it when there's a
real investor — design has a placeholder "Read-only" role for Jonas Berg)
- White-label of the operator portal (partners get their own portal eventually;
Dezky operator never gets white-labeled — it's our internal tool)
---
## Execution checklist
Tick boxes as work lands. Each phase is roughly one commit. Phases must be
done in order — earlier ones unblock later ones.
### O.0 · Prep — service rename ✓
- [x] Rename `services/provisioning/` → `services/platform-api/`
- [x] Update `package.json` name → `@dezky/platform-api`
- [x] Update `docker-compose.yml`: container name, service key, volume name,
env var `PROVISIONING_INTERNAL_URL` → `PLATFORM_API_INTERNAL_URL`,
NUXT_API_BASE points at new hostname
- [x] Update portal proxy routes to read `PLATFORM_API_INTERNAL_URL` and
default to `http://platform-api:3001`
- [x] Sweep docs (README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md,
NEXT-STEPS.md, TROUBLESHOOTING.md) for stale references
- [x] Verify customer portal `/api/me` still works end-to-end after rename
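
The proxy's URL lookup after the rename can be sketched as below. `resolvePlatformApiUrl` is a hypothetical helper and the trailing-slash trimming is an added convenience, not something the checklist specifies; the env var name and default come from the items above.

```typescript
const DEFAULT_PLATFORM_API_URL = 'http://platform-api:3001'

// Reads the internal URL from an env-like record, falling back to the
// compose hostname when the variable is unset or blank.
function resolvePlatformApiUrl(env: Record<string, string | undefined>): string {
  const configured = env.PLATFORM_API_INTERNAL_URL?.trim()
  const base = configured ? configured : DEFAULT_PLATFORM_API_URL
  return base.replace(/\/+$/, '') // drop trailing slashes so path joins stay clean
}
```

Passing the environment in as a parameter (rather than reading a global) keeps the function easy to test; a proxy route would typically call it with the process environment.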
### O.1 · Authentik — operator OAuth client ✓
- [x] Create `dezky-operator` OAuth provider via Authentik API
- [x] Set redirect URIs to `https://operator.dezky.local/auth/oidc/{callback,logout}`
- [x] Confidential client; client_secret persisted to `.env` as
`OPERATOR_OIDC_CLIENT_SECRET`
- [x] `Dezky Operator` application created and linked to the provider
- [x] Group binding on the application: `dezky-platform-admins` required to
reach the consent screen. (Authentik 2025.10 supports group-direct
policy bindings — no separate `policy_group_membership` object needed)
- [ ] **Deferred to follow-up:** MFA-required policy on this provider.
Authentik does this via a stage binding on the authentication flow,
which is app-specific configuration we'll wire when there's an actual
MFA enrollment to gate against. For dev with one akadmin, akadmin
already has WebAuthn — the auth flow prompts for it automatically
- [x] Discovery doc verified at
`/application/o/dezky-operator/.well-known/openid-configuration`:
issuer correct, scopes include `groups`, all endpoints resolve
### Gotchas worth noting
- Authentik 2025.10 requires both `authorization_flow` AND `invalidation_flow`
when creating OAuth2 providers. The default invalidation flow is at
`/api/v3/flows/instances/?designation=invalidation` (slug
`default-provider-invalidation-flow`)
- The `policies/group_membership/` endpoint mentioned in older Authentik
docs is gone in 2025.10. Use `policies/bindings/` with a direct `group`
reference instead
### O.2 · platform-api — multi-audience + Partner CRUD ✓
- [x] `JwtAuthGuard`: accepts comma-separated `AUTHENTIK_AUDIENCE`
(`dezky-portal,dezky-operator`). Both audiences validate; per-endpoint
guards further restrict
- [x] `OperatorGuard` (not a decorator — a regular `CanActivate` guard)
enforcing `aud includes 'dezky-operator' && actor.platformAdmin`.
Applied via `@UseGuards(JwtAuthGuard, OperatorGuard)`
- [x] `schemas/partner.schema.ts` — Partner model
- [x] `partners/` module: controller + service + DTOs (create / read /
update / soft-terminate / list tenants under partner)
- [x] `partnerId?: Types.ObjectId` added to Tenant schema (indexed, sparse).
`UpdateTenantDto` accepts `partnerId` to attach/detach
- [x] `Partner.customers` aggregated at query time (count of Tenants by
partnerId). MRR aggregation **deferred** — Tenant has no monthly
amount yet and Subscription lacks a price column. Will land when
Subscription gains pricing
- [x] Tenant lifecycle endpoints: `POST /tenants/:slug/suspend`,
`POST /tenants/:slug/resume` (operator-only). PATCH already accepts
plan/domains/partnerId changes
- [x] Smoke test: customer-portal token → `POST /partners` returns 403
"This endpoint requires an operator-scoped token" ✓. Positive test
(operator token → 200) deferred until O.3 when the operator app
exists to mint that token
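
The soft-terminate behaviour of the partners module can be illustrated with a small pure function. `PartnerRecord` and `softTerminate` are hypothetical names for this sketch; the real service would persist the status change through the Partner model.

```typescript
type PartnerStatus = 'active' | 'in-negotiation' | 'paused' | 'terminated'

interface PartnerRecord {
  slug: string
  status: PartnerStatus
}

// Returns a copy marked terminated. The record itself is kept, so tenants
// that still reference the partner do not end up with a dangling partnerId.
function softTerminate(partner: PartnerRecord): PartnerRecord {
  return { ...partner, status: 'terminated' }
}
```

Calling it on an already terminated partner returns an equivalent record, so the operation is safe to repeat.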
### O.3 · Scaffold `apps/operator/`
- [ ] `apps/operator/package.json` (Nuxt 3, `nuxt-oidc-auth` beta.11, same
deps as portal)
- [ ] `nuxt.config.ts` with `oidc` block pointing at `dezky-operator`
- [ ] Docker compose service `operator`, with Traefik labels for
`operator.dezky.local`, `node_modules` volume, same `NODE_EXTRA_CA_CERTS`
mount for mkcert
- [ ] Network alias on Traefik: `operator.dezky.local`
- [ ] User task: add `operator.dezky.local` to `/etc/hosts`
- [ ] Session secrets in `.env`: `NUXT_OIDC_TOKEN_KEY` (base64-32),
`NUXT_OIDC_SESSION_SECRET`, `NUXT_OIDC_AUTH_SESSION_SECRET`
**distinct from** the customer portal's secrets
- [ ] Verify login: visit `https://operator.dezky.local`, bounce to Authentik,
sign in as akadmin, land on a placeholder index page
### O.4 · Design system + app shell
- [ ] `assets/styles/tokens.css` — copy with `data-theme="dark"` as default
- [ ] `assets/styles/base.css`
- [ ] Components: `NodeMark.vue`, `UiIcon.vue` (copy from portal)
- [ ] Shared primitives ported from the design: `Card`, `Button`, `Table`,
`Badge`, `Mono`, `Eyebrow`, `StatusDot`, `Avatar`, `PageHeader`
- [ ] `OpSidebar.vue` — collapsible, badges per nav item
- [ ] `OpTopbar.vue` — env badge, ⌘K trigger, on-call pill, bell, avatar
- [ ] `app.vue` shell wires sidebar + topbar + `<NuxtPage />`
- [ ] Keyboard shortcut: ⌘[ collapses sidebar, ⌘K opens palette
### O.5 · Tenant management (real backend)
- [ ] `pages/tenants/index.vue` — list with status/plan/seats/MRR columns,
filter by partner and status, search by slug/name
- [ ] `pages/tenants/[slug].vue` — detail view with tabs
- [ ] Tab: **Overview** — header card, key stats, partner link
- [ ] Tab: **Users** — list users via `GET /users?tenantSlug=…`
- [ ] Tab: **Resources** — provisioning status per integration
(Authentik / Stalwart / OCIS), error messages, "Reconcile" button
- [ ] Tab: **Billing** (mock fixtures)
- [ ] Tab: **Audit** (mock fixtures)
- [ ] Tab: **Support** (mock fixtures)
- [ ] Tab: **Danger** — suspend, resume, change plan, soft-delete; real
backend calls, confirmation modals
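
The list page's filter-and-search behaviour could look roughly like the sketch below. The `TenantRow` and `TenantFilter` shapes and the `filterTenants` name are assumptions for illustration; whether filtering happens client-side or via query parameters on the API is not decided in this plan.

```typescript
interface TenantRow {
  slug: string
  name: string
  status: string
  partnerId?: string
}

interface TenantFilter {
  partnerId?: string
  status?: string
  search?: string
}

// Applies the partner and status filters exactly, then a case-insensitive
// substring search over slug and name.
function filterTenants(rows: TenantRow[], filter: TenantFilter): TenantRow[] {
  const needle = filter.search?.trim().toLowerCase() ?? ''
  return rows.filter((row) => {
    if (filter.partnerId !== undefined && row.partnerId !== filter.partnerId) return false
    if (filter.status !== undefined && row.status !== filter.status) return false
    if (needle === '') return true
    return row.slug.toLowerCase().includes(needle) || row.name.toLowerCase().includes(needle)
  })
}
```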
### O.6 · Partner management (real backend)
- [ ] `pages/partners/index.vue` — list with name/domain/status/customers/MRR
- [ ] `pages/partners/[slug].vue` — detail panel with customers list,
MRR breakdown, margin, contact info
- [ ] "Create partner" modal — POST /partners
- [ ] Attach / detach tenant to partner (PATCH on tenant.partnerId)
### O.7 · Visual-only screens (mock fixtures)
- [ ] `data/*.ts` — typed mock fixtures (tenants-extra, partners-extra,
services, incident, flags, audit, team)
- [ ] `pages/index.vue` — Overview dashboard
- [ ] `pages/operator-team.vue` — real backend (Users where
`platformAdmin === true`)
- [ ] `pages/users.vue` — global users, real read
- [ ] `pages/infrastructure.vue` — service health (mock for now;
docker health check integration is a follow-up)
- [ ] `pages/flags.vue` — feature flags (mock)
- [ ] `pages/audit.vue` — global audit (mock)
- [ ] `pages/support.vue` — placeholder
- [ ] `pages/billing.vue` — placeholder
- [ ] `pages/reports.vue` — placeholder
- [ ] `pages/settings.vue` — placeholder
### O.8 · Interactions
- [ ] `CommandPalette.vue` — ⌘K opens, fuzzy search over tenants + partners
+ flags + nav items + actions
- [ ] `ImpersonationModal.vue` — visual stub with reason field, Demo-only
badge, no-op confirm + toast
- [ ] `ImpersonationBanner.vue` — top banner shown when impersonating
- [ ] `IncidentModal.vue` — mock incident render
- [ ] `TweaksPanel.vue` — theme (light/dark), density (comfy/compact),
env (prod/staging/dev cosmetic switch)
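
For the palette's fuzzy search, one simple option is an in-order subsequence match, sketched below. The plan does not pick an algorithm, so `fuzzyMatch` and `searchPalette` are hypothetical and a ranking library could replace them later.

```typescript
// Subsequence match: every query character must appear, in order,
// somewhere in the candidate (case-insensitive).
function fuzzyMatch(query: string, candidate: string): boolean {
  const q = query.toLowerCase()
  const c = candidate.toLowerCase()
  let qi = 0
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) qi++
  }
  return qi === q.length
}

// Keeps the items whose label matches the query; original order is preserved.
function searchPalette(query: string, items: string[]): string[] {
  return items.filter((item) => fuzzyMatch(query, item))
}
```

This variant only filters; it does not score or rank matches, which is usually the next step once the list grows.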
### O.9 · Verification
- [ ] Sign in to `operator.dezky.local` as akadmin via the new OAuth client
- [ ] Confirm JWT audience is `dezky-operator` (decode in DevTools, post
response back)
- [ ] Create a real Partner via the UI, see it in Mongo
- [ ] Attach the `acme` tenant to that partner; verify count goes 0 → 1
- [ ] Suspend a tenant from the Danger tab; confirm `status: 'suspended'`
in Mongo
- [ ] Sign in to `app.dezky.local` simultaneously in another browser
profile, confirm the customer portal still works and that customer
token's `aud` is `dezky-portal`
- [ ] List all the relevant follow-up tasks in NEXT-STEPS.md as remaining
work; file separate issues for anything that was deferred