dezky/docs/OPERATOR-PLAN.md
Ronni Baslund 22b2583f0b chore(services): rename services/provisioning -> services/platform-api
O.0 prep from OPERATOR-PLAN.md. Mechanical refactor before adding partner
management and operator-specific endpoints. The service now owns more than
just provisioning orchestration (it'll soon own partners, tenant lifecycle
actions, multi-audience JWT validation), so the name 'platform-api' reflects
its scope better.

What changed:
- Directory: services/provisioning/ -> services/platform-api/
- Package: @dezky/provisioning -> @dezky/platform-api
- Docker: container_name dezky-provisioning -> dezky-platform-api;
  compose service key 'provisioning' -> 'platform-api'; volume
  provisioning_node_modules -> platform_api_node_modules
- Portal: PROVISIONING_INTERNAL_URL env var -> PLATFORM_API_INTERNAL_URL,
  default URL http://provisioning:3001 -> http://platform-api:3001 in all
  three proxy routes (me.get.ts, tenants/index.post.ts, tenants/[slug]/
  reconcile.post.ts), plus NUXT_API_BASE updated
- Health endpoint service identifier and main.ts log lines updated to
  'dezky-platform-api'
- Docs swept: README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md,
  NEXT-STEPS.md, TROUBLESHOOTING.md, OPERATOR-PLAN.md, traefik/dynamic.yml

What deliberately stays:
- Internal module names ProvisioningService / ProvisioningModule (those
  describe an orchestration sub-concern, not the service's purpose)
- Tenant.provisioningStatus / provisioningErrors field names (state
  per integration, not service name)
- File services/platform-api/src/tenants/provisioning.service.ts
- 'Hetzner provisioning' references in production-prep docs (infrastructure
  provisioning, unrelated)

Verified end-to-end after rename: /api/me returns 200 with profile + 2
tenants + subscription, /api/tenants/dezky/reconcile returns 200 with
Authentik integration still ok.

OPERATOR-PLAN.md O.0 checkboxes ticked.
2026-05-24 00:35:01 +02:00


# Operator Portal — Plan
`operator.dezky.local` (dev) → `operator.dezky.com` (prod). Internal admin portal
for Dezky staff: managing tenants, partners, operating the platform.
Distinct from the customer portal at `app.dezky.local`. Different OAuth client,
different cookie domain, different surface — though they share Authentik as the
IdP and (eventually) platform-api as the backend.
This file is the running record of decisions made during the design grilling
session. Updated inline as questions resolve.
---
## Scope — C-visual with real management for Tenants + Partners
Decision: build every screen from the source design visually, but back two
domains with real CRUD from day one — Tenants and Partners. Everything else
renders against mock-data fixtures until its backend is built.
| Surface | Day-1 state |
|---|---|
| Overview / dashboard | Visual — aggregates from real Tenant+Partner data where available, mock for the rest |
| Tenants (list + detail with 7 tabs) | **Real backend**, full CRUD, suspend/resume/delete |
| Partners (list + detail) | **Real backend**, new schema, full CRUD |
| Users (global) | Real read across tenants (already in DB) |
| Support queue | Mock |
| Platform billing | Mock |
| Reports | Mock |
| Infrastructure | Visual; could derive from Docker health checks but probably mock initially |
| Feature flags | Mock |
| Audit log | Mock (real backfill is a follow-up) |
| Operator team | Real (Users with `platformAdmin: true`) |
| Platform settings | Mock |
| Command palette ⌘K | Visual — opens, navigates, but "execute action" just toasts |
| Impersonation modal + banner | Visual — confirms the action but doesn't actually mint a token |
| Incident modal | Mock |
| Env switcher (prod/staging/dev) | Cosmetic — picks a label, no real env switch |
| On-call indicator | Mock |
### Real-backend surface this adds
Two genuinely new things on the backend:
1. **Partner schema and CRUD** in `services/platform-api` — id, name, domain,
status, customers count (computed), MRR (computed), margin, sinceDate. Tenants
gain an optional `partnerId` field. The existing `dezky` seed gets no partner.
2. **Tenant lifecycle actions** beyond create — suspend, resume, change plan,
change seat cap, soft-delete with grace period. Existing schema covers most
of this; controllers need new methods.
Everything else (incidents, flags, support tickets, audit log collection,
impersonation tokens) stays mock until explicitly promoted.
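
The lifecycle actions in item 2 can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the service's actual code: the status values `active` and `deleted`, the `deletedAt` field, the helper names, and the 30-day grace period are all hypothetical (only `suspended` is named elsewhere in this plan).

```typescript
// Hypothetical status values; only 'suspended' is named elsewhere in this plan.
type TenantStatus = 'active' | 'suspended' | 'deleted'

interface TenantState {
  status: TenantStatus
  deletedAt?: Date // set on soft-delete; hypothetical field name
}

type LifecycleAction = 'suspend' | 'resume' | 'softDelete'

// Assumed grace period; the plan does not specify a length.
const GRACE_PERIOD_DAYS = 30

// Returns the next state, or throws if the action is not valid from the current status.
function applyLifecycleAction(
  tenant: TenantState,
  action: LifecycleAction,
  now: Date,
): TenantState {
  switch (action) {
    case 'suspend':
      if (tenant.status !== 'active') throw new Error(`cannot suspend a ${tenant.status} tenant`)
      return { status: 'suspended' }
    case 'resume':
      if (tenant.status !== 'suspended') throw new Error(`cannot resume a ${tenant.status} tenant`)
      return { status: 'active' }
    case 'softDelete':
      if (tenant.status === 'deleted') throw new Error('tenant is already deleted')
      return { status: 'deleted', deletedAt: now }
  }
}

// True once the grace period has elapsed and the tenant may be removed for good.
function isPurgeable(tenant: TenantState, now: Date): boolean {
  if (tenant.status !== 'deleted' || !tenant.deletedAt) return false
  const elapsedMs = now.getTime() - tenant.deletedAt.getTime()
  return elapsedMs >= GRACE_PERIOD_DAYS * 24 * 60 * 60 * 1000
}
```

Keeping the transition rules in one pure function would let the controller methods stay thin and makes invalid transitions (for example, resuming an active tenant) fail loudly.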
---
## Lives at `apps/operator/` — separate Nuxt app
Decision: new Nuxt 3 app, separate `package.json`, separate Traefik route at
`operator.dezky.local`. Reuses design tokens / NodeMark / UiIcon by copy for
now; a `packages/ui` workspace is a likely follow-up once we have a third
consumer.
**Why separate, not a route group in `apps/portal/`:** security boundary. The
moment any operator-only feature mutates customer state (impersonation, suspend
tenant), a routing or middleware bug on a shared app is catastrophic. Separate
apps make that nearly impossible. Different cookies, different OIDC client,
different domain.
**Cost:** one more docker-compose service, ~10 lines of Traefik labels, one more
volume for `node_modules`. Some duplicated dev tooling (eslint, tsconfig).
---
## Auth — new `dezky-operator` Authentik OAuth provider
Decision: a dedicated OAuth client in Authentik, distinct from `dezky-portal`.
- New provider `dezky-operator` (confidential, PKCE on)
- Redirect URIs: `https://operator.dezky.local/auth/oidc/callback`
- Group binding: `dezky-platform-admins` required at the provider's authorization
flow (Authentik policy), so non-admins can't even consent
- Stricter policies attached only to this provider: MFA required, future IP
allowlist for the office network/VPN
- Token audience claim: `dezky-operator`
- platform-api's `JwtAuthGuard` widens its audience check to a list:
`['dezky-portal', 'dezky-operator']`
- Per-endpoint guard for operator-only mutations: require `aud === 'dezky-operator'`
AND `actor.platformAdmin === true`. The audience check makes "is this a privileged
session" provable from the token alone, independent of the DB lookup
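
The two-tier check described in the last two bullets can be sketched as plain functions. This is an illustration only: the type and function names are hypothetical, and the real `JwtAuthGuard` would wrap equivalent logic inside a NestJS guard. The sketch allows `aud` to be a string or an array, as RFC 7519 permits.

```typescript
// Token fields this sketch needs; per RFC 7519, `aud` may be a string or an array.
interface TokenClaims {
  sub: string
  aud: string | string[]
}

// Hypothetical actor record, resolved from the DB after token validation.
interface Actor {
  platformAdmin: boolean
}

const ACCEPTED_AUDIENCES = ['dezky-portal', 'dezky-operator']
const OPERATOR_AUDIENCE = 'dezky-operator'

// Normalizes the `aud` claim to an array.
function audiences(claims: TokenClaims): string[] {
  return Array.isArray(claims.aud) ? claims.aud : [claims.aud]
}

// First tier: the general guard accepts a token issued for either app.
function hasAcceptedAudience(claims: TokenClaims): boolean {
  return audiences(claims).some((aud) => ACCEPTED_AUDIENCES.indexOf(aud) !== -1)
}

// Second tier: operator-only mutations need the operator audience AND the admin flag.
function canPerformOperatorMutation(claims: TokenClaims, actor: Actor): boolean {
  return audiences(claims).indexOf(OPERATOR_AUDIENCE) !== -1 && actor.platformAdmin === true
}
```

With these rules, a platform admin holding a `dezky-portal` token passes the first tier but fails the second, which is the behaviour the audience claim is meant to guarantee.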
**UX trade-off accepted:** if Ronni (or any operator who is also a customer)
wants to be in both apps, they log into Authentik twice — once per audience.
Correct security-wise, fine ergonomically.
---
## Backend stays as one service — rename to `services/platform-api`
Decision: route all operator mutations and reads through the existing NestJS
service (no second backend, no Nitro-direct-to-Mongo). Rename
`services/provisioning` → `services/platform-api` because the service now owns
more than just provisioning — it's the platform's data + control plane.
**What changes during the rename:**
- Directory: `services/provisioning/` → `services/platform-api/`
- Package: `@dezky/provisioning` → `@dezky/platform-api`
- Docker container name: `dezky-provisioning` → `dezky-platform-api`
- Compose service key, network alias, volume names
- Portal env var: `PROVISIONING_INTERNAL_URL` → `PLATFORM_API_INTERNAL_URL`
- Portal proxy routes: `http://provisioning:3001` → `http://platform-api:3001`
- Internal module names referencing "provisioning" stay (e.g.
`ProvisioningService` is now one orchestration concern *inside*
`platform-api`; not the whole service's purpose)
- Public URL stays `api.dezky.local` (Traefik routes by Host header, unaffected)
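
For the portal side of the rename, the env var lookup with its new default might look like the sketch below. The helper names `resolvePlatformApiUrl` and `upstreamUrl` and the `/me` upstream path are hypothetical; the actual proxy routes are Nitro handlers whose code is not shown in this plan.

```typescript
// Default matches the compose service key after the rename.
const DEFAULT_PLATFORM_API_URL = 'http://platform-api:3001'

// `env` is passed in so the function can be exercised without touching process.env.
function resolvePlatformApiUrl(env: Record<string, string | undefined>): string {
  const configured = (env['PLATFORM_API_INTERNAL_URL'] ?? '').trim()
  // Treat an empty value like an unset variable.
  const base = configured !== '' ? configured : DEFAULT_PLATFORM_API_URL
  // Drop trailing slashes so joining with a path never yields '//'.
  return base.replace(/\/+$/, '')
}

// Builds the upstream URL for a proxied path such as '/me' (hypothetical path).
function upstreamUrl(env: Record<string, string | undefined>, path: string): string {
  const suffix = path.startsWith('/') ? path : `/${path}`
  return resolvePlatformApiUrl(env) + suffix
}
```

Centralizing the lookup in one helper would mean the next rename touches a single default instead of three route files.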
**New endpoints platform-api gains in this phase:**
- `POST /tenants/:slug/suspend`, `POST /tenants/:slug/resume`
- `PATCH /tenants/:slug` already exists; ensure it can change plan / seat cap
- `GET /partners`, `POST /partners`, `GET /partners/:slug`, `PATCH /partners/:slug`
- `Tenant.partnerId` foreign key + filter on tenant queries
- `JwtAuthGuard` accepts both `dezky-portal` and `dezky-operator` audiences;
per-endpoint requirement of `dezky-operator` aud for operator-only mutations
**Strategy:** rename in a separate prep commit before the operator work starts,
so the rename diff is mechanical and reviewable on its own.
---
## Partner schema
```typescript
import { Schema } from '@nestjs/mongoose'

// Field-level @Prop() decorators are omitted here; this shows the shape only.
@Schema({ collection: 'partners', timestamps: true })
class Partner {
  slug: string // 'nordicmsp', URL-safe, unique
  name: string // 'NordicMSP'
  domain: string // 'nordicmsp.dk', the partner's own org domain
  status: 'active' | 'in-negotiation' | 'paused' | 'terminated' // default 'in-negotiation'
  marginPct: number // 20 = partner keeps 20% of customer MRR; one number per partner
  partnershipStartedAt?: Date
  contactInfo: { primaryName?: string; primaryEmail?: string; billingEmail?: string }
  billingInfo: { /* same shape as Tenant.billingInfo */ }
}
```
**Tenant side:** add `partnerId?: Types.ObjectId` (ref Partner, indexed,
optional). Direct customers have no `partnerId`; partner-owned customers
reference one.
**Computed at query time, not stored:**
- `Partner.customers` — count of tenants with `partnerId === this._id`
- `Partner.mrr` — sum of those tenants' MRR
Storing denormalized would force write-time syncing on every tenant
create/suspend/plan-change for ~zero benefit at our scale.
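
To make the query-time computation concrete, here is a minimal in-memory sketch. It assumes each tenant exposes a numeric `mrr` (a hypothetical field; the plan does not say where tenant MRR comes from) and uses plain string ids in place of `Types.ObjectId`. In the service itself this would more likely be a Mongo count and sum than an array pass.

```typescript
// Minimal shapes for the sketch; `mrr` on the tenant is a hypothetical field.
interface TenantRow {
  slug: string
  partnerId?: string // stands in for Types.ObjectId
  mrr: number
}

interface PartnerRow {
  id: string
  slug: string
  marginPct: number
}

interface PartnerSummary {
  slug: string
  customers: number // count of tenants referencing this partner
  mrr: number // sum of those tenants' MRR
  partnerShare: number // marginPct applied to that sum
}

// Derives the computed fields from the tenant list instead of storing them.
function summarizePartner(partner: PartnerRow, tenants: TenantRow[]): PartnerSummary {
  const owned = tenants.filter((t) => t.partnerId === partner.id)
  const mrr = owned.reduce((sum, t) => sum + t.mrr, 0)
  return {
    slug: partner.slug,
    customers: owned.length,
    mrr,
    partnerShare: (mrr * partner.marginPct) / 100,
  }
}
```

For a partner with `marginPct: 20` and two customers at 1000 and 500 MRR, this yields 2 customers, 1500 MRR, and a partner share of 300; a direct customer with no `partnerId` is not counted.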
**Operator-only.** A self-serve partner portal at `partner.dezky.local` is a
future surface; not in this phase. Partners are visible/manageable only from
the operator app.
---
## Impersonation — visual stub now, real flow later
Decision: build the UI exactly as designed (modal with reason field, top
banner, exit button) but do not wire actual token exchange. The confirm action
toasts "impersonation not implemented yet" and writes a mock audit entry.
**Why now:** validates the UX, lets future hires see the operator surface
end-to-end, doesn't introduce a dangerous capability before there's an
operational need.
**Mitigations against confusion:**
- Modal carries a `Demo only` badge — same styling as other stub-data badges
in the operator UI
- Toast on confirm makes the no-op explicit
- The banner does display in mock mode (so we can iterate on its design), but
the underlying session state is local to the operator tab
**Real flow design recorded for the follow-up:** OAuth 2.0 Token Exchange
(RFC 8693). Authentik supports it. Customer portal needs to accept tokens
carrying an `act` claim alongside `sub`, and show its own impersonation banner
when the two differ. ~2 days of careful work + security review.
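
The customer-portal side of that rule (show a banner when `act` and `sub` differ) could be sketched as follows. RFC 8693 defines `act` as an object whose `sub` member identifies the acting party; the interface and function names here are hypothetical, and token signature validation is assumed to have happened already.

```typescript
// The `act` claim from RFC 8693: identifies the party acting on behalf of `sub`.
interface ActorClaim {
  sub: string
}

interface PortalTokenClaims {
  sub: string // the user the session represents
  act?: ActorClaim // present only on exchanged (impersonation) tokens
}

interface ImpersonationState {
  impersonating: boolean
  operatorSub?: string // who is really behind the session, for the banner and audit
}

// Decides whether the portal should render its impersonation banner.
function impersonationState(claims: PortalTokenClaims): ImpersonationState {
  const actorSub = claims.act ? claims.act.sub : undefined
  if (actorSub !== undefined && actorSub !== claims.sub) {
    return { impersonating: true, operatorSub: actorSub }
  }
  return { impersonating: false }
}
```

An ordinary customer token carries no `act` claim, so the banner stays hidden without any extra configuration.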
---
## Decisions made without grilling (small, low-risk)
- **Theme:** dark by default. Existing `apps/portal/assets/styles/tokens.css`
already defines `[data-theme='dark']` tokens; the operator app sets
`<html data-theme="dark">` at app root and reuses them
- **Mock data location:** TypeScript files under `apps/operator/data/`
(`tenants-mock.ts`, `partners-mock.ts`, `flags-mock.ts`, etc.). Same shape
as `operator-data.jsx` from the design bundle, just retyped
- **Design system reuse:** copy `NodeMark.vue`, `UiIcon.vue`, and the auth
components into `apps/operator/components/` directly. A shared `packages/ui`
workspace becomes worth doing once a third surface needs them (partner
portal? landing site?)
- **OCIS / Stalwart admin shortcuts in operator UI:** out of scope for this
phase. Operator drills via the customer-facing service URLs
---
## Follow-up tasks (post-MVP)
In rough priority order:
1. **Real impersonation flow** — OAuth Token Exchange (RFC 8693), customer
portal `act`-claim handling, audit on entry+exit, banner with origin
operator identity
2. **Real audit log collection** — replace mock fixtures with a `platform_audit`
collection in Mongo that platform-api writes on every privileged action;
stream from there in the operator UI
3. **Feature flag backend** — `Flag` schema + per-tenant rollout state + a
tiny flag-eval client every service imports
4. **Incident management backend** — `Incident` schema + paging integration
(PagerDuty / OpsGenie / custom). Until then, the incident modal renders
from mock
5. **Support ticket queue** — `SupportTicket` schema + email-in ingestion
from a dedicated mailbox via Stalwart
6. **Self-serve Partner portal at `partner.dezky.local`** — Phase 6+ work,
own Nuxt app, own OAuth client, scoped to a partner's own customers
7. **Real environment switcher** — currently cosmetic; would need separate
API endpoints per env, separate Authentik tenants, etc.
8. **Real on-call indicator** — integration with the paging system that
gets installed in (4)
9. **Operator workspace impersonation in OCIS/Stalwart** — operator tooling
reaches *into* the customer's file storage and mail for support, with the
same audit trail as portal impersonation
---
## Out of scope for this entire effort
- Multi-region operator UI
- Read-only investor / board mode (a real persona but build it when there's a
real investor — design has a placeholder "Read-only" role for Jonas Berg)
- White-label of the operator portal (partners get their own portal eventually;
Dezky operator never gets white-labeled — it's our internal tool)
---
## Execution checklist
Tick boxes as work lands. Each phase is roughly one commit. Phases must be
done in order — earlier ones unblock later ones.
### O.0 · Prep — service rename ✓
- [x] Rename `services/provisioning/` → `services/platform-api/`
- [x] Update `package.json` name → `@dezky/platform-api`
- [x] Update `docker-compose.yml`: container name, service key, volume name,
env var `PROVISIONING_INTERNAL_URL` → `PLATFORM_API_INTERNAL_URL`,
NUXT_API_BASE points at new hostname
- [x] Update portal proxy routes to read `PLATFORM_API_INTERNAL_URL` and
default to `http://platform-api:3001`
- [x] Sweep docs (README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md,
NEXT-STEPS.md, TROUBLESHOOTING.md) for stale references
- [x] Verify customer portal `/api/me` still works end-to-end after rename
### O.1 · Authentik — operator OAuth client
- [ ] Create `dezky-operator` OAuth provider via Authentik API
- [ ] Set redirect URIs to `https://operator.dezky.local/auth/oidc/{callback,logout}`
- [ ] Confidential client; persist client_secret to `.env` as
`OPERATOR_OIDC_CLIENT_SECRET`
- [ ] Create application binding linking the provider to a
`dezky-platform-admins`-only authorization flow (only group members can
reach the consent screen)
- [ ] Configure MFA-required policy on this provider
- [ ] Verify via `curl` that the discovery doc resolves at
`/application/o/dezky-operator/.well-known/openid-configuration`
### O.2 · platform-api — multi-audience + Partner CRUD
- [ ] `JwtAuthGuard`: accept audience list `['dezky-portal', 'dezky-operator']`
- [ ] New decorator/guard `@RequiresOperatorAudience()` enforcing
`aud === 'dezky-operator' && actor.platformAdmin`
- [ ] `schemas/partner.schema.ts` — Partner model (slug, name, domain,
status, marginPct, contactInfo, billingInfo)
- [ ] `partners/` module: controller + service + DTOs (create / read /
update / soft-delete)
- [ ] Add `partnerId?: Types.ObjectId` (ref Partner, index) to Tenant schema
- [ ] Aggregations: `Partner.customers` (count) and `Partner.mrr` (sum)
computed at query time
- [ ] Tenant lifecycle endpoints: `POST /tenants/:slug/suspend`,
`POST /tenants/:slug/resume`, plan/seat-cap change via existing PATCH
- [ ] All operator-only mutations gated by `@RequiresOperatorAudience()`
- [ ] Smoke test: `curl` create-partner with a `dezky-operator` token works,
same call with a `dezky-portal` token gets 403
### O.3 · Scaffold `apps/operator/`
- [ ] `apps/operator/package.json` (Nuxt 3, `nuxt-oidc-auth` beta.11, same
deps as portal)
- [ ] `nuxt.config.ts` with `oidc` block pointing at `dezky-operator`
- [ ] Docker compose service `operator`, with Traefik labels for
`operator.dezky.local`, `node_modules` volume, same `NODE_EXTRA_CA_CERTS`
mount for mkcert
- [ ] Network alias on Traefik: `operator.dezky.local`
- [ ] User task: add `operator.dezky.local` to `/etc/hosts`
- [ ] Session secrets in `.env`: `NUXT_OIDC_TOKEN_KEY` (base64-32),
`NUXT_OIDC_SESSION_SECRET`, `NUXT_OIDC_AUTH_SESSION_SECRET`
**distinct from** the customer portal's secrets
- [ ] Verify login: visit `https://operator.dezky.local`, bounce to Authentik,
sign in as akadmin, land on a placeholder index page
### O.4 · Design system + app shell
- [ ] `assets/styles/tokens.css` — copy with `data-theme="dark"` as default
- [ ] `assets/styles/base.css`
- [ ] Components: `NodeMark.vue`, `UiIcon.vue` (copy from portal)
- [ ] Shared primitives ported from the design: `Card`, `Button`, `Table`,
`Badge`, `Mono`, `Eyebrow`, `StatusDot`, `Avatar`, `PageHeader`
- [ ] `OpSidebar.vue` — collapsible, badges per nav item
- [ ] `OpTopbar.vue` — env badge, ⌘K trigger, on-call pill, bell, avatar
- [ ] `app.vue` shell wires sidebar + topbar + `<NuxtPage />`
- [ ] Keyboard shortcut: ⌘[ collapses sidebar, ⌘K opens palette
### O.5 · Tenant management (real backend)
- [ ] `pages/tenants/index.vue` — list with status/plan/seats/MRR columns,
filter by partner and status, search by slug/name
- [ ] `pages/tenants/[slug].vue` — detail view with tabs
- [ ] Tab: **Overview** — header card, key stats, partner link
- [ ] Tab: **Users** — list users via `GET /users?tenantSlug=…`
- [ ] Tab: **Resources** — provisioning status per integration
(Authentik / Stalwart / OCIS), error messages, "Reconcile" button
- [ ] Tab: **Billing** (mock fixtures)
- [ ] Tab: **Audit** (mock fixtures)
- [ ] Tab: **Support** (mock fixtures)
- [ ] Tab: **Danger** — suspend, resume, change plan, soft-delete; real
backend calls, confirmation modals
### O.6 · Partner management (real backend)
- [ ] `pages/partners/index.vue` — list with name/domain/status/customers/MRR
- [ ] `pages/partners/[slug].vue` — detail panel with customers list,
MRR breakdown, margin, contact info
- [ ] "Create partner" modal — POST /partners
- [ ] Attach / detach tenant to partner (PATCH on tenant.partnerId)
### O.7 · Visual-only screens (mock fixtures)
- [ ] `data/*.ts` — typed mock fixtures (tenants-extra, partners-extra,
services, incident, flags, audit, team)
- [ ] `pages/index.vue` — Overview dashboard
- [ ] `pages/operator-team.vue` — real backend (Users where
`platformAdmin === true`)
- [ ] `pages/users.vue` — global users, real read
- [ ] `pages/infrastructure.vue` — service health (mock for now;
docker health check integration is a follow-up)
- [ ] `pages/flags.vue` — feature flags (mock)
- [ ] `pages/audit.vue` — global audit (mock)
- [ ] `pages/support.vue` — placeholder
- [ ] `pages/billing.vue` — placeholder
- [ ] `pages/reports.vue` — placeholder
- [ ] `pages/settings.vue` — placeholder
### O.8 · Interactions
- [ ] `CommandPalette.vue` — ⌘K opens, fuzzy search over tenants + partners
+ flags + nav items + actions
- [ ] `ImpersonationModal.vue` — visual stub with reason field, Demo-only
badge, no-op confirm + toast
- [ ] `ImpersonationBanner.vue` — top banner shown when impersonating
- [ ] `IncidentModal.vue` — mock incident render
- [ ] `TweaksPanel.vue` — theme (light/dark), density (comfy/compact),
env (prod/staging/dev cosmetic switch)
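
The plan does not say how the palette's fuzzy search should work. One simple option, shown here purely as an assumption, is case-insensitive subsequence matching ranked by how tightly the matched characters cluster. All names below are hypothetical.

```typescript
interface PaletteItem {
  kind: 'tenant' | 'partner' | 'flag' | 'nav' | 'action'
  label: string
}

// Returns a score if every character of `query` appears in `text` in order, else null.
// Lower is better: the score counts unmatched characters between the first and last match.
function fuzzyScore(query: string, text: string): number | null {
  const q = query.toLowerCase()
  const t = text.toLowerCase()
  if (q.length === 0) return 0
  let qi = 0
  let first = -1
  for (let ti = 0; ti < t.length; ti++) {
    if (t[ti] === q[qi]) {
      if (first === -1) first = ti
      qi++
      if (qi === q.length) return ti - first + 1 - q.length
    }
  }
  return null
}

// Filters to matching items and sorts by score, then alphabetically by label.
function searchPalette(items: PaletteItem[], query: string): PaletteItem[] {
  const scored: { item: PaletteItem; score: number }[] = []
  for (const item of items) {
    const score = fuzzyScore(query, item.label)
    if (score !== null) scored.push({ item, score })
  }
  scored.sort((a, b) => a.score - b.score || a.item.label.localeCompare(b.item.label))
  return scored.map((s) => s.item)
}
```

Greedy subsequence matching is cheap enough to run on every keystroke over a few hundred items; a dedicated fuzzy-search library could replace it later without changing the `searchPalette` signature.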
### O.9 · Verification
- [ ] Sign in to `operator.dezky.local` as akadmin via the new OAuth client
- [ ] Confirm JWT audience is `dezky-operator` (decode in DevTools, post
response back)
- [ ] Create a real Partner via the UI, see it in Mongo
- [ ] Attach the `acme` tenant to that partner; verify count goes 0 → 1
- [ ] Suspend a tenant from the Danger tab; confirm `status: 'suspended'`
in Mongo
- [ ] Sign in to `app.dezky.local` simultaneously in another browser
profile, confirm the customer portal still works and that customer
token's `aud` is `dezky-portal`
- [ ] Mark the relevant follow-up tasks in NEXT-STEPS.md as remaining
work, and file separate issues for anything that was deferred