Phase 1 of the audit work — capture everything we control today, ingest from
external systems (Authentik / OCIS / Stalwart) in a later phase. The mock
OP_AUDIT fixture is gone; both the /audit page and Overview's activity card
now show real events recorded by AuditService.record() in platform-api.
Schema (services/platform-api/src/schemas/audit-event.schema.ts):
AuditEvent { at, actorType, actorId, actorEmail, actorIp, action, outcome,
resourceType, resourceId, resourceName, tenantSlug, partnerSlug, source,
metadata, prevHash, hash }
Indexes: {at:-1}, {tenantSlug,at:-1}, {actorId,at:-1}, {action,at:-1}.
prevHash/hash are nullable now; hash-chain tamper evidence is a later phase.
AuditService:
- record() — best-effort write, swallows errors so the underlying mutation
that succeeded isn't failed by a downstream log issue. Surfaces failures
via Logger.
- list() — filters: since/until/before, action (exact OR prefix match
via leading-anchor regex), tenantSlug, partnerSlug, actorEmail, outcome,
free-text q across action/resourceName/actorEmail/tenantSlug, limit
(default 100, max 500). Cursor pagination via `before`.
- No UPDATE/DELETE surface — entries are append-only by construction.
AuditController: GET /audit, behind JwtAuthGuard + OperatorGuard. No mutations
exposed; entries written internally by other modules.
X-Forwarded-For threading:
- apps/operator/server/utils/platform-api.ts forwards the originating
client IP to platform-api so audit entries carry a real address.
- services/platform-api/src/auth/client-ip.ts extracts leftmost
X-Forwarded-For, falls back to socket.remoteAddress.
Instrumented mutations (every one threads actor + IP through):
Tenants: create, update, softDelete, setStatus(suspend/resume)
Partners: create, update, terminate
Flags: create, update (incl. flag.killed verb when state=off+note=kill-switch),
remove
Users: deactivate
Each controller resolves the User doc via ActorService, extracts IP via
clientIp(req), and passes { userId, email, ip } as AuditActor to the service.
FlagsService's local ActorRef collapses to AuditActor so flag history and the
audit log share one shape.
Operator UI:
- /api/audit proxy that forwards query params verbatim
- types/audit.ts
- pages/audit.vue: real list with quick-pick action chips (All/Tenants/
Partners/Flags/Users), outcome filter, free-text search, "Load older
events" cursor pagination
- pages/index.vue: Overview activity card swaps mock OP_AUDIT for the
same /api/audit endpoint, rows link into /audit
- data/fixtures.ts: OP_AUDIT / AuditEntry / AuditTone exports removed
Verified end-to-end: suspended + resumed acme, flipped oci_versioning through
rollout → kill → on, then /audit returned all 5 events with the right action
verbs (tenant.suspended, tenant.resumed, flag.updated, flag.killed,
flag.updated), actor admin@dezky.local, IP 192.168.65.1. Filters (action
prefix + free-text q) narrow correctly.
Out of scope for this commit (each gets its own conversation):
- Authentik / OCIS / Stalwart ingest adapters (Phase 2)
- Hash-chain tamper evidence (Phase 3)
- TTL + cold-storage archival to Hetzner Object Storage (Phase 4)
- GDPR right-to-erasure tooling
Real backend for the flags page (was pure mock). Built so it's ready for
the first risky rollout (likely the Stalwart JMAP client or the Stripe
billing engine).
services/platform-api:
- Flag schema (key, description, state, pct, scope.{plans, tenantSlugs,
partnerSlugs, environments}, embedded history capped at 20)
- FlagsService with CRUD + evaluateAll(tenantSlug) → { key: bool }
Eval algorithm:
off → false; on → true
targeted → require non-empty scope (empty allowlist means "nobody"),
then match every non-empty axis
rollout → match scope, then sha256(`${tenantId}:${key}`) % 100 < pct
Hash-based rollout is deterministic: bumping pct only flips the new
slice. Pure helpers (matchesScope, hasAnyScope, inRolloutBucket) are
exported for future unit tests.
- FlagsController exposes GET /flags, GET /flags/:key, POST /flags/evaluate
(JwtAuthGuard); POST/PATCH/DELETE require OperatorGuard. History entries
capture the actor's email.
- SeedService idempotently creates 10 flag keys mapping to real Dezky
concerns (jmap_native_v2, gdpr_export_v2, new_billing_engine, etc.).
$setOnInsert so operator edits survive restarts.
apps/operator:
- 6 proxies: /api/flags index get/post, [key] get/patch/delete, evaluate post
- types/flag.ts with the shape that mirrors the backend
- pages/flags.vue: useFetch real list, row click opens FlagDetail,
"New flag" opens NewFlagModal, scope summary column shows targeting
at a glance
- FlagDetail.vue: side panel with segmented state, rollout slider with
live "~N of M tenants" preview from /api/tenants, plan/tenant/env chip
pickers, dirty-tracked Save, instant Kill-switch (PATCH state=off+pct=0),
embedded change history
- NewFlagModal.vue: minimal create form (key + description). Everything
else is configured in the detail panel afterward.
- CommandPalette: feature-flag rows now come from /api/flags instead of
the dropped fixture, so newly-created flags are searchable immediately
- data/fixtures.ts: drop FLAGS / FeatureFlag exports (replaced by the
real backend)
Smoke-tested end-to-end: list renders 10 seed flags, opening gdpr_export_v2
and flipping to rollout 25% then saving persists + adds a history entry,
kill-switch sets state=off in one click, /api/flags/evaluate returns the
correct booleans for the seeded tenant, same tenant gets the same answer
on consecutive evals (determinism), and creating + deleting a flag through
the UI roundtrips correctly.
The Infrastructure page used to read from a mock fixture that lied two ways:
it listed services that aren't deployed (Jitsi, Zulip, Cloudflare, Object
Storage, Postmark) and showed hardcoded uptime/latency for the ones that
are. Now it shows truth from real probes plus a clearly-labelled "planned"
section for the rest.
Backend (services/platform-api):
- New src/health/ module — HealthService runs 9 probes in parallel with a
1.5s timeout each:
Stalwart → TCP stalwart:8080
OCIS → HTTP GET ocis:9200/health
Collabora → HTTP GET collabora:9980/hosting/discovery
Authentik → HTTP GET authentik-server:9000/-/health/ready/
Postgres → TCP postgres:5432
Mongo → existing Mongoose connection.db.admin().ping()
Redis → TCP redis:6379
Traefik → TCP traefik:80
Platform API → trivially ok (this code is running)
Status thresholds: ok ≤500ms, warn 500–1500ms, bad on timeout/refuse.
- HealthController exposes GET /health/platform behind JwtAuthGuard, plus
keeps the existing public GET /health for infra liveness checks.
- Moved the old src/health.controller.ts into the new module.
Frontend (apps/operator):
- /api/health/platform proxy forwards the operator's access token.
- Infrastructure page swaps SERVICES fixture for useFetch with 30s auto-
refresh + a manual Refresh button. Cards show real status badge + real
latency; uptime/error stay as em-dash with a "no probe history yet"
tooltip until a Prometheus/event-log backend lands.
- Below the live grid, a "Planned · not deployed" section renders 5 dimmed
cards (Jitsi, Zulip, simpledns.plus, Hetzner Object Storage, Postmark).
simpledns.plus replaces the misnamed Cloudflare entry — we use
simpledns.plus, not Cloudflare.
- Subtitle is now truthful: "8 / 9 services live · checked 2s ago".
Verified: stopped redis → card flipped to "down · getaddrinfo ENOTFOUND
redis", subtitle reflected 8/9, incident banner appeared. Restarted →
back to 9/9, banner gone.
SERVICES fixture stays in place for Overview's incident banner — replacing
that is a separate follow-up tied to the incident-management backend.
Right-anchored slide-in inbox triggered by the bell button. Backend is a
follow-up — for now this is a visual + behavior shell with mock fixtures,
same pattern as INCIDENT / FLAGS / OP_AUDIT.
- data/fixtures.ts: new NotificationItem type + 6 seed rows from the
design (DMARC, invitation, invoice, SAML, ticket reply, failed sign-in)
- useNotifications composable: isOpen + items + unreadCount + markRead +
markAllRead. Items deep-clone the fixture on first import so toggling
unread doesn't mutate the shared seed.
- NotificationDrawer component: Teleport + scrim + slide animation,
header/list/footer. Each row shows tone-tinted icon tile + title +
description + timestamp + left-rail unread dot. Click a row to mark
read; click Mark all read or Preferences in the footer.
- OpTopbar: bell now opens the drawer and only shows .icon-btn-dot when
unreadCount > 0.
- Layout mounts <NotificationDrawer /> alongside the other floating
components.
Dismissal: backdrop click, Escape, X, and route-change watcher (so
Preferences → /settings closes the drawer cleanly).
The "on-call · Mikkel" pill named a person who doesn't exist and a paging
system we haven't built. The IncidentModal still says "will notify on-call"
but nothing actually does, and no schema for rotations / pages exists in
platform-api. Showing this in the chrome was claiming an operational fact
that isn't true.
Drop the prop, span, and CSS from OpTopbar. The right cluster becomes
just [bell] [profile].
Mock audit + incident-timeline fixtures still carry historical "on-call
paged" entries — those are records of past events in the mock, not live
state, so they stay. Paging gets a real indicator when the incident
backend lands (tracked as "Real on-call indicator" in NEXT-STEPS.md).
Replace the inert .spacer (flex: 0 0 auto, did nothing) with a real .right
wrapper using margin-left: auto. The on-call indicator, notifications bell,
and UserMenu now form a single right-aligned cluster instead of sitting
loose in the header flex flow.
A toggle-able env badge is a sticker, not a safety signal. Move env to
useEnv() which reads window.location.hostname:
*.local / localhost → 'dev'
*staging* → 'staging'
everything else → 'prod' (safest default)
- New composable: apps/operator/composables/useEnv.ts
- Topbar reads useEnv() instead of useTweaks().env
- useTweaks loses the env field; hydrate strips it from stale
localStorage payloads so old entries don't break
- TweaksPanel: env section removed (theme + density remain)
- Settings: env section removed from Appearance; added a read-only
Environment row to the Profile card showing the detected env +
hostname source ("auto-detected from operator.dezky.local")
Replace the OpPlaceholder stub at /settings with a three-card account page:
- Profile: name, email, subject ID, deduped group chips (Authentik returns
each group twice; dedupe in the computed). Last sign-in derived from JWT
iat via the existing /api/_verify-token endpoint.
- Security: three deep links to Authentik's user settings — change password,
manage MFA devices, active sessions. We don't re-implement identity here;
Authentik already has a polished UI for it.
- Appearance: theme / density / env segmented controls. Shares the
useTweaks composable with the floating Tweaks panel, so flipping here
is reflected there and vice-versa.
New UserMenu component owns its own trigger + dropdown + dismissal so the
topbar stays simple. Menu contents: identity row (name + email), theme
toggle (reuses useTweaks so the floating panel and menu stay in sync),
link to /settings, Sign out (calls useOidcAuth().logout).
Dismissal: outside click via a transparent Teleport scrim, Escape, and
route change (watch on route.path → close).
Drops the now-unused useOidcAuth import from OpTopbar.
The .me-wrap block in OpSidebar was an inert button — no click handler,
no menu — and duplicated the avatar already shown in the topbar. Remove
it so there's a single place to render the user (topbar), making room
for the avatar dropdown that's landing next.
- Add _verify-token.get.ts to both operator and portal — decodes the
access token stored in the nuxt-oidc-auth session and echoes iss/aud/
sub/groups. Used to confirm operator tokens carry aud=dezky-operator
and portal tokens carry aud=dezky-portal. Listed in NEXT-STEPS.md as
throwaway, to be removed when proper verification surfaces exist.
- OPERATOR-PLAN.md O.9 marked done with the actual claims captured + the
Mongo-side verification of attach + suspend flows.
- NEXT-STEPS.md: replaced the "Operator portal — out-of-band track"
section with a "shipped + follow-ups" version. The 9-item follow-up
list (impersonation, audit, flags, incidents, support, partner
portal, env switcher, on-call, workspace impersonation) is now the
authoritative roadmap, not buried inside OPERATOR-PLAN.md.
- CommandPalette + useCommandPalette: ⌘K opens a search-and-jump panel over
real tenants/partners + fixture flags + nav + actions. Arrow keys + Enter
navigate, Escape/backdrop close. Recents are intentionally omitted for now;
add when there's something to recent over.
- Impersonation stub: useImpersonation + ImpersonationModal + ImpersonationBanner.
Modal opens from tenant detail and from the palette. Banner stays at the top
of the shell until exited. No real OBO token is minted — wiring OAuth Token
Exchange is tracked as a follow-up.
- IncidentModal + useIncidentModal: opened from the Overview and Infrastructure
incident banners, renders the mock INCIDENT data with metrics, timeline and
draft composer.
- TweaksPanel + useTweaks: floating bottom-right panel for theme (dark/light),
density (comfy/compact), env badge (prod/staging/dev). Saved to localStorage.
- Theme/density apply via [data-theme] + [data-density] overrides in
tokens.css. Topbar env badge now reads from useTweaks instead of a prop.
- Layout wires ⌘K + ⌘[ at the document level and mounts the palette + modals
+ banner + tweaks panel once for all pages.
- Overview (pages/index.vue): KPIs from real /tenants /partners /users,
status meter, recent + needs-follow-up tables. Mock activity stream and
incident banner overlay come from data/fixtures.ts.
- Operator team: real GET /users filtered to platformAdmin === true,
with last-seen + tenant counts.
- Users (global): real read with All/Admins/Inactive views and search.
- Infrastructure / Feature flags / Audit: mock fixtures only — wiring to
real backends (Prometheus, OpenFeature, append-only audit) is tracked
as follow-ups in OPERATOR-PLAN.md.
- Placeholder pages (support/billing/reports/settings) via OpPlaceholder.
- Shared: Stat, MetricCell, OpPlaceholder components, /api/users proxy,
PlatformUser type.
- .gitignore: scope the docker volumes data/ rule so apps/*/data/ is
tracked again (operator carries mock fixtures there).
- Partners list with name/domain/status/customers/margin + Create modal
- Partner detail: contract card, contact card, customers table, attach modal,
terminate (soft-delete) danger card
- Operator proxies for /partners + /partners/:slug/tenants
- platform-api: add partnerId Prop to Tenant schema. The field was being
silently dropped by Mongoose because the schema didn't declare it.
- tenants.service: rewrite update() to build $set/$unset explicitly and cast
partnerId via new Types.ObjectId(). Handles detach via $unset so the field
vanishes from the doc cleanly.
Operator can now manage tenants end-to-end from the UI:
- pages/tenants/index.vue — list with status/plan/domains/created/
provisioning-state columns, search by slug or name, status chips
with live counts (all/active/pending/suspended), click-through
to detail
- pages/tenants/[slug].vue — 7-tab detail (Overview, Users, Resources,
Billing, Audit, Support, Danger zone)
- 3 tabs hit real backends: Overview (identity + billing fields),
Users (lazy-loaded via new GET /tenants/:slug/users endpoint),
Resources (live provisioning state per integration + Reconcile button)
- 3 tabs render mock fixtures with warn-tone "mock" badges: Billing
(Stripe placeholder), Audit (sample log lines), Support (placeholder
pending the ticket queue work)
- Danger zone: 3 real-backend cards (Suspend / Resume / Soft-delete),
each gated by a ConfirmDialog modal. Verified live — clicked
Suspend on acme, status flipped to 'suspended' in Mongo, then
Resumed back to 'active'
platform-api additions:
- GET /tenants/:slug/users returns users with this tenant in their
tenantIds, sorted by last login. Same authorization rule as the
existing /tenants/:slug — platform admins always pass,
non-admins must be a member of the tenant
- tenants.module imports User schema for the new lookup
New components (apps/operator/components/):
- Tabs.vue — horizontal strip with optional per-tab counts, v-model
- ConfirmDialog.vue — Teleport-to-body modal, Escape/backdrop close,
danger/primary tone for the confirm button
Server proxy infrastructure (apps/operator/server/):
- utils/platform-api.ts — single helper encapsulating
access-token-from-session + bearer-forward + error normalization.
Every operator proxy route is now a one-liner against this helper
- api/tenants/index.get.ts, [slug]/{index.get,index.patch,index.delete,
users.get,suspend.post,resume.post,reconcile.post}.ts
Two real bugs found and fixed during the smoke test:
- Mongoose subdocument `_id` leaks into JSON when iterating
tenant.provisioningStatus. Switched to an explicit
`['authentik', 'stalwart', 'ocis']` whitelist in both v-fors
- Documents created before provisioningErrors was added (like the
acme tenant) don't have the field at all in JSON. Use optional
chaining (`tenant.provisioningErrors?.[k]`) instead of bracket
access. Without it: 'Cannot read properties of undefined (reading
"authentik")' during the Resources tab render
Operator portal now wears its real chrome instead of placeholder spans.
Sidebar + topbar + page header all rendered against the carbon palette
from tokens.css.
Components ported from the source design (operator-app.jsx,
platform-ui.jsx, operator-screens.jsx) as Vue 3 SFCs in
apps/operator/components/:
Foundation: NodeMark (copied from portal), UiIcon (expanded to 31 icons
covering sidebar/topbar/sort/arrows)
Primitives: Card (3 surface variants), UiButton (primary / secondary /
ghost / dark / danger × sm / md / lg), DataTable (header + rows),
Badge (7 tones), Avatar (deterministic palette by name hash), Mono,
Eyebrow, StatusDot, PageHeader (with actions slot)
Shell: OpSidebar (collapsible 232<->56px, 12 nav items in 4 sections,
active-row highlight from route, badge slot, brand + user footer);
OpTopbar (env badge with prod/staging/dev variants, palette trigger
stub for the ⌘K work in O.8, on-call pill, bell, avatar)
Layouts: layouts/default.vue wires sidebar + topbar + slot; layouts/blank.vue
is used by the login page (definePageMeta layout:'blank'). app.vue now
wraps NuxtPage in NuxtLayout (the missing piece — without it Nuxt warns
"Your project has layouts but the <NuxtLayout /> component has not been
used" and renders nothing chrome-wise).
Composable composables/useSidebar.ts holds the collapsed state shared
between OpSidebar's toggle button and layouts/default.vue's ⌘[ keyboard
shortcut.
Verified in the browser:
- Sidebar renders all 12 nav links with section dividers, env badge shows
PROD, PageHeader resolves to the user's display name from
useOidcAuth().user
- Collapse toggle flips sidebar width 232↔56; nav rows become icon-only
- Smoke test on the placeholder home still returns 409 for the seeded
test-partner (token forwarding survives the layout refactor)
Gotcha documented in the plan: Vite 7.3 added a strict
server.allowedHosts check that returns plaintext 403 for any host header
that isn't the dev origin. The customer portal pre-dates this Vite
version; operator needs allowedHosts: ['operator.dezky.local'] in
nuxt.config.ts under vite.server.
Pages/index.vue replaces the bare HTML placeholder from O.3 with the
new PageHeader + Card primitives — same smoke-test functionality, much
better visual fidelity.
Real screen content (Tenants, Partners, Infrastructure, etc.) lands in
O.5+. This commit is the chrome, the smoke test, and the verification
that the design system primitives compose correctly.
New Nuxt 3 app at apps/operator/ — internal admin portal on its own domain
(operator.dezky.local), own OAuth client (dezky-operator), own session
secrets, own cookies. Customer and operator surfaces can't decrypt each
other's session state.
OAuth flow verified end-to-end:
- GET / → middleware redirect to /auth/login
- User clicks Sign in → /auth/oidc/login → bounces to Authentik with
client_id=dezky-operator, scope includes 'groups'
- Authentik checks dezky-platform-admins group binding (added in O.1),
silent-reauths via the existing auth.dezky.local session
- Returns to /auth/oidc/callback with code, exchanges for token,
creates session cookie on operator.dezky.local
- Lands on pages/index.vue placeholder dashboard
Smoke test 'Create partner "test-partner"' button on the placeholder home
exercises the full operator-only authorization chain:
- 1st call: 200, partner created in Mongo
- 2nd call: 409 'already exists' (idempotency holds, token still valid)
- Same call from the customer portal: 403 'requires operator-scoped
token' (audience guard rejects dezky-portal aud)
JwtAuthGuard now multi-issuer in addition to multi-audience. Each
Authentik OAuth provider mints tokens with its own per-app iss URL
(.../application/o/<slug>/), so the guard accepts a comma-separated
AUTHENTIK_ISSUER. The audience-only fix from O.2 wasn't sufficient —
issuer is validated separately by jose.jwtVerify and was still pinned
to dezky-portal alone, yielding 'unexpected iss claim value' rejections.
Compose changes: new 'operator' service (Node 20 alpine, pnpm install +
nuxt dev, mkcert CA mount, traefik labels for operator.dezky.local +
TLS); new operator_node_modules volume; operator.dezky.local added to
traefik's Docker network aliases. Distinct OPERATOR_NUXT_OIDC_* session
secrets pulled from .env (gitignored, generated via openssl).
Real operator screens (sidebar, topbar, tenants, partners, etc.) come
in O.4. This commit is pure scaffolding + the security boundary proof.