dezky

Author	SHA1	Message	Date
Ronni Baslund	868a305539	feat(flags): real feature-flag system with bulk eval + operator UI Real backend for the flags page (was pure mock). Built so it's ready for the first risky rollout (likely the Stalwart JMAP client or the Stripe billing engine). services/platform-api: - Flag schema (key, description, state, pct, scope.{plans, tenantSlugs, partnerSlugs, environments}, embedded history capped at 20) - FlagsService with CRUD + evaluateAll(tenantSlug) → { key: bool } Eval algorithm: off → false; on → true; targeted → require non-empty scope (empty allowlist means "nobody"), then match every non-empty axis; rollout → match scope, then sha256(`${tenantId}:${key}`) % 100 < pct. Hash-based rollout is deterministic: bumping pct only flips the new slice. Pure helpers (matchesScope, hasAnyScope, inRolloutBucket) are exported for future unit tests. - FlagsController exposes GET /flags, GET /flags/:key, POST /flags/evaluate (JwtAuthGuard); POST/PATCH/DELETE require OperatorGuard. History entries capture the actor's email. - SeedService idempotently creates 10 flag keys mapping to real Dezky concerns (jmap_native_v2, gdpr_export_v2, new_billing_engine, etc.). $setOnInsert so operator edits survive restarts. apps/operator: - 6 proxies: /api/flags index get/post, [key] get/patch/delete, evaluate post - types/flag.ts with the shape that mirrors the backend - pages/flags.vue: useFetch real list, row click opens FlagDetail, "New flag" opens NewFlagModal, scope summary column shows targeting at a glance - FlagDetail.vue: side panel with segmented state, rollout slider with live "~N of M tenants" preview from /api/tenants, plan/tenant/env chip pickers, dirty-tracked Save, instant Kill-switch (PATCH state=off+pct=0), embedded change history - NewFlagModal.vue: minimal create form (key + description). Everything else is configured in the detail panel afterward. - CommandPalette: feature-flag rows now come from /api/flags instead of the dropped fixture, so newly-created flags are searchable immediately - data/fixtures.ts: drop FLAGS / FeatureFlag exports (replaced by the real backend). Smoke-tested end-to-end: list renders 10 seed flags, opening gdpr_export_v2 and flipping to rollout 25% then saving persists + adds a history entry, kill-switch sets state=off in one click, /api/flags/evaluate returns the correct booleans for the seeded tenant, same tenant gets the same answer on consecutive evals (determinism), and creating + deleting a flag through the UI roundtrips correctly.	2026-05-24 19:21:15 +02:00
Ronni Baslund	77a09aaf77	feat(operator): live Infrastructure probes + honest split between deployed and planned The Infrastructure page used to read from a mock fixture that lied two ways: it listed services that aren't deployed (Jitsi, Zulip, Cloudflare, Object Storage, Postmark) and showed hardcoded uptime/latency for the ones that are. Now it shows truth from real probes plus a clearly-labelled "planned" section for the rest. Backend (services/platform-api): - New src/health/ module, where HealthService runs 9 probes in parallel with a 1.5s timeout each: Stalwart → TCP stalwart:8080; OCIS → HTTP GET ocis:9200/health; Collabora → HTTP GET collabora:9980/hosting/discovery; Authentik → HTTP GET authentik-server:9000/-/health/ready/; Postgres → TCP postgres:5432; Mongo → existing Mongoose connection.db.admin().ping(); Redis → TCP redis:6379; Traefik → TCP traefik:80; Platform API → trivially ok (this code is running). Status thresholds: ok ≤500ms, warn 500–1500ms, bad on timeout/refuse. - HealthController exposes GET /health/platform behind JwtAuthGuard, plus keeps the existing public GET /health for infra liveness checks. - Moved the old src/health.controller.ts into the new module. Frontend (apps/operator): - /api/health/platform proxy forwards the operator's access token. - Infrastructure page swaps SERVICES fixture for useFetch with 30s auto-refresh + a manual Refresh button. Cards show real status badge + real latency; uptime/error stay as em-dash with a "no probe history yet" tooltip until a Prometheus/event-log backend lands. - Below the live grid, a "Planned · not deployed" section renders 5 dimmed cards (Jitsi, Zulip, simpledns.plus, Hetzner Object Storage, Postmark). simpledns.plus replaces the misnamed Cloudflare entry, since we use simpledns.plus, not Cloudflare. - Subtitle is now truthful: "8 / 9 services live · checked 2s ago". Verified: stopped redis → card flipped to "down · getaddrinfo ENOTFOUND redis", subtitle reflected 8/9, incident banner appeared. Restarted → back to 9/9, banner gone. SERVICES fixture stays in place for Overview's incident banner; replacing that is a separate follow-up tied to the incident-management backend.	2026-05-24 18:47:38 +02:00
Ronni Baslund	fbbb43e3e2	feat(operator): partner management with attach/detach (O.6) - Partners list with name/domain/status/customers/margin + Create modal - Partner detail: contract card, contact card, customers table, attach modal, terminate (soft-delete) danger card - Operator proxies for /partners + /partners/:slug/tenants - platform-api: add partnerId Prop to Tenant schema. The field was being silently dropped by Mongoose because the schema didn't declare it. - tenants.service: rewrite update() to build $set/$unset explicitly and cast partnerId via new Types.ObjectId(). Handles detach via $unset so the field vanishes from the doc cleanly.	2026-05-24 08:02:00 +02:00
Ronni Baslund	8e81730372	feat(operator): tenant list + 7-tab detail with real lifecycle (O.5) Operator can now manage tenants end-to-end from the UI: - pages/tenants/index.vue: list with status/plan/domains/created/provisioning-state columns, search by slug or name, status chips with live counts (all/active/pending/suspended), click-through to detail - pages/tenants/[slug].vue: 7-tab detail (Overview, Users, Resources, Billing, Audit, Support, Danger zone) - 3 tabs hit real backends: Overview (identity + billing fields), Users (lazy-loaded via new GET /tenants/:slug/users endpoint), Resources (live provisioning state per integration + Reconcile button) - 3 tabs render mock fixtures with warn-tone "mock" badges: Billing (Stripe placeholder), Audit (sample log lines), Support (placeholder pending the ticket queue work) - Danger zone: 3 real-backend cards (Suspend / Resume / Soft-delete), each gated by a ConfirmDialog modal. Verified live: clicked Suspend on acme, status flipped to 'suspended' in Mongo, then Resumed back to 'active'. platform-api additions: - GET /tenants/:slug/users returns users with this tenant in their tenantIds, sorted by last login. Same authorization rule as the existing /tenants/:slug: platform admins always pass, non-admins must be a member of the tenant - tenants.module imports User schema for the new lookup. New components (apps/operator/components/): - Tabs.vue: horizontal strip with optional per-tab counts, v-model - ConfirmDialog.vue: Teleport-to-body modal, Escape/backdrop close, danger/primary tone for the confirm button. Server proxy infrastructure (apps/operator/server/): - utils/platform-api.ts: single helper encapsulating access-token-from-session + bearer-forward + error normalization. Every operator proxy route is now a one-liner against this helper - api/tenants/index.get.ts, [slug]/{index.get,index.patch,index.delete,users.get,suspend.post,resume.post,reconcile.post}.ts. Two real bugs found and fixed during the smoke test: - Mongoose subdocument `_id` leaks into JSON when iterating tenant.provisioningStatus. Switched to an explicit `['authentik', 'stalwart', 'ocis']` whitelist in both v-fors - Documents created before provisioningErrors was added (like the acme tenant) don't have the field at all in JSON. Use optional chaining (`tenant.provisioningErrors?.[k]`) instead of bracket access. Without it: 'Cannot read properties of undefined (reading "authentik")' during the Resources tab render	2026-05-24 07:44:23 +02:00
Ronni Baslund	55b1c133e3	feat(operator): scaffold apps/operator Nuxt app + multi-issuer JWT (O.3) New Nuxt 3 app at apps/operator/, an internal admin portal on its own domain (operator.dezky.local), own OAuth client (dezky-operator), own session secrets, own cookies. Customer and operator surfaces can't decrypt each other's session state. OAuth flow verified end-to-end: - GET / → middleware redirect to /auth/login - User clicks Sign in → /auth/oidc/login → bounces to Authentik with client_id=dezky-operator, scope includes 'groups' - Authentik checks dezky-platform-admins group binding (added in O.1), silent-reauths via the existing auth.dezky.local session - Returns to /auth/oidc/callback with code, exchanges for token, creates session cookie on operator.dezky.local - Lands on pages/index.vue placeholder dashboard. Smoke test 'Create partner "test-partner"' button on the placeholder home exercises the full operator-only authorization chain: - 1st call: 200, partner created in Mongo - 2nd call: 409 'already exists' (idempotency holds, token still valid) - Same call from the customer portal: 403 'requires operator-scoped token' (audience guard rejects dezky-portal aud). JwtAuthGuard is now multi-issuer in addition to multi-audience. Each Authentik OAuth provider mints tokens with its own per-app iss URL (.../application/o/<slug>/), so the guard accepts a comma-separated AUTHENTIK_ISSUER. The audience-only fix from O.2 wasn't sufficient: issuer is validated separately by jose.jwtVerify and was still pinned to dezky-portal alone, yielding 'unexpected iss claim value' rejections. Compose changes: new 'operator' service (Node 20 alpine, pnpm install + nuxt dev, mkcert CA mount, traefik labels for operator.dezky.local + TLS); new operator_node_modules volume; operator.dezky.local added to traefik's Docker network aliases. Distinct OPERATOR_NUXT_OIDC_* session secrets pulled from .env (gitignored, generated via openssl). Real operator screens (sidebar, topbar, tenants, partners, etc.) come in O.4. This commit is pure scaffolding + the security boundary proof.	2026-05-24 07:20:16 +02:00
Ronni Baslund	2db41fec5e	feat(platform-api): multi-audience JWT + Partner CRUD + tenant lifecycle (O.2) JwtAuthGuard now accepts a comma-separated AUTHENTIK_AUDIENCE ('dezky-portal,dezky-operator'). jose.jwtVerify takes an array and succeeds on any match, so both customer-portal and operator-portal tokens validate against this service. Per-endpoint guards restrict further. New OperatorGuard enforces operator-only mutations: 1. JWT audience claim includes 'dezky-operator' (proof from the token alone that this is a privileged session) 2. ActorService-resolved User has platformAdmin=true (DB check so revocation works without waiting for the token to expire). Both required; either alone is insufficient. Partner module: - Partner schema: slug, name, domain, status, marginPct, contactInfo, billingInfo. marginPct is one number per partner (decided in grilling) - CRUD endpoints under @UseGuards(JwtAuthGuard, OperatorGuard), so every partner mutation requires operator scope - GET /partners returns each row with a computed customers count from aggregating Tenant.partnerId. MRR aggregation deferred until Subscription gains a price column - GET /partners/:slug/tenants for the partner detail view - DELETE soft-terminates (status='terminated'); never hard-delete because tenants may still reference the partner. Tenant changes: - partnerId?: Types.ObjectId (ref Partner, indexed sparse) added to Tenant schema - UpdateTenantDto accepts partnerId so PATCH can attach/detach - POST /tenants/:slug/suspend and /resume: operator-only via OperatorGuard. PATCH already covers plan/domains/partnerId changes. Smoke test: customer-portal session sends POST /api/partners through the portal proxy → 403 "This endpoint requires an operator-scoped token". The positive test (operator-token → 200) waits for O.3 when there's an operator app to mint the right token. apps/portal/server/api/partners/index.post.ts is a temporary verification proxy; delete once the operator portal exists.	2026-05-24 07:08:59 +02:00
Ronni Baslund	22b2583f0b	chore(services): rename services/provisioning -> services/platform-api O.0 prep from OPERATOR-PLAN.md. Mechanical refactor before adding partner management and operator-specific endpoints. The service now owns more than just provisioning orchestration (it'll soon own partners, tenant lifecycle actions, multi-audience JWT validation), so the name 'platform-api' reflects its scope better. What changed: - Directory: services/provisioning/ -> services/platform-api/ - Package: @dezky/provisioning -> @dezky/platform-api - Docker: container_name dezky-provisioning -> dezky-platform-api; compose service key 'provisioning' -> 'platform-api'; volume provisioning_node_modules -> platform_api_node_modules - Portal: PROVISIONING_INTERNAL_URL env var -> PLATFORM_API_INTERNAL_URL, default URL http://provisioning:3001 -> http://platform-api:3001 in all three proxy routes (me.get.ts, tenants/index.post.ts, tenants/[slug]/reconcile.post.ts), plus NUXT_API_BASE updated - Health endpoint service identifier and main.ts log lines updated to 'dezky-platform-api' - Docs swept: README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md, NEXT-STEPS.md, TROUBLESHOOTING.md, OPERATOR-PLAN.md, traefik/dynamic.yml What deliberately stays: - Internal module names ProvisioningService / ProvisioningModule (those describe an orchestration sub-concern, not the service's purpose) - Tenant.provisioningStatus / provisioningErrors field names (state per integration, not service name) - File services/platform-api/src/tenants/provisioning.service.ts - 'Hetzner provisioning' references in production-prep docs (infrastructure provisioning, unrelated) Verified end-to-end after rename: /api/me returns 200 with profile + 2 tenants + subscription, /api/tenants/dezky/reconcile returns 200 with Authentik integration still ok. OPERATOR-PLAN.md O.0 checkboxes ticked.	2026-05-24 00:35:01 +02:00
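
The eval algorithm described in commit 868a305539 can be sketched roughly as below. This is an illustration under stated assumptions, not the code from services/platform-api: the `Flag` and `TenantContext` shapes are guessed from the commit message, the helper names are borrowed from it, and the way the sha256 digest is reduced to a bucket (first four bytes, modulo 100) is an assumption, since the message does not say how the digest is turned into a number.

```typescript
import { createHash } from "node:crypto";

type FlagState = "off" | "on" | "targeted" | "rollout";

interface FlagScope {
  plans: string[];
  tenantSlugs: string[];
  partnerSlugs: string[];
  environments: string[];
}

interface Flag {
  key: string;
  state: FlagState;
  pct: number; // 0..100, only meaningful for "rollout"
  scope: FlagScope;
}

// Hypothetical evaluation context; field names are assumptions.
interface TenantContext {
  tenantId: string;
  slug: string;
  plan: string;
  partnerSlug?: string;
  environment: string;
}

// True when at least one scope axis has entries.
function hasAnyScope(s: FlagScope): boolean {
  return [s.plans, s.tenantSlugs, s.partnerSlugs, s.environments].some((a) => a.length > 0);
}

// An empty axis is ignored; a non-empty axis must contain the tenant's value.
function axisMatches(allow: string[], value: string | undefined): boolean {
  return allow.length === 0 || (value !== undefined && allow.includes(value));
}

function matchesScope(s: FlagScope, t: TenantContext): boolean {
  return (
    axisMatches(s.plans, t.plan) &&
    axisMatches(s.tenantSlugs, t.slug) &&
    axisMatches(s.partnerSlugs, t.partnerSlug) &&
    axisMatches(s.environments, t.environment)
  );
}

// Deterministic bucket in [0, 100). Assumption: first 4 digest bytes, mod 100.
function inRolloutBucket(tenantId: string, key: string, pct: number): boolean {
  const digest = createHash("sha256").update(`${tenantId}:${key}`).digest();
  return digest.readUInt32BE(0) % 100 < pct;
}

function evaluate(flag: Flag, t: TenantContext): boolean {
  switch (flag.state) {
    case "off":
      return false;
    case "on":
      return true;
    case "targeted":
      // An empty allowlist means "nobody".
      return hasAnyScope(flag.scope) && matchesScope(flag.scope, t);
    case "rollout":
      return matchesScope(flag.scope, t) && inRolloutBucket(t.tenantId, flag.key, flag.pct);
  }
}

// Bulk evaluation: one boolean per flag key.
function evaluateAll(flags: Flag[], t: TenantContext): Record<string, boolean> {
  return Object.fromEntries(flags.map((f) => [f.key, evaluate(f, t)]));
}
```

Because a tenant's bucket depends only on the tenant id and the flag key, raising pct can only add tenants to the enabled set, which matches the "bumping pct only flips the new slice" note in the commit.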

7 Commits
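
The probe pattern from commit 77a09aaf77 (parallel probes, a 1.5s timeout each, status thresholds of ok ≤500ms, warn 500–1500ms, bad on timeout/refuse) could look something like the sketch below. It is a minimal illustration, not the actual HealthService: `ProbeResult`, `classifyLatency`, `runProbe`, and `runAll` are hypothetical names, and each probe is modelled as an arbitrary async function rather than a real TCP or HTTP check.

```typescript
type ProbeStatus = "ok" | "warn" | "bad";

// Hypothetical result shape; the real response format is not shown in the log.
interface ProbeResult {
  name: string;
  status: ProbeStatus;
  latencyMs: number | null;
  error?: string;
}

// A probe that completed: ok up to 500ms, warn above that.
// Anything slower than the timeout never reaches this function.
function classifyLatency(latencyMs: number): ProbeStatus {
  return latencyMs <= 500 ? "ok" : "warn";
}

// Runs one probe, racing it against a timeout. Failures and timeouts are "bad".
async function runProbe(
  name: string,
  probe: () => Promise<void>,
  timeoutMs = 1500,
): Promise<ProbeResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    await Promise.race([probe(), timeout]);
    const latencyMs = Date.now() - started;
    return { name, status: classifyLatency(latencyMs), latencyMs };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { name, status: "bad", latencyMs: null, error };
  } finally {
    clearTimeout(timer);
  }
}

// Runs all probes in parallel; one slow or failing probe does not block the others.
async function runAll(probes: Record<string, () => Promise<void>>): Promise<ProbeResult[]> {
  return Promise.all(Object.entries(probes).map(([name, probe]) => runProbe(name, probe)));
}
```

Since `runProbe` never rejects, `Promise.all` in `runAll` always resolves with one result per probe, so a single unreachable service shows up as a "bad" card instead of failing the whole response.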