dezky

Author	SHA1	Message	Date
Ronni Baslund	955357a91a	feat(apps): make environment URLs prod-ready (env-driven, not hardcoded .local) The apps were wired for the dev (.local) environment. Drive the base URLs from env so one build serves dev and prod (.eu): - portal nuxt.config: OIDC authorization/token/userinfo/discovery URLs + redirectUri now derive from NUXT_PUBLIC_AUTH_URL / NUXT_PUBLIC_PORTAL_URL (+ PORTAL_OIDC_APP_SLUG); .local defaults keep dev working with no env. - portal sign-out handler: end-session + post-logout URLs env-driven. - portal scheduling page: booking base/host from runtimeConfig.public.bookingUrl (NUXT_PUBLIC_BOOKING_URL). - platform-api: tenant mail domain suffix from PLATFORM_TENANT_DOMAIN (dezky.eu in prod), defaulting to dezky.local. (booking needs no change — its only .local ref is the dev-server allowedHosts.)	2026-06-08 22:18:51 +02:00
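A minimal sketch of the env-with-.local-fallback pattern from the commit above. The env var names come from the commit, and the `/application/o/...` paths are Authentik's standard OAuth2 provider endpoints. The `portal.dezky.local` default, the `portal` slug default, the `/auth/callback` redirect path, and the `<tenant>.<suffix>` mail-domain shape are assumptions. `oidcUrlsFromEnv` and `tenantMailDomain` are hypothetical helpers, not the project's code.

```typescript
// Hypothetical sketch: derive the portal's OIDC endpoints and the tenant mail
// domain from env vars, falling back to .local dev defaults when unset.
type Env = Record<string, string | undefined>;

interface OidcUrls {
  authorization: string;
  token: string;
  userinfo: string;
  discovery: string;
  redirectUri: string;
}

// Treat unset and empty-string variables the same way: use the dev default.
function envOr(env: Env, key: string, fallback: string): string {
  const v = env[key];
  return v ? v : fallback;
}

// Drop trailing slashes so "https://x/" and "https://x" yield identical URLs.
function trimSlash(url: string): string {
  return url.replace(/\/+$/, "");
}

function oidcUrlsFromEnv(env: Env): OidcUrls {
  const auth = trimSlash(envOr(env, "NUXT_PUBLIC_AUTH_URL", "https://auth.dezky.local"));
  const portal = trimSlash(envOr(env, "NUXT_PUBLIC_PORTAL_URL", "https://portal.dezky.local")); // default assumed
  const slug = envOr(env, "PORTAL_OIDC_APP_SLUG", "portal"); // default assumed
  return {
    // Authentik serves its OAuth2 endpoints under /application/o/.
    authorization: `${auth}/application/o/authorize/`,
    token: `${auth}/application/o/token/`,
    userinfo: `${auth}/application/o/userinfo/`,
    discovery: `${auth}/application/o/${slug}/.well-known/openid-configuration`,
    redirectUri: `${portal}/auth/callback`, // callback path is an assumption
  };
}

// Tenant mail domain: <tenant>.<PLATFORM_TENANT_DOMAIN>, dezky.local by default (shape assumed).
function tenantMailDomain(tenant: string, env: Env): string {
  return `${tenant}.${envOr(env, "PLATFORM_TENANT_DOMAIN", "dezky.local")}`;
}
```

Treating an empty string like an unset variable is a deliberate choice here. A blank `NUXT_PUBLIC_AUTH_URL=` line in a compose file still falls back to the dev default.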
Ronni Baslund	98e49bfe34	feat(admin/users): editable member drawer + mailbox & ownership management Rebuild the /admin/users detail drawer from a read-only profile into an editable, Office 365-style panel with four sections: - Username & mail: read-only primary for mailbox users; editable sign-in (Authentik-only) for mailbox-less identities; "Create mailbox" provisions a Stalwart inbox for an external-login admin - Aliases: list/add/remove mailbox aliases (Stalwart), domain-scoped - Role: member/admin toggle with a primary-account lock (owner, mailbox-less bootstrap admin, self) and a last-admin guard - Contact information: display name, first/last name, phone, alternative email, mirrored best-effort to Authentik attributes + mailbox name. Ownership transfer: "Make owner" (row menu + drawer) plus an owner-side "Transfer ownership" picker, gated to tenant admins / platform admins so a departed owner can be replaced; promotes the target and demotes the prior owner to admin. Backend (platform-api): contact fields on User; AuthentikClient.updateUser; StalwartClient.setMailboxName; UsersService updateTenantMember, changeMemberPrimaryEmail, list/add/removeMemberAlias, createMailboxForMember, transferOwnership; new DTOs and tenant-member routes. All mutations audited. Portal: Nuxt proxies for the new endpoints + extended TenantUserDoc.	2026-06-07 10:34:53 +02:00
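The role-toggle guards above can be expressed as one pure check. This is a hypothetical sketch. The `Member` shape, the reading of "mailbox-less bootstrap admin" as an admin without a mailbox, the order of the checks, and the reason strings are all assumptions.

```typescript
// Hypothetical sketch of the role-toggle guards described above.
type Role = "member" | "admin";

interface Member {
  id: string;
  role: Role;
  isOwner: boolean;
  hasMailbox: boolean; // mailbox-less admins are treated as bootstrap admins here
}

// Returns why a role change must be refused, or null when it is allowed.
function roleChangeBlocker(
  members: Member[],
  targetId: string,
  next: Role,
  actorId: string,
): string | null {
  const target = members.find((m) => m.id === targetId);
  if (!target) return "unknown member";
  if (target.role === next) return null; // nothing to change
  // Primary-account lock: owner, mailbox-less bootstrap admin, and self.
  if (target.isOwner) return "owner is locked";
  if (target.role === "admin" && !target.hasMailbox) return "bootstrap admin is locked";
  if (target.id === actorId) return "cannot change own role";
  // Last-admin guard: never demote the only remaining admin.
  if (next === "member") {
    const admins = members.filter((m) => m.role === "admin").length;
    if (admins <= 1) return "last admin";
  }
  return null;
}
```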
Ronni Baslund	90e8a22de4	feat(scheduling): calendar_failed badge + admin "retry now" action Surface pending/calendar_failed booking states in the admin bookings list with proper status badges (failed shows the last calendar error as a tooltip), and add an operator "Retry now" action. The retry re-drives the same Stalwart calendar write (confirm + attendee email on success); for a terminal calendar_failed booking it re-claims the slot lock atomically first and refuses if the time was taken in the meantime, so a manual retry can never double-book.	2026-06-07 09:39:42 +02:00
Ronni Baslund	35bc7b6c31	chore(infra): production manifests + CI for scheduling apps	2026-06-07 09:27:44 +02:00
Ronni Baslund	b2c2650af9	test(scheduling): property-based slot tests + guarded Stalwart integration test	2026-06-07 09:23:16 +02:00
Ronni Baslund	8bbb7881a4	feat(scheduling): tenant scheduling overview/analytics	2026-06-07 09:17:01 +02:00
Ronni Baslund	95cbdc4e3d	feat(scheduling): round-robin team event types	2026-06-07 09:14:08 +02:00
Ronni Baslund	b9b4d56a2d	feat(scheduling): tenant webhooks for booking lifecycle	2026-06-07 09:08:45 +02:00
Ronni Baslund	e33b7f18a3	feat(scheduling): pluggable captcha (Turnstile) on public booking	2026-06-07 09:02:35 +02:00
Ronni Baslund	e1a77b085f	feat(scheduling): optional JWT-authed dezky Meet rooms	2026-06-07 08:58:00 +02:00
Ronni Baslund	f41475ac3b	feat(scheduling): ignoreAllDayEvents option	2026-06-07 08:53:31 +02:00
Ronni Baslund	2cb13a1a14	feat(scheduling): retry calendar writes for pending bookings A failed Stalwart calendar write during confirmation no longer deletes the booking + SlotLock. The booking stays 'pending' with its lock retained, and a new @Cron worker (every 2 min, max 5 attempts by default) re-drives the write: on success it promotes to 'confirmed' and sends the confirmation email; after the cap it moves to the terminal 'calendar_failed' state and releases the lock. Tracks calendarWriteAttempts + lastCalendarError on the Booking. The public confirm endpoint still throws 503 on a failed first write (preserving the DoD: never surface a confirmed booking without a calendar event); the pending row is left for the background retry to finish.	2026-06-07 08:49:53 +02:00
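One tick of the retry worker described above, sketched as a pure state transition. Keeping it pure separates the side effects (confirmation email, lock release) from persistence. The default cap of 5 matches the commit. The `Booking` shape beyond `calendarWriteAttempts` and `lastCalendarError`, and whether the synchronous first write counts toward the cap, are assumptions.

```typescript
// Hypothetical sketch of one retry tick for a pending booking.
type Status = "pending" | "confirmed" | "calendar_failed";

interface Booking {
  status: Status;
  calendarWriteAttempts: number;
  lastCalendarError?: string;
  lockHeld: boolean;
}

// Side effects the worker should run after persisting the new state.
interface Effects {
  sendConfirmation: boolean;
  releaseLock: boolean;
}

function applyRetryResult(
  b: Booking,
  result: { ok: true } | { ok: false; error: string },
  maxAttempts = 5,
): { booking: Booking; effects: Effects } {
  const attempts = b.calendarWriteAttempts + 1;
  if (result.ok) {
    // Calendar event exists: promote and send the confirmation email.
    return {
      booking: { ...b, status: "confirmed", calendarWriteAttempts: attempts, lastCalendarError: undefined },
      effects: { sendConfirmation: true, releaseLock: false },
    };
  }
  if (attempts >= maxAttempts) {
    // Cap reached: terminal state, and the slot is given back.
    return {
      booking: { ...b, status: "calendar_failed", calendarWriteAttempts: attempts, lastCalendarError: result.error, lockHeld: false },
      effects: { sendConfirmation: false, releaseLock: true },
    };
  }
  // Still under the cap: stay pending and keep the lock for the next tick.
  return {
    booking: { ...b, calendarWriteAttempts: attempts, lastCalendarError: result.error },
    effects: { sendConfirmation: false, releaseLock: false },
  };
}
```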
Ronni Baslund	9e1defa946	feat(scheduling): booking reminder emails	2026-06-07 00:31:33 +02:00
Ronni Baslund	5ed3d2bc5f	feat(scheduling): dezky Scheduling — Calendly-style booking on Stalwart calendars First-party booking system on top of Stalwart calendars (no third-party scheduling dependency). Hosts expose public booking pages; visitors pick a slot computed from the host's live Stalwart free/busy, and confirming writes the event to the host's calendar and sends a dezky-branded confirmation with an .ics. platform-api (services/platform-api/src/scheduling): - Schemas: Host, StalwartCredential (AES-256-GCM at rest), AvailabilitySchedule, EventType, Booking, SlotLock (unique (hostId,startUtc) + TTL). - StalwartCalendarModule: JMAP gateway (free/busy via Principal/getAvailability, event create/delete, scheduleAgent=client) + on-behalf app-password provisioning. CredentialCipher for at-rest encryption. - DST-correct slot engine (Luxon) with unit tests; two-layer double-booking guard (atomic SlotLock + live free/busy re-check). - Booking confirm/cancel/reschedule, branded email + .ics via JMAP submission, self-service manage tokens. /api/v1 public + tenant-gated admin routes, per-IP rate limiting. apps/booking: standalone public, whitelabel booking app (booking.dezky.eu) — path-based tenant resolution, per-tenant brand colour, booking + manage flows. apps/portal: admin scheduling page (hosts, event types, availability, bookings with edit/delete + admin cancel/reschedule) and proxy routes. infra: booking dev service in docker-compose; scheduling env vars.	2026-06-07 00:17:36 +02:00
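The two-layer double-booking guard can be sketched in memory as below. A `Map` stands in for the SlotLock collection, where the unique (hostId,startUtc) index is what makes the real claim atomic. The TTL value, the interval representation, and all names here are hypothetical.

```typescript
// Hypothetical in-memory sketch of the two-layer guard.
interface Interval {
  start: number; // epoch ms, inclusive
  end: number; // epoch ms, exclusive
}

class SlotLocks {
  private locks = new Map<string, number>(); // "hostId|startUtc" -> expiresAt (epoch ms)

  // Layer 1: claim succeeds only if no unexpired lock holds the key,
  // mirroring an insert that would trip the unique index.
  claim(hostId: string, startUtc: number, now: number, ttlMs: number): boolean {
    const key = `${hostId}|${startUtc}`;
    const expiresAt = this.locks.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false;
    this.locks.set(key, now + ttlMs);
    return true;
  }

  release(hostId: string, startUtc: number): void {
    this.locks.delete(`${hostId}|${startUtc}`);
  }
}

// Half-open intervals overlap iff each starts before the other ends.
function overlaps(a: Interval, b: Interval): boolean {
  return a.start < b.end && b.start < a.end;
}

// Layer 2: with the lock held, re-check live free/busy and back out on conflict.
function tryBook(
  locks: SlotLocks,
  hostId: string,
  slot: Interval,
  liveBusy: Interval[],
  now: number,
  ttlMs = 10 * 60_000, // TTL value is an assumption
): "booked" | "slot-locked" | "busy" {
  if (!locks.claim(hostId, slot.start, now, ttlMs)) return "slot-locked";
  if (liveBusy.some((busy) => overlaps(slot, busy))) {
    locks.release(hostId, slot.start);
    return "busy";
  }
  return "booked";
}
```

Keying the lock on the slot start stops two visitors from taking the same slot. The live free/busy re-check is what catches an event that appeared on the host's calendar in the meantime.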
Ronni Baslund	aee8f13899	feat(mail): tenant alias and distribution-list management via Stalwart Customer-admin Mail settings backed by Stalwart JMAP: per-tenant aliases (extra addresses routing to a mailbox) and distribution lists (one address fanning out to many recipients). Adds StalwartClient x:Alias/x:MailingList methods, a tenant-scoped MailController/MailService, the portal Mail settings page and its proxy routes, and the mailboxAddress field on TenantUserDoc. Removes the old mock mail data now that the page reads live data.	2026-06-07 00:16:30 +02:00
Ronni Baslund	47eb9502f8	feat(platform): real email domains, mailboxes & member lifecycle Wire the mail/identity stack to real Stalwart/Authentik/OCIS provisioning, replacing the mocked Domains and Users pages. Domains (customer-admin): - StalwartClient: real JMAP management (v0.16 dropped REST) — create/list/delete email domains via x:Domain at the internal http://stalwart:8080 listener; DKIM auto-generated; the records to publish are read from the domain's dnsZoneFile. Gated by STALWART_PROVISIONING_ENABLED. - New Domain collection + DomainsModule: add/list/recheck/set-DMARC/remove, tenant-membership-gated and audited. - DnsVerifierService: verifies MX/SPF/DKIM/DMARC/ownership against a public resolver (1.1.1.1/8.8.8.8) and diffs them against the expected records. - Remove is guarded: refuses while accounts/aliases/mailing lists still use the domain (via Stalwart referential integrity). - Domains page + add wizard on real data; sidebar badge counts domains needing attention. Users & groups (customer-admin): - Create a member provisioned across Authentik SSO, a Stalwart mailbox on the tenant's primary domain, and OCIS — returning a one-time password. - Lifecycle: suspend/resume (Authentik is_active + freeze the mailbox via account permissions, original password preserved), force-logout (terminate sessions, filtered client-side so it can never end other users' sessions), reset password (new one-time password on SSO + mailbox), and remove (tear down mailbox + SSO identity + OCIS + doc; mailbox-in-use aware for multi-tenant users). Self-suspend / self-force-logout are blocked. Infra: point platform-api at the internal Stalwart listener; document the new STALWART_/provisioning vars in .env.example.	2026-06-01 21:19:42 +02:00
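The expected-versus-actual DNS comparison might look like the hypothetical sketch below. It assumes the records were already fetched (for example through a `node:dns` `Resolver` configured with `setServers` for 1.1.1.1 and 8.8.8.8) and shows only the diff. The record shape and the normalization rules are assumptions.

```typescript
// Hypothetical sketch: diff expected DNS records against what a resolver returned.
interface DnsRecord {
  type: "MX" | "TXT" | "CNAME";
  name: string;
  value: string;
}

function normalize(r: DnsRecord): string {
  // Hostnames are case-insensitive and may carry a trailing dot.
  const host = (s: string) => s.trim().toLowerCase().replace(/\.$/, "");
  // TXT payloads (e.g. DKIM keys) are case-sensitive, so only trim them.
  const value = r.type === "TXT" ? r.value.trim() : host(r.value);
  return `${r.type} ${host(r.name)} ${value}`;
}

function diffRecords(
  expected: DnsRecord[],
  actual: DnsRecord[],
): { ok: DnsRecord[]; missing: DnsRecord[] } {
  const seen = new Set(actual.map(normalize));
  const ok: DnsRecord[] = [];
  const missing: DnsRecord[] = [];
  for (const r of expected) {
    if (seen.has(normalize(r))) ok.push(r);
    else missing.push(r);
  }
  return { ok, missing };
}
```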
Ronni Baslund	2a43a7bbf3	feat(operator): show per-tenant role in tenant users list GET /tenants/:slug/users now returns a tenant-scoped `tenantRole` (resolved server-side via roleForTenant), and the operator tenant page displays it instead of the global `role` — so a user who is admin here but member elsewhere reads correctly in this tenant's context. The global `role` field is kept intact for other consumers.	2026-05-31 21:31:51 +02:00
Ronni Baslund	f094158334	fix(api): report tenant-scoped role in tenant users list A user who is admin in one tenant but member in another must read 'admin' for this tenant — use roleForTenant() rather than the global u.role fallback when building the tenant users list.	2026-05-31 21:30:08 +02:00
Ronni Baslund	f8618b2bbc	feat(portal): real OCIS storage data via refresh-token service auth The Storage page + endpoint landed earlier but had no working OCIS backend credential. OCIS has no service-account/client-credentials grant and trusts a single issuer, and basic auth resolves no user in our external-IdP setup — so authenticate OcisClient via an OIDC refresh-token bootstrap instead: - One-time headless login of svc-platform-api against the ocis provider (public client ocis-web, issuer .../o/ocis/) yields a refresh token, persisted in Mongo (ocis_credentials) and rotated on every use. - OcisClient mints access tokens with the refresh_token grant; the service user holds the OCIS admin role (OCIS_ADMIN_USER_ID) so libregraph ListAllDrives works. - scripts/bootstrap-ocis.mjs re-runs the bootstrap if the token lapses. - Dashboard Plan card gains a storage capacity bar beside seats; hidden when storage is unavailable. - compose + .env.example: OCIS service OIDC env and admin user id. - docs/NEXT-STEPS: document the mechanism and the dead-end alternatives.	2026-05-31 21:29:17 +02:00
Ronni Baslund	559348f6bc	feat(portal): real Security & audit page (+ bundled Storage / per-tenant-roles WIP) Security & audit (admin) - Audit log: real, tenant-scoped — widened GET /tenants/:slug/audit with q/action/outcome/actorEmail/since/before; UI gains search, outcome + time filters, action chips, cursor pagination, and client-side CSV export. - Security policy: new tenant.securityPolicy (mfaMode, session idle/absolute, allowedCountries, ipAllowlist) + PATCH /tenants/:slug/security-policy (membership-gated, audited). Editable, labelled by enforcement status. - MFA: live enrollment overview via GET /tenants/:slug/mfa-status (Authentik countAuthenticators per member). - SSO apps (Dezky as IdP): real Authentik OIDC provider + application CRUD, scoped to the tenant group. New AuthentikClient methods (provider/app/binding + flow/key/scope discovery), TenantSsoApp schema, TenantSsoService (rollback on partial failure; client secret never stored), GET/POST/DELETE /tenants/:slug/sso-apps. Validated end-to-end against live Authentik. - Deferred: shared-flow MFA/geo/session enforcement (global auth-flow blast radius) — to be done as its own reviewed change. Bundled in-progress work that shares the same files (kept together so the tree stays green): - Storage page: StorageService + GET /tenants/:slug/storage (OCIS-backed), storage.get proxy, storage.vue. - Per-tenant roles: User.tenantRoles + MeProfile.tenantRoles plumbing.	2026-05-31 17:20:36 +02:00
Ronni Baslund	3288fde693	feat(portal): customer-admin surface on real data + Stripe billing + session resilience Access & navigation - Gate partner-mode strictly to partner staff so admins/end-users never inherit leftover partner-view state; purge stale session entry on hydrate. - Role-driven admin entry: useMe.isTenantAdmin, Admin/Personal tiles in the app launcher, and an /admin route guard in the global middleware (fail closed). - Drop the duplicate user identity block from the sidebar footer. Admin pages on real data - New tenant-scoped, membership-gated endpoints: GET /tenants/:slug/{audit,users, invoices}; useTenant composable resolves the active workspace + subscription. - Dashboard: real seats, spend (cycle-normalized + minor-units), plan, renewal, and recent audit; unbacked sections removed. - Users & groups: real members; Groups/Invitations/Service accounts shown as honest "coming soon". - Subscription & invoices: real plan hero, invoice history, and billing details. Stripe payment method (Elements + SetupIntent) - StripeClient: publishable key + getDefaultCard/createSetupIntent/setDefaultCard. - CustomerBillingController + BillingService methods (ensure-customer on demand). - Portal: PaymentMethodModal, useStripeJs (CDN load), proxies; hidePostalCode. Editable billing details & whitelabel branding - PATCH /tenants/:slug/billing-info (narrow: company/VAT/country/email). - TenantBranding schema/service + GET/PUT /tenants/:slug/branding: real product name, accent colour, and per-tenant email-template overrides. - Branding preview + sidebar workspace mark wired to real name/plan/seats/colour with YIQ auto-contrast (readableOn util). Session resilience - Request offline_access so Authentik issues a refresh token (automaticRefresh). - Silent refresh + single retry on 401 for writes (useApiFetch, incl. partner pages) and reads (useMe.fetchMe) — no redirect, no lost input. 
- Modal backdrop closes only on press+release on the backdrop (no more drag-select-to-close).	2026-05-31 00:19:34 +02:00
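YIQ auto-contrast picks black or white text from the perceived brightness of the accent colour. Below is a minimal sketch. The `readableOn` name comes from the commit, but its signature, the `#rrggbb`-only input, and the conventional 128 threshold are assumptions.

```typescript
// Hypothetical sketch of a YIQ-based contrast picker.
function readableOn(hex: string): "#000000" | "#ffffff" {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const n = parseInt(m[1], 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  // YIQ luma: weighted perceived brightness in [0, 255].
  const yiq = (r * 299 + g * 587 + b * 114) / 1000;
  // Bright backgrounds get black text, dark ones white.
  return yiq >= 128 ? "#000000" : "#ffffff";
}
```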
Ronni Baslund	db26dafc64	feat(billing): sync catalog price edits to Stripe + re-price live customers Editing a catalog amount now propagates beyond MongoDB. Stripe Prices are immutable, so each changed currency mints a fresh Stripe Price at the new amount, overwrites the cached stripePriceIds[currency] (which also fixes the stale-price bug for new subscriptions), and repoints every live subscription on that (row, currency) onto it with proration_behavior 'none' — the new amount takes effect at the customer's next billing cycle, no mid-cycle charge. The per-seat snapshot is refreshed so MRR reflects the go-forward rate. Before committing the edit, the operator sees a warning with the affected customer count, driven by a new GET /prices/:id/impact endpoint. Per-sub failures are logged, never fatal; Stripe-disabled rows still re-snapshot.	2026-05-30 16:13:15 +02:00
Ronni Baslund	6a7802c870	feat(billing): partner payout-ledger generation (worker + operator trigger) Add BillingService.generatePayouts: idempotent per-partner/month/currency snapshot of gross MRR x marginPct into Payout rows (never rewrites a paid row), plus platformPayouts(). A PayoutWorker generates the current month daily (and on boot; PAYOUTS_AUTOGEN=false to disable). Operator endpoints GET /billing/payouts + POST /billing/payouts/generate, an operator payouts ledger table with a Generate button, and the proxy routes. The partner Payouts tab now shows real data.	2026-05-30 14:40:01 +02:00
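Idempotent payout generation keyed by partner, month, and currency might be sketched like this. A `Map` stands in for the Payout collection. The field names, the `draft`/`paid` statuses, minor-unit amounts, and rounding to the nearest minor unit are assumptions.

```typescript
// Hypothetical in-memory sketch of idempotent payout generation.
interface Payout {
  partnerId: string;
  month: string; // "YYYY-MM"
  currency: string;
  grossMrrMinor: number; // minor units, e.g. cents
  marginPct: number;
  amountMinor: number;
  status: "draft" | "paid";
}

function generatePayouts(
  ledger: Map<string, Payout>,
  month: string,
  inputs: { partnerId: string; currency: string; grossMrrMinor: number; marginPct: number }[],
): void {
  for (const i of inputs) {
    const key = `${i.partnerId}|${month}|${i.currency}`;
    if (ledger.get(key)?.status === "paid") continue; // never rewrite a paid row
    // Upserting on the same key makes repeated runs converge on one row.
    ledger.set(key, {
      partnerId: i.partnerId,
      month,
      currency: i.currency,
      grossMrrMinor: i.grossMrrMinor,
      marginPct: i.marginPct,
      amountMinor: Math.round((i.grossMrrMinor * i.marginPct) / 100),
      status: "draft",
    });
  }
}
```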
Ronni Baslund	69197e11ae	feat(billing): provision Stripe customer + subscription on tenant create Wire the Stripe lifecycle into TenantsService (best-effort, gated on stripe.enabled): on create open a Stripe customer and, for a priced plan with seats >= 1, lazily create a Stripe Product+Price (persisted to Price.stripePriceIds[currency]) and a send_invoice subscription; mirror seat changes to the subscription quantity; pause/resume on suspend/resume; cancel on delete. A Stripe failure never blocks the tenant — the local Subscription stays the source of truth for derived MRR.	2026-05-30 08:29:34 +02:00
Ronni Baslund	0e1d2fb0d1	feat(billing): Stripe-backed billing engine (dark-launched) Add a lazy/guarded Stripe client (boots without keys), Invoice/Payout schemas, per-currency Price.stripePriceIds, and a BillingService deriving partner/platform summaries, invoices and a partner-cut payout ledger. Partner and operator billing controllers plus a signature-verified Stripe webhook (Fastify raw body). Frontend: partner and operator billing pages and the operator tenant billing/audit tabs on real data. Gated behind new_billing_engine and BILLING_STRIPE_ENABLED; live money paths stay off until keys are set.	2026-05-30 08:03:23 +02:00
Ronni Baslund	6370e392cc	feat(reports): partner and platform analytics Partner reports — health cohorts, revenue-by-plan, top customers, signup/churn cohorts, plus saved custom reports (create/list/delete). Operator platform-wide reports (MRR, revenue by plan, top tenants, growth). Replaces the reports fixtures in both apps.	2026-05-30 08:03:14 +02:00
Ronni Baslund	89691626f4	feat: partner enrichment, mutations, settings & branding + operator quick-wins Backend (platform-api): computed tenant health plus industry/brandColor; partner-scoped tenant update/suspend/resume guarded by assertPartnerOwnsTenant; enriched partner users (MFA + access level) with invite/remove; partner settings and whitelabel branding persistence; Authentik authenticator counting and group removal. Audit on every mutation. Frontend (portal): all five partner pages on real data — dashboard alerts, customers edit/suspend, team MFA/access with invite/remove, editable settings, branding fetch/save. Operator: dashboard and infrastructure service health driven by real liveness probes; fabricated uptime/p95/error-rate removed.	2026-05-30 08:03:07 +02:00
Ronni Baslund	0bd4e5498e	feat: portal redesign, pricing catalog, partner-staff invites - portal: new admin/ and partner/ surfaces with full component library (AppLauncher, Avatar, Badge, Card, Modal, Tabs, etc.), composables, layouts, partner-routing middleware, and supporting server APIs - pricing: Price schema/module with operator CRUD, pricing.vue catalog UI, Subscription extended with cycle/currency/perSeatAmount/seats snapshots for stable MRR aggregation - partner staff: User.partnerId, invite-partner-user DTO and flow, /partners/:slug/users endpoints, InvitePartnerUserModal, shared dezky-partner-staff Authentik group - /me: partner-aware endpoint returning user + partner context so portal can route between end-user and partner-admin surfaces - tenant: seats field for portfolio displays and future MRR calculations - operator: pricing page, signed-out page, useMe/useToast composables, ToastStack	2026-05-28 20:00:33 +02:00
Ronni Baslund	be273ea5f4	fix(partners): allow empty-string email fields on partial updates @IsOptional() only skips validation for null/undefined; an empty string still trips @IsEmail(). When the operator UI saves billingInfo with a blank contactEmail, the request 400'd with 'must be an email'. Coerce '' → undefined on every @IsEmail-decorated field via a shared @Transform so blank inputs round-trip cleanly.	2026-05-24 22:23:27 +02:00
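The shared transform reduces to a one-line callback. The sketch keeps it dependency-free. The decorator wiring appears only in comments because it needs class-transformer and class-validator, and the `emptyToUndefined` name is hypothetical. It relies on class-transformer running before validation, which is how the ValidationPipe instantiates DTOs.

```typescript
// The callback receives a params object with `value` (class-transformer's
// TransformFnParams); only that field is used here.
type TransformParams = { value: unknown };

// Map '' to undefined so @IsOptional() skips the @IsEmail() check.
const emptyToUndefined = ({ value }: TransformParams): unknown =>
  value === "" ? undefined : value;

// Wiring in a DTO (requires class-transformer + class-validator; not run here):
//   @IsOptional()
//   @Transform(emptyToUndefined)
//   @IsEmail()
//   contactEmail?: string;
```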
Ronni Baslund	0e0cf8d90b	feat(audit): record before/after diff for partner updates partner.updated events previously recorded only the field names that changed (metadata.changes). Now they record metadata.diff — a { field: { from, to } } map — by reading the partner before the findOneAndUpdate and comparing serialized values. Only fields that actually differ make it into the diff, so a save-without-changes records an empty diff instead of every DTO key. The operator audit row's expanded panel renders the diff as a small inline table (field · from → to). Older audit rows that still carry metadata.changes fall back to the original chip layout so historical events stay readable.	2026-05-24 22:20:50 +02:00
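The before/after diff compares serialized values field by field. This hypothetical sketch assumes plain JSON-serializable documents. `JSON.stringify` is key-order sensitive, so a nested object whose keys were merely reordered would show up as changed.

```typescript
// Hypothetical sketch of the { field: { from, to } } audit diff.
type Diff = Record<string, { from: unknown; to: unknown }>;

function diffFields(before: Record<string, unknown>, patch: Record<string, unknown>): Diff {
  const diff: Diff = {};
  for (const key of Object.keys(patch)) {
    if (patch[key] === undefined) continue; // key was not actually sent
    // Serialized comparison so nested objects (e.g. billingInfo) diff by content.
    if (JSON.stringify(before[key]) !== JSON.stringify(patch[key])) {
      diff[key] = { from: before[key], to: patch[key] };
    }
  }
  return diff;
}
```

A save with no real changes yields `{}`, which matches the commit's "empty diff instead of every DTO key".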
Ronni Baslund	4a1a4ddad5	feat(operator): inline edit mode on /partners/[slug] Toggle the partner detail cards from read-only to editable in place. Edit button in the PageHeader flips to Cancel + Save changes; cards expose text inputs for name/domain/contact/billing, a 4-option segmented control for status, and a 0–100 range slider for marginPct. Save sends a PATCH diff (only fields that actually changed), refreshes the page data, and exits edit mode. Cancel with unsaved changes confirms first. Also tightens audit metadata: previously `Object.keys(dto)` on the ValidationPipe-instantiated DTO listed every @IsOptional() field, even when the request body didn't touch them. The partner.updated audit event now records only the keys the operator actually sent.	2026-05-24 22:05:31 +02:00
Ronni Baslund	0299328175	feat(authentik): auto-wire recovery flow on bootstrap + expire fallback temp passwords Two related fixes that together close the "no recovery flow" gap behind the invite-operator feature. 1. SeedService now provisions an Authentik recovery flow on every boot. Without this, /core/users/{pk}/recovery/ returns 400 "No recovery flow set." and our invite endpoint silently falls back to setting a plaintext temp password — operationally fine in dev but not appropriate for prod. ensureRecoveryFlow() (in seed.service.ts): - Check if a flow with designation='recovery' already exists → no-op - Otherwise create one with slug='default-dezky-recovery' (designation='recovery', authentication='none' so the link token is the only auth needed) - Bind three default Authentik stages to it in order: 10: default-authentication-identification (auto-skipped when the recovery token already pins a user; lets the flow also work for self-service "forgot password" entry) 20: default-password-change-prompt 30: default-password-change-write - PATCH the default brand's flow_recovery to point at the new flow - Wrapped in .catch(warn) so an Authentik blip during boot doesn't crash platform-api — next restart retries. AuthentikClient additions: - findRecoveryFlow(), getDefaultBrand(), findStageByName(), createFlow(), bindStageToFlow(), setBrandRecoveryFlow(). IntegrationsModule pulled into SeedModule so SeedService can use AuthentikClient. 2. Temp-password fallback path now marks the password expired so Authentik forces a change on next login. Closes the window where an operator's plaintext share could outlive the new user's first session. 
AuthentikClient.markPasswordExpired(userPk): - GET user → merge attributes.passwordExpired=true + passwordExpiredAt=now → PATCH back - Read-modify-write because Authentik PATCH replaces nested objects and we don't want to clobber other attributes. UsersService.inviteOperator() calls it on the fallback branch only; the recovery-link path doesn't need it (clicking the link sets a fresh password through the flow anyway). Verified end-to-end: - Boot → recovery flow auto-provisioned with three correctly-ordered stage bindings, default brand patched to flow_recovery=<new pk>. - Re-invite test user → modal now shows a single recovery link starting with https://auth.dezky.local/if/flow/default-dezky-recovery/?flow_token=... (no temp password fallback). - Operator-team list still updates to include the new user immediately via the pre-created local User doc. Known follow-ups: - Enforce MFA enrollment in the recovery flow (add an authenticator stage). Deferred: it would lock users out if they lose the second factor on day one. Better to fire MFA from a separate "MFA required" stage on subsequent logins for platform admins. - Outbound SMTP (Phase 5/6) so Authentik emails the recovery link directly and the modal hides it.	2026-05-24 21:46:35 +02:00
Ronni Baslund	9a97945565	feat(operator): invite operator → creates user in Authentik New "Invite operator" button + modal on /operator-team. Replaces the bounce-to-Authentik flow with an inline invite that creates the user via the Authentik API and pre-populates our local User doc so they appear immediately. services/platform-api/src/integrations/authentik.client.ts: - findUserByEmail(): early-conflict check before we attempt the create - createUser(): POST /core/users/ with username = email, internal type, is_active, attached to the supplied group PKs - addUserToGroup(): kept for tenant-member invites later - recoveryLink(): tries POST /core/users/{pk}/recovery/, returns undefined when no recovery flow is configured on the Authentik brand (we soft-fail and the service falls back to setInitialPassword) - setInitialPassword(): POST /core/users/{pk}/set_password/. Returns 204 No Content, so we bypass request<T>'s JSON parser and call fetch directly with an explicit ok check. services/platform-api/src/users/users.service.ts: - inviteOperator(dto, actor) orchestrates: dedup by email → findOrCreate Authentik group → create user in group → pre-create local User doc with platformAdmin=true so the list reflects them immediately → try recovery link → fall back to temp password → record platform.user_invited audit event with handoff method. - Return type is { subject, userId, link? | tempPassword? }; exactly one credential mode is set, depending on the Authentik config. - generateTempPassword(): 16 characters with at least one upper/lower/digit/symbol, shuffled. Confusable chars (I/O/0/1/l) omitted. - Cached platform-admin group ID after first lookup. services/platform-api/src/users/users.controller.ts: - POST /users/invite behind OperatorGuard. Calls the service with actor + IP from the JWT/request. apps/operator: - server/api/users/invite.post.ts: standard platformApi proxy. - components/InviteOperatorModal.vue: 2-step form. Step 1: name + email with client-side validation.
Step 2: shows whichever credential the backend returned (recovery link OR username + temp password) with copy-to-clipboard buttons and a note about SMTP/recovery-flow follow-up paths. - pages/operator-team.vue: "Invite operator" replaces "Manage in Authentik" as the primary action; Authentik link demoted to secondary. Refreshes the list on @invited so the new user shows up without a manual reload. Verified end-to-end against real Authentik: - Invite created user pk=7, uid=f22f2bb…, group=dezky-platform-admins, is_active=true, temp password set. Modal showed both fields with copy buttons; operator-team count went 1 → 2 immediately. Audit event recorded (platform.user_invited with handoff='temp-password'). - Recovery link path is preferred, but Authentik has no recovery flow configured on the default brand. AuthentikClient.recoveryLink() soft-fails on the "No recovery flow set." 400, returns undefined, and inviteOperator transparently falls back to set_password. Once a recovery flow is configured (Authentik admin → Flows), the link path becomes active and the temp-password path stops firing without any code changes. Known follow-ups: - Configure Authentik recovery flow so the link path activates (one-time admin task, not in code) - Outbound SMTP wiring (Phase 5/6) → Authentik can email the link/temp password directly; modal stops showing the credential - Deactivate / remove operator from inside the app (currently still Authentik UI; defensible until proven needed) - Tenant-member invite: similar flow but adds to the tenant group instead, exposed from /users (global users) or tenant detail	2026-05-24 21:27:46 +02:00
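Below is a sketch of a temp-password generator that follows the rules above: 16 characters, one of each class, confusables I/O/0/1/l removed, and a final shuffle. It uses Node's CSPRNG `crypto.randomInt` with a Fisher-Yates shuffle. The symbol set is an assumption, and this is not the project's `generateTempPassword`.

```typescript
import { randomInt } from "node:crypto";

// Alphabets with the confusable characters I, O, 0, 1 and l removed.
const UPPER = "ABCDEFGHJKLMNPQRSTUVWXYZ"; // no I, O
const LOWER = "abcdefghijkmnopqrstuvwxyz"; // no l
const DIGITS = "23456789"; // no 0, 1
const SYMBOLS = "!@#$%^&*-_=+?"; // symbol set is an assumption
const ALL = UPPER + LOWER + DIGITS + SYMBOLS;

// Uniform pick from a character set using a CSPRNG.
const pick = (chars: string): string => chars[randomInt(chars.length)];

function generateTempPassword(length = 16): string {
  // One guaranteed character from each class, the rest from the full pool.
  const out = [pick(UPPER), pick(LOWER), pick(DIGITS), pick(SYMBOLS)];
  while (out.length < length) out.push(pick(ALL));
  // Fisher-Yates shuffle so the guaranteed characters are not always first.
  for (let i = out.length - 1; i > 0; i--) {
    const j = randomInt(i + 1);
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out.join("");
}
```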
Ronni Baslund	4d9e906ec1	feat(audit): cold-storage archival to S3 (Phase 4) Final piece of the audit work. Events older than the hot retention window move to S3-compatible object storage with signed manifests. Production uses Hetzner Object Storage; dev uses a MinIO container with the same API. Infra (infrastructure/docker-compose): - New `minio` service exposing the S3 API at minio:9000 + admin console at minio.dezky.local. Healthchecked. Bucket-init sidecar runs `mc mb` once to create `dezky-audit`; safe to re-run. - .env adds MINIO_ROOT_USER + MINIO_ROOT_PASSWORD. - platform-api env: AUDIT_COLD_{ENDPOINT,REGION,BUCKET,ACCESS_KEY,SECRET_KEY} + AUDIT_HOT_RETENTION_DAYS=90 + ARCHIVE_ENABLED=false (dormant in dev; operator UI's "Run archive now" bypasses this gate). AUDIT_COLD_SSE opts into SSE-S3 — left unset in dev because MinIO without a KMS rejects AES256 PUTs with "KMS is not configured". Platform-api (services/platform-api/src/cold/): - cold-storage.client.ts: thin @aws-sdk/client-s3 wrapper — put/head/list. forcePathStyle=true so MinIO and Hetzner both work; same code, env-swap. - archive.service.ts: runOnce() selects chained events with at < cutoff → serializes to JSONL → gzip → sha256s → uploads JSONL + signed manifest → HEAD-confirms both objects exist → records an ArchiveBatch doc → only then deletes from hot Mongo. Crash-safe: a failed upload leaves events in hot. Manifest uses the Phase 3 AUDIT_SIGNING_KEY (HMAC-SHA-256), so archives + checkpoints share trust chain. Bypassable via { override: true } for the operator's UI force-run. - archive.worker.ts: hourly tick guarded by configured run-hour-UTC (default 03:00) + day-guard so the same UTC day doesn't archive twice. Disabled until ARCHIVE_ENABLED=true. - archive-batch.schema.ts: { archivedAt, startSeq, endSeq, eventCount, manifestSha256, jsonlKey, manifestKey, bytesUncompressed }. The manifest sha256 stored in Mongo lets us detect manifest tampering without downloading the actual manifest. 
Audit module additions: - audit.controller.ts: GET /audit/archives, POST /audit/archive/run, /audit/verify now reports { oldestHotSeq, highestArchivedSeq } so the UI shows the tier boundary. Operator UI (apps/operator): - 2 new proxies: /api/audit/archives + /api/audit/archive/run (force override=true). Both behind operator auth via the existing platformApi helper. - audit.vue: new "Cold storage" card with batch table (archived-at, seq range, event count, size, truncated manifest sha256), "Run archive now" button + per-run result line. Smoke-tested end-to-end: - 7 chained events in hot. /api/audit/archive/run → ok=true, batchId returned. JSONL + manifest both exist in MinIO (verified via mc ls + mc cat). Mongo's chained set went 7 → 0. Verify reports highestArchivedSeq=1446 (since we burn-allocate seqs on Authentik dup-key rejections). Operator /audit panel shows the batch with manifest hash 1d8263… - First attempt with SSE-S3 enabled failed cleanly (MinIO KMS not configured) — archive service correctly left events in hot Mongo. Made SSE opt-in via AUDIT_COLD_SSE=true; prod turns it on. Out of scope (each could be its own session): - Restore-to-hot endpoint (today: download from S3 + offline query) - Client-side encryption (today: SSE-S3 in prod, none in dev) - Multi-region replication - Soft TTL safety net (defense-in-depth on top of app-managed deletion) This completes the four-phase audit log work: 1. platform-api as audit hub 2. External system ingest (Authentik / Stalwart / OCIS) 3. Hash-chain + signed checkpoints (tamper evidence) 4. Cold-storage archival (retention without unbounded Mongo growth)	2026-05-24 21:03:41 +02:00
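The serialize, gzip, hash, and sign steps of `runOnce()` can be sketched with `node:zlib` and `node:crypto`. Upload, HEAD confirmation, the ArchiveBatch write, and the hot delete are left out. The manifest fields and the HMAC input (the JSON of the unsigned fields) are assumptions.

```typescript
import { createHash, createHmac } from "node:crypto";
import { gzipSync, gunzipSync } from "node:zlib";

interface ChainedEvent {
  seq: number;
  hash: string;
  [k: string]: unknown;
}

interface Manifest {
  startSeq: number;
  endSeq: number;
  eventCount: number;
  bytesUncompressed: number;
  jsonlGzSha256: string;
  signature: string; // HMAC-SHA-256 over the other fields
}

// Build the gzipped JSONL batch and its signed manifest (upload/HEAD/delete not shown).
function buildArchive(events: ChainedEvent[], signingKey: Buffer) {
  if (events.length === 0) throw new Error("nothing to archive");
  const raw = Buffer.from(events.map((e) => JSON.stringify(e)).join("\n") + "\n", "utf8");
  const gz = gzipSync(raw);
  const body = {
    startSeq: events[0].seq,
    endSeq: events[events.length - 1].seq,
    eventCount: events.length,
    bytesUncompressed: raw.length,
    jsonlGzSha256: createHash("sha256").update(gz).digest("hex"),
  };
  const signature = createHmac("sha256", signingKey).update(JSON.stringify(body)).digest("hex");
  const manifest: Manifest = { ...body, signature };
  const manifestJson = JSON.stringify(manifest);
  // Stored with the batch record so manifest tampering is detectable without a download.
  const manifestSha256 = createHash("sha256").update(manifestJson).digest("hex");
  return { gz, manifestJson, manifestSha256 };
}

// Inverse for an offline query: gunzip and parse each JSONL line.
function readArchive(gz: Buffer): ChainedEvent[] {
  return gunzipSync(gz).toString("utf8").trim().split("\n").map((l) => JSON.parse(l) as ChainedEvent);
}
```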
Ronni Baslund	9435baa09d	feat(audit): hash-chain tamper evidence + signed checkpoints (Phase 3) The audit log now carries cryptographic chain-of-custody. Every chained event references the previous event's sha256, and periodic checkpoints sign the head with HMAC-SHA-256. An attacker who modifies a historical row must also forge every checkpoint signature past it, which requires the AUDIT_SIGNING_KEY, kept outside Mongo. Schema (services/platform-api/src/schemas/): - audit-event.schema.ts: new `seq` (monotonic) + `chained` (Phase-3-or-later flag) + `prevHash` + `hash`. Compound unique index on seq with a partial filter so pre-Phase-3 rows don't collide on null. - audit-counter.schema.ts: single doc `_id='audit_seq'`, incremented atomically by findOneAndUpdate($inc). - audit-checkpoint.schema.ts: { at, headSeq, headHash, signature, sigAlg, reason }. Reason ∈ {startup, interval, threshold, manual}. Audit module (services/platform-api/src/audit/): - canonical.ts: stable JSON form + hashCanonical (sha256) + checkpointSignature (HMAC-SHA-256) + verifyCheckpointSignature (timingSafeEqual). Single source of truth for hash inputs: schema additions land here at the same time as the field. - audit.service.ts: record() now allocates seq → looks up lastHash() → computes hash → inserts. Per-process write mutex serializes the allocate+lookup so concurrent writers don't both chain off the same predecessor. Documented multi-instance caveat (needs Mongo replica set + transactions OR a distributed lock). - checkpoint.service.ts: scheduler triggers on startup + every 5 min + threshold of 100 events accumulated. Skips when there are no new chained events since the last anchor. - verifier.service.ts: walks chain in seq order, recomputes each hash, validates checkpoint signatures. Returns a precise break: 'event-hash-mismatch' (in-place modification), 'event-prev-hash-mismatch' (insertion/deletion), or 'checkpoint-signature-mismatch'. 
- audit.controller.ts: GET /audit/verify, GET /audit/checkpoint/latest, POST /audit/checkpoint (manual force). Operator UI (apps/operator/): - 3 new proxies under /api/audit/{verify, checkpoint/latest, checkpoint}. - pages/audit.vue: new "Tamper evidence" card with "Force checkpoint" + "Verify chain" buttons. Header shows live head seq; result line shows verified count or a precise break (kind + seq + expected vs actual hash). Background tinted green/red on ok/broken. Env (.env + docker-compose.yml): - new AUDIT_SIGNING_KEY (32-byte hex HMAC secret). Prod swaps this for ed25519 from an HSM/KMS; verifier code stays the same because sigAlg is on the checkpoint doc. Smoke-tested all three break paths against a clean chain of 5 events: - normal verify: ok=true, 5/5 events verified, 1 checkpoint signed - modified seq=3 in Mongo directly: verify returns ok=false with break = { kind: 'event-hash-mismatch', seq: 3, expected, actual } - restored, nuked checkpoint signature: break = { kind: 'checkpoint-signature-mismatch', headSeq: 5 } - operator UI's verify panel reflects all three states correctly. Legacy data: pre-Phase-3 events stay `chained: false` and are excluded from the chain walk. Retroactive chaining of historical entries is a one-off migration script we can run if we ever care to. Out of scope (Phase 4 etc.): - TTL + cold-storage archival to Hetzner Object Storage - GDPR right-to-erasure tooling - ed25519 / HSM signing (swap is well-defined; sigAlg field is ready) - Multi-instance write coordination (Mongo transaction OR distributed lock when we scale platform-api beyond 1 replica)	2026-05-24 20:43:54 +02:00
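A minimal sketch of the chain and verifier described above, assuming the hash input is just { seq, payload, prevHash } in canonical form and that a checkpoint signs `${headSeq}:${headHash}`; the real inputs live in canonical.ts and are not shown in the commit. The names `append`, `signCheckpoint` and `verify` are hypothetical; the break kinds mirror the three the verifier reports.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

export type Json = null | boolean | number | string | Json[] | { [k: string]: Json };
type JsonObject = { [k: string]: Json };

// Stable form: object keys sorted recursively, so the hash never depends on insertion order.
export function canonical(v: Json): string {
  if (v === null || typeof v !== "object") return JSON.stringify(v);
  if (Array.isArray(v)) return "[" + v.map(canonical).join(",") + "]";
  const obj: JsonObject = v;
  return "{" + Object.keys(obj).sort().map((k) => JSON.stringify(k) + ":" + canonical(obj[k])).join(",") + "}";
}

const sha256Hex = (s: string) => createHash("sha256").update(s).digest("hex");

export interface ChainEvent { seq: number; payload: JsonObject; prevHash: string | null; hash: string }
export interface Checkpoint { headSeq: number; headHash: string; signature: string }

// Hypothetical hash input: seq + payload + prevHash.
export const eventHash = (seq: number, payload: JsonObject, prevHash: string | null) =>
  sha256Hex(canonical({ seq, payload, prevHash }));

// Append = allocate the next seq, chain off the current head, hash.
export function append(chain: ChainEvent[], payload: JsonObject): ChainEvent {
  const prev = chain[chain.length - 1];
  const seq = prev ? prev.seq + 1 : 1;
  const prevHash = prev ? prev.hash : null;
  const ev: ChainEvent = { seq, payload, prevHash, hash: eventHash(seq, payload, prevHash) };
  chain.push(ev);
  return ev;
}

export const signCheckpoint = (headSeq: number, headHash: string, keyHex: string) =>
  createHmac("sha256", Buffer.from(keyHex, "hex")).update(`${headSeq}:${headHash}`).digest("hex");

export type Break =
  | { kind: "event-hash-mismatch"; seq: number; expected: string; actual: string }
  | { kind: "event-prev-hash-mismatch"; seq: number }
  | { kind: "checkpoint-signature-mismatch"; headSeq: number };

// Assumes `chain` starts at the genesis event (prevHash null).
export function verify(chain: ChainEvent[], checkpoints: Checkpoint[], keyHex: string): { ok: true } | { ok: false; break: Break } {
  for (let i = 0; i < chain.length; i++) {
    const ev = chain[i];
    // A deleted or inserted row breaks the link to its predecessor.
    if (ev.prevHash !== (i === 0 ? null : chain[i - 1].hash)) {
      return { ok: false, break: { kind: "event-prev-hash-mismatch", seq: ev.seq } };
    }
    // An edited row no longer hashes to its stored value.
    const expected = eventHash(ev.seq, ev.payload, ev.prevHash);
    if (expected !== ev.hash) {
      return { ok: false, break: { kind: "event-hash-mismatch", seq: ev.seq, expected, actual: ev.hash } };
    }
  }
  for (const cp of checkpoints) {
    // Check the signature against the hash the chain holds now, so a fully re-hashed chain fails here.
    const head = chain.find((e) => e.seq === cp.headSeq);
    const want = Buffer.from(signCheckpoint(cp.headSeq, head ? head.hash : cp.headHash, keyHex), "hex");
    const got = Buffer.from(cp.signature, "hex");
    if (got.length !== want.length || !timingSafeEqual(got, want)) {
      return { ok: false, break: { kind: "checkpoint-signature-mismatch", headSeq: cp.headSeq } };
    }
  }
  return { ok: true };
}
```

Verifying each checkpoint against the hash the chain currently holds, rather than only the stored headHash, is what forces an attacker who re-hashes everything after an edit to also forge the signature.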
Ronni Baslund	df18128617	feat(audit): OCIS file-tail ingest worker (Phase 2 chunk 3) Tails OCIS's JSON-Lines audit log on a shared Docker volume and forwards mutations into AuditService. Final piece of Phase 2: the /audit page now unifies platform-api, authentik, and ocis events on one timeline. services/platform-api/src/ingest/ocis.ingest.ts: - 5s polling loop (fs.watch is unreliable across Docker bind mounts on macOS). Stat → detect inode change or truncation → resume from byte position OR start over. - Cursor in IngestCursor stores lastEventId = "<inode>:<bytePosition>". Restarts resume cleanly; on overlap the (source, externalId) unique index dedups silently. - Lines are collected first, then processed sequentially after the read stream closes. An earlier draft fired recordOne() from inside the readline 'line' callback, which would have resolved the stream before all writes finished. This is the same class of race we hit in the Authentik worker, fixed before commit. - Tenant inference: spaceName (set during provisioning to the slug) first, then User.authentikSubjectId → tenantIds → Tenant.slug. - Mutations only: OCIS_ALLOWLIST in action-map.ts whitelists 24 event types (User/Group/Space/Share/Link/File mutations). FileDownloaded, UserSignedIn, and the rest of the high-volume read traffic get skipped, which keeps the timeline scannable. services/platform-api/src/ingest/action-map.ts: - mapOcisAction() + OCIS_ALLOWLIST. Returns null for non-whitelisted types so the worker filters early. infrastructure/docker-compose/docker-compose.yml: - New named volume `ocis_audit_log` mounted writable on the ocis container and read-only on platform-api. - OCIS env: OCIS_ADD_RUN_SERVICES=audit (the audit microservice is NOT in the default `ocis server` set; opt in explicitly), AUDIT_LOG_FILE_PATH=/var/log/ocis/audit.log, AUDIT_LOG_FORMAT=json. - platform-api env: OCIS_AUDIT_LOG_PATH points at the same file. 
Verified end-to-end with synthetic events written to the audit log: - Worker tailed 5 events across initial read + incremental append (5 → bytes 0:1276, then 1 → bytes 1276:1519). - FileDownloaded correctly filtered by the allowlist (4 mutations landed in Mongo, not 5). - Tenant inference: events with executingUser.id resolved to `dezky` via User → tenantIds → Tenant.slug. - Operator /audit shows all three sources (89 events: 79 authentik + 5 platform-api + 5 ocis) in one unified timeline. Known unknown — same shape as the Stalwart commit: I couldn't fully confirm the OCIS v7 audit microservice emits events with just OCIS_ADD_RUN_SERVICES=audit + the AUDIT_LOG_FILE_PATH env. The audit service starts but the file stays empty until OCIS internals start publishing events to NATS (which may need additional service-side config). The ingest worker is correct regardless — when OCIS starts writing real events, they'll flow into /audit. This is a follow-up in the OCIS-side configuration, not in our ingest code.	2026-05-24 20:30:47 +02:00
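The cursor and rotation handling can be sketched as small pure functions. The `<inode>:<bytePosition>` format comes from the commit; `splitComplete` (leaving a half-written trailing line for the next poll) and all helper names are assumptions about how the tail is consumed, not the worker's actual code.

```typescript
// Cursor as described in the commit: lastEventId = "<inode>:<bytePosition>".
export interface TailCursor { inode: number; pos: number }

export function parseCursor(s: string | null | undefined): TailCursor | null {
  const m = s ? /^(\d+):(\d+)$/.exec(s) : null;
  return m ? { inode: Number(m[1]), pos: Number(m[2]) } : null;
}

export const formatCursor = (c: TailCursor) => `${c.inode}:${c.pos}`;

// Start over at byte 0 when the file was rotated (new inode) or truncated (now shorter than our position).
export function resumeFrom(cursor: TailCursor | null, stat: { ino: number; size: number }): number {
  if (!cursor || cursor.inode !== stat.ino || stat.size < cursor.pos) return 0;
  return cursor.pos;
}

// Assumption: a trailing partial line (writer mid-append) is left for the next poll.
export function splitComplete(chunk: Buffer, startPos: number): { lines: string[]; nextPos: number } {
  const lastNl = chunk.lastIndexOf(0x0a);
  if (lastNl === -1) return { lines: [], nextPos: startPos };
  const lines = chunk.subarray(0, lastNl).toString("utf8").split("\n").filter((l) => l.trim() !== "");
  return { lines, nextPos: startPos + lastNl + 1 };
}

// Process only after collection, one awaited write at a time (the ordering the commit describes).
export async function processLines(lines: string[], recordOne: (evt: unknown) => Promise<void>): Promise<void> {
  for (const line of lines) await recordOne(JSON.parse(line));
}
```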
Ronni Baslund	7bec940e7f	feat(audit): Stalwart webhook ingest endpoint (Phase 2 chunk 2) Push-based ingest for mail-server events. Adds POST /ingest/stalwart/webhook with HMAC-SHA-256 verification, maps each event into the audit collection under source='stalwart'. services/platform-api/src/ingest/stalwart-webhook.controller.ts: - Public endpoint (no JwtAuthGuard — Stalwart can't carry a JWT). Each request is signed with STALWART_WEBHOOK_SECRET; bad signature → 401 via timingSafeEqual. - Body: { events: [{ id, type, createdAt, data }, ... ] }. Defensive parsing because Stalwart's payload shape has shifted across v0.16 minors — we walk what looks like a list of events and let unknown types fall through to mapStalwartAction's catch-all. - Per-event recordOne: action via mapStalwartAction(), actor from data.email/account/username, IP from data.ip or X-Forwarded-For, targetName from data.account/email/address/to, full payload kept in metadata. externalId = evt.id so the (source, externalId) unique index dedups re-deliveries. action-map.ts: 14 known Stalwart event types → stalwart.{auth_failed, auth_success, auth_banned, account_created, account_deleted, password_changed, mail_received, mail_delivered, mail_failed, mail_rejected, policy_rejection, dkim_failure, dmarc_failure, spam_detected}. Snake/kebab forms normalized. infrastructure/docker-compose: - .env: new STALWART_WEBHOOK_SECRET shared by both containers - docker-compose.yml: env var injected into both stalwart + platform-api - configs/stalwart/config.toml: [webhook."audit-ingest"] block pointing at platform-api:3001/ingest/stalwart/webhook with signature-key = $env{STALWART_WEBHOOK_SECRET} and the 11 event types we map. Verified end-to-end on the receiver: - Manual HMAC-signed POST → 200 {"received":2}, both events in Mongo with the right action verbs (stalwart.auth_failed, stalwart.account_created), actor/IP/externalId populated. 
- Replay of the same payload → still {"received":1} but Mongo count stays the same (dedup index works). - X-Signature: deadbeef → 401, no row written. Known unknown: I couldn't fully confirm Stalwart v0.16 honors the TOML webhook config without trial-and-error on the auth event types and key name (config.toml uses signature-key; some Stalwart builds want plain 'key'). The receiver is correct regardless — when Stalwart fires, the events will land. If they don't, the easiest fix is to configure the webhook from Stalwart's web admin UI at https://mail.dezky.local instead of via TOML.	2026-05-24 20:21:29 +02:00
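A sketch of the receiver's two defensive steps: constant-time signature checking and tolerant event extraction. It assumes X-Signature carries a hex HMAC-SHA-256 of the raw body; the encoding Stalwart actually uses is not confirmed in the commit, and the function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: X-Signature is the hex HMAC-SHA-256 of the raw request body.
// If Stalwart sends base64 instead, only the decoding line changes.
export function verifySignature(rawBody: string | Buffer, header: string | undefined, secret: string): boolean {
  if (!header) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(header.trim(), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first ("deadbeef" is 4 bytes).
  return given.length === expected.length && timingSafeEqual(given, expected);
}

export interface StalwartEvent { id: string; type: string; createdAt?: string; data: Record<string, unknown> }

// Defensive parsing: accept { events: [...] } or a bare array; drop entries without an id and a type.
export function extractEvents(body: unknown): StalwartEvent[] {
  const list = Array.isArray(body) ? body : (body as { events?: unknown } | null)?.events;
  if (!Array.isArray(list)) return [];
  const out: StalwartEvent[] = [];
  for (const item of list) {
    if (typeof item !== "object" || item === null) continue;
    const r = item as Record<string, unknown>;
    const id = typeof r.id === "string" || typeof r.id === "number" ? String(r.id) : null;
    const type = r.type;
    if (id === null || typeof type !== "string") continue;
    const createdAt = r.createdAt;
    const data = r.data;
    out.push({
      id, // becomes externalId, so redeliveries hit the (source, externalId) unique index
      type,
      createdAt: typeof createdAt === "string" ? createdAt : undefined,
      data: typeof data === "object" && data !== null ? (data as Record<string, unknown>) : {},
    });
  }
  return out;
}
```

The HMAC must be computed over the raw bytes as received, before any JSON parsing, since re-serializing a parsed body can change the bytes.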
Ronni Baslund	b1d717e466	feat(audit): Authentik events ingest worker (Phase 2 chunk 1) Background worker that pulls Authentik's /api/v3/events/events/ on a 60s cadence and writes each event into our audit log via AuditService. External system events now share the same /audit timeline as internally-recorded platform mutations, so operator queries don't have to cross-reference Authentik's own UI to see logins, password changes, group membership, impersonation, etc. Pieces: - src/schemas/ingest-cursor.schema.ts: one row per source, tracks lastEventAt + lastEventId so restarts resume without re-pulling. - src/schemas/audit-event.schema.ts: new `externalId` field; new compound unique index on (source, externalId) with a partial filter on externalId being a string. Partial (not sparse) so internally-recorded events with externalId=null don't collide. - src/audit/audit.service.ts: AuditRecordInput grows `externalId` + `at` fields. record() now silently swallows MongoError code 11000 (duplicate key) so re-pulling the cursor overlap doesn't log noise. - src/integrations/authentik.client.ts: listEvents(since, page, pageSize) on the existing client; it reuses the admin token and base URL the provisioning code already configured. - src/ingest/action-map.ts: 16 known Authentik actions → dotted authentik.* verbs (login, login_failed, password_changed, impersonation_started, …). Unknown actions fall through to authentik.<raw> rather than getting silently dropped. - src/ingest/authentik.ingest.ts: OnApplicationBootstrap worker. Reads cursor → pulls events with created__gt=cursor, ordering=created ASC → paginates forward (10 pages × 100/page safety cap per tick) → writes each event with source='authentik' + externalId=pk + at=evt.created → advances cursor to the newest seen. An inFlight guard prevents overlapping ticks. AUDIT_INGEST_ENABLED=false disables the worker for test environments. - Tenant inference: from the user's groups (same convention the portal flag-eval proxy uses). 
Admin groups stripped; first match against a real Tenant.slug wins. Unmatched → tenantSlug undefined, event still lands in the global timeline. Smoke-tested: fresh Mongo + restart → 78 Authentik events ingested, 0 duplicates. Performed a login at app.dezky.local → next 60s tick captured the new login row with actor email + IP. Compound unique index on (source, externalId) verified to reject re-pulled events silently (no error logs). Out of scope here (covered by chunks 2 + 3): - Stalwart webhook ingest - OCIS file-tail ingest	2026-05-24 20:12:21 +02:00
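Tenant inference and cursor advancement can be sketched as pure functions. Assumptions not taken from the commit: tenant groups are named exactly after the tenant slug, the names in `ADMIN_GROUPS` are illustrative, and the cursor compares timestamps with Date.parse.

```typescript
// Hypothetical: which Authentik groups count as "admin groups" is an assumption.
const ADMIN_GROUPS = new Set(["authentik Admins", "dezky-platform-admins"]);

// First non-admin group that names a real tenant wins; otherwise undefined (global timeline only).
export function inferTenantSlug(groups: readonly string[], tenantSlugs: ReadonlySet<string>): string | undefined {
  for (const g of groups) {
    if (ADMIN_GROUPS.has(g)) continue;
    if (tenantSlugs.has(g)) return g;
  }
  return undefined;
}

export interface IngestCursor { lastEventAt: string | null; lastEventId: string | null }
export interface AkEvent { pk: string; created: string }

// Advance the cursor to the newest event seen in this tick; never move it backwards.
export function advanceCursor(cursor: IngestCursor, events: readonly AkEvent[]): IngestCursor {
  let next = cursor;
  for (const e of events) {
    if (next.lastEventAt === null || Date.parse(e.created) > Date.parse(next.lastEventAt)) {
      next = { lastEventAt: e.created, lastEventId: e.pk };
    }
  }
  return next;
}
```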
Ronni Baslund	02341d8ba5	feat(audit): platform-api audit log + operator UI wired to real events Phase 1 of the audit work — capture everything we control today, ingest from external systems (Authentik / OCIS / Stalwart) in a later phase. The mock OP_AUDIT fixture is gone; both the /audit page and Overview's activity card now show real events recorded by AuditService.record() in platform-api. Schema (services/platform-api/src/schemas/audit-event.schema.ts): AuditEvent { at, actorType, actorId, actorEmail, actorIp, action, outcome, resourceType, resourceId, resourceName, tenantSlug, partnerSlug, source, metadata, prevHash, hash } Indexes: {at:-1}, {tenantSlug,at:-1}, {actorId,at:-1}, {action,at:-1}. prevHash/hash are nullable now; hash-chain tamper evidence is a later phase. AuditService: - record() — best-effort write, swallows errors so the underlying mutation that succeeded isn't failed by a downstream log issue. Surfaces failures via Logger. - list() — filters: since/until/before, action (exact OR prefix match via leading-anchor regex), tenantSlug, partnerSlug, actorEmail, outcome, free-text q across action/resourceName/actorEmail/tenantSlug, limit (default 100, max 500). Cursor pagination via `before`. - No UPDATE/DELETE surface — entries are append-only by construction. AuditController: GET /audit, behind JwtAuthGuard + OperatorGuard. No mutations exposed; entries written internally by other modules. X-Forwarded-For threading: - apps/operator/server/utils/platform-api.ts forwards the originating client IP to platform-api so audit entries carry a real address. - services/platform-api/src/auth/client-ip.ts extracts leftmost X-Forwarded-For, falls back to socket.remoteAddress. Instrumented mutations (every one threads actor + IP through): Tenants: create, update, softDelete, setStatus(suspend/resume) Partners: create, update, terminate Flags: create, update (incl. 
flag.killed verb when state=off+note=kill-switch), remove Users: deactivate Each controller resolves the User doc via ActorService, extracts IP via clientIp(req), and passes { userId, email, ip } as AuditActor to the service. FlagsService's local ActorRef collapses to AuditActor so flag history and the audit log share one shape. Operator UI: - /api/audit proxy that forwards query params verbatim - types/audit.ts - pages/audit.vue: real list with quick-pick action chips (All/Tenants/ Partners/Flags/Users), outcome filter, free-text search, "Load older events" cursor pagination - pages/index.vue: Overview activity card swaps mock OP_AUDIT for the same /api/audit endpoint, rows link into /audit - data/fixtures.ts: OP_AUDIT / AuditEntry / AuditTone exports removed Verified end-to-end: suspended + resumed acme, flipped oci_versioning through rollout → kill → on, then /audit returned all 5 events with the right action verbs (tenant.suspended, tenant.resumed, flag.updated, flag.killed, flag.updated), actor admin@dezky.local, IP 192.168.65.1. Filters (action prefix + free-text q) narrow correctly. Out of scope for this commit (each gets its own conversation): - Authentik / OCIS / Stalwart ingest adapters (Phase 2) - Hash-chain tamper evidence (Phase 3) - TTL + cold-storage archival to Hetzner Object Storage (Phase 4) - GDPR right-to-erasure tooling	2026-05-24 19:50:24 +02:00
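The list() filters map naturally onto a MongoDB filter document. This sketch assumes the action filter always uses a start-anchored regex (which matches exact verbs and prefixes alike) and that free text is matched case-insensitively; the `buildAuditQuery` name and query shape are hypothetical.

```typescript
// Hypothetical query DTO mirroring the filters listed for AuditService.list().
export interface AuditQuery {
  since?: string; until?: string; before?: string;
  action?: string; tenantSlug?: string; partnerSlug?: string;
  actorEmail?: string; outcome?: string; q?: string; limit?: number;
}

// Escape user input before embedding it in a regex.
const escapeRegex = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

export function buildAuditQuery(q: AuditQuery) {
  const filter: Record<string, unknown> = {};
  const at: Record<string, Date> = {};
  if (q.since) at.$gte = new Date(q.since);
  if (q.until) at.$lte = new Date(q.until);
  if (q.before) at.$lt = new Date(q.before); // cursor: the `at` of the oldest row already shown
  if (Object.keys(at).length > 0) filter.at = at;
  // A start-anchored regex matches an exact verb ("tenant.suspended") and a prefix ("tenant.").
  if (q.action) filter.action = { $regex: "^" + escapeRegex(q.action) };
  for (const k of ["tenantSlug", "partnerSlug", "actorEmail", "outcome"] as const) {
    if (q[k]) filter[k] = q[k];
  }
  if (q.q) {
    const rx = { $regex: escapeRegex(q.q), $options: "i" };
    filter.$or = ["action", "resourceName", "actorEmail", "tenantSlug"].map((f) => ({ [f]: rx }));
  }
  const limit = Math.min(Math.max(Math.trunc(q.limit ?? 100), 1), 500); // default 100, max 500
  return { filter, sort: { at: -1 as const }, limit };
}
```

A start-anchored regex can use the {action:-1} index; the unanchored free-text `q` search cannot, which is one reason to keep the limit capped.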
Ronni Baslund	868a305539	feat(flags): real feature-flag system with bulk eval + operator UI Real backend for the flags page (was pure mock). Built so it's ready for the first risky rollout (likely the Stalwart JMAP client or the Stripe billing engine). services/platform-api: - Flag schema (key, description, state, pct, scope.{plans, tenantSlugs, partnerSlugs, environments}, embedded history capped at 20) - FlagsService with CRUD + evaluateAll(tenantSlug) → { key: bool }. Eval algorithm: off → false; on → true; targeted → require non-empty scope (empty allowlist means "nobody"), then match every non-empty axis; rollout → match scope, then sha256(`${tenantId}:${key}`) % 100 < pct. Hash-based rollout is deterministic: bumping pct only flips the new slice. Pure helpers (matchesScope, hasAnyScope, inRolloutBucket) are exported for future unit tests. - FlagsController exposes GET /flags, GET /flags/:key, POST /flags/evaluate (JwtAuthGuard); POST/PATCH/DELETE require OperatorGuard. History entries capture the actor's email. - SeedService idempotently creates 10 flag keys mapping to real Dezky concerns (jmap_native_v2, gdpr_export_v2, new_billing_engine, etc.), using $setOnInsert so operator edits survive restarts. apps/operator: - 6 proxies: /api/flags index get/post, [key] get/patch/delete, evaluate post - types/flag.ts with the shape that mirrors the backend - pages/flags.vue: useFetch real list, row click opens FlagDetail, "New flag" opens NewFlagModal, scope summary column shows targeting at a glance - FlagDetail.vue: side panel with segmented state, rollout slider with live "~N of M tenants" preview from /api/tenants, plan/tenant/env chip pickers, dirty-tracked Save, instant Kill-switch (PATCH state=off+pct=0), embedded change history - NewFlagModal.vue: minimal create form (key + description). Everything else is configured in the detail panel afterward. 
- CommandPalette: feature-flag rows now come from /api/flags instead of the dropped fixture, so newly-created flags are searchable immediately - data/fixtures.ts: drop FLAGS / FeatureFlag exports (replaced by the real backend) Smoke-tested end-to-end: list renders 10 seed flags, opening gdpr_export_v2 and flipping to rollout 25% then saving persists + adds a history entry, kill-switch sets state=off in one click, /api/flags/evaluate returns the correct booleans for the seeded tenant, same tenant gets the same answer on consecutive evals (determinism), and creating + deleting a flag through the UI roundtrips correctly.	2026-05-24 19:21:15 +02:00
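The evaluation rules can be sketched directly. How the 256-bit digest is reduced `% 100` is not specified in the commit; this sketch assumes the first 4 bytes are read as an unsigned 32-bit integer, and the `TenantCtx` shape is hypothetical. The helper names follow the ones the commit says are exported, but the bodies are illustrative.

```typescript
import { createHash } from "node:crypto";

export type FlagState = "off" | "on" | "targeted" | "rollout";
export interface Scope { plans: string[]; tenantSlugs: string[]; partnerSlugs: string[]; environments: string[] }
export interface Flag { key: string; state: FlagState; pct: number; scope: Scope }
// Hypothetical tenant context passed to the evaluator.
export interface TenantCtx { id: string; slug: string; plan: string; partnerSlug?: string; environment: string }

export const hasAnyScope = (s: Scope) =>
  s.plans.length + s.tenantSlugs.length + s.partnerSlugs.length + s.environments.length > 0;

// Every non-empty axis must match; an empty axis does not restrict.
export function matchesScope(s: Scope, t: TenantCtx): boolean {
  const ok = (allow: string[], v: string | undefined) => allow.length === 0 || (v !== undefined && allow.includes(v));
  return ok(s.plans, t.plan) && ok(s.tenantSlugs, t.slug) && ok(s.partnerSlugs, t.partnerSlug) && ok(s.environments, t.environment);
}

// Assumption: "sha256 % 100" uses the first 4 digest bytes as an unsigned int.
export function inRolloutBucket(tenantId: string, key: string, pct: number): boolean {
  const bucket = createHash("sha256").update(`${tenantId}:${key}`).digest().readUInt32BE(0) % 100;
  return bucket < pct;
}

export function evaluate(f: Flag, t: TenantCtx): boolean {
  switch (f.state) {
    case "off": return false;
    case "on": return true;
    case "targeted": return hasAnyScope(f.scope) && matchesScope(f.scope, t); // empty allowlist = nobody
    case "rollout": return matchesScope(f.scope, t) && inRolloutBucket(t.id, f.key, f.pct);
  }
}

export const evaluateAll = (flags: Flag[], t: TenantCtx): Record<string, boolean> =>
  Object.fromEntries(flags.map((f) => [f.key, evaluate(f, t)]));
```

Because a tenant's bucket for a given key never changes, any tenant with bucket < 25 also satisfies bucket < 50, which is why raising pct only adds tenants.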
Ronni Baslund	77a09aaf77	feat(operator): live Infrastructure probes + honest split between deployed and planned The Infrastructure page used to read from a mock fixture that lied two ways: it listed services that aren't deployed (Jitsi, Zulip, Cloudflare, Object Storage, Postmark) and showed hardcoded uptime/latency for the ones that are. Now it shows truth from real probes plus a clearly labelled "planned" section for the rest. Backend (services/platform-api): - New src/health/ module, where HealthService runs 9 probes in parallel with a 1.5s timeout each: Stalwart → TCP stalwart:8080; OCIS → HTTP GET ocis:9200/health; Collabora → HTTP GET collabora:9980/hosting/discovery; Authentik → HTTP GET authentik-server:9000/-/health/ready/; Postgres → TCP postgres:5432; Mongo → existing Mongoose connection.db.admin().ping(); Redis → TCP redis:6379; Traefik → TCP traefik:80; Platform API → trivially ok (this code is running). Status thresholds: ok ≤500ms, warn 500–1500ms, bad on timeout/refuse. - HealthController exposes GET /health/platform behind JwtAuthGuard, and keeps the existing public GET /health for infra liveness checks. - Moved the old src/health.controller.ts into the new module. Frontend (apps/operator): - /api/health/platform proxy forwards the operator's access token. - Infrastructure page swaps the SERVICES fixture for useFetch with 30s auto-refresh + a manual Refresh button. Cards show real status badge + real latency; uptime/error stay as em-dash with a "no probe history yet" tooltip until a Prometheus/event-log backend lands. - Below the live grid, a "Planned · not deployed" section renders 5 dimmed cards (Jitsi, Zulip, simpledns.plus, Hetzner Object Storage, Postmark). simpledns.plus replaces the misnamed Cloudflare entry; we use simpledns.plus, not Cloudflare. - Subtitle is now truthful: "8 / 9 services live · checked 2s ago". Verified: stopped redis → card flipped to "down · getaddrinfo ENOTFOUND redis", subtitle reflected 8/9, incident banner appeared. 
Restarted → back to 9/9, banner gone. SERVICES fixture stays in place for Overview's incident banner — replacing that is a separate follow-up tied to the incident-management backend.	2026-05-24 18:47:38 +02:00
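The threshold rule and a TCP-style probe can be sketched with `node:net`. The HTTP and Mongo probes are omitted, and `tcpProbe` / `probeAll` are hypothetical names; only the thresholds and the 1.5s timeout come from the commit.

```typescript
import { Socket } from "node:net";

export type ProbeStatus = "ok" | "warn" | "bad";

// Thresholds from the commit: ok <= 500 ms, warn 500-1500 ms, bad on timeout or refusal (null).
export function classify(latencyMs: number | null): ProbeStatus {
  if (latencyMs === null || latencyMs > 1500) return "bad";
  return latencyMs <= 500 ? "ok" : "warn";
}

// TCP probe (Stalwart, Postgres, Redis, Traefik style): resolves to connect latency, or null on error/timeout.
export function tcpProbe(host: string, port: number, timeoutMs = 1500): Promise<number | null> {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = new Socket();
    const finish = (latency: number | null) => {
      socket.destroy();
      resolve(latency);
    };
    socket.setTimeout(timeoutMs, () => finish(null));
    socket.once("error", () => finish(null)); // e.g. ECONNREFUSED, ENOTFOUND
    socket.connect(port, host, () => finish(Date.now() - started));
  });
}

// Probes run in parallel, so a sweep takes about as long as the slowest probe (capped by its timeout).
export async function probeAll(targets: Record<string, [host: string, port: number]>) {
  const entries = await Promise.all(
    Object.entries(targets).map(async ([name, [host, port]]) => {
      const latencyMs = await tcpProbe(host, port);
      return [name, { status: classify(latencyMs), latencyMs }] as const;
    }),
  );
  return Object.fromEntries(entries);
}
```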
Ronni Baslund	fbbb43e3e2	feat(operator): partner management with attach/detach (O.6) - Partners list with name/domain/status/customers/margin + Create modal - Partner detail: contract card, contact card, customers table, attach modal, terminate (soft-delete) danger card - Operator proxies for /partners + /partners/:slug/tenants - platform-api: add partnerId Prop to Tenant schema. The field was being silently dropped by Mongoose because the schema didn't declare it. - tenants.service: rewrite update() to build $set/$unset explicitly and cast partnerId via new Types.ObjectId(). Handles detach via $unset so the field vanishes from the doc cleanly.	2026-05-24 08:02:00 +02:00
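A sketch of the explicit $set/$unset construction. It assumes `null` (or an empty string) in the DTO means detach; the real DTO semantics are not spelled out in the commit. The ObjectId cast is passed in so the function stays free of a Mongoose import, and the names are hypothetical.

```typescript
// Hypothetical DTO subset. Assumption: partnerId null (or "") means "detach".
export interface UpdateTenantDto { name?: string; plan?: string; domains?: string[]; partnerId?: string | null }

export interface TenantUpdate { $set?: Record<string, unknown>; $unset?: Record<string, ""> }

// Build $set/$unset explicitly: only provided fields are written,
// and detaching removes the key instead of storing null.
export function buildTenantUpdate(dto: UpdateTenantDto, toObjectId: (hex: string) => unknown): TenantUpdate {
  const $set: Record<string, unknown> = {};
  const $unset: Record<string, ""> = {};
  for (const k of ["name", "plan", "domains"] as const) {
    if (dto[k] !== undefined) $set[k] = dto[k];
  }
  if (dto.partnerId === null || dto.partnerId === "") $unset.partnerId = "";
  else if (dto.partnerId !== undefined) $set.partnerId = toObjectId(dto.partnerId);
  const update: TenantUpdate = {};
  if (Object.keys($set).length > 0) update.$set = $set;
  if (Object.keys($unset).length > 0) update.$unset = $unset;
  return update;
}
```

With Mongoose this would be called as `buildTenantUpdate(dto, (hex) => new Types.ObjectId(hex))` and the result passed to `updateOne`; the field still has to be declared on the Tenant schema, or Mongoose's strict mode drops it.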
Ronni Baslund	8e81730372	feat(operator): tenant list + 7-tab detail with real lifecycle (O.5) Operator can now manage tenants end-to-end from the UI: - pages/tenants/index.vue — list with status/plan/domains/created/ provisioning-state columns, search by slug or name, status chips with live counts (all/active/pending/suspended), click-through to detail - pages/tenants/[slug].vue — 7-tab detail (Overview, Users, Resources, Billing, Audit, Support, Danger zone) - 3 tabs hit real backends: Overview (identity + billing fields), Users (lazy-loaded via new GET /tenants/:slug/users endpoint), Resources (live provisioning state per integration + Reconcile button) - 3 tabs render mock fixtures with warn-tone "mock" badges: Billing (Stripe placeholder), Audit (sample log lines), Support (placeholder pending the ticket queue work) - Danger zone: 3 real-backend cards (Suspend / Resume / Soft-delete), each gated by a ConfirmDialog modal. Verified live — clicked Suspend on acme, status flipped to 'suspended' in Mongo, then Resumed back to 'active' platform-api additions: - GET /tenants/:slug/users returns users with this tenant in their tenantIds, sorted by last login. Same authorization rule as the existing /tenants/:slug — platform admins always pass, non-admins must be a member of the tenant - tenants.module imports User schema for the new lookup New components (apps/operator/components/): - Tabs.vue — horizontal strip with optional per-tab counts, v-model - ConfirmDialog.vue — Teleport-to-body modal, Escape/backdrop close, danger/primary tone for the confirm button Server proxy infrastructure (apps/operator/server/): - utils/platform-api.ts — single helper encapsulating access-token-from-session + bearer-forward + error normalization. 
Every operator proxy route is now a one-liner against this helper - api/tenants/index.get.ts, [slug]/{index.get,index.patch,index.delete, users.get,suspend.post,resume.post,reconcile.post}.ts Two real bugs found and fixed during the smoke test: - Mongoose subdocument `_id` leaks into JSON when iterating tenant.provisioningStatus. Switched to an explicit `['authentik', 'stalwart', 'ocis']` whitelist in both v-fors - Documents created before provisioningErrors was added (like the acme tenant) don't have the field at all in JSON. Use optional chaining (`tenant.provisioningErrors?.[k]`) instead of bracket access. Without it: 'Cannot read properties of undefined (reading "authentik")' during the Resources tab render	2026-05-24 07:44:23 +02:00
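Both smoke-test fixes reduce to iterating a fixed key list and reading through optional chaining. The same logic is shown here as a plain function rather than a Vue template; `resourceRows` is a hypothetical name and treating a missing status as "pending" is an assumption.

```typescript
// What the Resources tab sees after JSON serialization; older docs may lack either map.
export interface TenantJson {
  provisioningStatus?: Record<string, unknown>;
  provisioningErrors?: Record<string, string>;
}

const INTEGRATIONS = ["authentik", "stalwart", "ocis"] as const;

export function resourceRows(t: TenantJson) {
  // Fixed whitelist instead of Object.keys(): a leaked subdocument `_id` never becomes a row.
  return INTEGRATIONS.map((k) => ({
    integration: k,
    status: String(t.provisioningStatus?.[k] ?? "pending"), // assumption: missing = pending
    error: t.provisioningErrors?.[k] ?? null, // no crash when provisioningErrors is absent
  }));
}
```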
Ronni Baslund	55b1c133e3	feat(operator): scaffold apps/operator Nuxt app + multi-issuer JWT (O.3) New Nuxt 3 app at apps/operator/ — internal admin portal on its own domain (operator.dezky.local), own OAuth client (dezky-operator), own session secrets, own cookies. Customer and operator surfaces can't decrypt each other's session state. OAuth flow verified end-to-end: - GET / → middleware redirect to /auth/login - User clicks Sign in → /auth/oidc/login → bounces to Authentik with client_id=dezky-operator, scope includes 'groups' - Authentik checks dezky-platform-admins group binding (added in O.1), silent-reauths via the existing auth.dezky.local session - Returns to /auth/oidc/callback with code, exchanges for token, creates session cookie on operator.dezky.local - Lands on pages/index.vue placeholder dashboard Smoke test 'Create partner "test-partner"' button on the placeholder home exercises the full operator-only authorization chain: - 1st call: 200, partner created in Mongo - 2nd call: 409 'already exists' (idempotency holds, token still valid) - Same call from the customer portal: 403 'requires operator-scoped token' (audience guard rejects dezky-portal aud) JwtAuthGuard now multi-issuer in addition to multi-audience. Each Authentik OAuth provider mints tokens with its own per-app iss URL (.../application/o/<slug>/), so the guard accepts a comma-separated AUTHENTIK_ISSUER. The audience-only fix from O.2 wasn't sufficient — issuer is validated separately by jose.jwtVerify and was still pinned to dezky-portal alone, yielding 'unexpected iss claim value' rejections. Compose changes: new 'operator' service (Node 20 alpine, pnpm install + nuxt dev, mkcert CA mount, traefik labels for operator.dezky.local + TLS); new operator_node_modules volume; operator.dezky.local added to traefik's Docker network aliases. Distinct OPERATOR_NUXT_OIDC_* session secrets pulled from .env (gitignored, generated via openssl). 
Real operator screens (sidebar, topbar, tenants, partners, etc.) come in O.4. This commit is pure scaffolding + the security boundary proof.	2026-05-24 07:20:16 +02:00
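Splitting the comma-separated env values is the core of the change: jose's `jwtVerify` accepts arrays for its `issuer` and `audience` options. The helpers below are hypothetical and restate, standalone, the matching rule that an array option implies.

```typescript
// Comma-separated env values (AUTHENTIK_ISSUER, AUTHENTIK_AUDIENCE) -> trimmed, non-empty list.
export function csvEnv(value: string | undefined): string[] {
  return (value ?? "").split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// The iss claim must equal one of the configured issuers.
export const issuerAllowed = (iss: string | undefined, allowed: readonly string[]) =>
  iss !== undefined && allowed.includes(iss);

// The aud claim (string or array) must share at least one value with the configured audiences.
export function audienceAllowed(aud: string | string[] | undefined, allowed: readonly string[]): boolean {
  const claimed: string[] = aud === undefined ? [] : Array.isArray(aud) ? aud : [aud];
  return claimed.some((a) => allowed.includes(a));
}

// Usage with jose (not imported here):
//   await jwtVerify(token, jwks, {
//     issuer: csvEnv(process.env.AUTHENTIK_ISSUER),
//     audience: csvEnv(process.env.AUTHENTIK_AUDIENCE),
//   });
```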
Ronni Baslund	2db41fec5e	feat(platform-api): multi-audience JWT + Partner CRUD + tenant lifecycle (O.2) JwtAuthGuard now accepts a comma-separated AUTHENTIK_AUDIENCE ('dezky-portal,dezky-operator'). jose.jwtVerify takes an array and succeeds on any match, so both customer-portal and operator-portal tokens validate against this service. Per-endpoint guards restrict further. New OperatorGuard enforces operator-only mutations: 1. JWT audience claim includes 'dezky-operator' (proof from the token alone that this is a privileged session) 2. ActorService-resolved User has platformAdmin=true (DB check so revocation works without waiting for the token to expire) Both are required; either alone is insufficient. Partner module: - Partner schema: slug, name, domain, status, marginPct, contactInfo, billingInfo. marginPct is one number per partner (decided in grilling) - CRUD endpoints under @UseGuards(JwtAuthGuard, OperatorGuard): every partner mutation requires operator scope - GET /partners returns each row with a computed customers count from aggregating Tenant.partnerId. MRR aggregation deferred until Subscription gains a price column - GET /partners/:slug/tenants for the partner detail view - DELETE soft-terminates (status='terminated'); never hard-delete, because tenants may still reference the partner Tenant changes: - partnerId?: Types.ObjectId (ref Partner, indexed sparse) added to Tenant schema - UpdateTenantDto accepts partnerId so PATCH can attach/detach - POST /tenants/:slug/suspend and /resume, operator-only via OperatorGuard. PATCH already covers plan/domains/partnerId changes Smoke test: customer-portal session sends POST /api/partners through the portal proxy → 403 "This endpoint requires an operator-scoped token". The positive test (operator-token → 200) waits for O.3 when there's an operator app to mint the right token. apps/portal/server/api/partners/index.post.ts is a temporary verification proxy; delete it once the operator portal exists.	2026-05-24 07:08:59 +02:00
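The OperatorGuard's two-part rule reduces to one predicate. This is a sketch with hypothetical names; in NestJS, the guard would presumably throw a ForbiddenException (the 403 quoted in the smoke test) when the predicate is false.

```typescript
// Minimal shapes for the two inputs the guard combines.
export interface JwtPayload { aud?: string | string[]; sub?: string }
export interface ActorUser { platformAdmin?: boolean }

// Both conditions are required: the token's audience proves an operator session,
// and the DB flag allows revocation without waiting for the token to expire.
export function isOperatorRequest(payload: JwtPayload, user: ActorUser | null, operatorAud = "dezky-operator"): boolean {
  const aud: string[] = payload.aud === undefined ? [] : Array.isArray(payload.aud) ? payload.aud : [payload.aud];
  return aud.includes(operatorAud) && user?.platformAdmin === true;
}
```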
Ronni Baslund	22b2583f0b	chore(services): rename services/provisioning -> services/platform-api O.0 prep from OPERATOR-PLAN.md. Mechanical refactor before adding partner management and operator-specific endpoints. The service now owns more than just provisioning orchestration (it'll soon own partners, tenant lifecycle actions, multi-audience JWT validation), so the name 'platform-api' reflects its scope better. What changed: - Directory: services/provisioning/ -> services/platform-api/ - Package: @dezky/provisioning -> @dezky/platform-api - Docker: container_name dezky-provisioning -> dezky-platform-api; compose service key 'provisioning' -> 'platform-api'; volume provisioning_node_modules -> platform_api_node_modules - Portal: PROVISIONING_INTERNAL_URL env var -> PLATFORM_API_INTERNAL_URL, default URL http://provisioning:3001 -> http://platform-api:3001 in all three proxy routes (me.get.ts, tenants/index.post.ts, tenants/[slug]/ reconcile.post.ts), plus NUXT_API_BASE updated - Health endpoint service identifier and main.ts log lines updated to 'dezky-platform-api' - Docs swept: README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md, NEXT-STEPS.md, TROUBLESHOOTING.md, OPERATOR-PLAN.md, traefik/dynamic.yml What deliberately stays: - Internal module names ProvisioningService / ProvisioningModule (those describe an orchestration sub-concern, not the service's purpose) - Tenant.provisioningStatus / provisioningErrors field names (state per integration, not service name) - File services/platform-api/src/tenants/provisioning.service.ts - 'Hetzner provisioning' references in production-prep docs (infrastructure provisioning, unrelated) Verified end-to-end after rename: /api/me returns 200 with profile + 2 tenants + subscription, /api/tenants/dezky/reconcile returns 200 with Authentik integration still ok. OPERATOR-PLAN.md O.0 checkboxes ticked.	2026-05-24 00:35:01 +02:00
Ronni Baslund	28766b80c2	feat(provisioning): orchestrate Authentik/Stalwart/OCIS on tenant create Phase 4 from docs/NEXT-STEPS.md. POST /tenants now writes Mongo AND drives external service provisioning. A new POST /tenants/:slug/reconcile endpoint retries the orchestration, which is useful when an upstream was down at create time or external state drifted out of band. Integration clients (services/provisioning/src/integrations/): - AuthentikClient: real implementation. ensureGroup() is idempotent: it looks up the group by name, creates it if missing, and returns it either way. Group attributes record the tenant slug + Mongo id so we can trace back - StalwartClient: stubbed. v0.16 removed the REST management API in favor of JMAP, which is significantly more work to wrap. TODO comment points to https://stalw.art/docs/api/management/overview for the follow-up - OcisClient: stubbed. Needs libregraph /drives endpoint with service-to-service auth via OIDC client_credentials Orchestration (provisioning.service.ts): - Each step runs independently; one failure doesn't roll back the others - Per-step state recorded on Tenant.provisioningStatus (ok/skipped/error/pending) plus error message on Tenant.provisioningErrors - Steps return their own terminal state: 'skipped' for stubs, while void defaults to 'ok' for real integrations - Mongoose markModified() required for nested subdoc mutations to persist - Tenant auto-flips status: pending → active when all steps are ok or skipped Portal proxy routes (apps/portal/server/api/tenants/): - POST /api/tenants and POST /api/tenants/:slug/reconcile forward the signed-in user's access token to the provisioning service. This lets the browser drive provisioning without minting tokens by hand. These routes will be replaced by a real "create workspace" flow with UI later docker-compose: AUTHENTIK_API_URL/STALWART_API_URL/OCIS_API_URL now point at the public Traefik-routed hostnames (with mkcert CA mounted into the provisioning container so Node fetch trusts them). 
Previously these pointed at internal Docker hostnames, which didn't work for Authentik because of a TLS issuer mismatch against the JWT.	2026-05-24 00:06:40 +02:00
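The orchestration rules (independent steps, per-step state, void meaning "ok", pending → active) can be sketched as follows. The runner and status helper are hypothetical, and persistence is reduced to a comment about `markModified`.

```typescript
export type StepState = "ok" | "skipped" | "error" | "pending";
// A step may return its own terminal state (stubs return "skipped"); returning nothing means "ok".
export type Step = () => Promise<StepState | void>;

export async function runSteps(steps: Record<string, Step>) {
  const status: Record<string, StepState> = {};
  const errors: Record<string, string> = {};
  // Steps run one after another but independently: a throw is recorded, never rolled back or rethrown.
  for (const [name, step] of Object.entries(steps)) {
    try {
      const result = await step();
      status[name] = typeof result === "string" ? result : "ok";
    } catch (err) {
      status[name] = "error";
      errors[name] = err instanceof Error ? err.message : String(err);
    }
  }
  // Per the commit, nested mutations need tenant.markModified(...) before save() to persist.
  return { status, errors };
}

// pending -> active only when every step is ok or skipped; any other status is left alone.
export function nextTenantStatus(current: string, status: Record<string, StepState>): string {
  const done = Object.values(status).every((s) => s === "ok" || s === "skipped");
  return current === "pending" && done ? "active" : current;
}
```

Because a failed step is only recorded, the reconcile endpoint can rerun the same `runSteps` call later and the tenant flips to active once the last error clears.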
Ronni Baslund	3d370caa62	feat(provisioning): tenant data model + CRUD with JWT-validated authz Implements Phase 3 from docs/NEXT-STEPS.md. Mongoose schemas (services/provisioning/src/schemas/): - Tenant: slug, name, status, plan, domains, billingInfo, plus handles for the Authentik group, OCIS space, and Stalwart domain (set in Phase 4) - User: authentikSubjectId, tenantIds[], email, name, role, platformAdmin flag - Subscription: tenantId, plan, status, Stripe IDs (unused until Phase 4) Auth (services/provisioning/src/auth/): - JwtAuthGuard verifies Authentik access tokens against the provider's JWKS with issuer + audience checks. Uses NODE_EXTRA_CA_CERTS to trust the mkcert root for the local Authentik cert - ActorService resolves the verified JWT into a Mongo User document; every controller reads tenantIds + platformAdmin from the DB, not the token - CurrentUser decorator extracts the JWT payload onto controllers CRUD modules: - /tenants, /users, /subscriptions with create/read/update/delete - /users/me upserts the caller's User record on every request, syncing email, name, tenantIds, and platformAdmin from the JWT's groups claim (the only place we read JWT.groups outside the bootstrap) Why DB-derived authz: putting all group memberships in the JWT doesn't scale past ~50 tenants per user (header/cookie size limits, no mid-session revocation, stale data until re-login). The JWT now carries identity only; the DB is the source of truth for who can see what. Seed (SeedService.OnApplicationBootstrap): idempotent creation of the default 'dezky' tenant + matching subscription. User records are created on the first /users/me hit. Infrastructure: - Traefik label exposes provisioning at https://api.dezky.local (dev only) - api.dezky.local added to Docker network aliases on Traefik - mkcert root CA mounted into the provisioning container for JWKS fetch - Authentik 'groups' scope mapping created + attached to the dezky-portal provider; the portal now requests it as a scope - nuxt.config.ts portal: exposeAccessToken=true so Nitro forwards the token; NUXT_OIDC_TOKEN_KEY fixed to base64-encoded 32 bytes (was hex, causing "Invalid key length" once exposeAccessToken was turned on) Portal: apps/portal/server/api/me.get.ts is a scaffolding route that forwards the user's access token to provisioning and returns profile + tenants + subscriptions, verifying the full chain end to end.	2026-05-23 21:53:53 +02:00
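The "DB-derived authz" split described in the 3d370caa62 message (identity from the verified JWT, permissions from Mongo) might look roughly like the sketch below. Several assumptions apply. `JwtIdentity`, `UserRecord`, `resolveActor`, and `canReadTenant` are hypothetical names, the real lookup lives in ActorService, and a Map keyed by Authentik subject id stands in for the users collection. Token verification (JWKS, issuer, audience) is assumed to have already happened in JwtAuthGuard.

```typescript
// Hypothetical sketch: identity comes from the verified JWT, authorization from the DB.

// Only the identity claim is taken from the (already verified) token.
interface JwtIdentity {
  sub: string;
}

// Mirrors the User fields named in the commit: authentikSubjectId, tenantIds[], platformAdmin.
interface UserRecord {
  authentikSubjectId: string;
  tenantIds: string[];
  platformAdmin: boolean;
}

// Stand-in for the Mongo users collection, keyed by Authentik subject id.
type UserStore = Map<string, UserRecord>;

// Resolve the token's subject to the stored User record (undefined if none exists).
function resolveActor(store: UserStore, jwt: JwtIdentity): UserRecord | undefined {
  return store.get(jwt.sub);
}

// The decision reads tenantIds and platformAdmin from the stored record, not from token claims.
function canReadTenant(actor: UserRecord | undefined, tenantId: string): boolean {
  if (!actor) return false;
  return actor.platformAdmin || actor.tenantIds.includes(tenantId);
}
```

Checking permissions against the stored record keeps the token the same size however many tenants a user belongs to, which is the scaling concern the commit gives for not putting memberships in the JWT.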
Ronni Baslund	adfd9baafe	chore: initial scaffold with running local stack and portal auth Brings up Dezky's local development environment end-to-end: Infrastructure (docker-compose): - Traefik v3.7 reverse proxy with mkcert TLS (v3.2 couldn't speak Docker API 1.54) - Postgres + Mongo + Redis with healthchecks and an init script for per-service users - Authentik 2025.10 (server + worker) as OIDC IdP - Stalwart v0.16 mail server (image renamed from stalwartlabs/mail-server) - OCIS 7.0 with PROXY_TLS=false and OCIS_CONFIG_DIR=/etc/ocis so init writes where the server reads - Collabora office, plus the portal + provisioning service stubs - Docker network aliases on Traefik so containers resolve auth.dezky.local etc. through the network (not host /etc/hosts) - Docker socket mount parameterized for the macOS Docker Desktop symlink path Authentik provisioning (done via API after stack boot): - ocis-provider (public client) + OCIS Files application - dezky-portal provider (confidential) + Dezky Portal application - Admin API token bound to akadmin manually, since 2025.10's AUTHENTIK_BOOTSTRAP_TOKEN env var doesn't auto-materialize a token row Portal (apps/portal): - Nuxt 3 with nuxt-oidc-auth 1.0.0-beta.11 against the generic 'oidc' preset - Global auth middleware; login at /auth/oidc/login redirects to Authentik - Visual implementation of the Claude Design 'Auth' canvas: AuthShell, NodeMark, Auth* sub-components, design tokens as CSS custom properties - Pages: auth/login, auth/expired, auth/disabled, index (post-login landing) - mkcert root CA mounted into the portal so Node fetch trusts Authentik's self-signed cert (NODE_EXTRA_CA_CERTS); dev only Docs: - AUTHENTIK-SETUP.md updated with the manual token bind + a scripted alternative for the portal provider - NEXT-STEPS.md: Phase 1 and Phase 2 marked done with file locations and dev-mode caveats Dev-mode shortcuts that need to be revisited before prod: - skipAccessTokenParsing on the OIDC config - NODE_EXTRA_CA_CERTS mkcert mount - Bootstrap password is still the generated value in .env - Authentik admin token (dezky-bootstrap-token) is non-expiring	2026-05-23 21:25:11 +02:00

49 Commits