20 Commits

Author SHA1 Message Date
Ronni Baslund c9911cc262 feat(website): add Nuxt 4 marketing landing page
New standalone apps/website (Nuxt 4) serving the public marketing site at
dezky.local / www.dezky.local. The customer portal moves off the root domain
to app.dezky.local only.

Landing page ported from the Dezky design handoff: light theme, Danish
default, hero variant A, with a working da/en toggle. Self-contained colour
system threaded through components (utils/landingTokens.ts), full bilingual
copy (utils/landingCopy.ts), and shared state (composables/useLanding.ts).
Sections live under components/landing/* with the Node logo under
components/brand/*.

Wired into docker-compose (website service, volume, Traefik labels, network
aliases) and bootstrap.sh (hosts list + service URLs).
2026-06-05 10:58:25 +02:00
Ronni Baslund 47eb9502f8 feat(platform): real email domains, mailboxes & member lifecycle
Wire the mail/identity stack to real Stalwart/Authentik/OCIS provisioning,
replacing the mocked Domains and Users pages.

Domains (customer-admin):
- StalwartClient: real JMAP management (v0.16 dropped REST) — create/list/delete
  email domains via x:Domain at the internal http://stalwart:8080 listener;
  DKIM auto-generated; the records to publish are read from the domain's
  dnsZoneFile. Gated by STALWART_PROVISIONING_ENABLED.
- New Domain collection + DomainsModule: add/list/recheck/set-DMARC/remove,
  tenant-membership-gated and audited.
- DnsVerifierService: verifies MX/SPF/DKIM/DMARC/ownership against a public
  resolver (1.1.1.1/8.8.8.8) and diffs them against the expected records.
- Remove is guarded: refuses while accounts/aliases/mailing lists still use the
  domain (via Stalwart referential integrity).
- Domains page + add wizard on real data; sidebar badge counts domains needing
  attention.

Users & groups (customer-admin):
- Create a member provisioned across Authentik SSO, a Stalwart mailbox on the
  tenant's primary domain, and OCIS — returning a one-time password.
- Lifecycle: suspend/resume (Authentik is_active + freeze the mailbox via
  account permissions, original password preserved), force-logout (terminate
  sessions, filtered client-side so it can never end other users' sessions),
  reset password (new one-time password on SSO + mailbox), and remove (tear down
  mailbox + SSO identity + OCIS + doc; mailbox-in-use aware for multi-tenant
  users). Self-suspend / self-force-logout are blocked.

Infra: point platform-api at the internal Stalwart listener; document the new
STALWART_/provisioning vars in .env.example.
2026-06-01 21:19:42 +02:00
Ronni Baslund f8618b2bbc feat(portal): real OCIS storage data via refresh-token service auth
The Storage page + endpoint landed earlier but had no working OCIS
backend credential. OCIS has no service-account/client-credentials grant
and trusts a single issuer, and basic auth resolves no user in our
external-IdP setup — so authenticate OcisClient via an OIDC
refresh-token bootstrap instead:

- One-time headless login of svc-platform-api against the ocis provider
  (public client ocis-web, issuer .../o/ocis/) yields a refresh token,
  persisted in Mongo (ocis_credentials) and rotated on every use.
- OcisClient mints access tokens with the refresh_token grant; the
  service user holds the OCIS admin role (OCIS_ADMIN_USER_ID) so
  libregraph ListAllDrives works.
- scripts/bootstrap-ocis.mjs re-runs the bootstrap if the token lapses.
- Dashboard Plan card gains a storage capacity bar beside seats;
  hidden when storage is unavailable.
- compose + .env.example: OCIS service OIDC env and admin user id.
- docs/NEXT-STEPS: document the mechanism and the dead-end alternatives.
2026-05-31 21:29:17 +02:00
Ronni Baslund 559348f6bc feat(portal): real Security & audit page (+ bundled Storage / per-tenant-roles WIP)
Security & audit (admin)
- Audit log: real, tenant-scoped — widened GET /tenants/:slug/audit with
  q/action/outcome/actorEmail/since/before; UI gains search, outcome + time
  filters, action chips, cursor pagination, and client-side CSV export.
- Security policy: new tenant.securityPolicy (mfaMode, session idle/absolute,
  allowedCountries, ipAllowlist) + PATCH /tenants/:slug/security-policy
  (membership-gated, audited). Editable, labelled by enforcement status.
- MFA: live enrollment overview via GET /tenants/:slug/mfa-status
  (Authentik countAuthenticators per member).
- SSO apps (Dezky as IdP): real Authentik OIDC provider + application CRUD,
  scoped to the tenant group. New AuthentikClient methods (provider/app/binding
  + flow/key/scope discovery), TenantSsoApp schema, TenantSsoService (rollback
  on partial failure; client secret never stored), GET/POST/DELETE
  /tenants/:slug/sso-apps. Validated end-to-end against live Authentik.
- Deferred: shared-flow MFA/geo/session enforcement (global auth-flow blast
  radius) — to be done as its own reviewed change.

Bundled in-progress work that shares the same files (kept together so the tree
stays green):
- Storage page: StorageService + GET /tenants/:slug/storage (OCIS-backed),
  storage.get proxy, storage.vue.
- Per-tenant roles: User.tenantRoles + MeProfile.tenantRoles plumbing.
2026-05-31 17:20:36 +02:00
Ronni Baslund 3288fde693 feat(portal): customer-admin surface on real data + Stripe billing + session resilience
Access & navigation
- Gate partner-mode strictly to partner staff so admins/end-users never inherit
  leftover partner-view state; purge stale session entry on hydrate.
- Role-driven admin entry: useMe.isTenantAdmin, Admin/Personal tiles in the app
  launcher, and an /admin route guard in the global middleware (fail closed).
- Drop the duplicate user identity block from the sidebar footer.

Admin pages on real data
- New tenant-scoped, membership-gated endpoints: GET /tenants/:slug/{audit,users,
  invoices}; useTenant composable resolves the active workspace + subscription.
- Dashboard: real seats, spend (cycle-normalized + minor-units), plan, renewal,
  and recent audit; unbacked sections removed.
- Users & groups: real members; Groups/Invitations/Service accounts shown as
  honest "coming soon".
- Subscription & invoices: real plan hero, invoice history, and billing details.

Stripe payment method (Elements + SetupIntent)
- StripeClient: publishable key + getDefaultCard/createSetupIntent/setDefaultCard.
- CustomerBillingController + BillingService methods (ensure-customer on demand).
- Portal: PaymentMethodModal, useStripeJs (CDN load), proxies; hidePostalCode.

Editable billing details & whitelabel branding
- PATCH /tenants/:slug/billing-info (narrow: company/VAT/country/email).
- TenantBranding schema/service + GET/PUT /tenants/:slug/branding: real product
  name, accent colour, and per-tenant email-template overrides.
- Branding preview + sidebar workspace mark wired to real name/plan/seats/colour
  with YIQ auto-contrast (readableOn util).

Session resilience
- Request offline_access so Authentik issues a refresh token (automaticRefresh).
- Silent refresh + single retry on 401 for writes (useApiFetch, incl. partner
  pages) and reads (useMe.fetchMe) — no redirect, no lost input.
- Modal backdrop closes only on press+release on the backdrop (no more
  drag-select-to-close).
2026-05-31 00:19:34 +02:00
Ronni Baslund 0b269e7ea7 feat(auth): enforce operator/partner platform isolation
A partner or tenant admin could complete the dezky-operator OIDC flow and
land on the operator portal. The platform-api OperatorGuard already 403s
their data, but the login/UI layer had no authorization check at all — the
only gate was a manual Authentik UI setting with nothing in git enforcing it.

Close it with defense-in-depth across three independent layers:

1. IdP — operator-application.yaml blueprint binds an
   ak_is_group_member("dezky-platform-admins") policy to the dezky-operator
   app, so Authentik denies the OIDC flow for non-admins. The blueprint also
   provisions the provider + application (state: created, so a fresh env is
   built from code while an existing hand-made provider is left untouched).
   Wire OPERATOR_OIDC_* into both authentik containers and mount the
   blueprints dir on the worker (it applies blueprints, and previously lacked
   the mount).

2. Operator app — require-platform-admin.global.ts requires platformAdmin and
   routes a non-admin to not-authorized.vue, which triggers a full sign-out
   (local + Authentik IdP) for shared-workstation safety. Fails open on a
   transient /api/me error by design, to avoid mass-signout on platform-api
   restarts; layers 1 and 3 contain the exposure.

3. platform-api — OperatorGuard (unchanged) requires dezky-operator audience
   plus platformAdmin resolved from the DB on every request.

Also harden the partner surface: it shares the dezky-portal client with tenant
users so it has no IdP gate, and its /partner/* route middleware now fails
CLOSED when identity can't be confirmed.

Docs (AUTHENTIK-SETUP.md) and .env.example updated; the operator client secret
must be set before first boot since the blueprint now consumes it.
2026-05-30 15:48:01 +02:00
Ronni Baslund 22925599e7 chore(compose): wire Stripe env into platform-api
Pass STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET and BILLING_STRIPE_ENABLED into the platform-api service from the gitignored .env. Defaults keep billing dark (derived data) when unset.
2026-05-30 08:08:50 +02:00
Ronni Baslund 0bd4e5498e feat: portal redesign, pricing catalog, partner-staff invites
- portal: new admin/ and partner/ surfaces with full component library
  (AppLauncher, Avatar, Badge, Card, Modal, Tabs, etc.), composables,
  layouts, partner-routing middleware, and supporting server APIs
- pricing: Price schema/module with operator CRUD, pricing.vue catalog UI,
  Subscription extended with cycle/currency/perSeatAmount/seats snapshots
  for stable MRR aggregation
- partner staff: User.partnerId, invite-partner-user DTO and flow,
  /partners/:slug/users endpoints, InvitePartnerUserModal, shared
  dezky-partner-staff Authentik group
- /me: partner-aware endpoint returning user + partner context so portal
  can route between end-user and partner-admin surfaces
- tenant: seats field for portfolio displays and future MRR calculations
- operator: pricing page, signed-out page, useMe/useToast composables,
  ToastStack
2026-05-28 20:00:33 +02:00
Ronni Baslund 4d9e906ec1 feat(audit): cold-storage archival to S3 (Phase 4)
Final piece of the audit work. Events older than the hot retention window
move to S3-compatible object storage with signed manifests. Production uses
Hetzner Object Storage; dev uses a MinIO container with the same API.

Infra (infrastructure/docker-compose):
  - New `minio` service exposing the S3 API at minio:9000 + admin console at
    minio.dezky.local. Healthchecked. Bucket-init sidecar runs `mc mb` once
    to create `dezky-audit`; safe to re-run.
  - .env adds MINIO_ROOT_USER + MINIO_ROOT_PASSWORD.
  - platform-api env: AUDIT_COLD_{ENDPOINT,REGION,BUCKET,ACCESS_KEY,SECRET_KEY}
    + AUDIT_HOT_RETENTION_DAYS=90 + ARCHIVE_ENABLED=false (dormant in dev;
    operator UI's "Run archive now" bypasses this gate). AUDIT_COLD_SSE
    opts into SSE-S3 — left unset in dev because MinIO without a KMS rejects
    AES256 PUTs with "KMS is not configured".

Platform-api (services/platform-api/src/cold/):
  - cold-storage.client.ts: thin @aws-sdk/client-s3 wrapper — put/head/list.
    forcePathStyle=true so MinIO and Hetzner both work; same code, env-swap.
  - archive.service.ts: runOnce() selects chained events with at < cutoff →
    serializes to JSONL → gzip → sha256s → uploads JSONL + signed manifest
    → HEAD-confirms both objects exist → records an ArchiveBatch doc → only
    then deletes from hot Mongo. Crash-safe: a failed upload leaves events
    in hot. Manifest uses the Phase 3 AUDIT_SIGNING_KEY (HMAC-SHA-256), so
    archives + checkpoints share trust chain. Bypassable via { override:
    true } for the operator's UI force-run.
  - archive.worker.ts: hourly tick guarded by configured run-hour-UTC
    (default 03:00) + day-guard so the same UTC day doesn't archive twice.
    Disabled until ARCHIVE_ENABLED=true.
  - archive-batch.schema.ts: { archivedAt, startSeq, endSeq, eventCount,
    manifestSha256, jsonlKey, manifestKey, bytesUncompressed }. The
    manifest sha256 stored in Mongo lets us detect manifest tampering
    without downloading the actual manifest.

Audit module additions:
  - audit.controller.ts: GET /audit/archives, POST /audit/archive/run,
    /audit/verify now reports { oldestHotSeq, highestArchivedSeq } so the
    UI shows the tier boundary.

Operator UI (apps/operator):
  - 2 new proxies: /api/audit/archives + /api/audit/archive/run (force
    override=true). Both behind operator auth via the existing platformApi
    helper.
  - audit.vue: new "Cold storage" card with batch table (archived-at, seq
    range, event count, size, truncated manifest sha256), "Run archive
    now" button + per-run result line.

Smoke-tested end-to-end:
  - 7 chained events in hot. /api/audit/archive/run → ok=true, batchId
    returned. JSONL + manifest both exist in MinIO (verified via mc ls +
    mc cat). Mongo's chained set went 7 → 0. Verify reports
    highestArchivedSeq=1446 (since we burn-allocate seqs on Authentik
    dup-key rejections). Operator /audit panel shows the batch with
    manifest hash 1d8263…
  - First attempt with SSE-S3 enabled failed cleanly (MinIO KMS not
    configured) — archive service correctly left events in hot Mongo.
    Made SSE opt-in via AUDIT_COLD_SSE=true; prod turns it on.

Out of scope (each could be its own session):
  - Restore-to-hot endpoint (today: download from S3 + offline query)
  - Client-side encryption (today: SSE-S3 in prod, none in dev)
  - Multi-region replication
  - Soft TTL safety net (defense-in-depth on top of app-managed deletion)

This completes the four-phase audit log work:
  1. platform-api as audit hub
  2. External system ingest (Authentik / Stalwart / OCIS)
  3. Hash-chain + signed checkpoints (tamper evidence)
  4. Cold-storage archival (retention without unbounded Mongo growth)
2026-05-24 21:03:41 +02:00
Ronni Baslund 9435baa09d feat(audit): hash-chain tamper evidence + signed checkpoints (Phase 3)
The audit log now carries cryptographic chain-of-custody. Every chained
event references the previous event's sha256, and periodic checkpoints
sign the head with HMAC-SHA-256. An attacker who modifies a historical
row must also forge every checkpoint signature past it — which requires
the AUDIT_SIGNING_KEY, kept outside Mongo.

Schema (services/platform-api/src/schemas/):
  - audit-event.schema.ts: new `seq` (monotonic) + `chained` (Phase-3-or-
    later flag) + `prevHash` + `hash`. Compound unique index on seq with
    partial filter so pre-Phase-3 rows don't collide on null.
  - audit-counter.schema.ts: single doc `_id='audit_seq'`, incremented
    atomically by findOneAndUpdate($inc).
  - audit-checkpoint.schema.ts: { at, headSeq, headHash, signature,
    sigAlg, reason }. Reason ∈ {startup, interval, threshold, manual}.

Audit module (services/platform-api/src/audit/):
  - canonical.ts: stable JSON form + hashCanonical (sha256) +
    checkpointSignature (HMAC-SHA-256) + verifyCheckpointSignature
    (timingSafeEqual). Single source of truth for hash inputs — schema
    additions land here at the same time as the field.
  - audit.service.ts: record() now allocates seq → looks up lastHash() →
    computes hash → inserts. Per-process write mutex serializes the
    allocate+lookup so concurrent writers don't both chain off the same
    predecessor. Documented multi-instance caveat (needs Mongo replica
    set + transactions OR a distributed lock).
  - checkpoint.service.ts: scheduler triggers on startup + every 5min
    + threshold of 100 events accumulated. Skips when no new chained
    events since the last anchor.
  - verifier.service.ts: walks chain in seq order, recomputes each
    hash, validates checkpoint signatures. Returns a precise break:
    'event-hash-mismatch' (in-place modification), 'event-prev-hash-
    mismatch' (insertion/deletion), or 'checkpoint-signature-mismatch'.
  - audit.controller.ts: GET /audit/verify, GET /audit/checkpoint/latest,
    POST /audit/checkpoint (manual force).

Operator UI (apps/operator/):
  - 3 new proxies under /api/audit/{verify, checkpoint/latest, checkpoint}.
  - pages/audit.vue: new "Tamper evidence" card with "Force checkpoint"
    + "Verify chain" buttons. Header shows live head seq; result line
    shows verified count or a precise break (kind + seq + expected vs
    actual hash). Background tinted green/red on ok/broken.

Env (.env + docker-compose.yml):
  - new AUDIT_SIGNING_KEY (32-byte hex HMAC secret). Prod swaps this for
    ed25519 from an HSM/KMS; verifier code stays the same because sigAlg
    is on the checkpoint doc.

Smoke-tested all three break paths against a clean chain of 5 events:
  - normal verify: ok=true, 5/5 events verified, 1 checkpoint signed
  - modified seq=3 in Mongo directly: verify returns ok=false with
    break = { kind: 'event-hash-mismatch', seq: 3, expected, actual }
  - restored, nuked checkpoint signature: break = { kind:
    'checkpoint-signature-mismatch', headSeq: 5 }
  - operator UI's verify panel reflects all three states correctly.

Legacy data: pre-Phase-3 events stay `chained: false` and are excluded
from the chain walk. Retroactive chaining of historical entries is a
one-off migration script we can run if we ever care to.

Out of scope (Phase 4 etc.):
  - TTL + cold-storage archival to Hetzner Object Storage
  - GDPR right-to-erasure tooling
  - ed25519 / HSM signing (swap is well-defined; sigAlg field is ready)
  - Multi-instance write coordination (Mongo transaction OR distributed
    lock when we scale platform-api beyond 1 replica)
2026-05-24 20:43:54 +02:00
Ronni Baslund df18128617 feat(audit): OCIS file-tail ingest worker (Phase 2 chunk 3)
Tails OCIS's JSON-Lines audit log on a shared Docker volume and forwards
mutations into AuditService. Final piece of Phase 2 — the /audit page now
unifies platform-api, authentik, and ocis events on one timeline.

services/platform-api/src/ingest/ocis.ingest.ts:
  - 5s polling loop (fs.watch is unreliable across Docker bind mounts on
    macOS). Stat → detect inode change or truncation → resume from byte
    position OR start over.
  - Cursor in IngestCursor stores lastEventId = "<inode>:<bytePosition>".
    Restarts resume cleanly; on overlap the (source, externalId) unique
    index dedups silently.
  - Lines collected first, then processed sequentially after the read
    stream closes. Earlier draft fired recordOne() from inside the
    readline 'line' callback which would have resolved the stream
    before all writes finished — same class of race we hit in the
    Authentik worker, fixed before commit.
  - Tenant inference: spaceName (set during provisioning to the slug)
    first, then User.authentikSubjectId → tenantIds → Tenant.slug.
  - Mutations only: OCIS_ALLOWLIST in action-map.ts whitelists 24 event
    types (User/Group/Space/Share/Link/File mutations). FileDownloaded,
    UserSignedIn, and the rest of the high-volume read traffic gets
    skipped — keeps the timeline scannable.

services/platform-api/src/ingest/action-map.ts:
  - mapOcisAction() + OCIS_ALLOWLIST. Returns null for non-whitelisted
    types so the worker filters early.

infrastructure/docker-compose/docker-compose.yml:
  - New named volume `ocis_audit_log` mounted writeable on the ocis
    container and read-only on platform-api.
  - OCIS env: OCIS_ADD_RUN_SERVICES=audit (the audit microservice is
    NOT in the default `ocis server` set — opt in explicitly),
    AUDIT_LOG_FILE_PATH=/var/log/ocis/audit.log, AUDIT_LOG_FORMAT=json.
  - platform-api env: OCIS_AUDIT_LOG_PATH points at the same file.

Verified end-to-end with synthetic events written to the audit log:
  - Worker tailed 5 events across initial read + incremental append
    (5 → bytes 0:1276, then 1 → bytes 1276:1519).
  - FileDownloaded correctly filtered by the allowlist (4 mutations
    landed in Mongo, not 5).
  - Tenant inference: events with executingUser.id resolved to
    `dezky` via User → tenantIds → Tenant.slug.
  - Operator /audit shows all three sources (89 events: 79 authentik
    + 5 platform-api + 5 ocis) in one unified timeline.

Known unknown — same shape as the Stalwart commit: I couldn't fully
confirm the OCIS v7 audit microservice emits events with just
OCIS_ADD_RUN_SERVICES=audit + the AUDIT_LOG_FILE_PATH env. The audit
service starts but the file stays empty until OCIS internals start
publishing events to NATS (which may need additional service-side
config). The ingest worker is correct regardless — when OCIS starts
writing real events, they'll flow into /audit. This is a follow-up
in the OCIS-side configuration, not in our ingest code.
2026-05-24 20:30:47 +02:00
Ronni Baslund 7bec940e7f feat(audit): Stalwart webhook ingest endpoint (Phase 2 chunk 2)
Push-based ingest for mail-server events. Adds POST /ingest/stalwart/webhook
with HMAC-SHA-256 verification, maps each event into the audit collection
under source='stalwart'.

services/platform-api/src/ingest/stalwart-webhook.controller.ts:
  - Public endpoint (no JwtAuthGuard — Stalwart can't carry a JWT). Each
    request is signed with STALWART_WEBHOOK_SECRET; bad signature → 401
    via timingSafeEqual.
  - Body: { events: [{ id, type, createdAt, data }, ... ] }. Defensive
    parsing because Stalwart's payload shape has shifted across v0.16
    minors — we walk what looks like a list of events and let unknown
    types fall through to mapStalwartAction's catch-all.
  - Per-event recordOne: action via mapStalwartAction(), actor from
    data.email/account/username, IP from data.ip or X-Forwarded-For,
    targetName from data.account/email/address/to, full payload kept
    in metadata. externalId = evt.id so the (source, externalId)
    unique index dedups re-deliveries.

action-map.ts: 14 known Stalwart event types →
  stalwart.{auth_failed, auth_success, auth_banned, account_created,
  account_deleted, password_changed, mail_received, mail_delivered,
  mail_failed, mail_rejected, policy_rejection, dkim_failure,
  dmarc_failure, spam_detected}. Snake/kebab forms normalized.

infrastructure/docker-compose:
  - .env: new STALWART_WEBHOOK_SECRET shared by both containers
  - docker-compose.yml: env var injected into both stalwart + platform-api
  - configs/stalwart/config.toml: [webhook."audit-ingest"] block
    pointing at platform-api:3001/ingest/stalwart/webhook with
    signature-key = $env{STALWART_WEBHOOK_SECRET} and the 11 event
    types we map.

Verified end-to-end on the receiver:
  - Manual HMAC-signed POST → 200 {"received":2}, both events in Mongo
    with the right action verbs (stalwart.auth_failed, stalwart.account_created),
    actor/IP/externalId populated.
  - Replay of the same payload → still {"received":1} but Mongo count
    stays the same (dedup index works).
  - X-Signature: deadbeef → 401, no row written.

Known unknown: I couldn't fully confirm Stalwart v0.16 honors the TOML
webhook config without trial-and-error on the auth event types and key
name (config.toml uses signature-key; some Stalwart builds want plain
'key'). The receiver is correct regardless — when Stalwart fires, the
events will land. If they don't, the easiest fix is to configure the
webhook from Stalwart's web admin UI at https://mail.dezky.local
instead of via TOML.
2026-05-24 20:21:29 +02:00
Ronni Baslund 55b1c133e3 feat(operator): scaffold apps/operator Nuxt app + multi-issuer JWT (O.3)
New Nuxt 3 app at apps/operator/ — internal admin portal on its own domain
(operator.dezky.local), own OAuth client (dezky-operator), own session
secrets, own cookies. Customer and operator surfaces can't decrypt each
other's session state.

OAuth flow verified end-to-end:
  - GET / → middleware redirect to /auth/login
  - User clicks Sign in → /auth/oidc/login → bounces to Authentik with
    client_id=dezky-operator, scope includes 'groups'
  - Authentik checks dezky-platform-admins group binding (added in O.1),
    silent-reauths via the existing auth.dezky.local session
  - Returns to /auth/oidc/callback with code, exchanges for token,
    creates session cookie on operator.dezky.local
  - Lands on pages/index.vue placeholder dashboard

Smoke test 'Create partner "test-partner"' button on the placeholder home
exercises the full operator-only authorization chain:
  - 1st call: 200, partner created in Mongo
  - 2nd call: 409 'already exists' (idempotency holds, token still valid)
  - Same call from the customer portal: 403 'requires operator-scoped
    token' (audience guard rejects dezky-portal aud)

JwtAuthGuard now multi-issuer in addition to multi-audience. Each
Authentik OAuth provider mints tokens with its own per-app iss URL
(.../application/o/<slug>/), so the guard accepts a comma-separated
AUTHENTIK_ISSUER. The audience-only fix from O.2 wasn't sufficient —
issuer is validated separately by jose.jwtVerify and was still pinned
to dezky-portal alone, yielding 'unexpected iss claim value' rejections.

Compose changes: new 'operator' service (Node 20 alpine, pnpm install +
nuxt dev, mkcert CA mount, traefik labels for operator.dezky.local +
TLS); new operator_node_modules volume; operator.dezky.local added to
traefik's Docker network aliases. Distinct OPERATOR_NUXT_OIDC_* session
secrets pulled from .env (gitignored, generated via openssl).

Real operator screens (sidebar, topbar, tenants, partners, etc.) come
in O.4. This commit is pure scaffolding + the security boundary proof.
2026-05-24 07:20:16 +02:00
Ronni Baslund 2db41fec5e feat(platform-api): multi-audience JWT + Partner CRUD + tenant lifecycle (O.2)
JwtAuthGuard now accepts a comma-separated AUTHENTIK_AUDIENCE
('dezky-portal,dezky-operator'). jose.jwtVerify takes an array and succeeds
on any match — both customer-portal and operator-portal tokens validate
against this service. Per-endpoint guards restrict further.

New OperatorGuard enforces operator-only mutations:
  1. JWT audience claim includes 'dezky-operator' (proof from the token
     alone that this is a privileged session)
  2. ActorService-resolved User has platformAdmin=true (DB check so
     revocation works without waiting for the token to expire)
Both required; either alone is insufficient.

Partner module:
  - Partner schema: slug, name, domain, status, marginPct, contactInfo,
    billingInfo. marginPct is one number per partner (decided in grilling)
  - CRUD endpoints under @UseGuards(JwtAuthGuard, OperatorGuard) — every
    partner mutation requires operator scope
  - GET /partners returns each row with a computed customers count from
    aggregating Tenant.partnerId. MRR aggregation deferred until
    Subscription gains a price column
  - GET /partners/:slug/tenants for the partner detail view
  - DELETE soft-terminates (status='terminated') — never hard-delete
    because tenants may still reference the partner

Tenant changes:
  - partnerId?: Types.ObjectId (ref Partner, indexed sparse) added to
    Tenant schema
  - UpdateTenantDto accepts partnerId so PATCH can attach/detach
  - POST /tenants/:slug/suspend and /resume — operator-only via
    OperatorGuard. PATCH already covers plan/domains/partnerId changes

Smoke test: customer-portal session sends POST /api/partners through the
portal proxy → 403 "This endpoint requires an operator-scoped token". The
positive test (operator-token → 200) waits for O.3 when there's an
operator app to mint the right token.

apps/portal/server/api/partners/index.post.ts is a temporary verification
proxy — delete once the operator portal exists.
2026-05-24 07:08:59 +02:00
Ronni Baslund 22b2583f0b chore(services): rename services/provisioning -> services/platform-api
O.0 prep from OPERATOR-PLAN.md. Mechanical refactor before adding partner
management and operator-specific endpoints. The service now owns more than
just provisioning orchestration (it'll soon own partners, tenant lifecycle
actions, multi-audience JWT validation), so the name 'platform-api' reflects
its scope better.

What changed:
- Directory: services/provisioning/ -> services/platform-api/
- Package: @dezky/provisioning -> @dezky/platform-api
- Docker: container_name dezky-provisioning -> dezky-platform-api;
  compose service key 'provisioning' -> 'platform-api'; volume
  provisioning_node_modules -> platform_api_node_modules
- Portal: PROVISIONING_INTERNAL_URL env var -> PLATFORM_API_INTERNAL_URL,
  default URL http://provisioning:3001 -> http://platform-api:3001 in all
  three proxy routes (me.get.ts, tenants/index.post.ts, tenants/[slug]/
  reconcile.post.ts), plus NUXT_API_BASE updated
- Health endpoint service identifier and main.ts log lines updated to
  'dezky-platform-api'
- Docs swept: README, CLAUDE.md, SERVICES.md, AUTHENTIK-SETUP.md,
  NEXT-STEPS.md, TROUBLESHOOTING.md, OPERATOR-PLAN.md, traefik/dynamic.yml

What deliberately stays:
- Internal module names ProvisioningService / ProvisioningModule (those
  describe an orchestration sub-concern, not the service's purpose)
- Tenant.provisioningStatus / provisioningErrors field names (state
  per integration, not service name)
- File services/platform-api/src/tenants/provisioning.service.ts
- 'Hetzner provisioning' references in production-prep docs (infrastructure
  provisioning, unrelated)

Verified end-to-end after rename: /api/me returns 200 with profile + 2
tenants + subscription, /api/tenants/dezky/reconcile returns 200 with
Authentik integration still ok.

OPERATOR-PLAN.md O.0 checkboxes ticked.
2026-05-24 00:35:01 +02:00
Ronni Baslund 28766b80c2 feat(provisioning): orchestrate Authentik/Stalwart/OCIS on tenant create
Phase 4 from docs/NEXT-STEPS.md. POST /tenants now writes Mongo AND drives
external service provisioning. A new POST /tenants/:slug/reconcile endpoint
retries the orchestration — useful when an upstream was down at create time
or external state drifted out of band.

Integration clients (services/provisioning/src/integrations/):
- AuthentikClient: real implementation. ensureGroup() is idempotent — looks
  up the group by name, creates if missing, returns either way. Group
  attributes record the tenant slug + Mongo id so we can trace back
- StalwartClient: stubbed. v0.16 removed the REST management API in favor
  of JMAP, which is significantly more work to wrap. TODO comment points
  to https://stalw.art/docs/api/management/overview for the follow-up
- OcisClient: stubbed. Needs libregraph /drives endpoint with service-to-
  service auth via OIDC client_credentials

Orchestration (provisioning.service.ts):
- Each step runs independently; one failure doesn't roll back the others
- Per-step state recorded on Tenant.provisioningStatus (ok/skipped/error/
  pending) plus error message on Tenant.provisioningErrors
- Steps return their own terminal state — 'skipped' for stubs, void
  defaults to 'ok' for real integrations
- Mongoose markModified() required for nested subdoc mutations to persist
- Tenant auto-flips status: pending → active when all steps are ok|skipped

Portal proxy routes (apps/portal/server/api/tenants/):
- POST /api/tenants and POST /api/tenants/:slug/reconcile forward the
  signed-in user's access token to the provisioning service. Lets the
  browser drive provisioning without minting tokens by hand. Will be
  replaced by a real "create workspace" flow with UI later

docker-compose: AUTHENTIK_API_URL/STALWART_API_URL/OCIS_API_URL now point
at the public Traefik-routed hostnames (with mkcert CA mounted into the
provisioning container so Node fetch trusts them). Previously these
pointed at internal Docker hostnames which doesn't work for Authentik
because of TLS issuer mismatch against the JWT.
2026-05-24 00:06:40 +02:00
Ronni Baslund 4bf6a85517 fix(stalwart): wire recovery admin + point portal tile at admin UI
- docker-compose: add STALWART_RECOVERY_ADMIN env so the env-file password
  works as a permanent recovery login. Without this, Stalwart prints a
  one-time bootstrap password to the logs and discards it after first setup
- portal: mail tile now links to /admin/ (the real Stalwart admin SPA),
  not /login (which is the OAuth client authorization UI for IMAP/SMTP
  clients like Thunderbird — confusing and unrelated)

The persistent admin (admin@dezky.local) was created via Stalwart's setup
wizard at /admin/init and lives in the stalwart_data volume. Recovery admin
in env is the "I lost the wizard credentials" escape hatch.
2026-05-23 22:51:25 +02:00
Ronni Baslund e0808bf13e fix(ocis): wire OCIS web SSO + Collabora document editing end to end
OCIS SSO was loading the SPA but never redirecting to Authentik: the default
OCIS CSP only allows connect-src to itself + the awesome-ocis GitHub repo, so
the metadata fetch to auth.dezky.local was blocked. Mount a custom csp.yaml
and point PROXY_CSP_CONFIG_FILE_LOCATION at it (env var lives on the proxy
service, not web — easy mistake). Also added the .html OIDC callback URIs to
the ocis-provider in Authentik (run-time state, not in this commit).

Collabora document editing required adding the OCIS collaboration service —
the WOPI bridge between OCIS storage and Collabora. Key wiring:

- ocis: expose embedded NATS (NATS_NATS_HOST=0.0.0.0) and gateway
  (GATEWAY_GRPC_ADDR=0.0.0.0:9142) so the new container can register and
  reach the rest of OCIS over the Docker network
- collaboration: COLLABORATION_GRPC_ADDR=0.0.0.0:9301 so it registers itself
  in the service registry with a reachable address (default 127.0.0.1 was
  unreachable from cross-container callers)
- collaboration: APP_ADDR uses the public host (office.dezky.local), not
  the internal Docker hostname — this value is sent to the browser as the
  iframe src
- collabora: regenerate proof key on every start (coolconfig generate-proof-key)
  so its public key matches what coolwsd signs with; otherwise collaboration
  rejects WOPI calls with "ProofKeys verification failed"
- collabora: ssl_verification=false (mkcert root not in Collabora's trust
  store), frame_ancestors=files.dezky.local (otherwise the iframe is blocked
  with a Danish "Indhold blokeret"), home_mode.enable=true to drop the
  "Explore The New" welcome popup and feedback prompt
- ocis CSP: extend connect-src + frame-src to include the new hostnames

Result: opening a .docx from OCIS now embeds Collabora in an iframe and the
document opens for editing.

Dev-mode caveats (not for prod): TLS verification disabled on Collabora's
outbound WOPI calls; home_mode caps at 20 concurrent connections / 10 docs.
2026-05-23 22:36:42 +02:00
Ronni Baslund 3d370caa62 feat(provisioning): tenant data model + CRUD with JWT-validated authz
Implements Phase 3 from docs/NEXT-STEPS.md.

Mongoose schemas (services/provisioning/src/schemas/):
- Tenant: slug, name, status, plan, domains, billingInfo, plus handles for
  Authentik group, OCIS space, and Stalwart domain (set in Phase 4)
- User: authentikSubjectId, tenantIds[], email, name, role, platformAdmin flag
- Subscription: tenantId, plan, status, Stripe IDs (unused until Phase 4)

Auth (services/provisioning/src/auth/):
- JwtAuthGuard verifies Authentik access tokens against the provider's JWKS
  with issuer + audience checks. Uses NODE_EXTRA_CA_CERTS to trust the
  mkcert root for the local Authentik cert
- ActorService resolves the verified JWT into a Mongo User document — every
  controller reads tenantIds + platformAdmin from the DB, not the token
- CurrentUser decorator extracts the JWT payload onto controllers

CRUD modules:
- /tenants, /users, /subscriptions with create/read/update/delete
- /users/me upserts the caller's User record on every request, syncing email,
  name, tenantIds, and platformAdmin from the JWT's groups claim — the only
  place we read JWT.groups outside the bootstrap

Why DB-derived authz: putting all group memberships in the JWT doesn't scale
past ~50 tenants per user (header/cookie size limits, no mid-session
revocation, stale data until re-login). JWT now carries identity only; the
DB is the source of truth for who can see what.

Seed (SeedService.OnApplicationBootstrap): idempotent creation of the
default 'dezky' tenant + matching subscription. User records are created on
first /users/me hit.

Infrastructure:
- Traefik label exposes provisioning at https://api.dezky.local (dev only)
- api.dezky.local added to Docker network aliases on Traefik
- mkcert root CA mounted into the provisioning container for JWKS fetch
- Authentik 'groups' scope mapping created + attached to dezky-portal
  provider; portal now requests it as a scope
- nuxt.config.ts portal: exposeAccessToken=true so Nitro forwards token;
  NUXT_OIDC_TOKEN_KEY fixed to base64-encoded 32 bytes (was hex, causing
  "Invalid key length" once exposeAccessToken turned on)

Portal: apps/portal/server/api/me.get.ts is a scaffolding route that
forwards the user's access token to provisioning and returns profile +
tenants + subscriptions — verifies the full chain end to end.
2026-05-23 21:53:53 +02:00
Ronni Baslund adfd9baafe chore: initial scaffold with running local stack and portal auth
Brings up Dezky's local development environment end-to-end:

Infrastructure (docker-compose):
- Traefik v3.7 reverse proxy with mkcert TLS (v3.2 couldn't speak Docker API 1.54)
- Postgres + Mongo + Redis with healthchecks and init script for per-service users
- Authentik 2025.10 (server + worker) as OIDC IdP
- Stalwart v0.16 mail server (image renamed from stalwartlabs/mail-server)
- OCIS 7.0 with PROXY_TLS=false and OCIS_CONFIG_DIR=/etc/ocis so init writes
  where the server reads
- Collabora office, plus the portal + provisioning service stubs
- Docker network aliases on Traefik so containers resolve auth.dezky.local etc.
  through the network (not host /etc/hosts)
- Docker socket mount parameterized for macOS Docker Desktop symlink path

Authentik provisioning (done via API after stack boot):
- ocis-provider (public client) + OCIS Files application
- dezky-portal provider (confidential) + Dezky Portal application
- Admin API token bound to akadmin manually since 2025.10's
  AUTHENTIK_BOOTSTRAP_TOKEN env var doesn't auto-materialize a token row

Portal (apps/portal):
- Nuxt 3 with nuxt-oidc-auth 1.0.0-beta.11 against generic 'oidc' preset
- Global auth middleware; login at /auth/oidc/login redirects to Authentik
- Visual implementation of Claude Design 'Auth' canvas: AuthShell, NodeMark,
  Auth* sub-components, design tokens as CSS custom properties
- Pages: auth/login, auth/expired, auth/disabled, index (post-login landing)
- mkcert root CA mounted into the portal so Node fetch trusts Authentik's
  self-signed cert (NODE_EXTRA_CA_CERTS) — dev only

Docs:
- AUTHENTIK-SETUP.md updated with manual token bind + portal provider scripted
  alternative
- NEXT-STEPS.md: Phase 1 and Phase 2 marked done with file locations and
  dev-mode caveats

Dev-mode shortcuts that need to be revisited before prod:
- skipAccessTokenParsing on the OIDC config
- NODE_EXTRA_CA_CERTS mkcert mount
- Bootstrap password still the generated value in .env
- Authentik admin token (dezky-bootstrap-token) is non-expiring
2026-05-23 21:25:11 +02:00