Ronni Baslund f8618b2bbc feat(portal): real OCIS storage data via refresh-token service auth
The Storage page + endpoint landed earlier but had no working OCIS
backend credential. OCIS has no service-account/client-credentials grant
and trusts a single issuer, and basic auth resolves no user in our
external-IdP setup — so authenticate OcisClient via an OIDC
refresh-token bootstrap instead:

- One-time headless login of svc-platform-api against the ocis provider
  (public client ocis-web, issuer .../o/ocis/) yields a refresh token,
  persisted in Mongo (ocis_credentials) and rotated on every use.
- OcisClient mints access tokens with the refresh_token grant; the
  service user holds the OCIS admin role (OCIS_ADMIN_USER_ID) so
  libregraph ListAllDrives works.
- scripts/bootstrap-ocis.mjs re-runs the bootstrap if the token lapses.
- Dashboard Plan card gains a storage capacity bar beside seats;
  hidden when storage is unavailable.
- compose + .env.example: OCIS service OIDC env and admin user id.
- docs/NEXT-STEPS: document the mechanism and the dead-end alternatives.
2026-05-31 21:29:17 +02:00

# Next Steps — After Local Stack Is Running
Once `./scripts/bootstrap.sh` completes successfully and all services are reachable, here's the development roadmap.
## Phase 1: Verify everything works (day 1) — done
- [x] `https://app.dezky.local` shows portal landing page (now the new auth design / post-login home)
- [x] `https://auth.dezky.local` shows Authentik login
- [x] Log into Authentik as admin *(still using generated `AUTHENTIK_BOOTSTRAP_PASSWORD` from `.env` — rotate before exposing to anyone else)*
- [x] Follow `docs/AUTHENTIK-SETUP.md` to configure OIDC providers (ocis + dezky-portal)
- [x] Test OCIS SSO end-to-end (login from `https://files.dezky.local`)
- [x] Verify Stalwart admin UI loads at `https://mail.dezky.local/login` *(root path 404s — admin SPA is at `/login`)*
## Phase 2: Build portal authentication (week 1) — done
Goal: Users can log in to the portal via Authentik.
- [x] Add `nuxt-oidc-auth` to `apps/portal` (`1.0.0-beta.11`)
- [x] Configure Authentik as OIDC provider (generic `oidc` preset with explicit URLs + discovery)
- [x] Implement login/logout flows (`/auth/oidc/login`, `/auth/oidc/logout` from the module)
- [x] Display logged-in user info on the portal home (`pages/index.vue` uses `useOidcAuth()`)
- [x] Add protected routes (`globalMiddlewareEnabled: true`; public pages opt out via `definePageMeta({ auth: false })`)
### Where things live
| Concern | File |
|---|---|
| OIDC module config | `apps/portal/nuxt.config.ts` (`oidc` block) |
| Custom login page | `apps/portal/pages/auth/login.vue` |
| Error states (expired / disabled) | `apps/portal/pages/auth/{expired,disabled}.vue` |
| Post-login landing | `apps/portal/pages/index.vue` |
| Visual shell + tokens | `apps/portal/components/auth/*`, `assets/styles/tokens.css` |
| Brand mark | `apps/portal/components/NodeMark.vue` |
### Dev-mode caveats (clean up before prod)
- `skipAccessTokenParsing: true` in the OIDC config — Authentik's access tokens in this setup aren't reliably JWT-parseable; production should re-evaluate
- `openIdConfiguration` is pinned to the discovery URL because the generic `oidc` preset doesn't ship a default — required for id_token JWKS validation
- `docker-compose.yml` mounts `infrastructure/docker-compose/certs/mkcert-root.pem` into the portal at `/etc/ssl/mkcert-root.pem` and sets `NODE_EXTRA_CA_CERTS` so Node fetch trusts the mkcert root CA. In prod, replace with real CA-signed certs
- Traefik has Docker network aliases for `auth.dezky.local`, `app.dezky.local`, etc. so container-to-Authentik fetch resolves inside the network without going through host `/etc/hosts`
## Phase 3: Tenant data model (week 1-2) — done
- [x] Mongoose schemas in `services/platform-api/src/schemas/` (Tenant, User, Subscription)
- [x] Tenant: slug, name, status, plan, domains, authentikGroupId, ocisSpaceId, stalwartDomain, billingInfo
- [x] User: authentikSubjectId, tenantIds[], email, name, role, active, lastLoginAt
- [x] Subscription: tenantId, plan, status, stripeCustomerId, stripeSubscriptionId, period dates
- [x] CRUD endpoints behind `JwtAuthGuard` (validates Authentik JWT via JWKS)
- [x] Group-based authorization: users see only tenants whose slug matches one of their Authentik `groups`; `dezky-platform-admins` group has global access
- [x] Idempotent seed (`SeedService`) creates the `dezky` tenant + matching subscription on bootstrap
- [x] platform-api exposed at `https://api.dezky.local` (Traefik label, dev only) and via internal `http://platform-api:3001`
- [x] Portal Nitro route at `/api/me` forwards the user's encrypted access token to platform-api — verified end-to-end
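The group rule above fits in a small predicate. This is a minimal sketch. It assumes the guard already holds the JWT's `groups` claim as a string array. `canAccessTenant` and `visibleTenants` are hypothetical names, not the actual `JwtAuthGuard` code.

```typescript
// Hypothetical helpers mirroring the Phase 3 rule: a user sees a tenant only if one of
// their Authentik groups equals the tenant slug; `dezky-platform-admins` sees everything.
const PLATFORM_ADMIN_GROUP = 'dezky-platform-admins'

function isPlatformAdmin(groups: readonly string[]): boolean {
  return groups.includes(PLATFORM_ADMIN_GROUP)
}

function canAccessTenant(groups: readonly string[], tenantSlug: string): boolean {
  return isPlatformAdmin(groups) || groups.includes(tenantSlug)
}

// Narrows a tenant list to what the caller may see (e.g. for a list endpoint).
function visibleTenants<T extends { slug: string }>(groups: readonly string[], tenants: readonly T[]): T[] {
  return tenants.filter((t) => canAccessTenant(groups, t.slug))
}
```

Membership in Authentik's own `authentik Admins` group plays no part here, which matches the caveat below about akadmin.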
### Endpoints
| Method | Path | Notes |
|---|---|---|
| GET | `/health` | open |
| POST/GET | `/tenants`, `/tenants/:slug` | platform admin to create/delete; tenant members can read+update their own |
| GET | `/users/me` | upserts the user on first call from JWT claims |
| GET/POST/PATCH/DELETE | `/users[/:subject]` | platform admin for mutations |
| GET/POST/PATCH | `/subscriptions[/:slug]` | platform admin for mutations |
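The `/users/me` upsert amounts to one `findOneAndUpdate` keyed on the Authentik subject. This is a minimal sketch of the filter/update pair. It assumes the standard OIDC `sub`, `email` and `name` claims. `buildUserUpsert` is a hypothetical helper. The `active: true` default for new users is also an assumption, not the confirmed schema default.

```typescript
interface JwtClaims {
  sub: string
  email?: string
  name?: string
}

interface UserUpsert {
  filter: { authentikSubjectId: string }
  update: {
    $set: { email: string; name: string; lastLoginAt: Date }
    $setOnInsert: { active: boolean }
  }
  options: { upsert: true; new: true }
}

function buildUserUpsert(claims: JwtClaims, now: Date = new Date()): UserUpsert {
  if (!claims.sub) throw new Error('JWT has no sub claim')
  return {
    // One document per Authentik subject; on insert Mongo copies this field into the new doc.
    filter: { authentikSubjectId: claims.sub },
    update: {
      // Refreshed on every call so profile changes in Authentik propagate.
      $set: { email: claims.email ?? '', name: claims.name ?? claims.email ?? '', lastLoginAt: now },
      // Written only when the document is created (first call). Assumed default.
      $setOnInsert: { active: true },
    },
    options: { upsert: true, new: true },
  }
}
```

With Mongoose, the result would be passed to a `User` model as `User.findOneAndUpdate(u.filter, u.update, u.options)`. `$setOnInsert` keeps first-call defaults from overwriting later edits.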
### Dev-mode caveats (clean up before prod)
- `NUXT_OIDC_TOKEN_KEY` must be base64-encoded 32 bytes (`openssl rand -base64 32`) — NOT hex. Module silently fails with "Invalid key length" if wrong
- Portal config has `exposeAccessToken: true` so Nitro routes can forward the token; token still never reaches the browser
- The `dezky` group in Authentik is the single tenant for dev. New tenants in Phase 4 need to create matching Authentik groups
- A `dezky-platform-admins` group doesn't exist yet — for now akadmin's membership in `authentik Admins` does NOT grant platform-admin rights. Create that group if you want admin-only endpoints to work for you
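The module only reports "Invalid key length", so a startup check can say what is actually wrong with the key. This is a minimal sketch. It assumes the check is called with the env value somewhere early during startup; exactly where is left open. `checkOidcTokenKey` is a hypothetical name. The check relies on `atob`, which is global in Node 16+ and in browsers.

```typescript
// Returns null when `key` is base64 that decodes to exactly 32 bytes, else a reason.
function checkOidcTokenKey(key: string | undefined): string | null {
  if (!key) return 'NUXT_OIDC_TOKEN_KEY is not set'
  let decoded: string
  try {
    decoded = atob(key) // throws on characters outside the base64 alphabet
  } catch {
    return 'NUXT_OIDC_TOKEN_KEY is not valid base64'
  }
  // atob yields one char per byte, so the string length is the key size in bytes.
  if (decoded.length !== 32) {
    // A 64-char hex key is also valid base64, but it decodes to 48 bytes.
    const hint = /^[0-9a-f]+$/i.test(key) ? ' (looks like hex; use `openssl rand -base64 32`)' : ''
    return `NUXT_OIDC_TOKEN_KEY decodes to ${decoded.length} bytes, expected 32${hint}`
  }
  return null
}
```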
## Phase 4: Provisioning automation (week 2-3) — partial
Orchestration ships; two of the three integrations (Stalwart, OCIS) are still
stubs pending upstream-specific work.
- [x] `POST /tenants` writes tenant and triggers reconciliation in one call
- [x] `POST /tenants/:slug/reconcile` retries provisioning for an existing
tenant — idempotent, useful when an upstream was down or external
state drifted
- [x] Per-step state recorded on `Tenant.provisioningStatus` (ok / skipped /
error / pending) + `Tenant.provisioningErrors` for the last failure
message; tenant auto-activates when all steps settle
- [x] Worker: Authentik group creation (real, idempotent)
- [ ] Worker: Stalwart domain + DKIM (stubbed — v0.16 dropped REST in favor
of JMAP, see follow-up below)
- [ ] Worker: OCIS space (stubbed — needs libregraph `/drives` endpoint
with service-to-service auth)
- [ ] Worker: onboarding email (no SMTP wired yet)
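The per-step bookkeeping can be sketched as a loop over idempotent workers. This sketch assumes "settled" means every step is `ok` or `skipped`, which is exactly the mix in the smoke-test response below. It also assumes an `error` keeps the tenant out of `active` until a reconcile succeeds. The document doesn't pin down the error case. The `'provisioning'` status name and the `StepWorker` type are hypothetical.

```typescript
type StepState = 'ok' | 'skipped' | 'error' | 'pending'
type Step = 'authentik' | 'stalwart' | 'ocis'
// Each worker is idempotent: resolves 'ok' (or 'skipped' for stubs), throws on failure.
type StepWorker = () => Promise<'ok' | 'skipped'>

interface TenantState {
  status: 'provisioning' | 'active' // hypothetical status names
  provisioningStatus: Record<Step, StepState>
  provisioningErrors: Partial<Record<Step, string>>
}

async function reconcile(tenant: TenantState, workers: Record<Step, StepWorker>): Promise<TenantState> {
  // Every step re-runs: workers are idempotent, so this also repairs drifted state.
  for (const step of Object.keys(workers) as Step[]) {
    tenant.provisioningStatus[step] = 'pending'
    try {
      tenant.provisioningStatus[step] = await workers[step]()
    } catch (err) {
      // One failing upstream must not stop the others; keep the last message for the UI.
      tenant.provisioningStatus[step] = 'error'
      tenant.provisioningErrors[step] = err instanceof Error ? err.message : String(err)
    }
  }
  // Assumed settle rule: activate only when every step is ok or skipped.
  const settled = Object.values(tenant.provisioningStatus).every((s) => s === 'ok' || s === 'skipped')
  if (settled) tenant.status = 'active'
  return tenant
}
```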
### Where things live
| Concern | File |
|---|---|
| Integration clients | `services/platform-api/src/integrations/{authentik,stalwart,ocis}.client.ts` |
| Orchestration | `services/platform-api/src/tenants/provisioning.service.ts` |
| `/tenants/:slug/reconcile` | `services/platform-api/src/tenants/tenants.controller.ts` |
| Portal proxy routes | `apps/portal/server/api/tenants/index.post.ts` + `[slug]/reconcile.post.ts` |
### Quick smoke test
From the portal in the browser (signed in), in DevTools:
```js
// Create a fresh tenant
await fetch('/api/tenants', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ slug: 'acme', name: 'Acme Co', plan: 'pro' }),
}).then(r => r.json())

// Re-run provisioning (idempotent)
await fetch('/api/tenants/acme/reconcile', { method: 'POST' }).then(r => r.json())
```
Response should include `provisioningStatus: { authentik: 'ok', stalwart: 'skipped', ocis: 'skipped' }`
and `status: 'active'`. Verify the Authentik group exists via the admin UI at
`/if/admin/#/identity/groups`.
### Stub follow-up work
**Stalwart (JMAP)** — v0.16 [moved management off REST](https://stalw.art/docs/api/management/overview).
Need a minimal JMAP client that wraps `Domain/set` (create), `Domain/get`
(idempotency check), `Principal/set` (DKIM-keyed signing identity). Auth
via the persistent admin's bearer token from the OAuth flow we already use
for the web UI.
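A JMAP request is one JSON POST carrying a `using` capability list and a batch of `[method, args, callId]` triples (RFC 8620). The sketch below covers the two `Domain` calls named above. It rests on loud assumptions: the Stalwart management capability URN and the `Domain` property names are placeholders. They still need to be checked against the Stalwart docs; they are not its confirmed schema.

```typescript
// JMAP core envelope (RFC 8620, section 3.3).
type Invocation = [name: string, args: Record<string, unknown>, callId: string]
interface JmapRequest {
  using: string[]
  methodCalls: Invocation[]
}

const CORE_CAP = 'urn:ietf:params:jmap:core'
// ASSUMPTION: placeholder URN; read the real one from the Stalwart session object.
const STALWART_ADMIN_CAP = 'urn:stalwart:params:jmap:admin' // hypothetical

function jmap(...methodCalls: Invocation[]): JmapRequest {
  return { using: [CORE_CAP, STALWART_ADMIN_CAP], methodCalls }
}

// Idempotency check: `ids: null` asks a standard /get method for all records.
function domainGet(accountId: string): JmapRequest {
  return jmap(['Domain/get', { accountId, ids: null }, 'g0'])
}

// Sent only when the lookup found nothing. ASSUMPTION: `name` holds the DNS name.
function domainCreate(accountId: string, domainName: string): JmapRequest {
  return jmap(['Domain/set', { accountId, create: { newDomain: { name: domainName } } }, 's0'])
}

function hasDomain(getResponse: { list: Array<{ name?: string }> }, domainName: string): boolean {
  return getResponse.list.some((d) => d.name?.toLowerCase() === domainName.toLowerCase())
}
```

Both requests would be POSTed to the API URL advertised by the JMAP session resource (`/.well-known/jmap`), with the admin bearer token.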
**OCIS (libregraph)** — space *provisioning* is still stubbed:
`POST /graph/v1.0/drives` with body `{ "name": "<slug>", "driveType": "project" }`
to create a tenant's project space, then assign it.
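Here is a minimal sketch of the space-creation call. It assumes the bearer token comes from the same refresh-token mechanism used for reads. The document doesn't yet confirm that the svc user may create project spaces. It also assumes the response carries the new drive's `id`, which would land in `Tenant.ocisSpaceId`. `fetchImpl` is injected so the call can be stubbed. The "then assign it" step is left out. So is the prior existence lookup that reconcile idempotency would need.

```typescript
interface HttpResponse {
  ok: boolean
  status: number
  json(): Promise<unknown>
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<HttpResponse>

// Creates a tenant's project space via libregraph and returns the new drive id.
async function createProjectSpace(
  ocisBaseUrl: string, // e.g. https://files.dezky.local
  accessToken: string,
  slug: string,
  fetchImpl: FetchLike, // pass global `fetch` in production
): Promise<string> {
  const res = await fetchImpl(`${ocisBaseUrl}/graph/v1.0/drives`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${accessToken}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ name: slug, driveType: 'project' }),
  })
  if (!res.ok) throw new Error(`OCIS drive create failed: HTTP ${res.status}`)
  const drive = (await res.json()) as { id?: string }
  if (!drive.id) throw new Error('OCIS response had no drive id')
  return drive.id
}
```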
**OCIS read auth (done — powers the customer-admin Storage page).** OCIS has
*no* backend service-account/client-credentials grant and trusts exactly one
issuer, and basic auth doesn't resolve a user in our external-IdP setup. The
working mechanism is a **refresh-token bootstrap**:
1. A dedicated Authentik user `svc-platform-api` (with an email — OCIS
autoprovision rejects empty emails) logs in **once** against the *ocis*
provider (public client `ocis-web`, per-provider issuer `.../o/ocis/` — the
one OCIS trusts). Run it headlessly:
`docker compose exec platform-api node /app/scripts/bootstrap-ocis.mjs`.
The refresh token is persisted in Mongo (`ocis_credentials`).
2. `OcisClient` mints access tokens with the `refresh_token` grant and persists
the rotated token each call (Authentik rotates on every use).
3. The svc user needs the OCIS **admin** role for `ListAllDrives` — granted via
`OCIS_ADMIN_USER_ID=<svc OCIS account UUID>` on the ocis service.
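Steps 1 and 2 reduce to a standard OAuth 2.0 `refresh_token` grant (RFC 6749, section 6) against a public client. The rotated refresh token must be written back before the next call. This is a minimal sketch, not the actual `OcisClient`. The token endpoint URL is passed in, because the per-provider paths above are elided and aren't filled here. `CredentialStore` stands in for the Mongo `ocis_credentials` collection, and `fetchImpl` is injected. Calls are serialized because Authentik rotates on every use. Two concurrent refreshes would both present the same stored token, and one could end up persisting a token the other's rotation superseded.

```typescript
// Stand-in for the Mongo `ocis_credentials` document holding the current refresh token.
interface CredentialStore {
  load(): Promise<string | null>
  save(refreshToken: string): Promise<void>
}
interface HttpResponse {
  ok: boolean
  status: number
  json(): Promise<unknown>
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>

class RefreshTokenAuth {
  private queue: Promise<unknown> = Promise.resolve()
  private readonly tokenUrl: string
  private readonly clientId: string
  private readonly store: CredentialStore
  private readonly fetchImpl: FetchLike

  constructor(tokenUrl: string, clientId: string, store: CredentialStore, fetchImpl: FetchLike) {
    this.tokenUrl = tokenUrl // the provider's token endpoint (configuration value)
    this.clientId = clientId // public client, e.g. 'ocis-web'; no client secret
    this.store = store
    this.fetchImpl = fetchImpl
  }

  // Mints a fresh access token. Calls are chained so each refresh starts only after
  // the previous one has persisted its rotated refresh token.
  accessToken(): Promise<string> {
    const run = this.queue.then(() => this.refresh())
    this.queue = run.catch(() => undefined) // a failed refresh must not wedge the chain
    return run
  }

  private async refresh(): Promise<string> {
    const current = await this.store.load()
    if (!current) throw new Error('No OCIS refresh token stored; re-run scripts/bootstrap-ocis.mjs')
    const res = await this.fetchImpl(this.tokenUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        grant_type: 'refresh_token',
        refresh_token: current,
        client_id: this.clientId,
      }).toString(),
    })
    if (!res.ok) throw new Error(`OCIS token refresh failed: HTTP ${res.status}`)
    const tok = (await res.json()) as { access_token: string; refresh_token?: string }
    // Persist the rotated token before returning, so the next call presents the latest one.
    if (tok.refresh_token) await this.store.save(tok.refresh_token)
    return tok.access_token
  }
}
```

The in-process chain only serializes calls within a single platform-api instance. Running several replicas would need a shared lock (for example, on the Mongo document) as well.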
Note: the "global" issuer mode is **not** an option — its issuer is the
Authentik root, which has no `.well-known/openid-configuration`, so OCIS can't
validate tokens against it.
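Step 3's `ListAllDrives` result is what the Storage page and the Plan card's capacity bar read from. The sketch below sums per-drive quotas. It assumes each libregraph drive carries a `quota` object with byte counts `used` and `total`. It also assumes a missing or zero `total` means "no limit". The aggregation rule is illustrative, and so is the `null` return that lets the UI hide the bar. Filtering the admin-wide list down to one tenant's drives (e.g. via `Tenant.ocisSpaceId`) is left out.

```typescript
interface Drive {
  id: string
  driveType?: string // e.g. 'personal' or 'project'
  quota?: { used?: number; total?: number } // bytes
}

interface StorageSummary {
  usedBytes: number
  totalBytes: number | null // null: at least one drive has no limit (assumed meaning)
  driveCount: number
}

// Returns null when there is nothing to show, so the UI can hide the capacity bar.
function summarizeDrives(drives: Drive[] | undefined): StorageSummary | null {
  if (!drives || drives.length === 0) return null
  let used = 0
  let total: number | null = 0
  for (const d of drives) {
    used += d.quota?.used ?? 0
    const t = d.quota?.total
    // Assumption: missing or zero total means unlimited, which makes the sum unbounded.
    if (!t) total = null
    else if (total !== null) total += t
  }
  return { usedBytes: used, totalBytes: total, driveCount: drives.length }
}

function usagePercent(s: StorageSummary): number | null {
  return s.totalBytes ? Math.min(100, Math.round((s.usedBytes / s.totalBytes) * 100)) : null
}
```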
### Authentik API examples (for the eventual user-creation flow)
```typescript
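import { Configuration, CoreApi } from '@goauthentik/api'
// AUTHENTIK_API_TOKEN is a hypothetical env var holding an API token with user-create rights
const core = new CoreApi(new Configuration({
  basePath: 'https://auth.dezky.local/api/v3',
  headers: { Authorization: `Bearer ${process.env.AUTHENTIK_API_TOKEN}` },
}))
// Create user and add them to the tenant's group (`groups` takes group UUIDs, e.g. Tenant.authentikGroupId)
await core.coreUsersCreate({
  userRequest: { username: user.email, email: user.email, name: user.name, groups: [authentikGroupId] },
})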
```
## Operator portal — out-of-band track — shipped (O.0–O.9)
`operator.dezky.local` is live as a separate Nuxt app with its own
`dezky-operator` Authentik OAuth client. Full plan and execution log in
[`OPERATOR-PLAN.md`](./OPERATOR-PLAN.md).
What landed:
- `services/provisioning` renamed to `services/platform-api`
- Audience-aware JwtAuthGuard accepts both `dezky-portal` and `dezky-operator`
- `Partner` schema + CRUD endpoints, `Tenant.partnerId` ref
- Tenant lifecycle (suspend / resume) gated by OperatorGuard
- **Real Infrastructure live-probes** — `GET /health/platform` runs TCP +
HTTP probes against every neighbouring service; UI splits "Live" vs
"Planned" with honest status.
- **Real feature-flag system** — `Flag` schema + CRUD + bulk eval +
operator UI + `useFeatureFlag` composable in the portal. Hash-based
deterministic rollout. See [`FEATURE-FLAGS.md`](./FEATURE-FLAGS.md).
- Operator UI: Overview (real KPIs), Tenants (7-tab detail w/ Danger),
Partners (attach/detach), Users, Operator team, real Infrastructure,
real Feature flags. Visual-only Audit. Placeholders for
Support/Billing/Reports/Settings.
- Interactions: ⌘K command palette, impersonation stub (modal + banner),
incident modal, tweaks panel, **notification drawer**.
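"Hash-based deterministic rollout" usually means hashing a stable key per (flag, subject) pair into a bucket from 0 to 99. The flag is on when the bucket falls below the rollout percentage. The same tenant therefore always gets the same answer, and raising the percentage only ever adds tenants. The minimal sketch below uses 32-bit FNV-1a as a stand-in hash. The actual hash and key format are in `FEATURE-FLAGS.md` and may differ. `rolloutPct` and the `flagKey:tenantSlug` key are hypothetical names.

```typescript
// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII keys).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0 // force unsigned
}

// Bucket in [0, 100). Salting with the flag key decorrelates rollouts across flags.
function rolloutBucket(flagKey: string, subject: string): number {
  return fnv1a32(`${flagKey}:${subject}`) % 100
}

function isEnabled(flag: { key: string; enabled: boolean; rolloutPct: number }, tenantSlug: string): boolean {
  if (!flag.enabled) return false
  if (flag.rolloutPct >= 100) return true
  if (flag.rolloutPct <= 0) return false
  return rolloutBucket(flag.key, tenantSlug) < flag.rolloutPct
}
```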
### Follow-ups before operator hits production
In rough priority order — bulk lifted from OPERATOR-PLAN.md:
- [ ] **Real impersonation flow** — OAuth Token Exchange (RFC 8693),
`act` claim on customer portal, audit on entry+exit, banner with
origin operator identity
- [ ] **Real audit log collection**: `platform_audit` Mongo collection,
written by platform-api on every privileged action; stream from there
instead of `data/fixtures.ts`
- [x] **Feature flag backend** — shipped. See
[`FEATURE-FLAGS.md`](./FEATURE-FLAGS.md). Remaining sub-tasks:
partnerSlug eval context, user-level flags, audit-log integration,
server-side cache (all called out in that doc).
- [ ] **Incident management backend**: `Incident` schema + paging
(PagerDuty / OpsGenie / custom). Until then, IncidentModal is mock.
- [ ] **Support ticket queue**: `SupportTicket` schema + email-in
ingestion from a dedicated mailbox via Stalwart
- [ ] **Self-serve Partner portal at `partner.dezky.local`** — own Nuxt
app, own OAuth client, scoped to a partner's own customers
- [ ] **Real environment switcher** — currently cosmetic; would need
separate API endpoints per env, separate Authentik tenants
- [ ] **Real on-call indicator** — integration with the paging system from
the incident backend
- [ ] **Operator workspace impersonation in OCIS/Stalwart** — operator
tooling reaches into the customer's files + mail for support, with
the same audit trail
- [ ] **MRR aggregation on Partner** when Subscription gains real pricing
- [ ] **MFA-required Authentik policy** on the `dezky-operator` provider
(deferred from O.1)
- [ ] **Delete throwaway endpoints** added during verification:
`apps/operator/server/api/_verify-token.get.ts`,
`apps/portal/server/api/_verify-token.get.ts`,
`apps/operator/server/api/operator-smoke-test.post.ts`,
`apps/portal/server/api/partners/index.post.ts`
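For the impersonation item, RFC 8693 defines the request shape. The operator's token is the `actor_token`, and the token representing the customer user is the `subject_token`. The issued token carries an `act` claim naming the actor. This minimal sketch builds the request body and reads `act` on the portal side. It assumes an IdP that implements RFC 8693. Whether the Authentik provider supports this grant, and how the subject token would be obtained, is not established here. `buildImpersonationRequest` and `impersonatedBy` are hypothetical names.

```typescript
const TOKEN_EXCHANGE = 'urn:ietf:params:oauth:grant-type:token-exchange'
const ACCESS_TOKEN_TYPE = 'urn:ietf:params:oauth:token-type:access_token'

// Form body for a POST to the token endpoint (application/x-www-form-urlencoded).
function buildImpersonationRequest(p: {
  subjectToken: string // token representing the customer user
  actorToken: string // the operator's own access token
  audience: string // e.g. 'dezky-portal'
}): URLSearchParams {
  return new URLSearchParams({
    grant_type: TOKEN_EXCHANGE,
    subject_token: p.subjectToken,
    subject_token_type: ACCESS_TOKEN_TYPE,
    actor_token: p.actorToken,
    actor_token_type: ACCESS_TOKEN_TYPE,
    audience: p.audience,
  })
}

// RFC 8693 section 4.1: `act.sub` identifies the acting party; a nested `act` records earlier actors.
interface ActClaim {
  sub: string
  act?: ActClaim
}

// Operator subject for impersonation tokens, else null. Expects already-verified claims.
function impersonatedBy(claims: { sub: string; act?: ActClaim }): string | null {
  return claims.act?.sub ?? null
}
```

The portal banner and the entry/exit audit records could both key off `impersonatedBy`, so the origin operator identity travels with every request.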
## Phase 5: Custom webmail (week 3-4)
Goal: Branded webmail client using Stalwart's JMAP API.
- [ ] Add JMAP client library to portal
- [ ] Build inbox view in Nuxt
- [ ] Build compose dialog
- [ ] Build message view with thread support
- [ ] Style to match Dezky branding
JMAP (RFC 8620/8621) is a modern JSON-over-HTTP protocol that batches method calls into a single request, which makes it clean to work with.
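As an illustration for the inbox view, one request can do three things. It queries the newest messages in a mailbox, fetches their list fields by back-referencing the query's ids, and loads their threads, all without a second round trip. The sketch uses the standard RFC 8620/8621 method names and result-reference syntax. `accountId` and `inboxId` would come from the session and a prior `Mailbox/get`. `inboxPageRequest` is a hypothetical name.

```typescript
type Invocation = [string, Record<string, unknown>, string]

function inboxPageRequest(accountId: string, inboxId: string, limit = 50, position = 0) {
  const methodCalls: Invocation[] = [
    ['Email/query', {
      accountId,
      filter: { inMailbox: inboxId },
      sort: [{ property: 'receivedAt', isAscending: false }],
      collapseThreads: true, // one row per thread in the list view
      position,
      limit,
    }, 'q'],
    // Result reference (RFC 8620, section 3.7): reuse the ids returned by call 'q'.
    ['Email/get', {
      accountId,
      '#ids': { resultOf: 'q', name: 'Email/query', path: '/ids' },
      properties: ['id', 'threadId', 'from', 'subject', 'receivedAt', 'preview', 'keywords'],
    }, 'g'],
    // Thread membership for the message view, keyed off the emails fetched by 'g'.
    ['Thread/get', {
      accountId,
      '#ids': { resultOf: 'g', name: 'Email/get', path: '/list/*/threadId' },
    }, 't'],
  ]
  return { using: ['urn:ietf:params:jmap:core', 'urn:ietf:params:jmap:mail'], methodCalls }
}
```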
## Phase 6: Production migration prep (week 4+)
When the local stack is solid and you have 2-3 pilot customers interested:
- [ ] Order Hetzner AX41-NVMe
- [ ] Order Storage Box BX11 (Falkenstein)
- [ ] Enable Hetzner Object Storage (bucket: dezky-ocis-prod)
- [ ] Build Terraform module for Hetzner provisioning
- [ ] Build Ansible playbook for bare-metal Stalwart deployment
- [ ] Set up k3s on the cloud server
- [ ] Migrate compose to Helm charts
- [ ] Configure Let's Encrypt via cert-manager
- [ ] Set up Restic backup jobs to Storage Box + B2
## Phase 7: Add Zulip and Jitsi (when chat/video needed)
These were excluded from MVP for simplicity. When ready:
- [ ] Create `infrastructure/docker-compose/docker-compose.optional.yml`
- [ ] Add Zulip stack (server + db + worker)
- [ ] Add Jitsi stack (web + prosody + jicofo + jvb)
- [ ] Configure OIDC integration with Authentik
- [ ] Add to portal launcher
## Billing & subscriptions — partial implementation
**Implemented (Tier 1)** as of 2026-05-26:
- Price catalog (`prices` collection): operator-edited plan/cycle/currency/per-seat
matrix. UI at `operator.dezky.local/pricing`.
- Subscription auto-created on tenant provision with the resolved `priceId` +
`cycle` + `seats` snapshot.
- Partner MRR aggregation: `GET /users/me/partner/mrr` sums active subscriptions,
normalized to monthly DKK. Surfaced on `app.dezky.local/partner` dashboard.
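The MRR aggregation normalizes each active subscription to a monthly amount before summing. This is a minimal sketch. It assumes prices are per seat per billing cycle in DKK, that `cycle` is either `'monthly'` or `'yearly'`, and that `status === 'active'` marks counted subscriptions. The field names are illustrative rather than the actual `prices`/`subscriptions` schema. The partner share is display-only, in line with the `marginPct` note below.

```typescript
type Cycle = 'monthly' | 'yearly'

interface SubscriptionSnapshot {
  status: string
  cycle: Cycle
  seats: number
  pricePerSeat: number // DKK per seat per cycle (assumed unit)
}

const MONTHS_PER_CYCLE: Record<Cycle, number> = { monthly: 1, yearly: 12 }

// Monthly recurring revenue in DKK across active subscriptions.
function partnerMrr(subs: SubscriptionSnapshot[]): number {
  return subs
    .filter((s) => s.status === 'active')
    .reduce((sum, s) => sum + (s.pricePerSeat * s.seats) / MONTHS_PER_CYCLE[s.cycle], 0)
}

// Display-only partner share; no payout is computed from this.
function partnerShare(mrr: number, marginPct: number): number {
  return (mrr * marginPct) / 100
}
```

Summing in øre (integer minor units) would avoid floating-point drift once real invoices depend on these numbers.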
**NOT implemented — deferred to a future billing pipeline:**
- **Stripe integration** (Tier 2): no Stripe Customer/Subscription objects are
created when a tenant is provisioned. No checkout flow. No payment method
capture. The `Subscription.stripe*` placeholder fields stay empty.
- **Invoices** (Tier 2): no invoice generation, no PDF rendering, no
past-due/dunning logic. The customer portal's `/admin/billing` page is
still fixture-driven.
- **Partner payouts** (Tier 3): MRR display uses `Partner.marginPct` for
display purposes only — no actual payout calculation, no payout schedule,
no bank account capture, no Stripe Connect.
- **Multi-currency** (Tier 3): catalog only supports DKK. EUR/USD entries
would need currency-conversion at MRR aggregation time.
- **Plan-change prorating** (Tier 3): changing a tenant's plan mid-cycle
doesn't prorate or generate adjustment invoices.
- **Tax handling** (Tier 3): no VAT calculation, no Stripe Tax wiring, no
reverse-charge handling for cross-border B2B.
When ready for paying customers, the next investment is Tier 2 (Stripe +
invoices). Tier 3 work should wait until there's enough volume to justify it.
## Decisions still open
These need to be made before public launch:
- [ ] Final pricing tiers (MVP, Pro, Enterprise)
- [ ] dezky.com purchase decision ($3,000 via BrandBucket)
- [ ] Final logo design (4 directions explored, need to pick one)
- [ ] Legal entity structure for the new business
- [ ] DPA (databehandleraftale) template
- [ ] Customer support process (ticket system choice)
## Long-term architecture goals
- [ ] Multi-region deployment (Hetzner Falkenstein + Helsinki)
- [ ] Disaster recovery: cross-DC Restic copies
- [ ] ISO 27001 certification via Vanta
- [ ] GDPR Article 30 record of processing activities
- [ ] SOC 2 (later, for enterprise customers)
- [ ] Customer-facing status page (Uptime Kuma or cstate)
- [ ] Public documentation site
- [ ] Self-service migration tooling from M365