dezky/docs/NEXT-STEPS.md
Ronni Baslund 5407c04682 docs: feature-flag usage guide + cross-links
New docs/FEATURE-FLAGS.md captures when to add a flag, where the moving
parts live, how to use useFeatureFlag from app code, the 4 states + 4
scope axes, kill-switch flow, naming conventions, and the parts we know
aren't built yet (partnerSlug eval context, user-level flags, audit-log
integration, server-side cache).

CLAUDE.md gets a one-line convention entry under "Code conventions" so
future devs notice it when grepping for code rules. NEXT-STEPS.md is
updated: the feature-flag backend follow-up is now ticked done with a
pointer to FEATURE-FLAGS.md for the remaining sub-tasks, and the
"What landed" section reflects the real Infrastructure + Flags pages
and the notification drawer.
2026-05-24 19:29:24 +02:00

# Next Steps — After Local Stack Is Running
Once `./scripts/bootstrap.sh` completes successfully and all services are reachable, here's the development roadmap.
## Phase 1: Verify everything works (day 1) — done
- [x] `https://app.dezky.local` shows portal landing page (now the new auth design / post-login home)
- [x] `https://auth.dezky.local` shows Authentik login
- [x] Log into Authentik as admin *(still using generated `AUTHENTIK_BOOTSTRAP_PASSWORD` from `.env` — rotate before exposing to anyone else)*
- [x] Follow `docs/AUTHENTIK-SETUP.md` to configure OIDC providers (ocis + dezky-portal)
- [x] Test OCIS SSO end-to-end (login from `https://files.dezky.local`)
- [x] Verify Stalwart admin UI loads at `https://mail.dezky.local/login` *(root path 404s — admin SPA is at `/login`)*
## Phase 2: Build portal authentication (week 1) — done
Goal: Users can log in to the portal via Authentik.
- [x] Add `nuxt-oidc-auth` to `apps/portal` (`1.0.0-beta.11`)
- [x] Configure Authentik as OIDC provider (generic `oidc` preset with explicit URLs + discovery)
- [x] Implement login/logout flows (`/auth/oidc/login`, `/auth/oidc/logout` from the module)
- [x] Display logged-in user info on the portal home (`pages/index.vue` uses `useOidcAuth()`)
- [x] Add protected routes (`globalMiddlewareEnabled: true`; public pages opt out via `definePageMeta({ auth: false })`)
### Where things live
| Concern | File |
|---|---|
| OIDC module config | `apps/portal/nuxt.config.ts` (`oidc` block) |
| Custom login page | `apps/portal/pages/auth/login.vue` |
| Error states (expired / disabled) | `apps/portal/pages/auth/{expired,disabled}.vue` |
| Post-login landing | `apps/portal/pages/index.vue` |
| Visual shell + tokens | `apps/portal/components/auth/*`, `assets/styles/tokens.css` |
| Brand mark | `apps/portal/components/NodeMark.vue` |
### Dev-mode caveats (clean up before prod)
- `skipAccessTokenParsing: true` in the OIDC config — Authentik's access tokens in this setup aren't reliably JWT-parseable; production should re-evaluate
- `openIdConfiguration` is pinned to the discovery URL because the generic `oidc` preset doesn't ship a default — required for id_token JWKS validation
- `docker-compose.yml` mounts `infrastructure/docker-compose/certs/mkcert-root.pem` into the portal at `/etc/ssl/mkcert-root.pem` and sets `NODE_EXTRA_CA_CERTS` so Node fetch trusts the mkcert root CA. In prod, replace with real CA-signed certs
- Traefik has Docker network aliases for `auth.dezky.local`, `app.dezky.local`, etc. so container-to-Authentik fetch resolves inside the network without going through host `/etc/hosts`
## Phase 3: Tenant data model (week 1-2) — done
- [x] Mongoose schemas in `services/platform-api/src/schemas/` (Tenant, User, Subscription)
- [x] Tenant: slug, name, status, plan, domains, authentikGroupId, ocisSpaceId, stalwartDomain, billingInfo
- [x] User: authentikSubjectId, tenantIds[], email, name, role, active, lastLoginAt
- [x] Subscription: tenantId, plan, status, stripeCustomerId, stripeSubscriptionId, period dates
- [x] CRUD endpoints behind `JwtAuthGuard` (validates Authentik JWT via JWKS)
- [x] Group-based authorization: users see only tenants whose slug matches one of their Authentik `groups`; `dezky-platform-admins` group has global access
- [x] Idempotent seed (`SeedService`) creates the `dezky` tenant + matching subscription on bootstrap
- [x] platform-api exposed at `https://api.dezky.local` (Traefik label, dev only) and via internal `http://platform-api:3001`
- [x] Portal Nitro route at `/api/me` forwards the user's encrypted access token to platform-api — verified end-to-end
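The group-based rule above can be sketched as a pure function. This is a minimal illustration, not the guard that ships in platform-api: `TenantSummary`, `isPlatformAdmin`, and `visibleTenants` are hypothetical names, and it assumes the JWT `groups` claim has already been extracted into a string array.

```typescript
// Sketch only: names below are illustrative, not the platform-api implementation.
interface TenantSummary {
  slug: string;
  name: string;
}

// Group name taken from this document.
const PLATFORM_ADMIN_GROUP = 'dezky-platform-admins';

/** True when the caller's groups include the platform-admin group. */
function isPlatformAdmin(groups: readonly string[]): boolean {
  return groups.includes(PLATFORM_ADMIN_GROUP);
}

/** Returns the tenants a caller may see, given their Authentik groups. */
function visibleTenants(
  groups: readonly string[],
  tenants: readonly TenantSummary[],
): TenantSummary[] {
  // Platform admins have global access.
  if (isPlatformAdmin(groups)) return [...tenants];
  // Everyone else sees only tenants whose slug matches one of their groups.
  const allowed = new Set(groups);
  return tenants.filter((tenant) => allowed.has(tenant.slug));
}
```

Keeping the rule as a pure function makes it easy to unit-test separately from JWT validation.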
### Endpoints
| Method | Path | Notes |
|---|---|---|
| GET | `/health` | open |
| POST/GET | `/tenants`, `/tenants/:slug` | platform admin to create/delete; tenant members can read+update their own |
| GET | `/users/me` | upserts the user on first call from JWT claims |
| GET/POST/PATCH/DELETE | `/users[/:subject]` | platform admin for mutations |
| GET/POST/PATCH | `/subscriptions[/:slug]` | platform admin for mutations |
### Dev-mode caveats (clean up before prod)
- `NUXT_OIDC_TOKEN_KEY` must be base64-encoded 32 bytes (`openssl rand -base64 32`) — NOT hex. Module silently fails with "Invalid key length" if wrong
- Portal config has `exposeAccessToken: true` so Nitro routes can forward the token; token still never reaches the browser
- The `dezky` group in Authentik is the single tenant for dev. New tenants in Phase 4 need to create matching Authentik groups
- A `dezky-platform-admins` group doesn't exist yet — for now akadmin's membership in `authentik Admins` does NOT grant platform-admin rights. Create that group if you want admin-only endpoints to work for you
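Because a wrong `NUXT_OIDC_TOKEN_KEY` format is easy to miss, a startup check along these lines may help. It is a sketch under assumptions: `isValidTokenKey` is a hypothetical helper, not part of `nuxt-oidc-auth`, and it only checks the format described above (standard base64 decoding to exactly 32 bytes).

```typescript
import { Buffer } from 'node:buffer';
import { randomBytes } from 'node:crypto';

/** Checks that a key is standard base64 that decodes to exactly 32 bytes. */
function isValidTokenKey(key: string): boolean {
  const decoded = Buffer.from(key, 'base64');
  // Buffer.from skips invalid characters, so re-encode to catch malformed input.
  return decoded.length === 32 && decoded.toString('base64') === key;
}

// Equivalent in shape to the output of `openssl rand -base64 32`.
const goodKey = randomBytes(32).toString('base64');

// A 64-character hex string decodes to 48 bytes when read as base64,
// which is why a hex key is rejected.
const hexKey = randomBytes(32).toString('hex');
```

Here `isValidTokenKey(goodKey)` passes while `isValidTokenKey(hexKey)` does not, which matches the "Invalid key length" failure mode.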
## Phase 4: Provisioning automation (week 2-3) — partial
Orchestration ships; two of the three integrations are still stubs pending
upstream-specific work.
- [x] `POST /tenants` writes tenant and triggers reconciliation in one call
- [x] `POST /tenants/:slug/reconcile` retries provisioning for an existing
tenant — idempotent, useful when an upstream was down or external
state drifted
- [x] Per-step state recorded on `Tenant.provisioningStatus` (ok / skipped /
error / pending) + `Tenant.provisioningErrors` for the last failure
message; tenant auto-activates when all steps settle
- [x] Worker: Authentik group creation (real, idempotent)
- [ ] Worker: Stalwart domain + DKIM (stubbed — v0.16 dropped REST in favor
of JMAP, see follow-up below)
- [ ] Worker: OCIS space (stubbed — needs libregraph `/drives` endpoint
with service-to-service auth)
- [ ] Worker: onboarding email (no SMTP wired yet)
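The auto-activation rule can be read as "every step is either `ok` or `skipped`". The sketch below assumes that reading, which is consistent with the smoke-test response in this document where skipped steps still yield an active tenant. The type and function names are hypothetical, and `'provisioning'` is a placeholder for whatever the pre-activation status is called.

```typescript
// Step states as listed in this document.
type StepState = 'ok' | 'skipped' | 'error' | 'pending';
type ProvisioningStatus = Record<string, StepState>;

/** True when at least one step exists and none is pending or failed. */
function isSettled(status: ProvisioningStatus): boolean {
  const states = Object.values(status);
  return states.length > 0 && states.every((s) => s === 'ok' || s === 'skipped');
}

/** Activates the tenant once provisioning settles; otherwise keeps the current status. */
function nextTenantStatus(current: string, status: ProvisioningStatus): string {
  return isSettled(status) ? 'active' : current;
}
```

Since the check depends only on recorded state, calling it again after a reconcile run is safe, in line with the idempotent design described above.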
### Where things live
| Concern | File |
|---|---|
| Integration clients | `services/platform-api/src/integrations/{authentik,stalwart,ocis}.client.ts` |
| Orchestration | `services/platform-api/src/tenants/provisioning.service.ts` |
| `/tenants/:slug/reconcile` | `services/platform-api/src/tenants/tenants.controller.ts` |
| Portal proxy routes | `apps/portal/server/api/tenants/index.post.ts` + `[slug]/reconcile.post.ts` |
### Quick smoke test
From the portal in the browser (signed in), in DevTools:
```js
// Create a fresh tenant
await fetch('/api/tenants', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ slug: 'acme', name: 'Acme Co', plan: 'pro' }),
}).then(r => r.json())

// Re-run provisioning (idempotent)
await fetch('/api/tenants/acme/reconcile', { method: 'POST' }).then(r => r.json())
```
Response should include `provisioningStatus: { authentik: 'ok', stalwart:
'skipped', ocis: 'skipped' }` and `status: 'active'`. Verify the Authentik
group exists via the admin UI at `/if/admin/#/identity/groups`.
### Stub follow-up work
**Stalwart (JMAP)** — v0.16 [moved management off REST](https://stalw.art/docs/api/management/overview).
Need a minimal JMAP client that wraps `Domain/set` (create), `Domain/get`
(idempotency check), `Principal/set` (DKIM-keyed signing identity). Auth
via the persistent admin's bearer token from the OAuth flow we already use
for the web UI.
**OCIS (libregraph):** `POST /graph/v1.0/drives` with body
`{ "name": "<slug>", "driveType": "project" }`. Needs service-to-service
auth: either an OIDC client_credentials grant (requires registering a new
Authentik provider for the worker) or the IDM admin user's bearer token.
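As a starting point for the OCIS stub, the request described above could be assembled like this. It is a sketch: `buildCreateDriveRequest` and `DriveRequest` are hypothetical names, the base URL is whatever the worker uses to reach OCIS, and how the bearer token is obtained (client_credentials or the IDM admin user) is left open, as in the text.

```typescript
import { URL } from 'node:url';

interface DriveRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

/** Builds (but does not send) the libregraph call that creates a project space. */
function buildCreateDriveRequest(
  baseUrl: string,
  slug: string,
  bearerToken: string,
): DriveRequest {
  return {
    url: new URL('/graph/v1.0/drives', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${bearerToken}`,
        'Content-Type': 'application/json',
      },
      // Body shape taken from this document.
      body: JSON.stringify({ name: slug, driveType: 'project' }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. Separating construction from sending keeps the request shape testable without a running OCIS.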
### Authentik API examples (for the eventual user-creation flow)
```typescript
// Create user
await authentikClient.coreUsersCreate({
  username: user.email,
  email: user.email,
  name: user.name,
  groups: [authentikGroupId],
})
```
## Operator portal: out-of-band track, shipped (O.0–O.9)
`operator.dezky.local` is live as a separate Nuxt app with its own
`dezky-operator` Authentik OAuth client. Full plan and execution log in
[`OPERATOR-PLAN.md`](./OPERATOR-PLAN.md).
What landed:
- `services/provisioning` renamed to `services/platform-api`
- Audience-aware JwtAuthGuard accepts both `dezky-portal` and `dezky-operator`
- `Partner` schema + CRUD endpoints, `Tenant.partnerId` ref
- Tenant lifecycle (suspend / resume) gated by OperatorGuard
- **Real Infrastructure live-probes** — `GET /health/platform` runs TCP +
HTTP probes against every neighbouring service; UI splits "Live" vs
"Planned" with honest status.
- **Real feature-flag system** — `Flag` schema + CRUD + bulk eval +
operator UI + `useFeatureFlag` composable in the portal. Hash-based
deterministic rollout. See [`FEATURE-FLAGS.md`](./FEATURE-FLAGS.md).
- Operator UI: Overview (real KPIs), Tenants (7-tab detail w/ Danger),
Partners (attach/detach), Users, Operator team, real Infrastructure,
real Feature flags. Visual-only Audit. Placeholders for
Support/Billing/Reports/Settings.
- Interactions: ⌘K command palette, impersonation stub (modal + banner),
incident modal, tweaks panel, **notification drawer**.
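For readers unfamiliar with hash-based deterministic rollout, the general technique looks roughly like the sketch below. The shipped algorithm is documented in `FEATURE-FLAGS.md` and may differ: the choice of SHA-256, the `flagKey:subjectId` input format, and the function names here are all assumptions for illustration.

```typescript
import { createHash } from 'node:crypto';

/** Maps a (flag, subject) pair to a stable bucket in the range 0..99. */
function rolloutBucket(flagKey: string, subjectId: string): number {
  const digest = createHash('sha256').update(`${flagKey}:${subjectId}`).digest();
  // The first 4 bytes of the digest give a uniformly distributed unsigned integer.
  return digest.readUInt32BE(0) % 100;
}

/** True when the subject falls inside the rollout percentage (0 to 100). */
function isInRollout(flagKey: string, subjectId: string, percentage: number): boolean {
  return rolloutBucket(flagKey, subjectId) < percentage;
}
```

Because the bucket depends only on its inputs, a tenant that is in at 30% stays in when the rollout widens to 60%, and including the flag key in the hash keeps different flags from always selecting the same tenants.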
### Follow-ups before operator hits production
In rough priority order (lifted in bulk from OPERATOR-PLAN.md):
- [ ] **Real impersonation flow** — OAuth Token Exchange (RFC 8693),
`act` claim on customer portal, audit on entry+exit, banner with
origin operator identity
- [ ] **Real audit log collection:** `platform_audit` Mongo collection,
written by platform-api on every privileged action; stream from there
instead of `data/fixtures.ts`
- [x] **Feature flag backend** — shipped. See
[`FEATURE-FLAGS.md`](./FEATURE-FLAGS.md). Remaining sub-tasks:
partnerSlug eval context, user-level flags, audit-log integration,
server-side cache (all called out in that doc).
- [ ] **Incident management backend:** `Incident` schema + paging
(PagerDuty / OpsGenie / custom). Until then, IncidentModal is mock.
- [ ] **Support ticket queue:** `SupportTicket` schema + email-in
ingestion from a dedicated mailbox via Stalwart
- [ ] **Self-serve Partner portal at `partner.dezky.local`** — own Nuxt
app, own OAuth client, scoped to a partner's own customers
- [ ] **Real environment switcher** — currently cosmetic; would need
separate API endpoints per env, separate Authentik tenants
- [ ] **Real on-call indicator** — integration with the paging system from
the incident backend
- [ ] **Operator workspace impersonation in OCIS/Stalwart** — operator
tooling reaches into the customer's files + mail for support, with
the same audit trail
- [ ] **MRR aggregation on Partner** when Subscription gains real pricing
- [ ] **MFA-required Authentik policy** on the `dezky-operator` provider
(deferred from O.1)
- [ ] **Delete throwaway endpoints** added during verification:
`apps/operator/server/api/_verify-token.get.ts`,
`apps/portal/server/api/_verify-token.get.ts`,
`apps/operator/server/api/operator-smoke-test.post.ts`,
`apps/portal/server/api/partners/index.post.ts`
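For the impersonation follow-up, RFC 8693 defines the `act` claim as a JSON object whose `sub` identifies the acting party. A banner on the customer portal could read it roughly as follows. This is a sketch: the function names and banner wording are hypothetical, and it assumes the token has already been validated and decoded.

```typescript
// Shape of the RFC 8693 actor claim; nested `act` values describe earlier actors.
interface ActorClaim {
  sub: string;
  act?: ActorClaim;
}

interface TokenClaims {
  sub: string;
  act?: ActorClaim;
}

/** Returns the current actor (the operator), or null for a normal session. */
function impersonatingActor(claims: TokenClaims): string | null {
  // The outermost `act` claim identifies the current actor.
  return claims.act?.sub ?? null;
}

/** Banner text for impersonated sessions; null when no banner is needed. */
function impersonationBanner(claims: TokenClaims): string | null {
  const actor = impersonatingActor(claims);
  return actor === null ? null : `Viewing as ${claims.sub} (operator: ${actor})`;
}
```

The same `act.sub` value would be a natural field to record in the audit entries written on entry and exit.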
## Phase 5: Custom webmail (week 3-4)
Goal: Branded webmail client using Stalwart's JMAP API.
- [ ] Add JMAP client library to portal
- [ ] Build inbox view in Nuxt
- [ ] Build compose dialog
- [ ] Build message view with thread support
- [ ] Style to match Dezky branding
JMAP (RFC 8620) is a modern JSON-over-HTTP protocol that batches several method calls into one request, which makes it clean to work with.
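To give a feel for the inbox view, the sketch below builds a single JMAP request that queries the newest messages in a mailbox and fetches their headers through a back-reference, as defined in RFC 8620 and RFC 8621. `buildInboxRequest` is a hypothetical helper; the account and mailbox IDs would come from the JMAP session resource and a prior `Mailbox/get`, and no specific client library is assumed.

```typescript
type MethodCall = [name: string, args: Record<string, unknown>, callId: string];

interface JmapRequest {
  using: string[];
  methodCalls: MethodCall[];
}

/** Builds one request that lists the newest emails in a mailbox with their headers. */
function buildInboxRequest(accountId: string, inboxId: string, limit = 25): JmapRequest {
  return {
    using: ['urn:ietf:params:jmap:core', 'urn:ietf:params:jmap:mail'],
    methodCalls: [
      ['Email/query', {
        accountId,
        filter: { inMailbox: inboxId },
        sort: [{ property: 'receivedAt', isAscending: false }],
        limit,
      }, 'q'],
      ['Email/get', {
        accountId,
        // Back-reference: reuse the ids returned by the query call above.
        '#ids': { resultOf: 'q', name: 'Email/query', path: '/ids' },
        properties: ['id', 'threadId', 'subject', 'from', 'receivedAt', 'preview'],
      }, 'g'],
    ],
  };
}
```

The JSON-serialized result is POSTed to the `apiUrl` advertised by the server's session resource, so the list view needs one round trip instead of two.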
## Phase 6: Production migration prep (week 4+)
When the local stack is solid and you have 2-3 pilot customers interested:
- [ ] Order Hetzner AX41-NVMe
- [ ] Order Storage Box BX11 (Falkenstein)
- [ ] Enable Hetzner Object Storage (bucket: dezky-ocis-prod)
- [ ] Build Terraform module for Hetzner provisioning
- [ ] Build Ansible playbook for bare-metal Stalwart deployment
- [ ] Set up k3s on the cloud server
- [ ] Migrate compose to Helm charts
- [ ] Configure Let's Encrypt via cert-manager
- [ ] Set up Restic backup jobs to Storage Box + B2
## Phase 7: Add Zulip and Jitsi (when chat/video needed)
These were excluded from MVP for simplicity. When ready:
- [ ] Create `infrastructure/docker-compose/docker-compose.optional.yml`
- [ ] Add Zulip stack (server + db + worker)
- [ ] Add Jitsi stack (web + prosody + jicofo + jvb)
- [ ] Configure OIDC integration with Authentik
- [ ] Add to portal launcher
## Decisions still open
These need to be made before public launch:
- [ ] Final pricing tiers (MVP, Pro, Enterprise)
- [ ] dezky.com purchase decision ($3,000 via BrandBucket)
- [ ] Final logo design (4 directions explored, need to pick one)
- [ ] Legal entity structure for the new business
- [ ] DPA (databehandleraftale) template
- [ ] Customer support process (ticket system choice)
## Long-term architecture goals
- [ ] Multi-region deployment (Hetzner Falkenstein + Helsinki)
- [ ] Disaster recovery: cross-DC Restic copies
- [ ] ISO 27001 certification via Vanta
- [ ] GDPR Article 30 record of processing activities
- [ ] SOC 2 (later, for enterprise customers)
- [ ] Customer-facing status page (Uptime Kuma or cstate)
- [ ] Public documentation site
- [ ] Self-service migration tooling from M365