5407c04682
New docs/FEATURE-FLAGS.md captures when to add a flag, where the moving parts live, how to use useFeatureFlag from app code, the 4 states + 4 scope axes, kill-switch flow, naming conventions, and the parts we know aren't built yet (partnerSlug eval context, user-level flags, audit-log integration, server-side cache). CLAUDE.md gets a one-line convention entry under "Code conventions" so future devs notice it when grepping for code rules. NEXT-STEPS.md is updated: the feature-flag backend follow-up is now ticked done with a pointer to FEATURE-FLAGS.md for the remaining sub-tasks, and the "What landed" section reflects the real Infrastructure + Flags pages and the notification drawer.
261 lines
13 KiB
Markdown
# Next Steps — After Local Stack Is Running

Once `./scripts/bootstrap.sh` completes successfully and all services are reachable, here's the development roadmap.

## Phase 1: Verify everything works (day 1) — done

- [x] `https://app.dezky.local` shows portal landing page (now the new auth design / post-login home)
- [x] `https://auth.dezky.local` shows Authentik login
- [x] Log into Authentik as admin *(still using generated `AUTHENTIK_BOOTSTRAP_PASSWORD` from `.env` — rotate before exposing to anyone else)*
- [x] Follow `docs/AUTHENTIK-SETUP.md` to configure OIDC providers (ocis + dezky-portal)
- [x] Test OCIS SSO end-to-end (login from `https://files.dezky.local`)
- [x] Verify Stalwart admin UI loads at `https://mail.dezky.local/login` *(root path 404s — admin SPA is at `/login`)*

## Phase 2: Build portal authentication (week 1) — done

Goal: Users can log in to the portal via Authentik.

- [x] Add `nuxt-oidc-auth` to `apps/portal` (`1.0.0-beta.11`)
- [x] Configure Authentik as OIDC provider (generic `oidc` preset with explicit URLs + discovery)
- [x] Implement login/logout flows (`/auth/oidc/login`, `/auth/oidc/logout` from the module)
- [x] Display logged-in user info on the portal home (`pages/index.vue` uses `useOidcAuth()`)
- [x] Add protected routes (`globalMiddlewareEnabled: true`; public pages opt out via `definePageMeta({ auth: false })`)

### Where things live

| Concern | File |
|---|---|
| OIDC module config | `apps/portal/nuxt.config.ts` (`oidc` block) |
| Custom login page | `apps/portal/pages/auth/login.vue` |
| Error states (expired / disabled) | `apps/portal/pages/auth/{expired,disabled}.vue` |
| Post-login landing | `apps/portal/pages/index.vue` |
| Visual shell + tokens | `apps/portal/components/auth/*`, `assets/styles/tokens.css` |
| Brand mark | `apps/portal/components/NodeMark.vue` |

### Dev-mode caveats (clean up before prod)

- `skipAccessTokenParsing: true` in the OIDC config — Authentik's access tokens in this setup aren't reliably JWT-parseable; production should re-evaluate
- `openIdConfiguration` is pinned to the discovery URL because the generic `oidc` preset doesn't ship a default — required for id_token JWKS validation
- `docker-compose.yml` mounts `infrastructure/docker-compose/certs/mkcert-root.pem` into the portal at `/etc/ssl/mkcert-root.pem` and sets `NODE_EXTRA_CA_CERTS` so Node fetch trusts the mkcert root CA. In prod, replace with real CA-signed certs
- Traefik has Docker network aliases for `auth.dezky.local`, `app.dezky.local`, etc. so container-to-Authentik fetch resolves inside the network without going through host `/etc/hosts`

## Phase 3: Tenant data model (week 1-2) — done

- [x] Mongoose schemas in `services/platform-api/src/schemas/` (Tenant, User, Subscription)
- [x] Tenant: slug, name, status, plan, domains, authentikGroupId, ocisSpaceId, stalwartDomain, billingInfo
- [x] User: authentikSubjectId, tenantIds[], email, name, role, active, lastLoginAt
- [x] Subscription: tenantId, plan, status, stripeCustomerId, stripeSubscriptionId, period dates
- [x] CRUD endpoints behind `JwtAuthGuard` (validates Authentik JWT via JWKS)
- [x] Group-based authorization: users see only tenants whose slug matches one of their Authentik `groups`; `dezky-platform-admins` group has global access
- [x] Idempotent seed (`SeedService`) creates the `dezky` tenant + matching subscription on bootstrap
- [x] platform-api exposed at `https://api.dezky.local` (Traefik label, dev only) and via internal `http://platform-api:3001`
- [x] Portal Nitro route at `/api/me` forwards the user's encrypted access token to platform-api — verified end-to-end

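
The group-based rule above can be expressed as a small pure function, which also makes it easy to unit-test. This is a sketch, not the actual `JwtAuthGuard` code: `AuthClaims`, `canAccessTenant` and `visibleTenants` are hypothetical names, and it assumes the verified token carries Authentik's `groups` claim as an array of group names.

```typescript
// Hypothetical subset of the verified Authentik JWT claims used for authorization.
interface AuthClaims {
  sub: string;
  groups?: string[];
}

// Group that grants access to every tenant (per the rule above).
const PLATFORM_ADMIN_GROUP = "dezky-platform-admins";

function isPlatformAdmin(claims: AuthClaims): boolean {
  return (claims.groups ?? []).includes(PLATFORM_ADMIN_GROUP);
}

// A caller may see a tenant if one of their groups equals the tenant slug,
// or if they belong to the platform-admin group.
function canAccessTenant(claims: AuthClaims, tenantSlug: string): boolean {
  return isPlatformAdmin(claims) || (claims.groups ?? []).includes(tenantSlug);
}

// Narrow a tenant list down to the ones the caller may see.
function visibleTenants<T extends { slug: string }>(claims: AuthClaims, tenants: T[]): T[] {
  return tenants.filter((t) => canAccessTenant(claims, t.slug));
}
```

One consequence of matching slugs against group names: a tenant slug that collides with an unrelated Authentik group name would grant that group's members access, so reserving such slugs may be worth considering.
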
### Endpoints

| Method | Path | Notes |
|---|---|---|
| GET | `/health` | open |
| POST/GET | `/tenants`, `/tenants/:slug` | platform admin to create/delete; tenant members can read+update their own |
| GET | `/users/me` | upserts the user on first call from JWT claims |
| GET/POST/PATCH/DELETE | `/users[/:subject]` | platform admin for mutations |
| GET/POST/PATCH | `/subscriptions[/:slug]` | platform admin for mutations |

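
`GET /users/me` upserts the user from token claims on first call. A minimal sketch of that claim-to-document mapping, assuming the standard OIDC `sub`, `email`, `name` and `preferred_username` claims; `OidcClaims`, `UserUpsert` and `userUpsertFromClaims` are hypothetical, and only the field names `authentikSubjectId`, `email`, `name` and `lastLoginAt` come from the User schema listed above.

```typescript
// Subset of standard OIDC claims (OpenID Connect Core 1.0, section 5.1).
interface OidcClaims {
  sub: string;
  email?: string;
  name?: string;
  preferred_username?: string;
}

// Hypothetical upsert payload keyed on the Authentik subject ID.
interface UserUpsert {
  filter: { authentikSubjectId: string };
  set: { email: string; name: string; lastLoginAt: Date };
}

function userUpsertFromClaims(claims: OidcClaims, now: Date = new Date()): UserUpsert {
  if (!claims.sub) throw new Error("token has no sub claim");
  const email = claims.email ?? "";
  // Fall back through the usual display-name claims.
  const displayName = claims.name ?? claims.preferred_username ?? email;
  return {
    filter: { authentikSubjectId: claims.sub },
    set: { email, name: displayName, lastLoginAt: now },
  };
}
```

With Mongoose this could feed something like `User.updateOne(u.filter, { $set: u.set }, { upsert: true })`, so repeat calls refresh `lastLoginAt` instead of creating duplicate users.
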
### Dev-mode caveats (clean up before prod)

- `NUXT_OIDC_TOKEN_KEY` must be base64-encoded 32 bytes (`openssl rand -base64 32`) — NOT hex. If it's wrong, the module fails with only an opaque "Invalid key length" error
- Portal config has `exposeAccessToken: true` so Nitro routes can forward the token; token still never reaches the browser
- The `dezky` group in Authentik is the single tenant for dev. New tenants in Phase 4 need to create matching Authentik groups
- A `dezky-platform-admins` group doesn't exist yet — for now akadmin's membership in `authentik Admins` does NOT grant platform-admin rights. Create that group if you want admin-only endpoints to work for you

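
Since a hex key only surfaces as an "Invalid key length" error at runtime, a startup check can fail earlier with a clearer message. A minimal Node sketch: `isValidOidcTokenKey` and `generateOidcTokenKey` are hypothetical helpers (not part of `nuxt-oidc-auth`), and where you would call them (a Nitro plugin, the bootstrap script) is left open.

```typescript
import { randomBytes } from "node:crypto";

// True when `key` is standard base64 that decodes to exactly 32 bytes.
// A 64-char hex key also fits the base64 alphabet but decodes to 48 bytes,
// so the length check is what rejects it.
function isValidOidcTokenKey(key: string): boolean {
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(key)) return false;
  return Buffer.from(key, "base64").length === 32;
}

// Same output shape as `openssl rand -base64 32`.
function generateOidcTokenKey(): string {
  return randomBytes(32).toString("base64");
}

// Example startup check: only complains when the variable is set but malformed.
const configuredKey = process.env.NUXT_OIDC_TOKEN_KEY;
if (configuredKey !== undefined && !isValidOidcTokenKey(configuredKey)) {
  console.error("NUXT_OIDC_TOKEN_KEY must be base64-encoded 32 bytes (openssl rand -base64 32)");
}
```
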
## Phase 4: Provisioning automation (week 2-3) — partial

Orchestration ships; two of the three integrations are still stubs pending
upstream-specific work.

- [x] `POST /tenants` writes the tenant and triggers reconciliation in one call
- [x] `POST /tenants/:slug/reconcile` retries provisioning for an existing
  tenant; it is idempotent, which is useful when an upstream was down or
  external state drifted
- [x] Per-step state recorded on `Tenant.provisioningStatus` (ok / skipped /
  error / pending) + `Tenant.provisioningErrors` for the last failure
  message; the tenant auto-activates when all steps settle
- [x] Worker: Authentik group creation (real, idempotent)
- [ ] Worker: Stalwart domain + DKIM (stubbed — v0.16 dropped REST in favor
  of JMAP, see follow-up below)
- [ ] Worker: OCIS space (stubbed — needs libregraph `/drives` endpoint
  with service-to-service auth)
- [ ] Worker: onboarding email (no SMTP wired yet)

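
The activation rule and the set of steps a reconcile should retry can be sketched as pure functions over `Tenant.provisioningStatus`. This assumes "settled" means every step is `ok` or `skipped`, so an `error` keeps the tenant out of `active` until a reconcile succeeds; the real rule in `provisioning.service.ts` may treat errors differently. `nextTenantStatus` and `stepsToRetry` are hypothetical names.

```typescript
type StepState = "ok" | "skipped" | "error" | "pending";

// Keyed by step name, e.g. authentik / stalwart / ocis.
type ProvisioningStatus = Record<string, StepState>;

// Assumption: only ok and skipped count as settled.
function isSettled(state: StepState): boolean {
  return state === "ok" || state === "skipped";
}

// Returns "active" once every step has settled, otherwise keeps the current status.
function nextTenantStatus(current: string, status: ProvisioningStatus): string {
  const states = Object.values(status);
  if (states.length > 0 && states.every(isSettled)) return "active";
  return current;
}

// Steps a reconcile call would re-run.
function stepsToRetry(status: ProvisioningStatus): string[] {
  return Object.entries(status)
    .filter(([, state]) => state === "error" || state === "pending")
    .map(([step]) => step);
}
```
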
### Where things live

| Concern | File |
|---|---|
| Integration clients | `services/platform-api/src/integrations/{authentik,stalwart,ocis}.client.ts` |
| Orchestration | `services/platform-api/src/tenants/provisioning.service.ts` |
| `/tenants/:slug/reconcile` | `services/platform-api/src/tenants/tenants.controller.ts` |
| Portal proxy routes | `apps/portal/server/api/tenants/index.post.ts` + `[slug]/reconcile.post.ts` |

### Quick smoke test

From the portal in the browser (signed in), in DevTools:

```js
// Create a fresh tenant
await fetch('/api/tenants', {
  method: 'POST',
  headers: {'Content-Type':'application/json'},
  body: JSON.stringify({ slug: 'acme', name: 'Acme Co', plan: 'pro' })
}).then(r => r.json())

// Re-run provisioning (idempotent)
await fetch('/api/tenants/acme/reconcile', { method: 'POST' }).then(r => r.json())
```

Response should include `provisioningStatus: { authentik: 'ok', stalwart: 'skipped', ocis: 'skipped' }`
and `status: 'active'`. Verify the Authentik group exists via the admin UI at
`/if/admin/#/identity/groups`.

### Stub follow-up work

**Stalwart (JMAP)** — v0.16 [moved management off REST](https://stalw.art/docs/api/management/overview).
Need a minimal JMAP client that wraps `Domain/set` (create), `Domain/get`
(idempotency check), `Principal/set` (DKIM-keyed signing identity). Auth
via the persistent admin's bearer token from the OAuth flow we already use
for the web UI.

**OCIS (libregraph)** — `POST /graph/v1.0/drives` with body
`{ "name": "<slug>", "driveType": "project" }`. Needs service-to-service
auth: either an OIDC client_credentials grant (requires registering a new
Authentik provider for the worker) or the IDM admin user's bearer token.

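
A sketch of the OCIS call described above, built as a plain `fetch` request so the token source stays pluggable. `buildCreateDriveRequest` is hypothetical; the path and body come from the note above, while how `bearerToken` is obtained (client_credentials vs. the IDM admin's token) is still the open question.

```typescript
// Shape that can be passed straight to fetch(url, init).
interface DriveRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the libregraph "create project drive" request for a tenant slug.
function buildCreateDriveRequest(ocisBaseUrl: string, bearerToken: string, slug: string): DriveRequest {
  // An absolute path replaces any path on the base URL, so pass the bare origin.
  const url = new URL("/graph/v1.0/drives", ocisBaseUrl).toString();
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${bearerToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ name: slug, driveType: "project" }),
    },
  };
}
```

Usage would be along the lines of `const { url, init } = buildCreateDriveRequest('https://files.dezky.local', token, 'acme'); await fetch(url, init)`. To stay idempotent like the Authentik step, the worker would also need to look up an existing project drive with that name before creating one; that lookup is not shown.
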
### Authentik API examples (for the eventual user-creation flow)

```typescript
// Create user
await authentikClient.coreUsersCreate({
  username: user.email,
  email: user.email,
  name: user.name,
  groups: [authentikGroupId],
})
```

## Operator portal — out-of-band track — shipped (O.0–O.9)

`operator.dezky.local` is live as a separate Nuxt app with its own
`dezky-operator` Authentik OAuth client. Full plan and execution log in
[`OPERATOR-PLAN.md`](./OPERATOR-PLAN.md).

What landed:
- `services/provisioning` renamed to `services/platform-api`
- Audience-aware `JwtAuthGuard` accepts both `dezky-portal` and `dezky-operator`
- `Partner` schema + CRUD endpoints, `Tenant.partnerId` ref
- Tenant lifecycle (suspend / resume) gated by `OperatorGuard`
- **Real Infrastructure live-probes** — `GET /health/platform` runs TCP +
  HTTP probes against every neighbouring service; the UI splits services
  into "Live" and "Planned" and shows their actual status.
- **Real feature-flag system** — `Flag` schema + CRUD + bulk eval +
  operator UI + `useFeatureFlag` composable in the portal. Hash-based
  deterministic rollout. See [`FEATURE-FLAGS.md`](./FEATURE-FLAGS.md).
- Operator UI: Overview (real KPIs), Tenants (7-tab detail view, including
  a Danger tab), Partners (attach/detach), Users, Operator team, real
  Infrastructure, real Feature flags. Visual-only Audit. Placeholders for
  Support/Billing/Reports/Settings.
- Interactions: ⌘K command palette, impersonation stub (modal + banner),
  incident modal, tweaks panel, **notification drawer**.

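
The shipped evaluator is documented in `FEATURE-FLAGS.md`; the following only illustrates what hash-based deterministic rollout means. It assumes evaluation is keyed on the tenant slug and uses 32-bit FNV-1a (computed over UTF-16 code units, which matches the byte-wise version for ASCII keys); `fnv1a32`, `rolloutBucket` and `isInRollout` are hypothetical names, not the real API.

```typescript
// 32-bit FNV-1a; Math.imul keeps the multiply in 32-bit integer space.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // back to an unsigned 32-bit value
}

// Stable bucket in [0, 100) for a (flag, tenant) pair.
function rolloutBucket(flagKey: string, tenantSlug: string): number {
  return fnv1a32(`${flagKey}:${tenantSlug}`) % 100;
}

// A tenant is in the rollout when its bucket falls below the percentage.
function isInRollout(flagKey: string, tenantSlug: string, percent: number): boolean {
  if (percent <= 0) return false;
  if (percent >= 100) return true;
  return rolloutBucket(flagKey, tenantSlug) < percent;
}
```

Prefixing the flag key means different flags pick independent subsets of tenants, and because a tenant's bucket never changes, raising the percentage only ever adds tenants to a rollout.
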
### Follow-ups before the operator portal hits production

In rough priority order, largely lifted from OPERATOR-PLAN.md:

- [ ] **Real impersonation flow** — OAuth Token Exchange (RFC 8693),
  `act` claim on customer portal, audit on entry+exit, banner with
  origin operator identity
- [ ] **Real audit log collection** — `platform_audit` Mongo collection,
  written by platform-api on every privileged action; stream from there
  instead of `data/fixtures.ts`
- [x] **Feature flag backend** — shipped. See
  [`FEATURE-FLAGS.md`](./FEATURE-FLAGS.md). Remaining sub-tasks:
  partnerSlug eval context, user-level flags, audit-log integration,
  server-side cache (all called out in that doc).
- [ ] **Incident management backend** — `Incident` schema + paging
  (PagerDuty / OpsGenie / custom). Until then, IncidentModal is mock.
- [ ] **Support ticket queue** — `SupportTicket` schema + email-in
  ingestion from a dedicated mailbox via Stalwart
- [ ] **Self-serve Partner portal at `partner.dezky.local`** — own Nuxt
  app, own OAuth client, scoped to a partner's own customers
- [ ] **Real environment switcher** — currently cosmetic; would need
  separate API endpoints per env, separate Authentik tenants
- [ ] **Real on-call indicator** — integration with the paging system from
  the incident backend
- [ ] **Operator workspace impersonation in OCIS/Stalwart** — operator
  tooling reaches into the customer's files + mail for support, with
  the same audit trail
- [ ] **MRR aggregation on Partner** when Subscription gains real pricing
- [ ] **MFA-required Authentik policy** on the `dezky-operator` provider
  (deferred from O.1)
- [ ] **Delete throwaway endpoints** added during verification:
  `apps/operator/server/api/_verify-token.get.ts`,
  `apps/portal/server/api/_verify-token.get.ts`,
  `apps/operator/server/api/operator-smoke-test.post.ts`,
  `apps/portal/server/api/partners/index.post.ts`

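
For the impersonation item, the request side of RFC 8693 is small. A sketch of the token-endpoint form body and of reading the operator identity back out of the `act` claim for the banner and audit entry. It assumes the token endpoint supports RFC 8693 (whether Authentik does is unverified) and leaves open how a subject token for the customer is obtained; `buildTokenExchangeBody`, `actorSubject` and the input/claims types are hypothetical.

```typescript
// Grant type and token-type URNs defined by RFC 8693.
const TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange";
const ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token";

interface ImpersonationInput {
  customerSubjectToken: string; // token representing the customer being impersonated
  operatorActorToken: string;   // the operator's own token; identifies the actor
  audience: string;             // target client, e.g. the customer portal (assumption)
}

// Form body for a token exchange request (sent as application/x-www-form-urlencoded).
function buildTokenExchangeBody(input: ImpersonationInput): URLSearchParams {
  return new URLSearchParams({
    grant_type: TOKEN_EXCHANGE_GRANT,
    subject_token: input.customerSubjectToken,
    subject_token_type: ACCESS_TOKEN_TYPE,
    actor_token: input.operatorActorToken,
    actor_token_type: ACCESS_TOKEN_TYPE,
    audience: input.audience,
  });
}

// Decoded claims of the exchanged token; `act` carries the acting party (RFC 8693, section 4.1).
interface ExchangedTokenClaims {
  sub: string;
  act?: { sub: string };
}

// Operator identity for the banner and audit log, if the token is an impersonation token.
function actorSubject(claims: ExchangedTokenClaims): string | undefined {
  return claims.act?.sub;
}
```

The body would be POSTed to the token endpoint, typically with client authentication; the `act` claim in the resulting token is what the audit trail and banner would key on.
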
## Phase 5: Custom webmail (week 3-4)

Goal: Branded webmail client using Stalwart's JMAP API.

- [ ] Add JMAP client library to portal
- [ ] Build inbox view in Nuxt
- [ ] Build compose dialog
- [ ] Build message view with thread support
- [ ] Style to match Dezky branding

JMAP (RFC 8620, with RFC 8621 for mail) is a modern JSON-over-HTTP protocol that batches several method calls into one request, so it is clean to work with.

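
To give a feel for the protocol, here is a sketch of the single JMAP request an inbox view might send: an `Email/query` for the newest messages in a mailbox plus an `Email/get` that back-references its result (RFC 8620, section 3.7), so the list renders in one round trip. Method names, arguments and capability URNs follow RFC 8620/8621; `buildInboxRequest` is hypothetical, and the `accountId` and inbox `mailboxId` are assumed to come from the JMAP session resource and a prior `Mailbox/get`.

```typescript
// JMAP request envelope (RFC 8620, section 3.3).
type Invocation = [methodName: string, args: Record<string, unknown>, callId: string];

interface JmapRequest {
  using: string[];
  methodCalls: Invocation[];
}

// Query the newest messages in a mailbox and fetch their summaries in the same request.
function buildInboxRequest(accountId: string, mailboxId: string, limit = 50): JmapRequest {
  return {
    using: ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
    methodCalls: [
      [
        "Email/query",
        {
          accountId,
          filter: { inMailbox: mailboxId },
          sort: [{ property: "receivedAt", isAscending: false }],
          collapseThreads: true, // one row per thread in the list
          limit,
        },
        "q",
      ],
      [
        "Email/get",
        {
          accountId,
          // Result reference: reuse the ids returned by call "q".
          "#ids": { resultOf: "q", name: "Email/query", path: "/ids" },
          properties: ["id", "threadId", "subject", "from", "receivedAt", "preview"],
        },
        "g",
      ],
    ],
  };
}
```

The object is POSTed as JSON to the `apiUrl` advertised in the session resource.
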
## Phase 6: Production migration prep (week 4+)

When the local stack is solid and you have 2-3 pilot customers interested:

- [ ] Order Hetzner AX41-NVMe
- [ ] Order Storage Box BX11 (Falkenstein)
- [ ] Enable Hetzner Object Storage (bucket: dezky-ocis-prod)
- [ ] Build Terraform module for Hetzner provisioning
- [ ] Build Ansible playbook for bare-metal Stalwart deployment
- [ ] Set up k3s on the cloud server
- [ ] Migrate compose to Helm charts
- [ ] Configure Let's Encrypt via cert-manager
- [ ] Set up Restic backup jobs to Storage Box + B2

## Phase 7: Add Zulip and Jitsi (when chat/video needed)

These were excluded from MVP for simplicity. When ready:

- [ ] Create `infrastructure/docker-compose/docker-compose.optional.yml`
- [ ] Add Zulip stack (server + db + worker)
- [ ] Add Jitsi stack (web + prosody + jicofo + jvb)
- [ ] Configure OIDC integration with Authentik
- [ ] Add to portal launcher

## Decisions still open

These need to be made before public launch:

- [ ] Final pricing tiers (MVP, Pro, Enterprise)
- [ ] dezky.com purchase decision ($3,000 via BrandBucket)
- [ ] Final logo design (4 directions explored, need to pick one)
- [ ] Legal entity structure for the new business
- [ ] DPA (databehandleraftale) template
- [ ] Customer support process (ticket system choice)

## Long-term architecture goals

- [ ] Multi-region deployment (Hetzner Falkenstein + Helsinki)
- [ ] Disaster recovery: cross-DC Restic copies
- [ ] ISO 27001 certification via Vanta
- [ ] GDPR Article 30 record of processing activities
- [ ] SOC 2 (later, for enterprise customers)
- [ ] Customer-facing status page (Uptime Kuma or cstate)
- [ ] Public documentation site
- [ ] Self-service migration tooling from M365