dezky/docs/FEATURE-FLAGS.md

# Feature flags

Dezky has a real, tenant-aware feature flag system. Use it whenever you ship
something that should roll out incrementally, be gated per plan/tenant, or
needs an instant kill switch in production. Don't push risky behavior behind
hardcoded `if (env === ...)` checks — flip a flag instead.

## When to add a flag

- The change can break things for real customers and you want a kill switch
- You want to ship to internal / friendly tenants first
- The feature is gated by plan tier (Pro/Enterprise)
- You're doing trunk-based development on a feature that takes more than
  one PR to land
- The feature is compliance-sensitive (GDPR export, retention, audit), where
  a kill switch is mandatory

When you **don't** need one: pure UI tweaks, bug fixes, anything that's safe
to release to everyone at once.

## Where it lives

| Layer | Path | What it does |
|---|---|---|
| Schema + service | `services/platform-api/src/flags/` | CRUD + bulk eval (hash-based rollout) |
| Operator UI | `apps/operator/pages/flags.vue` + `components/FlagDetail.vue` | List, side panel, kill-switch, change history |
| Portal helper | `apps/portal/composables/useFeatureFlag.ts` | What you'll import from app code |
| Seed | `services/platform-api/src/seed/seed.service.ts` (`FLAG_SEEDS`) | The 10 flags created on bootstrap |

## Using a flag from app code

In the customer portal:

```vue
<script setup lang="ts">
const showNewInbox = useFeatureFlag('jmap_native_v2')
</script>

<template>
  <NewInbox v-if="showNewInbox" />
  <LegacyInbox v-else />
</template>
```

- One bulk eval per session — the composable shares a module-level cache.
- Fail-closed: every flag stays `false` if the eval call errors.
- The returned ref is reactive — gated UI stays hidden during the ~25ms
  round-trip and appears when the answer lands.
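
The shared cache and fail-closed behavior described above can be illustrated with a framework-free sketch. This is not the code in `useFeatureFlag.ts`: `createFlagLookup` and its `fetchAll` parameter are hypothetical names, and the real composable returns a reactive ref rather than a promise.

```ts
// Hypothetical sketch: one shared bulk evaluation, fail-closed lookups.
type FlagMap = Record<string, boolean>;

function createFlagLookup(fetchAll: () => Promise<FlagMap>) {
  // Shared by every caller, so the bulk eval runs at most once.
  let cached: Promise<FlagMap> | null = null;

  const load = (): Promise<FlagMap> => {
    if (cached === null) {
      // Fail-closed: if the eval call errors, fall back to an empty map,
      // so every flag reads as false.
      cached = Promise.resolve()
        .then(fetchAll)
        .catch((): FlagMap => ({}));
    }
    return cached;
  };

  // Unknown keys also read as false.
  return async (key: string): Promise<boolean> => (await load())[key] === true;
}
```

Caching the promise rather than the resolved value means that callers arriving during the round-trip share the same in-flight request.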

For multi-flag panels or long-lived sessions:

```ts
const { flags, ready, refresh } = useFeatureFlags()
```

The composable's tenant context comes from the signed-in user's JWT — no
slug parameter. Operator-side checks (where there's no "current tenant")
go directly through `POST /api/flags/evaluate` with an explicit
`{ tenantSlug }`.
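
For the operator-side path, the request could be built roughly as follows. Only the endpoint and the `{ tenantSlug }` body come from this document. `buildEvaluateRequest`, the `baseUrl` parameter, the JSON content type, and the example host are assumptions; auth headers are omitted, and the response shape is not documented here, so it is left untyped.

```ts
// Hypothetical helper: builds the request for POST /api/flags/evaluate.
interface EvaluateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildEvaluateRequest(baseUrl: string, tenantSlug: string): EvaluateRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/flags/evaluate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumed content type
      body: JSON.stringify({ tenantSlug }),
    },
  };
}

// Usage sketch (host and response handling are assumptions):
// const { url, init } = buildEvaluateRequest("https://platform-api.example", "acme");
// const result: unknown = await (await fetch(url, init)).json();
```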

## Adding a new flag

1. **Add to the seed list** (`FLAG_SEEDS` in
   `services/platform-api/src/seed/seed.service.ts`). This
   documents what the flag is for and ensures every environment gets it
   on bootstrap. State defaults to `off` for safety.
2. **Restart platform-api** (or wait for HMR + the bootstrap hook). New
   keys are upserted via `$setOnInsert` so existing operator edits
   survive.
3. **Open `https://operator.dezky.local/flags`**, click the row, set
   targeting/rollout, save.
4. **Reference the key** from app code via `useFeatureFlag('your_key')`.

Alternative: create the flag directly through the operator UI's
"New flag" button. The seed list is for keys that should always exist;
the UI is for ad-hoc experiments.
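
To see why `$setOnInsert` lets operator edits survive a restart, here is a sketch of what the bootstrap upsert in step 2 could look like. The `FlagSeed` shape and `toSeedUpserts` are hypothetical, and the real `FLAG_SEEDS` entries may carry other fields. Only the `$setOnInsert` upsert and the `off` default come from this document. The output uses the `updateOne` form accepted by MongoDB's `bulkWrite`.

```ts
// Hypothetical seed shape; the real FLAG_SEEDS entries may differ.
interface FlagSeed {
  key: string;
  description: string;
}

function toSeedUpserts(seeds: FlagSeed[]) {
  return seeds.map((seed) => ({
    updateOne: {
      filter: { key: seed.key },
      // $setOnInsert applies only when the upsert inserts a new document,
      // so a flag that an operator has already edited is left untouched.
      update: {
        $setOnInsert: { description: seed.description, state: "off", pct: 0 },
      },
      upsert: true,
    },
  }));
}
```

Because every field sits under `$setOnInsert`, re-running the seed against an existing key is a no-op for that document.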

## The 4 states

| State | Meaning |
|---|---|
| `off` | Disabled for everyone, ignores scope. Default kill-switch state. |
| `on` | Enabled for everyone, ignores scope. |
| `targeted` | Explicit allowlist. Requires non-empty scope — empty allowlist evaluates to false ("nobody is on the list yet"). |
| `rollout` | Scope filter + deterministic hash bucket: `sha256("${tenantId}:${flagKey}") % 100 < pct`. A tenant's bucket for a given flag never changes, so its answer is stable until `pct` changes, and bumping `pct` 25→50 only turns the flag on for buckets 25 through 49. |
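
The `rollout` bucket can be sketched as below. The table does not say how the SHA-256 digest is reduced to a number, so reading the first four bytes as an unsigned integer is an assumption, as are the function names. The service may compute buckets differently, so do not use this to predict which tenants are in a slice.

```ts
import { createHash } from "node:crypto";

// Assumption: the bucket is derived from the first 4 bytes of the digest.
function rolloutBucket(tenantId: string, flagKey: string): number {
  const digest = createHash("sha256").update(`${tenantId}:${flagKey}`).digest();
  return digest.readUInt32BE(0) % 100; // always in 0..99
}

// A tenant is in the rollout when its bucket falls below the percentage.
function inRollout(tenantId: string, flagKey: string, pct: number): boolean {
  return rolloutBucket(tenantId, flagKey) < pct;
}
```

Including `flagKey` in the hashed string gives each flag its own bucket assignment, so the same tenants are not always first in line for every rollout.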

## The 4 scope axes (all optional, AND-ed when set)

- **plans** — `['pro', 'enterprise']`
- **tenantSlugs** — explicit allowlist of tenants
- **partnerSlugs** — partner-level pilots (not wired into eval context yet)
- **environments** — `['prod', 'staging']`

Empty list on an axis = "no restriction on this axis".
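
A minimal sketch of the AND-ed scope check, assuming hypothetical `Scope` and `EvalContext` shapes whose field names mirror the axes above. `partnerSlugs` is left out because it is not wired into the eval context yet.

```ts
// Hypothetical shapes; the real schema may differ.
interface Scope {
  plans?: string[];
  tenantSlugs?: string[];
  environments?: string[];
}

interface EvalContext {
  plan: string;
  tenantSlug: string;
  environment: string;
}

// An axis restricts only when it is set and non-empty.
function axisMatches(allowed: string[] | undefined, value: string): boolean {
  return !allowed || allowed.length === 0 || allowed.indexOf(value) !== -1;
}

// Every restricted axis must match (logical AND).
function scopeMatches(scope: Scope, ctx: EvalContext): boolean {
  return (
    axisMatches(scope.plans, ctx.plan) &&
    axisMatches(scope.tenantSlugs, ctx.tenantSlug) &&
    axisMatches(scope.environments, ctx.environment)
  );
}
```

Under this sketch a fully empty scope matches every tenant, which is why `targeted` needs its separate rule that an empty allowlist evaluates to false.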

## Kill switch

One click in the operator UI flips a flag to `state: 'off'` + `pct: 0` and
appends a `kill-switch` history entry. Use it when something's misbehaving
in production and you need it dark immediately. Then triage at leisure.
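
As a sketch, the state change behind that click could look like the function below. The `Flag` and `HistoryEntry` shapes and the entry field names are hypothetical. The `off` / `pct: 0` values, the `kill-switch` entry, and the 20-entry history cap come from this document; dropping the oldest entries first is an assumption.

```ts
// Hypothetical shapes, limited to the fields the kill switch touches.
interface HistoryEntry {
  action: string;
  at: string; // ISO timestamp
}

interface Flag {
  key: string;
  state: "off" | "on" | "targeted" | "rollout";
  pct: number;
  history: HistoryEntry[];
}

const HISTORY_CAP = 20;

// Returns a new flag object; the input is not mutated.
function killSwitch(flag: Flag, now: Date = new Date()): Flag {
  const entry: HistoryEntry = { action: "kill-switch", at: now.toISOString() };
  return {
    ...flag,
    state: "off",
    pct: 0,
    // Keep only the newest entries (assumption: oldest are dropped first).
    history: [...flag.history, entry].slice(-HISTORY_CAP),
  };
}
```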

## Conventions

- **Keys** are snake_case, lowercase, 3 to 64 characters long, start with a
  letter, and end with a letter or digit. They must match the regex in
  `CreateFlagDto`: `^[a-z][a-z0-9_]{1,62}[a-z0-9]$`.
- **One flag per intent**. Don't reuse `new_thing_v2` for unrelated
  features — name them separately.
- **Delete flags** once a feature is `on` for everyone and you've removed
  the legacy branch. Stale flags rot fast.
- **Don't gate auth, billing-critical, or audit-logging code** behind a
  flag where `false` would silently skip security work. Flags should
  pick between two correct paths, not enable correctness.
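
The key convention at the top of this list can be checked locally before you add a seed entry. The regex is copied from the bullet above; `isValidFlagKey` is a hypothetical helper, not something exported by `CreateFlagDto`.

```ts
// Same pattern as the one quoted from CreateFlagDto above.
const FLAG_KEY_PATTERN = /^[a-z][a-z0-9_]{1,62}[a-z0-9]$/;

function isValidFlagKey(key: string): boolean {
  return FLAG_KEY_PATTERN.test(key);
}

isValidFlagKey("jmap_native_v2"); // true
isValidFlagKey("ab");             // false: shorter than 3 characters
isValidFlagKey("New_Inbox");      // false: uppercase letters
isValidFlagKey("trailing_");      // false: ends with an underscore
isValidFlagKey("2fa_prompt");     // false: starts with a digit
```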

## What's not built yet

- **partnerSlug eval context** — the schema axis exists but the service
  doesn't currently hydrate `ctx.partnerSlug` from the tenant doc.
  Add when the first partner-gated flag actually needs it.
- **User-level flags** — eval is tenant-level only. If you need
  per-individual gating (e.g. internal preview for specific staff),
  combine `targeted` + a synthetic single-user tenant for now.
- **Audit log integration** — flag changes write to embedded `history`
  on the flag doc, capped at 20 entries. Switch to the real audit collection
  once that exists.
- **Server-side cache** — `evaluateAll` re-reads all flags from Mongo
  on every call. With ~10–50 flags this is fine; if a service ends up
  evaluating per-request and flag count grows, add a small TTL cache
  (~5s) in `FlagsService`.
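
If the server-side cache becomes necessary, one possible shape is a small wrapper around the flag read. Nothing like this exists in `FlagsService` today: `withTtlCache`, its parameters, and the `readAllFlags` name in the usage comment are all hypothetical.

```ts
// Hypothetical helper: caches the result of an async loader for ttlMs.
function withTtlCache<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, useful in tests
): () => Promise<T> {
  let value: Promise<T> | null = null;
  let expiresAt = 0;

  return () => {
    const cached = value;
    if (cached !== null && now() < expiresAt) return cached;

    // Cache the promise so concurrent callers share a single read.
    const pending = load();
    value = pending;
    expiresAt = now() + ttlMs;
    // Do not keep a failed read around for the full TTL.
    pending.catch(() => {
      if (value === pending) value = null;
    });
    return pending;
  };
}

// Usage sketch: const loadFlags = withTtlCache(readAllFlags, 5000);
```

The trade-off is that a kill switch can take up to the TTL to reach a service that reads through the cache, which is why the TTL should stay in the range of a few seconds.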