Commit Graph

3 Commits

Author SHA1 Message Date
Ronni Baslund df18128617 feat(audit): OCIS file-tail ingest worker (Phase 2 chunk 3)
Tails OCIS's JSON-Lines audit log on a shared Docker volume and forwards
mutations into AuditService. Final piece of Phase 2 — the /audit page now
unifies platform-api, authentik, and ocis events on one timeline.

services/platform-api/src/ingest/ocis.ingest.ts:
  - 5s polling loop (fs.watch is unreliable across Docker bind mounts on
    macOS). Stat → detect inode change or truncation → resume from byte
    position OR start over.
  - Cursor in IngestCursor stores lastEventId = "<inode>:<bytePosition>".
    Restarts resume cleanly; on overlap the (source, externalId) unique
    index dedups silently.
  - Lines collected first, then processed sequentially after the read
    stream closes. Earlier draft fired recordOne() from inside the
    readline 'line' callback, so the stream could close and the read
    promise resolve before all writes finished. Same class of race we
    hit in the Authentik worker; fixed before commit.
  - Tenant inference: spaceName (set during provisioning to the slug)
    first, then User.authentikSubjectId → tenantIds → Tenant.slug.
  - Mutations only: OCIS_ALLOWLIST in action-map.ts whitelists 24 event
    types (User/Group/Space/Share/Link/File mutations). FileDownloaded,
    UserSignedIn, and the rest of the high-volume read traffic are
    skipped, which keeps the timeline scannable.
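
Sketch of the cursor handling described above, for reviewers. Only the
"<inode>:<bytePosition>" format comes from this commit; the helper names
(parseCursor, formatCursor, resumeOffset) are hypothetical and the real
worker may be structured differently.

```typescript
// Hypothetical helpers; not the code in ocis.ingest.ts.
interface TailCursor {
  inode: number;
  position: number;
}

// Parse "<inode>:<bytePosition>". Anything malformed counts as "no cursor".
function parseCursor(raw: string | null | undefined): TailCursor | null {
  const match = /^(\d+):(\d+)$/.exec(raw ?? "");
  if (!match) return null;
  return { inode: Number(match[1]), position: Number(match[2]) };
}

function formatCursor(cursor: TailCursor): string {
  return `${cursor.inode}:${cursor.position}`;
}

// Byte offset to start reading from on this polling tick.
function resumeOffset(
  cursor: TailCursor | null,
  stat: { ino: number; size: number },
): number {
  if (cursor === null) return 0; // first run: read from the top
  if (cursor.inode !== stat.ino) return 0; // file was replaced or rotated
  if (stat.size < cursor.position) return 0; // file was truncated
  return cursor.position; // same file, only appended to
}
```

Starting over after a rotation or truncation can re-read lines that were
already ingested; that is safe here only because the (source, externalId)
unique index drops the repeats.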

services/platform-api/src/ingest/action-map.ts:
  - mapOcisAction() + OCIS_ALLOWLIST. Returns null for non-whitelisted
    types so the worker filters early.
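
Rough shape of the null-returning allowlist lookup, as a sketch. The three
entries and their dotted verbs are illustrative assumptions, not the
contents of the real OCIS_ALLOWLIST, and the *Sketch names are hypothetical.

```typescript
// Illustrative subset only; the real allowlist lives in action-map.ts.
const OCIS_ALLOWLIST_SKETCH: ReadonlyMap<string, string> = new Map([
  ["ShareCreated", "ocis.share_created"],
  ["SpaceCreated", "ocis.space_created"],
  ["UserCreated", "ocis.user_created"],
]);

// Returns the mapped verb, or null when the event type is not allowlisted.
function mapOcisActionSketch(eventType: string): string | null {
  return OCIS_ALLOWLIST_SKETCH.get(eventType) ?? null;
}

// The worker can drop non-mutations before doing any tenant lookups.
function filterMutations(
  events: { type: string }[],
): { type: string; action: string }[] {
  const kept: { type: string; action: string }[] = [];
  for (const evt of events) {
    const action = mapOcisActionSketch(evt.type);
    if (action === null) continue; // read traffic such as FileDownloaded
    kept.push({ type: evt.type, action });
  }
  return kept;
}
```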

infrastructure/docker-compose/docker-compose.yml:
  - New named volume `ocis_audit_log` mounted writeable on the ocis
    container and read-only on platform-api.
  - OCIS env: OCIS_ADD_RUN_SERVICES=audit (the audit microservice is
    NOT in the default `ocis server` set — opt in explicitly),
    AUDIT_LOG_FILE_PATH=/var/log/ocis/audit.log, AUDIT_LOG_FORMAT=json.
  - platform-api env: OCIS_AUDIT_LOG_PATH points at the same file.

Verified end-to-end with synthetic events written to the audit log:
  - Worker tailed 6 events across initial read + incremental append
    (5 → bytes 0:1276, then 1 → bytes 1276:1519).
  - FileDownloaded correctly filtered by the allowlist (4 mutations
    landed in Mongo, not 5).
  - Tenant inference: events with executingUser.id resolved to
    `dezky` via User → tenantIds → Tenant.slug.
  - Operator /audit shows all three sources (89 events: 79 authentik
    + 5 platform-api + 5 ocis) in one unified timeline.

Known unknown — same shape as the Stalwart commit: I couldn't fully
confirm the OCIS v7 audit microservice emits events with just
OCIS_ADD_RUN_SERVICES=audit + the AUDIT_LOG_FILE_PATH env. The audit
service starts but the file stays empty until OCIS internals start
publishing events to NATS (which may need additional service-side
config). The ingest worker is correct regardless — when OCIS starts
writing real events, they'll flow into /audit. This is a follow-up
in the OCIS-side configuration, not in our ingest code.
2026-05-24 20:30:47 +02:00
Ronni Baslund 7bec940e7f feat(audit): Stalwart webhook ingest endpoint (Phase 2 chunk 2)
Push-based ingest for mail-server events. Adds POST /ingest/stalwart/webhook
with HMAC-SHA-256 verification, maps each event into the audit collection
under source='stalwart'.

services/platform-api/src/ingest/stalwart-webhook.controller.ts:
  - Public endpoint (no JwtAuthGuard — Stalwart can't carry a JWT). Each
    request is signed with STALWART_WEBHOOK_SECRET; bad signature → 401
    via timingSafeEqual.
  - Body: { events: [{ id, type, createdAt, data }, ... ] }. Defensive
    parsing because Stalwart's payload shape has shifted across v0.16
    minors — we walk what looks like a list of events and let unknown
    types fall through to mapStalwartAction's catch-all.
  - Per-event recordOne: action via mapStalwartAction(), actor from
    data.email/account/username, IP from data.ip or X-Forwarded-For,
    targetName from data.account/email/address/to, full payload kept
    in metadata. externalId = evt.id so the (source, externalId)
    unique index dedups re-deliveries.
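
Minimal sketch of the signature check, assuming the signature arrives as
a hex-encoded HMAC-SHA-256 of the raw request body. The encoding is an
assumption (Stalwart may send base64 instead), and verifySignature is a
hypothetical name, not the controller's actual method.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: signatureHeader is hex. Swap "hex" for "base64" if needed.
function verifySignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The HMAC has to be computed over the exact bytes received; re-serializing
the parsed JSON body can change whitespace or key order and break the
comparison.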

action-map.ts: 14 known Stalwart event types →
  stalwart.{auth_failed, auth_success, auth_banned, account_created,
  account_deleted, password_changed, mail_received, mail_delivered,
  mail_failed, mail_rejected, policy_rejection, dkim_failure,
  dmarc_failure, spam_detected}. Snake/kebab forms normalized.
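
Sketch of the snake/kebab normalization, assuming raw event types differ
from the verbs above only in case and in "-" versus "_". The real raw
Stalwart type names, and what the catch-all emits for unknown types, are
not shown in this commit, so both are assumptions here.

```typescript
// The 14 verbs listed above; the name of this set is hypothetical.
const STALWART_VERBS: ReadonlySet<string> = new Set([
  "auth_failed", "auth_success", "auth_banned", "account_created",
  "account_deleted", "password_changed", "mail_received", "mail_delivered",
  "mail_failed", "mail_rejected", "policy_rejection", "dkim_failure",
  "dmarc_failure", "spam_detected",
]);

// Normalize kebab-case to snake_case, then report whether the verb is known.
// Assumption: unknown types are kept as stalwart.<normalized>, not dropped.
function mapStalwartActionSketch(
  rawType: string,
): { action: string; known: boolean } {
  const normalized = rawType.trim().toLowerCase().replace(/-/g, "_");
  return {
    action: `stalwart.${normalized}`,
    known: STALWART_VERBS.has(normalized),
  };
}
```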

infrastructure/docker-compose:
  - .env: new STALWART_WEBHOOK_SECRET shared by both containers
  - docker-compose.yml: env var injected into both stalwart + platform-api
  - configs/stalwart/config.toml: [webhook."audit-ingest"] block
    pointing at platform-api:3001/ingest/stalwart/webhook with
    signature-key = $env{STALWART_WEBHOOK_SECRET} and the 14 event
    types we map.

Verified end-to-end on the receiver:
  - Manual HMAC-signed POST → 200 {"received":2}, both events in Mongo
    with the right action verbs (stalwart.auth_failed, stalwart.account_created),
    actor/IP/externalId populated.
  - Replay of the same payload → still {"received":2} but Mongo count
    stays the same (dedup index works).
  - X-Signature: deadbeef → 401, no row written.

Known unknown: I couldn't fully confirm Stalwart v0.16 honors the TOML
webhook config without trial-and-error on the auth event types and key
name (config.toml uses signature-key; some Stalwart builds want plain
'key'). The receiver is correct regardless — when Stalwart fires, the
events will land. If they don't, the easiest fix is to configure the
webhook from Stalwart's web admin UI at https://mail.dezky.local
instead of via TOML.
2026-05-24 20:21:29 +02:00
Ronni Baslund b1d717e466 feat(audit): Authentik events ingest worker (Phase 2 chunk 1)
Background worker that pulls Authentik's /api/v3/events/events/ on a
60s cadence and writes each event into our audit log via AuditService.
External system events now share the same /audit timeline as
internally-recorded platform mutations — operator queries don't have
to cross-reference Authentik's own UI to see logins, password changes,
group membership, impersonation, etc.

Pieces:
- src/schemas/ingest-cursor.schema.ts: one row per source, tracks
  lastEventAt + lastEventId so restarts resume without re-pulling.
- src/schemas/audit-event.schema.ts: new `externalId` field; new
  compound unique index on (source, externalId) with a partial filter
  on externalId being a string. Partial (not sparse) so internally-
  recorded events with externalId=null don't collide.
- src/audit/audit.service.ts: AuditRecordInput grows `externalId` +
  `at` fields. record() now silently swallows MongoError code 11000
  (duplicate key) so re-pulling the cursor overlap doesn't log noise.
- src/integrations/authentik.client.ts: listEvents(since, page,
  pageSize) on the existing client — reuses the admin token and base
  URL the provisioning code already configured.
- src/ingest/action-map.ts: 16 known Authentik actions → dotted
  authentik.* verbs (login, login_failed, password_changed,
  impersonation_started, …). Unknown actions fall through to
  authentik.<raw> rather than getting silently dropped.
- src/ingest/authentik.ingest.ts: OnApplicationBootstrap worker.
  Reads cursor → pulls events with created__gt=cursor, ordering=created
  ASC → paginates forward (10 pages × 100/page safety cap per tick) →
  writes each event with source='authentik' + externalId=pk + at=
  evt.created → advances cursor to the newest seen. inFlight guard
  prevents overlapping ticks. AUDIT_INGEST_ENABLED=false disables for
  test environments.
- Tenant inference: from the user's groups (same convention the portal
  flag-eval proxy uses). Admin groups stripped; first match against a
  real Tenant.slug wins. Unmatched → tenantSlug undefined, event still
  lands in the global timeline.
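
Sketch of the tenant inference rule, assuming a tenant group's name is
exactly the Tenant.slug and that admin groups can be identified by name.
Both are assumptions; inferTenantSlug and its parameters are hypothetical.

```typescript
// Strip admin groups, then take the first group that is a real tenant slug.
function inferTenantSlug(
  userGroups: readonly string[],
  tenantSlugs: ReadonlySet<string>,
  adminGroups: ReadonlySet<string>,
): string | undefined {
  return userGroups
    .filter((group) => !adminGroups.has(group))
    .find((group) => tenantSlugs.has(group));
}
```

Returning undefined rather than throwing keeps unmatched events in the
global timeline instead of failing the whole tick.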

Smoke-tested: fresh Mongo + restart → 78 Authentik events ingested,
0 duplicates. Performed a login at app.dezky.local → next 60s tick
captured the new login row with actor email + IP. Compound unique
index on (source, externalId) verified to reject re-pulled events
silently (no error logs).
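
For reference, a sketch of the two pieces behind the silent dedup: index
options with a partial filter on string externalId values, and a wrapper
that swallows only duplicate-key errors. The names are hypothetical and
the insert callback stands in for the real Mongo write.

```typescript
// Options for the (source, externalId) index. partialFilterExpression is
// the standard MongoDB option; documents without a string externalId are
// left out of the index, so they cannot collide with each other.
const EXTERNAL_ID_INDEX_OPTIONS = {
  unique: true,
  partialFilterExpression: { externalId: { $type: "string" } },
} as const;

// MongoDB reports unique-index violations with error code 11000.
function isDuplicateKeyError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === 11000
  );
}

// Run an insert; treat a duplicate as a no-op and rethrow everything else.
async function recordIgnoringDuplicates(
  insert: () => Promise<void>,
): Promise<"inserted" | "duplicate"> {
  try {
    await insert();
    return "inserted";
  } catch (err) {
    if (isDuplicateKeyError(err)) return "duplicate";
    throw err;
  }
}
```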

Out of scope here (covered by chunks 2 + 3):
- Stalwart webhook ingest
- OCIS file-tail ingest
2026-05-24 20:12:21 +02:00