b2c2650af93365c9040ef75bfb8d0f5e9a662d6c
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
df18128617 |
feat(audit): OCIS file-tail ingest worker (Phase 2 chunk 3)
Tails OCIS's JSON-Lines audit log on a shared Docker volume and forwards
mutations into AuditService. Final piece of Phase 2 — the /audit page now
unifies platform-api, authentik, and ocis events on one timeline.
services/platform-api/src/ingest/ocis.ingest.ts:
- 5s polling loop (fs.watch is unreliable across Docker bind mounts on
macOS). Stat → detect inode change or truncation → resume from byte
position OR start over.
- Cursor in IngestCursor stores lastEventId = "<inode>:<bytePosition>".
Restarts resume cleanly; on overlap the (source, externalId) unique
index dedups silently.
- Lines are collected first, then processed sequentially after the read
stream closes. An earlier draft fired recordOne() from inside the
readline 'line' callback, which would have let the read finish
before all writes completed. This is the same class of race we hit in
the Authentik worker; it was fixed before commit.
- Tenant inference: spaceName (set during provisioning to the slug)
first, then User.authentikSubjectId → tenantIds → Tenant.slug.
- Mutations only: OCIS_ALLOWLIST in action-map.ts whitelists 24 event
types (User/Group/Space/Share/Link/File mutations). FileDownloaded,
UserSignedIn, and the rest of the high-volume read traffic gets
skipped — keeps the timeline scannable.
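The resume-or-restart decision described in the bullets above can be sketched as a pure function over the stored cursor and a fresh stat result. This is a minimal illustration, not the worker's actual code: `parseCursor`, `planRead`, and the `FileStat`/`TailPlan` shapes are hypothetical names, and only the `"<inode>:<bytePosition>"` cursor format and the inode-change/truncation checks come from the commit message.

```typescript
// Hypothetical shapes for illustration; the real worker may differ.
interface FileStat { ino: number; size: number }
interface TailPlan {
  start: number; // byte offset to begin reading from
  reason: "fresh" | "rotated" | "truncated" | "resume";
}

// Parse a cursor of the form "<inode>:<bytePosition>"; null if absent or malformed.
function parseCursor(
  lastEventId: string | undefined,
): { inode: number; position: number } | null {
  if (!lastEventId) return null;
  const m = /^(\d+):(\d+)$/.exec(lastEventId);
  return m ? { inode: Number(m[1]), position: Number(m[2]) } : null;
}

// Decide where the next poll should start reading.
function planRead(lastEventId: string | undefined, stat: FileStat): TailPlan {
  const cursor = parseCursor(lastEventId);
  if (!cursor) return { start: 0, reason: "fresh" };
  // A different inode means the file was replaced: start over.
  if (cursor.inode !== stat.ino) return { start: 0, reason: "rotated" };
  // A file shorter than the saved offset means it was truncated: start over.
  if (stat.size < cursor.position) return { start: 0, reason: "truncated" };
  return { start: cursor.position, reason: "resume" };
}

// Build the cursor string to persist after consuming bytes up to `endPosition`.
function formatCursor(stat: FileStat, endPosition: number): string {
  return `${stat.ino}:${endPosition}`;
}
```

With a saved cursor of `"42:1276"` and a stat reporting inode 42 and size 1519, `planRead` resumes at byte 1276. Starting over after a rotation or truncation is safe only because the (source, externalId) unique index drops any re-read events.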
services/platform-api/src/ingest/action-map.ts:
- mapOcisAction() + OCIS_ALLOWLIST. Returns null for non-whitelisted
types so the worker filters early.
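A rough sketch of the allowlist-then-map shape follows. The six event names and the `ocis.<snake_case>` output verbs are assumptions for illustration; the real OCIS_ALLOWLIST holds 24 types and the real mapOcisAction() may name its verbs differently. Only the "return null for non-allowlisted types" behavior is taken from the commit message.

```typescript
// Illustrative subset only; names are assumed, not copied from action-map.ts.
const OCIS_ALLOWLIST_SKETCH: ReadonlySet<string> = new Set([
  "UserCreated",
  "GroupCreated",
  "SpaceCreated",
  "ShareCreated",
  "LinkCreated",
  "FileTrashed",
]);

// Returns a dotted action verb for allowlisted mutations, or null so the
// caller can skip the event before doing any further work.
function mapOcisActionSketch(eventType: string): string | null {
  if (!OCIS_ALLOWLIST_SKETCH.has(eventType)) return null;
  // PascalCase to snake_case, e.g. "ShareCreated" becomes "share_created".
  const snake = eventType.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
  return `ocis.${snake}`;
}
```

Under these assumptions, `mapOcisActionSketch("FileDownloaded")` yields `null` and the worker would drop the line without touching Mongo, while `mapOcisActionSketch("ShareCreated")` yields `"ocis.share_created"`.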
infrastructure/docker-compose/docker-compose.yml:
- New named volume `ocis_audit_log` mounted writeable on the ocis
container and read-only on platform-api.
- OCIS env: OCIS_ADD_RUN_SERVICES=audit (the audit microservice is
NOT in the default `ocis server` set — opt in explicitly),
AUDIT_LOG_FILE_PATH=/var/log/ocis/audit.log, AUDIT_LOG_FORMAT=json.
- platform-api env: OCIS_AUDIT_LOG_PATH points at the same file.
Verified end-to-end with synthetic events written to the audit log:
- Worker tailed 5 events across initial read + incremental append
(5 → bytes 0:1276, then 1 → bytes 1276:1519).
- FileDownloaded correctly filtered by the allowlist (4 mutations
landed in Mongo, not 5).
- Tenant inference: events with executingUser.id resolved to
`dezky` via User → tenantIds → Tenant.slug.
- Operator /audit shows all three sources (89 events: 79 authentik
+ 5 platform-api + 5 ocis) in one unified timeline.
Known unknown — same shape as the Stalwart commit: I couldn't fully
confirm the OCIS v7 audit microservice emits events with just
OCIS_ADD_RUN_SERVICES=audit + the AUDIT_LOG_FILE_PATH env. The audit
service starts but the file stays empty until OCIS internals start
publishing events to NATS (which may need additional service-side
config). The ingest worker is correct regardless — when OCIS starts
writing real events, they'll flow into /audit. This is a follow-up
in the OCIS-side configuration, not in our ingest code.
|
||
|
|
7bec940e7f |
feat(audit): Stalwart webhook ingest endpoint (Phase 2 chunk 2)
Push-based ingest for mail-server events. Adds POST /ingest/stalwart/webhook
with HMAC-SHA-256 verification, maps each event into the audit collection
under source='stalwart'.
services/platform-api/src/ingest/stalwart-webhook.controller.ts:
- Public endpoint (no JwtAuthGuard — Stalwart can't carry a JWT). Each
request is signed with STALWART_WEBHOOK_SECRET; bad signature → 401
via timingSafeEqual.
- Body: { events: [{ id, type, createdAt, data }, ... ] }. Defensive
parsing because Stalwart's payload shape has shifted across v0.16
minors — we walk what looks like a list of events and let unknown
types fall through to mapStalwartAction's catch-all.
- Per-event recordOne: action via mapStalwartAction(), actor from
data.email/account/username, IP from data.ip or X-Forwarded-For,
targetName from data.account/email/address/to, full payload kept
in metadata. externalId = evt.id so the (source, externalId)
unique index dedups re-deliveries.
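The signature check described above can be sketched with Node's built-in crypto module. This is a hedged illustration: `verifySignature` is a hypothetical helper, and the hex encoding of the signature header is an assumption (Stalwart may send a different encoding, in which case only the decode step changes). The HMAC-SHA-256 digest and the timingSafeEqual comparison are what the commit message describes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only when the header carries a valid HMAC-SHA-256 of the raw body.
// Assumption: the signature is hex-encoded; adjust the decoding if it is not.
function verifySignature(
  rawBody: Buffer,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on length mismatch, so reject those up front.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The HMAC has to be computed over the raw request bytes, not over a re-serialized JSON body, since re-serialization can change whitespace or key order. A controller would respond 401 whenever this returns false.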
action-map.ts: 14 known Stalwart event types →
stalwart.{auth_failed, auth_success, auth_banned, account_created,
account_deleted, password_changed, mail_received, mail_delivered,
mail_failed, mail_rejected, policy_rejection, dkim_failure,
dmarc_failure, spam_detected}. Snake/kebab forms normalized.
infrastructure/docker-compose:
- .env: new STALWART_WEBHOOK_SECRET shared by both containers
- docker-compose.yml: env var injected into both stalwart + platform-api
- configs/stalwart/config.toml: [webhook."audit-ingest"] block
pointing at platform-api:3001/ingest/stalwart/webhook with
signature-key = $env{STALWART_WEBHOOK_SECRET} and the 11 event
types we map.
Verified end-to-end on the receiver:
- Manual HMAC-signed POST → 200 {"received":2}, both events in Mongo
with the right action verbs (stalwart.auth_failed, stalwart.account_created),
actor/IP/externalId populated.
- Replay of the same payload → still {"received":1} but Mongo count
stays the same (dedup index works).
- X-Signature: deadbeef → 401, no row written.
Known unknown: I couldn't fully confirm Stalwart v0.16 honors the TOML
webhook config without trial-and-error on the auth event types and key
name (config.toml uses signature-key; some Stalwart builds want plain
'key'). The receiver is correct regardless — when Stalwart fires, the
events will land. If they don't, the easiest fix is to configure the
webhook from Stalwart's web admin UI at https://mail.dezky.local
instead of via TOML.
|
||
|
|
b1d717e466 |
feat(audit): Authentik events ingest worker (Phase 2 chunk 1)
Background worker that pulls Authentik's /api/v3/events/events/ on a 60s
cadence and writes each event into our audit log via AuditService. External
system events now share the same /audit timeline as internally-recorded
platform mutations — operator queries don't have to cross-reference
Authentik's own UI to see logins, password changes, group membership,
impersonation, etc.
Pieces:
- src/schemas/ingest-cursor.schema.ts: one row per source, tracks
lastEventAt + lastEventId so restarts resume without re-pulling.
- src/schemas/audit-event.schema.ts: new `externalId` field; new compound
unique index on (source, externalId) with a partial filter on externalId
being a string. Partial (not sparse) so internally-recorded events with
externalId=null don't collide.
- src/audit/audit.service.ts: AuditRecordInput grows `externalId` + `at`
fields. record() now silently swallows MongoError code 11000 (duplicate
key) so re-pulling the cursor overlap doesn't log noise.
- src/integrations/authentik.client.ts: listEvents(since, page, pageSize)
on the existing client — reuses the admin token and base URL the
provisioning code already configured.
- src/ingest/action-map.ts: 16 known Authentik actions → dotted
authentik.* verbs (login, login_failed, password_changed,
impersonation_started, …). Unknown actions fall through to
authentik.<raw> rather than getting silently dropped.
- src/ingest/authentik.ingest.ts: OnApplicationBootstrap worker. Reads
cursor → pulls events with created__gt=cursor, ordering=created ASC →
paginates forward (10 pages × 100/page safety cap per tick) → writes each
event with source='authentik' + externalId=pk + at=evt.created →
advances cursor to the newest seen. inFlight guard prevents overlapping
ticks. AUDIT_INGEST_ENABLED=false disables for test environments.
- Tenant inference: from the user's groups (same convention the portal
flag-eval proxy uses). Admin groups stripped; first match against a real
Tenant.slug wins. Unmatched → tenantSlug undefined, event still lands in
the global timeline.
Smoke-tested:
- Fresh Mongo + restart → 78 Authentik events ingested, 0 duplicates.
- Performed a login at app.dezky.local → next 60s tick captured the new
login row with actor email + IP.
- Compound unique index on (source, externalId) verified to reject
re-pulled events silently (no error logs).
Out of scope here (covered by chunks 2 + 3):
- Stalwart webhook ingest
- OCIS file-tail ingest
|
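The duplicate-key handling that all three ingest paths rely on can be sketched as below. `isDuplicateKeyError` and `recordOnce` are hypothetical names, and `insert` stands in for whatever performs the actual write. Only the error code 11000 and the swallow-on-duplicate behavior come from the commit message.

```typescript
// MongoDB reports unique-index violations with error code 11000.
const DUPLICATE_KEY_CODE = 11000;

// Narrow an unknown thrown value to "this was a duplicate-key violation".
function isDuplicateKeyError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === DUPLICATE_KEY_CODE
  );
}

// Run one insert; treat a duplicate as a quiet no-op, rethrow anything else.
async function recordOnce(
  insert: () => Promise<void>,
): Promise<"written" | "duplicate"> {
  try {
    await insert();
    return "written";
  } catch (err) {
    if (isDuplicateKeyError(err)) return "duplicate"; // cursor overlap or re-delivery
    throw err; // real failures must still surface
  }
}
```

Rethrowing everything except code 11000 keeps the worker quiet on expected overlap without hiding connection or validation errors.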