4d9e906ec1
Final piece of the audit work. Events older than the hot retention window
move to S3-compatible object storage with signed manifests. Production uses
Hetzner Object Storage; dev uses a MinIO container with the same API.

Infra (infrastructure/docker-compose):
- New `minio` service exposing the S3 API at minio:9000 + admin console at
minio.dezky.local. Healthchecked. Bucket-init sidecar runs `mc mb` once
to create `dezky-audit`; safe to re-run.
- .env adds MINIO_ROOT_USER + MINIO_ROOT_PASSWORD.
- platform-api env: AUDIT_COLD_{ENDPOINT,REGION,BUCKET,ACCESS_KEY,SECRET_KEY}
+ AUDIT_HOT_RETENTION_DAYS=90 + ARCHIVE_ENABLED=false (dormant in dev;
operator UI's "Run archive now" bypasses this gate). AUDIT_COLD_SSE
opts into SSE-S3 — left unset in dev because MinIO without a KMS rejects
AES256 PUTs with "KMS is not configured".
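
The env wiring above could be read roughly as follows. This is a minimal sketch, not the repo's code: `ColdStorageConfig`, `requireVar` and `loadColdStorageConfig` are hypothetical names, and only the env var names come from this commit. It shows SSE staying off unless AUDIT_COLD_SSE is the literal string "true".

```typescript
// Sketch only: the type and function names are hypothetical; the env var
// names are the ones listed in the commit message.
type Env = Record<string, string | undefined>

interface ColdStorageConfig {
  endpoint: string
  region: string
  bucket: string
  accessKey: string
  secretKey: string
  sse: 'AES256' | undefined
  hotRetentionDays: number
  archiveEnabled: boolean
}

// Fail fast on a missing or empty variable instead of failing at first upload.
function requireVar(env: Env, name: string): string {
  const value = env[name]
  if (!value) throw new Error(`Missing required env var ${name}`)
  return value
}

function loadColdStorageConfig(env: Env): ColdStorageConfig {
  const hotRetentionDays = Number(env['AUDIT_HOT_RETENTION_DAYS'] ?? '90')
  if (!Number.isInteger(hotRetentionDays) || hotRetentionDays <= 0) {
    throw new Error('AUDIT_HOT_RETENTION_DAYS must be a positive integer')
  }
  return {
    endpoint: requireVar(env, 'AUDIT_COLD_ENDPOINT'),
    region: requireVar(env, 'AUDIT_COLD_REGION'),
    bucket: requireVar(env, 'AUDIT_COLD_BUCKET'),
    accessKey: requireVar(env, 'AUDIT_COLD_ACCESS_KEY'),
    secretKey: requireVar(env, 'AUDIT_COLD_SECRET_KEY'),
    // SSE-S3 is opt-in: unset (as in dev) means no ServerSideEncryption header.
    sse: env['AUDIT_COLD_SSE'] === 'true' ? 'AES256' : undefined,
    hotRetentionDays,
    // Dormant unless explicitly enabled.
    archiveEnabled: env['ARCHIVE_ENABLED'] === 'true',
  }
}
```

In a Node process this would be called as `loadColdStorageConfig(process.env)`.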

Platform-api (services/platform-api/src/cold/):
- cold-storage.client.ts: thin @aws-sdk/client-s3 wrapper — put/head/list.
forcePathStyle=true so MinIO and Hetzner both work; same code, env-swap.
- archive.service.ts: runOnce() selects chained events with at < cutoff →
serializes to JSONL → gzip → sha256s → uploads JSONL + signed manifest
→ HEAD-confirms both objects exist → records an ArchiveBatch doc → only
then deletes from hot Mongo. Crash-safe: a failed upload leaves events
in hot. Manifest uses the Phase 3 AUDIT_SIGNING_KEY (HMAC-SHA-256), so
archives + checkpoints share a trust chain. Bypassable via { override:
true } for the operator's UI force-run.
- archive.worker.ts: hourly tick guarded by configured run-hour-UTC
(default 03:00) + day-guard so the same UTC day doesn't archive twice.
Disabled until ARCHIVE_ENABLED=true.
- archive-batch.schema.ts: { archivedAt, startSeq, endSeq, eventCount,
manifestSha256, jsonlKey, manifestKey, bytesUncompressed }. The
manifest sha256 stored in Mongo lets us detect manifest tampering
without downloading the actual manifest.
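
The serialize → gzip → sha256 → signed-manifest steps can be sketched with Node's built-in crypto and zlib. This is an illustration under assumptions, not the code in archive.service.ts: the manifest layout, `buildArchiveArtifacts` and `verifyManifest` are hypothetical, and hashing the gzipped bytes (rather than the raw JSONL) is a guess at the intended digest.

```typescript
import { createHash, createHmac, timingSafeEqual } from 'node:crypto'
import { gzipSync } from 'node:zlib'

// Hypothetical shapes for illustration; the real schemas live in the repo.
interface AuditEvent {
  seq: number
  at: string
}

interface ManifestPayload {
  startSeq: number
  endSeq: number
  eventCount: number
  bytesUncompressed: number
  jsonlSha256: string
}

interface ArchiveArtifacts {
  gzipped: Buffer
  manifestBytes: Buffer
  manifestSha256: string
  payload: ManifestPayload
}

const sha256Hex = (data: Buffer): string => createHash('sha256').update(data).digest('hex')
const hmacHex = (key: string, data: string): string =>
  createHmac('sha256', key).update(data).digest('hex')

function buildArchiveArtifacts(events: AuditEvent[], signingKey: string): ArchiveArtifacts {
  const first = events[0]
  const last = events[events.length - 1]
  if (!first || !last) throw new Error('No events to archive')

  // One JSON document per line, newline-terminated, then gzip.
  const jsonl = Buffer.from(events.map((e) => JSON.stringify(e)).join('\n') + '\n', 'utf8')
  const gzipped = gzipSync(jsonl)

  const payload: ManifestPayload = {
    startSeq: first.seq,
    endSeq: last.seq,
    eventCount: events.length,
    bytesUncompressed: jsonl.length,
    jsonlSha256: sha256Hex(gzipped), // assumption: digest of the uploaded object
  }

  // Sign the exact serialized string so verification never depends on key order.
  const payloadJson = JSON.stringify(payload)
  const manifestBytes = Buffer.from(
    JSON.stringify({ payload: payloadJson, signature: hmacHex(signingKey, payloadJson) }),
    'utf8',
  )
  // manifestSha256 is what an ArchiveBatch record would keep in Mongo.
  return { gzipped, manifestBytes, manifestSha256: sha256Hex(manifestBytes), payload }
}

function verifyManifest(manifestBytes: Buffer, signingKey: string): boolean {
  const parsed = JSON.parse(manifestBytes.toString('utf8')) as {
    payload: string
    signature: string
  }
  const expected = Buffer.from(hmacHex(signingKey, parsed.payload), 'hex')
  const actual = Buffer.from(parsed.signature, 'hex')
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected)
}
```

Embedding the payload as a pre-serialized string is a design choice made for this sketch: the verifier recomputes the HMAC over the stored bytes verbatim instead of re-serializing a parsed object.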

Audit module additions:
- audit.controller.ts: adds GET /audit/archives and POST /audit/archive/run;
/audit/verify now reports { oldestHotSeq, highestArchivedSeq } so the
UI shows the tier boundary.

Operator UI (apps/operator):
- 2 new proxies: /api/audit/archives + /api/audit/archive/run (force
override=true). Both behind operator auth via the existing platformApi
helper.
- audit.vue: new "Cold storage" card with batch table (archived-at, seq
range, event count, size, truncated manifest sha256), "Run archive
now" button + per-run result line.

Smoke-tested end-to-end:
- 7 chained events in hot. /api/audit/archive/run → ok=true, batchId
returned. JSONL + manifest both exist in MinIO (verified via mc ls +
mc cat). Mongo's chained set went 7 → 0. Verify reports
highestArchivedSeq=1446 (since we burn-allocate seqs on Authentik
dup-key rejections). Operator /audit panel shows the batch with
manifest hash 1d8263…
- First attempt with SSE-S3 enabled failed cleanly (MinIO KMS not
configured) — archive service correctly left events in hot Mongo.
Made SSE opt-in via AUDIT_COLD_SSE=true; prod turns it on.
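
The failure behaviour seen in the smoke test follows from the ordering: nothing is deleted from hot storage until both uploads are confirmed and the batch is recorded. A minimal sketch of that ordering, with in-memory stand-ins, is below. `ArchiveDeps`, `archiveInOrder` and the object keys are hypothetical and do not reflect the real service's signatures.

```typescript
// Hypothetical dependency surface; the real service talks to S3 and Mongo.
interface ArchiveDeps {
  put(key: string, body: string): Promise<void>
  exists(key: string): Promise<boolean>
  recordBatch(keys: { jsonlKey: string; manifestKey: string }): Promise<void>
  deleteFromHot(seqs: number[]): Promise<void>
}

type ArchiveResult =
  | { ok: true; eventCount: number }
  | { ok: false; reason: string }

async function archiveInOrder(seqs: number[], deps: ArchiveDeps): Promise<ArchiveResult> {
  const first = seqs[0]
  if (first === undefined) return { ok: true, eventCount: 0 }

  // Illustrative key names only.
  const jsonlKey = `batch-${first}.jsonl.gz`
  const manifestKey = `batch-${first}.manifest.json`

  try {
    // 1. Upload both objects.
    await deps.put(jsonlKey, 'jsonl-bytes')
    await deps.put(manifestKey, 'manifest-bytes')
    // 2. Confirm both exist before trusting the upload.
    if (!(await deps.exists(jsonlKey)) || !(await deps.exists(manifestKey))) {
      return { ok: false, reason: 'uploaded objects not found on HEAD check' }
    }
    // 3. Record the batch so the archive is discoverable.
    await deps.recordBatch({ jsonlKey, manifestKey })
  } catch (err) {
    // Any failure up to here returns before the delete, so hot data is untouched.
    return { ok: false, reason: err instanceof Error ? err.message : String(err) }
  }

  // 4. Only now remove the events from hot storage.
  await deps.deleteFromHot(seqs)
  return { ok: true, eventCount: seqs.length }
}
```

With a `put` that rejects (as MinIO did with SSE enabled), the function returns `ok: false` and `deleteFromHot` is never called.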

Out of scope (each could be its own session):
- Restore-to-hot endpoint (today: download from S3 + offline query)
- Client-side encryption (today: SSE-S3 in prod, none in dev)
- Multi-region replication
- Soft TTL safety net (defense-in-depth on top of app-managed deletion)

This completes the four-phase audit log work:
1. platform-api as audit hub
2. External system ingest (Authentik / Stalwart / OCIS)
3. Hash-chain + signed checkpoints (tamper evidence)
4. Cold-storage archival (retention without unbounded Mongo growth)
79 lines
2.7 KiB
TypeScript
// Daily-ish scheduler for the audit archive run. Disabled by default
// (ARCHIVE_ENABLED=false); production turns it on once volumes warrant.
// The operator UI can force a run via POST /audit/archive/run regardless
// of this flag.

import {
  Injectable,
  Logger,
  type OnApplicationBootstrap,
  type OnModuleDestroy,
} from '@nestjs/common'
import { ConfigService } from '@nestjs/config'
import { ArchiveService } from './archive.service.js'

// One hour. Inside each tick we check if the current UTC hour matches the
// configured run-hour; only one of 24 ticks per day actually invokes the
// archive. Cheap and simple; no real cron lib needed.
const TICK_MS = 60 * 60 * 1000
const DEFAULT_RUN_HOUR_UTC = 3 // 03:00 UTC daily

@Injectable()
export class ArchiveWorker implements OnApplicationBootstrap, OnModuleDestroy {
  private readonly logger = new Logger(ArchiveWorker.name)
  private readonly enabled: boolean
  private readonly runHour: number
  private timer: NodeJS.Timeout | null = null
  private lastRunDay: string | null = null

  constructor(
    private readonly archive: ArchiveService,
    config: ConfigService,
  ) {
    this.enabled = config.get('ARCHIVE_ENABLED') === 'true'
    const hour = Number(config.get('ARCHIVE_RUN_HOUR_UTC') ?? DEFAULT_RUN_HOUR_UTC)
    // Fall back to the default when the value is not an integer in 0-23;
    // such a value would never match getUTCHours() and silently skip every run.
    this.runHour = Number.isInteger(hour) && hour >= 0 && hour <= 23 ? hour : DEFAULT_RUN_HOUR_UTC
  }

  onApplicationBootstrap(): void {
    if (!this.enabled) {
      this.logger.log(
        'ARCHIVE_ENABLED=false — archive scheduler dormant. Operator UI can force runs.',
      )
      return
    }
    const hh = String(this.runHour).padStart(2, '0')
    this.logger.log(`Archive scheduler active · runs daily at ${hh}:00 UTC`)
    // Tick once on startup so a boot during the run hour does not have to wait
    // for the first interval; outside that hour this is a no-op. The day-guard
    // is in-memory, so it only prevents double-runs within one process lifetime.
    void this.tick()
    this.timer = setInterval(() => void this.tick(), TICK_MS)
  }

  onModuleDestroy(): void {
    if (this.timer) clearInterval(this.timer)
  }

  private async tick(): Promise<void> {
    const now = new Date()
    if (now.getUTCHours() !== this.runHour) return
    const today = now.toISOString().slice(0, 10) // YYYY-MM-DD UTC
    if (this.lastRunDay === today) return // already ran today
    // Mark the day before running so a slow or failed run is not retried today.
    this.lastRunDay = today

    this.logger.log(`Archive tick fired for ${today}`)
    try {
      const res = await this.archive.runOnce()
      if (res.ok && res.eventCount) {
        this.logger.log(`Archive complete: ${res.eventCount} events (${res.startSeq}-${res.endSeq})`)
      } else if (res.ok) {
        this.logger.log(`Archive complete: ${res.reason ?? 'no-op'}`)
      } else {
        this.logger.error(`Archive failed: ${res.reason}`)
      }
    } catch (err) {
      this.logger.error(
        `Archive tick crashed: ${err instanceof Error ? err.message : String(err)}`,
      )
    }
  }
}
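
The hour check and day-guard in tick() are easier to reason about as a pure function. The following is a sketch of that gating logic only. `shouldRunArchive` is a hypothetical helper that does not exist in the worker; it may be useful for unit-testing the schedule without timers.

```typescript
// Hypothetical helper mirroring the worker's gating logic; not part of the commit.
interface ScheduleDecision {
  run: boolean
  day: string // YYYY-MM-DD in UTC
}

function shouldRunArchive(
  now: Date,
  runHourUtc: number,
  lastRunDay: string | null,
): ScheduleDecision {
  const day = now.toISOString().slice(0, 10)
  // Run only during the configured UTC hour...
  if (now.getUTCHours() !== runHourUtc) return { run: false, day }
  // ...and at most once per UTC day.
  if (lastRunDay === day) return { run: false, day }
  return { run: true, day }
}

// Usage: one tick inside the run hour fires, a second tick that day does not.
const firstTick = shouldRunArchive(new Date('2024-05-01T03:15:00Z'), 3, null)
const secondTick = shouldRunArchive(new Date('2024-05-01T03:45:00Z'), 3, firstTick.day)
console.log(firstTick.run, secondTick.run) // true false
```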