feat(operator): live Infrastructure probes + honest split between deployed and planned

The Infrastructure page used to read from a mock fixture that lied two ways:
it listed services that aren't deployed (Jitsi, Zulip, Cloudflare, Object
Storage, Postmark) and showed hardcoded uptime/latency for the ones that
are. Now it shows truth from real probes plus a clearly-labelled "planned"
section for the rest.

Backend (services/platform-api):
- New src/health/ module — HealthService runs 9 probes in parallel with a
  1.5s timeout each:
    Stalwart  → TCP stalwart:8080
    OCIS      → HTTP GET ocis:9200/health
    Collabora → HTTP GET collabora:9980/hosting/discovery
    Authentik → HTTP GET authentik-server:9000/-/health/ready/
    Postgres  → TCP postgres:5432
    Mongo     → existing Mongoose connection.db.admin().ping()
    Redis     → TCP redis:6379
    Traefik   → TCP traefik:80
    Platform API → trivially ok (this code is running)
  Status thresholds: ok ≤500ms, warn 500–1500ms, bad on timeout/refuse.
- HealthController exposes GET /health/platform behind JwtAuthGuard, plus
  keeps the existing public GET /health for infra liveness checks.
- Moved the old src/health.controller.ts into the new module.

Frontend (apps/operator):
- /api/health/platform proxy forwards the operator's access token.
- Infrastructure page swaps SERVICES fixture for useFetch with 30s auto-
  refresh + a manual Refresh button. Cards show real status badge + real
  latency; uptime/error stay as em-dash with a "no probe history yet"
  tooltip until a Prometheus/event-log backend lands.
- Below the live grid, a "Planned · not deployed" section renders 5 dimmed
  cards (Jitsi, Zulip, simpledns.plus, Hetzner Object Storage, Postmark).
  simpledns.plus replaces the misnamed Cloudflare entry — we use
  simpledns.plus, not Cloudflare.
- Subtitle is now truthful: "8 / 9 services live · checked 2s ago".

Verified: stopped redis → card flipped to "down · getaddrinfo ENOTFOUND
redis", subtitle reflected 8/9, incident banner appeared. Restarted →
back to 9/9, banner gone.

SERVICES fixture stays in place for Overview's incident banner — replacing
that is a separate follow-up tied to the incident-management backend.
This commit is contained in:
Ronni Baslund
2026-05-24 18:47:38 +02:00
parent 9fac11e668
commit 77a09aaf77
8 changed files with 316 additions and 43 deletions
+2 -2
View File
@@ -2,7 +2,7 @@ import { Module } from '@nestjs/common'
import { ConfigModule } from '@nestjs/config'
import { MongooseModule } from '@nestjs/mongoose'
import { AuthModule } from './auth/auth.module.js'
import { HealthController } from './health.controller.js'
import { HealthModule } from './health/health.module.js'
import { PartnersModule } from './partners/partners.module.js'
import { SeedModule } from './seed/seed.module.js'
import { SubscriptionsModule } from './subscriptions/subscriptions.module.js'
@@ -16,12 +16,12 @@ import { UsersModule } from './users/users.module.js'
process.env.MONGODB_URI ?? 'mongodb://localhost:27017/dezky',
),
AuthModule,
HealthModule,
TenantsModule,
PartnersModule,
UsersModule,
SubscriptionsModule,
SeedModule,
],
controllers: [HealthController],
})
export class AppModule {}