Telemetry and Logging (ST-108)
Overview
Implements basic telemetry and logging infrastructure for monitoring application health, performance, and usage. Includes structured logging, Prometheus metrics, and telemetry event tracking.
Logging
Structured Logging
Location: apps/web/src/lib/server/logger.ts
import { logger } from '@/lib/server/logger';
logger.info('User logged in', { userId: '123' });
logger.error('Database connection failed', { error });
logger.warn('Rate limit approaching', { endpoint: '/api/ingestion' });
Log Levels
- fatal - System is unusable
- error - Error events
- warn - Warning events
- info - Informational messages
- debug - Debug messages
- trace - Trace messages
Configuration
LOG_LEVEL=info # fatal, error, warn, info, debug, trace
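As an illustration of how a LOG_LEVEL threshold might gate output, here is a minimal sketch. The level names come from the list above; the helpers `parseLogLevel` and `shouldLog` and the fallback to `info` are assumptions for illustration, not the actual logger implementation.

```typescript
// Levels ordered from most severe to least severe (ordering is an assumption).
const LEVELS = ['fatal', 'error', 'warn', 'info', 'debug', 'trace'] as const;
type LogLevel = (typeof LEVELS)[number];

// Parse a raw LOG_LEVEL value, falling back to 'info' when unset or unrecognized.
function parseLogLevel(raw: string | undefined): LogLevel {
  const value = (raw ?? '').trim().toLowerCase();
  return (LEVELS as readonly string[]).indexOf(value) !== -1 ? (value as LogLevel) : 'info';
}

// A message is emitted when its level is at least as severe as the threshold.
function shouldLog(level: LogLevel, threshold: LogLevel): boolean {
  return LEVELS.indexOf(level) <= LEVELS.indexOf(threshold);
}

const threshold = parseLogLevel('warn');
console.log(shouldLog('error', threshold)); // true
console.log(shouldLog('debug', threshold)); // false
```

With a threshold of `warn`, only `fatal`, `error`, and `warn` messages would pass this check.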
Log Format
Structured JSON logs with:
- Timestamp
- Level
- Message
- Metadata (optional)
- Request ID (when available)
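To make the format concrete, the sketch below builds one such JSON record. The field names (`timestamp`, `level`, `message`, `requestId`, `metadata`) and the helper `formatLogEntry` are hypothetical; the real logger may name or order them differently.

```typescript
// Hypothetical shape of one structured log entry; field names are assumptions.
interface LogEntry {
  timestamp: string; // ISO 8601
  level: string;
  message: string;
  requestId?: string;
  metadata?: Record<string, unknown>;
}

// Build a single-line JSON record, omitting optional fields that are absent.
function formatLogEntry(
  level: string,
  message: string,
  metadata?: Record<string, unknown>,
  requestId?: string,
  now: Date = new Date(),
): string {
  const entry: LogEntry = { timestamp: now.toISOString(), level, message };
  if (requestId !== undefined) entry.requestId = requestId;
  if (metadata !== undefined && Object.keys(metadata).length > 0) entry.metadata = metadata;
  return JSON.stringify(entry);
}

console.log(formatLogEntry('info', 'User logged in', { userId: '123' }, 'req-789'));
```

Emitting one JSON object per line keeps the output easy to parse for whichever aggregation tool is adopted later.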
Metrics
Prometheus Metrics
Location: apps/web/src/lib/server/metrics.ts
Exposed at: GET /api/metrics
Default Metrics
- CPU usage
- Memory usage
- Event loop lag
- Active handles
- Active requests
Custom Metrics
import { httpRequestDurationMs } from '@/lib/server/metrics';
// Record request duration
const start = Date.now();
// ... handle request ...
httpRequestDurationMs.observe({ method: 'POST', route: '/api/ingestion' }, Date.now() - start);
Metric Types
- Histogram: Request duration, response sizes
- Counter: Request counts, error counts
- Gauge: Active connections, queue depth
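The difference between the three types can be sketched with small in-memory stand-ins. These classes are illustrative only and are not the API of any Prometheus client library; the bucket bounds are arbitrary.

```typescript
// Counter: a value that only ever increases (e.g. request counts).
class Counter {
  private value = 0;
  inc(amount = 1): void {
    if (amount < 0) throw new Error('counters cannot decrease');
    this.value += amount;
  }
  get(): number { return this.value; }
}

// Gauge: a value that can go up and down (e.g. active connections).
class Gauge {
  private value = 0;
  set(v: number): void { this.value = v; }
  inc(amount = 1): void { this.value += amount; }
  dec(amount = 1): void { this.value -= amount; }
  get(): number { return this.value; }
}

// Histogram: counts observations into cumulative buckets (value <= upper bound).
class Histogram {
  private bounds: number[];
  private counts: number[];
  private sum = 0;
  private total = 0;
  constructor(bounds: number[]) {
    this.bounds = bounds.slice().sort((a, b) => a - b);
    this.counts = this.bounds.map(() => 0);
  }
  observe(v: number): void {
    this.sum += v;
    this.total += 1;
    this.bounds.forEach((bound, i) => { if (v <= bound) this.counts[i] += 1; });
  }
  snapshot(): { bounds: number[]; counts: number[]; sum: number; count: number } {
    return { bounds: this.bounds.slice(), counts: this.counts.slice(), sum: this.sum, count: this.total };
  }
}

const requests = new Counter();
const activeConnections = new Gauge();
const durationMs = new Histogram([50, 100, 500]);

requests.inc();
activeConnections.inc();
activeConnections.dec();
durationMs.observe(42);  // falls in the <=50, <=100 and <=500 buckets
durationMs.observe(120); // falls only in the <=500 bucket
console.log(requests.get(), activeConnections.get(), durationMs.snapshot().counts); // counts: [1, 1, 2]
```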
Telemetry Events
Event Tracking
Location: apps/web/src/lib/server/telemetry.ts
import { recordTelemetry } from '@/lib/server/telemetry';
await recordTelemetry({
  eventType: 'test.run.completed',
  properties: {
    repoId: 'repo-123',
    runId: 'run-456',
    testCount: 100,
  },
  requestId: 'req-789',
});
Event Storage
Events are stored in the telemetry_events table:
- eventType - Event identifier
- eventData - JSONB event payload
- repoId - Repository ID (optional)
- userId - User ID (optional)
- eventTimestamp - Event timestamp
Privacy
- No PII in telemetry events
- Only structural fields allowed
- User opt-in required (telemetryOptIn)
Configuration
TELEMETRY_ENABLED=true # Enable telemetry
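One way the privacy rules and the TELEMETRY_ENABLED switch could be enforced before an event is recorded is sketched below. The allowlist contents and the helpers `sanitizeProperties` and `shouldRecord` are hypothetical; the document does not specify which fields count as structural.

```typescript
// Hypothetical allowlist of structural property names (an assumption for illustration).
const ALLOWED_PROPERTIES = ['repoId', 'runId', 'testCount', 'durationMs'];

type TelemetryValue = string | number | boolean;

// Keep only allowlisted keys that hold primitive values; drop everything else.
function sanitizeProperties(properties: Record<string, unknown>): Record<string, TelemetryValue> {
  const clean: Record<string, TelemetryValue> = {};
  for (const key of Object.keys(properties)) {
    const value = properties[key];
    const isPrimitive =
      typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean';
    if (ALLOWED_PROPERTIES.indexOf(key) !== -1 && isPrimitive) {
      clean[key] = value as TelemetryValue;
    }
  }
  return clean;
}

// Record only when telemetry is enabled globally and the user has opted in.
function shouldRecord(telemetryEnabled: boolean, telemetryOptIn: boolean): boolean {
  return telemetryEnabled && telemetryOptIn;
}

const enabled = 'true' === 'true'; // stands in for reading TELEMETRY_ENABLED
if (shouldRecord(enabled, true)) {
  console.log(sanitizeProperties({ repoId: 'repo-123', email: 'user@example.com', testCount: 100 }));
}
```

An allowlist fails closed: a property that nobody has reviewed is dropped rather than stored.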
Request Context
Request ID
Every request gets a unique ID:
- Generated in middleware
- Included in logs and metrics
- Passed to telemetry events
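A rough sketch of the ID step in middleware follows. The header name `x-request-id`, the reuse of an incoming ID, and the ID format are all assumptions; a real service would more typically use a UUID generator than the timestamp-plus-random scheme shown here.

```typescript
// Assumed header name; the document does not say which header carries the ID.
const REQUEST_ID_HEADER = 'x-request-id';

// Generate a reasonably unique ID from the current time and a random suffix.
function generateRequestId(): string {
  const time = Date.now().toString(36);
  const random = Math.random().toString(36).slice(2, 10);
  return `req-${time}-${random}`;
}

// Reuse an ID set upstream if present (an assumption), otherwise create one.
function resolveRequestId(headers: Record<string, string | undefined>): string {
  const incoming = headers[REQUEST_ID_HEADER];
  return incoming !== undefined && incoming.trim() !== '' ? incoming.trim() : generateRequestId();
}

const requestId = resolveRequestId({});
// The same ID would then be attached to log metadata and telemetry events.
console.log({ message: 'Request received', requestId });
```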
Context Extraction
import { extractRequestContext } from '@/lib/server/audit-service';
const context = extractRequestContext(request);
// { ipAddress, userAgent, requestId }
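How such a context might be derived from request headers is sketched below. This is not the implementation of extractRequestContext; the function `contextFromHeaders`, the plain header record, and the choice of headers are assumptions.

```typescript
interface RequestContext {
  ipAddress: string | null;
  userAgent: string | null;
  requestId: string | null;
}

// Derive a context object from lower-cased header names (hypothetical helper).
function contextFromHeaders(headers: Record<string, string | undefined>): RequestContext {
  // x-forwarded-for may hold a comma-separated chain; the first entry is the original client.
  const forwarded = headers['x-forwarded-for'];
  const firstHop = forwarded !== undefined ? forwarded.split(',')[0].trim() : '';
  return {
    ipAddress: firstHop !== '' ? firstHop : null,
    userAgent: headers['user-agent'] ?? null,
    requestId: headers['x-request-id'] ?? null,
  };
}

console.log(
  contextFromHeaders({
    'x-forwarded-for': '203.0.113.7, 10.0.0.1',
    'user-agent': 'curl/8.0',
    'x-request-id': 'req-789',
  }),
);
```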
Health Monitoring
Status Endpoint
GET /api/status
Returns overall system health:
- Database connectivity
- Message queue connectivity
- Object storage connectivity
- Service status
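The aggregation behind such a status response could look roughly like the following. The response shape, the `degraded` state, and the helper `aggregateStatus` are assumptions; the actual endpoint may report health differently.

```typescript
type ComponentStatus = 'ok' | 'down';

interface StatusReport {
  status: 'ok' | 'degraded' | 'down';
  components: Record<string, ComponentStatus>;
}

// Combine per-component checks into one overall status:
// all ok -> 'ok', all down -> 'down', anything in between -> 'degraded'.
function aggregateStatus(components: Record<string, ComponentStatus>): StatusReport {
  const names = Object.keys(components);
  const downCount = names.filter((name) => components[name] === 'down').length;
  let status: StatusReport['status'] = 'ok';
  if (downCount > 0) {
    status = downCount === names.length ? 'down' : 'degraded';
  }
  return { status, components };
}

console.log(aggregateStatus({ database: 'ok', messageQueue: 'down', objectStorage: 'ok' }).status); // degraded
```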
Health Endpoints
- GET /api/health/db - Database health
- GET /api/health/mq - Message queue health
- GET /api/health/storage - Object storage health
Observability
Log Aggregation
- Structured JSON logs
- Console output (development)
- Future: Centralized log aggregation (ELK, Datadog, etc.)
Metrics Collection
- Prometheus-compatible metrics
- Scraped by monitoring system
- Future: Grafana dashboards
Distributed Tracing
- Request ID propagation
- Future: OpenTelemetry integration
Best Practices
- Use appropriate log levels
  - error for errors requiring attention
  - warn for recoverable issues
  - info for important events
  - debug for development debugging
- Include context
  - Request ID
  - User ID (when available)
  - Repository ID (when relevant)
- Avoid PII
  - No emails, passwords, tokens
  - Hash sensitive identifiers
  - Sanitize user input
- Structured logging
  - Use metadata objects
  - Consistent field names
  - JSON format
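As one possible way to apply the "Avoid PII" practice to log metadata, the sketch below redacts values whose key names look sensitive. The key pattern, the `[REDACTED]` placeholder, and the helper `redactMetadata` are hypothetical, and key-name matching alone will not catch PII embedded in free-text values.

```typescript
// Key names that suggest sensitive content (an illustrative, incomplete pattern).
const SENSITIVE_KEY_PATTERN = /(email|password|token|secret|authorization)/i;

// True only for plain object literals, so Error or Date instances are left untouched.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return value !== null && typeof value === 'object' && Object.getPrototypeOf(value) === Object.prototype;
}

// Return a copy of the metadata with sensitive values replaced, recursing into nested objects.
function redactMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const key of Object.keys(metadata)) {
    const value = metadata[key];
    if (SENSITIVE_KEY_PATTERN.test(key)) {
      safe[key] = '[REDACTED]';
    } else if (isPlainObject(value)) {
      safe[key] = redactMetadata(value);
    } else {
      safe[key] = value;
    }
  }
  return safe;
}

console.log(redactMetadata({ userId: '123', password: 'hunter2', session: { accessToken: 'abc', hits: 2 } }));
```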
Future Enhancements
- Centralized log aggregation
- Grafana dashboards
- Alerting rules
- OpenTelemetry integration
- Log retention policies
- Telemetry event streaming to MQ
Related Documentation
- Audit Logging - Security audit logs
- Project Setup - Local development
- API Gateway - API endpoints