Skip to main content

ADR-0004: ML Serving for MCP (Explainable, Versioned, gRPC-first)

Status: Proposed
Date: 2025-10-21

Context

Predictive flakiness and root-cause categorization require serving ML models with explainability (SHAP/LIME), versioning, and rollback.

Decision

MCP server exposes a gRPC inference API with standard request/response contracts.
Initial serving: custom gRPC wrapper around model runtime (e.g., Python FastAPI sidecar or Rust/Go bindings); capture SHAP values when enabled.
Mandatory: model versioning, canary rollout, rollback triggers.

Consequences

Consistent interface across models; polyglot model support.
Explainability overhead must be bounded; can be toggleable.

Alternatives considered

TorchServe/TensorFlow Serving directly (less flexible for XAI hooks).
REST-only inference (less efficient for high-throughput).

Revisit criteria

Latency/throughput constraints; operational overhead; model lifecycle maturity.

References

PRD §5.2, §5.8; Algorithm doc; Research §3.2.

Context
Decision
Consequences
Alternatives considered
Revisit criteria
References