Imagine a customer support chatbot that confidently tells a user they have 365 days to return a product. The actual policy is 30 days. Every system metric is green: P95 latency at 142 ms, throughput at 1.2k requests per second, eval accuracy at 94.2%, error rate at 0.02%.