How would you evaluate a customer-support LLM assistant?hard
Use task success, factuality, refusal quality, latency, cost, and human review.
LLM evaluation should combine golden datasets, rubric-based judging, production feedback, safety checks, and regression tests for known failure modes.