Test Run Comparison

Compare Test Runs

Select two test runs to see a side-by-side comparison of their metrics and get AI-powered insights.

Test Run A

Test Run B

Metric Comparison

Comparing TR-001 (Chatbot A) vs. TR-004 (Chatbot B)

Metric	Test Run A	Test Run B
Average Latency (s)	1.2	0.9
Estimated Cost ($)	0.45	0.51
Answer Accuracy	88%	92%
Similarity Score	78%	82%
Hallucination Rate	5%	3%
Fluency Score	90%	94%
Coherence Score	86%	92%

AI-Powered Insights

TR-004 shows a 4-percentage-point improvement in answer accuracy (88% to 92%) and a lower hallucination rate (5% to 3%), suggesting better response quality and reliability.

Its average latency is also 25% lower (0.9 s vs. 1.2 s), which is a significant performance improvement.

The trade-off is a 13% increase in estimated cost ($0.45 to $0.51). Consider whether the quality, reliability, and latency improvements justify the higher cost for your use case.
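The percentages quoted in these insights can be rechecked directly from the table. The following is a minimal sketch, not the tool's actual implementation: the names `METRICS`, `relative_change`, and `point_change` are hypothetical, and the values are copied by hand from the Metric Comparison table (A = TR-001, B = TR-004). It reports a relative change for latency and cost, and a percentage-point difference for metrics that are already percentages.

```python
# Values copied from the Metric Comparison table: (Test Run A, Test Run B).
# Names and structure here are illustrative assumptions, not the tool's API.
METRICS = {
    "Average Latency (s)": (1.2, 0.9),
    "Estimated Cost ($)": (0.45, 0.51),
    "Answer Accuracy (%)": (88, 92),
    "Similarity Score (%)": (78, 82),
    "Hallucination Rate (%)": (5, 3),
    "Fluency Score (%)": (90, 94),
    "Coherence Score (%)": (86, 92),
}


def relative_change(a: float, b: float) -> float:
    """Percent change from run A to run B, relative to A."""
    return (b - a) / a * 100


def point_change(a: float, b: float) -> float:
    """Plain difference B - A, used for metrics already expressed in percent."""
    return b - a


for name, (a, b) in METRICS.items():
    if name.endswith("(%)"):
        # Percentage metrics: report the difference in percentage points.
        print(f"{name}: {point_change(a, b):+.0f} points")
    else:
        # Absolute metrics (seconds, dollars): report the relative change.
        print(f"{name}: {relative_change(a, b):+.1f}%")
```

With the table's values, latency comes out at about -25% (B is faster) and cost at about +13.3%, while answer accuracy differs by 4 percentage points.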