Test Run Comparison

Compare Test Runs
Select two test runs to see a side-by-side comparison of their metrics and get AI-powered insights.
Metric Comparison
Comparing TR-001 (Chatbot A) vs. TR-004 (Chatbot B)
Metric               Test Run A  Test Run B
Average Latency (s)  1.2         0.9
Estimated Cost ($)   0.45        0.51
Answer Accuracy      88%         92%
Similarity Score     78%         82%
Hallucination Rate   5%          3%
Fluency Score        90%         94%
Coherence Score      86%         92%
AI-Powered Insights

TR-004 shows a 4-percentage-point improvement in answer accuracy (88% to 92%) and a lower hallucination rate (5% to 3%), suggesting better response quality and reliability.

Its average latency is also 25% lower (0.9 s vs. 1.2 s), which is a significant performance improvement.

The trade-off is a 13% increase in estimated cost ($0.45 to $0.51). Consider whether the quality, reliability, and latency improvements justify the higher cost for your use case.
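
The percentages quoted in the insights can be checked directly against the values in the metric table. The following is a minimal sketch of that arithmetic, not the tool's own implementation; the helper name `relative_change` and the variable names are hypothetical and chosen only for illustration.

```python
def relative_change(a: float, b: float) -> float:
    """Percent change going from run A to run B, relative to A (hypothetical helper)."""
    return (b - a) / a * 100.0


# Values copied from the metric table (Test Run A = TR-001, Test Run B = TR-004).
latency_a, latency_b = 1.2, 0.9    # seconds
cost_a, cost_b = 0.45, 0.51        # dollars
accuracy_a, accuracy_b = 88, 92    # percent

latency_change = relative_change(latency_a, latency_b)
cost_change = relative_change(cost_a, cost_b)

# Metrics already expressed in percent are compared in percentage points,
# not as a relative change.
accuracy_points = accuracy_b - accuracy_a

print(f"Latency:  {latency_change:+.0f}%")
print(f"Cost:     {cost_change:+.0f}%")
print(f"Accuracy: {accuracy_points:+d} percentage points")
```

A negative latency change means TR-004 responds faster, while a positive cost change means it is more expensive per run.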