Test Run Comparison
Compare Test Runs
Select two test runs to see a side-by-side comparison of their metrics and get AI-powered insights.
Metric Comparison
Comparing TR-001 (Chatbot A) vs. TR-004 (Chatbot B)
| Metric | Test Run A | Test Run B | Trend |
|---|---|---|---|
| Average Latency (s) | 1.2 | 0.9 | |
| Estimated Cost ($) | 0.45 | 0.51 | |
| Answer Accuracy | 88% | 92% | |
| Similarity Score | 78% | 82% | |
| Hallucination Rate | 5% | 3% | |
| Fluency Score | 90% | 94% | |
| Coherence Score | 86% | 92% | |
AI-Powered Insights
TR-004 shows a 4-percentage-point improvement in answer accuracy (88% to 92%) and a lower hallucination rate (3% vs. 5%), suggesting better response quality and reliability.
Its average latency is also 25% lower (0.9 s vs. 1.2 s), a notable performance improvement.
The trade-off is a 13% increase in estimated cost ($0.45 to $0.51). Consider whether the quality, reliability, and latency improvements justify the higher cost for your use case.
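The percentages quoted in the insights can be checked directly against the table. The following is a minimal sketch of that arithmetic, assuming the table values are entered by hand; the dictionary layout and the helper names `relative_change_pct` and `point_change` are hypothetical and are not part of the comparison tool.

```python
# Values copied from the Metric Comparison table.
# The dict keys and helper functions are illustrative assumptions.
RUN_A = {"latency_s": 1.2, "cost_usd": 0.45, "accuracy_pct": 88, "hallucination_pct": 5}
RUN_B = {"latency_s": 0.9, "cost_usd": 0.51, "accuracy_pct": 92, "hallucination_pct": 3}


def relative_change_pct(a, b):
    """Relative change from a to b, as a percentage of a."""
    return (b - a) / a * 100


def point_change(a, b):
    """Absolute difference between two percentage scores, in percentage points."""
    return b - a


# Latency and cost are raw quantities, so a relative change is appropriate.
latency_change = relative_change_pct(RUN_A["latency_s"], RUN_B["latency_s"])
cost_change = relative_change_pct(RUN_A["cost_usd"], RUN_B["cost_usd"])

# Accuracy and hallucination rate are already percentages, so the
# difference is reported in percentage points to avoid ambiguity.
accuracy_points = point_change(RUN_A["accuracy_pct"], RUN_B["accuracy_pct"])
hallucination_points = point_change(
    RUN_A["hallucination_pct"], RUN_B["hallucination_pct"]
)

print(f"Latency change: {latency_change:+.1f}%")                   # -25.0%
print(f"Cost change: {cost_change:+.1f}%")                         # +13.3%
print(f"Accuracy change: {accuracy_points:+d} points")             # +4 points
print(f"Hallucination change: {hallucination_points:+d} points")   # -2 points
```

A negative latency change means Test Run B responds faster, while the positive cost change means it is more expensive. Reporting score differences in percentage points keeps a 4-point accuracy gain from being confused with a 4% relative gain.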