measurement
The Mistral model exhibits pronounced performance degradation in zero-shot settings, with performance drops observed in Perplexity metrics, whereas the Llama model maintains more consistent performance with minimal degradation.
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org via serper
Referenced by nodes (3)
- Perplexity concept
- Zero-Shot concept
- Mistral AI entity