reference
SQuADv2 (Stanford Question Answering Dataset v2) tests a model's ability to avoid hallucinations by including unanswerable questions. In a 4-shot setting, the model must either give an accurate answer or recognize that no answer is possible.
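To make the unanswerable-question requirement concrete, here is a minimal Python sketch of exact-match scoring modelled on the normalization used by the official SQuAD v2 evaluation script, where an empty string stands for "no answer". The function names, the `to_prediction` helper, and the `NO_ANSWER_MARKER` sentinel are illustrative assumptions. The source does not say how the leaderboard prompts for or scores abstentions.

```python
import re
import string

_PUNCT = set(string.punctuation)

# Hypothetical sentinel a model might emit to abstain; how abstention is
# actually elicited in the 4-shot prompt is an assumption, not from the source.
NO_ANSWER_MARKER = "unanswerable"


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def to_prediction(model_output: str) -> str:
    """Map raw model output to a SQuAD-style prediction ("" means abstain)."""
    out = model_output.strip()
    return "" if out.lower() == NO_ANSWER_MARKER else out


def exact_match(prediction: str, gold_answers: list[str]) -> int:
    """Return 1 if the prediction matches any gold answer after normalization.

    An unanswerable question has an empty gold list, so it counts as correct
    only when the model abstains (empty prediction).
    """
    if not gold_answers:
        gold_answers = [""]
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(g) for g in gold_answers))


if __name__ == "__main__":
    print(exact_match("the Eiffel Tower", ["Eiffel Tower"]))  # answerable, correct
    print(exact_match("Paris", []))  # unanswerable, but the model answered anyway
    print(exact_match(to_prediction("Unanswerable"), []))  # correct abstention
```

Under this scoring, answering an unanswerable question earns no credit, which is how the benchmark penalizes hallucinated answers. The official script also reports a token-level F1 score, which this sketch omits.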
Authors
Sources
- The Hallucinations Leaderboard, an Open Effort to Measure ... (huggingface.co)
Referenced by nodes (2)
- hallucination detection concept
- SQuAD concept