measurement
Evaluation metrics for hallucination detection and knowledge consistency include: MC1, MC2, and MC3 scores for the TruthfulQA multiple-choice task; %Truth, %Info, and %Truth*Info for the TruthfulQA open-ended generation task; subspan Exact Match for the open-domain QA tasks (NQ-Open, NQ-Swap, TriviaQA, PopQA, MuSiQue); accuracy for MemoTrap; and prompt-level and instruction-level accuracy for IFEval.
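Two of these metrics can be illustrated with a minimal sketch. It assumes the common reading of subspan Exact Match (a prediction counts as correct if any normalized gold answer appears as a substring of the normalized prediction) and the usual IFEval distinction (prompt-level accuracy requires every instruction in a prompt to be followed; instruction-level accuracy counts instructions individually). The function names, the SQuAD-style normalization, and the input layout are illustrative assumptions, not the code of any particular benchmark harness.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization (an assumption): lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def subspan_exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if any non-empty normalized gold answer is a substring of the prediction."""
    pred = normalize(prediction)
    golds = [normalize(g) for g in gold_answers]
    return float(any(g and g in pred for g in golds))


def ifeval_accuracies(results: list[list[bool]]) -> tuple[float, float]:
    """results holds one list per prompt, with one boolean per instruction in that prompt.

    Prompt-level: fraction of prompts whose instructions were all followed.
    Instruction-level: fraction of all instructions that were followed.
    """
    prompt_level = sum(all(r) for r in results) / len(results)
    total_instructions = sum(len(r) for r in results)
    instruction_level = sum(sum(r) for r in results) / total_instructions
    return prompt_level, instruction_level


if __name__ == "__main__":
    print(subspan_exact_match("The capital is Paris, France.", ["Paris"]))
    print(ifeval_accuracies([[True, True], [True, False], [False]]))
```

A dataset-level subspan Exact Match score would then be the mean of `subspan_exact_match` over all examples.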
Sources
- EdinburghNLP/awesome-hallucination-detection, GitHub (github.com)
Referenced by nodes (4)
- TruthfulQA concept
- music concept
- TriviaQA concept
- NQ-Open concept