reference
The study 'Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback' uses three metrics: Expected Calibration Error after temperature scaling (ECE-t), accuracy@coverage, and coverage@accuracy. It evaluates on question-answering datasets including TriviaQA, SciQ, and TruthfulQA.
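As a rough illustration of what the three metrics measure, here is a minimal Python sketch. It is not the study's code, and every function name is hypothetical. Several details are assumptions: ECE uses 10 equal-width bins, and ECE-t fits a single temperature on the logit of each confidence by grid search minimizing negative log-likelihood. Accuracy@coverage is the accuracy of the most-confident fraction of answers. Coverage@accuracy is the largest such fraction whose accuracy meets a target.

```python
import math


def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error with equal-width bins over [0, 1] (assumed binning)."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are [lo, hi); a confidence of exactly 1.0 goes into the last bin.
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(acc - conf)
    return total


def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return math.log(p / (1 - p))


def scale(confidences, t):
    """Temperature-scale each confidence: sigmoid(logit(p) / t)."""
    return [1 / (1 + math.exp(-_logit(c) / t)) for c in confidences]


def nll(confidences, correct, eps=1e-12):
    """Mean negative log-likelihood of the correctness labels."""
    return -sum(math.log(c + eps) if y else math.log(1 - c + eps)
                for c, y in zip(confidences, correct)) / len(confidences)


def fit_temperature(confidences, correct, grid=None):
    # Assumed grid search over T in {0.05, 0.10, ..., 10.0}.
    grid = grid or [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda t: nll(scale(confidences, t), correct))


def ece_t(confidences, correct, n_bins=10):
    # For illustration T is fit on the same data; a held-out split is more usual.
    t = fit_temperature(confidences, correct)
    return ece(scale(confidences, t), correct, n_bins)


def _by_confidence(confidences):
    # Indices sorted from most to least confident (stable for ties).
    return sorted(range(len(confidences)), key=lambda i: -confidences[i])


def accuracy_at_coverage(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction of most-confident answers."""
    order = _by_confidence(confidences)
    k = max(1, round(coverage * len(order)))
    return sum(correct[i] for i in order[:k]) / k


def coverage_at_accuracy(confidences, correct, target):
    """Largest most-confident fraction whose accuracy is >= target."""
    order = _by_confidence(confidences)
    best, hits = 0, 0
    for k, i in enumerate(order, start=1):
        hits += correct[i]
        if hits / k >= target:
            best = k
    return best / len(order)


conf = [0.9, 0.8, 0.7, 0.6]
gold = [1, 1, 0, 0]
print(ece(conf, gold), accuracy_at_coverage(conf, gold, 0.5),
      coverage_at_accuracy(conf, gold, 1.0))
```

In this toy example the two most confident answers are correct, so accuracy@50% coverage is 1.0 and coverage@100% accuracy is 0.5. Both selective-prediction metrics depend only on how answers are ranked by confidence. ECE, by contrast, compares the confidence values themselves with observed accuracy.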
Sources
- EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)
Referenced by nodes (2)
- TruthfulQA concept
- TriviaQA concept