reference
The paper 'Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers' groups hallucination detection metrics into three categories: black-box scorers (non-contradiction probability, normalized semantic negentropy, normalized cosine similarity, BERTScore, BLEURT, and exact match rate), white-box scorers based on token probabilities (minimum token probability and length-normalized token probability), and LLM-as-a-Judge scorers, which return a categorical verdict of incorrect, uncertain, or correct.
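To make the simpler scorers concrete, the following is a minimal Python sketch of the two white-box scorers and the exact match rate. The formulas are assumptions based on the common reading of these names and are not taken from the source: minimum token probability as the smallest per-token probability, length-normalized token probability as the geometric mean of the per-token probabilities, and exact match rate as the fraction of sampled responses identical to the original response. The function names are hypothetical and do not belong to any particular library.

```python
import math
from typing import Sequence


def min_token_probability(token_probs: Sequence[float]) -> float:
    """Smallest per-token probability in the response (assumed definition)."""
    return min(token_probs)


def length_normalized_token_probability(token_probs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities (assumed definition)."""
    # A zero probability drives the product to zero; return early to avoid log(0).
    if any(p <= 0.0 for p in token_probs):
        return 0.0
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def exact_match_rate(original: str, candidates: Sequence[str]) -> float:
    """Fraction of sampled responses identical to the original (assumed definition)."""
    return sum(c == original for c in candidates) / len(candidates)


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.5]  # illustrative per-token probabilities, not real model output
    print(min_token_probability(probs))
    print(length_normalized_token_probability(probs))
    print(exact_match_rate("Paris", ["Paris", "paris", "Paris", "Lyon"]))
```

All three scores lie in [0, 1], with higher values indicating higher confidence under these assumed definitions. The other black-box scorers named above need an NLI or embedding model and are omitted from this sketch.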
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)
Referenced by nodes (2)
- LLM-as-a-judge concept
- BERTScore concept