reference
The paper 'Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers' groups hallucination detection metrics into three categories: black-box scorers (non-contradiction probability, normalized semantic negentropy, normalized cosine similarity, BERTScore, BLEURT, and exact match rate); white-box scorers based on token probabilities (minimum token probability and length-normalized token probability); and LLM-as-a-Judge scorers, which return a categorical verdict of incorrect, uncertain, or correct.
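
To make the simpler scorers named above concrete, here is a minimal Python sketch of three of them: minimum token probability, length-normalized token probability, and exact match rate. It is an illustration under stated assumptions, not the paper's implementation. The function names are hypothetical, the length-normalized score is assumed to be the geometric mean of the token probabilities, and the exact match rate is assumed to be the fraction of sampled candidate responses identical to the original response.

```python
import math


def min_token_probability(token_probs):
    """White-box score: the lowest probability assigned to any generated token."""
    return min(token_probs)


def length_normalized_probability(token_probs):
    """White-box score: geometric mean of token probabilities (an assumption here).

    Computed in log space for numerical stability; every probability
    must lie in (0, 1], since log(0) is undefined.
    """
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def exact_match_rate(original, candidates):
    """Black-box score: share of sampled candidates identical to the original response."""
    matches = sum(1 for c in candidates if c == original)
    return matches / len(candidates)


# Hypothetical inputs, for illustration only.
probs = [0.9, 0.5, 0.8]
print(min_token_probability(probs))
print(length_normalized_probability(probs))
print(exact_match_rate("Paris", ["Paris", "Lyon", "Paris", "Paris"]))
```

All three scores fall in [0, 1], with higher values indicating greater confidence in the response. The remaining black-box scorers and the LLM-as-a-Judge scorers need an NLI model, an embedding model, or a second LLM, so they are not sketched here.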
