Perplexity
Also known as: Perplexity AI
Facts (11)
Sources
Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org, Aug 13, 2025): 5 facts
reference: Uncertainty-based methods for hallucination detection in large language models include Perplexity (Ren et al., 2023), Length-Normalized Entropy (LN-Entropy) (Malinin and Gales, 2021), and Semantic Entropy (SemEntropy) (Farquhar et al., 2024), which use multiple generations to capture sequence-level uncertainty (see the code sketch after this source's facts).
measurement: The Mistral model degrades markedly in zero-shot settings, with clear drops in the Perplexity metric, whereas the Llama model stays more consistent with minimal degradation.
claim: Semantic Entropy maintains the most consistent performance across both zero-shot and few-shot settings, while traditional metrics like Perplexity and LN-Entropy show higher sensitivity to setting changes.
measurement: Existing hallucination detection methods experience performance drops of up to 45.9% for Perplexity and 30.4% for EigenScore when evaluated using LLM-as-Judge criteria compared to ROUGE.
measurement: The Perplexity hallucination detection method sees its AUROC fall by as much as 45.9% for the Mistral model on the NQ-Open dataset when switching from ROUGE to LLM-as-Judge evaluation.
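To make the first two uncertainty measures above concrete, here is a minimal Python sketch, assuming per-token log-probabilities are available for each sampled generation; the function names and sample values are illustrative, not from the paper. Semantic Entropy additionally requires clustering generations by meaning, which is omitted here.

```python
import math

def sequence_perplexity(token_logprobs):
    """Perplexity of a single generation from its per-token natural-log
    probabilities: exp of the average negative log-likelihood.
    Higher perplexity means the model was less confident."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def length_normalized_entropy(generations_logprobs):
    """LN-Entropy in the spirit of Malinin and Gales (2021): average the
    length-normalized negative log-likelihood over multiple sampled
    generations, so longer sequences do not dominate the score."""
    per_sequence_nll = [-sum(lp) / len(lp) for lp in generations_logprobs]
    return sum(per_sequence_nll) / len(per_sequence_nll)

# Hypothetical log-probs for three sampled answers to the same prompt.
samples = [
    [-0.2, -0.1, -0.3],
    [-1.2, -0.9, -1.5, -0.8],
    [-0.4, -0.6],
]
print(sequence_perplexity(samples[0]))     # confidence of one answer
print(length_normalized_entropy(samples))  # uncertainty across answers
```

In both cases a higher score is read as greater sequence-level uncertainty and therefore a higher presumed risk of hallucination.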
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com): 3 facts
claim: Metrics used for hallucination detection include SelfCheckGPT, FactScore, EigenScore, Efficient EigenScore (EES), Semantic Entropy, Perplexity, HaluEval Accuracy, and ROUGE-1 (XSum).
measurement: Established hallucination detection methods, including Perplexity, EigenScore, and eRank, suffer AUROC drops of up to 45.9% when evaluated with human-aligned LLM-as-Judge metrics instead of ROUGE (see the AUROC sketch below).
measurement: Evaluation of generation tasks uses Perplexity, Unigram Overlap (F1), BLEU-4, ROUGE-L, Knowledge F1, and Rare F1 as metrics, drawing on datasets including WoW and CMU Document Grounded Conversations (CMU_DoG), with the KiLT Wikipedia dump as the knowledge source.
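The AUROC drops reported above stem from re-labeling the same outputs: the detector's scores stay fixed, but the binary "hallucination" labels change with the evaluation criterion (ROUGE overlap versus an LLM-as-Judge verdict). A minimal sketch, assuming scikit-learn is installed; every number below is invented for illustration, not data from the cited work.

```python
from sklearn.metrics import roc_auc_score

# Detector scores (here, perplexity): higher means more uncertain.
perplexity_scores = [1.2, 8.5, 2.0, 9.1, 1.5, 7.7]

# Two labelings of the same six outputs (1 = hallucination).
labels_rouge = [0, 1, 0, 1, 0, 1]  # from a ROUGE-overlap threshold
labels_judge = [0, 1, 1, 1, 0, 0]  # from an LLM-as-Judge verdict

print("AUROC vs ROUGE labels:", roc_auc_score(labels_rouge, perplexity_scores))
print("AUROC vs judge labels:", roc_auc_score(labels_judge, perplexity_scores))
```

When the two criteria disagree on which outputs count as hallucinations, the same detector can score very differently, which is the mechanism behind the reported 45.9% swing.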
Hallucination Causes: Why Language Models Fabricate Facts (mbrenndoerfer.com, Mar 15, 2026): 1 fact
claim: Progress in large language model capabilities, such as perplexity or instruction-following quality, does not automatically translate into progress in hallucination reduction.
Medical Hallucination in Foundation Models and Their ... (medrxiv.org, Mar 3, 2025): 1 fact
measurement: The AI/LLM tools most commonly mentioned by survey respondents were ChatGPT (30 mentions), followed by Claude (20), Google Bard/Gemini (16), Llama (15), Perplexity (9), AlphaFold (2), and Scite and Consensus (1).
Reference Hallucination Score for Medical Artificial ... (medinform.jmir.org, Jul 31, 2024): 1 fact
reference: Wahid R, Craven C, Romanoff D, Kapralos B, and Chandross D authored the paper 'Exploring the Utilization of Perplexity AI for Academic Information Retrieval with Valid References Sourcing: A Study on Bina Nusantara Students', presented at the 2025 16th International Conference on Information, Intelligence, Systems & Applications (IISA).