Relations (1)

related (score 3.00) — strongly supporting, 7 facts


Facts (7)

Sources
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com), 4 facts
measurement: The LARS uncertainty estimation technique is evaluated using Accuracy, Precision, Recall, and AUROC metrics on the TriviaQA, GSM8k, SVAMP, and Common-sense QA datasets.
reference: Evaluation metrics for hallucination detection include Accuracy (Acc), G-Mean, BSS, AUC, and Precision, Recall, and F1 scores for both 'Not Hallucination' and 'Hallucination' classifications.
reference: The ClaimDecomp dataset contains 1200 complex claims from PolitiFact, each labeled with one of six veracity labels, a justification paragraph from expert fact-checkers, and subquestions annotated by prior work, evaluated using accuracy, F1, precision, and recall.
reference: Evaluation benchmarks for vision-language hallucination detection and mitigation include MHaluBench, MFHaluBench, Object HalBench, AMBER, MMHal-Bench, and POPE, which use metrics such as accuracy, precision, recall, F1-score, CHAIR, Cover, Hal, and Cog.
KG-RAG: Bridging the Gap Between Knowledge and Creativity - arXiv (arxiv.org), 1 fact
claim: To evaluate the KG-RAG approach against vector RAG and no-RAG baselines, the researchers incorporated a conventional accuracy metric and introduced a modified precision metric designed to quantify the incidence of hallucinations.
A Comprehensive Benchmark and Evaluation Framework for Multi ... - arXiv (arxiv.org), 1 fact
claim: Classical metrics, including Precision, Recall, Accuracy, and F1-score, are used to quantify performance in the study.
A survey on augmenting knowledge graphs (KGs) with large ... - Springer (link.springer.com), 1 fact
claim: Evaluation metrics for Large Language Models integrated with Knowledge Graphs vary depending on the specific downstream tasks and can include accuracy, F1-score, precision, and recall.
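Nearly every fact above cites the same classical metrics: Accuracy, Precision, Recall, and F1. As a minimal sketch (not taken from any cited paper; the label strings and example data are illustrative assumptions), here is how those four metrics are computed for a binary 'Hallucination' / 'Not Hallucination' classification:

```python
def binary_metrics(y_true, y_pred, positive="Hallucination"):
    """Accuracy, Precision, Recall, and F1 with one class treated as positive."""
    # Confusion-matrix counts for the chosen positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (hypothetical data, for illustration only).
labels = ["Hallucination", "Not Hallucination", "Hallucination", "Not Hallucination"]
preds  = ["Hallucination", "Hallucination",     "Hallucination", "Not Hallucination"]
print(binary_metrics(labels, preds))
```

Reporting the per-class scores mentioned in the second fact amounts to calling the function twice, once with `positive="Hallucination"` and once with `positive="Not Hallucination"`.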