RefChecker
Facts (13)
Sources
New tool, dataset help detect hallucinations in large language models (amazon.science, 12 facts)
Claim: In the initial release of RefChecker, the automatic hallucination checker supports GPT-4, Claude 2, and RoBERTa-NLI, with plans to release additional open-source checkers such as AlignScore and a Mistral-based checker.
Claim: RefChecker categorizes claims into three types based on their relationship to reference texts: entailment (supported), contradiction (refuted), and neutral (insufficient evidence). This aligns with the support, refute, and not-enough-information categories used in natural-language inference (NLI).
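The three-way labeling above can be sketched in a few lines. Note that the roll-up rule from triplet-level labels to a response-level verdict is an assumption for illustration, not RefChecker's documented behavior:

```python
# Map raw NLI predictions onto RefChecker's three claim categories.
NLI_TO_LABEL = {
    "entailment": "Entailment",        # reference supports the claim
    "contradiction": "Contradiction",  # reference refutes the claim
    "neutral": "Neutral",              # not enough evidence either way
}

def categorize(nli_label: str) -> str:
    """Translate an NLI label into a RefChecker claim category."""
    return NLI_TO_LABEL[nli_label.lower()]

def response_verdict(labels: list[str]) -> str:
    """Illustrative aggregation: any contradiction flags the response,
    all-entailment is clean, anything else is inconclusive."""
    if "Contradiction" in labels:
        return "Contradiction"
    if all(label == "Entailment" for label in labels):
        return "Entailment"
    return "Neutral"
```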
Procedure: RefChecker consists of two configurable modules: a claim-triplet extractor (E) and a hallucination checker (C).
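The E-then-C flow can be sketched as two swappable callables wired together; the function and variable names here are hypothetical, not RefChecker's actual API:

```python
from typing import Callable

Triplet = tuple[str, str, str]

def check_response(
    response: str,
    reference: str,
    extractor: Callable[[str], list[Triplet]],  # module E (configurable)
    checker: Callable[[Triplet, str], str],     # module C (configurable)
) -> list[tuple[Triplet, str]]:
    """Sketch of the two-stage flow: E pulls claim triplets out of the
    response, then C labels each triplet against the reference text."""
    return [(t, checker(t, reference)) for t in extractor(response)]

# Toy stand-ins for E and C, just to show the wiring:
def toy_extractor(text: str) -> list[Triplet]:
    return [("RefChecker", "detects", "hallucinations")]

def toy_checker(triplet: Triplet, ref: str) -> str:
    return "Entailment" if all(part in ref for part in triplet) else "Neutral"

results = check_response(
    "RefChecker detects hallucinations.",
    "RefChecker is a tool that detects hallucinations in LLM output.",
    toy_extractor,
    toy_checker,
)
```

Because both modules are plain callables, either stage can be swapped (e.g. a different extractor or checker backend) without touching the other.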
Claim: In the initial release of RefChecker, the claim-triplet extractor supports GPT-4 and Claude 2, with plans to provide a Mixtral-8x7B open-source extractor in a future release.
Claim: RefChecker represents claims in LLM-generated text using knowledge triplets, which are structured as <subject, predicate, object> to capture finer-grained information than sentences or sub-sentences.
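A minimal sketch of the triplet structure (illustrative only; RefChecker's internal representation may differ):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeTriplet:
    """One <subject, predicate, object> claim unit."""
    subject: str
    predicate: str
    object: str

# A single sentence can yield several triplets, each checkable on its own.
# "Amazon released RefChecker, a hallucination-detection tool" might give:
triplets = [
    KnowledgeTriplet("Amazon", "released", "RefChecker"),
    KnowledgeTriplet("RefChecker", "is a", "hallucination-detection tool"),
]
```

Checking each triplet independently is what makes the detection finer-grained than sentence-level checks: one sentence can be partly supported and partly refuted.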
Claim: RefChecker supports the extraction of knowledge triplets, the detection of hallucinations at the triplet level, and the evaluation of large language models.
Reference: The RefChecker benchmark dataset covers three distinct settings: zero context (LLMs generate answers without reference texts), noisy context (LLMs are provided with retrieved documents that may contain inaccurate information, typical of RAG systems), and accurate context (LLMs are provided with one accurate document).
Claim: Amazon released RefChecker, a tool consisting of a framework for hallucination detection and a benchmark dataset for assessing hallucinations in large language models.
Measurement: The RefChecker benchmark dataset includes 100 examples for each of its three settings: zero context, noisy context, and accurate context.
Procedure: RefChecker uses knowledge triplets with a <subject, predicate, object> structure to characterize factual assertions in LLM-generated texts, rather than the sentences or short phrases used by previous frameworks.
Procedure: RefChecker is available on GitHub and can be installed using pip, with usage instructions provided in the QuickStart section of the project's README.
Reference: The RefChecker benchmark dataset sources its examples from three specific datasets: NaturalQuestions (development set) for zero-context closed-book QA, MS MARCO (development set) for noisy-context retrieval-augmented generation, and databricks-dolly-15k for accurate-context summarization, closed QA, and information extraction.
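The benchmark composition described above (three settings, 100 examples each, one source dataset per setting) can be summarized as a small table; the field names here are assumptions for illustration, not the released schema:

```python
# Counts and sources from the article; keys/fields are illustrative.
BENCHMARK = {
    "zero_context":     {"source": "NaturalQuestions (dev)", "n_examples": 100},
    "noisy_context":    {"source": "MS MARCO (dev)",         "n_examples": 100},
    "accurate_context": {"source": "databricks-dolly-15k",   "n_examples": 100},
}

total_examples = sum(v["n_examples"] for v in BENCHMARK.values())
```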
Awesome-Hallucination-Detection-and-Mitigation (github.com, 1 fact)
Reference: Hu et al. (2024) published 'RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models' on arXiv.