entity-level filtering
Also known as: entity-level filter
Facts (11)
Sources
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... (arxiv.org, Feb 23, 2026), 11 facts
measurement: The KGHaluBench entity-level filter at a 0.700 threshold achieved 5.65% higher alignment with human judgment and 48.78% higher recall compared to an automated judge using GPT-3.5-Turbo.
procedure: The entity-level filter classifies responses as aligned, hallucinated, or abstained by identifying abstentions and evaluating semantic and token-level similarities against an entity's description.
procedure: The entity-level filter evaluates semantic similarity using cosine similarity on encoded representations of the response and the entity description, and evaluates token-level similarity using the intersection of common words.
procedure: The entity-level filter integrates abstention detection to classify responses as aligned, hallucinated, or abstained.
measurement: The KGHaluBench entity-level filter achieved its highest F1 score of 78.07% at a threshold of 0.700, with an overall agreement of 77.98%.
procedure: In the entity-level filtering task, human participants compared LLM responses against Wikipedia entity descriptions to determine if they referred to the same entity, while ignoring fact-level hallucinations.
measurement: The entity-level filter combines semantic and token-level similarity metrics using a 70:30 ratio, prioritizing semantic over lexical alignment.
claim: The KGHaluBench entity-level filter prioritizes recall because misaligned responses admitted at the first stage of the pipeline will score poorly at the fact-level check, whereas aligned responses that are mistakenly discarded are detrimental to the overall assessment accuracy.
claim: In the entity-level filter, abstentions are defined as responses that refuse to answer, deflect the question, or admit to not recognizing the focal entity.
claim: The Hallucination Rate metric is split into two components: breadth of knowledge (the percentage of responses classified as hallucinations by the entity-level filter) and depth of knowledge (the percentage of incorrect facts judged by the fact-level check).
measurement: In the KGHaluBench entity-level filter, a threshold of 0.750 achieved the highest alignment with human judges at 79.19%, but resulted in a lower recall of 73.17%, indicating the filter was overly strict.
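The facts above describe a concrete pipeline: detect abstentions, score semantic and token-level similarity, combine them 70:30, and threshold at 0.700. A minimal sketch follows, with two loud assumptions: the semantic encoder is replaced by a toy term-frequency vector (the benchmark presumably uses a neural sentence encoder), and the token-overlap normalization (union of word sets) and the abstention marker list are guesses not specified in the source. The `hallucination_rate_components` helper mirrors the breadth/depth split of the Hallucination Rate metric.

```python
import math
import re

# Assumed abstention markers; the paper's detection rules are not given here.
ABSTENTION_MARKERS = (
    "i don't know", "i do not know", "i'm not familiar",
    "cannot answer", "no information",
)

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def tf_vector(tokens):
    # Toy stand-in for a neural encoder: raw term-frequency vector.
    vec = {}
    for t in tokens:
        vec[t] = vec.get(t, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def token_overlap(a_tokens, b_tokens):
    # "Intersection of common words"; normalizing by the union is an assumption.
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def entity_level_filter(response, entity_description, threshold=0.700,
                        w_semantic=0.7, w_token=0.3):
    """Classify a response as 'aligned', 'hallucinated', or 'abstained'."""
    if any(m in response.lower() for m in ABSTENTION_MARKERS):
        return "abstained"
    r_tok = tokenize(response)
    d_tok = tokenize(entity_description)
    semantic = cosine(tf_vector(r_tok), tf_vector(d_tok))
    lexical = token_overlap(r_tok, d_tok)
    score = w_semantic * semantic + w_token * lexical  # 70:30 combination
    return "aligned" if score >= threshold else "hallucinated"

def hallucination_rate_components(entity_labels, fact_judgments):
    """Breadth: % of responses the entity-level filter marks hallucinated.
    Depth: % of individual facts judged incorrect by the fact-level check."""
    breadth = 100.0 * sum(l == "hallucinated" for l in entity_labels) / len(entity_labels)
    depth = 100.0 * sum(not ok for ok in fact_judgments) / len(fact_judgments)
    return breadth, depth
```

With a real encoder swapped in for `tf_vector`/`cosine`, the 0.700 threshold would be the operating point the benchmark reports as maximizing F1; raising it to 0.750 trades recall for human agreement, as the last measurement notes.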