Facts (2)
Sources
Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org) - 1 fact
Procedure: To examine ROUGE's failure modes, the researchers curated a dataset of instances where ROUGE and an LLM-as-Judge metric gave conflicting verdicts on whether a hallucination was present.
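A minimal sketch of how such a conflict set could be curated, assuming per-example records with question/answer/reference fields, a 0.5 ROUGE-L threshold, and a caller-supplied judge function; these names and values are illustrative, not the paper's exact setup.

```python
# Curate cases where ROUGE and an LLM judge disagree on hallucination.
# The record fields, 0.5 threshold, and judge_fn are assumptions.
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def find_conflicts(records, judge_fn, rouge_threshold=0.5):
    """Keep instances where the two metrics disagree.

    records:  iterable of {"question", "answer", "reference"} dicts
    judge_fn: callable(question, answer, reference) -> bool, True if
              the judge considers the answer hallucinated
    """
    conflicts = []
    for rec in records:
        f1 = scorer.score(rec["reference"], rec["answer"])["rougeL"].fmeasure
        rouge_flags = f1 < rouge_threshold   # low overlap -> ROUGE flags it
        judge_flags = judge_fn(rec["question"], rec["answer"], rec["reference"])
        if rouge_flags != judge_flags:       # conflicting assessments
            conflicts.append({**rec, "rougeL_f1": f1, "judge_flag": judge_flags})
    return conflicts
```

In practice `judge_fn` would wrap an LLM prompt that returns a yes/no hallucination verdict; the filtered `conflicts` list is the curated dataset.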
LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS (ttms.com) - 1 fact
Claim: LLM monitoring systems can derive hallucination or correctness scores through automated evaluation pipelines, for example by cross-checking model answers against a knowledge base or by using an LLM-as-a-judge to score factuality.
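A minimal sketch of such a monitoring check, assuming a dict-backed knowledge base and an optional caller-supplied judge; a real system would use a retrieval index and a production LLM endpoint, and the scoring logic here is an illustrative assumption.

```python
# Score an answer against a knowledge base, falling back to an
# LLM-as-a-judge callable when the KB check is inconclusive.
from typing import Callable, Optional

def factuality_score(question: str, answer: str,
                     knowledge_base: dict,
                     judge_fn: Optional[Callable[[str, str, str], float]] = None) -> float:
    """Return 1.0 if the answer is grounded in the KB, else defer to the judge."""
    reference = knowledge_base.get(question)
    if reference is not None and reference.lower() in answer.lower():
        return 1.0  # answer matches the knowledge base
    if judge_fn is not None and reference is not None:
        return judge_fn(question, answer, reference)  # LLM-as-a-judge fallback
    return 0.0  # unverifiable: flag for human review

# Usage with a toy KB; no judge supplied, so unmatched answers score 0.0.
kb = {"What year was Python released?": "1991"}
print(factuality_score("What year was Python released?",
                       "Python was first released in 1991.", kb))  # 1.0
```

A monitoring pipeline would run this per response and alert on low scores or on a rising rate of unverifiable answers.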