Facts (2)
Sources
Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org) - 1 fact
Procedure: To examine ROUGE's failure modes, the researchers curated a dataset of instances where ROUGE and an LLM-as-Judge metric gave conflicting verdicts on whether a hallucination was present.
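A minimal sketch of how such a conflict set could be curated, assuming per-example records with question/answer/reference fields, a 0.5 ROUGE-L threshold, and a caller-supplied judge function; these names and values are illustrative, not the paper's exact setup.

```python
# Curate cases where ROUGE and an LLM judge disagree on hallucination.
# The record fields, 0.5 threshold, and judge_fn are assumptions.
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def find_conflicts(records, judge_fn, rouge_threshold=0.5):
    """Keep instances where the two metrics disagree.

    records:  iterable of {"question", "answer", "reference"} dicts
    judge_fn: callable(question, answer, reference) -> bool, True if
              the judge considers the answer hallucinated
    """
    conflicts = []
    for rec in records:
        f1 = scorer.score(rec["reference"], rec["answer"])["rougeL"].fmeasure
        rouge_flags = f1 < rouge_threshold   # low overlap -> ROUGE flags it
        judge_flags = judge_fn(rec["question"], rec["answer"], rec["reference"])
        if rouge_flags != judge_flags:       # conflicting assessments
            conflicts.append({**rec, "rougeL_f1": f1, "judge_flag": judge_flags})
    return conflicts
```

In practice `judge_fn` would wrap an LLM prompt that returns a yes/no hallucination verdict; the filtered `conflicts` list is the curated dataset.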
LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS (ttms.com) - 1 fact
Claim: LLM monitoring systems can derive hallucination or correctness scores through automated evaluation pipelines, for example by cross-checking model answers against a knowledge base or by using an LLM-as-a-judge to score factuality.
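A minimal sketch of such a monitoring check, assuming a dict-backed knowledge base and an optional caller-supplied judge; a real system would use a retrieval index and a production LLM endpoint, and the scoring logic here is an illustrative assumption.

```python
# Score an answer against a knowledge base, falling back to an
# LLM-as-a-judge callable when the KB check is inconclusive.
from typing import Callable, Optional

def factuality_score(question: str, answer: str,
                     knowledge_base: dict,
                     judge_fn: Optional[Callable[[str, str, str], float]] = None) -> float:
    """Return 1.0 if the answer is grounded in the KB, else defer to the judge."""
    reference = knowledge_base.get(question)
    if reference is not None and reference.lower() in answer.lower():
        return 1.0  # answer matches the knowledge base
    if judge_fn is not None and reference is not None:
        return judge_fn(question, answer, reference)  # LLM-as-a-judge fallback
    return 0.0  # unverifiable: flag for human review

# Usage with a toy KB; no judge supplied, so unmatched answers score 0.0.
kb = {"What year was Python released?": "1991"}
print(factuality_score("What year was Python released?",
                       "Python was first released in 1991.", kb))  # 1.0
```

A monitoring pipeline would run this per response and alert on low scores or on a rising rate of unverifiable answers.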