Fact — procedure — Knowledge Tree

The LLM-as-Judge approach to evaluating response correctness uses GPT-4o-Mini (et al., 2024) as the judge model, classifying each generated response as 'correct', 'incorrect', or 'refuse'. A 'refuse' label is treated as a hallucination.
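The classification step can be sketched as below. This is a minimal illustration, not the paper's implementation: the prompt wording, the function names (`build_judge_prompt`, `parse_label`, `judge_response`), and the `call_judge` callable that stands in for a request to the judge model are all hypothetical. The sketch also assumes that 'incorrect' counts as a hallucination alongside 'refuse'; the fact above states this explicitly only for 'refuse'.

```python
from typing import Callable

# The three categories named in the fact above.
LABELS = ("correct", "incorrect", "refuse")


def build_judge_prompt(question: str, reference: str, response: str) -> str:
    """Assemble a judge prompt (wording is an assumption, not the paper's)."""
    return (
        "Classify the response as exactly one of: correct, incorrect, refuse.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Response: {response}\n"
        "Label:"
    )


def parse_label(raw: str) -> str:
    """Normalize the judge's raw text to one of LABELS, or raise."""
    text = raw.strip().lower().strip("'\".")
    if text not in LABELS:
        raise ValueError(f"unexpected judge output: {raw!r}")
    return text


def judge_response(
    question: str,
    reference: str,
    response: str,
    call_judge: Callable[[str], str],
) -> dict:
    """Classify one response; call_judge is a hypothetical model wrapper."""
    label = parse_label(call_judge(build_judge_prompt(question, reference, response)))
    # 'refuse' is treated as a hallucination (per the fact above);
    # treating 'incorrect' the same way is an assumption of this sketch.
    return {"label": label, "hallucination": label != "correct"}


# Usage with a stub judge in place of a real model call.
result = judge_response(
    "What is the capital of France?",
    "Paris",
    "I cannot answer that.",
    call_judge=lambda prompt: " Refuse.\n",
)
print(result)
```

Passing the judge as a callable keeps the sketch independent of any particular client library, so a wrapper around a real model endpoint can be substituted for the stub.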

Authors

Person: Not available
Organization: arXiv
Re-evaluating Hallucination Detection in LLMs - arXiv

Sources

Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org)

Referenced by nodes (2)

gpt-4o-mini concept
LLM-as-a-judge concept