Procedure
The LLM-as-Judge approach for evaluating response correctness uses GPT-4o-Mini (OpenAI, 2024) to classify each generated response as 'correct,' 'incorrect,' or 'refuse,' with 'refuse' treated as a hallucination.
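The verdict handling described above can be sketched as follows. The label names come from the section; the prompt text and the helper function are hypothetical illustrations, not the evaluated system's actual code.

```python
# Hypothetical sketch of the LLM-as-Judge verdict mapping: the judge model
# (e.g. GPT-4o-Mini) is prompted to emit one of three labels, and the label
# is then mapped to a binary hallucination decision.

# Assumed prompt template; the exact wording used in the source is not given.
JUDGE_PROMPT = (
    "Given the question, the reference answer, and the model's response, "
    "reply with exactly one label: correct, incorrect, or refuse."
)

VALID_LABELS = {"correct", "incorrect", "refuse"}


def is_hallucination(judge_label: str) -> bool:
    """Map the judge's label to the hallucination decision.

    Per the protocol above, 'refuse' is counted as a hallucination;
    'incorrect' is assumed to count as well, since only 'correct'
    indicates a faithful response.
    """
    label = judge_label.strip().lower()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected judge label: {judge_label!r}")
    return label != "correct"
```

In practice the raw judge output would be normalized (as above) before mapping, since LLM judges often vary capitalization or add trailing punctuation.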
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs, arXiv (arxiv.org)
Referenced by nodes (2)
- gpt-4o-mini concept
- LLM-as-a-judge concept