procedure
In the SelfCheckGPT benchmark, the LLM under evaluation generates six Wikipedia passages. The first passage is generated at a temperature of 0.0, and the remaining five are sampled at a temperature of 1.0. The SelfCheckGPT-NLI method, using the NLI model 'potsawee/deberta-v3-large-mnli', then checks whether each sentence of the first passage is supported by the other five passages; if a sentence is inconsistent with them, the instance is marked as hallucinated.
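The control flow of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's actual implementation: `generate` and `contradiction_score` are hypothetical callables standing in for the LLM under evaluation and for the NLI model, the sentence splitter is a naive regex, and both the averaging over the five samples and the `threshold` of 0.5 are assumptions that the description above does not specify.

```python
import re
from statistics import mean
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter on sentence-ending punctuation (an assumption;
    # a real pipeline would likely use a proper sentence tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def selfcheck_instance(
    prompt: str,
    generate: Callable[[str, float], str],             # hypothetical: (prompt, temperature) -> passage
    contradiction_score: Callable[[str, str], float],  # hypothetical: (premise, hypothesis) -> score in [0, 1]
    n_samples: int = 5,
    threshold: float = 0.5,                             # assumed cut-off, not taken from the source
) -> Dict[str, object]:
    # Passage 1: generated at temperature 0.0.
    main_passage = generate(prompt, 0.0)
    # Passages 2 to 6: sampled at temperature 1.0.
    samples = [generate(prompt, 1.0) for _ in range(n_samples)]

    sentence_scores = []
    for sentence in split_sentences(main_passage):
        # Score the sentence against every sampled passage and average
        # the results (the aggregation rule is an assumption).
        per_sample = [contradiction_score(sample, sentence) for sample in samples]
        sentence_scores.append(mean(per_sample))

    # The instance is flagged if any sentence is judged inconsistent.
    hallucinated = any(score > threshold for score in sentence_scores)
    return {
        "passage": main_passage,
        "sentence_scores": sentence_scores,
        "hallucinated": hallucinated,
    }
```

In practice, `contradiction_score` would wrap the NLI model named above, with a sampled passage as the premise and the sentence as the hypothesis. Passing both models in as callables keeps the sketch runnable with simple stubs and independent of any particular inference library.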
Authors
Sources
- The Hallucinations Leaderboard, an Open Effort to Measure ... (huggingface.co)
Referenced by nodes (1)
- Wikipedia entity