procedure
The SelfCheckGPT benchmark requires an LLM to generate six Wikipedia-style passages from the same prompt. The first passage is generated at temperature 0.0 and the remaining five at temperature 1.0. SelfCheckGPT-NLI, which uses the 'potsawee/deberta-v3-large-mnli' NLI model, then checks whether each sentence of the first passage is supported by the other five passages. If any sentence is judged inconsistent with them, the instance is marked as hallucinated.
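The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the benchmark's reference code. The generator and NLI scorer are hypothetical callables (in practice the NLI logits would come from 'potsawee/deberta-v3-large-mnli'). The contradiction probability is taken as a softmax over only the entailment and contradiction logits, and each sentence's score is averaged over the five samples. The regex sentence splitter and the 0.5 decision threshold are illustrative choices and are not stated in this node. The toy generator and toy NLI scorer at the bottom exist only so the sketch runs without a model.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not the SelfCheckGPT package API):
#   Generator(prompt, temperature) -> passage text
#   NLIScorer(premise, hypothesis) -> (entailment_logit, contradiction_logit)
Generator = Callable[[str, float], str]
NLIScorer = Callable[[str, str], Tuple[float, float]]


def collect_passages(generate: Generator, prompt: str, n_samples: int = 5) -> Tuple[str, List[str]]:
    """One passage at temperature 0.0 plus n_samples passages at temperature 1.0."""
    main = generate(prompt, 0.0)
    samples = [generate(prompt, 1.0) for _ in range(n_samples)]
    return main, samples


def split_sentences(text: str) -> List[str]:
    """Naive split on '.', '!' or '?' followed by whitespace (stand-in for a real splitter)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def contradiction_prob(entail_logit: float, contradict_logit: float) -> float:
    """Softmax restricted to the entailment and contradiction logits."""
    m = max(entail_logit, contradict_logit)  # subtract the max for numerical stability
    e = math.exp(entail_logit - m)
    c = math.exp(contradict_logit - m)
    return c / (e + c)


def sentence_scores(sentences: Sequence[str], samples: Sequence[str], nli: NLIScorer) -> List[float]:
    """Mean contradiction probability of each sentence against every sampled passage."""
    scores = []
    for sent in sentences:
        # Premise = sampled passage, hypothesis = sentence from the main passage.
        probs = [contradiction_prob(*nli(sample, sent)) for sample in samples]
        scores.append(sum(probs) / len(probs))
    return scores


def is_hallucinated(main: str, samples: Sequence[str], nli: NLIScorer, threshold: float = 0.5) -> bool:
    """Flag the instance if any sentence of the main passage scores above the (assumed) threshold."""
    scores = sentence_scores(split_sentences(main), samples, nli)
    return any(score > threshold for score in scores)


# --- Toy stand-ins so the sketch runs without an LLM or NLI model ---
def toy_generate(prompt: str, temperature: float) -> str:
    # The temperature-0.0 output adds a claim that the sampled outputs never repeat.
    if temperature == 0.0:
        return "Alice Smith was born in Paris. She won a Nobel Prize."
    return "Alice Smith was born in Paris. She works as a chemist."


def toy_nli(premise: str, hypothesis: str) -> Tuple[float, float]:
    # Crude stand-in: "entailed" iff the hypothesis appears verbatim in the premise.
    return (5.0, -5.0) if hypothesis in premise else (-5.0, 5.0)


main, samples = collect_passages(toy_generate, "Write a Wikipedia passage about Alice Smith.")
print(is_hallucinated(main, samples, toy_nli))  # True
```

Here the second sentence of the temperature-0.0 passage never appears in the five samples, so its averaged contradiction probability is close to 1 and the instance is flagged. Swapping `toy_nli` for a real MNLI model changes only the scorer, not the aggregation logic.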
