measurement
GPT-4 achieves an F1-score of approximately 0.625 in detecting subtle falsehoods on the hardest subset of the MedHallu benchmark.
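For reference, F1 is the harmonic mean of precision and recall for the positive class. In a hallucination-detection setup, the positive class is presumably the "hallucinated" answers. The sketch below shows how such a score is computed from detection counts. The function name and the counts are hypothetical illustrations. They are not MedHallu data or the benchmark's evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from confusion counts (hypothetical helper, not MedHallu's code).

    tp: hallucinated answers correctly flagged
    fp: faithful answers wrongly flagged as hallucinated
    fn: hallucinated answers the detector missed
    """
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined); F1 taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean; equivalent to 2*tp / (2*tp + fp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to illustrate what an F1 of 0.625 can look like:
# 5 hallucinations caught, 3 faithful answers wrongly flagged, 3 hallucinations missed.
print(f1_score(tp=5, fp=3, fn=3))  # 0.625
```

In this toy case precision and recall are both 5/8. An F1 near 0.625 therefore means that roughly one in three flags is a false alarm and roughly one in three hallucinations goes undetected. Other precision/recall combinations can produce the same F1.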
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)