Fact — measurement — Knowledge Tree

GPT-4 achieves an F1-score of approximately 0.625 in detecting subtle falsehoods on the hardest subset of the MedHallu benchmark.
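
For readers unfamiliar with the metric, the sketch below shows how an F1-score is computed from detection counts, as the harmonic mean of precision and recall. The function name `f1_score` and the counts are hypothetical, chosen only so that the result lands on 0.625. They are not MedHallu data and do not describe how the benchmark authors ran their evaluation.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (assumption, not benchmark results):
# 5 falsehoods correctly flagged, 3 false alarms, 3 falsehoods missed.
print(f1_score(5, 3, 3))  # prints 0.625
```

With these illustrative counts, precision and recall are both 5/8, so a score near 0.625 corresponds to a detector that misses or wrongly flags a substantial share of cases.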

Authors

Person: Not available
Organization: GitHub
EdinburghNLP/awesome-hallucination-detection - GitHub

Sources

EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)

Referenced by nodes (3)

GPT-4 concept
MedHallu concept
F1 score concept