Fact — measurement — Knowledge Tree

An empirical study on legal question answering found that GPT-3.5 hallucinated in 69% of its outputs and LLaMA-2 in 88%, when tested on a custom set of factual queries about US court cases.

Authors

Person: Not available
Organization: GitHub
EdinburghNLP/awesome-hallucination-detection - GitHub

Sources

EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)

Referenced by nodes (1)

U.S. location