Fact — claim — Knowledge Tree

Benchmarking results from the PHANTOM study indicate that out-of-the-box Large Language Models face severe challenges in detecting real-world hallucinations within long-context data.

Authors

Person: Not available
Organization: NeurIPS
A Benchmark for Hallucination Detection in Financial Long-Context QA

Sources

A Benchmark for Hallucination Detection in Financial Long-Context QA (neurips.cc, NeurIPS)

Referenced by nodes (1)

Large Language Models (concept)