claim
Benchmark results from the PHANTOM study indicate that out-of-the-box large language models struggle severely to detect real-world hallucinations in long-context data.
