Claim
Hallucination in large language models should be evaluated separately from general capability, and metrics should weight errors by how deceptive they are, not just how frequent, in order to capture practical risk.
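A minimal sketch of the distinction the claim draws: a plain error-rate metric versus one that weights each error by a deceptiveness score. The field names, the [0, 1] weighting scheme, and the example data are all illustrative assumptions, not a metric from the source.

```python
# Hypothetical sketch: deceptiveness-weighted hallucination scoring.
# Each error carries a "deceptiveness" weight in [0, 1] (e.g. how
# confidently and plausibly the model asserted the false claim).
# All names and weights here are illustrative assumptions.

def error_rate(claims):
    """Plain frequency metric: the fraction of claims that are wrong."""
    errors = [c for c in claims if not c["correct"]]
    return len(errors) / len(claims)

def weighted_hallucination_score(claims):
    """Same denominator, but each error counts by its deceptiveness
    weight, so confident, plausible falsehoods dominate the score."""
    return sum(c["deceptiveness"] for c in claims if not c["correct"]) / len(claims)

claims = [
    {"correct": True,  "deceptiveness": 0.0},
    {"correct": False, "deceptiveness": 0.9},  # confident, plausible falsehood
    {"correct": False, "deceptiveness": 0.1},  # obvious, easily caught error
    {"correct": True,  "deceptiveness": 0.0},
]

print(error_rate(claims))                    # 0.5  - both errors count equally
print(weighted_hallucination_score(claims))  # 0.25 - dominated by the deceptive error
```

Under this weighting, two models with the same error rate can score very differently: the one whose errors are confident and hard to spot carries more practical risk.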

Authors

Sources

Referenced by nodes (2)