claim
In experiments conducted for the MedHallu benchmark, general-purpose large language models often outperform specialized medical models on hallucination detection tasks.
