measurement
The study evaluated hallucination rates and clinical risk severity for five Large Language Models: o1, gemini-2.0-flash-exp, gpt-4o, gemini-1.5-flash, and claude-3.5 sonnet.

Authors

Sources

Referenced by nodes (5)