measurement
Among the models evaluated, GPT-4o had the highest hallucination rates in Chronological Ordering (24.6%) and Lab Data Understanding (18.7%). Medical experts classified many of these hallucinations as posing 'Significant' or 'Considerable' clinical risk.
