measurement
General-purpose models show a median baseline hallucination resistance of 76.6%, compared with 51.3% for medical-purpose models, an average difference of 25.2 percentage points (95% CI: [18.7, 31.3] percentage points; U = 27.0, p = 0.012, two-tailed Mann–Whitney test; rank-biserial r = −0.64, 95% CI: [−0.86, −0.28]).
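
For readers unfamiliar with the statistics quoted above, the following is a minimal sketch of how a Mann–Whitney U statistic and a rank-biserial effect size can be computed. It is not the analysis pipeline behind this measurement: the function names, the sign convention (r = 2U/(n_x·n_y) − 1, with U taken for the medical-purpose group), and the score lists are all assumptions made for illustration, and the scores are hypothetical, not the study's data.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U statistic for sample x: number of pairs (xi, yj) with xi > yj,
    counting ties as 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi, yj in product(x, y)
    )


def rank_biserial(u, n_x, n_y):
    """Rank-biserial correlation from U (assumed convention):
    r = 2U / (n_x * n_y) - 1, which lies in [-1, 1]."""
    return 2.0 * u / (n_x * n_y) - 1.0


# Hypothetical hallucination-resistance scores in percent (NOT the study's data).
medical = [40.0, 45.0, 50.0, 55.0, 80.0]
general = [60.0, 70.0, 75.0, 78.0, 85.0, 90.0]

u_medical = mann_whitney_u(medical, general)
u_general = mann_whitney_u(general, medical)
r = rank_biserial(u_medical, len(medical), len(general))

print(f"U (medical) = {u_medical}, U (general) = {u_general}, r = {r:.2f}")
```

A negative r under this convention means the medical-purpose scores tend to rank below the general-purpose scores. As a rough consistency check under the same convention, the reported U = 27.0 and r = −0.64 would correspond to a product of group sizes of about 150; the group sizes themselves are not stated here.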
