Sources
A survey on augmenting knowledge graphs (KGs) with large ... (link.springer.com)
Claim: Evaluation of text generated by Large Language Models (LLMs) is inconsistent and unreliable: agreement between human judgments and automatic evaluation tools is difficult to achieve, and the models themselves can carry biases from their training data.
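As a minimal sketch of why this agreement is hard to establish (not taken from the cited survey), the example below compares binary hallucination labels from a hypothetical human annotator against a hypothetical automatic detector using Cohen's kappa, which corrects raw agreement for chance. All labels and numbers are synthetic, for illustration only.

```python
# Illustrative only: synthetic labels comparing a hypothetical human
# annotator with a hypothetical automatic hallucination detector.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two binary label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where the two raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two raters labelled independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in (0, 1)) / (n * n)
    return (observed - expected) / (1 - expected)

# 1 = "hallucinated", 0 = "faithful" (synthetic data).
human = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
auto  = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]

raw = sum(h == a for h, a in zip(human, auto)) / len(human)
print(f"raw agreement: {raw:.2f}")            # 0.70
print(f"Cohen's kappa: {cohens_kappa(human, auto):.2f}")  # 0.40
```

Here raw agreement looks decent (0.70), but the chance-corrected kappa is much lower (0.40), showing how headline agreement numbers can overstate the consistency between human and automatic judgments.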
Hallucinations in LLMs: Can You Even Measure the Problem? (linkedin.com)
Claim: Human evaluation is considered the gold standard for hallucination detection in LLMs, though it is costly to implement.
A framework to assess clinical safety and hallucination rates of LLMs ... (nature.com)
Reference: The article 'A framework for human evaluation of large language models in healthcare derived from literature review' (npj Digital Medicine, 2024) establishes a framework for human-based assessment of LLMs in healthcare.