Relations (1)

related (2.58) — strongly supporting 5 facts

Large Language Models are the primary subject of factual consistency evaluation: researchers survey and develop methods to assess their reliability [1], [2]. These models are also known to struggle with factual consistency, both because of architectural limitations and because traditional automatic metrics are inadequate for evaluating them [3], [4], [5].
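To see why n-gram overlap metrics fall short here, consider a minimal sketch (not drawn from the cited papers; the rouge1_f1 helper and the example sentences are invented for illustration). A summary that flips a single fact can out-score a faithful paraphrase, because unigram overlap rewards surface similarity rather than factual agreement.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the company reported a profit of 3 million dollars in 2019"
faithful = "the company earned 3 million dollars in 2019"               # correct paraphrase
unfaithful = "the company reported a loss of 3 million dollars in 2019" # flips one fact

print(f"faithful paraphrase: {rouge1_f1(faithful, reference):.2f}")   # ~0.74
print(f"factually wrong:     {rouge1_f1(unfaithful, reference):.2f}") # ~0.91
```

On these sentences the factually wrong summary scores roughly 0.91 ROUGE-1 F1 against the reference, while the faithful paraphrase scores only about 0.74: the metric prefers the summary that preserves more surface wording, not the one that preserves the facts.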

Facts (5)

Sources
Survey and analysis of hallucinations in large language models (frontiersin.org, Frontiers; 3 facts)
Claim: Traditional automatic metrics like BLEU, ROUGE, and METEOR are inadequate for assessing factual consistency in large language models, according to Maynez et al. (2020).
Reference: Liu et al. (2023) conducted a survey on methods for evaluating the factual consistency of large language models.
Claim: Automatic metrics such as BLEU or ROUGE fail to capture factual consistency and reliability in Large Language Models, according to Maynez et al. (2020).
Practices, opportunities and challenges in the fusion of knowledge ... (frontiersin.org, Frontiers; 1 fact)
Reference: Luo et al. (2024) evaluated the factual consistency of summarization in the era of large language models, published in Expert Systems with Applications.
What Really Causes Hallucinations in LLMs? - AI Exploration Journey (aiexpjourney.substack.com, AI Innovations and Insights; 1 fact)
Claim: Large language models may hallucinate because their specific architecture is incapable of learning certain patterns, such as identifying impossible trigrams, which prevents the model from maintaining factual consistency.
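To make the impossible-trigram claim concrete: the underlying task is simply to flag any sequence that contains a trigram from a forbidden set. Below is a minimal sketch of that task, assuming an explicitly given forbidden set (the trigrams and example sequences are invented, not taken from the cited post):

```python
# Illustrative only: the forbidden trigrams and token sequences are invented.
FORBIDDEN_TRIGRAMS = {("the", "the", "the"), ("a", "an", "a")}

def contains_forbidden_trigram(tokens: list[str]) -> bool:
    """Flag a sequence containing any consecutive trigram from the forbidden set."""
    return any(
        tuple(tokens[i:i + 3]) in FORBIDDEN_TRIGRAMS
        for i in range(len(tokens) - 2)
    )

print(contains_forbidden_trigram("the the the cat sat".split()))     # True
print(contains_forbidden_trigram("the cat sat on the mat".split()))  # False
```

The contrast the claim draws is between this exact three-token lookup, which a rule-based checker performs directly, and a model architecture that must learn the same constraint from data.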