reference
The Phare benchmark's hallucination module evaluates large language models across four task categories: factual accuracy, misinformation resistance, debunking capabilities, and tool reliability. Factual accuracy is tested with structured question-answering tasks that measure retrieval precision, while misinformation resistance examines whether a model pushes back on ambiguous or ill-posed questions rather than fabricating a plausible narrative.
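As a rough illustration of how a per-category evaluation harness of this shape could be organized (a minimal sketch only; the class, function, and category names below are hypothetical, not Giskard's actual Phare API):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# Hypothetical layout of the four hallucination task categories;
# names are illustrative, not taken from the Phare codebase.
class TaskCategory(Enum):
    FACTUAL_ACCURACY = "factual_accuracy"          # structured QA, retrieval precision
    MISINFORMATION_RESISTANCE = "misinformation"   # pushing back on ill-posed premises
    DEBUNKING = "debunking"                        # challenging false claims
    TOOL_RELIABILITY = "tools"                     # correct tool invocation

@dataclass
class Sample:
    category: TaskCategory
    prompt: str
    grader: Callable[[str], bool]  # True if the answer avoids hallucination

def evaluate(model: Callable[[str], str],
             samples: list[Sample]) -> dict[TaskCategory, float]:
    """Return the average pass rate per task category."""
    buckets: dict[TaskCategory, list[int]] = {c: [] for c in TaskCategory}
    for s in samples:
        buckets[s.category].append(int(s.grader(model(s.prompt))))
    return {c: sum(v) / len(v) for c, v in buckets.items() if v}
```

Scoring per category rather than globally matches the module's design: a model can score well on factual recall while still failing to resist a misleading premise.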
Sources
- Phare LLM Benchmark: an analysis of hallucination in ... (www.giskard.ai)
Referenced by nodes (2)
- Large Language Models concept
- factual correctness concept