Entity

Vectara

Also known as: Vectara, Inc.

Facts (23)

Sources
Source: vectara/hallucination-leaderboard - GitHub (github.com), Vectara (18 facts)
claim: The Vectara hallucination leaderboard serves as an indicator of the accuracy of Large Language Models when deployed in Retrieval-Augmented Generation (RAG) and agentic pipelines, where the model acts as a summarizer of search results.
perspective: Vectara does not recommend using its hallucination leaderboard as a standalone metric, but rather as a quality metric to be run alongside other evaluations such as summarization quality and question-answering accuracy.
reference: The Vectara hallucination leaderboard integrates Claude Sonnet 4 (claude-sonnet-4-20250514), Claude Opus 4 (claude-opus-4-20250514), Claude Opus 4.1 (claude-opus-4-1-20250805), Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), and Claude Haiku 4.5 (claude-haiku-4-5-20251001).
procedure: The procedure for building the Vectara hallucination leaderboard involves: (1) feeding the full set of documents in the dataset to each Large Language Model, (2) asking the models to summarize each document using only the facts presented in it, (3) computing each model's overall factual consistency rate (no hallucinations) and hallucination rate (100 minus the factual consistency rate), and (4) recording the rate at which each model refuses to respond in an 'Answer Rate' column.
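The aggregate rates in steps (3) and (4) can be sketched as follows. This is a toy illustration, not Vectara's code: in the real pipeline the per-summary "consistent" judgment comes from the HHEM model, and the function and field names here are hypothetical.

```python
# Sketch of the leaderboard's aggregate metrics (hypothetical data and names;
# in the real pipeline each "consistent" flag comes from the HHEM model).

def leaderboard_metrics(results):
    """results: list of dicts with key 'answered' (bool) and, for answered
    documents, 'consistent' (bool, i.e. no hallucination detected)."""
    answered = [r for r in results if r["answered"]]
    answer_rate = 100.0 * len(answered) / len(results)
    consistent = sum(r["consistent"] for r in answered)
    factual_consistency_rate = 100.0 * consistent / len(answered)
    hallucination_rate = 100.0 - factual_consistency_rate  # "100 minus" rule
    return answer_rate, factual_consistency_rate, hallucination_rate

# Made-up run: 100 documents, 1 refusal, 4 hallucinated summaries.
results = (
    [{"answered": True, "consistent": True}] * 95
    + [{"answered": True, "consistent": False}] * 4
    + [{"answered": False}] * 1
)
ar, fcr, hr = leaderboard_metrics(results)
# ar = 99.0; fcr and hr sum to 100 by construction
```

Note that the hallucination rate is computed only over documents the model actually summarized, which is why the answer rate is reported as a separate column.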
claim: Vectara plans to update the public LLM hallucination leaderboard regularly as both the Hughes Hallucination Evaluation Model (HHEM) and the evaluated Large Language Models are updated over time.
procedure: Vectara used a temperature setting of 0 when querying Large Language Models for the hallucination leaderboard, except where that setting was impossible or unavailable.
claim: Vectara's hallucination leaderboard methodology uses protocols established in the existing academic literature on factual consistency evaluation.
claim: The Vectara hallucination leaderboard uses HHEM-2.3, Vectara's commercial hallucination evaluation model, to compute the rankings of large language models.
claim: Vectara evaluates summarization factual consistency rate rather than overall factual accuracy because it allows a direct comparison between the model's response and the provided source information.
claim: The Vectara hallucination leaderboard authors chose to evaluate hallucination rates in summarization tasks rather than attempting to determine whether a response was hallucinated without a reference source, because the latter would require training a model as large as or larger than the LLMs being evaluated.
reference: Detailed explanations of the development of the Vectara hallucination detection model are available in the blog posts 'Cut the Bull…. Detecting Hallucinations in Large Language Models' and 'Introducing the Next Generation of Vectara's Hallucination Leaderboard'.
reference: The Vectara hallucination leaderboard integrates Cohere Command R (command-r-08-2024), Cohere Command R Plus (command-r-plus-08-2024), and Cohere Command A (command-a-03-2025) using the /chat endpoint.
reference: The Vectara hallucination leaderboard integrates Aya Expanse 8B (c4ai-aya-expanse-8b) and Aya Expanse 32B (c4ai-aya-expanse-32b).
reference: The Vectara hallucination leaderboard has gone through multiple versions, including an initial version based on the HHEM-1.0 model.
claim: The dataset used for the Vectara hallucination leaderboard is deliberately kept private to prevent Large Language Models from overfitting to it; it contains over 7,700 articles from diverse sources including news, technology, science, medicine, legal, sports, business, and education, with article lengths ranging from 50 to 24,000 words.
claim: The Vectara team acknowledges that their current hallucination detection process does not definitively measure all the ways a model can hallucinate, but they view it as a starting point for further development and community contribution.
claim: HHEM-2.1-Open is an open-source variant of the Vectara hallucination detection model, available on Hugging Face and Kaggle.
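Since HHEM-2.1-Open is publicly downloadable, scoring a summary against its source can be sketched as below. The model-loading lines in the comment follow the Hugging Face model card for `vectara/hallucination_evaluation_model` as I understand it; the 0.5 decision threshold in the helper is an illustrative assumption, not Vectara's published cutoff.

```python
# Hedged sketch: turning HHEM consistency scores into hallucination flags.
# Loading HHEM-2.1-Open needs network access and trust_remote_code; per the
# Hugging Face model card the usage is roughly:
#
#   from transformers import AutoModelForSequenceClassification
#   model = AutoModelForSequenceClassification.from_pretrained(
#       "vectara/hallucination_evaluation_model", trust_remote_code=True)
#   scores = model.predict([(source_doc, summary)])  # consistency in [0, 1]

def flag_hallucinations(scores, threshold=0.5):
    """Map consistency scores (higher = more consistent with the source)
    to hallucination flags. The 0.5 threshold is an assumption for
    illustration only."""
    return [s < threshold for s in scores]

# Made-up scores for three (source, summary) pairs.
print(flag_hallucinations([0.98, 0.12, 0.55]))  # [False, True, False]
```

A thresholded flag like this supports the per-document judgments that the leaderboard aggregates into its factual consistency and hallucination rates.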
claim: The Vectara hallucination leaderboard evaluates hallucination rates within summarization tasks as a proxy for the overall truthfulness of Large Language Models (LLMs).
Source: Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... - Cleanlab (cleanlab.ai), Apr 7, 2025 (2 facts)
reference: The Hughes Hallucination Evaluation Model (HHEM) is a Transformer model trained by Vectara to distinguish between hallucinated and correct responses from various Large Language Models across different context and response data.
reference: Vectara's Hughes Hallucination Evaluation Model (HHEM) version 2.1 focuses on measuring factual consistency between an AI response and the retrieved context.
Source: EdinburghNLP/awesome-hallucination-detection - GitHub (github.com) (2 facts)
reference: Vectara published a project or report titled 'Cut the Bull...' regarding hallucination detection.
reference: The Vectara LLM Hallucination Leaderboard is a resource for evaluating hallucinations in large language models.
Source: A framework to assess clinical safety and hallucination rates of LLMs ... - Nature (nature.com), May 13, 2025 (1 fact)
reference: The Vectara Hallucination Leaderboard, maintained by Vectara, Inc. since 2023, compares large language models' performance in maintaining factual consistency when summarizing sets of facts.