concept

evaluation metrics

Also known as: evaluation methods, evaluation metric

Facts (10)

Sources

Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org Frontiers 2 facts

claim: Current evaluation metrics such as BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) mainly measure surface text similarity and fail to capture the semantic consistency between generated text and knowledge graph content.

claim: Existing evaluation metrics for knowledge graph completion often prioritize surface-level correctness over logical consistency.
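The claim above that BLEU and ROUGE reward surface overlap rather than meaning can be illustrated with a minimal sketch. The `unigram_f1` function below is a hypothetical, simplified stand-in for a ROUGE-1-style score (plain whitespace tokens, no stemming), not the official BLEU or ROUGE implementation, and the three sentences are made-up examples chosen for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1: a simplified, ROUGE-1-style score (illustrative only)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentences (assumptions for illustration, not from any dataset).
reference = "aspirin inhibits the enzyme cyclooxygenase"
paraphrase = "cyclooxygenase activity is blocked by aspirin"    # same meaning, different words
contradiction = "aspirin activates the enzyme cyclooxygenase"   # opposite meaning, same words

paraphrase_score = unigram_f1(paraphrase, reference)
contradiction_score = unigram_f1(contradiction, reference)

print(f"paraphrase:    {paraphrase_score:.3f}")
print(f"contradiction: {contradiction_score:.3f}")
```

In this sketch the contradictory sentence scores higher than the faithful paraphrase, because it shares four of five tokens with the reference while the paraphrase shares only two.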

LLM Hallucinations: Causes, Consequences, Prevention - LLMs llmmodels.org llmmodels.org May 10, 2024 1 fact

claim: A significant challenge in assessing large language model performance is the lack of sufficiently accurate and sophisticated evaluation metrics and protocols.

LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS ttms.com TTMS Feb 10, 2026 1 fact

claim: Rich observability enables rapid improvement cycles by providing traces to understand model behavior and evaluation metrics to measure the impact of changes, creating a feedback loop for continuous improvement.

A Survey on the Theory and Mechanism of Large Language Models arxiv.org arXiv Mar 12, 2026 1 fact

reference: The paper 'LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods' (arXiv preprint arXiv:2412.05579) surveys methods that use large language models to evaluate other models.

Evaluating RAG applications with Amazon Bedrock knowledge base ... aws.amazon.com Amazon Web Services Mar 14, 2025 1 fact

measurement: Amazon Bedrock Knowledge Bases evaluation metrics are normalized to a range between 0 and 1.

Understanding the Psychology of Impulse Buying in E-Commerce jmsr-online.com Journal of Management and Science Research Aug 9, 2025 1 fact

reference: The paper 'A Statistical Study to Develop a Reliable Scale to Evaluate Instructors within Higher Institutions' by Taan and Hajjar focuses on developing evaluation metrics for instructors in higher education.

Medical Hallucination in Foundation Models and Their ... medrxiv.org medRxiv Mar 3, 2025 1 fact

claim: In drug discovery applications, evaluation metrics assess whether the drug-protein interactions described by a large language model align with established biochemical knowledge (Juhi et al., 2023).

A framework to assess clinical safety and hallucination rates of LLMs ... nature.com Nature May 13, 2025 1 fact

reference: Ben Abacha, A., Yim, W., Michalopoulos, G., and Lin, T. authored 'An Investigation of Evaluation Metrics for Automated Medical Note Generation', published in 2023 (arXiv:2305.17364).

Evaluating Evaluation Metrics — The Mirage of Hallucination ... machinelearning.apple.com Atharva Kulkarni, Yuan Zhang, Joel Ruben Antony Moniz, Xiou Ge, Bo-Hsiang Tseng, Dhivya Piraviperumal, Swabha Swayamdipta, Hong Yu · Apple Machine Learning Research 1 fact

procedure: In the paper 'Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment', the authors propose a sample-efficient reinforcement learning approach that adapts the loss function dynamically during training so that it directly optimizes the evaluation metric.