evaluation metrics
Also known as: evaluation methods, evaluation metric
Facts (10)
Sources
Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org 2 facts
claim: Current evaluation metrics like BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) mainly measure surface text similarity and fail to effectively capture the semantic consistency between generated text and knowledge graph content.
claim: Existing evaluation metrics for knowledge graph completion often prioritize surface-level correctness over logical consistency.
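The point about surface similarity can be illustrated with a small sketch. It is not the official BLEU or ROUGE implementation: `unigram_f1` is a hypothetical helper that computes a simplified ROUGE-1-style F1 over whitespace tokens, and the example sentences are invented for illustration. Under those assumptions, a sentence that contradicts the reference fact scores higher than a paraphrase that agrees with it.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercase whitespace tokens.

    Illustrative only: real BLEU/ROUGE add n-grams, stemming,
    brevity penalties, and other details omitted here.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped token overlap: each token counts at most as often
    # as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference text for the triple (Marie Curie, born in, Warsaw).
reference = "marie curie was born in warsaw"

# Agrees with the triple, but uses different wording.
consistent = "warsaw is the birthplace of marie curie"

# Contradicts the triple, but reuses almost every reference word.
contradiction = "marie curie was born in paris"

consistent_score = unigram_f1(consistent, reference)
contradiction_score = unigram_f1(contradiction, reference)

print(f"consistent paraphrase: {consistent_score:.3f}")
print(f"contradiction:         {contradiction_score:.3f}")
```

Because the score depends only on shared tokens, swapping a single entity barely lowers it, while a faithful rewording is penalized for using different words.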
LLM Hallucinations: Causes, Consequences, Prevention - LLMs llmmodels.org May 10, 2024 1 fact
claim: A significant challenge in assessing large language model performance is the need for more accurate and sophisticated evaluation metrics and protocols.
LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS ttms.com Feb 10, 2026 1 fact
claim: Rich observability enables rapid improvement cycles by providing traces to understand model behavior and evaluation metrics to measure the impact of changes, creating a feedback loop for continuous improvement.
A Survey on the Theory and Mechanism of Large Language Models arxiv.org Mar 12, 2026 1 fact
reference: The paper 'LLMs-as-judges: a comprehensive survey on LLM-based evaluation methods' (arXiv:2412.05579) surveys methods that use large language models to evaluate other models.
Evaluating RAG applications with Amazon Bedrock knowledge base ... aws.amazon.com Mar 14, 2025 1 fact
measurement: Amazon Bedrock Knowledge Bases evaluation metrics are normalized to a range between 0 and 1.
Understanding the Psychology of Impulse Buying in E-Commerce jmsr-online.com Aug 9, 2025 1 fact
reference: The paper 'A Statistical Study to Develop a Reliable Scale to Evaluate Instructors within Higher Institutions' by Taan and Hajjar focuses on developing evaluation metrics for instructors in higher education.
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 1 fact
claim: In drug discovery applications, evaluation metrics assess whether drug-protein interactions described by a large language model align with established biochemical knowledge, as cited in Juhi et al. (2023).
A framework to assess clinical safety and hallucination rates of LLMs ... nature.com May 13, 2025 1 fact
reference: Ben Abacha, A., Yim, W., Michalopoulos, G., and Lin, T. authored 'An Investigation of Evaluation Metrics for Automated Medical Note Generation', published in 2023 (arXiv:2305.17364).
Evaluating Evaluation Metrics — The Mirage of Hallucination ... machinelearning.apple.com 1 fact
procedure: In the paper 'Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment', the authors propose a sample-efficient reinforcement learning approach that dynamically adapts the loss function during training to directly optimize the evaluation metric.