Fact — claim — Knowledge Tree

Metrics such as ROUGE and F1 can be misleading because they measure only shallow linguistic similarity (word overlap) between the ground truth and the LLM response, so a response can score highly even when its actual meaning differs from the ground truth.
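
The claim can be illustrated with a minimal sketch of a token-overlap F1 score, which is roughly what ROUGE-1 and answer-level F1 compute. This is not the implementation of any particular library or of Amazon Bedrock; the function names `tokens` and `overlap_f1` and the example sentences are hypothetical. Tokenization here is a plain lowercase whitespace split, whereas real implementations usually add normalization such as stemming or punctuation stripping.

```python
from collections import Counter


def tokens(text):
    # Assumption: lowercase whitespace tokenization, no stemming or punctuation handling.
    return text.lower().split()


def overlap_f1(reference, response):
    """Unigram-overlap F1 between a reference and a response (simplified, ROUGE-1 style)."""
    ref, hyp = tokens(reference), tokens(response)
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    common = sum((Counter(ref) & Counter(hyp)).values())
    if common == 0:
        return 0.0
    precision = common / len(hyp)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data, chosen to show the failure mode.
ground_truth = "the drug is safe for children"
wrong_answer = "the drug is not safe for children"          # opposite meaning
paraphrase = "kids can take this medication without risk"   # same meaning

print(round(overlap_f1(ground_truth, wrong_answer), 3))  # 0.923
print(round(overlap_f1(ground_truth, paraphrase), 3))    # 0.0
```

In this sketch the contradictory answer shares six of its seven words with the ground truth and scores about 0.92, while the correct paraphrase shares none and scores 0.0, which is the mismatch between overlap and meaning that the claim describes.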

Authors

Person: Not available
Organization: Amazon Web Services
Evaluating RAG applications with Amazon Bedrock knowledge base ...

Sources

Evaluating RAG applications with Amazon Bedrock knowledge base ... (aws.amazon.com, Amazon Web Services, via serper)

Referenced by nodes (3)

ROUGE concept
ground truth concept
F1 concept