Claim
BERTScore computes precision, recall, and F1 measures for language generation tasks, and it has been shown to correlate with human judgment in both sentence-level and system-level evaluation.
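
To make the three measures concrete, here is a minimal sketch of the greedy-matching arithmetic behind them. It assumes token embeddings are already available as plain lists of floats; the toy 2-dimensional vectors and the helper names `cosine` and `greedy_match_scores` are hypothetical and stand in for real contextual embeddings from a BERT-style model. It omits optional refinements such as importance weighting and baseline rescaling.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(reference_vecs, candidate_vecs):
    """Return (precision, recall, f1) from greedy cosine matching.

    Hypothetical helper: each token is paired with its most similar
    token on the other side, and the best similarities are averaged.
    """
    # Recall: how well each reference token is covered by the candidate.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    # Precision: how well each candidate token is supported by the reference.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy embeddings (assumed for illustration only, not real model output).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 0.0]]

precision, recall, f1 = greedy_match_scores(reference, candidate)
print(precision, recall, f1)
```

In this toy case every candidate token has an exact match in the reference, so precision is 1.0, while only one of the two reference tokens is covered, so recall is 0.5.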
Authors
Sources
- Detect hallucinations for RAG-based systems - AWS (aws.amazon.com)