claim
Of the methods compared, the Datadog hallucination detection method showed the smallest drop in F1 score from HaluBench to RAGTruth, which suggests it stays robust as hallucinations become harder to detect.
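As a minimal sketch of the comparison behind this claim, the snippet below computes F1 on two benchmarks and the drop between them. The method names (`method_a`, `method_b`) and all confusion counts are made up for illustration only; they are not figures from the Datadog source.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_drop(easier: tuple, harder: tuple) -> float:
    """F1 on the easier benchmark minus F1 on the harder one."""
    return f1_score(*easier) - f1_score(*harder)


# Hypothetical (tp, fp, fn) counts per method: (easier benchmark, harder benchmark).
# These numbers are invented for the example, not measured results.
counts = {
    "method_a": ((80, 10, 10), (60, 20, 20)),
    "method_b": ((85, 5, 10), (50, 25, 25)),
}

drops = {name: f1_drop(easier, harder) for name, (easier, harder) in counts.items()}
most_robust = min(drops, key=drops.get)  # smallest F1 drop across benchmarks

for name, drop in drops.items():
    print(f"{name}: F1 drop = {drop:.3f}")
print("smallest drop:", most_robust)
```

Under this reading, a method can have a lower F1 on the easier benchmark and still count as more robust, because the claim concerns the size of the drop rather than the absolute score.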
Authors
Sources
- Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog www.datadoghq.com via serper
Referenced by nodes (4)
- hallucination detection concept
- Datadog entity
- F1 score concept
- RAGTruth concept