reference
StrategyQA and GSM8K are benchmarks used to evaluate AI models on Chain-of-Thought (CoT) tasks, with accuracy as the reported metric.
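As an illustration of what an accuracy metric over CoT outputs can look like, here is a minimal sketch, not the scoring code of either benchmark. It assumes a GSM8K-style setup in which each model output is free-form reasoning ending in a numeric answer, and it takes the last number in the output as the prediction. The helper names `extract_final_number` and `accuracy` are hypothetical, and the example strings are invented for demonstration.

```python
import re


def extract_final_number(text):
    """Return the last number found in a chain-of-thought string, or None.

    Assumption: the final answer is the last number the model writes.
    Thousands separators such as "1,000" are removed before matching.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def accuracy(predictions, references):
    """Fraction of predictions whose extracted answer equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        extract_final_number(pred) == ref
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Invented example outputs: the first reaches the right answer, the second does not.
outputs = [
    "She has 16 - 3 - 4 = 9 eggs, so she earns 9 * 2 = 18 dollars. The answer is 18.",
    "2 + 2 = 5. The answer is 5.",
]
gold = [18.0, 4.0]
score = accuracy(outputs, gold)
```

For a yes/no benchmark such as StrategyQA, the same `accuracy` shape would apply with the extraction step swapped for one that maps the output to a boolean; that variant is not shown here.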
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)
Referenced by nodes (1)
- chain-of-thought concept