claim
The researchers use GPT4Score, a model-based evaluation metric defined as the percentage of answers that GPT-4o judges correct when asked whether the model's output matches the ground-truth answer.
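As a rough illustration of the definition in this claim, the metric can be sketched as a simple percentage over per-answer judge verdicts. This is a minimal sketch, not the authors' implementation: the function name `gpt4_score`, the `judge` callable, and the `stub_judge` stand-in are all hypothetical. In the paper's setup the judge is GPT-4o; here it is replaced by a naive substring check so the example runs without any API access.

```python
from typing import Callable, Sequence


def gpt4_score(
    outputs: Sequence[str],
    ground_truths: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Return the percentage of outputs that the judge marks as correct.

    `judge(output, ground_truth)` is a hypothetical callable standing in
    for the GPT-4o correctness check described in the claim.
    """
    if len(outputs) != len(ground_truths):
        raise ValueError("outputs and ground_truths must have the same length")
    if not outputs:
        raise ValueError("at least one answer is required")

    # Count the answers the judge accepts as matching the ground truth.
    correct = sum(1 for out, gt in zip(outputs, ground_truths) if judge(out, gt))
    return 100.0 * correct / len(outputs)


def stub_judge(output: str, ground_truth: str) -> bool:
    """Assumed stand-in for the model judge: case-insensitive substring match."""
    return ground_truth.strip().lower() in output.strip().lower()


# Illustrative data only, not taken from the paper.
example_outputs = ["The capital is Paris.", "Berlin", "It is Rome", "Lisbon, Portugal"]
example_truths = ["Paris", "Madrid", "Rome", "Lisbon"]
example_score = gpt4_score(example_outputs, example_truths, stub_judge)
```

With a real model judge, `stub_judge` would be swapped for a function that prompts the judge model and parses its verdict into a boolean; the aggregation step stays the same.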
Authors
Sources
- Grounding LLM Reasoning with Knowledge Graphs - arXiv arxiv.org via serper
Referenced by nodes (1)
- GPT-4 concept