reference
Benchmark datasets for the synthesis of Large Language Models and Knowledge Graphs evaluate three primary criteria: Answer Quality (AnsQ), which measures the correctness of the generated answer against the ground truth; Retrieval Quality (RetQ), which measures the relevance of the retrieved context against human-validated context; and Reasoning Quality (ReaQ), which measures the correctness of reasoning chains and intermediate steps.
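The three criteria can be illustrated with a minimal sketch. The source does not say how any of them is scored, so every concrete choice here is an assumption made for illustration: exact match for AnsQ, set-based F1 against the human-validated context for RetQ, and position-wise step matching for ReaQ. The names `Example`, `answer_quality`, `retrieval_quality`, and `reasoning_quality` are hypothetical and do not come from any benchmark.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One benchmark item (hypothetical schema, not taken from the source)."""
    predicted_answer: str
    gold_answer: str
    retrieved_context: list[str]
    validated_context: list[str]
    predicted_steps: list[str]
    gold_steps: list[str]


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences do not count.
    return " ".join(text.lower().split())


def answer_quality(ex: Example) -> float:
    """AnsQ, assumed here to be exact match against the ground-truth answer."""
    return 1.0 if _norm(ex.predicted_answer) == _norm(ex.gold_answer) else 0.0


def retrieval_quality(ex: Example) -> float:
    """RetQ, assumed here to be F1 between retrieved and human-validated context."""
    retrieved = {_norm(c) for c in ex.retrieved_context}
    validated = {_norm(c) for c in ex.validated_context}
    hits = len(retrieved & validated)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(validated)
    return 2 * precision * recall / (precision + recall)


def reasoning_quality(ex: Example) -> float:
    """ReaQ, assumed here to be the share of steps that match position by position."""
    longest = max(len(ex.predicted_steps), len(ex.gold_steps))
    if longest == 0:
        return 0.0
    matches = sum(
        1
        for pred, gold in zip(ex.predicted_steps, ex.gold_steps)
        if _norm(pred) == _norm(gold)
    )
    return matches / longest
```

In practice a benchmark may use softer measures (semantic similarity, model-based judging, or human rating) for any of the three; the sketch only shows that each criterion compares a different part of the system output against a different reference.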
Sources
- Large Language Models Meet Knowledge Graphs for Question ... arxiv.org via serper
Referenced by nodes (1)
- Knowledge Graph concept