measurement
Category C covers cases where the LLM-only model outperforms GraphRAG, that is, GraphRAG produces a wrong prediction for a query the standalone LLM originally answered correctly. When evaluated via F1 score, this category accounts for 16.89% of samples.
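As a rough illustration of how such a share could be computed, the sketch below scores each sample with a token-overlap F1 and counts the samples where the LLM-only answer scores strictly higher than the GraphRAG answer. The function names (`token_f1`, `category_c_share`), the whitespace tokenization, and the "strictly higher F1" criterion are assumptions made for this example; the source does not specify the exact procedure, and the toy data is not from the paper.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer (assumed metric)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def category_c_share(samples) -> float:
    """Percentage of samples where the LLM-only answer beats the GraphRAG answer.

    `samples` is a list of (gold, llm_only_answer, graphrag_answer) tuples.
    The "strictly higher F1" rule is an assumption, not the paper's definition.
    """
    if not samples:
        return 0.0
    count = sum(
        1
        for gold, llm_answer, rag_answer in samples
        if token_f1(llm_answer, gold) > token_f1(rag_answer, gold)
    )
    return 100.0 * count / len(samples)


# Hypothetical toy data: (gold, LLM-only answer, GraphRAG answer).
samples = [
    ("paris", "paris", "london"),           # LLM-only better
    ("4", "4", "4"),                        # tie
    ("blue whale", "whale", "blue whale"),  # GraphRAG better
    ("mars", "venus", "mars"),              # GraphRAG better
]
share = category_c_share(samples)
```

On the toy data, one of the four samples falls into the category, so `share` is 25.0. A figure such as 16.89% would come from running the same kind of count over the full evaluation set.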
Authors
Sources
- Empowering GraphRAG with Knowledge Filtering and Integration arxiv.org via serper
Referenced by nodes (1)
- F1 score concept