WebQSP
Facts (14)
Sources
Empowering GraphRAG with Knowledge Filtering and Integration arxiv.org Mar 18, 2025 11 facts
measurement: The study evaluates performance using the F1 score on the WebQSP (Yih et al., 2016) and CWQ (Talmor and Berant, 2018) datasets.
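The F1 scores reported above are the standard set-based KGQA metric: precision and recall are computed over the sets of predicted and gold answer entities. A minimal sketch (the function name `answer_f1` is illustrative, not from the paper):

```python
def answer_f1(predicted, gold):
    """Set-based F1 between predicted and gold answer entities,
    as commonly used for WebQSP and CWQ evaluation."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set or not gold_set:
        return 0.0
    tp = len(pred_set & gold_set)  # correctly predicted answers
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting {A, B} when the gold answers are {B, C} gives precision 0.5 and recall 0.5, so F1 = 0.5.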
measurement: On the WebQSP dataset, the ROG-original method achieved a Hit rate of 86.73 and an F1 score of 70.75, while the ROG + GraphRAG-FI method achieved a Hit rate of 89.25 and an F1 score of 73.86.
reference: The WebQSP dataset, introduced by Yih et al. (2016), contains 4,737 natural language questions that require reasoning over paths of up to two hops, while the CWQ dataset was introduced by Talmor and Berant (2018).
measurement: The WebQSP dataset contains 2,826 training samples, 1,628 testing samples, and has a maximum hop count of 2.
claim: The researchers use the WebQSP and CWQ benchmark datasets for Knowledge Graph Question Answering (KGQA) tasks.
measurement: The GraphRAG-FI method yields an average increase of 5.03% in Hit and 3.70% in F1 compared to PageRank-based filtering across both the WebQSP and CWQ datasets.
measurement: The GraphRAG-FI method achieves an average improvement of 4.78% in Hit and 3.95% in F1 compared to similarity-based filtering when used with the ROG retriever across both the WebQSP and CWQ datasets.
measurement: The GraphRAG-Integration component increases the F1 score by 1.60% and the Hit score by 2.62% on the WebQSP dataset.
procedure: The framework uses LLaMA2-Chat-7B as the Large Language Model backbone, instruction-finetuned for three epochs on the training splits of WebQSP and CWQ over the Freebase knowledge graph.
measurement: Leveraging logits to filter out low-confidence responses improves performance on the WebQSP and CWQ datasets. Specifically, on WebQSP, the 'LLM with Logits' approach achieved a Hit rate of 84.17 and an F1 score of 76.74, compared to 66.15 and 49.97 for the baseline LLM. On CWQ, the 'LLM with Logits' approach achieved a Hit rate of 61.83 and an F1 score of 58.19, compared to 40.27 and 34.17 for the baseline LLM.
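Logit-based filtering of this kind can be sketched as thresholding the average token log-probability of a generated answer and falling back to a retrieval-augmented answer when confidence is low. The function names and the threshold value below are illustrative assumptions, not the paper's implementation:

```python
def is_confident(token_logprobs, threshold=-0.5):
    """Hedged sketch: use the mean token log-probability of the generated
    answer as a confidence proxy. The threshold is a hypothetical value."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return avg_logprob >= threshold


def filter_response(llm_answer, token_logprobs, fallback_answer):
    # Keep the LLM's direct answer only when its logits indicate high
    # confidence; otherwise defer to the retrieval-based fallback.
    if is_confident(token_logprobs):
        return llm_answer
    return fallback_answer
```

In this sketch, an answer whose tokens were generated with high probability (log-probs near 0) is kept, while a hesitant generation (strongly negative log-probs) is discarded in favor of the fallback.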
measurement: In experiments on the WebQSP and CWQ datasets, the GNN-RAG + GraphRAG-FI method achieved the highest performance, with a Hit rate of 91.89% and an F1 score of 75.98% on WebQSP, and a Hit rate of 71.12% and an F1 score of 60.34% on CWQ.
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org Sep 22, 2025 2 facts
reference: WebQSP (Yih et al., 2016) is a Knowledge-Based Question Answering (KBQA) dataset that provides gold SPARQL queries for its questions.
reference: The GRAG method, proposed by Hu et al. (2024), applies textual-graph RAG with the Llama-2-7B language model, using the WebQSP and ExplaGraphs knowledge graphs to perform KGQA on the GraphQA and WebQSP datasets, evaluated with the F1, Hits@1, and Acc metrics.
LLM-KG4QA: Large Language Models and Knowledge Graphs for ... github.com 1 fact
reference: The paper 'The Value of Semantic Parse Labeling for Knowledge Base Question Answering' was published at ACL 2016, introduced the WebQSP dataset, and is categorized under KBQA and KGQA.