concept

CWQ dataset

Facts (12)

Sources
KG-RAG: Bridging the Gap Between Knowledge and Creativity (arXiv, arxiv.org, May 20, 2024), 12 facts
claim: Financial constraints limited the ability of the KG-RAG researchers to process all web snippets for each question and to test the entire development split of the CWQ dataset, due to the high costs of using LLMs to convert web snippets into knowledge graph triples.
measurement: On the CWQ dataset, the KG-RAG pipeline achieved an Exact Match (EM) score of 19%, an F1 score of 25%, an accuracy of 32%, and a hallucination rate of 15%.
measurement: Human benchmarks on the CWQ dataset achieved an Exact Match (EM) score of 63%.
measurement: On the CWQ dataset, the Embedding-RAG model achieved an Exact Match (EM) score of 28%, an F1 score of 37%, an accuracy of 46%, and a hallucination rate of 30%.
measurement: On the CWQ dataset, the MHQA-GRN model achieved an Exact Match (EM) score of 33.2%.
claim: The quality of the CWQ web snippets dataset was often low, which potentially impacted the reliability of the research results.
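
The measurements above report Exact Match (EM) and F1 without saying how they were computed. As a rough illustration only, the sketch below shows the SQuAD-style convention that question-answering evaluations commonly use: answers are normalized, EM checks string equality, and F1 is the harmonic mean of token-level precision and recall. Whether the KG-RAG authors used exactly this normalization is an assumption; the function names `normalize`, `exact_match`, and `token_f1` are hypothetical, and the example strings are toy inputs, not CWQ data.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the source does not specify one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy inputs for illustration only.
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(token_f1("Paris France", "Paris"))
```

A dataset-level score such as the EM figures listed above would then be the mean of the per-question values, expressed as a percentage. Accuracy and hallucination rate depend on the paper's own definitions and are not sketched here.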