measurement
Using logits to filter out low-confidence responses improves performance on both the WebQSP and CWQ datasets. On WebQSP, the 'LLM with Logits' approach achieved a Hit rate of 84.17 and an F1 score of 76.74, compared with 66.15 and 49.97 for the baseline LLM (gains of 18.02 and 26.77 points). On CWQ, it achieved a Hit rate of 61.83 and an F1 score of 58.19, compared with 40.27 and 34.17 for the baseline LLM (gains of 21.56 and 24.02 points).
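
The summary above does not say how confidence is derived from the logits, so the following is only a minimal sketch of one common way to do it, not the method behind the reported numbers. It assumes confidence is the geometric mean of the probabilities the model assigned to its own generated tokens, and that responses scoring below a fixed threshold are dropped. The function names, the candidate tuple layout, and the default threshold of 0.5 are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate layout: (answer text, per-step logits, chosen token ids).
Candidate = Tuple[str, Sequence[Sequence[float]], Sequence[int]]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one step's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_confidence(
    step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> float:
    """Geometric-mean probability of the generated tokens (assumed scoring rule)."""
    if not token_ids or len(step_logits) != len(token_ids):
        raise ValueError("need one logit vector per generated token")
    total = sum(log_softmax(l)[t] for l, t in zip(step_logits, token_ids))
    return math.exp(total / len(token_ids))


def filter_by_confidence(
    candidates: Sequence[Candidate], threshold: float = 0.5
) -> List[str]:
    """Keep only answers whose confidence meets the (assumed) threshold."""
    return [
        answer
        for answer, step_logits, token_ids in candidates
        if sequence_confidence(step_logits, token_ids) >= threshold
    ]


if __name__ == "__main__":
    toy = [
        ("confident answer", [[5.0, 0.0, 0.0]], [0]),  # peaked distribution
        ("uncertain answer", [[0.0, 0.0, 0.0]], [0]),  # uniform distribution
    ]
    print(filter_by_confidence(toy))
```

Averaging in log space keeps the score comparable across answers of different lengths. In practice the threshold would need to be tuned on held-out data, since it trades answer coverage against precision.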
