claim
The researchers constructed evaluation datasets for three tasks: knowledge question answering, tactical planning, and threat assessment. The datasets were built so that their domain complexity aligns with real-world scenarios.

Authors

Sources

Referenced by nodes (3)