Relations (1)

related 0.90, strongly supported by 9 facts

Large Language Models are evaluated for social intelligence and cognitive reasoning through Theory of Mind (ToM) benchmarks and tasks, as evidenced by academic studies such as [1], [2], and [3]. Researchers use dedicated benchmarks such as ToMBench and OpenToM [4] to measure how these models represent beliefs [5] and to improve interpersonal reasoning [6].
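As an illustration only (not drawn from the sources above), the sketch below shows how a single false-belief ToM item might be scored against a model. The item format, the model_answer function, and the accuracy metric are all hypothetical stand-ins; benchmarks such as ToMBench and OpenToM define their own item schemas and scoring.

```python
# Illustrative sketch only: a minimal false-belief (Sally-Anne style) evaluation.
# `model_answer` is a hypothetical placeholder for a real LLM call.

from dataclasses import dataclass

@dataclass
class ToMItem:
    story: str          # narrative that establishes a character's false belief
    question: str       # probe about the character's belief, not the world state
    options: list[str]  # multiple-choice answers
    answer: str         # gold label

ITEMS = [
    ToMItem(
        story=("Sally puts her marble in the basket and leaves. "
               "While she is away, Anne moves the marble to the box."),
        question="Where will Sally look for her marble when she returns?",
        options=["basket", "box"],
        answer="basket",  # the correct answer tracks Sally's (false) belief
    ),
]

def model_answer(story: str, question: str, options: list[str]) -> str:
    """Hypothetical model interface; replace with an actual LLM query."""
    return options[0]  # placeholder prediction

def evaluate(items: list[ToMItem]) -> float:
    """Fraction of items where the model's choice matches the gold label."""
    correct = sum(
        model_answer(it.story, it.question, it.options) == it.answer
        for it in items
    )
    return correct / len(items)

if __name__ == "__main__":
    print(f"ToM accuracy: {evaluate(ITEMS):.2f}")
```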

Facts (9)

Sources
A Survey of Incorporating Psychological Theories in LLMs (arXiv, arxiv.org) – 8 facts
claim: Researchers assess the core social intelligence of Large Language Models by measuring their capacity to represent and reason about beliefs using Theory of Mind (ToM) benchmarks.
claim: Theory of Mind (ToM) adaptations in LLMs enhance interpersonal reasoning, which aids missing-knowledge inference (Bortoletto et al., 2024), common-ground alignment (Qiu et al., 2024), and cognitive modeling (Wu et al., 2024a).
claim: Wilf et al. (2024) and Jung et al. (2024) refined Theory of Mind in LLMs via task decomposition, while Sarangi et al. (2025) used recursive simulation.
measurement: Cognitive development and reasoning capabilities in Large Language Models have been assessed through cognitive maturity (Laverghetta Jr. & Licato, 2022), subjective similarity (Malloy et al., 2024), reasoning strategies (Mondorf & Plank, 2024; Yuan et al., 2023), decision-making (Ying et al., 2024), and Theory of Mind (Jung et al., 2024).
reference: Michal Kosinski published 'Evaluating large language models in theory of mind tasks' in the Proceedings of the National Academy of Sciences in 2024.
reference: Nature Human Behaviour published the study 'Testing theory of mind in large language models and humans' in 2024 (volume 8, issue 7, pages 1285–1295).
reference: Recent benchmarks developed to probe distinct facets of Theory of Mind (ToM) in Large Language Models include ToMBench (Chen et al., 2024c), OpenToM (Xu et al., 2024a), Hi-ToM (Wu et al., 2023), and FANToM (Kim et al., 2023).
reference: The paper 'ToMBench: Benchmarking theory of mind in large language models' by Zhuang Chen, Jincenzi Wu, Jinfeng Zhou, Bosi Wen, Guanqun Bi, Gongyao Jiang, Yaru Cao, Mengting Hu, Yunghwei Lai, Zexuan Xiong, and Minlie Huang was published in the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, August 2024.
The Synergy of Symbolic and Connectionist AI in LLM ... (arXiv, arxiv.org) – 1 fact
reference: James W. A. Strachan, Dalila Albergo, Giulia Borghini, Oriana Pansardi, Eugenio Scaliti, Saurabh Gupta, Krati Saxena, Alessandro Rufo, Stefano Panzeri, Guido Manzi, et al. authored the paper 'Testing theory of mind in large language models and humans', published in Nature Human Behaviour in 2024.