Fact — procedure — Knowledge Tree

The study evaluated large language models (LLMs) on two metrics. Safety was measured as the averaged BART sentiment score (Yin, Hay, and Roth 2019). Consistency was measured with BERTScore (Zhang et al. 2019), comparing the 'Rule of Thumb' instructions provided to the LLMs against the rules the LLMs learned.
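
The two metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code. The function names (`average_safety_score`, `bertscore_f1`), the toy scorer, and the toy embeddings are all hypothetical. In the study the per-response score would come from a BART-based model and the token embeddings from a BERT-style encoder; here both are replaced by hand-written stand-ins so the arithmetic is visible. The BERTScore part implements only the greedy-matching precision, recall, and F1 described by Zhang et al. (2019), without idf weighting or baseline rescaling.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def average_safety_score(responses: Sequence[str],
                         scorer: Callable[[str], float]) -> float:
    """Safety metric: mean of the per-response sentiment scores.

    `scorer` is a stand-in (assumption) for a BART-based sentiment model.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(scorer(r) for r in responses) / len(responses)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bertscore_f1(candidate: Sequence[Vector],
                 reference: Sequence[Vector]) -> float:
    """Consistency metric: greedy-matching BERTScore F1 over token embeddings.

    `candidate` holds token embeddings of the rule the LLM learned,
    `reference` those of the provided 'Rule of Thumb'. Real use would take
    these from a BERT-style encoder (assumption: toy vectors here).
    """
    if not candidate or not reference:
        raise ValueError("both token sequences must be non-empty")
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in reference)
                    for c in candidate) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in candidate)
                 for r in reference) / len(reference)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical per-response scores standing in for model output.
    toy_scores = {"response A": 0.2, "response B": 0.8}
    print(average_safety_score(list(toy_scores), toy_scores.__getitem__))

    # Hypothetical 2-d token embeddings for a learned rule and a provided rule.
    learned_rule = [[1.0, 0.0], [0.0, 1.0]]
    provided_rule = [[1.0, 0.0]]
    print(bertscore_f1(learned_rule, provided_rule))
```

Passing the scorer and the embeddings in as arguments keeps the metric arithmetic separate from the choice of model, so a real classifier or encoder can be substituted without changing either function.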

Authors

Person: Not available
Organization: arXiv
Building Trustworthy NeuroSymbolic AI Systems - arXiv

Sources

Building Trustworthy NeuroSymbolic AI Systems - arXiv (arxiv.org)

Referenced by nodes (1)

BERTScore concept