claim
Evaluating large language models (LLMs) with psychologically grounded metrics lets researchers move beyond surface-level performance measures: classic psychological theories are mapped onto benchmarks that probe how models respond in human-like scenarios.

Authors

Sources
