claim
The researchers validated the deception-related features they identified in Llama 70B on TruthfulQA, a standard benchmark built around common factual misconceptions. Amplifying these features increased the model's willingness to state falsehoods.
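As a rough illustration of what amplifying a feature and measuring the effect could look like, here is a minimal sketch. It is not the researchers' method: it assumes a feature can be represented as a direction in activation space and that each benchmark answer has already been judged true or false. The names `amplify_feature`, `falsehood_rate`, and `strength` are hypothetical, and the numbers in the usage lines are toy values, not results.

```python
import math


def amplify_feature(activation, feature_direction, strength):
    """Shift an activation vector along a unit-normalized feature direction.

    Assumption: the feature is a single direction in activation space and
    `strength` is a hypothetical steering coefficient.
    """
    norm = math.sqrt(sum(x * x for x in feature_direction))
    if norm == 0:
        raise ValueError("feature direction must be non-zero")
    return [a + strength * d / norm for a, d in zip(activation, feature_direction)]


def falsehood_rate(judged_false):
    """Fraction of answers judged false (True means the answer was a falsehood)."""
    if not judged_false:
        raise ValueError("no answers to score")
    return sum(judged_false) / len(judged_false)


# Toy usage: steer one activation vector, then compare two sets of judgments.
steered = amplify_feature([1.0, 0.0], [0.0, 2.0], strength=3.0)
baseline_rate = falsehood_rate([False, False, True, False])
steered_rate = falsehood_rate([True, False, True, True])
```

In a real evaluation the steered activations would be fed back into the model during generation, and the judgments would come from scoring its answers to the benchmark questions with and without amplification.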
