claim
In language models, alignment refers to ensuring that a model designed to follow instructions does not produce unsafe outputs. MacDonald discussed the concept in 1991.
