claim
In language models, alignment means ensuring that a model designed to follow instructions does not produce unsafe results, a concept discussed by MacDonald (1991).
Authors
Sources
- Building Trustworthy NeuroSymbolic AI Systems (arXiv, arxiv.org), via serper
Referenced by nodes (2)
- Language Model concept
- AI alignment concept