reference
The paper 'MACPO: weak-to-strong alignment via multi-agent contrastive preference optimization' was presented at the Thirteenth International Conference on Learning Representations. It is cited in Section 7.2.2 of 'A Survey on the Theory and Mechanism of Large Language Models'.
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)