formula
The conditional probability distribution of an output sequence y = (y_1, y_2, …, y_m) given an input context x = (x_1, x_2, …, x_n) is factorized autoregressively as P(y | x; θ) = ∏_{t=1}^{m} P(y_t | y_{<t}, x; θ), where y_{<t} = (y_1, …, y_{t-1}) denotes the output tokens preceding step t and θ denotes the model parameters, which are optimized via maximum likelihood estimation or reinforcement learning from human feedback (RLHF).
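As a rough illustration of the factorization, the sketch below scores a sequence by summing per-step log-probabilities, which is the log of the product over t. Everything in it is an assumption made for illustration: the two-token vocabulary `VOCAB`, the toy rule inside `next_token_probs` (a stand-in for P(y_t | y_{<t}, x; θ), not a real language model), and the helper name `sequence_log_prob`.

```python
import math
from itertools import product

VOCAB = ("a", "b")  # hypothetical two-token vocabulary


def next_token_probs(x, prefix):
    """Toy stand-in for P(y_t | y_<t, x; theta).

    Hypothetical rule: the token equal to the previous output token
    (or to the last input token at the first step) gets probability 0.7,
    the other token gets 0.3.
    """
    last = prefix[-1] if prefix else x[-1]
    return {tok: (0.7 if tok == last else 0.3) for tok in VOCAB}


def sequence_log_prob(x, y):
    """Chain rule in log space: log P(y | x) = sum_t log P(y_t | y_<t, x)."""
    total = 0.0
    for t, tok in enumerate(y):
        step_dist = next_token_probs(x, y[:t])  # condition on y_<t and x
        total += math.log(step_dist[tok])
    return total


x = ("b", "a")       # input context
y = ("a", "a", "b")  # output sequence to score
p = math.exp(sequence_log_prob(x, y))  # equals 0.7 * 0.7 * 0.3

# Because every step distribution sums to 1, the probabilities of all
# length-3 output sequences also sum to 1.
total_mass = sum(
    math.exp(sequence_log_prob(x, cand)) for cand in product(VOCAB, repeat=3)
)
```

Summing logs rather than multiplying raw probabilities avoids numerical underflow for long sequences, which is why sequence likelihoods are commonly computed this way in practice.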
Sources
- Survey and analysis of hallucinations in large language models (www.frontiersin.org)
Referenced by nodes (3)
- Large Language Models concept
- Reinforcement learning from human feedback (RLHF) concept
- RLHF concept