claim
Training linear diagonal networks with the square loss on regression tasks causes gradient descent to converge to special solutions, such as non-negative ones.
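A minimal sketch of the kind of setup this claim describes, under assumptions not stated in the source: the network is taken to be a diagonal linear network with tied layers, so the effective regression weights are w = u * u, and the data, initialization, and hyperparameters are all hypothetical. With this parameterization the weights are non-negative by construction, and gradient descent on the square loss only moves within that set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 40                      # fewer samples than features (assumed setup)
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:3] = 1.0                   # hypothetical sparse, non-negative target
y = X @ w_star

def square_loss(w):
    """Square loss 1/(2n) * ||Xw - y||^2."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

# Diagonal linear network with tied layers: effective weights w = u * u.
u = np.full(d, 0.1)                # small initialization (assumption)
lr, steps = 0.01, 5000             # illustrative hyperparameters

initial_loss = square_loss(u * u)
for _ in range(steps):
    w = u * u
    grad_w = X.T @ (X @ w - y) / n     # gradient of the loss w.r.t. w
    u -= lr * 2.0 * u * grad_w         # chain rule: dL/du = 2 * u * dL/dw

w = u * u
final_loss = square_loss(w)
print(f"loss: {initial_loss:.4f} -> {final_loss:.2e}")
print("min weight:", w.min())
```

Other parameterizations (for example w = u * u - v * v) allow signed weights and lead to different special solutions; the choice here is only meant to illustrate the non-negative case named in the claim.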
Authors
Sources
- Track: Poster Session 3 - AISTATS 2026 virtual.aistats.org via serper
Referenced by nodes (2)
- gradient descent concept
- regression concept