claim
When linear diagonal networks are trained on regression tasks with the square loss, gradient descent converges to special solutions, for example non-negative ones.
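
As a rough illustration of the claim, the sketch below runs plain gradient descent on the square loss for a small underdetermined regression problem. It assumes one common form of diagonal parameterization, w = u * u elementwise, which the claim itself does not specify. Under that assumption the learned weights are non-negative by construction. The function name `train_diagonal_net`, the toy data, and the hyperparameters `alpha`, `lr`, and `steps` are all hypothetical choices made for this example.

```python
def train_diagonal_net(X, y, alpha=0.1, lr=0.05, steps=5000):
    """Gradient descent on the square loss with weights w = u * u.

    The parameterization w = u * u is an assumption of this sketch,
    not something stated in the claim.
    """
    n, d = len(X), len(X[0])
    u = [alpha] * d  # small uniform initialization (hypothetical choice)
    for _ in range(steps):
        w = [ui * ui for ui in u]
        # residuals r_i = <x_i, w> - y_i
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # gradient of (1 / 2n) * sum_i r_i^2 with respect to w
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
        # chain rule: dL/du_j = 2 * u_j * dL/dw_j
        u = [u[j] - lr * 2.0 * u[j] * g[j] for j in range(d)]
    return [ui * ui for ui in u]


# Toy problem: 2 samples, 3 features, so many weight vectors fit the data exactly.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
w = train_diagonal_net(X, y)
print(w)
```

Because the toy problem has more features than samples, infinitely many weight vectors fit the data exactly. Gradient descent on `u` selects one particular fit, and every entry of it is non-negative.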
