reference
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen published 'GShard: Scaling giant models with conditional computation and automatic sharding' as an arXiv preprint (arXiv:2006.16668) in 2020.
