Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage News Facebook Builds an Efficient Neural Network Model over a Billion Words

Facebook Builds an Efficient Neural Network Model over a Billion Words

This item in japanese

Using Neural Networks for sequence prediction is a well-known Computer Science problem with a vast array of applications in speech recognition, machine translation, language modeling and other fields. Models being used are really computationally demanding which limits their practical applicability.

Facebook AI Research scientists designed adaptive softmax, an approximation algorithm tailored for GPUs which can be used to efficiently train neural networks over huge vocabularies. Adaptive softmax, as described in the published paper exploits unbalanced word distribution over large corpora to form clusters that can minimize expectation of computational complexity. Full softmax has a linear correlation with the vocabulary corpus size, whereas adaptive softmax is sublinear and optimized for GPU usage.

In conjunction with the development of adaptive softmax, Facebook researchers released torch-rnnlib, an open-source library that helps researchers designing and testing recurrent models in GPUs. torch.cudnn allows for easy access to baselines using NVIDIA CUDA Deep Neural Network library. RNN, LSTM, GRU and other recurrent networks are implemented and can be easily used by researchers as building blocks to design recurrent networks.

Testing the algorithm in a single GPU, Facebook researchers achieved 12,500 words/sec while maintaining accuracy close to the full softmax. Benchmark perplexities achieved by researchers were 30 (lower is better) by Google’s Jozefowicz et al, 2016 using 32GPUs over three weeks and 44 using 18GPU days for training. Google’s implementation of LSTM model using Tensorflow is available on Github and the main author offers an interesting explanation of perplexity performance in a relevant thread in Reddit. In contrast, adaptive softmax can get to perplexity of 50 within ~14 hours, 43.9 within a couple of days and 39.8 in six days. Without CuDNN library, performance drops by ~30%. All tools and techniques were tested against EuroParl and One Billion Word corpuses, some of the largest ones available.

Rate this Article