Facebook Builds an Efficient Neural Network Model over a Billion Words

by Alex Giamas on Dec 12, 2016

Sequence prediction with neural networks is a well-known problem in computer science, with a vast array of applications in speech recognition, machine translation, language modeling and other fields. The models involved are computationally demanding, however, which limits their practical applicability.

Facebook AI Research scientists designed adaptive softmax, an approximation algorithm tailored for GPUs that can be used to train neural networks efficiently over huge vocabularies. As described in the published paper, adaptive softmax exploits the unbalanced distribution of words in large corpora to form clusters that minimize the expected computational complexity. The cost of the full softmax grows linearly with vocabulary size, whereas the cost of adaptive softmax is sublinear and optimized for GPU usage.
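The clustering idea can be illustrated with a small NumPy sketch. Words are sorted by frequency; a "head" softmax covers the most frequent words plus one extra output per tail cluster, and a rare word's probability is the probability of its cluster times its probability within that cluster. The function names, the cutoffs, the synthetic Zipf-shaped word distribution and the simple cost count (output scores computed per token) are all assumptions for illustration. This is not FAIR's implementation, and it omits refinements from the paper such as its GPU-oriented cost model and smaller projections for rare-word clusters.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_level_log_probs(h, W_head, W_tails, head_size):
    """log p(w | h) for every word of a frequency-sorted vocabulary (hypothetical helper).

    h       : hidden state, shape (d,)
    W_head  : shape (d, head_size + len(W_tails)); frequent words plus one
              "cluster token" per tail cluster
    W_tails : list of (d, cluster_size) matrices, one per tail cluster
    """
    head = softmax(h @ W_head)                    # small softmax, always computed
    parts = [np.log(head[:head_size])]            # frequent words: p(w | h) directly
    for i, W in enumerate(W_tails):
        p_cluster = head[head_size + i]           # p(cluster i | h)
        p_within = softmax(h @ W)                 # p(w | cluster i, h)
        parts.append(np.log(p_cluster) + np.log(p_within))
    return np.concatenate(parts)

def expected_cost(freqs, cutoffs):
    """Expected number of output scores computed per token (simplified count).

    freqs   : word probabilities sorted in decreasing order
    cutoffs : e.g. [2000, 10000] -> head = words [0, 2000),
              tail clusters [2000, 10000) and [10000, V)
    """
    bounds = list(cutoffs) + [len(freqs)]
    cost = cutoffs[0] + len(cutoffs)              # head words + cluster tokens
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        cost += freqs[lo:hi].sum() * (hi - lo)    # cluster scored only when its words occur
    return cost

# Synthetic Zipf-like distribution over 100,000 words (an assumption, not real data).
V = 100_000
ranks = np.arange(1, V + 1)
zipf = (1.0 / ranks) / (1.0 / ranks).sum()
print("full softmax scores per token:", V)
print("two-level expected scores:", round(expected_cost(zipf, [2000, 10000])))

# Tiny random model: 5 head words, tail clusters of 3 and 4 words.
rng = np.random.default_rng(0)
d = 8
h = rng.standard_normal(d)
W_head = rng.standard_normal((d, 5 + 2))
W_tails = [rng.standard_normal((d, 3)), rng.standard_normal((d, 4))]
log_p = two_level_log_probs(h, W_head, W_tails, head_size=5)
print(np.exp(log_p).sum())  # ~1.0: a proper distribution over all 12 words
```

Because a frequent word needs only the small head softmax, and each tail cluster is scored in proportion to how often its words actually occur, the expected work per token falls well below the vocabulary size. For clarity the sketch scores every cluster; during training only the target word's cluster would need to be computed.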

Alongside adaptive softmax, Facebook researchers released torch-rnnlib, an open-source library that helps researchers design and test recurrent models on GPUs. torch.cudnn allows for easy access to baselines using the NVIDIA CUDA Deep Neural Network library (cuDNN). RNN, LSTM, GRU and other recurrent networks are implemented and can easily be used by researchers as building blocks for designing recurrent networks.
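torch-rnnlib's recurrent modules are Torch components. To show what an LSTM building block computes at each time step, here is a framework-free NumPy sketch of the standard LSTM cell equations. It is not rnnlib's API: the function names, the stacked weight layout and the i, f, g, o gate ordering are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell (hypothetical helper).

    x: (input_dim,), h_prev and c_prev: (hidden,)
    W: (input_dim, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)
    The four gates are stacked in the assumed order i, f, g, o.
    """
    z = x @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate cell update
    c = f * c_prev + i * g                         # keep part of old memory, add new
    h = o * np.tanh(c)                             # gated view of the cell state
    return h, c

def run_lstm(xs, W, U, b, hidden):
    """Unroll the cell over a sequence, starting from a zero state."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
xs = rng.standard_normal((5, input_dim))           # a sequence of 5 input vectors
W = rng.standard_normal((input_dim, 4 * hidden)) * 0.1
U = rng.standard_normal((hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)
outputs = run_lstm(xs, W, U, b, hidden)
print(outputs.shape)  # (5, 4)
```

GPU libraries such as cuDNN provide optimized kernels for recurrent computations like this one, so a library built on them can offer fast baselines without hand-written GPU code.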

Testing the algorithm on a single GPU, Facebook researchers achieved 12,500 words/sec while maintaining accuracy close to that of the full softmax. For reference, Google's Jozefowicz et al. (2016) reported a perplexity (lower is better) of 30 using 32 GPUs over three weeks, and of 44 using 18 GPU-days of training. Google's TensorFlow implementation of the LSTM model is available on GitHub, and the main author offers an interesting explanation of its perplexity performance in a related Reddit thread.

In contrast, adaptive softmax reaches a perplexity of 50 within ~14 hours, 43.9 within a couple of days and 39.8 in six days. Without the cuDNN library, performance drops by ~30%. All tools and techniques were tested against the EuroParl and One Billion Word corpora, some of the largest available.
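Perplexity, the metric quoted above, is the exponential of the average negative log-likelihood per token, so a perplexity of 50 means the model is on average as uncertain as a uniform choice among 50 words. The sketch below computes it from per-token log-probabilities and converts the quoted training budgets to GPU-days. Treating "three weeks" as exactly 21 days, and comparing GPU-days across different hardware, are simplifying assumptions.

```python
import math

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns every token probability 1/50 has perplexity 50.
uniform_50 = [math.log(1 / 50)] * 1000
print(round(perplexity(uniform_50), 6))   # 50.0

# Training budgets quoted in the article, expressed in GPU-days.
google_best = 32 * 21        # 32 GPUs for roughly three weeks
google_small = 18            # 18 GPU-days
adaptive_6_days = 1 * 6      # one GPU for six days
print(google_best, google_small, adaptive_6_days)   # 672 18 6

# Throughput of 12,500 words/sec on one GPU, per hour.
words_per_hour = 12_500 * 3600   # 45,000,000 words
```

On this crude measure, the six-day adaptive softmax run (perplexity 39.8) used a third of the GPU-days of the 44-perplexity baseline, though still well behind the 30-perplexity result obtained with roughly a hundred times more GPU time.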

Marketing and all content copyright © 2006-2016 C4Media Inc.