Facebook Builds an Efficient Neural Network Model over a Billion Words

by Alex Giamas on Dec 12, 2016. Estimated reading time: 1 minute

Sequence prediction with neural networks is a well-known computer science problem with a vast array of applications in speech recognition, machine translation, language modeling and other fields. The models in use are computationally demanding, which limits their practical applicability.

Facebook AI Research scientists designed adaptive softmax, an approximation algorithm tailored for GPUs that can be used to efficiently train neural networks over huge vocabularies. Adaptive softmax, as described in the published paper, exploits the unbalanced word distribution in large corpora to form clusters of words that minimize the expected computational cost. The cost of the full softmax grows linearly with vocabulary size, whereas adaptive softmax is sublinear and optimized for GPU usage.
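
The key trick is to keep the most frequent words in a small "head" cluster that is evaluated on every prediction, while rarer words are pushed into tail clusters that are evaluated only when needed. PyTorch's nn.AdaptiveLogSoftmaxWithLoss later implemented this kind of approximation; the sketch below shows how such a layer could be wired into a language model, with all sizes and cutoff values chosen arbitrarily for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size = 100_000        # total number of words in the vocabulary
hidden_size = 512           # dimensionality of the RNN hidden state
batch_tokens = 128          # number of target tokens in a mini-batch

# Adaptive softmax: the 2,000 most frequent words form the "head" cluster,
# the rest fall into progressively cheaper tail clusters.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000, 50_000],  # cluster boundaries by frequency rank
)

hidden = torch.randn(batch_tokens, hidden_size)          # stand-in for RNN outputs
targets = torch.randint(0, vocab_size, (batch_tokens,))  # next-word indices

# Only the head and the tail clusters containing the targets are evaluated.
output, loss = adaptive_softmax(hidden, targets)
loss.backward()
```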

In conjunction with the development of adaptive softmax, Facebook researchers released torch-rnnlib, an open-source library that helps researchers design and test recurrent models on GPUs. Its torch.cudnn bindings give easy access to baselines backed by the NVIDIA cuDNN (CUDA Deep Neural Network) library. RNN, LSTM, GRU and other recurrent networks are implemented and can easily be used by researchers as building blocks when designing new recurrent models.
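
torch-rnnlib itself is a Lua Torch library, so its exact calls are not shown here; as a rough analogue, the PyTorch sketch below builds a small word-level model from a standard LSTM block that dispatches to cuDNN kernels when run on a GPU. All names and sizes are illustrative and are not part of torch-rnnlib's API.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the article.
vocab_size, embed_size, hidden_size = 100_000, 256, 512

class WordLM(nn.Module):
    """A small word-level language model built from standard recurrent blocks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        # nn.LSTM uses NVIDIA's cuDNN kernels when run on a GPU, which is the
        # kind of baseline the torch.cudnn backend exposes in torch-rnnlib.
        self.rnn = nn.LSTM(embed_size, hidden_size, num_layers=2, batch_first=True)

    def forward(self, tokens, state=None):
        emb = self.embed(tokens)            # (batch, seq_len, embed_size)
        out, state = self.rnn(emb, state)   # (batch, seq_len, hidden_size)
        return out, state

device = "cuda" if torch.cuda.is_available() else "cpu"
model = WordLM().to(device)
tokens = torch.randint(0, vocab_size, (32, 20), device=device)  # 32 sequences of 20 tokens
hidden_states, _ = model(tokens)
```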

Testing the algorithm on a single GPU, Facebook researchers achieved a throughput of 12,500 words/sec while maintaining accuracy close to that of the full softmax. For comparison, Google's Jozefowicz et al., 2016 reached a benchmark perplexity of 30 (lower is better) using 32 GPUs over three weeks, and 44 using 18 GPU-days of training. Google's TensorFlow implementation of the LSTM model is available on GitHub, and the main author offers an interesting explanation of its perplexity performance in a relevant Reddit thread. In contrast, adaptive softmax reaches a perplexity of 50 within roughly 14 hours, 43.9 within a couple of days, and 39.8 in six days. Without the cuDNN library, performance drops by roughly 30%. All tools and techniques were tested against the EuroParl and One Billion Word corpora, some of the largest available.
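
For readers unfamiliar with the metric, perplexity is the exponential of the average per-word negative log-likelihood on held-out text, so lower values mean the model assigns higher probability to the test set. A quick back-of-the-envelope example (the loss value below is made up, not taken from the paper):

```python
import math

# Hypothetical average cross-entropy (nats per word) on a validation set.
avg_nll = 3.69
perplexity = math.exp(avg_nll)
print(f"perplexity ~= {perplexity:.1f}")  # ~= 40.0, in the range discussed above
```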
