Intel Open-Sources BigDL, Distributed Deep Learning Library for Apache Spark


Intel has open-sourced BigDL, a distributed deep learning library that runs on Apache Spark. It uses existing Spark clusters to run deep learning computations and simplifies loading data from large datasets stored in Hadoop.

Tests show a significant speedup on Xeon servers compared with the open-source frameworks Caffe, Torch, and TensorFlow. The speed is comparable to that of a mainstream GPU, and BigDL can scale to tens of Xeon servers.

The BigDL library supports Spark versions 1.5, 1.6, and 2.0 and allows deep learning to be embedded in existing Spark-based programs. It contains methods to convert Spark RDDs to a BigDL DataSet and can be used directly with Spark ML Pipelines.

For model training, BigDL applies synchronous mini-batch SGD (Stochastic Gradient Descent), executed as a single Spark job across multiple executors. Each executor runs a multi-threaded engine and processes its portion of the mini-batch data. In the current version, all the training and validation data is loaded into memory.
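
To make the idea of synchronous mini-batch SGD concrete, the following is a minimal sketch in plain Python. It is a single-process simulation, not BigDL or Spark code: the "workers" are just list shards, and every name in it (`shard`, `local_gradient`, `sync_sgd_step`, `train`) is hypothetical and chosen for illustration. The model is a one-variable linear regression, picked only because its gradient is easy to verify by hand.

```python
# Illustrative simulation only: plain Python, no Spark or BigDL APIs.
# All function names here are hypothetical.

def shard(batch, num_workers):
    """Split a mini-batch into num_workers near-equal contiguous shards."""
    size, rem = divmod(len(batch), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < rem else 0)
        shards.append(batch[start:end])
        start = end
    return shards


def local_gradient(w, b, samples):
    """Summed squared-error gradient for y ~ w*x + b over one shard."""
    gw = gb = 0.0
    for x, y in samples:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb


def sync_sgd_step(w, b, batch, num_workers, lr):
    """One synchronous step: all shard gradients are collected and
    averaged before the shared parameters are updated once."""
    grads = [local_gradient(w, b, s) for s in shard(batch, num_workers)]
    gw = sum(g[0] for g in grads) / len(batch)
    gb = sum(g[1] for g in grads) / len(batch)
    return w - lr * gw, b - lr * gb


def train(data, batch_size=8, num_workers=4, lr=0.1, epochs=200):
    """Run mini-batch SGD over the data for a fixed number of epochs."""
    w = b = 0.0
    for _ in range(epochs):
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            w, b = sync_sgd_step(w, b, batch, num_workers, lr)
    return w, b


# Noise-free samples of y = 3x + 1 with x spread over [-1, 1].
data = [((i - 7.5) / 7.5, 3.0 * ((i - 7.5) / 7.5) + 1.0) for i in range(16)]
w, b = train(data)
```

Because the update waits for every shard's gradient, the result of a step does not depend on how many workers the mini-batch is split across, apart from floating-point rounding. That property is what distinguishes the synchronous variant from asynchronous schemes, where workers update parameters without waiting for each other.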

BigDL is implemented in Scala and modeled after Torch. Like Torch, it provides a Tensor class, which uses the Intel MKL library for computations. Intel MKL (Math Kernel Library) is a set of routines optimized for numerical calculations, ranging from FFT (Fast Fourier Transform) to matrix multiplication, that are heavily used in deep learning model training. Other concepts borrowed from Torch are Table, Criterion, and Module, which is inspired by Torch's nn package and represents individual neural network layers.
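
As a rough illustration of the Module and Criterion concepts, here is a small Python sketch of the forward/backward contract that Torch's nn package uses. It is not BigDL's Scala API: the class names `Linear` and `MSECriterion`, the use of plain lists in place of a Tensor, and the scalar weight are all simplifying assumptions made for this example. Table is not modeled.

```python
# Conceptual sketch only: plain Python lists stand in for tensors,
# and the classes below are hypothetical, not BigDL or Torch classes.

class Linear:
    """Module-like layer computing y = w*x + b with scalar parameters."""

    def __init__(self, w, b):
        self.w = w
        self.b = b
        self.grad_w = 0.0
        self.grad_b = 0.0

    def forward(self, inputs):
        """Compute the layer output for each input value."""
        return [self.w * x + self.b for x in inputs]

    def backward(self, inputs, grad_output):
        """Accumulate parameter gradients and return the gradient
        with respect to the inputs."""
        self.grad_w += sum(g * x for g, x in zip(grad_output, inputs))
        self.grad_b += sum(grad_output)
        return [g * self.w for g in grad_output]


class MSECriterion:
    """Criterion-like loss: mean squared error between two lists."""

    def forward(self, preds, targets):
        """Return the scalar loss."""
        n = len(preds)
        return sum((p - t) ** 2 for p, t in zip(preds, targets)) / n

    def backward(self, preds, targets):
        """Return the gradient of the loss with respect to preds."""
        n = len(preds)
        return [2.0 * (p - t) / n for p, t in zip(preds, targets)]


layer = Linear(w=2.0, b=1.0)
criterion = MSECriterion()

inputs, targets = [1.0, 2.0], [3.0, 4.0]
preds = layer.forward(inputs)                      # forward pass through the layer
loss = criterion.forward(preds, targets)           # scalar loss
grad_preds = criterion.backward(preds, targets)    # dLoss/dPreds
grad_inputs = layer.backward(inputs, grad_preds)   # dLoss/dInputs, fills grad_w and grad_b
```

The split mirrors the design described above: a Module owns its parameters and knows how to propagate gradients through itself, while a Criterion only scores predictions against targets and supplies the initial gradient for the backward pass.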

BigDL provides an AWS EC2 image and examples for text classification using convolutional neural networks, for image classification, and for loading models pre-trained in Torch or Caffe into Spark to compute predictions. The main community requests are Python support and MKL-DNN, the deep learning extensions for MKL.
