
Deep Convolutional Networks for Super-Resolution Image Reconstruction at Flipboard


Flipboard recently reported on an in-house application of deep learning to scale up low-resolution images that illustrates the power and flexibility of this class of learning algorithms.

Super-resolution reconstruction of images is an ongoing research problem in image processing and computer vision. The task consists of reconstructing a higher-resolution image from a scaled-down version. Contrary to what TV crime shows would have us believe, this reconstruction is not possible from the downscaled image alone, because the finer structure is irretrievably lost.

Instead, algorithms often work with a dictionary of paired image patches, taken from higher-resolution images and their scaled-down versions, in order to fill in the lost information.
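The dictionary-based idea can be sketched as follows. This is a hypothetical toy illustration, not Flipboard's method: each high-resolution patch is paired with its downscaled version, and a new low-resolution patch is "reconstructed" by looking up its nearest neighbor in the dictionary and returning the paired high-resolution patch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary: 100 high-res 8x8 patches and their 4x4 downscales
# (2x2 box averaging stands in for a real downscaling filter).
high_res = rng.random((100, 8, 8))
low_res = high_res.reshape(100, 4, 2, 4, 2).mean(axis=(2, 4))

def reconstruct(patch_lr):
    """Find the closest low-res patch in the dictionary and return
    the paired high-res patch as the reconstruction."""
    dists = ((low_res - patch_lr) ** 2).sum(axis=(1, 2))
    return high_res[dists.argmin()]

query = low_res[42] + rng.normal(0, 0.01, (4, 4))  # noisy low-res query
print(reconstruct(query).shape)  # (8, 8)
```

Real systems refine this with learned mappings rather than raw lookup, which is where neural networks come in.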

Neural networks, in particular deep convolutional neural networks, are one class of machine learning algorithms that can effectively learn the mapping between such pairs of image patches and generalize the prediction beyond the given set of examples.

Neural networks were invented as early as the 1940s as a mathematical model of neural computation in the brain, but only the advances in computing technology of the past two decades have made it feasible to simulate large networks with millions of neurons.

A neural network consists of a number of interconnected layers that gradually transform a number of input variables into an output. By tuning the weights between the neurons, a large number of possible mappings can be computed.
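The layered transformation can be sketched in a few lines. This is a minimal illustration, not Flipboard's code: a two-layer network where each layer computes a weighted sum followed by a nonlinearity, and changing the weight matrices W1 and W2 changes which mapping is computed.

```python
import numpy as np

def relu(x):
    # Nonlinearity applied after each weighted sum
    return np.maximum(0.0, x)

rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.1, (16, 4))   # first layer: 4 inputs -> 16 hidden units
W2 = rng.normal(0, 0.1, (1, 16))   # second layer: 16 hidden units -> 1 output

def forward(x):
    hidden = relu(W1 @ x)          # layer 1: weighted sum, then nonlinearity
    return W2 @ hidden             # layer 2: weighted sum to the output

x = np.array([0.5, -1.0, 2.0, 0.0])
print(forward(x).shape)  # (1,)
```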

Convolutional neural networks additionally contain layers that act as freely configurable image filter banks, which can, for example, form edge detectors if the learning problem requires it.
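What such a filter bank computes can be shown with a hand-crafted example. The sketch below applies a classic Sobel kernel to an image containing a vertical edge; in a trained convolutional network, the kernel values would be learned from data rather than fixed by hand.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and
    record the filter response at each position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

# Image with a sharp vertical edge: left half dark, right half bright.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # classic vertical-edge filter

response = conv2d(image, sobel_x)
print(response.max())  # 4.0, the strongest response sits at the edge
```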

The article by Flipboard engineer Norman Tasfi reports on the application of this technology at Flipboard. It is used to scale up low-resolution images in order to obtain visually appealing images. These images are used, for example, as background or header images in the presentation of articles in their product.

Flipboard used neural networks consisting of eight layers, three of which were convolutional. The networks were trained in the Amazon cloud with GPU acceleration via NVIDIA's cuDNN library, using gradient descent. This standard procedure computes the prediction error on individual input-output examples and then uses the gradient of that error to update the network's weight parameters so as to lower the error.
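The gradient-descent update can be illustrated on the smallest possible model. This is a didactic sketch, not Flipboard's training code: fitting y = w * x by repeatedly moving the single weight w against the gradient of the squared prediction error.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x          # target mapping: the true weight is 2.0
w = 0.0              # initial weight
lr = 0.05            # learning rate

for _ in range(100):
    error = w * x - y                 # prediction error per example
    grad = 2.0 * (error * x).mean()   # gradient of the mean squared error
    w -= lr * grad                    # step against the gradient

print(round(w, 3))  # converges close to 2.0
```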

Update information was aggregated over so-called mini-batches of 250 examples before each weight update, in order to smooth out the noise in individual updates.
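Mini-batching can be added to the same toy setup. In this sketch (batch size 25 instead of Flipboard's 250, on hypothetical noisy linear data), the gradient is averaged over each batch before the weight is updated, so no single noisy example dominates an update.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = 3.0 * x + rng.normal(0, 0.1, size=1000)  # noisy data, true weight 3.0

w, lr, batch_size = 0.0, 0.1, 25

for start in range(0, len(x), batch_size):
    xb = x[start:start + batch_size]
    yb = y[start:start + batch_size]
    grad = 2.0 * ((w * xb - yb) * xb).mean()  # gradient averaged over batch
    w -= lr * grad                            # one update per mini-batch

print(round(w, 2))  # close to 3.0 despite the per-example noise
```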

Training one model took about 19 hours. All in all, about 500 different network configurations were tested, resulting in an overall training time of about four weeks.

The article does not give quantitative performance numbers, but it shows on a few example images that the resulting reconstructions are indeed of high quality. Compared with plain scaling algorithms such as bicubic interpolation, fine detail in particular is reconstructed better.

Norman also reports that the team was surprised to find aspects like the initialization of the weights to be key to obtaining good performance.

This example shows how deep neural networks are moving beyond purely academic use cases.
