Facebook Releases Open Source "Detectron" Deep-Learning Library for Object Detection

Last January, Facebook AI Research (FAIR), the research arm of Facebook, released the open source Detectron object detection library. Several weeks later, Google followed with an update to its TensorFlow Object Detection API. Both libraries implement state-of-the-art deep-learning algorithms for object detection.

Detectron is a Python library released under the Apache 2.0 license and built on Caffe2, a deep-learning framework backed by Facebook. The library is available on GitHub and includes scripts, pre-trained models, and a Docker image to simplify installation. Google's TensorFlow Object Detection API was first released in June 2017 and is part of a much larger TensorFlow research repository of nearly 40 different deep-learning projects.

The pre-trained models included in both libraries have been trained on COCO, a large-scale object detection, segmentation, and captioning dataset with 80 object categories, over 200K labeled images, and 1.5 million object instances. Both Facebook's Detectron and Google's TensorFlow Object Detection API are primarily intended for research and are not yet production-ready.
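
To make the dataset description concrete, the sketch below reads a tiny annotation file laid out like COCO's JSON format, in which each annotation links an image to a category and a bounding box given as [x, y, width, height] measured from the top-left corner. The sample data is made up, real COCO annotation records carry additional fields (such as segmentation, area, and iscrowd), and the helper names are hypothetical; only Python's standard library is used.

```python
import json
from collections import Counter

# Hypothetical, hand-made sample in a simplified COCO-style layout.
# Real COCO files contain more fields per record; values here are illustrative.
SAMPLE = """
{
  "images": [
    {"id": 1, "file_name": "street.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "park.jpg", "width": 640, "height": 427}
  ],
  "categories": [
    {"id": 1, "name": "person"},
    {"id": 3, "name": "car"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100.0, 120.0, 50.0, 150.0]},
    {"id": 11, "image_id": 1, "category_id": 3, "bbox": [300.0, 200.0, 200.0, 100.0]},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [20.0, 30.0, 40.0, 120.0]}
  ]
}
"""


def instances_per_category(coco):
    """Count labeled object instances per category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return dict(Counter(names[a["category_id"]] for a in coco["annotations"]))


def bbox_to_corners(bbox):
    """Convert a COCO-style [x, y, width, height] box to (x1, y1, x2, y2) corners."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


if __name__ == "__main__":
    coco = json.loads(SAMPLE)
    print(len(coco["images"]), "images")                      # 2 images
    print(instances_per_category(coco))                       # {'person': 2, 'car': 1}
    print(bbox_to_corners(coco["annotations"][1]["bbox"]))    # (300.0, 200.0, 500.0, 300.0)
```

One image can hold several annotated instances, which is why COCO counts far more object instances than labeled images.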

Object detection remains a challenging branch of computer vision, but it has applications in many areas, from simple face detection in digital cameras to image retrieval and video surveillance. Self-driving cars rely on real-time pedestrian detection, while automatically counting people or cars is valuable in urban planning.

The problem comes down to identifying an unknown number of objects of unknown types, which may vary in size and be spread out across an image. The need to deliver both high speed and high accuracy adds to the inherent difficulty of the task.

In machine-learning terms, object detection in still images requires solving two problems at the same time: deciding whether a particular region of the image contains an object, and determining which object it is. Current object detection models are built on convolutional neural networks (CNNs), a specific neural network architecture. CNNs slide small rectangular filters across the input image to extract features.
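
The sliding-window feature extraction performed by a CNN layer can be sketched in plain Python as a single 2D convolution. This is a minimal illustration assuming one grayscale channel, no padding, and a hand-picked kernel; a real CNN stacks many such layers over multiple channels and learns its kernel weights during training. The `conv2d` function is a hypothetical helper, not part of any library.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map.

    As in most deep-learning frameworks, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for top in range(0, ih - kh + 1, stride):
        row = []
        for left in range(0, iw - kw + 1, stride):
            # Weighted sum of the window currently under the kernel.
            s = 0
            for i in range(kh):
                for j in range(kw):
                    s += image[top + i][left + j] * kernel[i][j]
            row.append(s)
        out.append(row)
    return out


# A 4x4 grayscale "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked vertical-edge detector; a trained CNN would learn such weights.
kernel = [
    [-1, 1],
    [-1, 1],
]

if __name__ == "__main__":
    print(conv2d(image, kernel))  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map responds only where the window straddles the edge; deeper layers combine such local responses into features that help decide whether a region contains an object and which one.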

There are two main families of object detection algorithms. R-CNN-based algorithms handle objects of all sizes by examining many candidate windows, also of varying sizes. YOLO-type (You Only Look Once) algorithms instead apply a single grid over the image in one pass and use a different feature extraction and decision architecture. While earlier algorithms drew a bounding box around each detected object, more recent models such as Mask R-CNN also draw a tight boundary along the edges of each object (RetinaNet, another recent model, still outputs bounding boxes). This important innovation, called instance segmentation, comes from classifying each pixel as belonging or not belonging to the detected object.
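
The contrast between the two families, and between boxes and pixel masks, can be illustrated with three toy helpers. This is a simplified sketch under stated assumptions, not how Detectron or YOLO are actually implemented: R-CNN variants classify a reduced set of region proposals rather than an exhaustive list of windows, and YOLO predicts box coordinates and class scores for every grid cell. The names `multiscale_windows`, `grid_cell`, and `mask_to_box` are hypothetical.

```python
# Toy illustrations only: hypothetical helpers, not Detectron or YOLO code.


def multiscale_windows(img_w, img_h, sizes, stride):
    """R-CNN-style idea: enumerate square candidate windows of several sizes.

    Returns (x1, y1, x2, y2) tuples; a detector would then classify each one.
    """
    windows = []
    for size in sizes:
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                windows.append((x, y, x + size, y + size))
    return windows


def grid_cell(cx, cy, img_w, img_h, s):
    """YOLO-style idea: a single S x S grid over the whole image.

    The cell containing the object's center (cx, cy) is responsible for it.
    Returns (row, col); centers on the right/bottom border map to the last cell.
    """
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def mask_to_box(mask):
    """Instance segmentation labels each pixel (1 = object, 0 = background).

    Returns the tightest box (x1, y1, x2, y2) around the mask, x2/y2 exclusive.
    """
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


if __name__ == "__main__":
    # 16 windows of size 16 plus 9 windows of size 32 on a 64x64 image.
    print(len(multiscale_windows(64, 64, sizes=[16, 32], stride=16)))  # 25
    # An object centered at (50, 10) falls in row 0, column 3 of a 4x4 grid.
    print(grid_cell(50, 10, 64, 64, s=4))  # (0, 3)

    # A plus-shaped object made of 5 pixels.
    mask = [
        [0, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 1, 0, 0],
    ]
    x1, y1, x2, y2 = mask_to_box(mask)
    print((x1, y1, x2, y2))                              # (0, 0, 3, 3)
    print(sum(map(sum, mask)), (x2 - x1) * (y2 - y1))    # 5 9
```

On the toy mask, the object covers 5 pixels while its tightest bounding box covers 9, which is why a per-pixel mask describes an object's outline more precisely than a box.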

Reviews suggest that the TensorFlow Object Detection API is easier to use for training custom models. Its GitHub repository includes several Jupyter notebooks covering installation, model training, and transfer learning. More tutorials for Google's object detection library are also currently available online.
