TensorFlow 3D: Deep Learning for Autonomous Cars’ 3D Perception

Google has released TensorFlow 3D, a library that adds 3D deep-learning capabilities to the TensorFlow machine-learning framework. The new library brings tools and resources that allow researchers to develop and deploy 3D scene understanding models.

TensorFlow 3D contains state-of-the-art models for 3D deep learning with GPU acceleration. These models have a wide range of applications, from 3D object detection (e.g. cars, pedestrians, etc.) to point cloud registration (e.g. 3D indoor mapping).

For instance, 3D object detection from point cloud data is a hard problem because the data is highly sparse. To make it easier to build efficient models, TensorFlow 3D provides sparse convolution and pooling operations. Sparse convolutions in 3D can be seen as a way to handle this sparsity: the 2D analogy is a convolutional network applied to an image with only a few non-zero pixels.
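To make the sparsity idea concrete, here is a minimal pure-Python sketch of a sparse 3D convolution that stores only occupied voxels and computes outputs only at those voxels (often called a submanifold sparse convolution). It illustrates the general technique, not TensorFlow 3D's implementation; the function name `sparse_conv3d` and the dictionary-based layout are assumptions made for this example.

```python
from itertools import product


def sparse_conv3d(active_voxels, kernel):
    """Sparse 3D convolution over a dict of occupied voxels (illustrative only).

    active_voxels: {(x, y, z): float}, only occupied voxels are stored.
    kernel: {(dx, dy, dz): float}, weights for a 3x3x3 neighbourhood.
    Outputs are computed only at occupied voxels, so empty space costs nothing.
    """
    output = {}
    for (x, y, z) in active_voxels:
        total = 0.0
        for (dx, dy, dz), weight in kernel.items():
            neighbour = (x + dx, y + dy, z + dz)
            # Neighbours that are not stored are empty space and add nothing.
            if neighbour in active_voxels:
                total += weight * active_voxels[neighbour]
        output[(x, y, z)] = total
    return output


# A 3x3x3 kernel with every weight set to 1.0 (sums the occupied neighbourhood).
kernel = {offset: 1.0 for offset in product((-1, 0, 1), repeat=3)}

# Three occupied voxels in an otherwise empty grid.
voxels = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (10, 10, 10): 5.0}
result = sparse_conv3d(voxels, kernel)
```

The work done scales with the number of occupied voxels rather than with the volume of the grid, which is the property that makes sparse operations attractive for LiDAR point clouds.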

Building on these operations, TensorFlow 3D provides a sparse U-Net model that serves as the backbone for 3D object detection as well as 3D semantic segmentation (classifying each point in space as belonging to an object category). An example of a TensorFlow 3D semantic segmentation network follows:

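import tensorflow as tf  # needed for tf.math.unsorted_segment_max below
from tf3d.layers import sparse_voxel_unet

# Per-task output heads: number of output channels for each task.
task_names_to_num_output_channels = {'semantics': 5, 'embedding': 64}
# Neither head applies a ReLU after its last convolution.
task_names_to_use_relu_last_conv = {'semantics': False, 'embedding': False}
# Neither head applies batch normalization in its last layer.
task_names_to_use_batch_norm_in_last_layer = {'semantics': False, 'embedding': False}

# Build the sparse U-Net: encoder, bottleneck and decoder channel sizes.
unet = sparse_voxel_unet.SparseConvUNet(
  task_names_to_num_output_channels,
  task_names_to_use_relu_last_conv,
  task_names_to_use_batch_norm_in_last_layer,
  encoder_dimensions=((32, 48), (64, 80)),
  bottleneck_dimensions=(96, 96),
  decoder_dimensions=((80, 80), (64, 64)),
  network_pooling_segment_func=tf.math.unsorted_segment_max)

# voxel_features, voxel_xyz_indices and num_valid_voxels are not defined in
# this snippet; they are assumed to come from the voxelized input point cloud.
outputs = unet(voxel_features, voxel_xyz_indices, num_valid_voxels)
semantics = outputs['semantics']
embedding = outputs['embedding']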
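The example above passes `tf.math.unsorted_segment_max` as the network's pooling function. As a rough illustration of what a segment-max reduction computes, the following pure-Python sketch groups scalar features by a segment (here, voxel) id and keeps the maximum in each group. The helper name `segment_max_sketch` and the sample data are hypothetical, and filling empty segments with negative infinity is a simplification chosen for this sketch; how TensorFlow 3D applies the function internally is not covered here.

```python
def segment_max_sketch(data, segment_ids, num_segments):
    """Return the maximum of `data` within each segment id (illustrative only).

    data: list of scalar feature values, one per point.
    segment_ids: list of ints, the segment (e.g. voxel) each point falls in.
    num_segments: total number of segments to produce.
    """
    # Simplification: segments that receive no points stay at -inf.
    result = [float("-inf")] * num_segments
    for value, segment in zip(data, segment_ids):
        if value > result[segment]:
            result[segment] = value
    return result


# Five points with one feature each, assigned to three voxels in no fixed order.
point_features = [0.2, 0.9, 0.4, 0.7, 0.1]
point_to_voxel = [0, 0, 2, 1, 2]
pooled = segment_max_sketch(point_features, point_to_voxel, num_segments=3)
```

Because the segment ids do not need to be sorted, points can be pooled into voxels in whatever order the sensor delivers them.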

Demand is growing for LiDAR and depth cameras, the most commonly used sensors for 3D perception in autonomous driving. In addition, research in 3D scene understanding, such as 3D object detection (e.g. cars, pedestrians, etc.), has considerably improved accuracy and inference speed. According to Google, TensorFlow 3D's sparse convolution implementation is around 20x faster on the Waymo Open Dataset than a well-designed implementation with pre-existing TensorFlow operations.

Finally, according to a recent paper published by Google AI researchers, LiDAR 3D object detection performance on the Waymo Open Dataset has improved significantly:

Experiments on the Waymo Open Dataset show that our algorithm outperforms the traditional frame by frame approach by 7.5% mAP* @0.7 and other multi-frame approaches by 1.2% while using less memory and computation per frame.

(Mean average precision (mAP) is a metric for evaluating object detection in deep learning.)
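As a rough sketch of how such a metric can be computed, the Python below shows an intersection-over-union (IoU) check for 3D boxes and a non-interpolated average precision (AP) over ranked detections; mAP is then the mean of AP across object classes. It assumes the "@0.7" in the quote denotes an IoU threshold of 0.7, and it uses axis-aligned boxes for simplicity, whereas real benchmarks typically use rotated boxes and their own evaluation code. The function names and sample values are hypothetical.

```python
def iou_3d(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


def average_precision(detections, num_ground_truth):
    """Non-interpolated AP for one class.

    detections: list of (confidence_score, is_true_positive) pairs, where a
    detection counts as a true positive if it matched a ground-truth box at
    the chosen IoU threshold.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
            # Each true positive raises recall by 1 / num_ground_truth;
            # weight that step by the precision at this rank.
            ap += (true_positives / rank) / num_ground_truth
    return ap


# Two boxes that overlap by a third: not a match at an IoU threshold of 0.7.
overlap = iou_3d((0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2))
is_match = overlap >= 0.7

# Three detections for one class with two ground-truth objects.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
```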

This achievement is hugely valuable for LiDAR perception as well as 3D mapping for autonomous driving.

TensorFlow 3D is just one of the 3D deep learning extensions on the market. Facebook launched PyTorch3D in 2020, which is more focused on 3D rendering and virtual reality. Another player is Kaolin from NVIDIA, a modular differentiable rendering library for applications like high-resolution simulation environments. From this overview, it seems that TensorFlow 3D is more focused on robotics perception and mapping, while the other options are more focused on 3D simulation and rendering. For 3D rendering, Google has TensorFlow Graphics.

Image Source: Waymo Open Dataset on GitHub
