Waymo Shares Autonomous Vehicle Dataset for Machine Learning

Waymo, the self-driving technology company owned by Google's parent company, Alphabet, released a dataset containing sensor data collected by their autonomous vehicles during more than five hours of driving. The set contains high-resolution data from lidar and camera sensors collected in several urban and suburban environments in a wide variety of driving conditions, and includes labels for vehicles, pedestrians, cyclists, and signage.

The Waymo team announced the release of the Waymo Open Dataset in a blog post, describing it as "one of the largest, richest, and most diverse self-driving datasets ever released for research." The data was collected by Waymo's vehicles operating in the USA in Phoenix, AZ, Kirkland, WA, Mountain View, CA and San Francisco, CA, at various times of day and night, and in good and bad weather. The dataset consists of 1,000 segments of 20 seconds each, collected at 10Hz (i.e., 200,000 frames) which contain:

  • Synchronized data from five lidars and five front-and-side-facing cameras
  • Sensor calibrations and poses
  • Object labels (vehicles, pedestrians, cyclists, and signage) with 3D bounding boxes for all lidar frames
  • Object labels with 2D bounding boxes for camera data in 100 segments
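The frame count and the "more than five hours" figure both follow from the segment count, segment length, and sampling rate quoted above. A minimal sketch of that arithmetic (the function and constant names are illustrative, not part of Waymo's tooling):

```python
# Figures quoted in the article; the names are illustrative only.
SEGMENTS = 1_000
SECONDS_PER_SEGMENT = 20
FRAME_RATE_HZ = 10


def dataset_size(segments, seconds_per_segment, frame_rate_hz):
    """Return (total frames, total hours of driving) for a segmented dataset."""
    total_seconds = segments * seconds_per_segment
    total_frames = total_seconds * frame_rate_hz
    return total_frames, total_seconds / 3600


frames, hours = dataset_size(SEGMENTS, SECONDS_PER_SEGMENT, FRAME_RATE_HZ)
print(frames, round(hours, 2))  # 200000 5.56
```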

Waymo also released a Google Colab notebook containing tutorials and a GitHub repository containing TensorFlow helper code for building models. This large labelled dataset could be used for supervised machine learning of models that detect obstacles and traffic signs, a key ability for any self-driving vehicle. While lidar can produce a point-cloud map locating objects in 3D space, it cannot detect colors, and thus is completely blind to the letters on road signs, for example. The 2D camera images lack distance information, although images from multiple cameras can be processed to recreate depth. And while Elon Musk maintains that lidar is "unnecessary," combining lidar's 3D data with the 2D camera data can simplify the process of finding the distance to obstacles detected in images.
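One common way to combine the two sensors is to project lidar points into the camera image and read off the depth of the points that land inside a detected object's 2D bounding box. The sketch below illustrates the idea with a simple pinhole camera model. It is not Waymo's helper code: the function names and the intrinsics (fx, fy, cx, cy) are hypothetical, and the points are assumed to be already expressed in the camera frame (x right, y down, z forward), which in practice would require the sensor calibrations and poses shipped with the dataset.

```python
from statistics import median


def project_point(point, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to (u, v, depth) with a pinhole model.

    Returns None for points at or behind the image plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy, z)


def distance_to_box(points, box, fx, fy, cx, cy):
    """Median depth of the lidar points that project inside a 2D bounding box.

    `box` is (u_min, v_min, u_max, v_max) in pixels. Returns None if no
    point falls inside the box.
    """
    u_min, v_min, u_max, v_max = box
    depths = []
    for point in points:
        projected = project_point(point, fx, fy, cx, cy)
        if projected is None:
            continue
        u, v, depth = projected
        if u_min <= u <= u_max and v_min <= v <= v_max:
            depths.append(depth)
    return median(depths) if depths else None


# Hypothetical intrinsics and points, for illustration only.
FX = FY = 1000.0
CX, CY = 960.0, 640.0
lidar_points = [
    (0.0, 0.0, 10.0),   # projects to the image center
    (0.5, 0.0, 12.0),   # slightly right of center
    (0.2, 0.1, 11.0),   # slightly right of and below center
    (8.0, 0.0, 10.0),   # far to the right, outside the box
    (0.0, 0.0, -5.0),   # behind the camera, ignored
]
detected_box = (900.0, 600.0, 1100.0, 700.0)

print(distance_to_box(lidar_points, detected_box, FX, FY, CX, CY))  # 11.0
```

Taking the median rather than the mean keeps the estimate robust when a few background points fall inside the box.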

[Figure: Lidar point cloud]

Lyft announced a similar dataset last month, Lyft Level 5 (named for the SAE’s highest level of vehicle autonomy). Lyft's dataset contains 55,000 frames, roughly a quarter as many as Waymo's 200,000; each of Lyft's frames contains data from more cameras (seven) and fewer lidars (three) than Waymo's. Both companies hope that their data will be used by the research community to improve algorithms and models. Lyft specifically emphasized academic research in its release, and plans to sponsor a machine-learning competition using its dataset.

Not surprisingly, both datasets are licensed for non-commercial use only. Lyft's is released under the Creative Commons Attribution-NonCommercial-ShareAlike license. Waymo's license is quite restrictive, and even precludes use "in operation of a vehicle or to assist in the operation of a vehicle." A user on Twitter pointed out that though Waymo describes the dataset as "open", the license agreement "doesn't meet the well-understood definition of open."

In one sense autonomous cars are already a reality: Waymo's self-driving taxis have been operating in Phoenix for over two years, and research indicates that in the future robot cars could save lives. It is not clear, however, that they are currently "ready for prime-time." Waymo's taxis always have a human behind the wheel as a safety backup, and the self-driving software sometimes gives passengers a harrowing experience. Technology news site The Information examined the passenger ratings and feedback for over 10,000 Waymo trips in July and August. Although 70% of the trips received a perfect rating, an improvement from the first quarter of the year, some riders complained that the experience was "uncomfortable and downright alarming." Other riders complained that the cars chose circuitous routes that made them late.

AI researcher Rodney Brooks, co-founder of Roomba maker iRobot, says he doesn't expect a real robot taxi service before 2032:

The true test of the viability of driverless cars will be when they are not just in testing or in demonstration, but when the owners of driverless taxis or ride sharing services or parking garages for end consumer self driving cars are actually making money at it.
