Google has released new APIs and tools for its Coral AI toolkit. The release brings the C++ and Python SDKs to feature parity and improves memory efficiency. Other updates include additional pre-trained models and the general availability of model pipelining.
Coral product manager Carlos Mendonça provided an overview of the release on the TensorFlow Blog. The new APIs expose the TensorFlow Lite interpreter object and also provide convenience methods for common machine-learning (ML) inferencing tasks. An additional family of pre-trained object-detection models, MobileDets, has been added to the model "garden". The APIs also include updates for on-device fine-tuning of pre-trained models. Finally, model pipelining, the ability to partition large models across multiple devices for distributed inference, has graduated from beta to general availability. According to Mendonça, the team's goal in this release was to:
[R]efactor our APIs and make them more modular, reusable and performant, while at the same time eliminating unnecessary API abstractions and surfacing more of the native TensorFlow Lite APIs that developers are familiar with.
Because training ML models often requires considerable compute resources and specialized hardware such as graphics processing units (GPUs) or tensor processing units (TPUs), developers often turn to the cloud for training. However, in many Internet of Things (IoT) applications the cloud is not a good choice for inferencing: using a trained model at run time, for example to detect objects in an image. Instead, inferencing on a local or edge device may be required because of privacy concerns or network latency, or in scenarios where networking is unreliable or unavailable.
Because these edge devices often have constrained compute and power resources, resulting in long inference runtimes, manufacturers have begun shipping low-power inference accelerators. Google's Coral platform is one such offering, built around a custom ASIC, the Edge TPU. The Coral platform also includes a software SDK based on TensorFlow Lite. Developers can train models in the cloud with TensorFlow, export them to TensorFlow Lite, compile them with the Edge TPU Compiler, then deploy them on edge devices with a Coral accelerator. Coral also supports fine-tuning models on-device, provided the model is a neural network whose final layer is designed for retraining.
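The export and quantization step in that workflow uses standard TensorFlow tooling rather than anything Coral-specific. As a rough sketch, assuming a trained Keras model and a handful of representative input images (both placeholders here), post-training full-integer quantization produces the 8-bit TensorFlow Lite file that the Edge TPU Compiler expects:

```python
import tensorflow as tf

# Placeholders: a trained Keras model and an iterable of sample inputs
# shaped like the model's input (e.g. 224x224x3 images).
model = build_and_train_model()                # hypothetical helper
representative_images = load_sample_images()   # hypothetical helper

def representative_dataset():
    # The converter runs a few samples through the model to calibrate
    # the 8-bit quantization ranges.
    for image in representative_images[:100]:
        yield [tf.cast(image[tf.newaxis, ...], tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Require full 8-bit integer quantization, since the Edge TPU only
# executes integer operations.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())

# The quantized file is then compiled for the accelerator, e.g.:
#   edgetpu_compiler model_quant.tflite
```

The Edge TPU Compiler maps the operations it supports onto the accelerator and leaves any remaining operations to run on the host CPU.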
The latest Coral software update refactors the previous SDKs into two new libraries: libcoral, a C++ library, and PyCoral, a Python package. Both libraries wrap the TensorFlow Lite APIs and provide convenience methods, such as a single call to set up the TensorFlow Lite interpreter and helpers for handling model output. Both SDKs offer improved support for multi-accelerator systems, including device labels and model pipelining. Model pipelining improves runtime performance by partitioning large models across multiple accelerators; the new release includes a profiling-based partitioning algorithm that monitors latency at runtime to optimize the partitions.
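As an illustration of the kind of convenience methods PyCoral provides, the following sketch, modeled on the published PyCoral examples, runs image classification on an Edge TPU; the model, image, and label file names are placeholders:

```python
from PIL import Image

from pycoral.adapters import classify, common
from pycoral.utils.dataset import read_label_file
from pycoral.utils.edgetpu import make_interpreter

# A single call creates a TensorFlow Lite interpreter bound to an Edge TPU.
interpreter = make_interpreter('mobilenet_v2_quant_edgetpu.tflite')
interpreter.allocate_tensors()

# Resize the image to the model's input size and copy it into the input tensor.
image = Image.open('parrot.jpg').convert('RGB').resize(common.input_size(interpreter))
common.set_input(interpreter, image)

interpreter.invoke()

# Helper for decoding the raw output tensor into (class id, score) pairs.
labels = read_label_file('imagenet_labels.txt')
for klass in classify.get_classes(interpreter, top_k=3):
    print(labels.get(klass.id, klass.id), f'{klass.score:.4f}')
```

On systems with more than one accelerator, make_interpreter also accepts a device specifier, which is how an application selects a particular Edge TPU.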
One drawback of the Coral platform is that it supports only TensorFlow Lite, and in particular only models that have been 8-bit quantized. Other hardware manufacturers offer comparable accelerator solutions that support models from multiple frameworks. For example, LG's AIoT boards include the LG Neural Engine, which supports both TensorFlow Lite and Caffe, while NVIDIA's Jetson boards use a containerized approach to support multiple frameworks.
The source code for the new libcoral and PyCoral libraries is available on GitHub.