
TensorFlow Lite Supports On-Device Conversational Modeling

by Srini Penchikala on Nov 29, 2017. Estimated reading time: 2 minutes

TensorFlow Lite, the lightweight version of the open-source deep learning framework TensorFlow, supports on-device conversational modeling, making it possible to plug conversational intelligence features into chat applications. The TensorFlow team recently announced the developer preview of TensorFlow Lite, which can be used on mobile and embedded devices.

The need to deploy machine learning models on mobile and embedded devices has grown over the last few years. Earlier this year, Google launched Android Wear 2.0, which brings the Google Assistant to your wrist. It featured the first on-device machine learning technology for smart messaging, enabling cloud-based features like Smart Reply, already available in Gmail, Inbox, and Allo, to be used directly within the application without having to connect to the cloud.

TensorFlow already runs on many platforms, from servers to IoT devices, and TensorFlow Lite now enables low-latency inference for on-device machine learning models. It is designed to be lightweight and cross-platform, with a runtime that initially targets Android and iOS. It achieves low latency through techniques such as kernels optimized for mobile apps, pre-fused activations, and quantized kernels that allow smaller and faster (fixed-point math) models.
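
To make the quantization idea concrete, the toy sketch below implements the affine (scale and zero-point) 8-bit scheme commonly used for fixed-point inference. It only illustrates the arithmetic; TensorFlow Lite's actual quantized kernels are more involved, and the helper names here are hypothetical.

    import numpy as np

    def quantize(x, scale, zero_point):
        # Map floats onto the 8-bit integer grid: q = round(x / scale) + zero_point.
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        # Recover an approximation of the original float values.
        return scale * (q.astype(np.float32) - zero_point)

    weights = np.array([-0.51, 0.0, 0.27, 1.3], dtype=np.float32)
    scale = (weights.max() - weights.min()) / 255.0  # width of one quantization step
    zero_point = int(round(-weights.min() / scale))  # integer representing 0.0

    q = quantize(weights, scale, zero_point)         # 4 bytes instead of 16
    print(q, dequantize(q, scale, zero_point))       # close to the original values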

It's important to note that TensorFlow already offers the TensorFlow Mobile API, which is used for mobile and embedded deployment of machine learning models. TensorFlow Lite is the evolution of TensorFlow Mobile, and as it matures it will become the recommended solution for deploying models on devices.

The architecture of TensorFlow Lite comprises the following components:

  • TensorFlow Model: A trained TensorFlow model saved on disk.
  • Converter: Converts a trained TensorFlow model to the TensorFlow Lite file format (see the conversion sketch after this list).
  • Model File: A model file format based on FlatBuffers, optimized for speed and size.
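
As a rough illustration of how these components fit together, the sketch below converts a saved model into the FlatBuffer-based .tflite format. Note that the tf.lite.TFLiteConverter entry point shown here comes from later stable releases; the developer preview invoked the converter (TOCO) differently, so treat this as a sketch of the workflow rather than preview-era API, and the SavedModel path is hypothetical.

    import tensorflow as tf

    # Load a trained TensorFlow model from disk (the path is hypothetical) and
    # convert it to the FlatBuffer-based TensorFlow Lite format.
    converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
    tflite_model = converter.convert()

    # The resulting model file is what ships inside the mobile app.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)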

TensorFlow Lite supports hardware acceleration through the Android Neural Networks API. It also ships with support for several models, such as MobileNet, Inception v3, and Smart Reply.
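
On the device, the converted file is executed by the TensorFlow Lite interpreter. The Python binding below (tf.lite.Interpreter, again from later releases; the Android equivalent is the Java Interpreter class) sketches the basic invoke loop, with the input simply zero-filled for illustration.

    import numpy as np
    import tensorflow as tf

    # Load the converted model and allocate its tensors.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed a zero-filled input matching the model's declared shape and dtype.
    dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()

    predictions = interpreter.get_tensor(output_details[0]["index"])
    print(predictions)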

On-Device Conversational Modeling

As part of the TensorFlow Lite library, the team has also released an on-device conversational model and a demo application showing an example natural language use case. Developers and researchers can reference this application to build new machine intelligence features powered by on-device inference. The model generates reply suggestions for incoming conversational chat messages, and its inference can easily be plugged into chat applications that require conversational intelligence features.

The conversational model uses a new ML architecture for training compact neural networks based on a joint optimization framework, as discussed in Sujith Ravi's research paper on on-device deep networks using neural projections. This architecture uses efficient “projection” operations that transform the input into a compact bit-vector representation. Similar inputs are projected to nearby vectors, which are dense or sparse depending on the type of projection. For example, messages like “hey, how's it going?” and “How's it going buddy?” might be projected to the same vector representation.
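
The projection idea can be pictured with random hyperplane (SimHash-style) hashing, which maps similar feature vectors to bit vectors that differ in few positions. The toy sketch below is only a stand-in for the model's actual, more sophisticated projection functions, and the message vectors are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)

    def project(features, planes):
        # One bit per random hyperplane: which side does the vector fall on?
        # Similar inputs land on the same side of most hyperplanes.
        return (features @ planes.T > 0).astype(np.uint8)

    dim, bits = 64, 16
    planes = rng.standard_normal((bits, dim))  # fixed random projection matrix

    msg_a = rng.standard_normal(dim)                # stand-in for one message
    msg_b = msg_a + 0.1 * rng.standard_normal(dim)  # a slightly different phrasing

    bv_a, bv_b = project(msg_a, planes), project(msg_b, planes)
    print((bv_a != bv_b).sum(), "of", bits, "bits differ")  # typically few or none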

This on-device model is trained end-to-end using a machine learning framework that jointly trains two models: a compact projection model and a full trainer model. Once trained, the projection model can be used directly for inference on the device.
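
The joint optimization can be pictured as a combined objective: both networks are trained to fit the labels, while an extra distillation-like term pulls the compact projection model's predictions toward the trainer's. The sketch below is a schematic of that idea under assumed loss terms and weights, not the paper's exact formulation.

    import numpy as np

    def cross_entropy(p, y):
        # Mean negative log-likelihood of the true class indices y.
        return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-9))

    def kl_divergence(p, q):
        # How far the projection model's predictions q are from the trainer's p.
        return np.mean(np.sum(p * np.log((p + 1e-9) / (q + 1e-9)), axis=1))

    def joint_loss(p_trainer, p_proj, y, lam=(1.0, 1.0, 1.0)):
        # Schematic joint objective: both models fit the labels, and the compact
        # projection model is also pulled toward the trainer's predictions.
        return (lam[0] * cross_entropy(p_trainer, y)
                + lam[1] * cross_entropy(p_proj, y)
                + lam[2] * kl_divergence(p_trainer, p_proj))

    # Toy example: 3 samples, 2 classes; the probabilities are made up.
    p_trainer = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
    p_proj    = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
    labels    = np.array([0, 1, 0])
    print(joint_loss(p_trainer, p_proj, labels))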

In future releases, TensorFlow Lite will support more models and built-in operators, along with performance improvements for both fixed-point and floating-point models.

TensorFlow Lite developer preview documentation, code samples and demo applications are available on GitHub. You can also find a list of sample messages used by this conversational model.

 
