The Machine Learning Behind Android Pie Smart Linkify API

Last week, Google announced Android 9, codenamed Pie. The release ships a set of new features powered by artificial intelligence, and one of the most important of them is Android Smart Linkify.

Smart Linkify builds upon Smart Text Selection, which was released with the previous version, Android Oreo. Smart Linkify can detect certain types of entities in text (e.g. addresses and phone numbers) and turn them into clickable links, allowing the user to open a map or place a phone call directly. It is powered by an on-device feed-forward neural network whose model takes up just 500kB per language, with inference code of no more than 250kB. The system is designed to run in near real time, computing its result in less than 20ms on a Google Pixel phone.

The system starts by splitting the input text into words on spaces and enumerating all possible word subsequences of up to 15 words. Each subsequence is fed into the neural network, which assigns it a score in the [0, 1] range reflecting how likely it is to be a valid entity. Overlapping candidates are then resolved in favor of the higher-scoring subsequences. At the end of this first stage, the system has a set of non-overlapping word subsequences, each of a still-unknown type.
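A minimal Python sketch of this candidate-selection stage follows. The scoring network is replaced by a caller-supplied `score` function (the `toy_score` below is purely hypothetical), and the 0.5 cut-off for discarding weak candidates is an assumption not stated in the announcement. Only the whitespace tokenization, the 15-word limit and the score-ordered removal of overlaps come from the description above.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of word indices

MAX_SPAN_WORDS = 15     # subsequence length limit from the article
SCORE_THRESHOLD = 0.5   # assumption: cut-off for dropping weak candidates


def candidate_spans(words: List[str], max_len: int = MAX_SPAN_WORDS) -> List[Span]:
    """Enumerate every contiguous subsequence of 1..max_len words."""
    spans: List[Span] = []
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            spans.append((start, end))
    return spans


def select_entities(text: str,
                    score: Callable[[List[str]], float],
                    threshold: float = SCORE_THRESHOLD) -> List[Span]:
    """Score all candidates, then greedily keep the best non-overlapping ones."""
    words = text.split()  # whitespace tokenization
    scored = [(score(words[s:e]), (s, e)) for s, e in candidate_spans(words)]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest score first
    chosen: List[Span] = []
    for value, (s, e) in scored:
        if value < threshold:
            break  # every remaining candidate scores lower still
        # Keep the span only if it does not overlap any span already accepted.
        if all(e <= cs or s >= ce for cs, ce in chosen):
            chosen.append((s, e))
    return sorted(chosen)


def toy_score(span_words: List[str]) -> float:
    """Hypothetical stand-in for the scoring network: rewards digit-bearing words."""
    digits = sum(any(c.isdigit() for c in w) for w in span_words)
    return digits / (len(span_words) + 1)


print(select_entities("call 555 0100 today", toy_score))  # [(1, 3)]
```

With the toy scorer, "555 0100" (words 1 and 2) wins, and every other candidate that overlaps it is discarded, which is the greedy behavior the article describes.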

A second neural network then identifies the type of each word subsequence: a phone number, an address, or no recognized entity. This network takes the word subsequence in its context as input, using three groups of words as separate features: the first three and last three words of the subsequence (the Entity), the five words preceding it (the Left context) and the five words following it (the Right context). An interesting optimisation within this network is a binary feature that marks words starting with a capital letter. The reasoning is that capitalization is quite distinctive in postal addresses, which makes them easier to identify this way.
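The feature layout described above can be sketched in Python as follows. The `<PAD>` token used when a span has fewer than six words or sits near the edge of the text, and the choice to compute the capitalization flag for every feature word, are assumptions for illustration; how the real network encodes these words is not covered by the announcement.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # hypothetical filler token for missing positions


def pad(tokens: List[str], size: int, left: bool = False) -> List[str]:
    """Pad a token list to `size` with PAD, on the left or the right."""
    filler = [PAD] * (size - len(tokens))
    return filler + tokens if left else tokens + filler


def extract_features(words: List[str], span: Tuple[int, int]) -> Dict[str, List]:
    start, end = span
    entity = words[start:end]
    features: Dict[str, List] = {
        "left": pad(words[max(0, start - 5):start], 5, left=True),  # 5 words before
        "entity_first": pad(entity[:3], 3),                          # first 3 entity words
        "entity_last": pad(entity[-3:], 3, left=True),               # last 3 entity words
        "right": pad(words[end:end + 5], 5),                         # 5 words after
    }
    ordered = (features["left"] + features["entity_first"]
               + features["entity_last"] + features["right"])
    # Binary flag per feature word: 1 if it starts with a capital letter.
    features["capitalized"] = [int(w[:1].isupper()) for w in ordered]
    return features


words = "Visit us at 1600 Amphitheatre Parkway Mountain View CA today".split()
f = extract_features(words, (3, 9))
print(f["entity_first"], f["entity_last"])
# ['1600', 'Amphitheatre', 'Parkway'] ['Mountain', 'View', 'CA']
print(f["left"])
# ['<PAD>', '<PAD>', 'Visit', 'us', 'at']
```

For entities shorter than six words, the first-three and last-three groups overlap (a one-word entity appears in both), so the feature vector always has the same fixed width regardless of span length.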

To train the neural network, the team at Google synthesized fake yet realistic examples from real data. They built the training set from a custom list of entities, such as addresses and phone numbers, taken from Schema.org annotations and combined with random words. Surrounding these observed entities with random words helped achieve the desired outcome. In addition, deliberately negative training examples were generated so that the network learns not to identify, for example, text following a phrase like "ID: " as a phone number.
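A rough Python sketch of this data-synthesis idea: place a known entity between randomly chosen filler words and record where it landed, and build negative examples in which a number follows "ID:" and is labeled as no entity. The filler vocabulary, the `"none"` label and the 0 to 5 word context lengths are hypothetical choices, not Google's actual pipeline.

```python
import random
from typing import List, Tuple

# (words, half-open span of the target phrase, label)
Example = Tuple[List[str], Tuple[int, int], str]

NO_ENTITY = "none"  # hypothetical label for negative examples


def random_context(filler: List[str], rng: random.Random, max_len: int = 5) -> List[str]:
    """Draw between 0 and max_len random filler words."""
    return rng.choices(filler, k=rng.randint(0, max_len))


def positive_example(entity: str, label: str, filler: List[str],
                     rng: random.Random) -> Example:
    """Surround a real entity with random words and remember where it sits."""
    left, right = random_context(filler, rng), random_context(filler, rng)
    entity_words = entity.split()
    words = left + entity_words + right
    return words, (len(left), len(left) + len(entity_words)), label


def negative_example(number: str, filler: List[str], rng: random.Random) -> Example:
    """A digit string after "ID:" that the model should NOT tag as a phone number."""
    left, right = random_context(filler, rng), random_context(filler, rng)
    words = left + ["ID:", number] + right
    start = len(left) + 1  # position of the number, right after "ID:"
    return words, (start, start + 1), NO_ENTITY


rng = random.Random(42)
filler = ["please", "contact", "the", "office", "for", "details", "tomorrow"]
print(positive_example("+1 650 555 0100", "phone", filler, rng))
print(negative_example("48213377", filler, rng))
```

Because the generator knows exactly where it inserted each entity, every synthetic example comes with a correct span and label for free, which is what makes this approach cheaper than hand-annotating real text.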

Internationalization is an important aspect of this feature. Based on testing, a single model works well for all Latin-script languages, while individual models were added for Chinese, Japanese, Korean, Thai, Arabic and Russian. At the moment the API supports 16 languages, with more coming in the next few months. The models were trained using TensorFlow, and the custom on-device inference library is powered by TensorFlow Lite and FlatBuffers. Developers can start using Smart Linkify through the generateLinks method of the TextClassifier API.
