BT

Google Details Allo Recommendation Graph Processing Algorithm

by Dylan Raithel on Nov 08, 2016 |

The Google Expander team announced a constant-runtime graph streaming algorithm that's now serving the basis for photo reply recommendations in the Allo application. Google describes using the proximity of nodes to each other to infer likely categories and properties of unlabelled nodes, in this case, input data of photos, text or other data comprising the heterogenous graph. The semi-supervised learning approach reduces the volume of training data required compared to supervised learning, which is noted as being cost prohibitive when graph algorithms need to process and learn from multi million or multibillion-node graphs.

All sizes and shapes, and composed of heterogeneous, multimodal data like graphs composed of text, image and video inputs, or various representations of these data "data representations" like image pixels and chat responses to photo reply in Allo; Data can be relational or structured data, as well as unstructured, sparse or dense representations extracted from raw data.

Google noted several attributes of the example graph, but also noted that the approach doesn’t scale to the multimillion, or sometimes multibillion node graphs. In the example for predicting "red" or "blue" for any node in the example graph, Google noted

Relationships between node data is represented via edges and thickness of each edge indicates strength of the connection… Edge strength is computed using similarity between embedding vectors — low similarity edges are ignored..."grey" represents unlabelled data whereas the colored nodes represent labeled data. Relationships between node data is represented via edges and thickness of each edge indicates strength of the connection. Note that the specific choice of graph structure and color depend on the task... this method does not scale to large graphs

A more applied or real-world use case Google provided was that of identifying humorous words from a priori of labeled words arranged in a likeness proximity graph.

The constant-runtime algorithm propagates from neighboring nodes in a distributed manner in order to work on very large graphs and calculates whether or not a given word is humorous, based on applied semi-supervised learning over the graph to discover emotion categories for remaining words. Google reported the complexity space and memory requirements of the system remain constant regardless of task difficulty, number of prediction labels, as well as the size of potential output space factor into the algorithm design decisions. There aren’t any code samples, data sets or their attributes available at this time.

"In practice, we use complex optimisation functions defined over the graph structure, which incorporate additional information and constraints for semi-supervised graph learning that can lead to hard, non-convex problems. The real challenge, however, is to scale this efficiently to graphs containing billions of nodes, trillions of edges and for complex tasks involving billions of different label types. "

Rate this Article

Relevance
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread
Community comments

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Discuss
General Feedback
Bugs
Advertising
Editorial
Marketing
InfoQ.com and all content copyright © 2006-2016 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT

We notice you're using an ad blocker

We understand why you use ad blockers. However to keep InfoQ free we need your support. InfoQ will not provide your data to third parties without individual opt-in consent. We only work with advertisers relevant to our readers. Please consider whitelisting us.