Building Conversational Apps Using Actions on Google and API.AI

At the Amazon re:Invent conference, Amazon announced Lex, a deep learning service based upon the technology that powers Alexa in Amazon’s portable Bluetooth- and Wi-Fi-enabled Echo speaker. Shortly after Amazon’s announcement, Google introduced Actions on Google, which allows developers to build Google Assistant-based conversational apps, including integration with the Google Home device.


Wayne Piekarski, a senior developer advocate at Google, describes the Actions on Google platform as a way for:

You as a developer to integrate your services with the Google Assistant.

This integration is achieved through:

Conversation Actions, which enable you to fulfill a user's request through a two-way dialog. When users request an action, the Google Assistant processes this request, determines the best action to invoke, and invokes your Conversation Action if relevant. From there, your action manages the rest, including how users are greeted, how to fulfill the user's request, and how the conversation ends.

An example that Google has used to illustrate Actions on Google is a Personal Chef application that lets end users interact with a recipe-finding service through a Google Home device. An end user specifies what mood they are in and what ingredients they have; the conversational app then interprets their mood and the available ingredients to suggest a recipe that aligns with their appetite.

Historically, it has been challenging to write these types of applications because it is difficult to extract meaning from a user's request. Mehdi Samadi, co-founder and CTO at Solvvy, explains:

In terms of an AI technology, even converting a command/instruction such as “show me cheap Indian restaurants near me” to a set of executable commands is not an easy task. It requires understanding that the user wants to see restaurants with “Indian” cuisine, and also being personalized in terms of interpreting what users mean by “cheap”.

Conversation Actions have been developed by Google to aid developers in building conversational apps that address these context-awareness challenges.


Conversation Actions are made up of three main components:

  • Invocation triggers define how users invoke and discover your actions. Once triggered, your action carries out a conversation with users, which is defined by dialogs.
  • Dialogs define how users converse with your actions and act as the user interface for your actions. They rely on fulfillment code to move the conversation forward.
  • Fulfillment is the code that processes user input and returns responses, and is exposed as a REST endpoint. Fulfillment also typically contains the logic that carries out the actual action, such as retrieving recipes or news to read aloud.
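To make the fulfillment component concrete, the following is a minimal sketch of what such logic might look like for the Personal Chef example. The function names, action names and response shape here are hypothetical illustrations, not part of the Actions on Google API; real fulfillment would sit behind a REST endpoint and call an actual recipe service.

```javascript
// Hypothetical fulfillment logic: route an invoked action to the code
// that carries it out, and return a response for the Assistant to speak.
function fulfill(action, params) {
  if (action === 'find_recipe') {
    const recipe = lookupRecipe(params.mood, params.ingredients);
    return { speech: `How about ${recipe}?` };
  }
  return { speech: "Sorry, I can't help with that yet." };
}

// Stand-in for a real recipe-service call.
function lookupRecipe(mood, ingredients) {
  if (mood === 'adventurous' && ingredients.includes('tofu')) {
    return 'spicy mapo tofu';
  }
  return 'a simple stir fry';
}

console.log(fulfill('find_recipe', { mood: 'adventurous', ingredients: ['tofu'] }).speech);
// prints "How about spicy mapo tofu?"
```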

There are three ways to invoke conversation actions:

  • The Conversation API, which defines the request and response formats that must be used to communicate with the Google Assistant.
  • The Actions SDK, which includes a Node.js client library, the Action Package definition, a CLI and a web simulator.
  • Other tools including API.AI.

API.AI, a recent Google acquisition, allows developers to build conversational interfaces. In September 2016, Scott Huffman, VP of engineering at Google, claimed:

Over 60,000 developers are using API.AI to build conversational experiences, for environments such as Slack, Facebook Messenger and Kik.

Within the Actions on Google platform, developers can plug API.AI into their conversational interfaces to reduce the amount of text transcription work that is typically required with the Conversation API. Piekarski highlights some of the benefits of using API.AI with the Actions on Google platform:

API.AI provides an intuitive graphical user interface to create conversational interfaces, and it does the heavy lifting in terms of managing conversational state and filling out slots and forms.


To handle a conversation, developers can use the Developer console to define Intents. In the context of Google’s Personal Chef recipe example, this includes defining information that you need from the user, such as ingredients, temperature, type of food and cooking time.

Next, developers provide example sentences. API.AI uses these example sentences to train its machine learning algorithms, allowing it to process other possible sentences from users. Developers do not need to write regular expressions for API.AI to parse additional sentences.
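The kind of training data involved can be sketched as example sentences paired with the parameter annotations a developer would mark in the console. The sentences and parameter names below are hypothetical illustrations for the Personal Chef scenario, not API.AI's actual export format.

```javascript
// Hypothetical example sentences for a "find_recipe" intent, each with
// the parameter values a developer might annotate in the API.AI console.
const trainingExamples = [
  { text: 'I feel adventurous and I have tofu',
    params: { mood: 'adventurous', protein: 'tofu' } },
  { text: 'Something cozy with chicken, under 30 minutes',
    params: { mood: 'cozy', protein: 'chicken', cookingTime: '30 minutes' } },
  { text: 'What can I cook with lamb tonight?',
    params: { protein: 'lamb' } },
];
```

From a handful of examples like these, the service generalizes to phrasings that were never listed, which is why hand-written regular expressions are unnecessary.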


Developers can manually set acceptable values for each piece of information. Once provided, API.AI uses this information to extract meaning from spoken sentences. For example, developers can define a list of entities that map to a protein: if the Personal Chef app expects a protein as part of a recipe, developers can include a list of synonyms like beef, lamb, tofu, chicken, etc.
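Conceptually, this synonym-to-entity mapping works like a lookup from spoken words to canonical values. The sketch below illustrates the idea with a hypothetical entity list; API.AI performs this resolution internally, so this is a model of the behavior rather than its implementation.

```javascript
// Hypothetical "protein" entity: canonical values mapped to synonyms.
const proteinEntity = {
  beef: ['beef', 'steak', 'ground beef'],
  chicken: ['chicken', 'poultry'],
  tofu: ['tofu', 'bean curd'],
  lamb: ['lamb', 'mutton'],
};

// Resolve a spoken phrase to its canonical entity value, or null
// when the phrase matches no known synonym.
function resolveProtein(phrase) {
  const spoken = phrase.toLowerCase();
  for (const [canonical, synonyms] of Object.entries(proteinEntity)) {
    if (synonyms.includes(spoken)) return canonical;
  }
  return null;
}

console.log(resolveProtein('bean curd')); // prints "tofu"
```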

When communicating with the application, users can provide information naturally, in pieces and out of order. If the application receives incomplete information, or is unclear about it, the action can ask follow-up questions.
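This slot-filling behavior can be sketched as checking which required parameters are still missing and prompting for the first one. The slot names and prompts below are hypothetical; API.AI manages this state itself when slots are marked as required.

```javascript
// Hypothetical required slots for the Personal Chef intent, with the
// follow-up prompt to ask when each one is still missing.
const requiredSlots = ['mood', 'protein', 'cookingTime'];
const prompts = {
  mood: 'What kind of mood are you in?',
  protein: 'What protein do you have on hand?',
  cookingTime: 'How much time do you have to cook?',
};

// Return the next follow-up question, or null when every required
// slot has been collected and the action can proceed to fulfillment.
function nextPrompt(collected) {
  const missing = requiredSlots.find((slot) => !(slot in collected));
  return missing ? prompts[missing] : null;
}

console.log(nextPrompt({ mood: 'cozy' })); // prints "What protein do you have on hand?"
```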

Developers can also connect Intents to backend webhooks, which allows for extensibility by connecting to third-party platforms like IFTTT, Zapier or Azure Logic Apps. When a webhook is called, all appropriate data is passed as JSON strings.
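A webhook handler body might look roughly like the following. The request and response field names are modeled on API.AI's webhook JSON of the time (an action name and parameters in the request, a speech/display response back), but they should be treated as illustrative rather than a definitive schema; a real handler would also sit behind an HTTP endpoint.

```javascript
// Sketch of a webhook handler: read the resolved action and parameters
// from the incoming JSON and build the JSON body returned to API.AI.
// Field names are illustrative, not a definitive API.AI schema.
function handleWebhook(requestBody) {
  const action = requestBody.result.action;
  const params = requestBody.result.parameters;

  let speech;
  if (action === 'find_recipe') {
    speech = `Looking for a ${params.protein} recipe...`;
  } else {
    speech = 'I did not understand that request.';
  }

  return { speech: speech, displayText: speech, source: 'personal-chef-webhook' };
}

const response = handleWebhook({
  result: { action: 'find_recipe', parameters: { protein: 'chicken' } },
});
console.log(response.speech); // prints "Looking for a chicken recipe..."
```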

Once developers have configured their Intents and Entities within the developer console, they can test their action in the API.AI web simulator and preview it on a personal Google Home before making it available to all Google Home users.
