Amazon Introduces Two New Features for Polly: Neural Text-to-Speech and Newscaster Style

Recently, Amazon announced the general availability of Neural Text-to-Speech (NTTS) technology in Polly, its AWS service that turns text into lifelike speech. Amazon Polly now also offers a Newscaster speaking style.

Amazon first introduced Amazon Polly at AWS re:Invent 2016, allowing customers to build applications that could talk and to develop entirely new categories of speech-enabled products. Customers only needed to call an API and did not need any machine learning knowledge. Since then, the team responsible for Polly has kept evolving the service by adding new voices, which currently total 59 voices across 29 languages.

Now the team has added two new major features:

  • NTTS, which delivers enhanced speech quality through a new machine learning approach. The gains in naturalness and expressiveness bring the output closer to human voices. According to a blog post on the feature, NTTS is now available for 11 voices, both in real-time and in batch mode:
    • All three UK English voices: Amy, Emma and Brian.
    • All eight US English voices: Ivy, Joanna, Kendra, Kimberly, Salli, Joey, Justin and Matthew.
  • Newscaster style, which makes narration of content such as blog posts and news articles sound more like what people expect to hear on TV or radio. According to the same blog post, the style is currently available in two US English voices (Joanna and Matthew), both in real-time and in batch mode.
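
As a rough illustration of what "calling an API" could look like for these two features, the Python sketch below assembles request parameters for a neural voice, optionally wrapped in Polly's SSML news domain tag. The helper name `build_polly_request` and the two voice sets (copied from the lists above) are assumptions made for this example and are not taken from the AWS blog post. The code only builds the parameters and does not contact AWS.

```python
from xml.sax.saxutils import escape

# Voice lists as reported in this article at launch (may have changed since).
NTTS_VOICES = {
    "Amy", "Emma", "Brian",                       # UK English
    "Ivy", "Joanna", "Kendra", "Kimberly",        # US English
    "Salli", "Joey", "Justin", "Matthew",
}
NEWSCASTER_VOICES = {"Joanna", "Matthew"}


def build_polly_request(text, voice, newscaster=False):
    """Hypothetical helper: build keyword arguments for a Polly speech request."""
    if voice not in NTTS_VOICES:
        raise ValueError(f"{voice} is not listed as an NTTS voice")
    if newscaster and voice not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice} is not listed as a Newscaster voice")

    params = {"Engine": "neural", "VoiceId": voice, "OutputFormat": "mp3"}
    if newscaster:
        # Newscaster style is requested through SSML, so the text is escaped
        # and wrapped in the news domain tag.
        params["TextType"] = "ssml"
        params["Text"] = (
            '<speak><amazon:domain name="news">'
            f"{escape(text)}"
            "</amazon:domain></speak>"
        )
    else:
        params["TextType"] = "text"
        params["Text"] = text
    return params


if __name__ == "__main__":
    print(build_polly_request("Markets closed higher today.", "Matthew", newscaster=True))
```

With AWS credentials configured, these parameters could be passed to the boto3 Polly client for real-time synthesis, for example `boto3.client("polly", region_name="us-east-1").synthesize_speech(**params)`. Batch mode uses a separate asynchronous call that also needs an S3 output location.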

Note that users can quickly test both new features through the AWS Console.

AWS evangelist Julien Simon stated in an AWS News blog post about the new features:

Speech quality is certainly important, but more can be done to make a synthetic voice sound even more realistic and engaging. What about style? For sure, human ears can tell the difference between a newscast, a sportscast, a university class and so on; indeed, most humans adopt the right style of speech for the right context, and this certainly helps in getting their message across.

On a Reddit thread about the new features, a respondent stated:

It's still not going to pass a blind test between a human and TTS, but I agree this sounds significantly better than before.

Several Amazon customers, such as Gannett (whose USA Today is the most widely read US newspaper) and The Globe and Mail, one of Canada’s top newspapers, already use the Amazon Polly Newscaster feature. Scott Stein, Gannett’s vice president of Content Ventures, said in an AWS Machine Learning blog post about the Newscaster feature:

With more than 100 newsrooms across the country, it’s important for Gannett | USA TODAY NETWORK to produce audio content efficiently. Services like Amazon Polly and features like its Newscaster's voice help us deliver breaking news and original reporting with increased speed and fidelity worthy of our brands.

Lastly, both features are available today in the US East (N. Virginia), US West (Oregon) and Europe (Ireland) regions. Pricing starts with a free tier of 1 million characters per month for NTTS voices during the first 12 months, counted from the first request for speech (standard or NTTS). Beyond the free tier, customers pay as they go; details are available on the pricing page.
