Microsoft Launches New Cognitive Speech Services Features to Accelerate Language Learning

Microsoft recently launched new features for its Cognitive Speech Service to accelerate language learning with pronunciation assessment, new speech-to-text (STT) languages, and prebuilt and custom neural voice enhancements.

Microsoft Azure Cognitive Speech Services is a comprehensive collection of technologies and services, including Speech to Text, Text to Speech, Custom Neural Voice (CNV), Conversation Transcription Service, Speaker Recognition, Speech Translation, the Speech SDK, and the Speech Device Development Kit (DDK), that helps developers incorporate speech into applications.

Pronunciation Assessment is a feature of Speech Service in the Azure Cognitive Services portfolio, publicly available in 10+ languages and variants, including American English, British English, Australian English, French, Spanish, and Chinese, with additional languages in preview. It uses Azure Neural Text-to-Speech and Transformer models, ordinal regression, and a hierarchical structure to improve the accuracy of word-level assessment, helping language learners of all backgrounds improve their skills.

Source: https://techcommunity.microsoft.com/t5/ai-cognitive-services-blog/speech-service-update-hierarchical-transformer-for-pronunciation/ba-p/3740866
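
As a rough illustration of how an application might consume a word-level assessment, the sketch below picks out the words a learner should practice again. It assumes a JSON result with `NBest`, `Words`, `PronunciationAssessment`, `AccuracyScore`, and `ErrorType` fields, which is how the service's output is understood to be shaped; the sample values, the `words_needing_practice` helper, and the threshold of 60 are invented for illustration, so the field names should be checked against the current documentation.

```python
import json

# Hypothetical sample result. Field names are assumed to match the service's
# JSON output; the words and scores are made up for illustration.
SAMPLE_RESULT = json.loads("""
{
  "NBest": [
    {
      "PronunciationAssessment": {"AccuracyScore": 71.0, "FluencyScore": 88.0},
      "Words": [
        {"Word": "good",
         "PronunciationAssessment": {"AccuracyScore": 95.0, "ErrorType": "None"}},
        {"Word": "morning",
         "PronunciationAssessment": {"AccuracyScore": 48.0, "ErrorType": "Mispronunciation"}},
        {"Word": "everyone",
         "PronunciationAssessment": {"AccuracyScore": 72.0, "ErrorType": "None"}}
      ]
    }
  ]
}
""")


def words_needing_practice(result, threshold=60.0):
    """Return (word, accuracy) pairs that scored low or were flagged as errors."""
    best = result["NBest"][0]  # top recognition hypothesis
    flagged = []
    for entry in best["Words"]:
        assessment = entry["PronunciationAssessment"]
        score = assessment["AccuracyScore"]
        has_error = assessment.get("ErrorType", "None") != "None"
        if score < threshold or has_error:
            flagged.append((entry["Word"], score))
    return flagged


if __name__ == "__main__":
    for word, score in words_needing_practice(SAMPLE_RESULT):
        print(f"Practice '{word}' (accuracy {score})")
```

A learning app could feed the flagged words back into the next exercise instead of showing the learner only a single overall score.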

In addition, Azure Speech to Text supports real-time language identification for multilingual language learning scenarios and makes conversations between people easier to follow by producing more readable transcripts. The service's new speech-to-text (STT) languages are trained on vast amounts of data using the latest multilingual modeling and transfer learning techniques. Their output includes Inverse Text Normalization (ITN), capitalization (when appropriate), and automatic punctuation to enhance readability.
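
To make the readability features concrete, here is a toy sketch of what ITN, capitalization, and punctuation do to raw recognized words. It is not the service's method, which relies on trained models; the `to_display_text` helper and its word tables are hypothetical, handle only simple English numbers below 1,000, and would wrongly convert a word like "one" used as a pronoun.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred"}


def words_to_number(tokens):
    """Convert a run of number words (values below 1,000) to an int."""
    total = 0
    for tok in tokens:
        if tok == "hundred":
            total = max(total, 1) * 100  # a bare "hundred" counts as 100
        else:
            total += UNITS.get(tok, 0) + TENS.get(tok, 0)
    return total


def to_display_text(lexical):
    """Turn lowercase recognized words into a more readable sentence."""
    out, run = [], []
    for tok in lexical.lower().split():
        if tok in NUMBER_WORDS:
            run.append(tok)  # keep collecting the current number phrase
            continue
        if run:
            out.append(str(words_to_number(run)))  # ITN: words to digits
            run = []
        out.append(tok)
    if run:
        out.append(str(words_to_number(run)))
    text = " ".join(out)
    if not text:
        return text
    text = text[0].upper() + text[1:]  # capitalize the sentence start
    return text if text[-1] in ".?!" else text + "."  # add final punctuation


if __name__ == "__main__":
    print(to_display_text("the course has one hundred twenty three lessons"))
```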

Microsoft Azure AI also provides a range of prebuilt neural voices for AI teachers, content read-aloud capabilities, and more. Custom Neural Voice (CNV) lets users create a unique, customized synthetic voice for their applications, using human speech samples as training data. CNV is based on neural text-to-speech technology and is well suited to representing brands and personifying machines for conversational interactions. Education companies such as Duolingo and Pearson are using this technology to personalize language learning.

Qinying Liao, a principal program manager at Microsoft, stated in an Azure Tech Community blog post:

Microsoft offers over 400 neural voices covering more than 140 languages and locales. With these Text-to-Speech voices, you can quickly add read-aloud functionality for a more accessible app design or give a voice to chatbots to provide a richer conversational experience to your users.
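
A read-aloud feature like the one described in the quote typically starts from an SSML document that names the voice to use. The sketch below builds such a document with the Python standard library only. The `build_ssml` helper is hypothetical, `en-US-JennyNeural` is used purely as an example voice name, and sending the result to the service (for instance through the Speech SDK) is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document for a given voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < in the spoken text
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Repeat after me: good morning, everyone."))
```

Escaping the text matters here because learner-facing content often contains characters such as `&` that would otherwise make the SSML invalid.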

On the broader role of speech services in education, Andy Beatman, a senior product marketing manager at Azure AI, said in an Azure AI blog post:

The integration of AI, specifically speech services, into the education sector is becoming increasingly important as it can greatly enhance the learning experience and improve the effectiveness of teaching. Speech services such as Azure Pronunciation Assessment and Custom Neural Voice provide personalization, automation, and analytics in education platforms, which can lead to better student engagement and achievement.

Lastly, more details on Azure Cognitive Speech Services are available on the documentation landing page. Additionally, customers can use Speech Studio to test how custom speech features could improve recognition for their audio.

About the Author

Steef-Jan Wiggers
