Natural Language Processing Content on InfoQ
-
OpenAI Unveils a Powerful, Cost-Effective, and User-Friendly Embedding Model
OpenAI is introducing text-embedding-ada-002, a cutting-edge embedding model that combines the capabilities of five previous models for text search, text similarity, and code search. The new model outperforms Davinci, the previously most capable model, on most tasks, while being priced 99.8% lower.
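For readers who want to try the model, here is a minimal sketch of requesting an embedding through OpenAI's Python SDK (the v0.x Embedding endpoint); the input text is illustrative and an OPENAI_API_KEY environment variable is assumed:

```python
# Minimal sketch: requesting an embedding from text-embedding-ada-002 via the
# openai Python package (v0.x API); assumes OPENAI_API_KEY is set.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=["InfoQ covers natural language processing news."],
)

embedding = response["data"][0]["embedding"]  # a list of floats
print(len(embedding))  # ada-002 returns 1536-dimensional vectors
```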
-
OpenAI Releases Conversational AI Model ChatGPT
OpenAI released ChatGPT, a conversational AI model based on their GPT-3.5 language model (LM). ChatGPT is fine-tuned using Reinforcement Learning from Human Feedback (RLHF) and includes a moderation filter to block inappropriate interactions.
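As a rough intuition for the RLHF idea, the toy sketch below uses a hypothetical hard-coded reward in place of a learned human-preference reward model and a two-option policy in place of a language model, updated with a simple REINFORCE step; real RLHF fine-tunes a full language model with an algorithm such as PPO and is far more involved:

```python
# Toy REINFORCE sketch of the RLHF idea: a reward model scores sampled
# outputs, and the policy shifts probability toward higher-reward outputs.
# The reward values and two-option "policy" here are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
responses = ["dispreferred reply", "preferred reply"]
logits = np.zeros(2)                        # policy over the two responses

def reward_model(i: int) -> float:
    return [-1.0, 1.0][i]                   # stand-in human-preference scores

for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    i = rng.choice(2, p=probs)              # sample a response from the policy
    grad = -probs
    grad[i] += 1.0                          # gradient of log pi(i) w.r.t. logits
    logits += 0.1 * reward_model(i) * grad  # move toward higher reward

print(probs.round(3))                       # probability mass shifts to index 1
```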
-
Google's Code-as-Policies Lets Robots Write Their Own Code
Researchers from Google's Robotics team have open-sourced Code-as-Policies (CaP), a robot control method that uses a large language model (LLM) to generate robot-control code that achieves a user-specified goal. CaP uses a hierarchical prompting technique for code generation that outperforms previous methods on the HumanEval code-generation benchmark.
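The sketch below is a toy illustration of the hierarchical idea (not Google's actual prompts or API): whenever generated code calls a function that has no definition yet, the model is prompted again to write that function, recursing until the program is complete. The llm() function and its canned completions are hypothetical stand-ins for a real model:

```python
# Toy sketch of hierarchical code generation: undefined functions that appear
# in generated code trigger further generation requests until all are defined.
import ast
import builtins

# Hypothetical canned "completions", standing in for a real LLM.
FAKE_COMPLETIONS = {
    "stack_blocks": "def stack_blocks():\n    for block in get_blocks():\n        pick_and_place(block)\n",
    "get_blocks": "def get_blocks():\n    return ['red', 'green']\n",
    "pick_and_place": "def pick_and_place(block):\n    print(f'placing {block}')\n",
}

def llm(function_name: str) -> str:
    # Stand-in for prompting a language model to write one function.
    return FAKE_COMPLETIONS[function_name]

def undefined_calls(code: str, defined: set) -> set:
    # Names called in `code` that are neither already generated nor builtins.
    called = {node.func.id
              for node in ast.walk(ast.parse(code))
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return called - defined - set(dir(builtins))

def generate_hierarchically(entry_point: str) -> str:
    defined, program, todo = set(), [], [entry_point]
    while todo:
        name = todo.pop()
        if name in defined:
            continue
        code = llm(name)                      # prompt for this function's body
        program.append(code)
        defined.add(name)
        todo.extend(undefined_calls(code, defined))
    return "\n".join(program)

source = generate_hierarchically("stack_blocks")
exec(source)        # define all generated functions
stack_blocks()      # prints: placing red / placing green
```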
-
Galactica: Large Language Model for Scientific Knowledge
Meta AI and Papers with Code recently released Galactica, a 120-billion-parameter scientific-language model which can search and summarize academic literature, solve math problems, and write scientific code. Galactica's architecture is based on the transformer, a neural-network architecture that uses attention mechanisms to draw global dependencies between input and output.
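To make the attention idea concrete, here is a small worked example of scaled dot-product attention, the core transformer operation softmax(QK^T/√d)V, in which every output position can attend to every input position; the matrices are random placeholders:

```python
# Scaled dot-product attention: each output row is a weighted mix of all
# value vectors, with weights derived from query/key similarity.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # all-pairs similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```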
-
Salesforce Open-Sources Language-Vision AI Toolkit LAVIS
Salesforce Research recently open-sourced LAnguage-VISion (LAVIS), a unified library for deep-learning language-vision research. LAVIS supports more than 10 language-vision tasks on 20 public datasets and includes pre-trained model weights for over 30 fine-tuned models.
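As a flavor of the library's interface, the sketch below follows LAVIS's documented image-captioning example, loading a pre-trained BLIP captioning model from the model zoo; the image path is a placeholder:

```python
# Image captioning with a pre-trained model from the LAVIS model zoo,
# following the library's documented usage; merlion.png is a placeholder path.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
raw_image = Image.open("merlion.png").convert("RGB")

model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))   # e.g. a one-sentence caption
```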
-
University Researchers Publish Results of NLP Community Metasurvey
Researchers from New York University, the University of Washington, and Johns Hopkins University have published the results of the NLP Community Metasurvey, which compiles the opinions of 480 active researchers about several issues in the field of natural language processing (NLP). The survey also includes meta-questions about the perceived opinions of other researchers.
-
Transformers Can Mimic Part of the Human Brain
In recent years, neuroscientists have tried many types of neural networks to model the firing of neurons in the human brain. In a recent project, researchers Whittington and Behrens found that the hippocampus, a structure of the brain critical to memory, functions like a particular kind of artificial neural network known as a transformer.
-
Microsoft Trains Two Billion Parameter Vision-Language AI Model BEiT-3
Researchers from Microsoft's Natural Language Computing (NLC) group announced the latest version of Bidirectional Encoder representation from Image Transformers: BEiT-3, a 1.9B parameter vision-language AI model. BEiT-3 models images as another language and achieves state-of-the-art performance on a wide range of downstream tasks.
-
Google Open-Sources Natural Language Robot Control Method SayCan
Researchers from Google's Robotics team have open-sourced SayCan, a robot control method that uses a large language model (LLM) to plan a sequence of robotic actions to achieve a user-specified goal. In experiments, SayCan generated the correct action sequence 84% of the time.
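At its core, SayCan scores each candidate skill by combining the LLM's estimate that the skill is a useful next step with a value function's estimate that the skill can succeed from the current state, then picks the highest product. The toy sketch below illustrates that selection rule with hypothetical, hard-coded scores:

```python
# Toy sketch of SayCan's selection rule: choose the skill maximizing
# p_LLM(skill | instruction) * p_affordance(skill | state).
# Both scoring functions below are hypothetical stand-ins with made-up numbers.

SKILLS = ["find a sponge", "pick up the sponge", "go to the table"]

def llm_score(instruction: str, skill: str) -> float:
    # Stand-in for the LLM's probability that this skill helps the instruction.
    return {"find a sponge": 0.6,
            "pick up the sponge": 0.3,
            "go to the table": 0.1}[skill]

def affordance_score(skill: str, state: dict) -> float:
    # Stand-in value function: can the skill succeed from the current state?
    return 0.9 if state["sponge_visible"] or skill == "find a sponge" else 0.1

def next_skill(instruction: str, state: dict) -> str:
    return max(SKILLS,
               key=lambda s: llm_score(instruction, s) * affordance_score(s, state))

print(next_skill("wipe the table", {"sponge_visible": False}))  # find a sponge
```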
-
Amazon's AlexaTM 20B Model Outperforms GPT-3 on NLP Benchmarks
Researchers at Amazon Alexa AI have announced Alexa Teacher Models (AlexaTM 20B), a 20-billion-parameter sequence-to-sequence (seq2seq) language model that exhibits state-of-the-art performance on 1-shot and few-shot NLP tasks. AlexaTM 20B outperforms GPT-3 on the SuperGLUE and SQuADv2 benchmarks while having less than 1/8 as many parameters.
-
BigScience Releases 176B Parameter AI Language Model BLOOM
The BigScience research workshop released BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), an autoregressive language model based on the GPT-3 architecture. BLOOM is trained on data from 46 natural languages and 13 programming languages and is the largest publicly available open multilingual model.
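Since the weights are publicly available on the Hugging Face Hub, a BLOOM checkpoint can be loaded with the transformers library; the sketch below uses the small bloom-560m checkpoint, as the full 176B model is impractical to load on a single machine:

```python
# Generating text from a small BLOOM checkpoint via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("BLOOM is a multilingual language model that",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```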
-
Google AI Developed a Language Model to Solve Quantitative Reasoning Problems
Google AI developed Minerva, a deep-learning language model that can solve mathematical quantitative reasoning problems. The researchers achieved state-of-the-art results by training the model on a large dataset containing quantitative reasoning with symbolic expressions, enabling Minerva to solve problems on STEM reasoning tasks.
-
Stanford University Open-Sources Controllable Generative Language AI Diffusion-LM
Researchers at Stanford University have open-sourced Diffusion-LM, a non-autoregressive generative language model that allows for fine-grained control of the model's output text. When evaluated on controlled text generation tasks, Diffusion-LM outperforms existing methods.
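A loose numeric intuition for the approach: generation runs a reverse-diffusion loop over continuous latent vectors, and at each denoising step a classifier gradient nudges the latent toward satisfying the control target. Everything in the sketch below (the denoiser, the "classifier" gradient, the dimensions) is a hypothetical stand-in, not the paper's model:

```python
# Toy sketch of classifier-guided denoising in the spirit of Diffusion-LM:
# repeatedly denoise a latent vector while a control gradient steers it.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -1.0])       # control: steer the latent toward this point

def denoise_step(x):
    # Stand-in denoiser: shrink the latent toward the prior mean (origin).
    return 0.9 * x

def control_grad(x):
    # Gradient of log p(control | x) for a Gaussian "classifier" around target.
    return target - x

x = rng.normal(size=2, scale=3.0)    # start from pure noise
for _ in range(50):
    x = denoise_step(x) + 0.1 * control_grad(x)   # denoise + guidance

print(x.round(2))   # settles between the prior mean and the control target
```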
-
DeepMind Trains 80 Billion Parameter AI Vision-Language Model Flamingo
DeepMind recently trained Flamingo, an 80B-parameter vision-language model (VLM). Flamingo combines separately pre-trained vision and language models and outperforms all other few-shot learning models on 16 vision-language benchmarks. Flamingo can also chat with users, answering questions about input images and videos.
-
Meta Open-Sources 175 Billion Parameter AI Language Model OPT
Meta AI Research released Open Pre-trained Transformer (OPT-175B), a 175B-parameter AI language model. The model was trained on a dataset containing 180B tokens and exhibits performance comparable to GPT-3, while requiring only 1/7th of GPT-3's training carbon footprint.