InfoQ Homepage Neural Networks Content on InfoQ
-
Meta Announces Generative AI Models Emu Video and Emu Edit
Meta AI Research announced two new generative AI models: Emu Video, which can generate short videos given a text prompt, and Emu Edit, which can edit images given text-based instructions. Both models are based on Meta's Emu foundation model and exhibit state-of-the-art performance on several benchmarks.
-
Spotify Open-Sources Voyager Nearest-Neighbor Search Library
Spotify Engineering recently open-sourced Voyager, an approximate nearest-neighbor (ANN) search library. Voyager is based on the hierarchical navigable small worlds (HNSW) algorithm and is 10 times faster than Spotify's previous ANN library, Annoy.
-
Stability AI Releases Generative Audio Model Stable Audio
Harmonai, the audio research lab of Stability AI, has released Stable Audio, a diffusion model for text-controlled audio generation. Stable Audio is trained on 19,500 hours of audio data and can generate 44.1kHz quality audio in realtime using a single NVIDIA A100 GPU.
-
Meta's Voicebox Outperforms State-of-the-Art Models on Speech Synthesis
Meta recently announced Voicebox, a speech generation model that can perform text-to-speech (TTS) synthesis in six languages, as well as edit and remove noise from speech recordings. Voicebox is trained on over 50k hours of audio data and outperforms previous state-of-the-art models on several TTS benchmarks.
-
Meta's Open-Source Massively Multilingual Speech AI Handles over 1,100 Languages
Meta AI open-sourced the Massively Multilingual Speech (MMS) model, which supports automatic speech recognition (ASR) and text-to-speech synthesis (TTS) in over 1,100 languages and language identification (LID) in over 4,000 languages. MMS can outperform existing models and covers nearly 10x the number of languages.
-
Meta Open-Sources Computer Vision Foundation Model DINOv2
Meta AI Research open-sourced DINOv2, a foundation model for computer vision (CV) tasks. DINOv2 is pretrained on a curated dataset of 142M images and can be used as a backbone for several tasks, including image classification, video action recognition, semantic segmentation, and depth estimation.
-
Google's Universal Speech Model Performs Speech Recognition on Hundreds of Languages
Google Research announced Universal Speech Model (USM), a 2B parameter automated speech recognition (ASR) model trained on over 12M hours of speech audio. USM can recognize speech in over 100 languages, including low-resource languages, and achieves new state-of-the-art performance on several benchmarks.
-
Microsoft Open-Sources Weather Forecasting Deep Learning Model ClimaX
Researchers from Microsoft's Autonomous Systems and Robotics Research group have open-sourced ClimaX, a deep learning foundation model for weather and climate modeling. ClimaX can be fine-tuned for a variety of prediction tasks and performs as well as or better than state-of-the-art models on several benchmarks.
-
Stanford Researchers Develop Brain-Computer Interface for Speech Synthesis
Researchers from Stanford University have developed a brain-computer interface (BCI) for synthesizing speech from signals captured in a patient's brain and processed by a recurrent neural network (RNN). The prototype system can decode speech at 62 words-per-minute, 3.4x faster than previous BCI methods.
-
DeepMind Announces Minecraft-Playing AI DreamerV3
Researchers from DeepMind and the University of Toronto announced DreamerV3, a reinforcement-learning (RL) algorithm for training AI models for many different domains. Using a single set of hyperparameters, DreamerV3 outperforms other methods on several benchmarks and can train an AI to collect diamonds in Minecraft without human instruction.
-
Microsoft Unveils VALL-E, a Game-Changing TTS Language Model
Microsoft has introduced VALL-E, a novel language model method for text-to-speech synthesis (TTS) that employs audio codec codes as intermediate representations and can replicate anyone's voice after listening to just three seconds of audio recording.
-
Deep Learning Pioneer Geoffrey Hinton Publishes New Deep Learning Algorithm
Geoffrey Hinton, professor at the University of Toronto and engineering fellow at Google Brain, recently published a paper on the Forward-Forward algorithm (FF), a technique for training neural networks that uses two forward passes of data through the network, instead of backpropagation, to update the model weights.
-
Researchers Publish Survey of Algorithmically-Efficient Deep Learning
Researchers from Lawrence Livermore National Laboratory and MosaicML have published a survey of over 200 papers on algorithmically-efficient deep learning. The survey includes a taxonomy of methods to speed up training as well as a practitioner's guide for mitigating training bottlenecks.
-
Meta's CICERO AI Wins Online Diplomacy Tournament
Meta AI Research recently open-sourced CICERO, an AI that can beat most humans at the strategy game Diplomacy, a game that requires coordinating plans with other players. CICERO combines chatbot-like dialogue capabilities with a strategic reasoning, and recently placed first in an online Diplomacy tournament against human players.
-
PyTorch Becomes Linux Foundation Top-Level Project
PyTorch, the popular deep-learning framework developed by Meta AI Research, has now become an independent top-level project of the Linux Foundation. The project will be managed by the newly-chartered PyTorch Foundation, with support from several large companies including Meta, AWS, NVIDIA, AMD, Google, and Microsoft.