Deep Learning Content on InfoQ
-
Meta Open-Sources Computer Vision Foundation Model DINOv2
Meta AI Research open-sourced DINOv2, a foundation model for computer vision (CV) tasks. DINOv2 is pretrained on a curated dataset of 142M images and can be used as a backbone for several tasks, including image classification, video action recognition, semantic segmentation, and depth estimation.
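As a sketch of what using DINOv2 as a backbone looks like, the snippet below loads a pretrained ViT-S/14 checkpoint through the torch.hub entry point documented in the facebookresearch/dinov2 repository and extracts a global image embedding; the 384-dimensional output assumes the ViT-S/14 variant.

```python
import torch

# Load a pretrained DINOv2 ViT-S/14 backbone via the torch.hub entry point
# documented in the facebookresearch/dinov2 README (downloads weights on first use).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy batch; input height and width must be multiples of the 14-pixel patch size.
images = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    embeddings = model(images)  # global image embedding, shape (1, 384) for ViT-S/14

print(embeddings.shape)
```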
-
Google's Universal Speech Model Performs Speech Recognition on Hundreds of Languages
Google Research announced Universal Speech Model (USM), a 2B-parameter automatic speech recognition (ASR) model trained on over 12M hours of speech audio. USM can recognize speech in over 100 languages, including low-resource languages, and achieves new state-of-the-art performance on several benchmarks.
-
Stability AI Open-Sources 7B Parameter Language Model StableLM
Stability AI released two sets of pre-trained model weights for StableLM, a suite of large language models (LLMs). The models are trained on 1.5 trillion text tokens and are licensed for commercial use under CC BY-SA 4.0.
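For readers who want to try the weights, a minimal generation sketch using the Hugging Face transformers library follows; the model ID stabilityai/stablelm-base-alpha-7b is assumed from the release announcement and is worth verifying against the hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID assumed from the StableLM release on the Hugging Face hub.
model_id = "stabilityai/stablelm-base-alpha-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Deep learning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```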
-
Meta's Toolformer Uses APIs to Outperform GPT-3 on Zero-Shot NLP Tasks
Meta AI Research announced Toolformer, a language model that learns to call APIs to help solve natural language processing (NLP) tasks. Toolformer automatically annotates a training dataset with API calls; the model fine-tuned on that dataset can outperform the much larger GPT-3 model on several zero-shot NLP tasks.
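Meta did not release an official implementation, but the core mechanic is easy to illustrate: the model emits inline API calls in its output text, and a post-processor executes them and splices the results back in. The sketch below is purely illustrative; the [Tool(args) -> result] syntax approximates the notation in the paper, and the Calculator tool is a toy stand-in.

```python
import re

# Toy tool registry; a real system would route to calculators, search, QA, etc.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
}

# Matches an unanswered inline call such as "[Calculator(3 * 7) -> ]".
CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\s*->\s*\]")

def execute_api_calls(text: str) -> str:
    """Run each inline API call and splice its result back into the text."""
    def run(match):
        tool, arg = match.group(1), match.group(2)
        result = TOOLS[tool](arg)
        return f"[{tool}({arg}) -> {result}]"
    return CALL_PATTERN.sub(run, text)

print(execute_api_calls("The total is [Calculator(3 * 7) -> ]."))
# The total is [Calculator(3 * 7) -> 21].
```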
-
Twitter Open-Sources Recommendation Algorithm
Twitter recently open-sourced several components of their system for recommending tweets for a user's Twitter timeline. The release includes the code for several of the services and jobs that run the algorithm, as well as code for training machine learning models for embedding and ranking tweets.
-
Google Uses AutoML to Discover More Efficient AI Training Algorithm
Researchers at Google have open-sourced EvoLved sIgn mOmeNtum (Lion), an optimization algorithm for training neural networks, which was discovered using an automated machine learning (AutoML) evolutionary algorithm. Models trained with Lion can achieve better accuracy on several benchmarks than models trained with other optimizers, while requiring fewer compute cycles to converge.
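The discovered update rule itself is compact. The NumPy sketch below follows the pseudocode published in the paper: the parameter update uses only the sign of an interpolated momentum term, which is what saves memory and compute relative to Adam. The hyperparameter defaults here are illustrative, not prescriptive.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.01):
    """One Lion update, following the rule published in the paper.

    w: parameters, g: gradient, m: momentum buffer (EMA of gradients).
    """
    # The update direction is only the *sign* of an interpolated momentum,
    # which makes Lion cheaper than Adam (one buffer, no second moment).
    update = np.sign(beta1 * m + (1.0 - beta1) * g)
    w = w - lr * (update + weight_decay * w)  # decoupled weight decay
    m = beta2 * m + (1.0 - beta2) * g         # momentum tracks the gradient EMA
    return w, m
```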
-
Meta AI’s Large Language Model with 10x Fewer Parameters
Meta AI recently released a new large language model called Large Language Model Meta AI (LLaMA) that outperforms foundation models such as GPT-3 and is competitive with PaLM, despite having 10 times fewer parameters. LLaMA performs better on language tasks such as question answering (Natural Questions), common-sense reasoning, and mathematical reasoning.
-
Microsoft Open-Sources Weather Forecasting Deep Learning Model ClimaX
Researchers from Microsoft's Autonomous Systems and Robotics Research group have open-sourced ClimaX, a deep learning foundation model for weather and climate modeling. ClimaX can be fine-tuned for a variety of prediction tasks and performs as well as or better than state-of-the-art models on several benchmarks.
-
Zero-Copy In-Memory Sharing of Large Distributed Data: V6d
Vineyard (v6d), a zero-copy in-memory data manager maintained as a CNCF sandbox project, provides distributed operators for sharing immutable data within or across cluster nodes. V6d is particularly relevant for deep network training on large sharded datasets, such as those used to train large language models and graph models.
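A minimal sketch of the sharing workflow with the vineyard Python client follows; the IPC socket path shown is the conventional default and will vary by deployment.

```python
import numpy as np
import vineyard

# Connect to a running vineyardd instance over its IPC socket
# (path shown is the conventional default; adjust for your deployment).
client = vineyard.connect("/var/run/vineyard.sock")

# Put a payload into shared memory; other processes on the node can then
# fetch it by object ID with zero-copy reads instead of reserializing it.
object_id = client.put(np.random.rand(1024, 1024))
shard = client.get(object_id)
print(shard.shape)
```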
-
DeepMind Open-Sources AI Interpretability Research Tool Tracr
Researchers at DeepMind have open-sourced TRAnsformer Compiler for RASP (Tracr), a compiler that translates programs written in the RASP domain-specific language into the weights of a transformer neural network. Tracr is intended for research in mechanistic interpretability of transformer AI models such as GPT-3.
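As an example of what the compiler consumes and produces, the snippet below is adapted from the Tracr README: a RASP program that computes the sequence length at every position is compiled into concrete transformer weights and then run on a token sequence.

```python
from tracr.rasp import rasp
from tracr.compiler import compiling

# RASP program that outputs the sequence length at every position
# (the introductory example from the Tracr README).
def make_length():
    all_true = rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)
    return rasp.SelectorWidth(all_true)  # counts positions each token attends to

# Compile the program into an actual transformer model.
model = compiling.compile_rasp_to_model(
    make_length(),
    vocab={"a", "b", "c", "d"},
    max_seq_len=5,
    compiler_bos="BOS",
)

out = model.apply(["BOS", "a", "b", "c"])
print(out.decoded)  # expected: ["BOS", 3, 3, 3]
```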
-
Stanford Researchers Develop Brain-Computer Interface for Speech Synthesis
Researchers from Stanford University have developed a brain-computer interface (BCI) for synthesizing speech from signals captured in a patient's brain and processed by a recurrent neural network (RNN). The prototype system can decode speech at 62 words per minute, 3.4x faster than previous BCI methods.
-
Carnegie Mellon Researchers Develop AI Model for Human Detection via WiFi
Researchers from the Human Sensing Laboratory at Carnegie Mellon University (CMU) have published a paper on DensePose From WiFi, an AI model which can detect the pose of multiple humans in a room using only the signals from WiFi transmitters. In experiments on real-world data, the algorithm achieves an average precision (AP) of 87.2 at a 50% IoU threshold.
-
Unsupervised Object Detection and Semantic Segmentation Using Deep Learning
Meta AI released CutLER, a state-of-the-art zero-shot unsupervised object detector that improves detection performance by over 2.7x on 11 benchmark datasets spanning domains such as video frames, paintings, and sketches. The model's simplicity makes it compatible with a range of object-detection architectures across different domains.
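The released models follow the Detectron2 format, so inference should look like the standard Detectron2 predictor pattern sketched below. The config and checkpoint paths are placeholders for files from the CutLER repository, whose own setup instructions take precedence.

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Sketch of standard Detectron2 inference; CutLER publishes Detectron2-format
# configs and checkpoints. Paths below are placeholders, not real file names.
cfg = get_cfg()
cfg.merge_from_file("path/to/cutler_config.yaml")    # placeholder
cfg.MODEL.WEIGHTS = "path/to/cutler_checkpoint.pth"  # placeholder

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("image.jpg"))
print(outputs["instances"].pred_boxes)  # class-agnostic boxes for detected objects
```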
-
Microsoft Open-Sources AI Prompt Optimization Toolkit LMOps
Microsoft Research open-sourced LMOps, a collection of tools for improving text prompts used as input to generative AI models. The toolkit includes Promptist, which optimizes a user's text input for text-to-image generation, and Structured Prompting, a technique for including more examples in a few-shot learning prompt for text generation.
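Promptist is also published as a standalone checkpoint on the Hugging Face hub, so a quick way to try it is via the transformers library. The microsoft/Promptist model ID and the "Rephrase:" input format below follow its model card and are worth verifying against the LMOps repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Promptist is a GPT-2-based prompt optimizer; the model ID and input format
# here follow its Hugging Face model card (verify against the LMOps repo).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("microsoft/Promptist")

plain_prompt = "a cat sitting on a windowsill"
inputs = tokenizer(plain_prompt + " Rephrase:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```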
-
DeepMind Announces Minecraft-Playing AI DreamerV3
Researchers from DeepMind and the University of Toronto announced DreamerV3, a reinforcement learning (RL) algorithm for training AI models across many different domains. Using a single set of hyperparameters, DreamerV3 outperforms other methods on several benchmarks and can train an AI to collect diamonds in Minecraft without human instruction.