Computer Vision Content on InfoQ
-
LAION Releases Five Billion Image-Text Pair Dataset LAION-5B
The Large-scale Artificial Intelligence Open Network (LAION) released LAION-5B, an AI training dataset containing over five billion image-text pairs. The dataset consists of images and captions scraped from the internet and is 14x larger than its predecessor, LAION-400M, making it the largest freely available image-text dataset.
-
University of Washington Open-Sources AI Fine-Tuning Algorithm WiSE-FT
A team of researchers from the University of Washington (UW), Google Brain, and Columbia University has open-sourced weight-space ensembles for fine-tuning (WiSE-FT), an algorithm for fine-tuning AI models that improves robustness under distribution shift. Experiments on several computer vision (CV) benchmarks show that WiSE-FT improves accuracy by up to 6 percentage points.
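The core of WiSE-FT is simple: instead of choosing between the zero-shot weights and the fine-tuned weights, it linearly interpolates between them. The PyTorch sketch below is a minimal illustration of that idea, not the researchers' implementation, and the alpha mixing coefficient shown is only an example value.

```python
import copy
import torch.nn as nn

def wise_ft(zero_shot_model, fine_tuned_model, alpha=0.5):
    """Weight-space ensemble: interpolate between a zero-shot model and
    its fine-tuned counterpart (both must share the same architecture)."""
    zs_state = zero_shot_model.state_dict()
    ft_state = fine_tuned_model.state_dict()
    merged = {}
    for key, zs in zs_state.items():
        ft = ft_state[key]
        if zs.is_floating_point():
            merged[key] = (1 - alpha) * zs + alpha * ft
        else:
            merged[key] = ft  # integer buffers (e.g. BatchNorm counters): keep one copy
    ensembled = copy.deepcopy(zero_shot_model)
    ensembled.load_state_dict(merged)
    return ensembled

# Example: ensemble two small linear probes (illustrative only)
zs, ft = nn.Linear(4, 2), nn.Linear(4, 2)
merged = wise_ft(zs, ft, alpha=0.5)
```

Because the interpolation happens in weight space, the ensembled model costs no more at inference time than either of its parents.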
-
Evaluating Continual Deep Learning: A New Benchmark for Image Classification
Continual learning aims to preserve knowledge across deep network training iterations. A new dataset, "The CLEAR Benchmark: Continual LEArning on Real-World Imagery", has recently been published. The study aims to establish a consistent image-classification benchmark that captures the natural time evolution of objects, allowing a more realistic comparison of continual learning models.
-
Analyze Video Feeds at the Edge with AWS Panorama Appliance
Recently, AWS announced the general availability (GA) of the AWS Panorama Appliance, a new device that customers can install in their facilities to run applications that analyze multiple video streams from existing on-premises cameras.
-
Facebook Open-Sources Computer Vision Model Multiscale Vision Transformers
Facebook AI Research (FAIR) recently open-sourced Multiscale Vision Transformers (MViT), a deep-learning model for computer vision based on the Transformer architecture. MViT contains several internal resolution-reduction stages and outperforms other Transformer vision models while requiring less compute power, achieving new state-of-the-art accuracy on several benchmarks.
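A defining component of MViT is pooling attention: query, key, and value tensors are spatially pooled before the attention computation, so deeper stages operate on fewer tokens. The single-head PyTorch sketch below conveys the idea only; the actual model uses multi-head attention, convolutional pooling, residual connections, and channel expansion between stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoolingAttention(nn.Module):
    """Simplified single-head pooling attention: tokens form a flattened
    H x W grid and are spatially pooled before attention, reducing the
    resolution (and cost) of deeper stages."""

    def __init__(self, dim, q_stride=2, kv_stride=2):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.q_stride, self.kv_stride = q_stride, kv_stride
        self.scale = dim ** -0.5

    @staticmethod
    def _pool(x, hw, stride):
        if stride == 1:
            return x, hw
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, *hw)
        grid = F.max_pool2d(grid, kernel_size=stride, stride=stride)
        return grid.flatten(2).transpose(1, 2), grid.shape[-2:]

    def forward(self, x, hw):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, out_hw = self._pool(q, hw, self.q_stride)  # queries set the output resolution
        k, _ = self._pool(k, hw, self.kv_stride)
        v, _ = self._pool(v, hw, self.kv_stride)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.proj(attn @ v), out_hw

# Example: halve a 56x56 token grid with 96-dim embeddings
tokens = torch.randn(1, 56 * 56, 96)
out, (h, w) = PoolingAttention(96)(tokens, (56, 56))  # out: (1, 28*28, 96)
```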
-
Google Announces 800M Parameter Vision-Language AI Model ALIGN
Google Research announced the development of A Large-scale ImaGe and Noisy-Text Embedding (ALIGN), an 800M-parameter deep-learning model pre-trained on a noisy dataset of 1.8B image-text pairs. The model can be used for a variety of downstream tasks and achieves state-of-the-art accuracy on several image-text retrieval benchmarks.
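ALIGN pairs an image encoder with a text encoder and trains both with a contrastive objective, pulling matched image-caption embeddings together and pushing mismatched ones apart, which is what lets noisy web data work at this scale. The PyTorch snippet below shows that objective in miniature; the encoders themselves are omitted, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.05):
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings.
    Matching pairs sit on the diagonal of the similarity matrix."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    # Each image should match its own caption and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```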
-
Google Trains Two Billion Parameter AI Vision Model
Researchers at Google Brain announced a deep-learning computer vision (CV) model containing two billion parameters. The model was trained on three billion images and achieved 90.45% top-1 accuracy on ImageNet, setting a new state-of-the-art record.
-
NVIDIA Announces AI Training Dataset Generator DatasetGAN
Researchers at NVIDIA have created DatasetGAN, a system for generating synthetic images with annotations to create datasets for training AI vision models. DatasetGAN can be trained with as few as 16 human-annotated images and performs as well as fully supervised systems requiring 100x more annotated images.
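The idea is that a pretrained GAN's internal feature maps already encode enough structure that a very small classifier can map each pixel's feature vector to a label, learned from only a handful of annotated images. The PyTorch sketch below loosely illustrates that decoding step; the feature dimension, class count, and training loop are assumptions for illustration, not NVIDIA's implementation.

```python
import torch
import torch.nn as nn

FEATURE_DIM = 512   # assumed: channels after upsampling/concatenating GAN features
NUM_CLASSES = 8     # assumed: number of part labels in the annotation scheme

# A tiny MLP that labels one pixel at a time from its GAN feature vector.
label_head = nn.Sequential(
    nn.Linear(FEATURE_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)

optimizer = torch.optim.Adam(label_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(features, labels):
    """features: (num_pixels, FEATURE_DIM) gathered from the few annotated
    images; labels: (num_pixels,) human-provided per-pixel annotations."""
    optimizer.zero_grad()
    loss = loss_fn(label_head(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Once the head is trained, every image the GAN samples comes with feature maps, and running those through the head yields a matching synthetic annotation essentially for free.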
-
ML Kit for iOS and Android Now Generally Available
After two years in beta, Google has announced the general availability of ML Kit for iOS and Android along with improvements to the Pose Detection API. Furthermore, Selfie Segmentation is now available in public beta.
-
OpenAI Announces GPT-3 Model for Image Generation
OpenAI has trained a 12B-parameter AI model based on GPT-3 that can generate images from textual descriptions. The descriptions can specify many independent attributes, including object position and image perspective, and the model can also synthesize combinations of objects that do not exist in the real world.
-
Microsoft Research Develops a New Vision-Language System: VinVL
Microsoft Research recently developed a new object-attribute detection model for image encoding, which they named VinVL (Visual features in Vision-Language).
-
MediaPipe Introduces Holistic Tracking for Mobile Devices
Holistic tracking is a new feature in MediaPipe that enables the simultaneous detection of body pose, hand pose, and face landmarks on mobile devices. The three capabilities were previously available separately; they are now combined in a single, highly optimized solution.
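The announcement targets mobile devices, but MediaPipe also ships a Python package that exposes the same holistic solution, which makes the combined output easy to demonstrate. A minimal sketch (the input file name is hypothetical):

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

with mp_holistic.Holistic(static_image_mode=True) as holistic:
    image = cv2.imread("person.jpg")  # hypothetical input image
    # MediaPipe expects RGB; OpenCV loads BGR.
    results = holistic.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.pose_landmarks:
        # 33 body landmarks with normalized x, y (and relative z) coordinates
        nose = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE]
        print(f"Nose at ({nose.x:.2f}, {nose.y:.2f})")
    # Face and hand landmarks are returned alongside the pose:
    # results.face_landmarks, results.left_hand_landmarks, results.right_hand_landmarks
```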
-
ML Kit Pose Detection Brings Body Movement Tracking to iOS and Android
Initially available under the ML Kit early access program, Pose Detection is now officially part of ML Kit. The library is capable of tracking the human body, including facial landmarks, hands, and feet.
-
Artificial Intelligence Can Create Soundtracks for Silent Videos
Researchers Ghose and Prevost created a deep-learning algorithm which, given a silent video, can generate a realistic-sounding, synchronized soundtrack. They trained one neural network to classify which class of sound to generate and a sequential network to generate the sound itself, going from temporally aligned images to the generation of sound: a different modality.
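As a rough sketch of the sequential half of that idea, the toy PyTorch model below encodes each video frame and lets a recurrent network emit one spectrogram frame per video frame, keeping image and sound temporally aligned. It is not the authors' architecture (their sound-class classifier is omitted entirely), and all layer sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class VideoToAudio(nn.Module):
    """Toy sketch: encode each video frame, then let a recurrent network
    emit one spectrogram frame per video frame (temporally aligned)."""

    def __init__(self, spec_bins=128, hidden=256):
        super().__init__()
        self.frame_encoder = nn.Sequential(  # stand-in for a CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden),
        )
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.to_spec = nn.Linear(hidden, spec_bins)

    def forward(self, frames):                       # (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        hidden_seq, _ = self.rnn(feats)
        return self.to_spec(hidden_seq)              # (B, T, spec_bins)

# Example: 8 frames of 64x64 video -> 8 aligned spectrogram frames
spec = VideoToAudio()(torch.randn(2, 8, 3, 64, 64))  # (2, 8, 128)
```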
-
Google Announces TensorFlow 2 Support in Object Detection API
Google announced support for TensorFlow 2 (TF2) in the TensorFlow Object Detection (OD) API. The release includes eager-mode compatible binaries, two new network architectures, and pre-trained weights for all supported models.
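With the TF2 release, models from the TF2 Detection Zoo load as standard SavedModels and can be called eagerly. A minimal inference sketch (the model directory and image path below are hypothetical placeholders):

```python
import tensorflow as tf

# A detection model downloaded from the TF2 Detection Zoo (hypothetical path).
detect_fn = tf.saved_model.load("ssd_mobilenet_v2/saved_model")

image = tf.io.decode_jpeg(tf.io.read_file("street.jpg"))  # uint8, (H, W, 3)
detections = detect_fn(tf.expand_dims(image, 0))          # add batch dimension

# Zoo models return a dict of tensors, including:
boxes = detections["detection_boxes"][0]    # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0]  # sorted in descending order
classes = detections["detection_classes"][0]
print(f"Top detection: class {int(classes[0])}, score {scores[0]:.2f}")
```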