Computer Vision Content on InfoQ
-
ML Kit for iOS and Android Now Generally Available
After two years in beta, Google has announced the general availability of ML Kit for iOS and Android along with improvements to the Pose Detection API. Furthermore, Selfie Segmentation is now available in public beta.
-
OpenAI Announces GPT-3 Model for Image Generation
OpenAI has trained a 12B-parameter AI model based on GPT-3 that can generate images from a textual description. The description can specify many independent attributes, including the position of objects and the image perspective, and the model can also synthesize combinations of objects that do not exist in the real world.
-
Microsoft Research Develops a New Vision-Language System: VinVL
Microsoft Research recently developed a new object-attribute detection model for image encoding, which they named VinVL - Visual features in Vision-Language.
-
MediaPipe Introduces Holistic Tracking for Mobile Devices
Holistic tracking is a new feature in MediaPipe that enables the simultaneous detection of body pose, hand pose, and face landmarks on mobile devices. The three capabilities were previously available separately, but they are now combined in a single, highly optimized solution.
-
ML Kit Pose Detection Brings Body Movement Tracking to iOS and Android
Initially available under the ML Kit early access program, Pose Detection is now officially part of ML Kit. The library is capable of tracking the human body, including facial landmarks, hands, and feet.
-
Artificial Intelligence Can Create Sound Tracks for Silent Videos
Researchers Ghose and Prevost created a deep learning algorithm which, given a silent video, can generate a realistic-sounding, synchronised soundtrack. They trained a neural network to classify which class of sound to generate, and a sequential network to generate the sound itself. They could thus go from temporally aligned images to generated sound, a different modality.
-
Google Announces TensorFlow 2 Support in Object Detection API
Google announced support for TensorFlow 2 (TF2) in the TensorFlow Object Detection (OD) API. The release includes eager-mode compatible binaries, two new network architectures, and pre-trained weights for all supported models.
-
MIT and Toyota Release Autonomous Driving Dataset DriveSeg
Toyota's Collaborative Safety Research Center (CSRC) and MIT's AgeLab have released DriveSeg, a dataset for autonomous driving research. DriveSeg contains over 25,000 frames of high-resolution video with each pixel labelled with one of 12 classes of road object. DriveSeg is available free of charge for non-commercial use.
-
Google ML Kit SDK Now Focuses on On-Device Machine Learning
Google has introduced a new ML Kit SDK aimed at working in standalone mode without requiring a tight integration with Firebase, as the original ML Kit SDK did. Additionally, it provides limited support for replacing its default models with custom ones for image labeling and object detection and tracking.
-
Google Open-Sources Computer Vision Model Big Transfer
Google Brain has released the pre-trained models and fine-tuning code for Big Transfer (BiT), a deep-learning computer vision model. The models are pre-trained on publicly available generic image datasets and can meet or exceed state-of-the-art performance on several vision benchmarks after fine-tuning on just a few samples.
-
Google's V8 Engine Adds Support for WebAssembly SIMD
The WebAssembly SIMD proposal has come to V8, Google's JavaScript engine, albeit still as an experimental feature. By exploiting data parallelism, V8's support for SIMD (single instruction, multiple data) aims to accelerate compute-intensive tasks such as audio/video processing and machine learning.
-
Apple Acquires Edge-Focused AI Startup Xnor.ai
Apple has acquired Xnor.ai, a Seattle-based startup that builds AI models that run on edge devices, for approximately $200 million.
-
Uber's Synthetic Training Data Speeds Up Deep Learning by 9x
Uber AI Labs has developed an algorithm called Generative Teaching Networks (GTN) that produces synthetic training data for neural networks, which allows the networks to be trained faster than with real data. Using this synthetic data, Uber sped up its neural architecture search (NAS) deep-learning optimization process by 9x.
-
Facebook AI Releases New Computer Vision Library Detectron2
Facebook AI Research (FAIR) has released Detectron2, a PyTorch-based computer vision library that brings a series of new research and production capabilities to the framework. While the first Detectron was written in Caffe2, Detectron2 represents a full rewrite of the original framework in PyTorch from the ground up, with several new object detection capabilities.
-
Google Announces Updates to AutoML Vision Edge, AutoML Video, and the Video Intelligence API
In a recent blog post, Google announced enhancements to part of its Vision AI portfolio: AutoML Vision Edge, AutoML Video, and the Video Intelligence API each received updates.