Computer Vision Content on InfoQ
-
LAION Releases Five Billion Image-Text Pair Dataset LAION-5B
The Large-scale Artificial Intelligence Open Network (LAION) released LAION-5B, an AI training dataset containing over five billion image-text pairs. The dataset consists of images and captions scraped from the internet and is 14x larger than its predecessor, LAION-400M, making it the largest freely available image-text dataset.
-
University of Washington Open-Sources AI Fine-Tuning Algorithm WiSE-FT
A team of researchers from the University of Washington (UW), Google Brain, and Columbia University has open-sourced weight-space ensembles for fine-tuning (WiSE-FT), an algorithm for fine-tuning AI models that improves robustness under distribution shift. Experiments on several computer vision (CV) benchmarks show that WiSE-FT improves accuracy by up to 6 percentage points.
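The core of WiSE-FT is simple: instead of choosing between the zero-shot weights and the fine-tuned weights, it linearly interpolates between them. The PyTorch sketch below is a minimal illustration of that idea, not the researchers' implementation, and the alpha mixing coefficient shown is only an example value.

```python
import copy
import torch.nn as nn

def wise_ft(zero_shot_model, fine_tuned_model, alpha=0.5):
    """Weight-space ensemble: interpolate between a zero-shot model and
    its fine-tuned counterpart (both must share the same architecture)."""
    zs_state = zero_shot_model.state_dict()
    ft_state = fine_tuned_model.state_dict()
    merged = {}
    for key, zs in zs_state.items():
        ft = ft_state[key]
        if zs.is_floating_point():
            merged[key] = (1 - alpha) * zs + alpha * ft
        else:
            merged[key] = ft  # integer buffers (e.g. BatchNorm counters): keep one copy
    ensembled = copy.deepcopy(zero_shot_model)
    ensembled.load_state_dict(merged)
    return ensembled

# Example: ensemble two small linear probes (illustrative only)
zs, ft = nn.Linear(4, 2), nn.Linear(4, 2)
merged = wise_ft(zs, ft, alpha=0.5)
```

Because the interpolation happens in weight space, the ensembled model costs no more at inference time than either of its parents.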
-
Evaluating Continual Deep Learning: A New Benchmark for Image Classification
Continual learning aims to preserve knowledge across deep network training iterations. A new dataset, "The CLEAR Benchmark: Continual LEArning on Real-World Imagery", has recently been published. The study aims to establish a consistent image-classification benchmark that captures the natural time evolution of objects, allowing a more realistic comparison of continual learning models.
-
Analyze Video Feeds at the Edge with AWS Panorama Appliance
Recently, AWS announced the general availability (GA) of the AWS Panorama Appliance, a new device that customers can install in their facilities to run applications that analyze multiple video streams from existing on-premises cameras.
-
Facebook Open-Sources Computer Vision Model Multiscale Vision Transformers
Facebook AI Research (FAIR) recently open-sourced Multiscale Vision Transformers (MViT), a deep-learning model for computer vision based on the Transformer architecture. MViT contains several internal resolution-reduction stages and outperforms other Transformer vision models while requiring less compute power, achieving new state-of-the-art accuracy on several benchmarks.
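A defining component of MViT is pooling attention: query, key, and value tensors are spatially pooled before the attention computation, so deeper stages operate on fewer tokens. The single-head PyTorch sketch below conveys the idea only; the actual model uses multi-head attention, convolutional pooling, residual connections, and channel expansion between stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoolingAttention(nn.Module):
    """Simplified single-head pooling attention: tokens form a flattened
    H x W grid and are spatially pooled before attention, reducing the
    resolution (and cost) of deeper stages."""

    def __init__(self, dim, q_stride=2, kv_stride=2):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.q_stride, self.kv_stride = q_stride, kv_stride
        self.scale = dim ** -0.5

    @staticmethod
    def _pool(x, hw, stride):
        if stride == 1:
            return x, hw
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, *hw)
        grid = F.max_pool2d(grid, kernel_size=stride, stride=stride)
        return grid.flatten(2).transpose(1, 2), grid.shape[-2:]

    def forward(self, x, hw):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, out_hw = self._pool(q, hw, self.q_stride)  # queries set the output resolution
        k, _ = self._pool(k, hw, self.kv_stride)
        v, _ = self._pool(v, hw, self.kv_stride)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.proj(attn @ v), out_hw

# Example: halve a 56x56 token grid with 96-dim embeddings
tokens = torch.randn(1, 56 * 56, 96)
out, (h, w) = PoolingAttention(96)(tokens, (56, 56))  # out: (1, 28*28, 96)
```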
-
Google Announces 800M Parameter Vision-Language AI Model ALIGN
Google Research announced the development of A Large-scale ImaGe and Noisy-Text Embedding (ALIGN), an 800M-parameter deep-learning model pre-trained on a noisy dataset of 1.8B image-text pairs. The model can be used for a variety of downstream tasks and achieves state-of-the-art accuracy on several image-text retrieval benchmarks.
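ALIGN pairs an image encoder with a text encoder and trains both with a contrastive objective, pulling matched image-caption embeddings together and pushing mismatched ones apart, which is what lets noisy web data work at this scale. The PyTorch snippet below shows that objective in miniature; the encoders themselves are omitted, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.05):
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings.
    Matching pairs sit on the diagonal of the similarity matrix."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    # Each image should match its own caption and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```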
-
Google Trains Two Billion Parameter AI Vision Model
Researchers at Google Brain announced a deep-learning computer vision (CV) model containing two billion parameters. The model was trained on three billion images and achieved 90.45% top-1 accuracy on ImageNet, setting a new state-of-the-art record.
-
NVIDIA Announces AI Training Dataset Generator DatasetGAN
Researchers at NVIDIA have created DatasetGAN, a system for generating synthetic images with annotations to create datasets for training AI vision models. DatasetGAN can be trained with as few as 16 human-annotated images and performs as well as fully supervised systems requiring 100x more annotated images.
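The idea is that a pretrained GAN's internal feature maps already encode enough structure that a very small classifier can map each pixel's feature vector to a label, learned from only a handful of annotated images. The PyTorch sketch below loosely illustrates that decoding step; the feature dimension, class count, and training loop are assumptions for illustration, not NVIDIA's implementation.

```python
import torch
import torch.nn as nn

FEATURE_DIM = 512   # assumed: channels after upsampling/concatenating GAN features
NUM_CLASSES = 8     # assumed: number of part labels in the annotation scheme

# A tiny MLP that labels one pixel at a time from its GAN feature vector.
label_head = nn.Sequential(
    nn.Linear(FEATURE_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)

optimizer = torch.optim.Adam(label_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(features, labels):
    """features: (num_pixels, FEATURE_DIM) gathered from the few annotated
    images; labels: (num_pixels,) human-provided per-pixel annotations."""
    optimizer.zero_grad()
    loss = loss_fn(label_head(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Once the head is trained, every image the GAN samples comes with feature maps, and running those through the head yields a matching synthetic annotation essentially for free.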
-
ML Kit for iOS and Android Now Generally Available
After two years in beta, Google has announced the general availability of ML Kit for iOS and Android along with improvements to the Pose Detection API. Furthermore, Selfie Segmentation is now available in public beta.
-
OpenAI Announces GPT-3 Model for Image Generation
OpenAI has trained a 12B-parameter AI model based on GPT-3 that can generate images from textual descriptions. The descriptions can specify many independent attributes, including object position and image perspective, and the model can also synthesize combinations of objects that do not exist in the real world.
-
Microsoft Research Develops a New Vision-Language System: VinVL
Microsoft Research recently developed a new object-attribute detection model for image encoding, which they named VinVL (Visual features in Vision-Language).
-
MediaPipe Introduces Holistic Tracking for Mobile Devices
Holistic tracking is a new feature in MediaPipe that enables the simultaneous detection of body pose, hand pose, and face landmarks on mobile devices. The three capabilities were previously available separately; they are now combined in a single, highly optimized solution.
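The announcement targets mobile devices, but MediaPipe also ships a Python package that exposes the same holistic solution, which makes the combined output easy to demonstrate. A minimal sketch (the input file name is hypothetical):

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

with mp_holistic.Holistic(static_image_mode=True) as holistic:
    image = cv2.imread("person.jpg")  # hypothetical input image
    # MediaPipe expects RGB; OpenCV loads BGR.
    results = holistic.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.pose_landmarks:
        # 33 body landmarks with normalized x, y (and relative z) coordinates
        nose = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE]
        print(f"Nose at ({nose.x:.2f}, {nose.y:.2f})")
    # Face and hand landmarks are returned alongside the pose:
    # results.face_landmarks, results.left_hand_landmarks, results.right_hand_landmarks
```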
-
ML Kit Pose Detection Brings Body Movement Tracking to iOS and Android
Initially available under the ML Kit early access program, Pose Detection is now officially part of ML Kit. The library is capable of tracking the human body, including facial landmarks, hands, and feet.
-
Artificial Intelligence Can Create Soundtracks for Silent Videos
Researchers Ghose and Prevost created a deep-learning algorithm which, given a silent video, can generate a realistic-sounding, synchronized soundtrack. They trained one neural network to classify which class of sound to generate and a sequential network to generate the sound itself, going from temporally aligned images to the generation of sound: a different modality.
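As a rough sketch of the sequential half of that idea, the toy PyTorch model below encodes each video frame and lets a recurrent network emit one spectrogram frame per video frame, keeping image and sound temporally aligned. It is not the authors' architecture (their sound-class classifier is omitted entirely), and all layer sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class VideoToAudio(nn.Module):
    """Toy sketch: encode each video frame, then let a recurrent network
    emit one spectrogram frame per video frame (temporally aligned)."""

    def __init__(self, spec_bins=128, hidden=256):
        super().__init__()
        self.frame_encoder = nn.Sequential(  # stand-in for a CNN backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden),
        )
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.to_spec = nn.Linear(hidden, spec_bins)

    def forward(self, frames):                       # (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        hidden_seq, _ = self.rnn(feats)
        return self.to_spec(hidden_seq)              # (B, T, spec_bins)

# Example: 8 frames of 64x64 video -> 8 aligned spectrogram frames
spec = VideoToAudio()(torch.randn(2, 8, 3, 64, 64))  # (2, 8, 128)
```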
-
Google Announces TensorFlow 2 Support in Object Detection API
Google announced support for TensorFlow 2 (TF2) in the TensorFlow Object Detection (OD) API. The release includes eager-mode compatible binaries, two new network architectures, and pre-trained weights for all supported models.
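With the TF2 release, models from the TF2 Detection Zoo load as standard SavedModels and can be called eagerly. A minimal inference sketch (the model directory and image path below are hypothetical placeholders):

```python
import tensorflow as tf

# A detection model downloaded from the TF2 Detection Zoo (hypothetical path).
detect_fn = tf.saved_model.load("ssd_mobilenet_v2/saved_model")

image = tf.io.decode_jpeg(tf.io.read_file("street.jpg"))  # uint8, (H, W, 3)
detections = detect_fn(tf.expand_dims(image, 0))          # add batch dimension

# Zoo models return a dict of tensors, including:
boxes = detections["detection_boxes"][0]    # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0]  # sorted in descending order
classes = detections["detection_classes"][0]
print(f"Top detection: class {int(classes[0])}, score {scores[0]:.2f}")
```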