AI, ML & Data Engineering Content on InfoQ
-
Meta's Genomics AI ESMFold Predicts Protein Structure 6x Faster Than AlphaFold2
Meta AI Research recently announced ESMFold, an AI model for predicting protein structure from an amino-acid sequence. ESMFold is built on a 15B parameter Transformer model and achieves accuracy comparable to other state-of-the-art models with an order-of-magnitude speedup in inference time.
-
Write Directly from Cloud Pub/Sub to BigQuery with BigQuery Subscription
Recently, Google introduced a new type of Pub/Sub subscription, called a “BigQuery subscription,” that writes directly from Cloud Pub/Sub to BigQuery. The company claims that this new extract, load, and transform (ELT) path will simplify event-driven architectures.
-
PrefixRL: Nvidia's Deep-Reinforcement-Learning Approach to Design Better Circuits
Nvidia has developed PrefixRL, a reinforcement learning (RL) approach to designing parallel-prefix circuits that are smaller and faster than those designed by state-of-the-art electronic design automation (EDA) tools.
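To make "parallel-prefix circuit" concrete, the sketch below computes the carries of a binary adder with a Kogge-Stone prefix network, one classic hand-designed topology. It does not reproduce PrefixRL, whose RL agent searches over prefix graphs to make them smaller and faster; the function names are illustrative, and the network is simulated in Python rather than expressed as hardware.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
GP = Tuple[int, int]  # (generate, propagate) bits for a group of bit positions


def kogge_stone_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix scan with the Kogge-Stone pattern.

    Runs ceil(log2(n)) levels; at each level, node i combines its value with
    that of node i - distance from the previous level. All combinations within
    a level are independent, which is what a parallel-prefix circuit exploits.
    """
    ys = list(xs)
    distance = 1
    while distance < len(ys):
        # The right-hand side reads only the previous level's ys.
        ys = [op(ys[i - distance], ys[i]) if i >= distance else ys[i]
              for i in range(len(ys))]
        distance *= 2
    return ys


def carry_op(lo: GP, hi: GP) -> GP:
    """Associative carry operator: combine a lower-order group with a higher one."""
    g_lo, p_lo = lo
    g_hi, p_hi = hi
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a: int, b: int, width: int) -> int:
    """Add two unsigned width-bit integers using prefix-computed carries."""
    bits_a = [(a >> i) & 1 for i in range(width)]
    bits_b = [(b >> i) & 1 for i in range(width)]
    # Per-bit generate (a AND b) and propagate (a XOR b), least significant bit first.
    gp = [(x & y, x ^ y) for x, y in zip(bits_a, bits_b)]
    prefix = kogge_stone_scan(gp, carry_op)
    total = 0
    for i in range(width):
        # Carry into bit i is the group-generate of bits 0..i-1 (carry-in is 0).
        carry_in = prefix[i - 1][0] if i > 0 else 0
        total |= (gp[i][1] ^ carry_in) << i
    # The group-generate over all bits is the final carry-out.
    return total | (prefix[width - 1][0] << width)


print(prefix_add(13, 11, 4))  # prints 24
```

Replacing `kogge_stone_scan` with a serial left-to-right scan yields a ripple-carry adder with the same outputs but n - 1 levels of depth. Prefix topologies differ only in how many operator nodes they use and how deep the network is, which is the area/delay trade-off a tool like PrefixRL optimizes.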
-
Meta Open-Sources 200 Language Translation AI NLLB-200
Meta AI recently open-sourced NLLB-200, an AI model that can translate between any of over 200 languages. NLLB-200 is a 54.5B parameter Mixture of Experts (MoE) model that was trained on a dataset containing more than 18 billion sentence pairs. On benchmark evaluations, NLLB-200 outperforms other state-of-the-art models by up to 44%.
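The Mixture of Experts idea can be sketched as a gating function that routes each input to only a few expert sub-networks, so most parameters stay idle for any given input. The minimal pure-Python sketch below assumes a linear gate with softmax and top-k selection, a common MoE design; the names (`moe_layer`, `gate_weights`) and toy experts are hypothetical, and NLLB-200's actual router and expert layers are considerably more involved.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: Vector, gate_weights: List[Vector], experts: List[Expert],
              top_k: int = 1) -> Tuple[Vector, List[int]]:
    """Sparse MoE forward pass for one input vector (hypothetical sketch).

    Assumes every expert returns a vector of the same length as x.
    """
    # Linear gate: one logit per expert.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Keep only the top-k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:top_k]
    norm = sum(probs[e] for e in chosen)
    out = [0.0] * len(x)
    for e in chosen:
        y = experts[e](x)
        weight = probs[e] / norm  # renormalize gate weights over the chosen experts
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out, chosen
```

With `top_k=1`, only one expert runs per input, which is why MoE models can have very large total parameter counts while each input touches only a fraction of them.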
-
Google AI Open-Sourced a New ML Tool for Conceptual and Subjective Queries over Images
Google AI open-sourced mood board search, a new ML-powered tool for subjective or conceptual queries over images. Mood board search helps users define conceptual and subjective image queries, such as “peaceful” or “beautiful.”
-
A New Service from the Microsoft and Oracle Partnership: Oracle Database Service for Microsoft Azure
Recently, Microsoft and Oracle announced the general availability (GA) of Oracle Database Service for Microsoft Azure, a new service that allows Microsoft Azure customers to provision, access, and monitor enterprise-grade Oracle Database services in Oracle Cloud Infrastructure (OCI).
-
BigScience Releases 176B Parameter AI Language Model BLOOM
The BigScience research workshop released the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM), an autoregressive language model based on the GPT-3 architecture. BLOOM was trained on data from 46 natural languages and 13 programming languages and is the largest publicly available open multilingual model.
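"Autoregressive" means the model generates text one token at a time, feeding each predicted token back in as context for the next. The toy loop below assumes a hypothetical hard-coded bigram table in place of a neural network and uses greedy decoding; a model like BLOOM instead scores the next token by attending over the whole preceding sequence, and is often sampled with more elaborate strategies.

```python
from typing import Callable, Dict, List

# Hypothetical toy "language model": next-token scores given only the last token.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}


def bigram_scores(tokens: List[str]) -> Dict[str, float]:
    # A real model would condition on all of `tokens`, not just the last one.
    return BIGRAM[tokens[-1]]


def greedy_decode(next_token_scores: Callable[[List[str]], Dict[str, float]],
                  prompt: List[str], max_new_tokens: int, eos: str) -> List[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        next_token = max(scores, key=scores.get)  # greedy: take the top-scoring token
        tokens.append(next_token)  # the output becomes part of the next input
        if next_token == eos:
            break
    return tokens


print(greedy_decode(bigram_scores, ["<s>"], max_new_tokens=10, eos="</s>"))
# prints ['<s>', 'the', 'cat', 'sat', '</s>']
```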
-
Meta Hopes to Increase Accuracy of Wikipedia with New AI Model
Meta AI's research and advancements team developed a neural-network-based system, called SIDE, that can scan hundreds of thousands of Wikipedia citations at once and check whether they truly support the corresponding content. Wikipedia is a multilingual free online encyclopedia written and maintained by volunteers through open collaboration and a wiki-based editing system.
-
AWS Announced Synthetic Data Generation for SageMaker Ground Truth
AWS announced that users can now create labeled synthetic data with Amazon SageMaker Ground Truth. SageMaker Ground Truth is a data labeling service that makes it simple to label data and gives you the choice of using human annotators from third-party vendors, Amazon Mechanical Turk, or your own private workforce.
-
Java News Roundup: JDK 19 in RDP2, Oracle Critical Patch Update, TornadoVM on M1, Grails CVE
This week's Java roundup for July 18th, 2022, features news from Oracle, JDK 18, JDK 19, JDK 20, Spring Boot and Spring Security milestone and point releases, Spring for GraphQL 1.0.1, Liberica JDK updates, Quarkus 2.10.3, CVE in Grails, JobRunr 5.1.6, JReleaser maintenance, Apache Tomcat 9.0.65 and 10.1.0-M17, TornadoVM on Apple M1 and the JBNC conference.
-
Amazon Redshift Serverless Generally Available to Automatically Scale Data Warehouse
Amazon recently announced the general availability of Redshift Serverless, an elastic option to scale data warehouse capacity. The new service allows data analysts, developers and data scientists to run and scale analytics without provisioning and managing data warehouse clusters.
-
Shopify’s Practical Guidelines from Running Airflow for ML and Data Workflows at Scale
In a blog post, Shopify Engineering shared its experience scaling and optimizing Apache Airflow for running ML and data workflows. The team described practical solutions to the challenges it faced, such as slow file access, insufficient control over DAGs, irregular traffic levels, resource contention among workloads, and more.
-
Obituary: Alex Blewitt
It is with great sadness that we announce that InfoQ editor Dr. Alex Blewitt has unexpectedly passed away.
-
Google's Image-Text AI LIMoE Outperforms CLIP on ImageNet Benchmark
Researchers at Google Brain recently trained Language-Image Mixture of Experts (LIMoE), a 5.6B parameter image-text AI model. In zero-shot learning experiments on ImageNet, LIMoE outperforms CLIP and performs comparably to state-of-the-art models while using fewer compute resources.
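Zero-shot classification with contrastive image-text models such as CLIP typically works by embedding the image and a text prompt for each class name into a shared space, then picking the class whose text embedding is most similar to the image embedding. The sketch below shows that general CLIP-style procedure, not code from the LIMoE work; it assumes the embeddings have already been computed, and the toy vectors and function names are hypothetical stand-ins for real encoder outputs.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding: List[float],
                       prompt_embeddings: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Return the class prompt most similar to the image, plus all scores."""
    scores = {label: cosine_similarity(image_embedding, emb)
              for label, emb in prompt_embeddings.items()}
    return max(scores, key=scores.get), scores


# Hypothetical embeddings standing in for real image and text encoder outputs.
prompts = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
label, _ = zero_shot_classify([0.9, 0.1, 0.0], prompts)
print(label)  # prints: a photo of a cat
```

No classifier is trained on ImageNet labels here; the class names enter only as text prompts, which is what makes the evaluation "zero-shot."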
-
PyTorch 1.12 Release Includes Accelerated Training on Macs and New Library TorchArrow
The PyTorch open-source deep-learning framework announced the release of version 1.12, which includes support for GPU-accelerated training on Apple silicon Macs and a new data preprocessing library, TorchArrow, as well as updates to other libraries and APIs.
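A minimal sketch of using the Apple silicon support, which PyTorch 1.12 exposes as the "mps" (Metal Performance Shaders) device. The snippet assumes PyTorch 1.12 or later, falls back to CUDA or CPU when MPS is unavailable, and runs a single training step on a toy linear model; the model, data, and learning rate are arbitrary placeholders.

```python
import torch


def pick_device() -> torch.device:
    # torch.backends.mps was added in PyTorch 1.12; getattr guards older versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()

# Toy model and random data, created directly on the selected device.
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 4, device=device)
targets = torch.randn(8, 2, device=device)

# One standard training step.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()

print(device.type, loss.item())
```

Because the device is chosen once and passed everywhere, the same script runs unchanged on an M1 Mac, a CUDA machine, or a CPU-only host.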