AI, ML & Data Engineering Content on InfoQ
-
Designing for Failure in the BBC's Analytics Platform
Last week at InfoQ Live, Blanca Garcia-Gil, principal systems engineer at the BBC, gave a session on Evolving Analytics in the Data Platform. During this session, Garcia-Gil focused on how her team prepared and designed for two types of failure: "known unknowns" and "unknown unknowns."
-
NLP Library spaCy 3.0 Features Transformer-Based Models and Distributed Training
AI software makers Explosion announced version 3.0 of spaCy, their open-source natural-language processing (NLP) library. The new release includes state-of-the-art Transformer-based pipelines and pre-trained models for 17 languages.
-
Google Brings Databricks to Its Cloud Platform
Google recently announced a partnership with Databricks to bring Databricks' fully managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.
-
Java News Roundup - Week of Feb 15th, 2021
A roundup of smaller stories in the Java ecosystem from the week of February 15th, 2021.
-
Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer
Researchers at Google Brain have open-sourced the Switch Transformer, a natural-language processing (NLP) AI model. The model scales up to 1.6T parameters and trains up to 7x faster than the T5 NLP model, with comparable accuracy.
-
Microsoft Announces Limited Access to Its Neural Text-to-Speech AI
Recently, Microsoft announced limited access to its neural text-to-speech AI called Custom Neural Voice. The service allows developers to create custom synthetic voices.
-
QCon Plus (May 17-28) Program Committee and Conference Chair Announced
This May at QCon Plus, over 1,500 senior software engineers, architects, and team leads will discuss emerging software trends and practices, develop their technical and non-technical skills, and get valuable insights they can take home to their teams to implement right away.
-
PayPal Standardizes on Apache Airflow and Apache Gobblin for Its Next-Gen Data Movement Platform
PayPal recently described how it standardized on Apache Airflow and Apache Gobblin for implementing its next-gen data movement platform. In a recent blog post, PayPal engineers detail how the existing data movement platform had evolved into a complex, unmanageable ecosystem of many tools and platforms, and how they shifted towards a new implementation.
-
Pinterest Describes an Architecture for Efficient Retrieval of Hierarchical Documents
In a recent blog post, Pinterest engineers describe how they implemented an efficient two-stage retrieval architecture to retrieve hierarchical documents in a home-grown search engine. They accomplished it by combining index flattening, index normalization, and index denormalization.
-
AWS Announces Amazon Aurora Supports PostgreSQL 12
AWS has recently announced that Amazon Aurora, a MySQL- and PostgreSQL-compatible relational database built for the cloud, now supports major version 12 of PostgreSQL.
-
Kaggle Publishes 2020 State of Machine Learning and Data Science Report
Kaggle has published a report on the State of Machine Learning and Data Science for 2020. The report is based on survey responses from over two thousand users currently employed as data scientists. The report notes that the "vast majority" of data scientists are under 35 years of age, two-thirds have a graduate degree, and most have less than 10 years of coding experience.
-
OpenAI Announces GPT-3 Model for Image Generation
OpenAI has trained a 12B-parameter AI model based on GPT-3 that can generate images from a textual description. The description can specify many independent attributes, including the position of objects as well as image perspective, and the model can also synthesize combinations of objects that do not exist in the real world.
-
AWS Announces Enhanced Console Experience and New v2 APIs for Amazon Lex
AWS recently announced updates to Amazon Lex, a service for building conversational interfaces into any application using voice and text. The service now has an enhanced management console and new V2 APIs, including continuous streaming capability.
-
Using Language and Developer Friendly Data Structures with Couchbase
Couchbase APIs have evolved to provide programming-language-friendly data structures, making them easier for programmers to incorporate into their programs. Some examples highlight how to use these data structures with the Couchbase Python SDK.
-
Confluent Announces Strategic Alliance with Microsoft
Confluent, the company founded by the creators of Apache Kafka, recently announced a new strategic alliance with Microsoft to enable a more integrated experience between Confluent Cloud and the Azure platform.