Data Content on InfoQ
-
AWS Announced Synthetic Data Generation for SageMaker Ground Truth
AWS announced that users can now create labeled synthetic data with Amazon SageMaker Ground Truth. SageMaker Ground Truth is a data labeling service that simplifies labeling and lets you choose among human annotators from third-party suppliers, Amazon Mechanical Turk, or your own private workforce.
-
Google's BigQuery Introduces Column-Level Encryption Functions and Dynamic Masking of Information
Google recently released new features for its SaaS data warehouse BigQuery, including column-level encryption functions and dynamic masking of information. Dynamic masking can be used for real-time transactions, whereas column-level encryption provides additional security for data at rest or in motion where real-time usability is not required.
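As a rough illustration of the idea behind dynamic masking, the sketch below masks a column at read time depending on the reader's role while the stored data stays unchanged. This is a conceptual example in plain Python, not BigQuery's API: the names `mask_email`, `read_column`, and the `pii_reader` role are hypothetical.

```python
from typing import Dict, List, Set


def mask_email(value: str) -> str:
    """Hide the local part of an email address but keep the domain."""
    _, _, domain = value.partition("@")
    return "XXXXX@" + domain


def read_column(
    rows: List[Dict[str, str]],
    column: str,
    reader_roles: Set[str],
    unmasked_role: str = "pii_reader",  # hypothetical role name
) -> List[str]:
    """Return one column, masking values unless the reader holds unmasked_role."""
    values = [row[column] for row in rows]
    if unmasked_role in reader_roles:
        return values
    # Masking is applied to the returned copy only; the rows are not modified.
    return [mask_email(v) for v in values]


customers = [
    {"id": "1", "email": "ana@example.com"},
    {"id": "2", "email": "bo@example.org"},
]

analyst_view = read_column(customers, "email", {"analyst"})
auditor_view = read_column(customers, "email", {"pii_reader"})
```

Here `analyst_view` holds masked addresses and `auditor_view` holds the original ones, while `customers` itself is never altered.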
-
Microsoft's New Simulation Framework FLUTE Accelerates Federated Learning Algorithm Development
Microsoft Research has recently released Federated Learning Utilities and Tools for Experimentation (FLUTE), a new simulation framework to accelerate federated learning ML algorithm development. The main goal of federated learning is to train complex machine-learning models over massive amounts of data without the need to share that data in a centralized location.
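To make the goal of federated learning concrete, here is a minimal sketch of federated averaging in plain Python. It does not use FLUTE and is not its API. It assumes a toy one-parameter model `y = w * x`, and the function names and client data are invented for illustration. Each client trains on its own data, and only the resulting weight and sample count reach the server.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one client


def local_update(weight: float, data: List[Sample],
                 lr: float = 0.1, epochs: int = 20) -> float:
    """Run gradient descent on one client's data for the model y = weight * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(global_weight: float, clients: List[List[Sample]],
                      rounds: int = 10) -> float:
    """Average locally trained weights, weighted by each client's sample count."""
    for _ in range(rounds):
        # Each client starts from the current global weight; raw data stays local.
        updates = [(local_update(global_weight, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        global_weight = sum(w * n for w, n in updates) / total
    return global_weight


# Hypothetical clients whose private data all follow y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

final_weight = federated_average(0.0, clients)
print(f"learned weight: {final_weight:.4f}")
```

The server in this sketch never sees an `(x, y)` pair, only per-client weights and counts, which is the property the blurb describes.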
-
Meta AI’s New Data Set to Accelerate Renewable Energy Catalyst Discovery for Hydrogen Fuel
Meta AI recently announced that it will soon release an entirely new data set for green hydrogen fuel ML modeling and simulation, focused on oxide catalysts for the oxygen evolution reaction (OER), a critical chemical reaction used in green hydrogen fuel production via wind and solar energy.
-
Microsoft Introduces Open Data for Social Impact Framework
Microsoft recently introduced the Open Data for Social Impact Framework, a guide to help organizations put data to work to get new insights, make better decisions, and improve efficiency while tackling pressing social issues. The framework includes a five-step roadmap that organizations can use to get started.
-
Orchestrate Operations, Validations, and Approvals on Data Entities with Azure Purview Workflows
Recently, Microsoft announced the preview of Azure Purview Workflows, which let customers orchestrate the create, update, and delete operations, validation, and approval of data entities using repeatable business processes.
-
Get Consistent Access to Third-Party APIs with AWS Data Exchange for APIs
During the recent AWS re:Invent in Las Vegas, the company announced the AWS Data Exchange for APIs. This new capability enables customers to find, subscribe to, and use third-party API products from providers on AWS Data Exchange.
-
Apache Spark Brings Pandas API with Version 3.2
The Apache Spark team has integrated the Pandas API in the product's latest 3.2 release. With this change, dataframe processing can be scaled to multiple clusters or multiple processors in a single machine using the PySpark execution engine.
-
Cloudera Announces the General Availability of Cloudera DataFlow for the Public Cloud
The enterprise data cloud company Cloudera recently announced the general availability (GA) of Cloudera DataFlow for the Public Cloud, a cloud-native service for data flows to process hybrid streaming workloads on the Cloudera Data Platform (CDP).
-
Microsoft Renames Its Azure for FHIR API to Azure Healthcare APIs
Microsoft recently announced the renaming of its Cloud for Healthcare's Azure API for Fast Healthcare Interoperability Resources (FHIR) to "Azure Healthcare APIs." In addition to the renaming, the company expanded support for healthcare data to include patient health data via FHIR, medical imaging data via DICOM, and medical device data via the Azure IoT Connector for FHIR.
-
Perceiver: One Neural-Network Model for Multiple Input Data Types
Google’s DeepMind has recently released a state-of-the-art deep-learning model called Perceiver that receives and processes multiple types of input data, ranging from audio to images, similarly to how the human brain perceives multimodal data. Perceiver is able to receive and classify multiple input data types, namely point clouds, audio, and images.
-
The Journey from Monolith to Microservices at GitHub: QCon Plus Q&A
GitHub needed to fundamentally rethink how they did software development due to all of the different cultures, norms, and technology stacks that their teams brought to the table. They are migrating toward a microservices architecture that enables different teams and systems and technologies to work harmoniously together.
-
Google Announces a New, More Services-Based Architecture Called Runner V2 to Dataflow
Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform (GCP). In a recent blog post, Google announced a new, more services-based architecture for Dataflow called Runner v2, which will include multi-language support for all of its language SDKs.
-
The Distributed Data Mesh as a Solution to Centralized Data Monoliths
Instead of building large, centralized data platforms, corporations and data architects should create distributed data meshes.
-
Data Science at the Intersection of Emerging Technologies
Kirk Borne, principal data scientist at Booz Allen Hamilton, gave a keynote presentation at this year’s Oracle Code One Conference on how the connections between emerging technologies, data, and machine learning are transforming data into value. Emerging technological innovations such as AI, robotics, and computer vision are enabled by data and create value from data.