Big Data Content on InfoQ

News

AI, ML & Data Engineering

Compliance and the California Privacy Act - the Empire Strikes Back

On January 1, 2020, the California Consumer Privacy Act (CCPA) came into effect. Many companies have not yet complied with the law, and its long-term effects remain unclear.

Michael Stiefel
on Feb 10, 2020
Architecture & Design

The Distributed Data Mesh as a Solution to Centralized Data Monoliths

Instead of building large, centralized data platforms, corporations and data architects should create distributed data meshes.

Thomas Betts
on Jan 31, 2020
Cloud

Simplifying ETL in the Cloud, Microsoft Releases Azure Data Factory Mapping Data Flows

In a recent blog post, Microsoft announced the general availability (GA) of Mapping Data Flows, its serverless, code-free Extract-Transform-Load (ETL) capability in Azure Data Factory. The tool lets organizations embrace a data-driven culture without managing large infrastructure footprints, while still being able to scale data-processing workloads dynamically.

Kent Weare
on Oct 21, 2019
Cloud

Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is a managed data and analytics processing service based on the open-source Hadoop and Spark frameworks. Google recently announced the alpha availability of Cloud Dataproc for Kubernetes, which gives customers a more efficient way to process data across platforms.

Steef-Jan Wiggers
on Sep 23, 2019
AI, ML & Data Engineering

Jagadish Venkatraman on LinkedIn's Journey to Samza 1.0

At the recent ApacheCon North America, Jagadish Venkatraman spoke about how LinkedIn developed Apache Samza 1.0 to handle stream processing at scale. He described LinkedIn's use cases involving trillions of events and petabytes of data, then highlighted the features added in the 1.0 release, including stateful processing, high-level APIs, and a flexible deployment model.

Anthony Alford
on Sep 14, 2019
AI, ML & Data Engineering

ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes

At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators that provide control planes for running Apache software in a Kubernetes cluster.

Anthony Alford
on Sep 13, 2019
Cloud

Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This capability allows organizations to replace their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may see increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization, and intelligent defaults.

Kent Weare
on Sep 09, 2019
Architecture & Design

An Introduction to Structured Data at Etsy

Etsy recently published a blog post detailing how it stores and manages structured data. The Etsy team makes extensive use of taxonomies and stores the structured data in JSON files.

Alex Giamas
on Aug 31, 2019
Cloud

Amazon Releases AWS Lake Formation to General Availability

Recently, Amazon announced the general availability (GA) of AWS Lake Formation, a fully managed service that makes it much easier for customers to build, secure, and manage data lakes.

Steef-Jan Wiggers
on Aug 13, 2019
AI, ML & Data Engineering

The First AI to Beat Pros in 6-Player Poker, Developed by Facebook and Carnegie Mellon

Facebook AI Research's Noam Brown and Carnegie Mellon professor Tuomas Sandholm recently announced Pluribus, the first artificial intelligence program able to beat professional players at six-player no-limit Texas hold'em poker. Over the past decades, computers have progressively overtaken humans at checkers, chess, Go, and the TV quiz show Jeopardy!. Poker poses additional challenges because of information asymmetry and bluffing.

Alex Giamas
on Jul 31, 2019
Cloud

Microsoft Announces Public Preview of Azure Data Share

Microsoft has announced the public preview of Azure Data Share, which provides capabilities to share data with users in their own organization as well as with other organizations. Microsoft essentially positions the new service as a big data tool, though it is also possible to share individual files.

Eldert Grootenboer
on Jul 23, 2019
Cloud

Amazon Personalize Is Now Generally Available, Bringing ML to Customers

Amazon Personalize, first announced at AWS re:Invent last November, is now generally available to all AWS customers. With this service, developers can add custom machine learning models to their applications, including models for personalized product recommendations, search results, and direct marketing, even if they do not have much machine learning experience.

Steef-Jan Wiggers
on Jun 25, 2019
AI, ML & Data Engineering

Los Angeles CTO Roundtable about AI and Data

The recent "Leaders in Data CTO Roundtable" in Los Angeles included discussions about an artificial intelligence (AI) framework/platform for business, data in the next five years, data software stacks, and acquiring data talent.

Aslan Brooke
on Jun 17, 2019
AI, ML & Data Engineering

Databricks Open Sources Delta Lake to Make Data Lakes More Reliable

Databricks recently announced that it has open-sourced Delta Lake, its previously proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company founded by the creators of Apache Spark, and Delta Lake is already used by companies such as McAfee and Upwork. Delta Lake addresses the heterogeneous data problem that data lakes often have...

Alex Giamas
on May 20, 2019
AI, ML & Data Engineering

A Framework for High-Value Big Data

Asha Saxena recently spoke at the Enterprise Data World 2019 Conference about the value that big data analytics initiatives bring to organizations. Saxena proposed a big data framework that can help organizations develop maturity and internal competencies.

Srini Penchikala
on Apr 03, 2019
