Big Data Content on InfoQ

News

Architecture & Design

Designing for Failure in the BBC's Analytics Platform

Last week at InfoQ Live, Blanca Garcia-Gil, principal systems engineer at the BBC, gave a session on Evolving Analytics in the Data Platform. During this session, Garcia-Gil focused on how her team prepared and designed for two types of failure: "known unknowns" and "unknown unknowns."

Eran Stiller
on Feb 24, 2021
Cloud

Google Brings Databricks to Its Cloud Platform

Google recently announced a partnership with Databricks to bring Databricks' fully managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.

Steef-Jan Wiggers
on Feb 23, 2021
Architecture & Design

PayPal Standardizes on Apache Airflow and Apache Gobblin for Its Next-Gen Data Movement Platform

PayPal recently described how it standardized on Apache Airflow and Apache Gobblin for implementing its next-gen data movement platform. In a recent blog post, PayPal engineers detail how the existing data movement platform had sprawled into a complex and unmanageable ecosystem of many tools and platforms, and how they shifted towards a new implementation.

Eran Stiller
on Feb 10, 2021
Culture & Methods

Analyzing Large Amounts of Feedback to Learn from Users

Making it easy for users to give feedback and automating the collection of feedback helps to get more feedback faster. Using artificial intelligence, you can analyze large amounts of feedback to get insights and visualize trends. Sharing this information widely supports taking action to enhance your product and solve issues that users are having.

Ben Linders
on Dec 24, 2020
Cloud

Google Announces a New, More Services-Based Architecture Called Runner V2 to Dataflow

Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform (GCP). In a recent blog post, Google announced a new, more services-based architecture for Dataflow called Runner v2, which will include multi-language support for all of its language SDKs.

Steef-Jan Wiggers
on Aug 30, 2020
AI, ML & Data Engineering

Spark AI Summit 2020 Highlights: Innovations to Improve Spark 3.0 Performance

The highlights of the recent Spark AI Summit 2020, held online for the first time, were innovations to improve Apache Spark 3.0 performance, including optimizations for Spark SQL and GPU acceleration.

Carol McDonald
on Jul 03, 2020
DevOps

Splunk Launches New Release of SignalFx APM

Splunk, a platform for searching, monitoring, and examining machine-generated big data, has launched a new release of its application monitoring tool, SignalFx Microservices APM™. The new release combines NoSample™ tracing, open-standards-based instrumentation and artificial intelligence (AI)-driven directed troubleshooting from SignalFx and Omnition into a single solution.

Helen Beal
on Apr 30, 2020
AI, ML & Data Engineering

Compliance and the California Privacy Act - the Empire Strikes Back

On January 1, 2020, the California Consumer Privacy Act (CCPA) came into effect. Many companies have not complied with the law, and the long-term effects of the legislation are unclear.

Michael Stiefel
on Feb 10, 2020
Architecture & Design

The Distributed Data Mesh as a Solution to Centralized Data Monoliths

Instead of building large, centralized data platforms, corporations and data architects should create distributed data meshes.

Thomas Betts
on Jan 31, 2020
Cloud

Simplifying ETL in the Cloud, Microsoft Releases Azure Data Factory Mapping Data Flows

In a recent blog post, Microsoft announced the general availability (GA) of Mapping Data Flows, a serverless, code-free Extract-Transform-Load (ETL) capability inside Azure Data Factory. This tool allows organizations to embrace a data-driven culture without the need to manage large infrastructure footprints while having the ability to dynamically scale data processing workloads.

Kent Weare
on Oct 21, 2019
Cloud

Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is an open-source data and analytic processing service based on Hadoop and Spark. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which provides customers with a more efficient method to process data across platforms.

Steef-Jan Wiggers
on Sep 23, 2019
AI, ML & Data Engineering

Jagadish Venkatraman on LinkedIn's Journey to Samza 1.0

At the recent ApacheCon North America, Jagadish Venkatraman spoke about how LinkedIn developed Apache Samza 1.0 to handle stream processing at scale. He described LinkedIn's use cases involving trillions of events and petabytes of data, then highlighted the features added for the 1.0 release, including: stateful processing, high-level APIs, and a flexible deployment model.

Anthony Alford
on Sep 14, 2019
AI, ML & Data Engineering

ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes

At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators to provide control planes for running Apache software in a Kubernetes cluster.

Anthony Alford
on Sep 13, 2019
Cloud

Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to replace their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent defaults.

Kent Weare
on Sep 09, 2019
Architecture & Design

An Introduction to Structured Data at Etsy

Etsy recently published a blog post detailing how it stores and manages structured data. The Etsy team makes extensive use of taxonomies and stores the structured data in JSON files.

Alex Giamas
on Aug 31, 2019
