Big Data Content on InfoQ

Articles

AI, ML & Data Engineering

COVID-19 and Mining Social Media - Enabling Machine Learning Workloads with Big Data

In this article, author Adi Polak discusses how to enable machine learning workloads with big data to query and analyze COVID-19 tweets and understand social sentiment towards COVID-19.

Adi Polak
on Oct 02, 2020
Cloud

From Cloud to Cloudlets: a New Approach to Data Processing?

The growing popularity of small, distributed clouds, or “cloudlets” is an implicit recognition of the limitations of the “traditional” cloud model, and could signal a major shift in the way that data is collected, stored, and processed.

Sam Bocetta
on Oct 01, 2020
Cloud

Combining DataOps and DevOps: Scale at Speed

DataOps is an extension of DevOps standards and processes into the data analytics world. It's about streamlining the processes involved in processing, analyzing and deriving value from big data.

Sam Bocetta
on Aug 14, 2020
AI, ML & Data Engineering

Data Leadership Book Review and Interview

The book Data Leadership, authored by Anthony Algmin, covers how data leaders should manage and govern the data management programs in their organizations. Data leadership is how organizations choose to apply their energy and resources toward creating data capabilities to influence their business.

Srini Penchikala Anthony Algmin
on Jul 25, 2020
Java

Apache Arrow and Java: Lightning Speed Big Data Transfer

Apache Arrow puts forward a cross-language, cross-platform, columnar in-memory data format. It is designed to eliminate the need for data serialization and reduce the overhead of copying.

Joris Gillis
on May 23, 2020
Culture & Methods

Data Analytics in the World of Agility

Is it all about customer-centric business, or is there any data left? Can we integrate data analytics and customer empathy? This article explores how we can move towards a more customer-centric business and what information we require in order to understand the most valuable thing we have: our customer.

Almudena Rodriguez Pardo
on Sep 06, 2019
AI, ML & Data Engineering

Stream Processing Anomaly Detection Using Yurita Framework

In this article, author Guy Gerson discusses Yurita, the stream processing anomaly detection framework developed at PayPal. The framework is based on Spark Structured Streaming.

Guy Gerson
on Jul 10, 2019
AI, ML & Data Engineering

Real-Time Data Processing Using Redis Streams and Apache Spark Structured Streaming

Structured Streaming, introduced with Apache Spark 2.0, delivers a SQL-like interface for streaming data. Redis Streams enables Redis to consume, hold and distribute streaming data between multiple producers and consumers. In this article, author Roshan Kumar walks us through how to process streaming data in real time using Redis and Apache Spark Streaming technologies.

Roshan Kumar
on May 13, 2019
AI, ML & Data Engineering

Conquering the Challenges of Data Preparation for Predictive Maintenance

Predictive maintenance (PdM) applications aim to apply machine learning (ML) on IIoT datasets in order to reduce occupational hazards, machine downtime, and other costs. In this article, the author addresses some of the data preparation challenges faced by the industrial practitioners of ML and the solutions for data ingest and feature engineering related to PdM.

Ian Downard
on Jan 04, 2019
AI, ML & Data Engineering

Analytics Zoo: Unified Analytics + AI Platform for Distributed Tensorflow, and BigDL on Apache Spark

In this article, author Jason Dai describes how Analytics Zoo can help real-world users build end-to-end deep learning pipelines for big data, including unified pipelines for distributed TensorFlow and Keras on Apache Spark, easy-to-use abstractions such as transfer learning and Spark ML pipeline support, and built-in deep learning models and reference use cases.

Jason Dai
on Dec 11, 2018
AI, ML & Data Engineering

Sentiment Analysis: What's with the Tone?

Sentiment analysis is widely applied in voice of the customer (VOC) applications. In this article, the authors discuss NLP-based sentiment analysis using machine learning (ML) and lexicon-based approaches with KNIME data analysis tools.

Rosaria Silipo Kathrin Melcher
on Nov 27, 2018
AI, ML & Data Engineering

Spark Application Performance Monitoring Using Uber JVM Profiler, InfluxDB and Grafana

In this article, author Amit Baghel discusses how to monitor the performance of Apache Spark based applications using Uber JVM Profiler, the InfluxDB database, and the Grafana data visualization tool.

Amit Baghel
on Nov 18, 2018
