Big Data Processing with Apache Spark - Part 4: Spark Machine Learning

Posted by Srini Penchikala on May 15, 2016

In this fourth installment of the Apache Spark article series, author Srini Penchikala discusses machine learning concepts and the Spark MLlib library for running predictive analytics, using a sample application.

The Role of a Data Scientist in 2016

Posted by Ed Jones on Mar 27, 2016

Data science has been getting a lot of attention as organizations start to use data analytics to gain insights into their data. This article takes a closer look at the Data Scientist role in 2016.

Unified Data Modeling for Relational and NoSQL Databases

Posted by Allen Wang on Feb 28, 2016

Current enterprise data architectures include NoSQL databases co-existing with RDBMSs. In this article, the author discusses a solution for managing NoSQL and relational data using unified data modeling.

Getting Ready for IoT’s Big Data Challenges with Couchbase Mobile

Posted by Ralph Winzinger on Jan 20, 2016

Our physical world is about to become digitally enabled and, according to various predictions, many billions of IoT devices will go online and collect data in the coming years.

Big Data Processing with Apache Spark - Part 3: Spark Streaming

Posted by Srini Penchikala on Jan 07, 2016

In this article, the third installment of the Apache Spark series, the author discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.

Health Informatics and Survival Prediction of Cancer with Apache Spark Machine Learning Library

Posted by Konur Unyelioglu on Dec 22, 2015

In this article, the author discusses survival prediction of colorectal cancer as a multi-class classification problem and how to solve that problem using Apache Spark's MLlib Java API.

Data Lake-as-a-Service: Big Data Processing and Analytics in the Cloud

Posted by Srini Penchikala on Dec 10, 2015

Data Lake-as-a-Service provides big data processing in the cloud for business outcomes in a cost-effective way. InfoQ spoke with Lovan Chetty and Hannah Smalltree from Cazena about how these solutions work.

Real-time Data Processing in AWS Cloud

Posted by Oleksii Tymchenko on Nov 11, 2015

In this article, the author discusses a bioinformatics software as a service (SaaS) product that was built as a public data warehousing and analytical platform for mass spectrometry data.

Oozie Plugin for Eclipse

Posted by Ahmed Mahran on Oct 30, 2015

A new Eclipse Oozie plugin significantly simplifies the implementation of Oozie processes by allowing them to be defined graphically. This article introduces the plugin and provides an example of its usage.