“Spark GraphX in Action” from Manning Publications, authored by Michael Malak and Robin East, provides tutorial-based coverage of Spark GraphX, the graph data processing library of the Apache Spark framework. InfoQ spoke with the authors about the book, the Spark GraphX library, the Spark framework overall, and what's coming up in graph data processing and analytics.
Christine Doig spoke at this year's OSCON Conference about data science as a team discipline and how to navigate the Python data science ecosystem. InfoQ spoke with Christine about the challenges data science teams need to address to be more effective.
Big Data Analytics with Spark, authored by Mohammed Guller, provides a practical guide for learning Apache Spark. InfoQ and the author discuss the book and development tools for big data applications.
In this fourth installment of the Apache Spark article series, author Srini Penchikala discusses machine learning concepts and the Spark MLlib library for running predictive analytics, using a sample application.
Data science has been getting a lot of attention as organizations start to use data analytics to gain insights into their data. This article takes a closer look at the Data Scientist role in 2016.
Current enterprise data architectures include NoSQL databases co-existing with RDBMS. In this article, the author discusses a solution for managing NoSQL and relational data using unified data modeling.
Our physical world is about to become digitally enabled, and according to various predictions, many billions of IoT devices will go online and collect data in the coming years.
In this article, the third installment of the Apache Spark series, the author discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.
In this article, the author discusses the survival prediction of colorectal cancer as a multi-class classification problem and how to solve that problem using Apache Spark's MLlib Java API.
Data Lake-as-a-Service provides big data processing in the cloud for business outcomes in a cost-effective way. InfoQ spoke with Lovan Chetty and Hannah Smalltree from Cazena about how these solutions work.
In this article, the author discusses a bioinformatics software-as-a-service (SaaS) product that was built as a public data warehousing and analytical platform for mass spectrometry data.