With support for machine learning data pipelines, the Apache Spark framework is a great choice for building a unified use case that combines ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of the Apache Spark article series, author Srini Penchikala discusses the Spark ML package and how to use it to create and manage machine learning data pipelines.
The book “Spark GraphX in Action” from Manning Publications, authored by Michael Malak and Robin East, provides tutorial-based coverage of Spark GraphX, the graph data processing library of the Apache Spark framework. InfoQ spoke with the authors about the book and the Spark GraphX library, as well as the overall Spark framework and what's coming up in graph data processing and analytics.
Containers are just around the corner for the Windows community, and this article takes a closer look at using SQL Server containers.
InfoQ interviews Chris Fregly, organizer of the 4000+ member Advanced Spark and TensorFlow Meetup, about the PANCAKE STACK workshop, Spark, and building data pipelines for machine learning.
Christine Doig spoke at the OSCON conference about data science as a team discipline and how to navigate the Python data science ecosystem. InfoQ spoke with Christine about the challenges of data science teams.
Kostiantyn Cherniavskyi looks at some of the issues surrounding the object-relational impedance mismatch and how many of them can be solved with hybrid databases such as Starcounter.
NoSQL databases have been around for several years and have become a preferred choice for managing unstructured data. InfoQ spoke with four panelists about the current state of NoSQL databases.
Big Data Analytics with Spark, authored by Mohammed Guller, provides a practical guide for learning Apache Spark. InfoQ and the author discuss the book and development tools for big data applications.
It makes no difference how hard you try; some form of lock-in is unavoidable. What matters most is understanding the layers of lock-in, and how to assess and reduce your switching costs.
DataStax recently announced DataStax Graph to support graph data models. InfoQ spoke with Martin Van Ryswyk from the DataStax team about the new product.
In this fourth installment of the Apache Spark article series, author Srini Penchikala discusses machine learning concepts and the Spark MLlib library for running predictive analytics using a sample application.
Reveno is a powerful, performant new JVM-based, lock-free transaction processing framework based on CQRS and event-sourcing patterns. In this article we develop a simple trading system using Reveno.