In this article, author Srini Penchikala discusses Apache Spark GraphX library used for graph data processing and analytics. The article includes sample code for graph algorithms like PageRank, Connected Components and Triangle Counting.
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) answer three questions: What major challenges do you face when building scalable, big data systems? How do you address these challenges? Where should the research community focus its efforts to create tools and approaches for building highly reliable, scalable, big data systems?
This article compares different alternative techniques to prepare data, including extract-transform-load (ETL) batch processing, streaming ingestion and data wrangling. The article also discusses how this is related to visual analytics, and best practices for how different user roles such as the Data Scientist or Business Analyst should work together to build analytic models.
Advice on the best talks to attend at QCon London 2017 from London Thought Leaders.
InfoQ talked with Immuta’s Andrew Burt and Steve Touw, to better understand the implications and challenges of the EU's Global Data Protection Regulation, which will come into effect in May 2018.
Cassandra: The Definitive Guide, 2nd Edition book authored by Jeff Carpenter and Eben Hewitt covers the Cassandra NoSQL database version 3.0. InfoQ spoke with the co-author Jeff Carpenter.
In this series we explore ways of making sense of data science - understanding where it’s needed and where it’s not, and how to make it an asset for you, from people who’ve been there and done it.
Yahoo uses Hadoop for different use cases in big data & machine learning areas. InfoQ spoke with Peter Cnudde on how Yahoo leverages big data technologies.
Internet of Things (IoT) is an emerging technology. One of the areas of IoT is the connected vehicles. In this article, we'll use Spark and Kafka to analyse and process IoT connected vehicle's data. 8
In this fifth installment of Apache Spark article series, author Srini Penchikala discusses Spark ML package and how to use it to create and manage machine learning data pipelines. 2
InfoQ spoke with authors of Spark GraphX in Action book, Apache Spark framework and what's coming up in the area of graph data processing and analytics.
InfoQ interviews Chris Fregly, organizer for the 4000+ member Advanced Spark and TensorFlow Meetup about the PANCAKE STACK workshop, Spark and building data pipelines for a machine learning pipeline