Big Data Content on InfoQ
-
Three Experts on Big Data Engineering
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) answer three questions: What major challenges do you face when building scalable big data systems? How do you address these challenges? Where should the research community focus its efforts to create tools and approaches for building highly reliable, scalable big data systems?
-
Learning Paths: QCon London Expert Recommendations
Advice from London thought leaders on the best talks to attend at QCon London 2017.
-
Q&A with Immuta on the Implications of EU’s General Data Protection Regulation (GDPR)
InfoQ talked with Immuta’s Andrew Burt and Steve Touw to better understand the implications and challenges of the EU’s General Data Protection Regulation, which will come into effect in May 2018.
-
Cassandra: The Definitive Guide, 2nd Edition Book Review and Interview
Cassandra: The Definitive Guide, 2nd Edition, authored by Jeff Carpenter and Eben Hewitt, covers version 3.0 of the Cassandra NoSQL database. The authors discuss several important topics related to this popular database, including data modeling and Cassandra's architecture. InfoQ spoke with Jeff Carpenter about the book and about the database's current features and future roadmap.
-
Article Series: Getting a Handle on Data Science as a Software Developer
Software developers and managers are realizing that they need data science skills to tackle pressing problems. In this series, field experts provide guidance to help us navigate the available data analysis options. They explore how to recognize where data science is needed and where it's not, and how to turn it into an asset.
-
Peter Cnudde on How Yahoo Uses Hadoop, Deep Learning and Big Data Platform
Yahoo uses Hadoop for a range of use cases in big data and machine learning. The company also applies deep learning techniques in products such as Flickr. InfoQ spoke with Peter Cnudde about how Yahoo leverages big data platform technologies.
-
Traffic Data Monitoring Using IoT, Kafka and Spark Streaming
The Internet of Things (IoT) is an emerging disruptive technology and an increasingly popular topic. One area of IoT application is connected vehicles. In this article we'll use Apache Spark and Kafka to analyse and process data from IoT-connected vehicles and send the processed data to a real-time traffic monitoring dashboard.
-
Big Data Processing with Apache Spark - Part 5: Spark ML Data Pipelines
With its support for machine learning data pipelines, the Apache Spark framework is a great choice for building unified applications that combine ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of the Apache Spark article series, author Srini Penchikala discusses the Spark ML package and how to use it to create and manage machine learning data pipelines.
-
Spark GraphX in Action Book Review and Interview
“Spark GraphX in Action” from Manning Publications, authored by Michael Malak and Robin East, provides tutorial-based coverage of Spark GraphX, the graph data processing library of the Apache Spark framework. InfoQ spoke with the authors about the book and the Spark GraphX library, as well as the overall Spark framework and what's coming up in graph data processing and analytics.
-
Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines
InfoQ interviews Chris Fregly, organizer of the 4,000+ member Advanced Spark and TensorFlow Meetup, about the PANCAKE STACK workshop, Spark, and building data pipelines for machine learning.
-
Christine Doig on Data Science as a Team Discipline
Christine Doig spoke at this year's OSCON about data science as a team discipline and how to navigate the Python data science ecosystem. InfoQ spoke with Christine about the challenges data science teams need to address to be more effective.
-
Big Data Analytics with Spark Book Review and Interview
Big Data Analytics with Spark, authored by Mohammed Guller, is a practical guide to learning the Apache Spark framework for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis, as well as machine learning. InfoQ spoke with the author about the book and development tools for big data applications.