AI, ML & Data Engineering Content on InfoQ
-
Peter Cnudde on How Yahoo Uses Hadoop, Deep Learning and Big Data Platform
Yahoo uses Hadoop for a range of big data and machine learning use cases, and applies deep learning techniques in products such as Flickr. InfoQ spoke with Peter Cnudde about how Yahoo leverages big data platform technologies.
-
A Quick Primer on Isolation Levels and Dirty Reads
Recently MongoDB found itself at the top of Reddit again when developer David Glasser learned the hard way that MongoDB performs dirty reads by default. In this article we will explain what isolation levels and dirty reads are and how they are implemented in popular databases.
-
Traffic Data Monitoring Using IoT, Kafka and Spark Streaming
The Internet of Things (IoT) is an emerging disruptive technology and a topic of growing interest. One area of IoT application is connected vehicles. In this article we'll use Apache Spark and Kafka to analyse and process data from IoT-connected vehicles and send the processed data to a real-time traffic monitoring dashboard.
-
RXJava by Example
In the ongoing evolution of paradigms for simplifying concurrency under load, the most promising addition is reactive programming, a specification that provides tools for handling asynchronous streams of data and for managing flow-control, making it easier to reason about overall program design. In this article we overcome the learning curve with a gentle progression of examples.
-
Big Data Processing with Apache Spark - Part 5: Spark ML Data Pipelines
With support for machine learning data pipelines, the Apache Spark framework is a great choice for building a unified use case that combines ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of the Apache Spark article series, author Srini Penchikala discusses the Spark ML package and how to use it to create and manage machine learning data pipelines.
-
The InfoQ Podcast: Cathy O'Neil on Pernicious Machine Learning Algorithms and How to Audit Them
In this week's podcast InfoQ’s editor-in-chief Charles Humble talks to Data Scientist Cathy O’Neil. Topics discussed include her book “Weapons of Math Destruction,” predictive policing models, the teacher value added model, approaches to auditing algorithms and whether government regulation of the field is needed.
-
Spark GraphX in Action Book Review and Interview
“Spark GraphX in Action”, a book from Manning Publications authored by Michael Malak and Robin East, provides tutorial-based coverage of Spark GraphX, the graph data processing library of the Apache Spark framework. InfoQ spoke with the authors about the book and the Spark GraphX library, as well as the overall Spark framework and what's coming up in graph data processing and analytics.
-
Book Review: Cathy O’Neil’s Weapons of Math Destruction
“Big Data has plenty of evangelists, but I’m not one of them,” writes Cathy O’Neil, a blogger (mathbabe.org) and former quantitative analyst at the hedge fund DE Shaw who became sufficiently disillusioned with her hedge fund modelling that she joined the Occupy movement.
-
Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines
InfoQ interviews Chris Fregly, organizer of the 4000+ member Advanced Spark and TensorFlow Meetup, about the PANCAKE STACK workshop, Spark, and building data pipelines for machine learning.
-
Christine Doig on Data Science as a Team Discipline
Christine Doig spoke at this year's OSCON Conference about data science as a team discipline and how to navigate the data science Python ecosystem. InfoQ spoke with Christine about challenges data science teams need to address to be more effective.
-
Virtual Panel: Current State of NoSQL Databases
NoSQL databases have been around for several years now and have become a common choice of data storage for managing semi-structured and unstructured data. These databases offer a lot of advantages in terms of linear scalability and better performance for both data writes and reads. InfoQ spoke with four panelists to get different perspectives on the current state of NoSQL databases.
-
Key Takeaway Points and Lessons Learned from QCon New York 2016
The fifth annual QCon New York was the biggest yet, bringing together over 800 team leads, architects, project managers, and engineering directors. In total, over 140 practitioner-speakers presented 79 full-length technical sessions and 16 in-depth tutorials, providing deep insights into real-world architectures and state of the art software development practices from a practitioner’s perspective.