InfoQ Homepage AI, ML & Data Engineering Content on InfoQ
-
Case Study: Selecting Big Data and Data Science Technologies at a large Financial Organisation
Adopting Big Data and Data Science technologies into an organisation is a transformative project similar to an agile transformation and with many similar challenges. In this article, the author describes such a project for a FTSE100 financial services company.
-
Solving Business Problems with Data Science
Enterprises are increasingly realising that many of their most pressing business problems could be tackled with the application of a little data science. This article, the first in a series, looks at the foundations of a successful business-orientated data science project.
-
Testing RxJava
You are ready to explore reactive opportunities in your code but you are wondering how to test out the reactive idiom in your codebase. In this article Java Champion Andres Almiray provides techniques and tools for testing RxJava.
-
Peter Cnudde on How Yahoo Uses Hadoop, Deep Learning and Big Data Platform
Yahoo uses Hadoop for different use cases in big data & machine learning areas. They also use deep learning techniques in their products like Flickr. InfoQ spoke with Peter Cnudde on how Yahoo leverages big data platform technologies.
-
A Quick Primer on Isolation Levels and Dirty Reads
Recently MongoDB found itself at the top of Reddit again when developer David Glasser learned the hard way that MongoDB performs dirty reads by default. In this article we will explain what isolation levels and dirty reads are and how they are implemented in popular databases.
-
Traffic Data Monitoring Using IoT, Kafka and Spark Streaming
Internet of Things (IoT) is an emerging disruptive technology and becoming an increasing topic of interest. One of the areas of IoT application is the connected vehicles. In this article we'll use Apache Spark and Kafka technologies to analyse and process IoT connected vehicle's data and send the processed data to real time traffic monitoring dashboard.
-
RXJava by Example
In the ongoing evolution of paradigms for simplifying concurrency under load, the most promising addition is reactive programming, a specification that provides tools for handling asynchronous streams of data and for managing flow-control, making it easier to reason about overall program design. In this article we overcome the learning curve with a gentle progression of examples.
-
Big Data Processing with Apache Spark - Part 5: Spark ML Data Pipelines
With support for Machine Learning data pipelines, Apache Spark framework is a great choice for building a unified use case that combines ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of Apache Spark article series, author Srini Penchikala discusses Spark ML package and how to use it to create and manage machine learning data pipelines.
-
The InfoQ Podcast: Cathy O'Neil on Pernicious Machine Learning Algorithms and How to Audit Them
In this week's podcast InfoQ’s editor-in-chief Charles Humble talks to Data Scientist Cathy O’Neil. Topics discussed include her book “Weapons of Math Destruction,” predictive policing models, the teacher value added model, approaches to auditing algorithms and whether government regulation of the field is needed.
-
Spark GraphX in Action Book Review and Interview
“Spark GraphX in Action” book from Manning Publications, authored by Michael Malak and Robin East, provides a tutorial based coverage of Spark GraphX, the graph data processing library from Apache Spark framework. InfoQ spoke with authors about the book and Spark GraphX library as well as overall Spark framework and what's coming up in the area of graph data processing and analytics.
-
Book Review: Cathy O’Neil’s Weapons of Math Destruction
“Big Data has plenty of evangelists, but I’m not one of them,” writes Cathy O’Neil, a blogger (mathsbabe.org) and former quantitative analyst at the hedge fund DE Shaw who became sufficiently disillusioned with her hedge fund modelling that she joined the Occupy movement.
-
Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines
InfoQ Interviews Chris Fregly, organizer for the 4000+ member Advanced Spark and TensorFlow Meetup about the PANCAKE STACK workshop, Spark and building data pipelines for a machine learning pipeline