AI, ML & Data Engineering Content on InfoQ
-
RXJava by Example
In the ongoing evolution of paradigms for simplifying concurrency under load, the most promising addition is reactive programming, a specification that provides tools for handling asynchronous streams of data and for managing flow-control, making it easier to reason about overall program design. In this article we overcome the learning curve with a gentle progression of examples.
-
Big Data Processing with Apache Spark - Part 5: Spark ML Data Pipelines
With support for machine learning data pipelines, the Apache Spark framework is a great choice for building a unified use case that combines ETL, batch analytics, streaming data analysis, and machine learning. In this fifth installment of the Apache Spark article series, author Srini Penchikala discusses the Spark ML package and how to use it to create and manage machine learning data pipelines.
-
The InfoQ Podcast: Cathy O'Neil on Pernicious Machine Learning Algorithms and How to Audit Them
In this week's podcast, InfoQ’s editor-in-chief Charles Humble talks to data scientist Cathy O’Neil. Topics discussed include her book “Weapons of Math Destruction,” predictive policing models, the teacher value-added model, approaches to auditing algorithms, and whether government regulation of the field is needed.
-
Spark GraphX in Action Book Review and Interview
The book “Spark GraphX in Action” from Manning Publications, authored by Michael Malak and Robin East, provides tutorial-based coverage of Spark GraphX, the graph data processing library from the Apache Spark framework. InfoQ spoke with the authors about the book and the Spark GraphX library, as well as the overall Spark framework and what's coming up in the area of graph data processing and analytics.
-
Book Review: Cathy O’Neil’s Weapons of Math Destruction
“Big Data has plenty of evangelists, but I’m not one of them,” writes Cathy O’Neil, a blogger (mathbabe.org) and former quantitative analyst at the hedge fund DE Shaw who became sufficiently disillusioned with her hedge fund modelling that she joined the Occupy movement.
-
Chris Fregly on the PANCAKE STACK Workshop and Data Pipelines
InfoQ interviews Chris Fregly, organizer of the 4000+ member Advanced Spark and TensorFlow Meetup, about the PANCAKE STACK workshop, Spark, and building data pipelines for machine learning.
-
Christine Doig on Data Science as a Team Discipline
Christine Doig spoke at this year's OSCON Conference about data science as a team discipline and how to navigate the data science Python ecosystem. InfoQ spoke with Christine about challenges data science teams need to address to be more effective.
-
Virtual Panel: Current State of NoSQL Databases
NoSQL databases have been around for several years now and have become a common choice of data storage for managing semi-structured and unstructured data. These databases offer a lot of advantages in terms of linear scalability and better performance for both data writes and reads. InfoQ spoke with four panelists to get different perspectives on the current state of NoSQL databases.
-
Key Takeaway Points and Lessons Learned from QCon New York 2016
The fifth annual QCon New York was the biggest yet, bringing together over 800 team leads, architects, project managers, and engineering directors. In total, over 140 practitioner-speakers presented 79 full-length technical sessions and 16 in-depth tutorials, providing deep insights into real-world architectures and state-of-the-art software development practices from a practitioner’s perspective.
-
What the JIT!? Anatomy of the OpenJDK HotSpot VM
If you've ever wondered what happens when your bytecode executes, join former Oracle G1GC performance lead Monica Beckwith in her guided tour of just-in-time (JIT) compilation and runtime optimizations in the OpenJDK HotSpot VM.
-
Big Data Analytics with Spark Book Review and Interview
The book “Big Data Analytics with Spark,” authored by Mohammed Guller, provides a practical guide to learning the Apache Spark framework for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. InfoQ spoke with the author about the book and development tools for big data applications.
-
Configure Once, Run Everywhere: Decoupling Configuration and Runtime
Configuration is one of the most widely used cross-cutting concerns in application development. Apache Tamaya is a new incubator project that brings standardized property management to Java.