AI, ML & Data Engineering Content on InfoQ
-
Fraud Detection Using Random Forest, Neural Autoencoder, and Isolation Forest Techniques
In this article, the authors discuss how to detect fraud in credit card transactions using supervised machine learning algorithms (random forest, logistic regression), outlier detection with the isolation forest technique, and anomaly detection with a neural autoencoder.
-
Privacy Attacks on Machine Learning Models
Research has shown that machine learning models can expose personal information present in their training data. This vulnerability exposes sensitive user information to attackers savvy enough to learn how to hack a machine learning API. We'll explore the details of several privacy attacks against machine learning models and provide some potential solutions for this growing security issue.
-
Stream Processing Anomaly Detection Using Yurita Framework
In this article, author Guy Gerson discusses Yurita, the stream processing anomaly detection framework developed by PayPal. The framework is based on Spark Structured Streaming.
-
How to Use Open Source Prometheus to Monitor Applications at Scale
In this article, the author discusses how to collect metrics and achieve anomaly detection from streaming data using Prometheus, Apache Kafka and Apache Cassandra technologies.
-
Using Intel Analytics Zoo to Inject AI into Customer Service Platform (Part II)
This article shares the Microsoft Azure China team's practical experience of building a QA ranker module on Azure’s customer support platform using Intel Analytics Zoo. It walks step by step through how to prepare data, how to train, evaluate and tune a text matching model at scale, and finally how to productionize it as a service using Analytics Zoo.
-
How to Mitigate the Pain of Getting and Giving Feedback
Companies that encourage open and honest feedback do better than companies that do not. Nonetheless, giving feedback is difficult because social and physical pain share some of the same neural circuitry. Hence, feedback can feel physically painful, as Sarah Hagan discusses in her 2018 QCon San Francisco talk. Hagan uses scientific research to demonstrate how to give feedback properly.
-
Real-Time Data Processing Using Redis Streams and Apache Spark Structured Streaming
Structured Streaming, introduced with Apache Spark 2.0, delivers a SQL-like interface for streaming data. Redis Streams enables Redis to consume, hold and distribute streaming data between multiple producers and consumers. In this article, author Roshan Kumar walks us through how to process streaming data in real time using Redis and Apache Spark Streaming technologies.
-
Open Source Robotics: Getting Started with Gazebo and ROS 2
An introduction to Gazebo, a powerful robot simulator that calculates physics, generates sensor data and provides convenient interfaces, and ROS 2, the latest version of the Robot Operating System, which offers familiar tools and capabilities, while expanding to new use cases. Both are open source and used by academia and industry alike.
-
The Data Science Mindset: Six Principles to Build Healthy Data-Driven Organizations
In this article, business and technical leaders will learn methods to assess whether their organization is data-driven and benchmark its data science maturity. They will learn how to use the Healthy Data Science Organization Framework to nurture a data science mindset within the organization.
-
The Impact and Ethics of Conversational Artificial Intelligence
Improvements in natural language understanding and our changing relationship mean we can use chatbots in ways we couldn’t before, either to augment human conversation and support or, indeed, to replace it. Those working in the software industry must understand and take responsibility for how we use Conversational AI and our users' data.
-
Test Automation in the World of AI & ML
An in-depth look at the criteria and requirements for functional test automation in the agile world, and the capabilities you should build into your custom framework or that should exist in the tools you choose. Anand Bagmar explores aspects like readability, reuse, debugging / RCA, CI, test data, parallel execution, integration with other tools and libraries, free vs. open-source, and support.
-
Increasing the Quality of Patient Care through Stream Processing
Today’s healthcare technology landscape is disaggregated and siloed. Physicians analyse patient data streams from different systems without much correlation. Even though the health-tech domain is mature and rich with data, its value is not directed towards increasing the quality of patient care. This article presents a stream processing solution in which streams are correlated.