AI, ML & Data Engineering Content on InfoQ
-
Big Data Processing Using Apache Spark - Part 6: Graph Data Analytics with Spark GraphX
In this article, author Srini Penchikala discusses the Apache Spark GraphX library, which is used for graph data processing and analytics. The article includes sample code for graph algorithms such as PageRank, Connected Components and Triangle Counting.
-
Three Experts on Big Data Engineering
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) answer three questions: What major challenges do you face when building scalable, big data systems? How do you address these challenges? Where should the research community focus its efforts to create tools and approaches for building highly reliable, scalable, big data systems?
-
Data Preprocessing vs. Data Wrangling in Machine Learning Projects
This article compares alternative techniques for preparing data, including extract-transform-load (ETL) batch processing, streaming ingestion and data wrangling. It also discusses how these relate to visual analytics, and best practices for how different user roles such as the Data Scientist or Business Analyst should work together to build analytic models.
-
Testing RxJava2
You are ready to explore reactive opportunities in your code, but you are wondering how to test the reactive idiom in your codebase. In this article, Java Champion Andres Almiray provides techniques and tools for testing RxJava2.
-
Article Series: An Introduction to Machine Learning for Software Developers
Get an introduction to some powerful but generally applicable techniques in machine learning for software developers. These include deep learning but also more traditional methods that are often all the modern business needs. After reading the articles in the series, you should have the knowledge necessary to embark on concrete machine learning experiments in a variety of areas on your own.
-
Book Review: Andrew McAfee and Erik Brynjolfsson's "The Second Machine Age"
Andrew McAfee and Erik Brynjolfsson begin their book The Second Machine Age with a simple question: what innovation has had the greatest impact on human history?
-
Deterministic Execution on the JVM
For many use cases (for example cryptocurrency ledgers), we need to ensure that any action will execute deterministically and terminate. In this article, Ben Evans reviews the theory behind the WhitelistClassLoader.
-
Learning Paths: QCon London Expert Recommendations
Advice on the best talks to attend at QCon London 2017 from London Thought Leaders.
-
Real-World, Man-Machine Algorithms
In this article, we'll talk about the end-to-end flow of developing machine learning models: where you get training data, how you pick the ML algorithm, what you must address after your model is deployed, and so forth.
-
RXJava2 by Example
In the ongoing evolution of paradigms for simplifying concurrency under load, the most promising addition is reactive programming, a specification that provides tools for handling asynchronous streams of data and for managing flow-control, making it easier to reason about overall program design. In this article we overcome the learning curve with a gentle progression of examples.
-
Anomaly Detection for Time Series Data with Deep Learning
This article introduces neural networks, including brief descriptions of feed-forward neural networks and recurrent neural networks, and describes how to build a recurrent neural network that detects anomalies in time series data. To make our discussion concrete, we’ll show how to build a neural network using Deeplearning4j, a popular open-source deep-learning library for the JVM.
-
Q&A with Immuta on the Implications of EU’s General Data Protection Regulation (GDPR)
InfoQ talked with Immuta’s Andrew Burt and Steve Touw to better understand the implications and challenges of the EU’s General Data Protection Regulation, which will come into effect in May 2018.