AI, ML & Data Engineering Content on InfoQ
-
From Alibaba to Apache: RocketMQ’s Past, Present, and Future
Feng Jia and Wang Xiaorui share the core distributed systems principles behind RocketMQ, Alibaba's distributed messaging and data streaming platform, now open sourced through the Apache Software Foundation.
-
Key Takeaway Points and Lessons Learned from QCon London 2017
This year was the 11th for QCon London; it was also our largest London event to date. Including our 140 speakers, we had 1,435 team leads, architects, and project managers attending 112 technical sessions across 18 concurrent editorial tracks and 16 in-depth workshops.
-
Want to Know What’s in a GC Pause? Go Look at the GC Log!
Sometimes a superficial analysis of application performance can wrongly point the finger at the garbage collector. A proper GC log analysis can lead us past the “blame the collector” game. When this happens, we can make amazing discoveries that improve the performance and stability of our applications.
-
Building Pipelines for Heterogeneous Execution Environments for Big Data Processing
The Pipeline61 framework supports the building of data pipelines involving heterogeneous execution environments. It reuses the existing code of the deployed jobs in different environments and provides version control and dependency management that deals with typical software engineering issues. A real-world case study shows its effectiveness.
-
Introducing Reladomo - Enterprise Open Source Java ORM, Batteries Included!
Goldman Sachs is widely known as a leader in investment banking, but it is very much a leading technology firm as well. Reladomo is the primary Java ORM used at GS, and it is now open source. In this article, GS Technology Fellow Mohammad Rezaei takes us on a deep dive into Reladomo.
-
There's No AI (Artificial Intelligence) without IA (Information Architecture)
Artificial intelligence (AI) is increasingly hyped by everyone, from well-funded startups to well-known software brands. In this article the author describes the need for high-quality, structured data before AI technologies can be of use to organizations and their customers.
-
Big Data Processing Using Apache Spark - Part 6: Graph Data Analytics with Spark GraphX
In this article, author Srini Penchikala discusses the Apache Spark GraphX library, which is used for graph data processing and analytics. The article includes sample code for graph algorithms such as PageRank, Connected Components, and Triangle Counting.
-
Three Experts on Big Data Engineering
Clemens Szyperski (Microsoft), Martin Petitclerc (IBM), and Roger Barga (Amazon Web Services) answer three questions: What major challenges do you face when building scalable, big data systems? How do you address these challenges? Where should the research community focus its efforts to create tools and approaches for building highly reliable, scalable, big data systems?
-
Data Preprocessing vs. Data Wrangling in Machine Learning Projects
This article compares alternative techniques for preparing data, including extract-transform-load (ETL) batch processing, streaming ingestion, and data wrangling. It also discusses how these techniques relate to visual analytics, and best practices for how different user roles, such as the data scientist or business analyst, should work together to build analytic models.
-
Testing RxJava2
You are ready to explore reactive opportunities in your code but you are wondering how to test out the reactive idiom in your codebase. In this article Java Champion Andres Almiray provides techniques and tools for testing RxJava2.
-
Article Series: An Introduction to Machine Learning for Software Developers
Get an introduction to some powerful but generally applicable techniques in machine learning for software developers. These include deep learning but also more traditional methods that are often all the modern business needs. After reading the articles in the series, you should have the knowledge necessary to embark on concrete machine learning experiments in a variety of areas on your own.
-
Book Review: Andrew McAfee and Erik Brynjolfsson's "The Second Machine Age"
Andrew McAfee and Erik Brynjolfsson begin their book The Second Machine Age with a simple question: what innovation has had the greatest impact on human history?