AI, ML & Data Engineering Content on InfoQ
-
Apache Drill - Interactive Query and Analysis at Scale
Michael Hausenblas introduces Apache Drill, a distributed system for interactive analysis of large-scale datasets, including its architecture and typical use cases.
-
A Guide to Python Frameworks for Hadoop
Uri Laserson reviews the different available Python frameworks for Hadoop, including a comparison of performance, ease of use/installation, differences in implementation, and other features.
-
Evolving Panorama of Data
Rebecca Parsons reviews some of the changes in how data is used and analyzed, looking at how data is used to track violence and to try to predict famine and other crises before they happen.
-
Leveraging Scriptable Infrastructures, Towards a Paradigm Shift in Software for Data Science
Karim Chine introduces Elastic-R, demonstrating some of its applications in bioinformatics and finance.
-
Approximate Methods for Scalable Data Mining
Andrew Clegg gives an overview of methods, with use cases, for performing data set operations such as membership testing, distinct counts, and nearest-neighbour finding more efficiently.
-
Data Science of Love
Vaclav Petricek digs up some of the nuggets about romantic interactions hidden in eHarmony's large collection of human relationships.
-
Leveraging Your Hadoop Cluster Better - Running Performant Code at Scale
Michael Kopp explains how to run performant code at scale with Hadoop and how to analyze and optimize Hadoop jobs.
-
Core.logic and SQL Killed my ORM
Craig Brozefsky presents the tradeoffs involved with moving to a purely SQL relational model, instead of using an ORM, along with some of the tools built to facilitate this.
-
The Technology behind an Equity Trade
John O’Hara discusses banking business and technology integration, covering: low-latency, high-frequency trading, in-memory caches, multi-terabyte time-series databases, and contracts in NoSQL stores.
-
Lessons Learned Building Storm
Nathan Marz shares lessons learned building Storm, an open-source, distributed, real-time computation system.
-
Building Applications using Apache Hadoop
Eli Collins gives an overview of how to build new applications with Hadoop and how to integrate Hadoop with existing applications, and provides an update on the state of the Hadoop ecosystem, frameworks, and APIs.
-
Copious Data, the "Killer App" for Functional Programming
Dean Wampler advocates using Functional Programming and its core operations to process large amounts of data, explaining why Java’s dominance in Hadoop is harming Big Data’s progress.