Data Analysis Content on InfoQ
-
Haskell in the Newsroom
Erik Hinton discusses the successes and failures of shifting the newsroom culture at NYT to accept Haskell, along with some of the projects Haskell has been used for.
-
An API for Distributed Computing
Cliff Click introduces a coding style and API for in-memory analytics that handles datasets from 1K to 1TB without changing a line of code, on clusters with terabytes of RAM and hundreds of CPUs.
-
From The Lab To The Factory: Building A Production Machine Learning Infrastructure
Josh Wills discusses using Hadoop technologies to build real-time data analysis models with a focus on strategies for data integration, large-scale machine learning, and experimentation.
-
R for Big Data
Indrajit Roy presents HP Labs’ attempts at scaling R to efficiently perform distributed machine learning and graph processing on industrial-scale data sets.
-
Deploying Machine Learning and Data Science at Scale
Nick Kolegraff discusses common problems and architecture to support all the phases of data science and how to start a data science initiative, sharing lessons from Accenture, Best Buy, and Rackspace.
-
Functional Programming for Optimization Problems with City of Palo Alto Open Data
Paco Nathan reviews an example data analysis application written in Cascalog used for a recommender system based on City of Palo Alto Open Data.
-
Add ALL the Things: Abstract Algebra Meets Analytics
Avi Bryant discusses how the laws of group theory provide a useful codification of the practical lessons of building efficient distributed and real-time aggregation systems.
-
"Big Data" Agile Analytics
Ken Collier discusses Agile Analytics, a combination of sophisticated analytics techniques, lean learning principles, agile delivery methods, and "big data" technologies.
-
Making the Internet a Better Place: Scaling AppNexus
Mike Nolet shares lessons learned scaling AppNexus and architectural details of their system processing 30TB/day: Hadoop, DNS built in GSLB and Keepalived, and real-time data streaming built in C.
-
Apache Drill - Interactive Query and Analysis at Scale
Michael Hausenblas introduces Apache Drill, a distributed system for interactive analysis of large-scale datasets, including its architecture and typical use cases.
-
A Little Graph Theory for the Busy Developer
Jim Webber explains how to understand the forces and tensions within a graph structure and to apply graph theory in order to predict how the graph will evolve over time.
-
A Guide to Python Frameworks for Hadoop
Uri Laserson reviews the different available Python frameworks for Hadoop, including a comparison of performance, ease of use/installation, differences in implementation, and other features.