AI, ML & Data Engineering Content on InfoQ
-
Deploying Machine Learning and Data Science at Scale
Nick Kolegraff discusses common problems, an architecture to support all the phases of data science, and how to start a data science initiative, sharing lessons from Accenture, Best Buy, and Rackspace.
-
Working with Databases and Groovy
Paul King presents working with databases in Groovy, covering datasets, GMongo, Neo4j, raw JDBC, Groovy-SQL, CRUD, Hibernate, caching, and Spring Data technologies.
-
Functional Programming for Optimization Problems with City of Palo Alto Open Data
Paco Nathan reviews an example data analysis application written in Cascalog used for a recommender system based on City of Palo Alto Open Data.
-
Add ALL the Things: Abstract Algebra Meets Analytics
Avi Bryant discusses how the laws of group theory provide a useful codification of the practical lessons of building efficient distributed and real-time aggregation systems.
-
Big Data Platform as a Service at Netflix
Jeff Magnusson details some of Netflix's key services: Franklin, Sting and Lipstick.
-
Stream Processing: Philosophy, Concepts, and Technologies
Dan Frank discusses stream data processing and introduces NSQ – Bitly’s open source queuing system – and other new technologies used for communication between streaming programs.
-
"Big Data" Agile Analytics
Ken Collier discusses Agile Analytics, a combination of sophisticated analytics techniques, lean learning principles, agile delivery methods, and "big data" technologies.
-
High Speed Smart Data Ingest into Hadoop
Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time high speed data ingest into Hadoop.
-
Making the Internet a Better Place: Scaling AppNexus
Mike Nolet shares lessons learned scaling AppNexus and architectural details of their system, which processes 30 TB/day: Hadoop, DNS built in GSLB and Keepalived, and real-time data streaming built in C.
-
Apache Drill - Interactive Query and Analysis at Scale
Michael Hausenblas introduces Apache Drill, a distributed system for interactive analysis of large-scale datasets, including its architecture and typical use cases.
-
A Guide to Python Frameworks for Hadoop
Uri Laserson reviews the available Python frameworks for Hadoop, comparing their performance, ease of use and installation, implementation differences, and other features.
-
Evolving Panorama of Data
Rebecca Parsons reviews some of the changes in how data is used and analyzed, including its use to track violence and attempts to predict famine and other crises before they happen.