AI, ML & Data Engineering Content on InfoQ
-
Achieving Mega-Scale Business Intelligence through Speed of Thought Analytics on Hadoop
Ian Fyfe discusses the different options for implementing speed-of-thought business analytics and machine learning tools directly on top of Hadoop.
-
Hydrator: Open Source, Code-Free Data Pipelines
Jonathan Gray introduces Hydrator, an open source framework and user interface for building and managing data pipelines, such as those that create data lakes, on Spark, MapReduce, Spark Streaming and Tigon.
-
Developing a Machine Learning Based Predictive Analytics Engine for Big Data Analytics
Ali Jalali explains how to develop a machine learning-based predictive analytics engine for big data analytics.
-
MongoDB-as-a-Service on Pivotal Cloud Foundry
Mallika Iyer and Sam Weaver give a brief overview of Pivotal Cloud Foundry and take a deep dive into running MongoDB as a managed service on the platform.
-
Building an AI in the Cloud
Simon Chan shares the ongoing challenges, the design dilemmas and the steps involved in building customized, large-scale predictive ML applications on an ML SaaS platform.
-
Wall St. Derivative Risk Solutions Using Geode
Andre Langevin and Mike Stolz discuss how Geode forms the core of many Wall Street derivative risk solutions which provide cross-product risk management at speeds suitable for automated hedging.
-
The Joy of Analysis Development
Hilary Parker discusses the history of analysis development tools, the current state of the art, and why it is important for data scientists and analysts to understand programming principles.
-
Machine Learning Fast and Slow
Suman Deb Roy talks about some of Betaworks’ internal data tools and platform, product-specific solutions, and the best practices the team learned when machine learning drives a startup.
-
Exploring Wikipedia with Apache Spark: A Live Coding Demo
Sameer Farooqui demonstrates connecting to the live stream of Wikipedia edits and building a dashboard that shows, in real time, what is happening with Wikipedia datasets and how people are using them.
-
Adaptive Availability for Quality of Service
Theo Schlossnagle talks about lessons learned in building an always-on distributed time-series database with aggressive quality of service guarantees, and techniques for dealing with bad machines.
-
Ingest & Stream Processing - What Will You Choose?
Pat Patterson and Ted Malaska talk about current and emerging data processing technologies, and the various ways of achieving "at least once" and "exactly once" timely data processing.
-
Structuring Data for Self-Serve Customer Insights
Jim Porzak discusses creating an analyst-ready data mart that is complete at different levels of abstraction and models customer decision points, so that analysts can understand customers.