AI, ML & Data Engineering Content on InfoQ
-
Using NLP, Machine Learning & Deep Learning Algorithms to Extract Meaning from Text
David Talby walks through building a natural language annotations pipeline with domain-specific annotators, and using deep learning to automatically expand and update taxonomies.
-
Scaling up Near Real-Time Analytics @Uber & LinkedIn
Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot, along with the analytics platform AthenaX, to transform data and make it available for querying within minutes.
-
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.
-
Effective Data Pipelines: Data Mngmt from Chaos
Katharine Jarmul discusses implementation decisions for those looking for a practical recommendation on the "what" and "how" of data automation workflows.
-
The Move to AI: from HFT to Laplace Demon
Eric Horesnyi and Albert Bifet discuss how hedge funds have moved beyond High Frequency Trading using AI and real-time data processing.
-
Building Data Pipelines in Python
Marco Bonzanini discusses the process of building data pipelines and all the steps necessary to prepare data, focusing on data plumbing and going from prototype to production.
-
Policing the Stock Market with Machine Learning
Cliff Click talks about SCORE, a solution for doing Trade Surveillance using H2O, Machine Learning, and a whole lot of domain expertise and data munging.
-
Stream Processing & Analytics with Flink @Uber
Danny Yuan discusses how Uber is building its next-generation stream processing system to support real-time analytics as well as complex event processing.
-
Building a Data Science Capability from Scratch
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.
-
Data Cleansing and Understanding Best Practices
Casey Stella talks about discovering missing values, values with skewed distributions and likely errors within data, as well as a novel approach to finding data interconnectedness.
-
Predictability in ML Applications
Claudia Perlich presents scenarios in which the combination of different and highly informative features can have a significantly negative overall impact on the usefulness of predictive modeling.
-
SQL Server on Linux: Will it Perform or Not?
Slava Oks talks about SQL Server’s history and high-level architecture, and dives into the core of the I/O Manager, Memory Manager, and Scheduler. Topics include lessons learned and behind-the-scenes experiences.