AI, ML & Data Engineering Content on InfoQ
-
GoshawkDB: Making Time with Vector Clocks
Matthew Sackman discusses dependencies between transactions, how to capture these with Vector Clocks, how to treat Vector Clocks as a CRDT, and how GoshawkDB uses them for a distributed data store.
-
Predicting the Future: Surprising Revelations from Truly Big Data
Pushpraj Shukla discusses how Microsoft Bing predicts the future based on aggregate human behavior, using one of the largest-scale data sets, and covers recent progress in large-scale deep learning models.
-
Staying in Sync: from Transactions to Streams
Martin Kleppmann explores using event streams and Kafka for keeping data in sync across heterogeneous systems, and compares this approach to distributed transactions.
-
Netflix Keystone - How We Built a 700B/day Stream Processing Cloud Platform in a Year
Peter Bakas presents in detail how Netflix has used Kafka, Samza, Docker, and Linux to implement a multi-tenant pipeline processing 700B events/day in the Amazon AWS cloud.
-
Hunting Criminals with Hybrid Analytics
David Talby demos using Python libraries to build an ML model for fraud detection, scaling it up to billions of events using Spark, and what it took to make the system performant and ready for production.
-
Resilient Predictive Data Pipelines
Sid Anand discusses how Agari is applying big data best practices to the problem of securing its customers from email-borne threats, presenting a system that leverages big data in the cloud.
-
Big-Data Analytics Misconceptions
Irad Ben-Gal discusses Big Data analytics misconceptions, presenting a technology predicting consumer behavior patterns that can be translated into wins, revenue gains, and localized assortments.
-
Microservices for a Streaming World
Ben Stopford discusses using stream processing tools for real-time business apps, handling infinite streams, leveraging high throughput, deploying dynamic, fault-tolerant, and streaming services.
-
How Comcast Uses Data Science and ML to Improve the Customer Experience
Jan Neumann presents how Comcast uses machine learning and big data processing to facilitate search for users, for capacity planning, and predictive caching.
-
The Mechanics of Testing Large Data Pipelines
Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.
-
Modeling Avengers: Open Source Technology Mix for Saving the World
The speakers discuss Smart Farming System Tooling, an environment to model, analyze, and simulate an agricultural operation, biomass growth, and water consumption based on user input and open data.
-
Understanding Real-time Conversations on Facebook
Janet Wiener discusses using a data pipeline and graphic visualizations to extract and analyze the Chorus (the aggregated, anonymized voice of the people communicating on Facebook) in real time.