AI, ML & Data Engineering Content on InfoQ
-
Streaming Live Data and the Hadoop Ecosystem
Oleg Zhurakousky discusses the Hadoop ecosystem (Hadoop, HDFS, YARN) and how projects such as Hive, Atlas, and NiFi interact and integrate to support the variety of data used for analytics.
-
Scaling the Data Infrastructure @Spotify
Mārtiņš Kalvāns and Matti Pehrs give an overview of the data infrastructure at Spotify, diving into some of its components, such as Event Delivery, Datamon, and Styx.
-
Scaling Counting Infrastructure @Quora
Chun-Ho Hung and Nikhil Garg discuss Quanta, Quora's counting system powering its high-volume, near-real-time analytics, describing the architecture, design goals, constraints, and choices made.
-
Java (SE) State of the Union
Gil Tene presents the current state of Java SE and OpenJDK, the role of Java in Big Data and infrastructure components, the JCP, the ecosystem, and current trends.
-
Scaling Quality on Quora Using Machine Learning
Nikhil Garg discusses the machine learning problems Quora must solve to keep quality high at its massive scale.
-
Query Understanding: a Manifesto
Daniel Tunkelang discusses what search looks like when viewed through a query-understanding mindset, focusing on query performance prediction, query rewriting, and search suggestions.
-
Iterative Design for Data Science Projects
Bo Peng explains how Datascope iterated on the major components of the Expert Finder application to produce actionable insights and recommendations on methodology.
-
The Art of Relevance and Recommendations
Clarence Chio describes building a real-world relevance and recommendation system from scratch.
-
Reactive Kafka
Rajini Sivaram introduces Kafka and Reactive Streams, then explores the development of a Reactive Streams interface for Kafka and its use in building robust applications.
-
Cloud Native Streaming and Event-driven Microservices
Marius Bogoevici demonstrates how to create complex data processing pipelines that bridge big data and enterprise integration, and how to orchestrate them with Spring Cloud Data Flow.
-
Operationalizing Data Science Using Cloud Foundry
Lawrence Spracklen builds a machine learning model from data in MPP databases such as Apache HAWQ or Greenplum, integrated with Chorus, and then deploys it as a microservice on Pivotal Cloud Foundry (PCF).
-
Spring for Apache Kafka
Gary Russell looks at the features of the spring-kafka project, as well as version 2.0 of spring-integration-kafka, which is now based on the Spring for Apache Kafka project.