AI, ML & Data Engineering Content on InfoQ
-
Developing Real-time Data Pipelines with Apache Kafka
Joe Stein introduces developers to why and how to use Apache Kafka, a publish-subscribe messaging system rethought as a distributed commit log.
-
The Lightning Memory-mapped Database
Howard Chu discusses the Lightning Memory-Mapped Database (LMDB) design and architecture, and its impact on other projects such as OpenLDAP.
-
Supercharging Operations and Analytics: Using Spring XD to Support Analytics and CEP
Joseph Paulchell discusses the journey from batch-oriented processes using databases to a real-time data streaming solution and the significant benefits achieved as well as the challenges encountered.
-
Hadoop Workflows and Distributed YARN Apps using Spring Technologies
The authors discuss how Spring for Apache Hadoop can make it easier to develop workflows with MapReduce, Spark, Hive and Pig jobs, and how to use Spring Cloud to build distributed apps for YARN.
-
Apache Spark for Big Data Processing
Ilayaperumal Gopinathan and Ludwine Probst discuss Spark and its ecosystem, in particular Spark Streaming and MLlib, providing a concrete example and showing how to use Spark with Spring XD.
-
Experiences Building InfluxDB in Go
Paul Dix shares his experience building InfluxDB, an open source distributed time series database, in Go.
-
Federated Queries with HAWQ - SQL on Hadoop and Beyond
Christian Tzolov shows different approaches to integrating HAWQ and GemFire, using Spring XD to ingest GemFire data into HDFS and Spring Boot to implement a RESTful proxy for HAWQ.
-
IoT Realized - The Connected Car v2
Phil Berman and Michael T Minella present a solution developed with Spring XD to stream real-time analytics from a moving car using open standards.
-
Dino DNA! Health Identity from the Wrist @Jawbone
Brian Wilt discusses how applied machine learning techniques and data science helped Jawbone build a successful fitness tracking app.
-
Takes a Village to Raise a Machine Learning Model
Lucian Vlad Lita focuses on the next step in personalization: well-designed software architectures for storing, computing, and delivering responsive, accurate in-product predictions and experiments.
-
The Lego Model for Machine Learning Pipelines
Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.
-
Richer Data History with Event Sourcing
Steve Pember presents the basic concepts of Event Sourcing, its role in analytics and performance, and the importance of storing historical events to get a view of the data at any point in time.