Streaming Content on InfoQ
-
The Future of Data Engineering
Chris Riccomini examines the current and future state of the art in data pipelines, data streaming, and data warehousing. He presents a six-stage evolution that data ecosystems follow, from a simple monolith to a complex data-microwarehouse architecture, as the data engineers who manage them solve problems and clarify their roles as infrastructure engineers rather than data stewards.
-
How Apache Pulsar is Helping Iterable Scale its Customer Engagement Platform
In this article, author Greg Methvin discusses his experience implementing a distributed messaging platform based on Apache Pulsar.
-
Beyond the Database, and beyond the Stream Processor: What's the Next Step for Data Management?
Databases have had the same shape for decades: you make a request against your data, and you receive an answer. Stream processors then came along with a different approach: data isn't locked up; it is in motion. Understand how stream processors and databases relate, and why a new category of databases is emerging that handles both data at rest and data in motion.
-
Real Time APIs in the Context of Apache Kafka
Events offer a Goldilocks-style approach in which real-time APIs serve as the foundation for applications that are flexible yet performant, loosely coupled yet efficient. Apache Kafka offers a scalable event streaming platform on which you can build applications around the powerful concept of events.
-
The Challenges of Building a Reliable Real-Time Event-Driven Ecosystem
Globally, there is an increasing appetite for data delivered in real time, and we are witnessing the emergence of the real-time API. When it comes to event-driven APIs, engineers can choose among multiple protocols. In addition to choosing a protocol, engineers also have to consider the subscription model: server-initiated (push-based) or client-initiated (pull-based).
-
Applied Probability - Counting Large Set of Unstructured Events with Theta Sketches
In this article, author Ronen Cohen discusses a solution for processing event data using Theta Sketches and technologies such as HBase and Kafka.
-
Is Edge Computing a Thing?
Edge computing is definitely a thing, but the computing need not occur at the edge. What is needed instead is the ability to compute, anywhere, on streaming data from large numbers of dynamically changing devices in the edge environment. This in turn demands an architectural pattern for stateful, distributed computing.
-
The Kongo Problem: Building a Scalable IoT Application with Apache Kafka
In this article, author Paul Brebner discusses best practices for developing IoT projects with Apache Kafka and Kafka Streams, and how to maximize Kafka's scalability.
-
Rethinking Flink’s APIs for a Unified Data Processing Framework
Since its very early days, Apache Flink has followed the philosophy of taking a unified approach to batch and streaming. The core building block is the "continuous processing of unbounded data streams, with batch as a special, bounded set of those streams." Recent updates to the Flink APIs reflect architectural designs from the community that support the unification of batch and streaming in Apache Flink.
-
Azure Data Lake Analytics and U-SQL
In this article, the author shows how to use U-SQL, a big data query and processing language, on the Azure Data Lake Analytics platform. U-SQL combines the concepts and constructs of both SQL and C#: the simplicity and declarative nature of SQL with the programmatic power of C#, including rich types and expressions.
-
Stream Processing Anomaly Detection Using Yurita Framework
In this article, author Guy Gerson discusses Yurita, the stream processing anomaly detection framework developed by PayPal. The framework is based on Spark Structured Streaming.
-
How to Use Open Source Prometheus to Monitor Applications at Scale
In this article, the author discusses how to collect metrics and perform anomaly detection on streaming data using Prometheus, Apache Kafka, and Apache Cassandra.