Event Stream Processing Content on InfoQ
-
Julien Nioche on StormCrawler, Open-Source Crawler Pipelines Backed by Apache Storm
Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components to build distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, the main contributor to the project, to find out more about StormCrawler and how it compares to other similar technologies.
-
Azure Functions Reach General Availability
Microsoft recently announced an addition to its Platform as a Service (PaaS) offering called Azure Functions. Initially launched as a preview service in March 2016, Azure Functions provide developers with an event-driven serverless compute platform that allows organizations to pay for only what they consume.
-
Microservices and Stream Processing Architecture at Zalando Using Apache Flink
Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.
-
Stream Processing and Lambda Architecture Challenges
Lambda architecture has been a popular solution that combines batch and stream processing. Kartik Paramasivam at LinkedIn wrote about how his team addressed stream processing and Lambda architecture challenges using Apache Samza for data processing. The challenges described are the late arrival of events and the processing of duplicated messages.
-
Confluent Announces Kafka for the Enterprise with Multi-Datacenter Replication
The latest version of Confluent Enterprise supports multi-datacenter replication, automatic data balancing, and cloud migration capability. Confluent, provider of the Apache Kafka based streaming platform, announced the new features for Confluent Enterprise last week to help build streaming data pipelines and develop stream processing applications.
-
Neha Narkhede: Large-Scale Stream Processing with Apache Kafka
In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduces Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede, stream processing has become popular because unbounded datasets can be found in many places. It is no longer a niche problem like, for example, machine learning.
-
Comparison of Event Sourcing with Stream Processing
Event sourcing and CQRS are two patterns that have emerged in the Domain-Driven Design (DDD) community. Stream processing builds on similar ideas but has emerged in a different community, Martin Kleppmann noted in his presentation comparing event sourcing with stream processing at the Domain-Driven Design Europe conference earlier this year.
-
Azure Stream Analytics Publishing to Power BI Reaches General Availability
On Thursday, April 21, Microsoft announced that the integration between Azure Stream Analytics and Power BI has reached General Availability (GA). Using this capability, customers can gain real-time insight into their business performance by analyzing in-flight data streams.
-
Apache Storm Reaches 1.0, Brings Improved Performance, Many New Features
Version 1.0 is "a major milestone in the evolution of Apache Storm", writes Apache Software Foundation VP for Apache Storm P. Taylor Goetz, and it includes many new features and improvements. In particular, Goetz claims a 3x–16x boost in performance.
-
Microservices for a Streaming World
Embrace decentralization, build service-based systems and attack the problems that come with distributed state using stream processing tools, Ben Stopford urged in his presentation at the recent QCon London conference.
-
Moving from Transactions to Streams to Gain Consistency
When a system has many databases, they are rarely independent of each other; instead, pieces of the same data are stored in many of them. Using transactions to keep everything in sync is a fragile solution. Working with a stream of changes in the order they are created is a much simpler and more resilient solution, Martin Kleppmann stated in his presentation at the recent QCon London conference.
-
Netflix Details Evolution of Keystone Data Pipeline
Netflix has shed light on how the company uses the latest version of their Keystone Data Pipeline, a petabyte-scale real-time event stream processing system for business and product analytics. This news summarizes the three major versions of the pipeline, now used by almost every application at Netflix.
-
Architecting Scalable, Dynamic Systems when Eventual Consistency Won’t Work
Peter Morgan, head of engineering for the sports betting company William Hill, explains how to architect a scalable and dynamic system without caching. The values of the bets on sporting events change constantly. No data can be cached; all system values must be current. Distributed Erlang processes model domain objects which instantly recalculate system values based on data streams from Kafka.
-
The Basics of Being Reactive
A key problem with the whole Reactive space, and the reason it is so hard to understand, is the vocabulary: there are many terms and many different interpretations of what they mean, Peter Ledbrook claims. That is also why he decided to work out what it is all about and share his knowledge in a presentation.
-
Yahoo! Benchmarks Apache Flink, Spark and Storm
Yahoo! has benchmarked three of the main stream processing frameworks: Apache Flink, Spark and Storm.