Hazelcast, previously known for its open-source caching and in-memory data grid technologies, has announced a major release of its new stream processing engine, Jet.
The Uber Engineering team released their Kafka auditing tool called Chaperone as an open-source project. Chaperone allows for auditing and detection of data loss, latency, and duplication of messages in the multi-datacenter and high-volume Kafka setup at Uber.
Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduated to an Apache top-level project on January 10, 2017. First open-sourced by eBay in October 2015, Eagle was created to instantly detect access to sensitive data or malicious activities, and to take action in a timely fashion.
Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components for building distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, the main contributor to the project, to find out more about StormCrawler and how it compares to similar technologies.
Microsoft recently announced an addition to its Platform as a Service (PaaS) offering called Azure Functions. Initially launched as a preview service in March 2016, Azure Functions provides developers with an event-driven serverless compute platform that allows organizations to pay only for what they consume.
Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.
Lambda architecture has been a popular solution that combines batch and stream processing. Kartik Paramasivam at LinkedIn wrote about how his team addressed stream processing and Lambda architecture challenges using Apache Samza for data processing. The challenges described are the late arrival of events and the processing of duplicated messages.
The latest version of Confluent Enterprise supports multi-datacenter replication, automatic data balancing, and cloud migration. Confluent, provider of the Apache Kafka-based streaming platform, announced the new features last week; they are intended to help build streaming data pipelines and develop stream processing applications.
In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduced Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede, stream processing has become popular because unbounded datasets can be found in many places. It is no longer a niche problem like, for example, machine learning.
Event sourcing and CQRS are two patterns that have emerged in the Domain-Driven Design (DDD) community. Stream processing builds on similar ideas but has emerged in a different community, Martin Kleppmann noted in his presentation at the Domain-Driven Design Europe conference earlier this year, in which he compared event sourcing with stream processing.
On Thursday, April 21, Microsoft announced that the integration between Azure Stream Analytics and Power BI has reached General Availability (GA). Using this capability, customers can gain real-time insight into their business performance by analyzing in-flight data streams.
Version 1.0 is "a major milestone in the evolution of Apache Storm", writes Apache Software Foundation VP for Apache Storm P. Taylor Goetz, and it includes many new features and improvements. In particular, Goetz claims a 3x–16x boost in performance.
Embrace decentralization, build service-based systems and attack the problems that come with distributed state using stream processing tools, Ben Stopford urged in his presentation at the recent QCon London conference.
When a system has many databases, they are rarely independent of each other; instead, pieces of the same data are stored in many of them. Using transactions to keep everything in sync is a fragile solution. Working with a stream of changes in the order they are created is a much simpler and more resilient solution, Martin Kleppmann stated in his presentation at the recent QCon London conference.
Netflix has shed light on how the company uses the latest version of its Keystone Data Pipeline, a petabyte-scale real-time event stream processing system for business and product analytics. This news item summarizes the three major versions of the pipeline, now used by almost every application at Netflix.