Netflix has developed an orchestration engine called "Conductor" and has used it internally in production over the past year. During this time the company executed some 2.6 million process workflows, ranging from simple linear ones to dynamic ones running over multiple days. Netflix has now open sourced Conductor, making it available to anyone interested in workflow orchestration.
Julien Nioche, director of DigitalPebble and a PMC member and committer on the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components for building distributed web crawlers on top of the streaming framework Apache Storm. InfoQ interviewed Nioche, the project's main contributor, to find out more about StormCrawler and how it compares to similar technologies.
Drew Koszewnik of Netflix talks to Rags Srinivas about Hollow, a Java library for disseminating in-memory datasets.
Google is pushing for HTTPS everywhere by deprecating existing Chrome features on non-secure sites and making new features available only over HTTPS.
A Facebook team has recently published a performance comparison between its existing Giraph-based graph processing system and the newer GraphX, which is part of the popular Spark framework. The team concluded that GraphX is neither scalable nor performant enough to support its graph processing workloads.
Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out how Arrow differs from Parquet.
The Couchbase 4.6 Developer Preview features full-text search improvements, cross-data-center replication with globally ordered conflict resolution, and two connectors for real-time analytics technologies: one for Spark 2.0 and one for Kafka.
Apache Spark integration with the deep learning library TensorFlow, online learning using Structured Streaming, and GPU hardware acceleration were the highlights of Spark Summit EU 2016, held last week in Brussels.
Microsoft recently released two new data science tools for interactive data exploration, modeling and reporting. Data science teams can reuse these tools for data-specific tasks in their projects. The goal is to ensure the consistency and completeness of data science tasks across different projects in an organization.
Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.
On September 26th, Microsoft announced that the Azure DNS service has reached General Availability (GA) in all public Azure regions. Azure DNS allows customers to host their DNS domains in Azure, so they can manage their DNS records using the same credentials, billing and support contract as their other Azure services.
Wolfram, the software company behind computation-centric products like Mathematica and Wolfram|Alpha, shipped a new private cloud appliance targeting companies that want to centralize their computational efforts.
Lambda architecture has been a popular solution for combining batch and stream processing. Kartik Paramasivam of LinkedIn wrote about how his team addressed stream processing and Lambda architecture challenges by using Apache Samza for data processing. The challenges he describes are late-arriving events and duplicate messages.
The Apache Kafka and Kafka Streams frameworks help developers build stream-centric architectures and distributed stream processing applications. Jay Kreps, CEO of Confluent, gave the keynote presentation on stream processing and microservices at the Reactive Summit 2016 conference last week.
Reactive microservices, the data-center-scale operating system (DCOS), and staging reactive data pipelines were the highlighted topics at the Reactive Summit 2016 conference held this week. The InfoQ team attended the conference, and this post summarizes the first day's events.