Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components to build distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, main contributor of the project, to find out more about StormCrawler and how it compares to other similar technologies.
Akka.NET 1.1 was recently released, bringing new features and performance improvements. InfoQ reached out to Aaron Stannard, maintainer of Akka.NET, to learn more about Akka.Streams and Akka.Cluster. Stannard also explains how the roadmap is planned with regard to the JVM implementation of Akka.
Version 1.0 is "a major milestone in the evolution of Apache Storm", writes Apache Software Foundation VP for Apache Storm P. Taylor Goetz, and it includes many new features and improvements. In particular, Goetz claims a 3x–16x boost in performance.
Yahoo! has benchmarked three of the main stream processing frameworks: Apache Flink, Spark and Storm.
After several years of development, MBrace 1.0 was released last week. MBrace is a programming model for scalable cloud data scripting and programming with F# and C#. The project consists mainly of code libraries and runtimes for cloud providers.
GameAnalytics, maker of a free analytics platform, has recently open sourced gascheduler, an Erlang library that provides a generic scheduler for parallel execution of distributed tasks. InfoQ has spoken to Chris de Vries, one of gascheduler’s creators.
ELIoT (Extensible Language for the Internet of Things) is a simple and small programming language aiming to make distributed programming easier. A program in ELIoT may appear as a single program, but it actually runs on different computers, so that, for example, a variable or function declared on one computer is transparently used on another.
Twitter has replaced Storm with Heron, which provides up to 14 times more throughput and up to 10 times lower latency on a word count topology, and has helped the company reduce the hardware it needs to a third.
Martin Thompson answers a few questions about the opportunity for developers and architects to introduce custom protocols to their system's interaction points.
Hadoop is definitely the platform of choice for Big Data analysis and computation. While data Volume, Variety and Velocity increase, Hadoop as a batch processing framework cannot cope with the requirement for real-time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event-based processing.
Mobile Backend as a Service provider AnyPresence continues to hone its chops, launching the fifth update to its self-titled platform geared for the enterprise. Co-founder Rich Mendis provides some insights for InfoQ readers…
Twitter has open-sourced Storm, its distributed, fault-tolerant, real-time computation system, on GitHub under the Eclipse Public License 1.0. Storm is the real-time processing system developed by BackType, which is now under the Twitter umbrella.
FlightCaster recently open sourced Crane, a tool for distributing and remotely controlling Clojure instances, currently specialized for EC2. Incanter is a Clojure library and tool that makes R-like statistical computations easy with Clojure. Also: the build and dependency management tool Leiningen 1.0 is now available.
With the multiplicity of existing remoting mechanisms, it is often necessary to build clients in a way that allows new protocols to be swapped in or introduced with minimal or no impact on the client’s implementation. A new framework, CRISPY, provides support for such implementations.
Tim Bray of Sun Microsystems writes of the Fallacies of Distributed Computing. He observes that, despite their profound implications for designing distributed systems, “you don’t often find them coming up in conversations about building big networked systems”.