According to a new Forrest report, Hadoop’s momentum is unstoppable. Its usage in the enterprise is continuously growing due to its ability to offer companies new ways to store, process, analyze, and share big data. The report takes a look at Hadoop vendors and ranks them.
Recently, Spark graduated from the Apache incubator. Spark claims up to 100x speed improvements over Apache Hadoop over in-memory datasets and gracefully falling back to 10x speed improvement for on-disk performance. Based on Scala, it can run SQL queries and be used directly in R. It provides Machine Learning, Graph database capabilities and other further discussed in the article.
There are both commonalities and some differences when comparing architectural principles and coding styles in Akka Actors and Java EE 7 Enterprise JavaBeans, specifically stateless session beans and JMS message-driven beans, Dr Gerald Loeffler concludes in a recent introductory talk when explaining and comparing the three approaches from a high-level concurrency view.
Version 2.1 of CQRS framework Axon supports annotations and ordering of event handlers, a new conflict resolution together with performance improvements. The recently released version also adds compatibility with OSGi.
Elasticsearch released version 1.0.0 of its self-titled, open-source analytics tool. Elasticsearch is a distributed search engine which allows for real-time data analysis in big-data environments. The new version comes with various functional enhancements and changes to the API to make Elasticsearch more intuitive and powerful to use.
Google have announced general availability of their Cloud SQL service. At launch the service comes with automatic encryption of customer data, a 99.95% uptime SLA and support for databases up to 500GB in size.
To incrementally develop and deliver products using agile software development, requirements are gathered and organized into a product backlog. A requirement technique that is used in agile software development is use cases. Some techniques to apply use cases for managing product requirements in agile are use case 2.0, slicing and laminating.
The patterns & practices group at Microsoft have released a guide with solutions and patterns suitable when implementing cloud-hosted applications. The guide contains ten guidance topics together with 24 design patterns targeting eight categories of problems covering common areas in cloud application development. Also included are ten sample applications to demonstrate the usage these patterns.
Recently, Marvel has made available a public API and a RESTful service which provides access to their comics metadata.
In the race for interactive SQL in Big Data environments, there are two open source based front-runners, Impala and Hive with the Stinger project. Cloudera recently announced that Impala is up to 69 times faster than Hive 0.12 and can outperform DBMS. Other than raw speed, we take a look at other considerations in choosing a SQL engine for Hadoop and also Tez, an application framework for YARN.
Hadoop is definitely the platform of choice for Big Data analysis and computation. While data Volume, Variety and Velocity increases, Hadoop as a batch processing framework cannot cope with the requirement for real time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event based processing.
Complex Event Processing, CEP, can be very useful for problems that have to do with time e.g. querying over historical data when you want to correlate things that have happened at different times, Greg Young explained in a recent presentation.
The NodeJS based Koa web application framework has released version 0.2.0. Koa is the successor of the popular Express MVC platform, but relies heavily on newer ES6 constructs. This release is marked as an important one in that that it reaffirms the team’s design choices from the initial 0.1.0 release, solidifying Koa's API for future releases and production use.
With a new connector, it is now possible for Hadoop to run directly against Google Cloud Storage instead of using the default, distributed file system. This results in lower storage costs, fewer data replication activities, and a simpler overall process.