Evolution in Data Integration From EII to Big Data
Approaches to integrating data are changing with the emergence of cloud computing.
Posted by Gilad Manor on Apr 19, 2010
The open source machine learning project Apache Mahout announced version 0.3 in March, adding functionality and improving stability and performance. InfoQ spoke with co-founder and committer Grant Ingersoll and committer Ted Dunning of the Apache Mahout project.
The need for machine-learning techniques such as clustering, collaborative filtering, and categorization has increased steadily over the last decade, along with the number of solutions that need algorithms to transform vast amounts of raw data into relevant information.
The Mahout Project as introduced by Grant Ingersoll addresses:
Another important aspect of the Mahout solution is its set of tools for creating vector representations of textual data. This is the first step in enabling Mahout's learning algorithms to process a database.
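To make the idea of a vector representation concrete, here is a minimal sketch in plain Python. It is not Mahout's API: the function names (`tokenize`, `build_vocabulary`, `vectorize`) and the sample documents are assumptions made purely for illustration. Each document becomes a sparse term-frequency vector that maps a column index to a word count.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct token a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a sparse term-frequency vector as {column index: count}."""
    counts = Counter(tokenize(text))
    return {vocabulary[tok]: n for tok, n in counts.items() if tok in vocabulary}


# Hypothetical sample corpus, used only to exercise the functions above.
docs = ["Mahout clusters documents", "Lucene indexes documents and documents"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
```

Once text is in this form, a clustering or classification algorithm can treat every document as a point in the same numeric space, which is why vectorization comes first.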
The Mahout project was started by several people involved in the Apache Lucene (the open source search project) community with an active interest in machine learning algorithms for clustering and categorization. The community was initially driven by Ng et al.'s paper "Map-Reduce for Machine Learning on Multicore" but has since evolved to cover much broader machine-learning approaches.
The new Apache Mahout release highlights:
When asked what the most exciting feature in this release is, Ingersoll replied:
The addition of distributed Singular Value Decomposition (SVD) is pretty exciting as well as many utilities to make it easier for people to get their content into Mahout… the most exciting feature is actually a non-tangible one… the demonstration of the Mahout community reaching a critical mass of contributors and users. In the life of any open source project, the early stages can be very tenuous with just one or two people doing most of the work and if any one of those people stops or even slows down, the project can wither on the vine. I believe that Mahout has passed that threshold and has many people now actively contributing to build something truly exciting.
Future plans for the Mahout project include:
The stochastic gradient descent (SGD) and support vector machine (SVM) implementations will be applicable to document mining and other applications that involve text or repeated categorical data. Of particular interest, the SGD system will introduce the ability to build interaction variables on the fly.
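As a rough illustration of what building interaction variables on the fly can mean, the sketch below trains a small logistic regression with SGD in plain Python. It is not Mahout's implementation: the function names, the `color` and `shape` fields, the learning rate, and the epoch count are all assumptions. The interaction feature is generated at encoding time by combining two categorical values, so it never has to be stored in the input data.

```python
import math


def encode(record, interactions=()):
    """Name one feature per field=value, plus one per requested field pair."""
    names = [f"{field}={value}" for field, value in sorted(record.items())]
    for a, b in interactions:
        # Interaction variable built on the fly from two categorical values.
        names.append(f"{a}={record[a]}&{b}={record[b]}")
    return names


def predict(weights, names):
    """Logistic model: sigmoid of the summed weights of the active features."""
    score = sum(weights.get(name, 0.0) for name in names)
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, interactions=(), epochs=200, rate=0.5):
    """Plain SGD: one weight update per example, repeated for several epochs."""
    weights = {}
    for _ in range(epochs):
        for record, label in examples:
            names = encode(record, interactions)
            error = label - predict(weights, names)
            for name in names:
                weights[name] = weights.get(name, 0.0) + rate * error
    return weights


# Hypothetical data: the label depends only on the combination of the fields.
examples = [
    ({"color": "red", "shape": "circle"}, 1),
    ({"color": "red", "shape": "square"}, 0),
    ({"color": "blue", "shape": "circle"}, 0),
    ({"color": "blue", "shape": "square"}, 1),
]
pairs = [("color", "shape")]
weights = train(examples, interactions=pairs)
predictions = [predict(weights, encode(record, pairs)) for record, _ in examples]
```

In this toy data set neither field predicts the label on its own, so a linear model needs the combined `color` and `shape` feature to separate the classes. That is the kind of case where generating interactions during training helps.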