Posted by Ryan Slobojan on Apr 23, 2009
The Apache Mahout project, a set of highly scalable machine-learning libraries, recently announced its first public release. InfoQ spoke with Grant Ingersoll, co-founder of Mahout and a member of the technical staff at Lucid Imagination, to learn more about this project and machine learning in general.
When asked to describe Mahout in more detail, Ingersoll said:
Mahout is a library aimed at delivering scalable machine learning tools under the Apache license. Our goal is to build a healthy, active community of users and contributors around practical, scalable, production-ready machine learning algorithms like, but not limited to, clustering, classification and collaborative filtering. We use Hadoop as a way of delivering on the scalability promise for many of the implementations, but we are not solely dependent on it. Many machine learning algorithms simply do not fit the Map Reduce model, so we will employ other means when appropriate.
Personally speaking, I hope Mahout does for machine learning what Apache Lucene and Solr have done for search. Namely, make it easy for anyone to build a production-quality, intelligent application that scales to fit their needs just as Lucene and Solr have made it possible for anyone to build a scalable search application. We have a ways to go in this regard, but the 0.1 release is a good first step in that direction.
Asked to define machine learning, Ingersoll quoted Ethem Alpaydin's Introduction to Machine Learning: "Machine Learning is programming computers to optimize a performance criterion using example data or past experience."
Major features in the initial release of Mahout include Taste collaborative filtering, clustering, and classification. A comprehensive feature list is also available.
Asked for sample applications of these algorithms, Ingersoll said that Taste collaborative filtering recommends items a user is likely to enjoy based on their preferences, such as movies. Clustering groups arbitrary data into sets of similar items, for example grouping related news stories. Classification assigns items to predefined categories, the most common example being the labeling of email as junk or not junk. Ingersoll also touched on running Mahout on Amazon Elastic MapReduce, noting that work to get Mahout running in the cloud is in progress and that Mahout is a natural fit for it:
Many of the big players in search and social networking are already using Map Reduce (and other distributed approaches) and machine learning to drive their applications. Mahout, in the long run, should make the ability to build these types of applications even easier and cheaper by reducing the startup costs and licensing fees associated with obtaining machine learning capabilities and know-how. Furthermore, by working to build a community of users where anyone is welcome to contribute, we think we will be around for a long time.
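To make the movie-recommendation use case mentioned above more concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not Mahout's Taste API and not a distributed implementation; the toy ratings, the function names, and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "alice": {"Alien": 5, "Blade Runner": 4, "Amelie": 1},
    "bob": {"Alien": 4, "Blade Runner": 5, "Brazil": 5},
    "carol": {"Amelie": 5, "Chocolat": 4, "Alien": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated; 0.0 if they share none."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print([item for item, _ in recommend("alice", ratings)])
```

With this toy data, alice's tastes are closest to bob's, so bob's highly rated "Brazil" ranks ahead of carol's "Chocolat". A production system would need to handle sparse data, rating bias, and scale, which is where a library such as Mahout comes in.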
When asked about future plans for Mahout, Ingersoll said:
First and foremost is getting Mahout known so that people can try it out and give us feedback to improve it. Because it is open source, it is sometimes difficult to know exactly what is going to happen because so many great ideas come from seemingly out of the blue; however, I can tell you my personal wish list:
- More demos and documentation, especially info on how to run on EC2
- More algorithms. I'd particularly like to see a linear regression implementation and neural networks implementations, amongst others, because those are familiar to a lot of people.
- Solidify the APIs so that we can work towards a 1.0 release such that people can reliably upgrade to a new release without having to make major changes to their code.
- Obtain a variety of performance metrics so that people can know what they are likely to see in their implementation.