Amazon Web Services launches Machine Learning Service

Amazon Web Services has recently launched Amazon Machine Learning, a service that lets users train predictive models in the cloud. After Google with its Prediction API and Microsoft with Azure Machine Learning, Amazon is the latest major cloud provider to launch such a service.

The service currently provides a learning model similar to those used in many large-scale learning applications, along with visualizations of basic data statistics and of the predictive performance of the learned model. It still has limitations in flexibility, data import and export, and support for automated model parameter tuning.

In recent years, many services and products have been launched to simplify data analysis. Some of these have focussed on simplicity by hiding most of the complexity from the user, while others try to provide a more complete set of data analysis tools for specialists.

Amazon's latest offering falls into the first category. It only deals with prediction problems. The exact underlying learning algorithm is not known, but the features it provides are very similar to those of Vowpal Wabbit, a fast machine learning system developed by John Langford and based on stochastic gradient descent. This algorithm works by sequentially streaming the data past the model and adapting the model based on the observed prediction error. It is inherently hard to parallelize, but it is very efficient and has bounded memory usage, which makes it the workhorse behind many large-scale applications (it is used, for example, for ad click prediction at Google).
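To make the streaming idea concrete, here is a minimal sketch of stochastic gradient descent for a logistic model. It is not Amazon's or Vowpal Wabbit's implementation; the function names, the synthetic data stream, and the learning rate are assumptions chosen purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sgd_logistic(stream, n_features, learning_rate=0.1):
    """Train a logistic model in a single pass over a stream of (x, y) pairs."""
    weights = [0.0] * n_features
    bias = 0.0
    for x, y in stream:
        # Predict with the current model.
        prediction = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        # Observed prediction error for this one example.
        error = prediction - y
        # Adapt the model: one small gradient step per example.
        for i, xi in enumerate(x):
            weights[i] -= learning_rate * error * xi
        bias -= learning_rate * error
    return weights, bias


def make_stream(n_examples, seed=0):
    """Hypothetical data source: label is 1 when the two features sum to > 0."""
    rng = random.Random(seed)
    for _ in range(n_examples):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, (1 if x[0] + x[1] > 0 else 0)


weights, bias = sgd_logistic(make_stream(5000), n_features=2)
```

Because the examples come from a generator and only the weight vector is kept, memory use does not grow with the size of the data set. Each update depends on the weights left by the previous one, which is why the procedure is awkward to parallelize.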

In addition, Amazon Machine Learning can compute basic statistics per feature for the training data, and it provides visualizations of the prediction performance of the learned model. These two features allow the user to inspect the data and gain a better understanding of the learned prediction model. Finally, the service offers some simple data transformations, such as extracting features or turning text into an n-gram representation, which is often used for textual data.
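As an illustration of what an n-gram representation looks like, the sketch below turns a piece of text into counts of word unigrams and bigrams. The helper names and the whitespace tokenization are assumptions made for this example; the service's own transformation may tokenize and normalize text differently.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all runs of n consecutive lowercase words in the text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count every word n-gram for n = 1 .. max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


features = ngram_features("the quick brown fox")
```

Each distinct n-gram then becomes one feature of the prediction model, with its count (or simply its presence) as the value.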

There are some limitations. Data must reside in Amazon's S3 storage service, or in a Redshift database, and there is no way to import or export the learned model. There is no support for automatically training and evaluating many model variants in parallel in order to tune the model parameters, although this procedure has high practical value.
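The tuning procedure mentioned above can be sketched in a few lines: train one model per candidate parameter value, score each on held-out data, and keep the best. The example below is only an illustration under stated assumptions. It uses a one-dimensional ridge regression with a closed-form fit, a tiny made-up data set, and a thread pool standing in for real parallel infrastructure; none of these come from Amazon's service.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_ridge_1d(points, l2):
    """Closed-form slope of y = w * x with L2 penalty l2."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + l2)


def mean_squared_error(w, points):
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


def evaluate(l2, train, validation):
    """Train one model variant and score it on held-out data."""
    w = fit_ridge_1d(train, l2)
    return l2, mean_squared_error(w, validation)


def tune(candidates, train, validation):
    """Evaluate all variants concurrently and return (best_l2, best_error)."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda l2: evaluate(l2, train, validation), candidates))
    return min(results, key=lambda result: result[1])


# Hypothetical data, roughly following y = 2x.
train = [(1, 2.1), (2, 3.9), (3, 6.2)]
validation = [(4, 8.0), (5, 10.1)]
best_l2, best_error = tune([0.0, 0.1, 1.0, 10.0], train, validation)
```

Since each variant is trained independently of the others, this kind of search parallelizes easily even when the underlying learning algorithm does not.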

A first review also notes that the performance of the system still lags behind simply running a tool like Vowpal Wabbit locally on a laptop.

Google's Prediction API, which was launched in 2010, falls into the same category. It only deals with prediction problems, and not with more complex problems like recommendation, or unsupervised learning methods like clustering. The interface essentially only lets you upload data, train, and evaluate a model, and use a stored model to compute predictions.

Microsoft Azure Machine Learning, on the other hand, has a much richer interface and is geared towards a more specialized audience. It exposes different kinds of learning algorithms, lets the user compose complex feature transformation pipelines, and even integrates R scripts. Other examples are PredictionIO and GraphLab Create.

The Apache Spark project is also developing a machine learning library, which can be used, for example, via Databricks Cloud to perform complex, scalable data analysis in the cloud.
