Amazon Makes Internal Machine-Learning Courses Public

Amazon has published videos and supplementary materials from several of its internal Machine Learning University courses. The course lectures cover three machine-learning topics and can be watched on demand on YouTube, while the slides, notebooks, and datasets can be downloaded from GitHub. Amazon plans to release a total of twelve courses by the end of the year.

Amazon announced the release of the courses in a recent blog post. The initial release consists of three "accelerated" courses, each of which provides an introduction to ML and then progresses to a more specialized topic: tabular data, natural language processing (NLP), or computer vision (CV). Each class's video lectures are organized into playlists on YouTube, and supplemental material, including slide decks, datasets, and Jupyter notebooks, is available from dedicated GitHub repositories. Nine additional "in-depth" classes are planned by the end of the year. The company started its internal Machine Learning University (MLU) as a way to help meet the demand for engineers with ML skills, and according to Amazon Web Services (AWS) research scientist Brent Werness,

By going public with the classes, we are contributing to the scientific community on the topic of machine learning, and making machine learning more democratic.

The three accelerated courses are each organized as a series of three lectures. While all the courses touch on neural networks, the Tabular Data class focuses on what might be considered more "traditional" ML techniques, such as linear regression, decision tree, and k-nearest-neighbor models. Because tabular data (rows and columns) is a common format for much of the data world, the course also covers many ML basics such as feature engineering and model evaluation, as well as more advanced topics such as AutoML. The NLP class also covers some of the basics, but devotes a full lecture to deep learning and neural networks, especially recurrent neural networks (RNNs) and Transformers, which are the state-of-the-art NLP models. The CV class is almost entirely devoted to deep learning, with particular emphasis on convolutional neural networks (CNNs), including AlexNet, ResNet, and VGGNet. At the end of each lecture series, students are encouraged to complete a final project by building and evaluating their own machine-learning model on a given dataset.
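For readers unfamiliar with the "traditional" techniques mentioned above, the following is a minimal sketch of a k-nearest-neighbor classifier with a simple accuracy check, written with only the Python standard library. It is not taken from the MLU course materials: the toy table, the labels, and the function names `knn_predict` and `accuracy` are all invented here for illustration.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest rows."""
    # Pair every training row's Euclidean distance to the query with its label,
    # then sort so the closest rows come first.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_rows, train_labels, test_rows, test_labels, k=3):
    """Fraction of held-out rows whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_rows, train_labels, row, k) == label
        for row, label in zip(test_rows, test_labels)
    )
    return correct / len(test_rows)


# Hypothetical toy table: each row is (height_cm, weight_kg).
train_rows = [(150, 50), (152, 53), (155, 55), (180, 85), (183, 90), (178, 82)]
train_labels = ["small", "small", "small", "large", "large", "large"]

# Held-out rows used only for evaluation.
test_rows = [(151, 52), (181, 88)]
test_labels = ["small", "large"]

print(accuracy(train_rows, train_labels, test_rows, test_labels))
```

In practice, a library such as scikit-learn would normally replace the hand-written functions, and the features would usually be scaled before computing distances; the sketch only shows the basic idea of fitting on one set of rows and evaluating on another.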

While most of the material is presented as platform-agnostic, there is a detectable bias toward Amazon products, especially AWS services. Two of the courses, NLP and CV, are led by AWS scientists, and each course includes a video on how to run the course notebooks on the Amazon SageMaker machine-learning service. The neural network code in the supplemental Jupyter notebooks uses MXNet, the AWS deep-learning framework of choice, instead of either of the two most popular frameworks, TensorFlow and PyTorch.

In a Hacker News discussion about the announcement, one user said,

I took a number of these courses as an Amazon software engineer and I found them very useful. Amazon’s goal of the internal program is to train software engineers to know enough about data science to be effective as "machine learning engineers." Once trained, software engineers can either implement models themselves, or more likely, partner with a data science team to productionize a model.

The course lecture videos are available on YouTube, and the course materials are on GitHub.
 
