Book Review Python Machine Learning - Second Edition - InfoQ

The book Python Machine Learning, second edition, by Sebastian Raschka and Vahid Mirjalili, is a tutorial covering a broad range of machine learning applications with Python. It provides a practical introduction to machine learning using popular libraries such as SciPy, NumPy, scikit-learn, Matplotlib, and pandas.

The main change from the first edition is expanded coverage of neural networks. Five chapters now discuss neural networks and their implementation in TensorFlow. Besides the additional content, many concepts from the first edition have been refined. The revised edition is much bigger and more in-depth, yet still practical.

The book has a GitHub repository containing all of its code. The code is organized in Jupyter notebooks that show code, explanations, and graphics side by side. Although this gives a good overview of each method, readers have to search through these notebooks every time they need the code for a particular concept.

Newcomers to machine learning will be happy with the first three chapters of the book. The first chapter gives an overview of the main subareas of machine learning, giving an idea of which kinds of methods suit which types of problems. A downside of this chapter is that not all of the methods it introduces come back later in the book: readers interested in applying reinforcement learning won't get much help from this book. The second chapter gives a very gentle introduction to pattern classification. Algorithms are implemented with basic functions from the NumPy library, giving readers a feel for how basic classifiers work.
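To give a flavor of the kind of from-scratch classifier the second chapter builds, here is a minimal perceptron sketch that uses only NumPy. It is not the book's code: the class name `Perceptron`, the learning rate `eta`, the number of epochs, and the toy data are illustrative assumptions.

```python
import numpy as np


class Perceptron:
    """Minimal Rosenblatt perceptron for labels in {-1, 1} (illustrative sketch)."""

    def __init__(self, eta=0.1, n_iter=10, random_state=1):
        self.eta = eta                  # learning rate
        self.n_iter = n_iter            # passes (epochs) over the training set
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        # w_[0] is the bias term, w_[1:] are the feature weights
        self.w_ = rng.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors_ = []               # misclassifications per epoch
        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # perceptron rule: weights change only when the prediction is wrong
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        # unit step function on the net input
        return np.where(self.net_input(X) >= 0.0, 1, -1)


# Toy, linearly separable data (made up for illustration)
X = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

ppn = Perceptron(eta=0.1, n_iter=10).fit(X, y)
print(ppn.predict(X))
print(ppn.errors_)
```

Tracking `errors_` per epoch is a simple way to see whether the perceptron has converged; on linearly separable data like this, the count should drop to zero.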

The third chapter gives a tour of machine learning classifiers using scikit-learn. This long chapter provides a great overview of the functions you can use to build a perceptron, a support vector machine, a decision tree, and a k-nearest neighbors classifier. After reading the fourth chapter, developers will be able to set up a learning pipeline that handles input and output data, preprocesses it, selects meaningful features, and applies a classifier to it.
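As a rough illustration of the classifiers and the pipeline idea described above, the sketch below wraps each of the four scikit-learn classifiers in a pipeline that standardizes the features, keeps the most informative ones, and then classifies. It is not taken from the book; the Iris dataset, `k=2`, and all hyperparameter values are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Hyperparameters here are illustrative, not tuned
classifiers = {
    "perceptron": Perceptron(eta0=0.1, random_state=1),
    "svm": SVC(kernel="rbf", C=1.0, random_state=1),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=1),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in classifiers.items():
    # standardize features, keep the 2 with the highest ANOVA F-score, then classify
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=2), clf)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # mean accuracy on the test split

for name, acc in scores.items():
    print(f"{name}: test accuracy {acc:.2f}")
```

Putting preprocessing and feature selection inside the pipeline means they are fitted on the training split only, so no information from the test split leaks into the model.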

The book is not meant to be read from cover to cover. Readers can pick the chapters that interest them and read through those. Developers who want to deploy their model will be happy with chapter 9, which teaches them how to embed the sentiment analysis model built in chapter 8 in a web application. What I like about this chapter is that it doesn't stop once the application runs on the reader's own PC, but goes on to explain how to deploy it to a public server using PythonAnywhere.
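One step any such deployment needs is persisting the trained model so that the web application can reload it instead of retraining. Below is a minimal sketch of that step with Python's `pickle` module; it is not the book's code. The toy reviews, the file name, the choice of `HashingVectorizer` with `SGDClassifier`, and the `classify` helper are hypothetical, and the web framework itself is left out.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy, made-up training data standing in for a real review corpus
docs = ["a wonderful, moving film", "great acting and a great story",
        "boring and far too long", "a terrible, dull movie"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# HashingVectorizer is stateless, so only the fitted classifier must be saved
vect = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier(random_state=1)
clf.fit(vect.transform(docs), labels)

# Training script: serialize the fitted model to disk
model_path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(model_path, "wb") as f:
    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

# Web application: load the model once at start-up
with open(model_path, "rb") as f:
    loaded_clf = pickle.load(f)


def classify(text):
    """Return 'positive' or 'negative' for one review (hypothetical helper)."""
    label = loaded_clf.predict(vect.transform([text]))[0]
    return "positive" if label == 1 else "negative"


print(classify("a great story with wonderful acting"))
```

A stateless vectorizer keeps the saved artifact small. Note that `pickle.load` can execute arbitrary code, so a web application should only load model files it created itself.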

The last 200 pages of the almost 600-page book are dedicated entirely to neural networks. In chapter 12, the authors explain how to implement a multilayer perceptron from scratch using NumPy. From chapter 13 onward, TensorFlow is used for more complex neural networks. Readers gain the in-depth knowledge needed to build their own neural networks and apply it to classifying images with deep convolutional neural networks.
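To show what implementing a multilayer perceptron with plain NumPy involves, here is a sketch of the forward pass and backpropagation for a network with one hidden layer, sigmoid activations, and a logistic (cross-entropy) cost. It is an illustration under these assumptions, not the book's implementation; the function names, layer sizes, and learning rate are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W_h, b_h, W_out, b_out):
    a_h = sigmoid(X @ W_h + b_h)          # hidden-layer activations
    a_out = sigmoid(a_h @ W_out + b_out)  # one sigmoid output unit per class
    return a_h, a_out


def cost(y_onehot, a_out):
    # logistic cost summed over samples and output units
    return -np.sum(y_onehot * np.log(a_out) + (1 - y_onehot) * np.log(1 - a_out))


def gradients(X, y_onehot, W_h, b_h, W_out, b_out):
    a_h, a_out = forward(X, W_h, b_h, W_out, b_out)
    # sigmoid output + logistic cost: dJ/dz_out simplifies to a_out - y
    delta_out = a_out - y_onehot
    # backpropagate through the hidden sigmoid, whose derivative is a_h * (1 - a_h)
    delta_h = (delta_out @ W_out.T) * a_h * (1 - a_h)
    grad_W_out = a_h.T @ delta_out
    grad_b_out = delta_out.sum(axis=0)
    grad_W_h = X.T @ delta_h
    grad_b_h = delta_h.sum(axis=0)
    return grad_W_h, grad_b_h, grad_W_out, grad_b_out


# Small random problem: 6 samples, 4 features, 5 hidden units, 3 classes
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
y = np.array([0, 1, 2, 0, 1, 2])
y_onehot = np.eye(3)[y]
W_h, b_h = rng.normal(scale=0.1, size=(4, 5)), np.zeros(5)
W_out, b_out = rng.normal(scale=0.1, size=(5, 3)), np.zeros(3)

# One gradient-descent step
eta = 0.1
g_W_h, g_b_h, g_W_out, g_b_out = gradients(X, y_onehot, W_h, b_h, W_out, b_out)
W_h -= eta * g_W_h
b_h -= eta * g_b_h
W_out -= eta * g_W_out
b_out -= eta * g_b_out
print(cost(y_onehot, forward(X, W_h, b_h, W_out, b_out)[1]))
```

Comparing these analytic gradients against finite differences of `cost` is a standard way to check a hand-written backpropagation before training for many epochs.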

The last chapter introduces the modeling of sequential data using recurrent neural networks. At the end of this chapter, the reader builds two larger "final" projects. The first performs sentiment analysis on the IMDb movie review dataset. What I like about this setup is that readers who have also worked through chapter 8 can compare several methods for solving the same problem. The second project implements a recurrent neural network that generates text.
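The book builds its recurrent networks in TensorFlow; the framework-free sketch below only illustrates the core recurrence of a simple (Elman) RNN, where each hidden state is computed as h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h). The function name, shapes, and the logistic read-out on the last state (as a sentiment classifier might use) are assumptions for illustration; practical models typically use gated cells such as LSTMs.

```python
import numpy as np


def rnn_forward(X_seq, W_xh, W_hh, b_h, h0=None):
    """Run a simple RNN over one sequence of shape (T, n_features).

    Returns the hidden states, shape (T, n_hidden).
    """
    n_hidden = W_hh.shape[0]
    h = np.zeros(n_hidden) if h0 is None else h0
    states = []
    for x_t in X_seq:
        # new state depends on the current input and the previous state
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, n_features, n_hidden = 5, 3, 4
X_seq = rng.normal(size=(T, n_features))    # e.g. 5 embedded tokens
W_xh = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

H = rnn_forward(X_seq, W_xh, W_hh, b_h)

# For sequence classification, a final layer can read only the last state
w_out = rng.normal(scale=0.1, size=n_hidden)
p_positive = 1.0 / (1.0 + np.exp(-(H[-1] @ w_out)))
print(H.shape, p_positive)
```

For text generation, the read-out would instead be applied at every time step to predict the next character or word.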

Conclusion

The revised edition offers a lot of insight into machine learning for both beginners and engineers who already use some machine learning techniques. Concepts are illustrated clearly with many graphs, and their background is explained mathematically. The book strikes a good balance between theory, code, and mathematical background across a broad range of machine learning concepts. The Jupyter notebooks in the GitHub repository are a great way to work through the book's code without having to type everything out yourself.

This content is in the AI, ML & Data Engineering topic
