Book Review – Data Science for Business

By | March 3, 2015

Provost and Fawcett do a fantastic job of describing the main techniques used in data mining – classification, clustering and regression – along with high level explanations of the algorithms most commonly used for each. In addition, they present an expected value framework that is very useful for choosing the right balance between true positives, false positives, etc. in the predictions of a model.
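To make the expected value idea a bit more concrete, here is a minimal Python sketch of how I understand it: weight the value of each outcome (true positive, false positive, and so on) by how often it occurs, then add the results up. The function name, the counts, and the dollar values are all made up for illustration; none of them come from the book.

```python
def expected_value(counts, values):
    """Expected value per instance: sum of p(outcome) * value(outcome)."""
    total = sum(counts.values())
    return sum(counts[k] / total * values[k] for k in counts)

# Hypothetical confusion-matrix counts for a model at one threshold.
counts = {"TP": 60, "FP": 40, "FN": 20, "TN": 880}

# Hypothetical benefit (positive) or cost (negative) of each outcome.
values = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}

ev = expected_value(counts, values)
print(f"Expected value per instance: {ev:.2f}")
```

Running the same calculation on the confusion matrices from different thresholds or different models gives you a single number to compare them by, which is roughly what makes the framework useful for trading off false positives against missed true positives.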

Data Science for Business is by no means an easy read, even for technical readers, unless you have significant prior experience in machine learning and the relevant statistical techniques and algorithms. The book flags the deeper technical sections that it says you can safely skip, but I felt there was a lot of critical detail in them. Nonetheless, it’s still relatively light on the math, in keeping with the target audience. The first few chapters could have been much shorter and clearer if the authors had replaced a lot of words with a much smaller number of equations, but then they might have needed to retitle the book.

One area the book doesn’t cover in extensive detail is the visualization of model performance, though there is an adequate description of ROC graphs, fitting graphs, and lift curves, with an emphasis on understanding them at a high level.

The Coursera ML class is a better way to learn how machine learning algorithms really work, but you have to have decent programming and math skills and be willing to spend about 10-15 hours a week on it for up to 10 weeks. This book is an excellent compromise, given the much shorter time investment needed to read it.

Data Science for Business also turned out to be a great complement to another book I’ve been reading, How Not to Be Wrong, by Jordan Ellenberg, which I will also recommend once I finish it. Or maybe I’ll just recommend it now.
