Blog Archives

scikit-learn Cookbook by Trent Hauck, Packt Publishing Book Review

This book was released back in Fall 2014, but I did not have a chance to read it until recently. A big miss. As far as I can tell, it is one of the few books that try to cover as much ground as possible on scikit-learn, one of the free Machine Learning (ML) libraries available for Python. In general, Machine Learning is a fascinating field of science that is getting a lot of traction these days, but it is a tad intimidating to grasp at the beginning. Besides, its potential use cases, should it fall into the wrong hands (g-d forbid), can be scary. Otherwise I foresee a huge potential for its use in the IoT.

This book aims to ease the ML adoption hurdles by providing no fewer than 50 recipes, which cover pretty much the whole scikit-learn landscape. I could see that Trent made every effort to deliver a high-quality product. The book has a supplementary file that covers what an end user needs to install in order to go through all the material in the book and obtain the sample data.

As a general note, since this book is aimed mostly at data scientists, engineers and research staff, many topics will not be familiar to a non-technical or general IT audience, but please do put in the extra effort to understand the concepts. As I have said, the benefits are enormous. And prepare yourself to scratch your head a few times or more :-). Yes, this is a very advanced book. Yet it seems to cover every scenario and industry field one can imagine. Numerous graphics, detailed code samples and output examples are all ready to copy and paste into the mighty Python REPL.

While reading the book I had a task at hand, so I concentrated on the KMeans algorithm, which is elegantly covered, and I enjoyed the chapter on Classifying Data the most. At the same time, I think the cornerstones of the book are chapter 1, on the pre-model workflow, and the last chapter, on the post-model workflow. I have not seen any book to date go this far.
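For readers who have never seen it, a minimal KMeans run in scikit-learn might look something like the sketch below. This is my own illustration, not a recipe taken from the book: the synthetic data from `make_blobs`, the choice of three clusters, the scaling step (standing in for "pre-model" work) and the silhouette score (standing in for "post-model" evaluation) are all assumptions made just for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic 2-D data around 3 centers (illustrative only, not the book's data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Pre-model step: put every feature on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)

# Model step: fit KMeans and assign each point to a cluster.
# n_clusters=3 is an assumption; on real data it has to be chosen.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Post-model step: one possible quality check for a clustering.
# The silhouette score lies between -1 and 1; higher means tighter clusters.
score = silhouette_score(X_scaled, labels)

print("cluster centers shape:", kmeans.cluster_centers_.shape)
print("silhouette score:", round(score, 3))
```

On real data the number of clusters is rarely known up front, so one would typically repeat the fit for several values of `n_clusters` and compare the scores.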

While this book reads more like an ‘Academia’ publication, it does have many practical applications. For a less Data Science savvy reader, though, it could use more explanation of why XYZ and ABC are necessary, what each library function is used for, and under what circumstances one would choose to use it.

Overall it is a tad dry, technical read, but at the same time no extra, volume-inflating words were mixed in, so it is worth what you pay for it.

My verdict: 4.5 out of 5.

Beginning Data Science with R by Manas A. Pathak, Springer

Continuing with the Springer series on Computational Intelligence and Complexity, I picked another book on R, whose popularity keeps growing.

Besides, I had already read several books from other publishers in 2014. They were aimed at different levels and at people from different professional backgrounds. As a data practitioner who is rather far from being a data scientist and sits closer to the server side, with periodic ETL or Business Intelligence development tasks at hand, I started to realize that times have changed: each new project requires new depth and breadth of data analysis. Using Excel and its data add-ons no longer cuts it. I was aware of tools such as MATLAB, SAS and SPSS, but boy, they cost!

I have always been in love with data, linear and discrete algebra, and statistics in general, so for me R was the natural choice. Learning a tool such as R (it is not just a new language) is quite an endeavour, so I turned to the World Wide Web in search of a good programming reference for R and came across numerous posts with recommendations. Yet none of the books I picked quite lived up to its promises. I was in despair.

But no longer: now that I have found Beginning Data Science with R, I feel empowered.

I will try to outline why this book does deliver.

All the beginner books I tossed onto my tablet expected the reader to have advanced knowledge of statistics, and far less of R. To me this is an ill-conceived approach. In fact, R is a complex language with MANY nuances (not quirks, IMO). And there are dozens of ways to arrive at the same thing, like in Perl, but don’t let me get started…

The main idea I am trying to convey is that there has never been a book with a good balance between R as a language and where it fits, excels or delivers when it comes to statistics or probability. This book has it. It is very well-rounded, and readers at every level will find much useful information, so I don’t necessarily consider it a beginner book. It has many reference links a reader can use to widen one’s knowledge. And I had so much fun reading this not terribly long book! All the topics are explained very well, with enough introduction and concrete examples. Some chapters I see as a bonus, especially the ones on text mining (which some commenters on my G+ post do not consider a part of R) and decision trees. These I liked the most: they are pure fun, short, but very practical. Not to mention useful.

To sum up,

What this book is: a comprehensive yet short tutorial on the practical application of R to modern data science tasks and projects. The book lays a solid foundation for developing your knowledge of R further, and it is a good guide to what can be extracted from R and CRAN (and its package ecosystem), or even from computational and quantitative science itself. Perhaps this book helps in getting to grips with Machine Learning as well, and with other advanced areas of Data Science.

What this book is not: an in-depth language reference. A reader would need supplementary material to delve deeper into R as a language and may need extra practice on concrete, narrowly defined tasks/applications.

Who I recommend it to: managers who work on data projects, technical team leaders, CS students, Business Intelligence professionals, beginner architects, general computer academia, statisticians, several categories of scientists or researchers such as biologists, lab staff and criminologists, and also Finance pros or actuaries.

Verdict: 5 out of 5, well done Mr Manas A. Pathak!