Category Archives: Book Review

FuelPHP Application Development Blueprints by Sébastien Drouyer, Packt Publishing Book Review

You will not come across many robust HMVC web frameworks, especially ones built on top of the excellent, and now reborn, PHP. Rather than reiterate how HMVC is more advanced than vanilla MVC, I will point you to a nice blog post dedicated to this topic. If you do not bother visiting it, I will not be too lazy to state that it is the familiar MVC, but logically structured. FuelPHP is exactly such an HMVC implementation. So if you ask why HMVC, I offer two choices: either get a free chapter of the FuelPHP Application Development Blueprints book, or listen to me (well, read, rather): it allows what not-so-web developers call “design by contract” (DbC). But this is what I say after having read this book.


Neo4j Essentials by Sumit Gupta, Packt Publishing Book Review

I was very happy to see a new book released on Neo4j shortly after I read Learning Neo4j, which I reviewed here. I feel the timing of the release is very good and that these books should be read in this sequence, because Learning Neo4j is very inspirational in addition to teaching the Neo4j basics and showing what real-life applications this database can have.

I must say right away that Neo4j Essentials goes beyond just the essentials, and it becomes immediately apparent that Sumit has in-depth knowledge of both Neo4j and enterprise architecture. So you will be in good company.

Setting up your environment is uncomplicated. As a bonus, Neo4j comes in a free version, which is all you need to learn it and then build a prototype. Be advised that Neo4j requires Java (but not its SDK) because Neo4j itself is built in Java; specifically, Oracle Java 7 or newer. As a side note, Neo4j can even be embedded into your standalone application.

To be efficient with the technology and make the most of the book, I suggest familiarizing yourself with REST and getting an IDE like Eclipse or IntelliJ to run the Java code samples from the book. I found them very valuable.

So, back to the book: while it is not structured exactly the way I expected (e.g. troubleshooting and maintenance related items appear early in the book), it lets you navigate efficiently, and the content is chained logically.

There are plenty of examples in the book covering various aspects of data processing. While most examples are not very exciting, I suspect they serve as a good starting point in one’s journey toward efficient processing and representation of related data. I must say the author put special emphasis on covering the newer features of the latest database release (Neo4j 2.0).

The most interesting item discussed in the book, to my taste, was Spring; it is often not covered even in books dedicated to Java. Apparently Spring supports Neo4j, strongly. This means you can build enterprise grade, data rich web applications. Another awesome topic is clustering.

This book devotes a lot of detail and attention to deployment, maintenance, writing code for performance, aspect oriented programming and more, yet just enough to build a reliable implementation of a modern, enterprise grade, database centric application.

I am sure this book will be of much help to many of those who embark on a wonderful journey with Neo4j!

scikit-learn Cookbook by Trent Hauck, Packt Publishing Book Review

This book was released back in Fall 2014, but I did not have a chance to read it until recently. A big miss. As far as I can tell, it is one of the few books covering as much ground as possible on scikit-learn, one of the free Machine Learning (ML) libraries available for Python. In general, Machine Learning is a fascinating piece of science seeing a lot of traction these days, but it is a tad intimidating to grasp at the beginning; besides, its potential use cases, should it fall into the wrong hands (g-d forbid), can be scary. Otherwise I foresee a huge potential for its use in the IoT.

This book aims at easing the ML adoption hurdles by providing no fewer than 50 recipes which cover pretty much the whole scikit-learn landscape. I could see Trent made every effort to deliver a high quality product. The book has a supplementary file that covers what an end user needs to install to go through all the material in the book and obtain sample data.

As a general note, since this product is aimed mostly at data scientists, engineers or research staff, many topics are not going to be familiar to a wide non-technical or general IT audience, but please make sure you put extra effort into understanding the concepts. As I have said, the benefits are enormous. And prepare yourself to scratch your head a few times or more :-). Yes, this is a very advanced book. Yet it seems to cover all the possible scenarios and industry fields one can imagine. Numerous graphics, detailed code samples and output examples are all ready to copy and paste into the mighty Python REPL.

When I was reading the book I had a task at hand, so I concentrated on the KMeans algorithm, which is elegantly covered, and I most enjoyed the chapter on Classifying Data. At the same time I think the cornerstone of the book is chapter 1 on the pre-model workflow and the last one on the post-model workflow; I have not seen books go this far to date.
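For readers who have not met k-means before, here is a minimal sketch of the idea in plain Python. It is not scikit-learn's `KMeans` API and not code from the book; the function name `kmeans_1d`, the sample points and the starting centroids are made up for illustration.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Toy k-means on 1-D numbers (illustrative only, not scikit-learn).

    points    -- list of floats to cluster
    centroids -- initial guesses, one per cluster (an assumption of this sketch)
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = updated
    return centroids, clusters


centers, groups = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

The library version handles many dimensions, smarter initialization and repeated restarts; the two alternating steps are the part worth remembering.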

While this book is more of an ‘Academia’ publication, it does have many practical applications, but for a less Data Science savvy person it needs more explanation of why XYZ and ABCs are necessary, or what each library function is used for and under what circumstances one would choose to use it.

Overall it is a tad dry, technical read, but at the same time no extra, volume inflating words were mixed in, so it is worth what you are paying for.

My verdict: 4.5 out of 5.

Learning Neo4j by Rik Van Bruggen, Packt Publishing Book Review

One of the most fascinating technical books I have ever read. While not difficult to understand (hint: Neo4j is well put together), it is very informative and so elegantly written that, for a reader able to learn on their own, it can open a whole slew of possibilities and opportunities.

I need to step back and say that I initially approached this book with some scepticism. While I was on its very first pages, one of my co-workers pryingly glanced over my shoulder and asked, “What are you reading?” I replied: “It is a book on a Graph database called Neo4j”. His reaction was, “I did not even know there was such a database type!”. As a database professional, though, I felt obliged to make myself aware of, and somewhat educated on, alternative or niche databases. This is what I thought then, but now… read on.

Rik Van Bruggen is definitely in love with Neo4j, but he is also a talented teacher; this book could have been much less of a technical reference than it is without him. Rik tastefully expands from the history and basics of Graph databases up to very practical, yet advanced usages and ends the book with the tools ecosystem and even a neat Cypher ref-card. Besides, I think Cypher is the cornerstone of the successful operation of Neo4j. For the impatient reader, Cypher is to Neo4j as SQL is to an RDBMS.

The book reads like a story: it gradually builds your level of knowledge and at the same time makes you “feel at home” with Neo4j, a very welcoming database.

My interest grew with every page I flipped.

The database installs easily on any platform, and for the record, its graphing (visual) abilities are stunning! It can also output data in tabular form. And, based on what Rik stated, it is VERY fast: he processed 2.1 mil nodes in under 50 sec on a laptop (disclaimer: I did not run any Cypher code, but I am planning to).

This book, especially its chapters 6 and 7 (use case examples), was of immense interest to me; I loved both the Recommendation Engine and Impact Analysis a lot. I think my own use case for Neo4j, which I want to experiment with, will be finding the best driving path (based on more than 2 factors).
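To make the driving path idea a bit more concrete, here is a minimal sketch in plain Python rather than Cypher: each road segment carries several factors, the factors are blended into a single cost, and Dijkstra's algorithm picks the cheapest route. The places, factor values and weights are invented for illustration and are not from the book.

```python
import heapq

# Hypothetical road graph: every segment has distance (km),
# driving time (min) and a toll. In a graph database these
# would be properties on the relationships.
ROADS = {
    "Home": [("A", {"km": 10, "min": 20, "toll": 0.0}),
             ("B", {"km": 6, "min": 9, "toll": 4.0})],
    "A": [("Office", {"km": 5, "min": 8, "toll": 0.0})],
    "B": [("Office", {"km": 7, "min": 10, "toll": 3.0})],
    "Office": [],
}

# Assumed weights: how much each factor matters to this driver.
WEIGHTS = {"km": 1.0, "min": 0.5, "toll": 2.0}


def cost(factors, weights=WEIGHTS):
    """Blend several factors into one number (a weighted sum)."""
    return sum(weights[name] * value for name, value in factors.items())


def best_path(graph, start, goal, weights=WEIGHTS):
    """Dijkstra's algorithm over the blended cost."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, factors in graph[node]:
            if neighbour not in settled:
                step = cost(factors, weights)
                heapq.heappush(queue, (total + step, neighbour, path + [neighbour]))
    return None  # goal is unreachable


print(best_path(ROADS, "Home", "Office"))
# (29.0, ['Home', 'A', 'Office'])
```

Changing the weights changes the answer: with the toll weight set to 0, the shorter and faster route through B wins instead.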

As I figured out, Neo4j is supported by a vast number of enterprise grade tools, comprising a serious ecosystem in which Neo4j strives to reach new heights.

After reading this book I must say that Graph Databases are seriously overlooked.

Well done, Rik, Packt, et al. A 5 out of 5 from me!

Take Neo4j for a spin!

Beginning Data Science with R by Manas A. Pathak, Springer


Continuing with the Springer series on Computational Intelligence and Complexity, I picked another book on the ever more popular R.

Besides, I already read several books from other publishers in 2014. The books were aimed at different levels and at people from different professional backgrounds. As a data practitioner positioned rather far from being a data scientist, sitting closer to the server side, with periodic ETL or Business Intelligence development tasks at hand, I started to realize the times have changed: each new project requires new depth and breadth of data analysis. Using Excel and its data add-ons no longer cuts it. I was aware of tools such as MATLAB, SAS and SPSS, but boy, they cost!

I was always in love with data, linear and discrete algebra, and statistics in general, so for me R was the natural choice. Learning a tool such as R (not just a new language) is quite an endeavour, so I resorted to the World Wide Web in search of a good programming reference for R and came across numerous posts with recommendations. Yet every book I picked did not quite live up to its promises. I was in despair.

But no longer: now that I have found Beginning Data Science with R, I feel empowered.

I will try to outline why this book does deliver.

All the beginner books I tossed onto my tablet expected the reader to have advanced knowledge of statistics, and far less of R. This is an ill-conceived approach to me. In fact, R is a complex language with MANY nuances (not quirks IMO). And there are dozens of ways to arrive at the same thing, like in Perl, but don’t let me get started…

The main idea I am trying to convey is that there was never a book with a good mix, or balance, between R as a language and where it fits or excels (or delivers) when it comes to statistics or probability. This book has it. It is very well-rounded. Readers at every level will find much useful information, so I don’t necessarily consider this a beginner book. It has many reference links a reader can use to widen their knowledge. And I had so much fun reading this not terribly long book! All the topics are explained very well, with enough intro and concrete examples. Some chapters I see as a bonus, especially the ones on text mining (which some commenters on my G+ post do not consider a part of R) and decision trees. These I liked the most: they are pure fun, short, but very practical. Not to mention useful.

To sum up,

What this book is: a comprehensive, yet short tutorial on the practical application of R to modern data science tasks or projects. The book lays a solid foundation for developing your knowledge of R further, and is a good guide to what is possible to extract from R and CRAN (and the package ecosystem) or even computational and quantitative science itself. Perhaps this book helps in getting to grips with Machine Learning as well, and other advanced areas of Data Science.

What this book is not: an in-depth treatment of R as a language. A reader would need supplementary material to delve deeper and may need extra practice on concrete, narrowed down tasks/applications.

Who I recommend it to: managers who work on data projects, technical team leaders, CS students, Business Intelligence professionals, beginner architects, general computer academia, statisticians, several categories of scientists or researchers such as biologists, lab staff and criminologists, and also Finance pros or actuaries.

Verdict: 5 out of 5, well done Mr Manas A. Pathak!

R for Cloud Computing by A. Ohri, Springer, Book Review


R for Cloud Computing is not the first book I have read on R; however, it is my first from Springer.

I picked this title for review for several reasons. The first, though not the major one, is that Springer is viewed as a publisher of advanced, specialized or narrow subject books rather than of popular technology content and/or educational material. Another deal maker is that this book’s title sounded like the next step in the corporate, small business or even personal Computational Statistics space. Not just R itself.

It turned out to be the case!

In short, the main idea of this book is to state and prove that using R in the Cloud is more than a workable idea: it is very possible in a vast number of ways. And it is, I now thankfully agree. And the competition is tight.

Why does it make sense? In short, since R (like many other programming languages) is designed to use the local machine’s memory (RAM) and CPU by default (as of end of 2014, and except when the Snowfall package is used), one can reap enormous benefits from taking any R scripts developed locally and deploying them to the Cloud (let me stress, without any changes), where there is as much RAM and CPU power at your disposal as you need (or can pay for), and therefore the limit on how much data you can process gets lifted.

But let me speculate: what remains to be discovered or seen, and is not mentioned in the book, is how parallelized R in the Cloud would work. Personally, I think it is a huge thing, bigger than harnessing the power of the local GPU. There is some groundwork laid: http://cran.r-project.org/web/views/HighPerformanceComputing.html but again, it seems to me it is not progressing fast enough (perhaps like many other grid computing technologies). To me, passing this milestone is of the utmost importance to be able to process the data volumes of 2020. But please read the book to know more, a lot more.

Another supporting point for the Cloud + R scheme I can add is that most end results are shared on the Web anyway, either in the form of a publication, a chart, or even a web application. And the Cloud and Web are close neighbours.

OK, more on the book itself. Maybe I should start with an item I did not expect to find in a Springer book: personal interviews. It seems that every chapter in the book has at least one. This tells me Mr. A. Ohri keeps in close contact with, and is very well respected by, the technology leaders in his area of interest, and that the author keeps abreast of the latest happenings in the R space. I enjoyed many interviews and found them very technologically tasteful and professional. The one I liked most is with Jeroen Ooms, a person I admire as an advanced data scientist and the inventor of OpenCPU. How useful all the interviews are, hmm, I will let you decide.

Needless to say, Ajay made sure comprehensive ground is covered on what is available to a person working, or planning to work, with R in the Cloud, and it seems to be unbiased coverage based on well done prior research, exactly as I expected to see in a Springer book. I made a dozen or so bookmarks, discovering new articles and projects I was not aware of. Thank you Ajay!

Otherwise, the book invites experimentation and thought. It is full of practical examples and tons of relevant references. Alas, several things did not work for me and some links appeared dead.

In closing, do not expect this book to be on a student’s desk; I mean it is not for learning R, even though there is runnable code in images. In my opinion it targets a mature R user who wants to expand their horizons, or a corporate decision maker willing to take their enterprise one notch (well, actually a lot) further ahead in the game.

My verdict: 4 out of 5. A deserving read, even though it is more like a collection of stories and technologies. A possibly convincing approach, and surely an inspiring one, to take the R community to new heights.

NetBeans IDE 8 Cookbook by David Salter and Rhawi Dantas, Packt Publishing Book Review

Every developer should eventually master at least one IDE. I know many who never bothered. And I am not saying you are less productive using only Emacs or Vim. But at times you have to rely on the support of a productivity tool. Indeed, nowadays it is hardly possible to imagine developing for an enterprise without an IDE. The reason is that there are so many kinds of projects that not working with an IDE at times makes it impossible to deliver on time. Luckily there is a book that will help you master one of the best IDEs around: NetBeans. Backed by the well known Oracle Corp., with roots going back to the Xelfi editor of the 1990s, it has become a mature and popular development environment. I must add that the book luckily covers the freshly released (Fall 2014) version 8 of the NetBeans IDE.

So, more on the book: it covers seemingly as wide a range of topics as one can imagine (or not) using at any workplace, from installing the IDE and writing its plug-ins (modules) to using Web Services and JavaFX UI development.

And the material is covered very nicely by David and Rhawi. The book is easy to follow, and the exercises are easy to repeat. The book has plenty of high-res images and is so well structured that coming back to it is very easy (I needed a few re-visits to accomplish a task or two). What I liked most was working with RESTful services (not only fun, but it delivers great automation examples). Testing and profiling was a great skill to learn; a very important aspect, too, not just fun. Version control was very useful, and the authors covered a lot of ground and many providers (I only expected Git to be covered). Enterprise Java was a tad more difficult to grasp, but again, with such great teachers it is not intimidating at all.

In short, my verdict is 5 out of 5. It is a great book, especially for students and those who are moving from other IDEs. I must say I liked NetBeans very much, even though I have clocked more hours using Visual Studio and Eclipse.

Getting Started with Red Hat Enterprise Virtualization by Pradeep Subramanian, Packt Publishing Book Review

Even though virtualization began in the 1960s (http://en.wikipedia.org/wiki/Virtualization), advancements in operating systems, computer capacity and especially CPU capabilities have opened a whole slew of possibilities to modern IT shops, and especially to SMBs.

Red Hat’s implementation is one of the most robust and prominent offerings in this area. It allows you to provision such logical components as data centres, clusters, hosts, storage, networks, disks and more. And it is not limited to running only Linux based OSes; Windows 8 and Windows Server 2012 are also supported (well, there is no possibility to access them via VNC yet).

Virtualization is great for balancing production loads, providing HA, or in scenarios such as conducting a POC, spawning temporary machines, or scaling out/up on demand.

Chapter 2, on installing RHEV, is the most important for understanding how Red Hat’s technology works.

Chapter 4 – Creating Virtual Machines I expect to be the most re-visited, as it covers the procedure an administrator will use most often, and I suspect it is the centrepiece of the book. It covers both creating a custom OS image and creating a template to be re-used.

Chapter 5 – High Availability is my favourite; it was also fun to learn about live migrations, fencing and Resilient Policies, features I did not even expect to exist in the Red Hat virtualization offering. This chapter also contains several very valid gotchas on using the HA mode effectively.

Chapter 6 – Advanced Storage and Networking Features is really a showcase of the Red Hat virtualization capabilities, and it would also be very valuable to a not so technical audience wanting to get familiar with what it offers and how it can help deliver greater flexibility across an enterprise.

Chapters 7 and 8 primarily target Red Hat virtualization system administrators, and the product appears to have superb capabilities for managing its resources and users. Chapter 8 explains how to use Quotas and assign resources.

Chapter 9 gets you started on mastering the CLI. As with everything Linux, the command line is king: while a GUI is great, the command line offers greater productivity and, most importantly, automation. But do not stop there, as it is only the beginning of writing automation scripts, which sooner or later will be necessary.

The book then naturally flows into the chapter on troubleshooting. This chapter, I deem, makes absolute sense, as many things can, and undoubtedly some day are bound to, go awry. Make sure you read this chapter carefully, and in advance of the unforeseen.

The story ends with a short but helpful chapter on setting up storage peripherals such as iSCSI, NAS and NFS. Like the others, this chapter exposes a plethora of command line commands to set up everything you may need. So it is a good idea to read it even if you have no plans to work on directory management or storage configuration; after all, you never know when it may come in handy.

In the book’s 178 pages Pradeep Subramanian actually managed to pack quite a lot of insight and knowledge that I expect will trigger several “aha” moments in your IT life.

In my opinion, overall, since it is quite tedious to install and support Red Hat EV, this book should be invaluable.

Good style, concise and friendly writing, a 5 out of 5!

R Graph Essentials by David Alexander Lillis, Packt Publishing Book Review

R as a language has experienced an explosion in adoption in the last several years, despite the proliferation of spreadsheet applications, most notably Microsoft’s Excel. Besides, R Bloggers came up with a 14 bullet point list explaining why. While I do not agree with all 14 points, I admit that R has many unique capabilities, and one of them is its graphing (or charting, if you wish). R Graph Essentials is the book that aims straight at this strength, making you very proficient in producing useful, awesome looking and, most importantly, professional grade plots, charts or graphs.

I must say, having David Alexander (the author) on board with you means you are bound to succeed; I liked his style of writing a lot. David possesses all the necessary skills to cover such a wide topic efficiently, accurately and comprehensively. I admit I had few issues producing most of the plots from the book. On one occasion only I got stuck with qplot not working, but Packt and the awesome R community on G+ replied quickly, putting me back on track by explaining that I needed to install ggplot2. A big thank you!

I advocate reading the book with RStudio humming alongside, as you will have a ton of fun producing interesting graphs. And it does not matter whether you run Linux or Windows.

It was very convenient to have the datasets used for examples saved for later use (one of the reasons I recommend RStudio is that saving the state between sessions in it is trivial). The topics I wish the author had covered are how to put the graphics on the web and how to obtain the data from a database; the book does explain how to get your data from files.

In closing, I have to say I benefited a lot from this book. As I work with data most of my time, I was able to produce super nice histograms of table data (off flat files), which helped me get better insight into the selectivity of my data. This resulted in better and new indexes in our SQL Server databases, and some indexes were even removed.
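Selectivity here can be read as the share of distinct values in a column: the closer it is to 1, the better an index on that column narrows a search. Below is a rough sketch of the same kind of check in plain Python (I did mine with R histograms); the column names and rows are made up for illustration.

```python
from collections import Counter

# Hypothetical rows, as if loaded from a flat file export of a table.
ROWS = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "shipped"},
    {"order_id": 3, "status": "open"},
    {"order_id": 4, "status": "shipped"},
    {"order_id": 5, "status": "open"},
    {"order_id": 6, "status": "shipped"},
]


def selectivity(values):
    """Distinct values divided by total rows; 1.0 means all values are unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def text_histogram(values, width=20):
    """One text bar per distinct value, most frequent value first."""
    counts = Counter(values)
    peak = max(counts.values())
    lines = []
    for value, count in counts.most_common():
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{value!s:>10} {bar} {count}")
    return lines


for column in ("order_id", "status"):
    print(column, round(selectivity(row[column] for row in ROWS), 2))

print("\n".join(text_histogram(row["status"] for row in ROWS)))
```

A column like the made-up `status` above, with only two values across all rows, is a poor candidate for an index on its own, while `order_id` is a good one.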

My next to-do, as a pet project, is to visualize data movements. I trust it will be a lot of fun and this is all thanks to this book.

Five out of five!

Using Flume: Flexible, Scalable, and Reliable Data Streaming by Hari Shreedharan, O’Reilly Media Book Review


Using Flume is one of the books from the so-called Big Data series. Flume is one of the projects that graduated from the Apache Foundation incubator and, as of the time of this writing, is at version 1.5, which (in OSS terms) means a mature product. How battle tested it is I cannot say, as I am not using it, but our world increasingly relies on fast and distributed methods of log data processing. I truly believe it is worth investing time in learning tomorrow’s technology and proposing its use at the right moment and opportunity. I am confident my own journey through data will naturally take me to using Flume some day, and I would not be surprised if that happens soon. This book, I am sure, is going to take a Big Data practitioner (like myself) a step or two further regardless. If you are looking at entering a project or POC involving Flume, then this book is a must. If you are using it already, this book is worth your buck too, and not only for “just in case”. The work of Hari (who worked at such iconic companies as Yahoo! before Cloudera) is probably fundamental to Flume. Here is why it helps:

  • Assessing whether Flume is the appropriate fit for your project/business needs/goals
  • Seemingly enough code (Java only) to create simple Flume extensions or indices
  • Full coverage of the three popular data serialization techniques
  • Persisting logs and even in-transit processing
  • Optimization of Flume
  • Performance tuning and monitoring

If you want to know why I gave this book a 4 out of 5 star rating, it is because:

  1. The structure, or flow, of the book should be different in my view: the basics should come first; it is not too logically outlined
  2. The book is a tad dry (to my taste maybe); what I mean is there are no practical, “from the trenches” examples of why this or that setup or configuration is needed, and in what circumstances
  3. It is Java centric and discusses only the Apache products

Disclaimer: I received a free copy of this book in exchange for writing a review as per the reader review program rules.