Talend for Big Data by Bahaaldine Azarmi, Packt Publishing Book Review
Talend for Big Data is exactly what its title says! It is one of the shortest technical books I have read, but it is certainly to the point.
This book does not waste your time. If you suddenly find yourself on a project involving Hadoop (or its ecosystem components) and you know at least some Talend, then this is your book. If you do not, I recommend a supplementary book that I also reviewed, Talend Open Studio Cookbook, also from Packt. Print it (if you got an eBook) and keep a copy by your desk.
The book nicely covers what I feared would be the complexities of dealing with Hadoop components such as Hive and Pig (a MapReduce generator, not an animal). Those fears turned out to be unfounded, thanks to Talend and its 500+ components: 90% of what you need for Big Data is already there for you to use. To my surprise, Talend is actually a very mature and (in its paid variant) fully enterprise-ready ETL solution.
The book has 7 chapters, each dedicated to a specific goal that is accomplished through an exercise with a particular piece of technology.
My favorite is Chapter 7: Big Data Architecture and Integration Patterns. It is the last one, but this is the chapter where you get rewarded and start benefiting from the material you have absorbed.
Chapter 6: Aggregate Data with Pig is a lot of fun and showed me a new way of interacting with Pig. It also turned out to be a much easier way.
As a side note, I am in love with ETL in general. I think it has the highest ROI of all the enterprise tools, yet it is a lot of fun to work with and, best of all, it documents itself visually!
Chapter 2: Building your First Big Data Job is like your first swim in deep water: intimidating but rewarding, full of uncertainty but also exciting and unforgettable.
All the less relevant topics, such as setting up your training system, are moved to the appendixes, but I recommend actually starting there if you are new to Cloudera’s Hadoop (CDH) VM distribution and/or VMPlayer (which runs your virtual machine).
It seemed to me that a reader does not need ANY prior knowledge of either Talend or Hadoop to accomplish the tasks in the book.
One suggestion I have for the author: instead of basing the examples on MySQL, which seems to be falling out of favor with the user community, consider MariaDB, an equivalent substitute that is going to capture a lot of attention with the release of version 10.
Another point is the choice of Hadoop distribution: it seems that Hortonworks offers more bells and whistles, but it is a game of catch-up anyway.
It is a 5 out of 5 stars book. Thank you, Bahaaldine and Packt!