Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing
The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake!
Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’.
To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Windows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible.
On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help:
- Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC.
- MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left).
- The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net.
Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters.
I liked the rate at which the material is being presented, it makes me feel I paid for every second
Eventually, my impressions are:
- Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS)
- MonetDB is nice, it tickles my fancy to know it more
- Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks
- Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult.
In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook.
Disclaimer: I received this video from the publisher for the purpose of a public review.