I want to write two scheduled jobs for my Ubuntu 14.04.4 server. The jobs need to be sequential.
The first job should unzip a .gz file (a SQL dump) and then import the table "myTable" into the MySQL database (localhost).
The second job (written using the Pentaho Data Integration tool) extracts data from the table "myTable", transforms it and loads it into a new database.
I could have accomplished the first task using Pentaho PDI (Spoon), but it doesn't provide any function to unzip a .gz file. After some research and coming across these posts:
http://forums.pentaho.com/showthread.php?82566-How-to-use-the-content-of-a-tar-gz-file-in-Kettle
How to uncompress and import a .tar.gz file in kettle?
I have gathered that I should manually write a job to accomplish the first task, i.e. unzip the .gz file and then import the table "myTable" into the MySQL database.
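Something along these lines is what I have in mind for the first job (the dump path, database name and credentials below are just placeholders):

    #!/bin/bash
    # First job (sketch): decompress the SQL dump and import it into the local MySQL database.
    # The dump path, database name and credentials are placeholders for my actual setup.
    set -e

    DUMP_GZ=/data/dumps/myTable.sql.gz
    DB_NAME=mydb

    # gunzip -c writes the decompressed dump to stdout, so it can be piped straight into mysql
    # without creating a temporary .sql file on disk.
    gunzip -c "$DUMP_GZ" | mysql -h localhost -u myuser -p'mypassword' "$DB_NAME"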
My question is: how do I create a cron job that executes the two sequentially, i.e. the first job runs to completion and only then the second one is executed?
If there is a better alternative approach, please suggest it.
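For example, would chaining the two in a single crontab line be a reasonable way to enforce the order? A sketch of what I mean (script name, job file and schedule are just placeholders):

    # Placeholder crontab entry: run the import script first; only if it succeeds (&&),
    # run the PDI job with kitchen.sh. Paths and times are illustrative only.
    0 2 * * * /home/user/jobs/import_myTable.sh && /opt/pentaho/data-integration/kitchen.sh -file=/home/user/jobs/transform_myTable.kjb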
You can make use of the "Shell" job entry in a PDI job. Put the unzip-and-import part of your code in the Shell entry and connect your transformation after it, so the job runs them sequentially: START -> Shell (unzip + import) -> Transformation.
Now you can schedule this complete job in cron or any other scheduler. There is no need for separate scripts.
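A crontab entry for that combined job could look something like this (the kitchen.sh path, the .kjb location and the schedule are just examples, not a definitive setup):

    # Run the combined PDI job (Shell entry + transformation) every night at 02:00.
    # Adjust the kitchen.sh path and the .kjb file location to your installation.
    0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/home/user/jobs/load_myTable.kjb >> /var/log/pdi/load_myTable.log 2>&1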
Note: this works only in a Linux environment, which I assume you are using.
Hope this helps :)