
Scheduled sequential jobs in Ubuntu server


I want to write two scheduled jobs for my Ubuntu 14.04.4 server. The jobs need to be sequential.

The first job should unzip a .gz file (SQL Dump) and then import the table "myTable" into MySQL Database (localhost).

The second job (written with the Pentaho Data Integration tool) extracts data from the table "myTable", transforms it, and loads it into a new database.

I could have done the first task in Pentaho PDI (Spoon) as well, but it doesn't seem to provide any function to unzip a .gz file. After some research I came across these posts:

http://forums.pentaho.com/showthread.php?82566-How-to-use-the-content-of-a-tar-gz-file-in-Kettle

How to uncompress and import a .tar.gz file in kettle?

From these I gather that I should write the first job by hand, i.e. unzip the .gz file and then import the table "myTable" into the MySQL database.

My question is: how do I create a cron job that runs the two sequentially, so that the second job starts only after the first has completed?

If there is a better alternative approach, please suggest it.


Solution

  • You can use the "Shell" step in a PDI job. Put the unzip and import commands in the Shell step and connect it to your transformation, so that the transformation runs only after the Shell step has finished. A sample job looks like this:

    [image: sample job with a Shell step followed by the transformation]

    Now you can schedule this complete job in cron or any other scheduler. There is no need for separate scripts.

    Note: this assumes a Linux environment, which I assume you are using.

    Hope this helps :)
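To make the ordering concrete, here is a minimal sketch of the same sequence as a plain shell script, which may be handy for trying out the commands before pasting them into the Shell step. Everything specific in it is an assumption and not taken from the question: the dump path, the database name `mydb`, the Kitchen install location, the job file name, and the idea that MySQL credentials come from `~/.my.cnf`. The body of `import_dump` is roughly what would go into the Shell step; `run_etl` is the Kitchen call that cron would make for the finished job.

```shell
#!/bin/sh
# Sketch only: all paths and names below are hypothetical placeholders.
DUMP_GZ="${DUMP_GZ:-/data/dumps/myTable.sql.gz}"
DB_NAME="${DB_NAME:-mydb}"
KITCHEN="${KITCHEN:-/opt/pentaho/data-integration/kitchen.sh}"
JOB_FILE="${JOB_FILE:-/opt/etl/load_mytable.kjb}"

# Step 1: check the archive, then stream the SQL dump into MySQL.
# gunzip -t fails on a missing or corrupt file, so a bad dump stops the run
# instead of feeding mysql an empty stream. Credentials are assumed to be
# in ~/.my.cnf so no password appears on the command line.
import_dump() {
    gunzip -t "$DUMP_GZ" && gunzip -c "$DUMP_GZ" | mysql "$DB_NAME"
}

# Step 2: run the PDI job with Kitchen, the command-line job runner.
run_etl() {
    "$KITCHEN" -file="$JOB_FILE" -level=Basic
}

# "&&" starts step 2 only if step 1 exited with status 0.
run_all() {
    import_dump && run_etl
}

# Run only when called with "run", so the file can be sourced for testing.
# Hypothetical crontab entry (every day at 02:30):
#   30 2 * * * /opt/etl/nightly_load.sh run >> /var/log/nightly_load.log 2>&1
if [ "${1:-}" = "run" ]; then
    run_all
fi
```

If the import lives in the Shell step of the job itself, the cron entry would simply call `kitchen.sh` with the job file, and the hop between the Shell step and the transformation takes over the role of `&&`.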