hadoop cron apache-pig bigdata

Pig script scheduled by crontab not giving result


I have a Pig script that gives the correct result when I run it from Pig (MapReduce mode), but when I schedule it from crontab it does not store the output as the script specifies.

The Pig script is:

a1 = load '/user/training/abhijit_hdfs/id' using PigStorage('\t') as (id:int,name:chararray,desig:chararray); 
a2 = load '/user/training/abhijit_hdfs/trips' using PigStorage('\t') as (id:int,place:chararray,no_trips:int); 
j = join a1 by id,a2 by id;
g = group j by (a1::id,a1::name,a1::desig);
su = foreach g generate group,SUM(j.a2::no_trips) as tripsum;
ord = order su by tripsum desc; 
f2 = foreach ord generate $0.$0,$0.$1,$0.$2,$1; 
store f2 into '/user/training/abhijit_hdfs/results/trip_output' using PigStorage(' ');

The crontab is:

[training@localhost ~]$ crontab -l
40 3 * * * /home/training/Abhijit_Local/trip_crontab.pig

Please advise.


Solution

  • Your crontab entry tries to run the Pig script directly, as if it were an executable file. Instead, you will likely need to pass it to the pig command explicitly, as described in the Apache Pig documentation on Batch Mode. It also helps to redirect stdout and stderr to a log file so that you can troubleshoot failures.

    40 3 * * * pig /home/training/Abhijit_Local/trip_crontab.pig > /some/path/to/logfile 2>&1
    

    cron typically runs jobs with a minimal PATH, so depending on your environment settings you might need to specify the absolute path to the pig command.

    40 3 * * * /full/path/pig /home/training/Abhijit_Local/trip_crontab.pig > /some/path/to/logfile 2>&1
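
If the crontab line gets hard to read, one option is to move the logic into a small wrapper script and have cron call that instead. The following is only a sketch: the function name run_pig_script and the idea of a wrapper file are assumptions for illustration, not something from the Pig documentation, and the paths are the same placeholders used above. It sends stdout to the log first and then points stderr at the same place, so both streams should end up in the file.

```shell
#!/bin/sh
# Hypothetical wrapper around the pig command, intended to be called from cron.

# Usage: run_pig_script PIG_BIN SCRIPT LOG_FILE
# Runs SCRIPT with PIG_BIN and appends stdout and stderr to LOG_FILE.
run_pig_script() {
    pig_bin=$1
    script=$2
    log_file=$3

    # Mark the start of each run so repeated cron invocations can be told apart.
    echo "=== $(date) starting $script ===" >> "$log_file"

    # Redirect stdout to the log first, then send stderr to the same place.
    "$pig_bin" "$script" >> "$log_file" 2>&1
    status=$?

    # Record the exit status so failures are visible in the log.
    echo "=== exit status: $status ===" >> "$log_file"
    return "$status"
}

# Example invocation (placeholder paths; adjust to your installation):
# run_pig_script /full/path/pig /home/training/Abhijit_Local/trip_crontab.pig /some/path/to/logfile
```

With the last line uncommented and the file made executable, the crontab entry would point at the wrapper's absolute path rather than at the .pig file.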