I have a Pig script that gives the proper result when I run it from pig (MapReduce mode), but when I schedule it from crontab it does not store the output as the script specifies.
The Pig script is:
a1 = load '/user/training/abhijit_hdfs/id' using PigStorage('\t') as (id:int,name:chararray,desig:chararray);
a2 = load '/user/training/abhijit_hdfs/trips' using PigStorage('\t') as (id:int,place:chararray,no_trips:int);
j = join a1 by id,a2 by id;
g = group j by(a1::id,a1::name,a1::desig);
su = foreach g generate group,SUM(j.a2::no_trips) as tripsum;
ord = order su by tripsum desc;
f2 = foreach ord generate $0.$0,$0.$1,$0.$2,$1;
store f2 into '/user/training/abhijit_hdfs/results/trip_output' using PigStorage(' ');
The crontab is:
[training@localhost ~]$ crontab -l
40 3 * * * /home/training/Abhijit_Local/trip_crontab.pig
Please guide.
Your crontab is attempting to treat the Pig script as an executable file and run it directly. Instead, you will likely need to pass it through the pig command explicitly, as described in the Apache Pig documentation on Batch Mode. You may also find it helpful to redirect stdout and stderr to a log file somewhere in case you need to troubleshoot failures.
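40 3 * * * pig /home/training/Abhijit_Local/trip_crontab.pig > /some/path/to/logfile 2>&1
The order of the redirections matters: > logfile 2>&1 sends both stdout and stderr to the file, whereas 2>&1 > logfile leaves stderr pointing at cron's original output and only stdout reaches the file.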
Cron runs jobs with a much more minimal environment than your login shell, so depending on the PATH environment variable settings, you might find that it's necessary to specify the absolute path to the pig command.
40 3 * * * /full/path/pig /home/training/Abhijit_Local/trip_crontab.pig > /some/path/to/logfile 2>&1
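If the crontab line gets long, or pig also needs variables such as JAVA_HOME that cron does not provide, one option is to have cron call a small wrapper script instead. The sketch below is only an illustration under assumptions: the script name run_trip_job.sh, the JAVA_HOME value /path/to/jdk, and the /full/path directory are placeholders to replace with your own, and PIG_BIN is a hypothetical override used only to point the wrapper at a different pig binary. It logs a timestamp, runs the Pig script in MapReduce mode, and appends both stdout and stderr to the log.

```shell
#!/usr/bin/env bash
# Hypothetical cron wrapper for a Pig batch job. All paths are placeholders.

# cron starts jobs with a minimal environment, so set what pig needs explicitly.
# ASSUMPTION: adjust JAVA_HOME and the pig directory to match your machine.
export JAVA_HOME="${JAVA_HOME:-/path/to/jdk}"
export PATH="/full/path:/usr/local/bin:/usr/bin:/bin:${PATH}"

# run_pig_job SCRIPT LOGFILE: runs SCRIPT with pig, appending all output to LOGFILE.
# Returns pig's exit status, or 127 if no pig binary can be found.
run_pig_job() {
  local script="$1" log="$2"
  # PIG_BIN is a hypothetical override; by default pig is looked up on PATH.
  local pig_bin="${PIG_BIN:-pig}"

  if ! command -v "$pig_bin" >/dev/null 2>&1; then
    echo "$(date '+%F %T') pig not found (PATH=$PATH)" >>"$log"
    return 127
  fi

  echo "$(date '+%F %T') starting $script" >>"$log"
  # Redirect stdout to the log first, then send stderr to the same place.
  "$pig_bin" -x mapreduce "$script" >>"$log" 2>&1
  local status=$?
  echo "$(date '+%F %T') finished with exit status $status" >>"$log"
  return "$status"
}

# Only run when given exactly two arguments, so the file can also be sourced.
if [[ $# -eq 2 ]]; then
  run_pig_job "$1" "$2"
  exit $?
fi
```

After making the wrapper executable (chmod +x), the crontab entry would then look something like this:

40 3 * * * /home/training/run_trip_job.sh /home/training/Abhijit_Local/trip_crontab.pig /some/path/to/logfile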