I am new to Pig and am trying to analyse two months of the Uber dataset to find out on which day the most trips were booked.
Format:
B02617,2/27/2015,1551,14677
B02598,2/27/2015,1114,10755
B02512,2/27/2015,272,2056
B02764,2/27/2015,4253,38780
Pig Script1:
A = Load 'UberDataSet.txt' using PigStorage(',') as
(base:chararray, tripdate:datetime, cars:int, tripkms:int);
DESCRIBE A;
DUMP A;
DESCRIBE shows that tripdate is of type datetime, but in the DUMP output the date field is empty (I only get ,, where the date should be).
Output:
(B02682,,1395,12693)
(B02617,,1473,12811)
(B02764,,3934,31957)
(B02598,,1134,10661)
(B02617,,1539,14461)
(B02682,,1465,13814)
(B02512,,243,1797)
Then I tried loading the date as a chararray and converting it with ToDate.
Pig Script2:
A = Load 'UberDataSet.txt' using PigStorage(',') as
(base:chararray, tripdate:chararray, cars:int, tripkms:int);
B = FOREACH A GENERATE tripdate;
C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
DESCRIBE C;
DUMP C;
The job failed with this error message:
Job DAG: job_1495878748804_1697
2017-06-10 16:58:32,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-06-10 16:58:32,790 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-25 Operator Key: scope-25) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "date"
Details at logfile: /home/manasa.testing_gmail/pig_1497109612992.log
There is a related question, "Loading datetime format files using PIG", but I could not find the right solution for my problem there.
I also tried changing the date format to 'MM/dd/yyyy' in
"C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;" while keeping the rest of the script the same, but I get the same error about the date format.
Can anyone help me get further?
Thanks in advance.
Use your second Pig script: Pig has trouble loading a date like 2/27/2015 directly into a datetime field, which is why the dates come out empty in your first output.
Why it is not working:
The date format in your dataset and the format you pass to ToDate in the Pig script are not the same. That is why you are getting this error.
The date format in your data is 'MM/dd/yyyy',
while your script tells ToDate to expect 'yyyy-MM-dd':
C = FOREACH B GENERATE ToDate(tripdate,'yyyy-MM-dd') as mytripdate;
Solution: You can copy and paste the lines below, replacing '/tmp/a.log' with the path to the file on your system.
A = Load '/tmp/a.log' using PigStorage(',') as (base:chararray, tripdate:chararray, cars:int, tripkms:int);
B = FOREACH A GENERATE tripdate;
C = FOREACH B GENERATE ToDate(tripdate,'MM/dd/yyyy') as mytripdate;
You will get output like this:
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
(2015-02-27T00:00:00.000+05:30)
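If you still get the same error after switching to 'MM/dd/yyyy', look at the value quoted in the message: Invalid format: "date". That appears to be the text ToDate was handed, which would mean some row (possibly a header line) has the word date in the second column instead of a real date. This is only a guess from the log, since the sample rows you posted do not show a header. A minimal sketch that drops such rows before converting, reusing the field names from your script:

```pig
A = LOAD 'UberDataSet.txt' USING PigStorage(',') AS
    (base:chararray, tripdate:chararray, cars:int, tripkms:int);

-- Assumption: real dates look like 2/27/2015; anything else (e.g. a header
-- row containing the word "date") is filtered out before ToDate sees it.
B = FILTER A BY tripdate MATCHES '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}';

C = FOREACH B GENERATE base, ToDate(tripdate, 'MM/dd/yyyy') AS mytripdate, cars, tripkms;
DUMP C;
```

MATCHES has to match the whole string, so a value such as date is rejected while 2/27/2015 passes through.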
Now, if you want to format the date further, you can use the ToString() function on it.
D = FOREACH C GENERATE ToString(mytripdate,'yyyy-MM-dd') as mytripdate;
You will get output like this:
(2015-02-27)
(2015-02-27)
(2015-02-27)
(2015-02-27)
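To get back to your original goal (finding the day with the most trips), one possible next step is to group by the formatted date and total a column per day. This is only a sketch: it assumes the fourth column (tripkms in your schema) is the number you want to add up, so swap in cars or another field if that is not the case. The aliases G, S, O and T are just names picked for this example.

```pig
A = LOAD 'UberDataSet.txt' USING PigStorage(',') AS
    (base:chararray, tripdate:chararray, cars:int, tripkms:int);

-- Normalise the date to yyyy-MM-dd so all rows for one day share one key.
B = FOREACH A GENERATE
        ToString(ToDate(tripdate, 'MM/dd/yyyy'), 'yyyy-MM-dd') AS tripday,
        tripkms;

-- Total the chosen column per day (assumption: tripkms is the right measure).
G = GROUP B BY tripday;
S = FOREACH G GENERATE group AS tripday, SUM(B.tripkms) AS total;

-- Sort by the total, largest first, and keep the top row.
O = ORDER S BY total DESC;
T = LIMIT O 1;
DUMP T;
```

If the file contains rows whose second field is not a date, filter them out first, otherwise ToDate will fail on them.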