Error in ToDate function in Pig


I have datetime data in my input and would like to load it correctly in Pig. I googled and learned that the suggested approach is to load it as chararray and then convert it to datetime with the ToDate function. However, the same script works for one input but not for another that has the identical data format. My Pig version is 0.12.1. The script I'm using:

A = load '/user/ss/debug/debug' using PigStorage(',') as (AUDIT:chararray,JOB:chararray,TYPE:chararray,ID:long,STATUS_ID:long,POOL_NAME:chararray,SLA_PRIORITY:long,STATUS:chararray,RUN_ID:long,TASK:chararray,SCENARIO_ID:long,CREDIT_CNT:long,COMM_CNT:long,BONUS_CNT:long,PAYMENT_CNT:long,RUN_TIME:long,START_TIME:chararray,END_TIME:chararray,ITEM_COUNT:long); 

B = foreach A generate JOB, TYPE, ID, CREDIT_CNT, COMM_CNT, BONUS_CNT, PAYMENT_CNT, ToDate(START_TIME, 'yyyy-MM-dd HH:mm:ss') as (START_TIME_DT:datetime), ToDate(END_TIME, 'yyyy-MM-dd HH:mm:ss') as (END_TIME_DT:datetime), START_TIME, END_TIME, ITEM_COUNT; 

dump B;

The data looks like the following:

Input that reports errors:

D789FD70FE9E3ABBE0432165880A09E1,D789FD70FE9D3ABBE0432165880A09E1,VA,123,4946586,DEFAULT,1,Completed,,DD13,,0,0,0,0,0,2013-03-10 02:41:14,2013-03-10 02:41:16,0

Input that runs correctly:

C888E618A7740A71E0432165880ABCA3,C888E618A7730A71E0432165880ABCA3,VA,123,4680120,DEFAULT,1,Completed,,DD12,,0,0,0,0,0,2012-08-31 04:16:56,2012-08-31 04:17:02,0
C888FC5DA4B212F3E0432165880A3C34,C888FC5DA4B112F3E0432165880A3C34,VA,123,4680125,DEFAULT,1,Completed,,DD12,,0,0,0,0,0,2012-08-31 04:17:51,2012-08-31 04:17:57,0
C888FC5DA4B912F3E0432165880A3C34,C888FC5DA4B812F3E0432165880A3C34,VA,123,4680127,DEFAULT,1,Completed,,DD14,,0,0,0,0,0,2012-08-31 04:18:17,2012-08-31 04:18:22,0

I don't understand why the same schema and script produce different results for inputs in the same format. The error says "Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)".

The error log looks like the following:

Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:707)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:336)
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:672)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:45)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:33)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDateTime(POUserFunc.java:422)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:329)
	... 13 more

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias C. Backend error : Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C. Backend error : Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.apache.pig.PigServer.openIterator(PigServer.java:870)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:541)
	at org.apache.pig.Main.main(Main.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:707)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:336)
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:672)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:45)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:33)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDateTime(POUserFunc.java:422)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:329)

Any help or suggestion will be highly appreciated. Thanks a lot!


Solution

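  • It looks like the datetime "2013-03-10 02:41:14" doesn't exist in the 'America/Los_Angeles' time zone. Daylight saving time began in the US on 2013-03-10: at 02:00 local clocks jumped straight to 03:00, so no local time from 02:00:00 to 02:59:59 occurred that day. According to the error message, the two-argument ToDate is parsing the string in America/Los_Angeles, apparently the default time zone on your cluster, which is why only this input fails. The same inputs work fine in my time zone. To solve this, specify as the third argument of ToDate the time zone the timestamps were actually recorded in; it must be a zone in which this time exists, such as 'UTC'. Passing 'America/Los_Angeles' explicitly would hit the same error.

    Can you change the ToDate function like this (assuming the timestamps are in UTC)?

    ToDate(START_TIME, 'yyyy-MM-dd HH:mm:ss','UTC')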
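
To see why only one of the inputs fails, it can help to check whether a given wall-clock time exists in a time zone at all. The following is a minimal sketch using the JDK's java.time classes rather than the Joda-Time library that appears in the stack trace, so it illustrates the daylight saving gap but does not reproduce Pig's exact behaviour. The class name DstGapCheck and the method existsInZone are hypothetical names chosen for this example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.zone.ZoneRules;

public class DstGapCheck {
    // Same pattern as the one passed to ToDate in the question.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns true if the wall-clock time in text exists in the given zone. */
    static boolean existsInZone(String text, String zoneId) {
        LocalDateTime local = LocalDateTime.parse(text, FMT);
        ZoneRules rules = ZoneId.of(zoneId).getRules();
        // getValidOffsets returns an empty list when the local time
        // falls inside a gap, for example a spring-forward transition.
        return !rules.getValidOffsets(local).isEmpty();
    }

    public static void main(String[] args) {
        // The failing row: inside the 2013 spring-forward gap.
        System.out.println(existsInZone("2013-03-10 02:41:14", "America/Los_Angeles")); // false
        // The same wall-clock time is valid in UTC, which has no DST.
        System.out.println(existsInZone("2013-03-10 02:41:14", "UTC")); // true
        // A row from the input that runs correctly.
        System.out.println(existsInZone("2012-08-31 04:16:56", "America/Los_Angeles")); // true
    }
}
```

A check like this could be run over a sample of the START_TIME and END_TIME values to find out which rows would trip the parser before choosing the time zone to pass to ToDate.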