Search code examples
javaoracle-databaseapachehadoopsqoop

Error ORA-01843 when importing table to Sqoop


I've already imported a table with 54 million rows to HDFS of my Hadoop Cluster, and now, importing from Oracle DB a slice of that table
[I've created a View, selecting just a short period. Resulted in 260K rows only]
, i'm getting the error below:

Error: java.io.IOException: SQLException in nextKeyValue at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.sql.SQLDataException: ORA-01843: not a valid month

The command used was:

sqoop import --connect jdbc:oracle:thin:@<my_host:port:instanceia> --username <user> --password <pass> --table <DB.TABLE> --split-by NUM_LINE --m 10 --target-dir /user/rodrigo/SLICED_TABLE 2>&1|tee import_tab_menor.txt

I'm a Hadoop/Hive/Sqoop newbie and this was just for learning, but I really want to know how to solve this.

I thought about creating a view, changing date data type to string just to import the table, but it can give me inacurate results.

EDIT:

I have discovered that it was an internal DB error in the view creation. THe way I was creating the view returned me the results I needed. But it was a wrong method to creating it.

In the query inside Sqoop you need to separate the last AND \$CONDITIONS, and the format to send the query to Oracle is to_date('02/04/16', 'dd/mm/yy'). And then it worked.

Just to bypass the error, I created a view changing the datatype of date field to varchar and for some reason it won't work too.

So I've changed again with to_number(to_char('02/04/16', 'dd/mm/yy')) to make it work.

I won't close the question because I want an answer for this problem. And if I find it, I will share.


Solution

  • The error was in the way I created the view:

    I've created the view using literal format to select date intervals.

    Something like

    DT_F>'02/04/16

    But when I create the View using

    to_date('02/04/16', 'dd/mm/yy')

    the error ORA-01843 don't occur anymore.