oracle, hadoop, apache-spark, hive, avro

How to write a date / timestamp string to a DATE / TIMESTAMP column in Oracle DB?


I have stored some Oracle tables in Hadoop using the Avro file format, with Hive external tables to access the data.
The Date and Timestamp values were stored as formatted strings, using Oracle's TO_CHAR function during the import.

Now I want to export this exact data back with Spark to an Oracle table that has a DATE column. I use the following:

// Create a data frame from the Hive table
val data = sqlContext.sql("select * from avro_table")

// export df to existing oracle table
data.write.mode("overwrite").jdbc(jdbcString, "tableName", prop)

But then I get the error:

ORA-00902: invalid data type

This is because Spark tries to insert a string into a DATE column. Is there a safe way to insert a date / timestamp string from a Spark DataFrame into an Oracle DATE / TIMESTAMP column? By safe I mean without losing any timezone information.


Solution

  • You should use the to_date, to_timestamp and/or date_format functions to transform the stringified date/timestamp values into their type-aware counterparts.

    date_format(dateExpr: Column, format: String): Column Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.

to_date(e: Column, fmt: String): Column Converts the column into DateType using the specified format (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html); returns null on failure.

to_timestamp(s: Column, fmt: String): Column Converts a time string with the specified format (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) to a timestamp column; returns null on failure.

    Use select or withColumn operators.

    Sample code could look as follows (a fuller sketch follows after it):

    data.withColumn("real_date", date_format(...))
      .write
      .mode("overwrite")
      .jdbc(jdbcString, "tableName", prop)
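
    For completeness, here is a minimal end-to-end sketch of the same idea. It assumes Spark 2.2+ (where to_date and to_timestamp accept a format argument); the column names order_date and created_at and the format patterns are placeholders, so adjust them to match the TO_CHAR patterns used on the original import. jdbcString and prop are the same JDBC URL and connection properties as in the question.

    import org.apache.spark.sql.functions.{col, to_date, to_timestamp}

    // Read the Avro-backed Hive external table
    val data = sqlContext.sql("select * from avro_table")

    val typed = data
      // The pattern must match what TO_CHAR produced on import (placeholder here)
      .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
      // to_timestamp yields a TimestampType column, so the JDBC writer maps it to
      // an Oracle TIMESTAMP instead of a VARCHAR
      .withColumn("created_at", to_timestamp(col("created_at"), "yyyy-MM-dd HH:mm:ss"))

    // Verify the columns now show date / timestamp types before writing
    typed.printSchema()

    typed.write
      .mode("overwrite")
      .jdbc(jdbcString, "tableName", prop)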