mysql sql oracle-database hive rdbms

Hive - Ways to standardize incoming date fields in Hive?


I've got data sources from all over, running MySQL, Oracle, etc. Each data source stores dates as records in one or more tables, but the format is not standardized and can even vary from table to table within the same data source (yyyy-MM-dd, yyyy-dd-MM, MM-dd-yyyy, yyyy-MMM-dd HH:mm:SS:ss, etc.).

What are some options for standardizing these different date fields before storing them in Hive? Would Pig be one?


Solution

  • If you are using Sqoop to pull the data into Hive, you can write your own query so that the source database returns the date in one specific, standard format.

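    # The options file is assumed to name the tool (import) and the connection settings.
    sqoop --options-file <Source RDBMS options file> \
      --query "select to_char(start_date,'mm/dd/yyyy') as my_date from SALES where \$CONDITIONS"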
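If the conversion cannot be pushed into the source query, the same idea can be applied in a preprocessing step before the load. The sketch below is only an illustration in Python, not part of Sqoop or Hive: the `SOURCE_FORMATS` registry, its source and column names, and the `to_hive_date` helper are all hypothetical. It assumes you record the known layout of each source column up front, because a value such as 2015-03-04 is ambiguous between yyyy-MM-dd and yyyy-dd-MM and cannot be guessed reliably from the data alone.

```python
from datetime import datetime

# Hypothetical registry: the date layout each source column is known to use.
# The source and column names are made up for illustration.
SOURCE_FORMATS = {
    ("mysql_sales", "orders.order_date"): "%Y-%m-%d",    # yyyy-MM-dd
    ("mysql_sales", "returns.return_date"): "%Y-%d-%m",  # yyyy-dd-MM
    ("oracle_crm", "accounts.start_date"): "%m-%d-%Y",   # MM-dd-yyyy
}


def to_hive_date(source, column, raw):
    """Return raw as a yyyy-MM-dd string, or None if it does not match
    the layout declared for that source column."""
    fmt = SOURCE_FORMATS[(source, column)]
    try:
        return datetime.strptime(raw.strip(), fmt).date().isoformat()
    except ValueError:
        # Leave bad values for the caller to log or quarantine.
        return None


if __name__ == "__main__":
    print(to_hive_date("mysql_sales", "returns.return_date", "2015-25-12"))
    print(to_hive_date("oracle_crm", "accounts.start_date", "12-25-2015"))
```

Both calls in the usage section normalize to the same yyyy-MM-dd string, which is the layout Hive's DATE type expects. Returning None for values that do not parse keeps malformed rows visible instead of silently loading a wrong date.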