Search code examples
datehadoophiveambarihdp

Hive - the correct way to permanently change the date and type in the entire column


I would be grateful if someone could explain here step by step what the process of changing the date format and column type from string to date should look like in the table imported via Hive View to HDP 2.6.5. The data source is the well-known MovieLens 100K Dataset set ('u.item' file) from: https://grouplens.org/datasets/movielens/100k/

$ hive --version is: 1.2.1000.2.6.5.0-292
Date format for the column is: '01-Jan-1995'
Data type of column is: 'string'
ACID Transactions is 'On'

Ultimately, I would like to convert permanently the data in the entire column to the correct Hive format 'yyyy-MM-dd' and next column type to 'Date'.
I have looked at over a dozen threads regarding similar questions before. Of course, the problem is not to display the column like this, it can be easily done using just:

SELECT from_unixtime(unix_timestamp(prod_date,'dd-MMM-yyyy'),'yyyy-MM-dd') FROM moviesnames;

The problem is to finally write it down this way. Unfortunately, this cannot be done via UPDATE in the following way, despite the inclusion of atomic operations in Hive config.

UPDATE moviesnames SET prodate = (select to_date(from_unixtime(UNIX_TIMESTAMP(prod_date,'dd-MMM-yyyy'))) from moviesnames);

What's the easiest way to achieve the above using Hive-SQL? By copying and transforming a column or an entire table?


Solution

  • Try this:

    UPDATE moviesnames SET prodate = to_date(from_unixtime(UNIX_TIMESTAMP(prod_date,'dd-MMM-yyyy')));