Tags: apache-spark, hive, hive-ddl

Is there any way to change the data type of a non-partition column of an external Hive table?


I am trying to change the data type of a non-partition column of an external Hive table that is partitioned.

For example, I have a partitioned table called trans with six columns: transactions (string), product (string), quantity (decimal(7,2)), year (int), week (int), and date (date).

The trans table is external and partitioned by year, week, and date.
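
For reference, DDL along these lines would produce such a table. This is only a sketch: the STORED AS PARQUET clause matches the Parquet files mentioned below, and the LOCATION path is a placeholder.

    -- Hypothetical DDL matching the described schema; LOCATION is a placeholder.
    CREATE EXTERNAL TABLE trans (
      transactions STRING,
      product      STRING,
      quantity     DECIMAL(7,2)
    )
    PARTITIONED BY (year INT, week INT, `date` DATE)  -- `date` is a reserved word in Hive
    STORED AS PARQUET
    LOCATION '/data/trans';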

Now I have to change the data type of quantity from decimal(7,2) to decimal(15,2). How can I do that? I don't want to reprocess the whole table with the updated data type. Is there a direct query or command to achieve this?

I tried the following command:

ALTER TABLE trans CHANGE quantity quantity decimal(15,2);

But this only changes the data type in the Hive metastore metadata, not at the data level: the underlying Parquet files still carry the old type. (Parquet stores decimals of precision 9 or less as 32-bit integers, which is presumably why the dictionary of integer values can no longer be read back as decimal(15,2).) As a result, selecting the quantity column from Spark fails with the error below.

Py4JJavaError: An error occurred while calling o108.showString. 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 10 times, most recent failure: Lost task 0.9 in stage 2.0 : java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary

Solution

  • There is a Jira about exactly this: HIVE-6784 - parquet-hive should allow column type change.

    It was closed with a Won't Fix resolution because of the performance impact; please read the Jira comments for more details.

    The workaround is to create a second table with the new type, insert overwrite into it from the original table, then drop the original table and rename the new one, as in the sketch below.
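
A minimal HiveQL sketch of that copy-and-swap, assuming the schema above; the trans_new name, the LOCATION path, and the dynamic-partition settings are placeholders to adapt to your environment.

    -- Allow dynamic partitioning for the INSERT OVERWRITE below.
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    -- 1. Create a new table with the widened decimal type.
    CREATE EXTERNAL TABLE trans_new (
      transactions STRING,
      product      STRING,
      quantity     DECIMAL(15,2)
    )
    PARTITIONED BY (year INT, week INT, `date` DATE)
    STORED AS PARQUET
    LOCATION '/data/trans_new';  -- hypothetical path

    -- 2. Rewrite the data; the CAST writes new Parquet files with the new type.
    --    Partition columns must come last in the SELECT list.
    INSERT OVERWRITE TABLE trans_new PARTITION (year, week, `date`)
    SELECT transactions,
           product,
           CAST(quantity AS DECIMAL(15,2)),
           year, week, `date`
    FROM trans;

    -- 3. Swap the tables.
    DROP TABLE trans;
    ALTER TABLE trans_new RENAME TO trans;

Note that dropping an external table removes only the metastore entry; the old Parquet files under the original location stay behind and must be cleaned up manually if they are no longer needed.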