hadoop hive hiveql partition

Hive: ADD COLUMNS on a partitioned table does not work


Here is my experience adding columns to a partitioned Hive table. As you can see, despite the CASCADE option, the ALTER breaks my table :(

add columns on partitioned table

table description

CREATE TABLE test (
a                       string,      
b                       string,
c                       string
)
PARTITIONED BY (
x                       string,
y                       string, 
z                       string
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
'orc.compress'='SNAPPY'
);

duplicate the table

CREATE TABLE test_tmp...

hadoop distcp hdfs://.../test/* hdfs://.../test_tmp

MSCK REPAIR TABLE test_tmp;

SELECT * FROM test_tmp
LIMIT 100;

check : OK (I get results)
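
A hedged extra check at this point, assuming the goal is to confirm that MSCK REPAIR actually registered the copied partition directories in the metastore before the schema is touched:

```sql
-- List the partitions the metastore now knows for the copy.
-- An empty result would mean MSCK REPAIR found nothing to register.
SHOW PARTITIONS test_tmp;
```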

modify the table

ALTER TABLE test_tmp
ADD COLUMNS(
aa  timestamp,
bb  string,
cc  int,
dd  string
) CASCADE;

SELECT * FROM test_tmp
LIMIT 100;

...
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:19, Vertex vertex_1502459312997_187854_4_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
... 1 statement(s) executed, 0 rows affected, exec/fetch time: 21.655/0.000 sec  [0 successful, 1 errors]

check: failed (the query now ends with the error above)
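
To narrow this down, one could check whether CASCADE really propagated the new columns to the metadata of the existing partitions. This is only a sketch: the partition values 'x1', 'y1', 'z1' are placeholders, not values from the original table.

```sql
-- Table-level schema: should list aa, bb, cc, dd after the ALTER.
DESCRIBE FORMATTED test_tmp;

-- Partition-level schema for one existing partition (placeholder values).
-- With CASCADE, the new columns should appear here as well.
DESCRIBE FORMATTED test_tmp PARTITION (x='x1', y='y1', z='z1');
```

If both listings show the new columns, the metadata is consistent, and the failure more likely comes from reading the old ORC files than from the ALTER itself.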


Solution

  • If you are using Hive 0.x or 1.x then you are probably a victim of...

    HIVE-10598   Vectorization borks when column is added to table.

    ...which is specific to ORC format, even if it's not apparent from the JIRA label.

    There is a partial fix as of Hive 2.0 (i.e. ADD is fixed, but DROP / RENAME / CHANGE are still crippled) thanks to

    HIVE-11981   ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

    And another related fix as of Hive 2.1.1 for CHANGE

    HIVE-14355   Schema evolution for ORC in llap is broken for Int to String conversion

    To be continued...
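
Until an upgrade is possible, a commonly used workaround is to turn off vectorized execution for the session, so that the query does not go through the vectorized ORC read path on the old files. This is an assumption on top of the JIRAs listed above, which describe the fixes rather than a workaround.

```sql
-- Session-level only; does not change the cluster defaults.
SET hive.vectorized.execution.enabled=false;

-- Re-run the query that failed after the ALTER.
SELECT * FROM test_tmp
LIMIT 100;
```

This trades some scan speed for a working query, and it only helps if the failure really is the vectorization issue.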