Search code examples
hadoophivehiveqlavrooverwrite

Error insert overwrite from orc table to avro table?


I'm new to this kind of things so, please ask me more in case i didn't give enough clarified question

These are run in hive

This is the avro table

CREATE EXTERNAL TABLE cm 
 PARTITIONED BY (
 `year` STRING
     )
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED as INPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 's3://blabla/cm' 
 TBLPROPERTIES (
     'avro.schema.url'='s3://blabla/cm')

I'm trying to insert overwite to that avro table from orc table. i'm trying to achieve something like this :

INSERT OVERWRITE TABLE cm (a,b) PARTITION (year=p_year)
SELECT a,b,p_year FROM orctable

Solution

  • Use DISTRIBUTE BY to add dynamic partition. Using DISTRIBUTE BY your query should be like this:

    set hive.exec.dynamic.partition.mode=nonstrict;
    insert into table table_name partition(year)
    select a, b, p_year from orc_table DISTRIBUTE BY p_year;
    

    This query will generate a MapReduce job rather than Map-only job. The SELECT-clause will be converted to a plan to the mappers and the output will be distributed to the reducers based on the value of (year) pairs. The INSERT-clause will be converted to the plan in the reducer which writes to the dynamic partitions.