I'm new to this kind of things so, please ask me more in case i didn't give enough clarified question
These are run in hive
This is the avro table
CREATE EXTERNAL TABLE cm
PARTITIONED BY (
`year` STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED as INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://blabla/cm'
TBLPROPERTIES (
'avro.schema.url'='s3://blabla/cm')
I'm trying to insert overwite to that avro table from orc table. i'm trying to achieve something like this :
INSERT OVERWRITE TABLE cm (a,b) PARTITION (year=p_year)
SELECT a,b,p_year FROM orctable
Use DISTRIBUTE BY
to add dynamic partition. Using DISTRIBUTE BY
your query should be like this:
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table_name partition(year)
select a, b, p_year from orc_table DISTRIBUTE BY p_year;
This query will generate a MapReduce job rather than Map-only job. The SELECT-clause will be converted to a plan to the mappers and the output will be distributed to the reducers based on the value of (year) pairs. The INSERT-clause will be converted to the plan in the reducer which writes to the dynamic partitions.