Search code examples
hiveinserthiveqlhive-partitions

Is there a workaround to my attempted Hive insert


I copy the structure of schema2.card_master over to schema1.card_master using

hive> create table schema1.card_master like schema2.card_master;

That works, and it is partitioned as was the original on a field. This new table has hundreds of fields so they are inconvenient to list out, but I want all the fields populated from the original table using a Join filter. Now I want to populate it using a JOIN:

hive> insert overwrite table schema1.card_master (select * from schema2.card_master ccm INNER JOIN schema1.accounts da on ccm.cm13 = da.cm13);

FAILED: SemanticException 1:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'cmdl_card_master'

I checked the partition that was copied over, and it was a field mkt_cd that could take on 2 values, US or PR.

So I try

hive> insert overwrite table schema1.card_master PARTITION (mkt_cd='US')  (select * from schema2.card_master ccm INNER JOIN schema1.accounts da on ccm.cm13 = da.cm13);
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 255 columns, but query has 257 columns.
hive>

What is going on here? Is there any work around to load my data without having to explicitly mention all the fields in the Select statement for schema2.card_master ?


Solution

  • select * selects columns from each table in a join. Use select ccm.* instead of select * to select columns from ccm table only. Also remove static partition specification ('US'), use dynamic instead, because ccm.* contains partition column, and when you are loading static partition you should not have partition column in the select.

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    
    insert overwrite table schema1.card_master partition(mkt_cd) --dynamic partition
    select ccm.* --use alias 
      from schema2.card_master ccm 
           INNER JOIN schema1.accounts da on ccm.cm13 = da.cm13
    ;