Search code examples
hadoophivehql

How to create table in Hive with specific column values from another table


I am new to Hive and have some problems. I try to find a answer here and other sites but with no luck... I also tried many different querys that come to my mind, also without success.

I have my source table and i want to create new table like this.

Were:

  • id would be number of distinct counties as auto increment numbers and primary key
  • counties as distinct names of counties (from source table)

Solution

  • You could follow this approach.

    A CTAS(Create Table As Select) with your example this CTAS could work

    CREATE TABLE t_county 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
    STORED AS TEXTFILE AS
    WITH t AS(
    SELECT DISTINCT county, ROW_NUMBER() OVER() AS id
    FROM counties)
    SELECT id, county
    FROM t;
    

    You cannot have primary key or foreign keys on Hive as you have primary key on RBDMSs like Oracle or MySql because Hive is schema on read instead of schema on write like Oracle so you cannot implement constraints of any kind on Hive.