Search code examples
sqlhiveclouderacloudera-cdhimpala

How to partition a non-partitioned table on Hive?


Given a table with 360 days of data, we want to partition it by date to improve performance. Do we need to use following SELECT command for each date? Any more efficient way to do this?

INSERT INTO TABLE <new_table> Partition (dt='2015-07-01')
SELECT * from <table> WHERE dt='2015-07-01'

Solution

  • First make your table:

    create  db.my_table(column1 int, column2 string,
                         -- ...
    )
    comment 'I like paritioned tables'
    partitioned by(dt string)
    location '/path/to/file';
    

    Now you can load the data into dt partitions:

    insert overwrite into table db.my_table partition (dt) select * from other_table;