Given a table with 360 days of data, we want to partition it by date to improve performance. Do we need to use following SELECT command for each date? Any more efficient way to do this?
INSERT INTO TABLE <new_table> Partition (dt='2015-07-01')
SELECT * from <table> WHERE dt='2015-07-01'
First make your table:
create db.my_table(column1 int, column2 string,
-- ...
)
comment 'I like paritioned tables'
partitioned by(dt string)
location '/path/to/file';
Now you can load the data into dt partitions:
insert overwrite into table db.my_table partition (dt) select * from other_table;