I have some data stored in a hive table whose fields are - date,AverageTemperature,Longitude,Latitude,City and Country.
I want to partition my data based on city and country in such a way that, city partition is present inside the country partition. For example, if I have data given below,
Then, after creating the partitions and moving the data into the partitions, the data should be stored in the below form -
Table Name
|
-- Country-J
| |
| -- City-A
| -- City-B
|
-- Country-K
|
-- City-A
-- City-B
I know about creating single partitions but not about nested partitions. I have tried looking over the web, but haven't got any solutions till now. Any help will be much appreciated. Thanks!
You can do this by partitioning on multiple columns. Order of partition columns determines subfolders.
Create your table like this -
CREATE TABLE mytable(
ID BIGINT,
NAME STRING
)
COMMENT 'Multi level partitioned table'
PARTITIONED BY(Country STRING, City STRING)
This will create a partition structure(physical folders) like you are mentioned in your question.
|
-- Folder_Country-J
| |
| -- Folder_City-A
- files for Country-J and City-A
| -- Folder_City-B
- files for Country-J and City-B
|
-- Folder_Country-K
|
-- Folder_City-A
- files for Country-K and City-A
-- Folder_City-B
- files for Country-K and City-B