Search code examples
hadoophive

How to create partitions inside partition in hive?


I have some data stored in a hive table whose fields are - date,AverageTemperature,Longitude,Latitude,City and Country.

I want to partition my data based on city and country in such a way that, city partition is present inside the country partition. For example, if I have data given below,

  1. 06-09-2023,23.2,87.5N,87.4W,City-A,Country-J
  2. 06-08-2023,23.2,87.6N,87.2W,City-B,Country-J
  3. 05-08-2022,23.1,84.2N,88.3W,City-A,Country-K
  4. 04-08-2022,22.1,81.2N,82.3W,City-B,Country-K

Then, after creating the partitions and moving the data into the partitions, the data should be stored in the below form -

Table Name 
    |
     -- Country-J
    |      |
    |       -- City-A
    |       -- City-B
    |
     -- Country-K
           |
            -- City-A
            -- City-B

I know about creating single partitions but not about nested partitions. I have tried looking over the web, but haven't got any solutions till now. Any help will be much appreciated. Thanks!


Solution

  • You can do this by partitioning on multiple columns. Order of partition columns determines subfolders.
    Create your table like this -

    CREATE TABLE mytable(
    ID BIGINT,
    NAME STRING 
    )
    COMMENT 'Multi level partitioned table'
    PARTITIONED BY(Country STRING, City STRING)
    

    This will create a partition structure(physical folders) like you are mentioned in your question.

     
        |
         -- Folder_Country-J
        |      |
        |       -- Folder_City-A
                     - files for Country-J and City-A
        |       -- Folder_City-B
                     - files for Country-J and City-B
        |
         -- Folder_Country-K
               |
                -- Folder_City-A
                     - files for Country-K and City-A
                -- Folder_City-B
                     - files for Country-K and City-B