How to create dynamic partition using java map reduce, like sql we have group by country column. Example i have country based dataset and need to separate the records based on country ( partition). We can't limit the coutry. since every day will get new country data.
You can leverage the dynamic partitioning feature of Hive to automatically populate partitions based on incoming data. Below example demonstrates auto-partitioning of raw data based upon country
information.
Create a raw data file (country1.csv), which has data for multiple countries
1,USA
2,Canada
3,USA
4,Brazil
5,Brazil
6,USA
7,Canada
Upload this file to a location in HDFS
hadoop fs -mkdir /example_hive
hadoop fs -mkdir /example_hive/country
hadoop fs -put country1.csv /example_hive/country
Create a non-partitioned Hive table on top of the data
CREATE EXTERNAL TABLE country
(
id int,
country string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hdfs:///example_hive/country';
Verify that the Hive table is created correctly
hive (default)> select * from country;
1 USA
2 Canada
3 USA
4 Brazil
5 Brazil
6 USA
7 Canada
Create a partitioned Hive table, with country as the partition
hive (default)> CREATE TABLE country_par
(
id int
)
PARTITIONED BY (country string);
Enable dynamic partitioning
hive (default)> SET hive.exec.dynamic.partition = true;
hive (default)> SET hive.exec.dynamic.partition.mode = nonstrict;
Populate the partitioned table, with Hive automatically putting the data in the right country partition
hive (default)> INSERT INTO TABLE country_par
PARTITION(country)
SELECT id,country FROM country;
Verify that the partitions were created, and populated correctly
hive (default)> show partitions country_par;
country=Brazil
country=Canada
country=USA
hive (default)> select * from country_par where country='Brazil';
4 Brazil
5 Brazil
hive (default)> select * from country_par where country='USA';
1 USA
3 USA
6 USA
hive (default)> select * from country_par where country='Canada';
2 Canada
7 Canada
hive (default)> select country,count(*) from country_par group by country;
Brazil 2
Canada 2
USA 3