Loading data using LOAD DATA in Hive/Impala

I am able to load data into Hive using the following command:

LOAD DATA INPATH '/xx/person/a.csv' INTO TABLE person PARTITION (age = 30);

In the statement above, age = 30 is the partition in which the data is to be stored.

What if a.csv itself contains an age column? Is there a way to get Hive to insert each line of a.csv into my person table under the right partition with a single LOAD DATA statement?

Solution

LOAD DATA only supports static partitioning: "When the LOAD DATA statement operates on a partitioned table, it always operates on one partition at a time."

INSERT, on the other hand, supports dynamic partitioning: "If a partition key column is mentioned but not assigned a value, [...] the unassigned columns are filled in with the final columns of the SELECT list."

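So a single LOAD DATA statement cannot do this. What you can do instead is define a table over the source data, optionally define a view that moves the partition columns to the final positions, and then use INSERT INTO [...] SELECT [...] to populate the partitioned table from the view (or directly from the source table, if you reorder the columns in the SELECT list).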
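
As a rough sketch of that approach (not a definitive recipe): the staging table name person_staging and the columns name and city are hypothetical, since the question does not show the schema of person or the layout of a.csv. It also assumes that person is declared as (name STRING, city STRING) PARTITIONED BY (age INT), that the file is comma-delimited, and that /xx/person/ holds only the source CSV files. Here the SELECT list does the column reordering, so no view is needed.

```sql
-- Hypothetical staging table over the raw CSV directory.
-- Column names and order are assumptions; adjust them to match a.csv.
CREATE EXTERNAL TABLE person_staging (
  name STRING,
  age  INT,
  city STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xx/person/';

-- Hive only: enable dynamic partitioning and allow every partition key
-- to be dynamic. Skip these two lines when running in Impala.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition key (age) is named but not assigned a value, so it is
-- filled from the last column of the SELECT list, row by row.
INSERT INTO TABLE person PARTITION (age)
SELECT name, city, age
FROM person_staging;
```

Each distinct age value in the staging data ends up in its own partition of person. Unlike LOAD DATA, which moves the file into the partition directory, this rewrites the rows through a query, so it is slower on large inputs but leaves the source file in place.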