
Loading data using LOAD DATA in Hive/Impala


I can load data into Hive using the following command:

LOAD DATA INPATH '/xx/person/a.csv' INTO TABLE person PARTITION (age = 30);

In the statement above, age = 30 is the partition where the data will be stored.

What if a.csv actually contains the age column? Is there a way to get Hive to insert each line of a.csv into my person table under the correct partition with a single LOAD DATA statement?


Solution

  • LOAD DATA only supports static partitioning: "When the LOAD DATA statement operates on a partitioned table, it always operates on one partition at a time."

    INSERT, on the other hand, supports dynamic partitioning: "If a partition key column is mentioned but not assigned a value, [...] the unassigned columns are filled in with the final columns of the SELECT list."

    So instead, define a table over the source data, optionally define a view that moves the partition columns to the end of the column list, and then use INSERT INTO [...] SELECT [...] to populate the partitioned table from that view in one statement.
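A minimal HiveQL sketch of those steps follows. It assumes, hypothetically, that a.csv is comma-delimited with the columns name, age, city and no header row, that person is declared as person(name STRING, city STRING) PARTITIONED BY (age INT), and that a.csv sits in its own HDFS directory, /xx/person_staging (an assumed path), because an external table's LOCATION is a directory rather than a single file. The table and view names person_staging and person_staging_v are also hypothetical. The two SET statements are Hive settings (in the default strict mode Hive requires at least one static partition value); omit them when running in Impala.

```sql
-- Hypothetical layout of a.csv: name,age,city (comma-delimited, no header).
-- Hypothetical target: person(name STRING, city STRING) PARTITIONED BY (age INT).

-- 1. External table over the (assumed) directory that holds a.csv.
CREATE EXTERNAL TABLE person_staging (
  name STRING,
  age  INT,
  city STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/xx/person_staging';

-- 2. Optional view that moves the partition column (age) to the end,
--    matching the order the dynamic-partition INSERT expects.
CREATE VIEW person_staging_v AS
SELECT name, city, age FROM person_staging;

-- 3. Hive only: enable dynamic partitioning and allow every
--    partition column to be dynamic (no static value required).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- 4. age is named in PARTITION (...) but not assigned a value,
--    so each row's age comes from the last column of the SELECT list.
INSERT INTO TABLE person PARTITION (age)
SELECT name, city, age FROM person_staging_v;
```

The view is only a convenience: selecting name, city, age directly from person_staging in the INSERT works the same way. If person has more non-partition columns, list them in the table's declared order and keep age last.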