hadoop, hive, hadoop-partitioning, external-tables

Inserting Partitioned Data into External Table in Hive


I need a few clarifications regarding inserting data into an external table.

I have created an external Parquet table, partitioned by week, pointing to a Hadoop location. After this, I moved the data (a .csv file) to that location.
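For reference, a minimal sketch of that setup might look like the following (the table name, columns, and location are assumptions for illustration, not from the original post):

    CREATE EXTERNAL TABLE sales_parquet (
      id     INT,
      amount DOUBLE
    )
    PARTITIONED BY (week STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/sales_parquet';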

My doubt is: since the table is partitioned by week, even if I just move the file to that directory, Hive will not read it and I will have to use an INSERT command, whereas a non-partitioned Hive table would read directly from that Hadoop path. Is that correct?


Solution

  • You need to consider what data is within the CSV. For example, if you partitioned time-based data by year, you wouldn't copy a CSV containing several year values into a single partition; you would need to split the dataset (the dynamic-partition insert sketched below handles this automatically).

    even if I just move the file to that directory, Hive will not read it and I will have to use an INSERT command

    Correct. Especially since it's a Parquet SerDe trying to read a CSV.

    To clarify, Hive would read the CSV if it were placed in a table stored as text.

    You need a separate table where you can read the text files, then insert into the other table while converting file formats, as sketched below.
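    A minimal sketch of that pattern, assuming a hypothetical sales_staging table and the sales_parquet table from the DDL above (names, columns, and paths are illustrative):

        -- Staging table that reads the raw CSV as plain text.
        CREATE EXTERNAL TABLE sales_staging (
          id     INT,
          amount DOUBLE,
          week   STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION '/data/staging/sales_csv';

        -- Dynamic partitioning lets Hive derive the week value from each
        -- row, so a CSV spanning several weeks is split automatically.
        SET hive.exec.dynamic.partition = true;
        SET hive.exec.dynamic.partition.mode = nonstrict;

        -- Rewrites the text rows as Parquet, one partition per week.
        INSERT OVERWRITE TABLE sales_parquet PARTITION (week)
        SELECT id, amount, week
        FROM sales_staging;

    Note that the partition column (week) must come last in the SELECT list; Hive uses its value in each row to route the row to the correct partition.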
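    As an aside, even when files already match the table's format, Hive does not notice partition directories created directly on HDFS until they are registered with the metastore, for example:

        -- Registers partition directories added outside of Hive
        -- (only useful once the files are in the right format).
        MSCK REPAIR TABLE sales_parquet;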