I'm using Hive to process my CSV files. I've stored the CSV files in HDFS and want to create tables from them.
I use the following command:
create external table if not exists csv_table (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive'
TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA INPATH '/CsvData/csv_table.csv' OVERWRITE INTO TABLE csv_table;
So the file under /CsvData will be moved into /user/hive. That makes sense.
But what if I want to create another table?
create external table if not exists csv_table2 (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive'
TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA INPATH '/CsvData/csv_table2.csv' OVERWRITE INTO TABLE csv_table2;
It will raise an exception complaining that the directory is not empty.
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Directory hdfs://localhost:9000/user/hive could not be cleaned up.
This is hard for me to understand. Does it mean I can store only one file under one directory? To store multiple files, do I have to create one directory for every file?
Is it possible to store all the files together?
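The CREATE TABLE statement will NOT raise an exception complaining that the directory is not empty, because creating a table on top of an existing directory is a perfectly normal scenario. The error you posted comes from MoveTask, that is, from the LOAD DATA ... OVERWRITE step: OVERWRITE replaces the existing contents of the table location, and both of your tables point at the same location.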
You can store as many files in the directory as necessary, and all of them will be accessible to the table built on top of that directory.
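As a rough sketch of this, assuming both CSV files share the schema of csv_table from the question, the second file can be loaded into the same table by leaving out OVERWRITE, so the existing file in the table location is kept:

```sql
-- Sketch only: reuses the table and paths from the question.
-- Without OVERWRITE, LOAD DATA adds the file to the table location
-- instead of replacing what is already there.
LOAD DATA INPATH '/CsvData/csv_table.csv' INTO TABLE csv_table;
LOAD DATA INPATH '/CsvData/csv_table2.csv' INTO TABLE csv_table;

-- Rows from both files are now visible through the one table.
SELECT COUNT(*) FROM csv_table;
```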
A table location is a directory, not a file. If you need to create a new table and keep its files from being mixed with another table's files, create a separate directory for it.
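A minimal sketch of that layout, based on the statements in the question. The subdirectory names csv_table and csv_table2 under /user/hive are assumptions chosen for illustration, not something Hive requires:

```sql
-- Each table gets its own directory (directory names are hypothetical).
CREATE EXTERNAL TABLE IF NOT EXISTS csv_table (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive/csv_table'
TBLPROPERTIES ("skip.header.line.count"="1");

CREATE EXTERNAL TABLE IF NOT EXISTS csv_table2 (dummy STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:9000/user/hive/csv_table2'
TBLPROPERTIES ("skip.header.line.count"="1");

-- OVERWRITE now only replaces the contents of that table's own directory.
LOAD DATA INPATH '/CsvData/csv_table.csv' OVERWRITE INTO TABLE csv_table;
LOAD DATA INPATH '/CsvData/csv_table2.csv' OVERWRITE INTO TABLE csv_table2;
```

With this layout, loading into one table does not touch the other table's files.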
Read also this answer for clear understanding: https://stackoverflow.com/a/54038932/2700344