In Hue --> Hive Query Browser I created an external table in Hive and loaded data from one of my CSV files into it using the following statements:
CREATE EXTERNAL TABLE movies(movieId BIGINT, title VARCHAR(100), genres VARCHAR(100)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
LOAD DATA INPATH '/user/admin/movie_data/movies' INTO TABLE movies;
I see that the source file "movies" disappears from HDFS and moves to the hive datawarehouse. I am under the impression that an external table acts only as a link to original source data.
Should the external table not be independent of source data - as in if I were to drop the table, source file will still persist? How do I create such an external table?
The external tables stores data in a hdfs location mentioned while we create the table. So if we dont provide location while creating the table it will be defaulted to warehouse hdfs folder. Try running "use mydatabase_name;show create table mytable_name;" to get the table definition to see what is the location it is pointed to. If you need a hdfs location other than default one you need to mention it while creating table.Refer below query [Create external table test (col1 string) location '/data/database/tablename';] Secondly LOAD INPATH will not move data from INPATH to external hdfs location, it will insert data from INPATH to your table table (more like copying data from inpath to tables's hdfs location)