Tags: hadoop, hive, external-tables

Is Hive external table data distributed to data nodes in the same way as internal tables?


I can't find reference information that explains certain details of Hive external tables. When a file located outside the default data warehouse is loaded into an external table (using LOCATION), is the data ingested and distributed among the data nodes, as is the case with internal tables? And does the file used as the source remain intact in the file system, which would essentially duplicate the data?


Solution

  • If the data are already in HDFS, there is no duplication. An EXTERNAL table points to any HDFS location for its storage...
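
To make the "points to" behavior concrete, here is a minimal HiveQL sketch. The table name, columns, and HDFS path are hypothetical and chosen only for illustration; adjust them to your own layout. It assumes tab-delimited text files already sitting in an HDFS directory.

```sql
-- Hypothetical schema and path, for illustration only.
-- LOCATION names an HDFS directory (not a single file); Hive reads
-- whatever files are in it and does not copy them into the warehouse.
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  ip  STRING,
  ts  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/raw/web_logs';

-- Shows the table type (EXTERNAL_TABLE) and the directory it points to.
DESCRIBE FORMATTED web_logs;

-- For an external table this removes only the metastore entry;
-- the files under /data/raw/web_logs are left in place.
DROP TABLE web_logs;
```

Since the files are ordinary HDFS files, their blocks are distributed and replicated by HDFS exactly like any other data, regardless of whether a Hive table references them.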